Hadoop 2 Distributed Cluster with NameNode and ResourceManager HA
Date: 2016-06-10 01:10  Source: linux.it.net.cn  Author: it.net.cn
Contents
- Introduction
- Experiment environment
- Experiment procedure
- Demo
[1] Introduction
Back in the early Hadoop 2.x days I wrote "hadoop 2.2.0 cluster installation, configuration and testing", which covered only the most basic distributed setup steps and a running demo; HA was not configured there. This article walks through a Hadoop 2 distributed deployment with HA configured for both the NameNode and the ResourceManager.
[2] Experiment Environment
1. Nodes and role assignment
The experiment uses a 5-node cluster with roles assigned as follows:
hostname     NameNode      DataNode   JournalNode   Zookeeper   ZKFC   ResourceManager
nn1.hadoop   √ (Active)               √             √           √      √
nn2.hadoop   √ (Standby)              √             √           √      √
dn1.hadoop                 √          √             √
dn2.hadoop                 √
dn3.hadoop                 √
2. System and software versions
- CentOS 6.3 (64-bit)
- Java 1.7.0_75
- Hadoop 2.6.0
- ZooKeeper 3.4.6
3. Install the JDK (on all nodes)
// list the installed OpenJDK packages
rpm -qa | grep java
// remove the OpenJDK packages
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-javadoc-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-devel-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps tzdata-java-2012c-1.el6.noarch
rpm -e --nodeps java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64
Download the 64-bit JDK rpm from Oracle (jdk-7u75-linux-x64.rpm, matching Java 1.7.0_75 above) and install it:
rpm -ivh jdk-7u75-linux-x64.rpm
Default installation path: /usr/java/jdk1.7.0_75
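To confirm the installation, a quick check (the expected version string assumes the 1.7.0_75 package above):
java -version
# should report: java version "1.7.0_75"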
4. Configure hosts (on all nodes)
172.17.225.61 nn1.hadoop zk1.hadoop
172.17.225.121 nn2.hadoop zk2.hadoop
172.17.225.72 dn1.hadoop zk3.hadoop
172.17.225.76 dn2.hadoop
172.17.225.19 dn3.hadoop
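A quick hedged check that name resolution works from the current node (hostnames per the hosts entries above):
for h in nn1.hadoop nn2.hadoop dn1.hadoop dn2.hadoop dn3.hadoop; do ping -c 1 $h; done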
5. Make sure sshd is installed and running (on all nodes)
6. Configure clock synchronization
Option 1 (on all nodes): synchronize every node against a public NTP server:
$ cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
$ ntpdate us.pool.ntp.org
$ crontab -e
0-59/10 * * * * /usr/sbin/ntpdate us.pool.ntp.org | logger -t NTP
Option 2: set up an NTP server on one node and have the other nodes synchronize against it; a sketch follows.
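A minimal sketch of option 2, assuming nn1.hadoop is chosen as the internal time source (the host and the cron schedule are illustrative, not from the original article):
# on every other node
ntpdate nn1.hadoop
# and in crontab -e
0-59/10 * * * * /usr/sbin/ntpdate nn1.hadoop | logger -t NTP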
7. Create a dedicated user (on all nodes)
For example, create a hadoop user with the initial password hadoop; all of the Hadoop deployment and configuration below is done as this user.
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop
Set the environment variables for the hadoop user with vi ~/.bash_profile:
export JAVA_HOME=/usr/java/jdk1.7.0_75
export PATH="$JAVA_HOME/bin:$PATH"
8. Passwordless SSH login
Configure every NameNode node so it can log in to all other nodes without a password. One-way passwordless login is sufficient, although configuring it both ways does no harm. For a detailed introduction to passwordless SSH see: Linux (CentOS) configuring OpenSSH passwordless login. A minimal sketch follows.
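A hedged sketch of the key setup, run as the hadoop user on each NameNode host (hostnames per the table above; a DSA key is generated here because the sshfence configuration later references ~/.ssh/id_dsa):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
for host in nn1.hadoop nn2.hadoop dn1.hadoop dn2.hadoop dn3.hadoop; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@$host
done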
[3] Experiment Procedure
1. Building Hadoop 2
On any node of the experiment environment, download the Hadoop 2.6.0 source, install and configure Java and Maven, then run mvn package -Pdist,native -DskipTests -Dtar to build from source. For details see:
- Compiling Hadoop 2.2.0 from source
- Compiling Hadoop 2.x source on Ubuntu
2. ZooKeeper installation and configuration
Download the latest stable release (3.4.6) and deploy it on each ZK node, then edit the environment variables with vi ~/.bash_profile:
export ZOOKEEPER_HOME=/usr/local/share/zookeeper
export PATH="$ZOOKEEPER_HOME/bin:$PATH"
Edit the configuration file:
cd $ZOOKEEPER_HOME
cp conf/zoo_sample.cfg conf/zoo.cfg
vi conf/zoo.cfg
Change it to the following:
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/bigdata/hadoop/zookeeper/zkdata
dataLogDir=/bigdata/hadoop/zookeeper/zklogs
server.1=zk1.hadoop:2888:3888
server.2=zk2.hadoop:2888:3888
server.3=zk3.hadoop:2888:3888
The directories referenced in this configuration must be created beforehand and be readable and writable by the hadoop user (a sketch follows below). Each ZK node gets a different myid:
- on zk1.hadoop run: echo 1 > /bigdata/hadoop/zookeeper/zkdata/myid
- on zk2.hadoop run: echo 2 > /bigdata/hadoop/zookeeper/zkdata/myid
- on zk3.hadoop run: echo 3 > /bigdata/hadoop/zookeeper/zkdata/myid
The value in myid must match the server.N entries in zoo.cfg.
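A minimal sketch for creating the ZooKeeper data directories on each ZK node (paths are those from zoo.cfg above; run as root, or adjust ownership afterwards as shown):
mkdir -p /bigdata/hadoop/zookeeper/zkdata /bigdata/hadoop/zookeeper/zklogs
chown -R hadoop:hadoop /bigdata/hadoop/zookeeper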
3. Hadoop installation and configuration (the changes below apply to all nodes)
3.1 Configure environment variables with vi ~/.bash_profile:
export HADOOP_HOME=/usr/local/share/hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
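After editing, reload the profile and confirm the hadoop command resolves (a quick check; the version string assumes the 2.6.0 build used here):
source ~/.bash_profile
hadoop version   # should report Hadoop 2.6.0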
3.2 Edit $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
<description>mycluster is the logical name of the HA cluster;
it must match the dfs.nameservices value in hdfs-site.xml</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/bigdata/hadoop/temp</value>
<description>Default base directory under which NameNode, DataNode, JournalNode, etc. store their data.
Each kind of data can also be given its own directory. This directory must be created beforehand.</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181</value>
<description>Addresses and ports of the ZooKeeper ensemble nodes.
Note: the number of nodes must be odd and must match the entries in zoo.cfg.</description>
</property>
</configuration>
3.3 Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Number of block replicas</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/bigdata/hadoop/dfs/name</value>
<description>NameNode metadata storage directory</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/bigdata/hadoop/dfs/data</value>
<description>DataNode block data storage directory</description>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<description>Logical name of the HA nameservice; any name may be chosen,
and fs.defaultFS in core-site.xml must reference it</description>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
<description>Logical names of the NameNodes within the nameservice</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>nn1.hadoop:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>nn2.hadoop:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>nn1.hadoop:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>nn2.hadoop:50070</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
<value>nn1.hadoop:53310</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
<value>nn2.hadoop:53310</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled.mycluster</name>
<value>true</value>
<description>Whether to fail over automatically when the active NameNode fails</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://nn1.hadoop:8485;nn2.hadoop:8485;dn1.hadoop:8485/hadoop-journal</value>
<description>JournalNode configuration, made up of three parts:
1. the qjournal prefix names the protocol;
2. the host/IP:port of the three machines running JournalNodes, separated by semicolons;
3. the trailing hadoop-journal is the journal namespace, which can be named freely.
</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/bigdata/hadoop/dfs/journal/</value>
<description>Local directory where the JournalNode stores its data</description>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<description>Client-side proxy provider used to determine which NameNode of mycluster is currently active</description>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
<description>Use SSH to fence the failed NameNode during failover</description>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_dsa</value>
<description>Location of the private key used for SSH when sshfence is the fencing method</description>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>1000</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>10</value>
</property>
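The sshfence method above assumes that each NameNode can SSH to the other using the key at /home/hadoop/.ssh/id_dsa (set up in the passwordless SSH step). A quick hedged check, run on nn1.hadoop as the hadoop user:
ssh -i /home/hadoop/.ssh/id_dsa hadoop@nn2.hadoop hostname
# should print nn2.hadoop without asking for a password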
3.4 Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>clusterrm</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>nn1.hadoop</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>nn2.hadoop</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181</value>
</property>
PS: the HA-related settings in yarn-site.xml follow the same pattern as the HA settings in hdfs-site.xml.
3.5 Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
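Note: a fresh Hadoop 2.6.0 distribution ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, copy the template first (a small assumed step, not shown in the original article):
cd $HADOOP_HOME/etc/hadoop
cp mapred-site.xml.template mapred-site.xml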
3.6 Edit $HADOOP_HOME/etc/hadoop/slaves
dn1.hadoop
dn2.hadoop
dn3.hadoop
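Since every node needs the same configuration (per the heading of section 3), one option is to edit the files on a single node and copy them out. A hedged sketch using rsync, run from nn1.hadoop (hostnames per the table above):
for host in nn2.hadoop dn1.hadoop dn2.hadoop dn3.hadoop; do
    rsync -av $HADOOP_HOME/etc/hadoop/ hadoop@$host:$HADOOP_HOME/etc/hadoop/
done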
4. Startup steps in detail
4.1 Start ZooKeeper
On every ZK node run: zkServer.sh start
Use zkServer.sh status to check the leader/follower role of each ZK node.
4.2 Format ZK (only needed the first time)
On any of the ZK nodes run: hdfs zkfc -formatZK
[hadoop@nn1 micmiu]$ hdfs zkfc -formatZK
15/02/02 16:54:24 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at nn1.hadoop/172.17.225.61:53310
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:host.name=nn1.hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_75
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.7.0_75/jre
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0/......
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/share/hadoop-2.6.0/lib/native
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-279.el6.x86_64
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/share/hadoop-2.6.0
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@1e884ca9
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1.hadoop/172.17.225.61:2181. Will not attempt to authenticate using SASL (unknown error)
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Socket connection established to nn1.hadoop/172.17.225.61:2181, initiating session
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Session establishment complete on server nn1.hadoop/172.17.225.61:2181, sessionid = 0x14b496d55810000, negotiated timeout = 5000
15/02/02 16:54:24 INFO ha.ActiveStandbyElector: Session connected.
15/02/02 16:54:24 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Session: 0x14b496d55810000 closed
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: EventThread shut down
4.3 Start the ZKFCs
ZKFC (ZKFailoverController) monitors NameNode health and coordinates active/standby switchover, so it only needs to run on the two NameNode nodes.
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-nn1.hadoop.out
[hadoop@nn1 micmiu]$
[hadoop@nn2 micmiu]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-nn2.hadoop.out
[hadoop@nn2 micmiu]$ jps
4.4 Start the JournalNodes. The JournalNodes form the shared storage that keeps edit-log metadata in sync between the active and standby NameNodes; start one on each JN node:
#JN node 1
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-nn1.hadoop.out
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
8771 DFSZKFailoverController
8895 Jps
8837 JournalNode
#JN node 2
[hadoop@nn2 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-nn2.hadoop.out
[hadoop@nn2 micmiu]$ jps
7828 QuorumPeerMain
8198 JournalNode
8082 DFSZKFailoverController
8252 Jps
#JN node 3
[hadoop@dn1 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-dn1.hadoop.out
[hadoop@dn1 ~]$ jps
748 QuorumPeerMain
1008 JournalNode
1063 Jps
4.5 Format and start the primary NameNode
Format: hdfs namenode -format
Note: formatting is only required the very first time the cluster is brought up. Do not format again!
[hadoop@nn1 micmiu]$ hdfs namenode -format
15/02/02 17:03:05 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = nn1.hadoop/172.17.225.61
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0/share/hadoop/common/lib/.......
STARTUP_MSG: build = Unknown -r Unknown; compiled by 'hadoop' on 2015-01-29T15:07Z
STARTUP_MSG: java = 1.7.0_75
************************************************************/
15/02/02 17:03:05 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/02/02 17:03:05 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
15/02/02 17:03:05 INFO namenode.FSNamesystem: No KeyProvider found.
15/02/02 17:03:05 INFO namenode.FSNamesystem: fsLock is fair:true
15/02/02 17:03:05 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/02/02 17:03:05 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/02/02 17:03:05 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/02/02 17:03:05 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Feb 02 17:03:05
15/02/02 17:03:05 INFO util.GSet: Computing capacity for map BlocksMap
15/02/02 17:03:05 INFO util.GSet: VM type = 64-bit
15/02/02 17:03:05 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/02/02 17:03:05 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/02/02 17:03:05 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: defaultReplication = 3
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxReplication = 512
15/02/02 17:03:05 INFO blockmanagement.BlockManager: minReplication = 1
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
15/02/02 17:03:05 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/02/02 17:03:05 INFO blockmanagement.BlockManager: encryptDataTransfer = false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
15/02/02 17:03:05 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
15/02/02 17:03:05 INFO namenode.FSNamesystem: supergroup = supergroup
15/02/02 17:03:05 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/02/02 17:03:05 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster
15/02/02 17:03:05 INFO namenode.FSNamesystem: HA Enabled: true
15/02/02 17:03:05 INFO namenode.FSNamesystem: Append Enabled: true
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map INodeMap
15/02/02 17:03:06 INFO util.GSet: VM type = 64-bit
15/02/02 17:03:06 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/02/02 17:03:06 INFO util.GSet: capacity = 2^20 = 1048576 entries
15/02/02 17:03:06 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map cachedBlocks
15/02/02 17:03:06 INFO util.GSet: VM type = 64-bit
15/02/02 17:03:06 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/02/02 17:03:06 INFO util.GSet: capacity = 2^18 = 262144 entries
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
15/02/02 17:03:06 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/02/02 17:03:06 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/02/02 17:03:06 INFO util.GSet: VM type = 64-bit
15/02/02 17:03:06 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/02/02 17:03:06 INFO util.GSet: capacity = 2^15 = 32768 entries
15/02/02 17:03:06 INFO namenode.NNConf: ACLs enabled? false
15/02/02 17:03:06 INFO namenode.NNConf: XAttrs enabled? true
15/02/02 17:03:06 INFO namenode.NNConf: Maximum size of an xattr: 16384
15/02/02 17:03:07 INFO namenode.FSImage: Allocated new BlockPoolId: BP-711086735-172.17.225.61-1422867787014
15/02/02 17:03:07 INFO common.Storage: Storage directory /bigdata/hadoop/dfs/name has been successfully formatted.
15/02/02 17:03:07 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/02/02 17:03:07 INFO util.ExitUtil: Exiting with status 0
15/02/02 17:03:07 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn1.hadoop/172.17.225.61
************************************************************/
[hadoop@nn1 micmiu]$
On the primary NN node, start the NameNode: hadoop-daemon.sh start namenode
The processes on the NN node before and after starting it:
#before starting
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
8771 DFSZKFailoverController
8988 Jps
8837 JournalNode
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-namenode-nn1.hadoop.out
#after starting
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
9134 Jps
8771 DFSZKFailoverController
8837 JournalNode
9017 NameNode
4.6 Sync the primary NN's metadata to the standby NN
hdfs namenode -bootstrapStandby
[hadoop@nn2 ~]$ hdfs namenode -bootstrapStandby
15/02/02 17:04:43 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = nn2.hadoop/172.17.225.121
STARTUP_MSG: args = [-bootstrapStandby]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0......
STARTUP_MSG: build = Unknown -r Unknown; compiled by 'hadoop' on 2015-01-29T15:07Z
STARTUP_MSG: java = 1.7.0_75
************************************************************/
15/02/02 17:04:43 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/02/02 17:04:43 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
=====================================================
About to bootstrap Standby ID nn2 from:
Nameservice ID: mycluster
Other Namenode ID: nn1
Other NN's HTTP address: http://nn1.hadoop:50070
Other NN's IPC address: nn1.hadoop/172.17.225.61:53310
Namespace ID: 263802668
Block pool ID: BP-711086735-172.17.225.61-1422867787014
Cluster ID: CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
Layout version: -60
=====================================================
15/02/02 17:04:44 INFO common.Storage: Storage directory /bigdata/hadoop/dfs/name has been successfully formatted.
15/02/02 17:04:45 INFO namenode.TransferFsImage: Opening connection to http://nn1.hadoop:50070/imagetransfer?getimage=1&txid=0&storageInfo=-60:263802668:0:CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
15/02/02 17:04:45 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
15/02/02 17:04:45 INFO namenode.TransferFsImage: Transfer took 0.00s at 0.00 KB/s
15/02/02 17:04:45 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 352 bytes.
15/02/02 17:04:45 INFO util.ExitUtil: Exiting with status 0
15/02/02 17:04:45 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn2.hadoop/172.17.225.121
************************************************************/
[hadoop@nn2 ~]$
4.7 Start the standby NN
On the standby NN run: hadoop-daemon.sh start namenode
[hadoop@nn2 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-namenode-nn2.hadoop.out
[hadoop@nn2 ~]$ jps
7828 QuorumPeerMain
8198 JournalNode
8082 DFSZKFailoverController
8394 NameNode
8491 Jps
4.8 Designate and confirm the active NN
This article uses automatic failover, so ZK has already elected one node as the active NN and this step can be skipped. Check the node states:
[hadoop@nn1 ~]$ hdfs haadmin -getServiceState nn1
active
[hadoop@nn1 ~]$ hdfs haadmin -getServiceState nn2
standby
If manual NN failover is configured instead, this step is mandatory, because the system does not yet know which NN should be active and both NNs are in the Standby state. To manually activate the primary NN: hdfs haadmin -transitionToActive nn1
4.9 Start the DataNodes from the active NN
Start all DataNodes with: hadoop-daemons.sh start datanode
Note the difference between hadoop-daemons.sh and hadoop-daemon.sh: the plural form runs the daemon on every host listed in the slaves file over SSH, while the singular form only acts on the local node.
[hadoop@nn1 ~]$ hadoop-daemons.sh start datanode
dn3.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn3.hadoop.out
dn1.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn1.hadoop.out
dn2.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn2.hadoop.out
[hadoop@nn1 ~]$
4.10 Start YARN
Method 1: start the ResourceManager and NodeManagers in one go: start-yarn.sh
Method 2: start the ResourceManager and NodeManagers separately:
- yarn-daemon.sh start resourcemanager
- yarn-daemon.sh start nodemanager (use yarn-daemons.sh if there are multiple DataNodes)
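Note (an addition based on Hadoop 2.6 behaviour, not in the original article): start-yarn.sh only starts the ResourceManager on the node where it is invoked, so with RM HA enabled the second ResourceManager must be started by hand on the other RM host:
# on nn2.hadoop, assuming start-yarn.sh was run on nn1.hadoop
yarn-daemon.sh start resourcemanager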
The ResourceManager is also configured for HA; check the state of each RM with:
yarn rmadmin -getServiceState serviceid
[hadoop@nn1 ~]$ yarn rmadmin -getServiceState rm1
active
[hadoop@nn1 ~]$ yarn rmadmin -getServiceState rm2
standby
[hadoop@nn1 ~]$
4.11 Start the MR JobHistory Server
Run the JobHistory Server on dn1.hadoop: mr-jobhistory-daemon.sh start historyserver
//before starting the JobHistory Server
[hadoop@dn1 ~]$ jps
14625 Jps
3568 NodeManager
748 QuorumPeerMain
1008 JournalNode
1194 DataNode
[hadoop@dn1 ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/share/hadoop-2.6.0/logs/mapred-hadoop-historyserver-dn1.hadoop.out
//after starting the JobHistory Server
[hadoop@dn1 ~]$ jps
14745 JobHistoryServer
3568 NodeManager
748 QuorumPeerMain
1008 JournalNode
1194 DataNode
14786 Jps
4.12 Verify that NameNode and ResourceManager HA actually work
Kill the relevant process on the currently active node and watch the other nodes' states switch over; a sketch of the NameNode case follows.
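A minimal sketch of the NameNode failover test, assuming nn1 is currently active (process IDs come from jps and will differ on your machines):
# on nn1.hadoop: kill the active NameNode process
kill -9 $(jps | awk '$2 == "NameNode" {print $1}')
# from either node: confirm that nn2 has taken over
hdfs haadmin -getServiceState nn2   # expected: active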
4.13 Verify the transparency of NN HA
Check that hdfs dfs -ls / and hdfs dfs -ls hdfs://mycluster/ produce the same result:
[hadoop@nn1 ~]$ hdfs dfs -ls /
Found 2 items
drwx------ - hadoop supergroup 0 2015-02-02 23:42 /tmp
drwxr-xr-x - hadoop supergroup 0 2015-02-02 23:39 /user
[hadoop@nn1 ~]$ hdfs dfs -ls hdfs://mycluster/
Found 2 items
drwx------ - hadoop supergroup 0 2015-02-02 23:42 hdfs://mycluster/tmp
drwxr-xr-x - hadoop supergroup 0 2015-02-02 23:39 hdfs://mycluster/user
[hadoop@nn1 ~]$
[4] Running the wordcount demo
For the demo, refer to the wordcount walkthrough in "hadoop 2.2.0 cluster installation, configuration and testing"; it is not repeated here. A hedged sketch is given below for convenience.
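A hedged sketch of running the bundled wordcount example against the HA cluster (the input/output paths are illustrative; the output directory must not exist beforehand):
hdfs dfs -mkdir -p /user/hadoop/wc-in
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/hadoop/wc-in
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/hadoop/wc-in /user/hadoop/wc-out
hdfs dfs -cat /user/hadoop/wc-out/part-r-00000 | head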