
Hadoop 2 Distributed Cluster with NameNode and ResourceManager HA

Contents

  • Introduction
  • Experiment environment
  • Experiment procedure
  • Wordcount demo

[1] Introduction

Back in the early days of Hadoop 2.x I wrote "hadoop 2.2.0 集群模式安装配置和测试", which covered the most basic steps for setting up a distributed cluster and running a demo, but it did not experiment with HA at all. This article walks through, in detail, setting up a Hadoop 2 distributed cluster with HA configured for both the NameNode and the ResourceManager.

[2] Experiment environment

1. Nodes and role assignment

The experiment uses a 5-node cluster; roles are assigned as follows:

hostname     NameNode     DataNode   JournalNode   ZooKeeper   ZKFC   ResourceManager
nn1.hadoop   √(Active)    -          √             √ (zk1)     √      √ (rm1)
nn2.hadoop   √(Standby)   -          √             √ (zk2)     √      √ (rm2)
dn1.hadoop   -            √          √             √ (zk3)     -      -
dn2.hadoop   -            √          -             -           -      -
dn3.hadoop   -            √          -             -           -      -

2. OS and software versions

  • CentOS 6.3, 64-bit
  • Java 1.7.0_75
  • Hadoop 2.6.0
  • ZooKeeper 3.4.6

3. Install the JDK (on all nodes)

 
# list the installed OpenJDK-related packages
rpm -qa | grep java
# remove OpenJDK
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-javadoc-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-devel-1.6.0.0-1.45.1.11.1.el6.x86_64
rpm -e --nodeps tzdata-java-2012c-1.el6.noarch
rpm -e --nodeps java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64

Download the 64-bit JDK RPM from Oracle (jdk-7u75-linux-x64.rpm, matching the 1.7.0_75 version used throughout this article) and install it:

rpm -ivh jdk-7u75-linux-x64.rpm

The default install path is /usr/java/jdk1.7.0_75.
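A quick sanity check after installing (the version string should report 1.7.0_75 if the RPM above was used):

java -version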

4. Configure /etc/hosts (on all nodes)

 
172.17.225.61   nn1.hadoop zk1.hadoop
172.17.225.121  nn2.hadoop zk2.hadoop
172.17.225.72   dn1.hadoop zk3.hadoop
172.17.225.76   dn2.hadoop
172.17.225.19   dn3.hadoop

5. Make sure sshd is installed and running (on all nodes)

6. Configure clock synchronization

Method 1 (on all nodes): every node synchronizes against a public NTP server:

 
 
$ cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
$ ntpdate us.pool.ntp.org
$ crontab -e
0-59/10 * * * * /usr/sbin/ntpdate us.pool.ntp.org | logger -t NTP

Method 2: set up an NTP server on one node and have all other nodes synchronize against it, as sketched below.
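A minimal sketch of method 2, assuming nn1.hadoop is picked as the internal NTP server (the subnet 172.17.225.0/24 is an assumption inferred from the host list in the next step):

# on nn1.hadoop: allow the cluster subnet in /etc/ntp.conf, then start ntpd
restrict 172.17.225.0 mask 255.255.255.0 nomodify notrap
service ntpd start
chkconfig ntpd on

# on every other node: sync from nn1.hadoop every 10 minutes via cron
0-59/10 * * * * /usr/sbin/ntpdate nn1.hadoop | logger -t NTP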

7. Create a dedicated user (on all nodes)

For example, create a hadoop user with the initial password hadoop; all of the Hadoop deployment and configuration below is done as this user.

 
 
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop

Add the following to the hadoop user's environment (vi ~/.bash_profile):

 
export JAVA_HOME=/usr/java/jdk1.7.0_75
export PATH="$JAVA_HOME/bin:$PATH"

8. Passwordless SSH login

Configure both NameNode nodes so they can SSH to every other node without a password; one-way passwordless login is enough, although configuring it in both directions does no harm (see the sketch below). For a detailed walkthrough of passwordless SSH see: Linux(Centos)配置OpenSSH无密码登陆
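A minimal sketch of the key exchange, run as the hadoop user on nn1.hadoop and nn2.hadoop (ssh-copy-id is assumed to be available; a DSA key is generated so the path matches dfs.ha.fencing.ssh.private-key-files configured later):

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
for host in nn1.hadoop nn2.hadoop dn1.hadoop dn2.hadoop dn3.hadoop; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@$host
done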

[3] Experiment procedure

1. Building Hadoop 2

Download the Hadoop 2.6.0 source on any node in the environment, install and configure Java and Maven, then run mvn package -Pdist,native -DskipTests -Dtar to build from source. For details see:

  • Hadoop2.2.0源码编译
  • Hadoop2.x在Ubuntu系统中编译源码

2. Installing and configuring ZooKeeper

Download the latest stable release (3.4.6), deploy it on each ZK node, and edit the environment variables (vi ~/.bash_profile):

export ZOOKEEPER_HOME=/usr/local/share/zookeeper
export PATH="$ZOOKEEPER_HOME/bin:$PATH"

Edit the configuration file:

 
cd $ZOOKEEPER_HOME
cp conf/zoo_sample.cfg conf/zoo.cfg
vi  conf/zoo.cfg

Change it to the following:

 
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/bigdata/hadoop/zookeeper/zkdata
dataLogDir=/bigdata/hadoop/zookeeper/zklogs
server.1=zk1.hadoop:2888:3888
server.2=zk2.hadoop:2888:3888
server.3=zk3.hadoop:2888:3888

The directories referenced above must be created in advance, with read/write permission for the hadoop user (see the sketch after this list), and each ZK node gets its own myid:

  • On zk1.hadoop run: echo 1 > /bigdata/hadoop/zookeeper/zkdata/myid
  • On zk2.hadoop run: echo 2 > /bigdata/hadoop/zookeeper/zkdata/myid
  • On zk3.hadoop run: echo 3 > /bigdata/hadoop/zookeeper/zkdata/myid

The value in each myid file must match the corresponding server.N entry in zoo.cfg.
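A minimal sketch of preparing those directories on each ZK node, assuming they are created as root and then handed over to the hadoop user:

mkdir -p /bigdata/hadoop/zookeeper/zkdata /bigdata/hadoop/zookeeper/zklogs
chown -R hadoop:hadoop /bigdata/hadoop/zookeeper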

3. Installing and configuring Hadoop (on all nodes)

3.1. Environment variables (vi ~/.bash_profile):

 
export HADOOP_HOME=/usr/local/share/hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

3.2. Edit $HADOOP_HOME/etc/hadoop/core-site.xml

<configuration>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
    <description>mycluster is the logical name of the HA cluster;
    it must match dfs.nameservices in hdfs-site.xml</description>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/bigdata/hadoop/temp</value>
    <description>Default base directory for data kept by the NameNode, DataNode,
    JournalNode, etc. Each kind of data can also be given its own directory.
    The directory tree must be created beforehand.</description>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181</value>
  <description>Address and port of each node in the ZK ensemble.
  Note: the number of nodes must be odd and must match zoo.cfg.</description>
</property>
</configuration>

3.3. Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of block replicas</description>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/bigdata/hadoop/dfs/name</value>
    <description>Local directory where the NameNode stores its metadata</description>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/bigdata/hadoop/dfs/data</value>
    <description>Local directory where the DataNode stores block data</description>
</property>

<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
    <description>Logical name of the HA nameservice; any name works,
    but fs.defaultFS in core-site.xml must refer to it</description>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
    <description>Logical names of the NameNodes in this nameservice</description>
</property>

<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1.hadoop:9000</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2.hadoop:9000</value>
</property>

<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>nn1.hadoop:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>nn2.hadoop:50070</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
    <value>nn1.hadoop:53310</value>
</property>
<property>
    <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
    <value>nn2.hadoop:53310</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled.mycluster</name>
    <value>true</value>
    <description>Whether to fail over automatically when the active NameNode fails</description>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://nn1.hadoop:8485;nn2.hadoop:8485;dn1.hadoop:8485/hadoop-journal</value>
    <description>JournalNode configuration, in three parts:
    1. the qjournal prefix names the protocol;
    2. the host:port of the three JournalNode machines, separated by semicolons;
    3. hadoop-journal is the journal namespace, which can be any name.
    </description>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/bigdata/hadoop/dfs/journal/</value>
    <description>Local directory where the JournalNode stores its data</description>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description>Class that HDFS clients use to determine which NameNode of mycluster is active</description>
</property>

<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
    <description>Fence the previously active NameNode over SSH during a failover</description>
</property>

<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_dsa</value>
    <description>Location of the private key used for the SSH fencing connection</description>
</property>

<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>1000</value>
</property>

<property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value>
</property>
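The configuration above (together with hadoop.tmp.dir in core-site.xml) refers to several local paths under /bigdata/hadoop that must exist before the first start. A minimal sketch of preparing that tree on every node, assuming it is created as root and then handed over to the hadoop user:

mkdir -p /bigdata/hadoop/temp /bigdata/hadoop/dfs/name /bigdata/hadoop/dfs/data /bigdata/hadoop/dfs/journal
chown -R hadoop:hadoop /bigdata/hadoop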

3.4. Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml

 
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
 
<property>
   <name>yarn.resourcemanager.ha.enabled</name>
   <value>true</value>
</property>
 
<property>
   <name>yarn.resourcemanager.cluster-id</name>
   <value>clusterrm</value>
</property>
 
<property>
   <name>yarn.resourcemanager.ha.rm-ids</name>
   <value>rm1,rm2</value>
</property>
 
<property>
   <name>yarn.resourcemanager.hostname.rm1</name>
   <value>nn1.hadoop</value>
</property>
 
<property>
   <name>yarn.resourcemanager.hostname.rm2</name>
   <value>nn2.hadoop</value>
</property>
 
<property>
   <name>yarn.resourcemanager.recovery.enabled</name>
   <value>true</value>
</property>
 
<property>
   <name>yarn.resourcemanager.store.class</name>
   <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
 
<property>
   <name>yarn.resourcemanager.zk-address</name>
   <value>zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181</value>
</property>

PS: the HA-related settings in yarn-site.xml follow the same pattern as the HA settings in hdfs-site.xml.

3.5. Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml

 
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
</property>

3.6. Edit $HADOOP_HOME/etc/hadoop/slaves

 
dn1.hadoop
dn2.hadoop
dn3.hadoop

4. Startup steps in detail

4.1. Start ZooKeeper
On every ZK node run: zkServer.sh start

You can check each ZK node's role in the ensemble with zkServer.sh status, as sketched below.
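A minimal sketch (run on each of zk1.hadoop, zk2.hadoop and zk3.hadoop; one node should report leader and the others follower):

zkServer.sh status     # look at the "Mode:" line in the output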

4.2. Format the HA state in ZK (only needed the first time)
On any of the ZK nodes run: hdfs zkfc -formatZK

 
[hadoop@nn1 micmiu]$  hdfs zkfc -formatZK
15/02/02 16:54:24 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at nn1.hadoop/172.17.225.61:53310
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:host.name=nn1.hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_75
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.7.0_75/jre
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0/......
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/share/hadoop-2.6.0/lib/native
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-279.el6.x86_64
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/share/hadoop-2.6.0
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1.hadoop:2181,zk2.hadoop:2181,zk3.hadoop:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@1e884ca9
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Opening socket connection to server nn1.hadoop/172.17.225.61:2181. Will not attempt to authenticate using SASL (unknown error)
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Socket connection established to nn1.hadoop/172.17.225.61:2181, initiating session
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: Session establishment complete on server nn1.hadoop/172.17.225.61:2181, sessionid = 0x14b496d55810000, negotiated timeout = 5000
15/02/02 16:54:24 INFO ha.ActiveStandbyElector: Session connected.
15/02/02 16:54:24 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
15/02/02 16:54:24 INFO zookeeper.ZooKeeper: Session: 0x14b496d55810000 closed
15/02/02 16:54:24 INFO zookeeper.ClientCnxn: EventThread shut down

4.3. Start the ZKFC

The ZKFC (ZKFailoverController) monitors NameNode health and coordinates active/standby failover, so it only needs to run on the two NameNode nodes.

 
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-nn1.hadoop.out
[hadoop@nn1 micmiu]$
 
[hadoop@nn2 micmiu]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-zkfc-nn2.hadoop.out
[hadoop@nn2 micmiu]$ jps

4.4. Start the JournalNodes, the shared storage that keeps edit-log metadata in sync between the active and standby NameNodes; start one on each JN node:

 
 
# JN node 1
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-nn1.hadoop.out
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
8771 DFSZKFailoverController
8895 Jps
8837 JournalNode
# JN node 2
[hadoop@nn2 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-nn2.hadoop.out
[hadoop@nn2 micmiu]$ jps
7828 QuorumPeerMain
8198 JournalNode
8082 DFSZKFailoverController
8252 Jps
# JN node 3
[hadoop@dn1 micmiu]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-journalnode-dn1.hadoop.out
[hadoop@dn1 ~]$ jps
748 QuorumPeerMain
1008 JournalNode
1063 Jps

4.5. Format and start the primary NameNode
Format it with: hdfs namenode -format
Note: formatting is only needed the very first time the cluster is brought up; never format it again!

 
 
 
[hadoop@nn1 micmiu]$ hdfs namenode -format
15/02/02 17:03:05 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = nn1.hadoop/172.17.225.61
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
STARTUP_MSG:   classpath = /usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0/share/hadoop/common/lib/.......
STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'hadoop' on 2015-01-29T15:07Z
STARTUP_MSG:   java = 1.7.0_75
************************************************************/
15/02/02 17:03:05 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/02/02 17:03:05 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
15/02/02 17:03:05 INFO namenode.FSNamesystem: No KeyProvider found.
15/02/02 17:03:05 INFO namenode.FSNamesystem: fsLock is fair:true
15/02/02 17:03:05 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/02/02 17:03:05 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/02/02 17:03:05 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/02/02 17:03:05 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Feb 02 17:03:05
15/02/02 17:03:05 INFO util.GSet: Computing capacity for map BlocksMap
15/02/02 17:03:05 INFO util.GSet: VM type       = 64-bit
15/02/02 17:03:05 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/02/02 17:03:05 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/02/02 17:03:05 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: defaultReplication         = 3
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxReplication             = 512
15/02/02 17:03:05 INFO blockmanagement.BlockManager: minReplication             = 1
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
15/02/02 17:03:05 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/02/02 17:03:05 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
15/02/02 17:03:05 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
15/02/02 17:03:05 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
15/02/02 17:03:05 INFO namenode.FSNamesystem: supergroup          = supergroup
15/02/02 17:03:05 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/02/02 17:03:05 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster
15/02/02 17:03:05 INFO namenode.FSNamesystem: HA Enabled: true
15/02/02 17:03:05 INFO namenode.FSNamesystem: Append Enabled: true
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map INodeMap
15/02/02 17:03:06 INFO util.GSet: VM type       = 64-bit
15/02/02 17:03:06 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/02/02 17:03:06 INFO util.GSet: capacity      = 2^20 = 1048576 entries
15/02/02 17:03:06 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map cachedBlocks
15/02/02 17:03:06 INFO util.GSet: VM type       = 64-bit
15/02/02 17:03:06 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/02/02 17:03:06 INFO util.GSet: capacity      = 2^18 = 262144 entries
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/02/02 17:03:06 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
15/02/02 17:03:06 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/02/02 17:03:06 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/02/02 17:03:06 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/02/02 17:03:06 INFO util.GSet: VM type       = 64-bit
15/02/02 17:03:06 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/02/02 17:03:06 INFO util.GSet: capacity      = 2^15 = 32768 entries
15/02/02 17:03:06 INFO namenode.NNConf: ACLs enabled? false
15/02/02 17:03:06 INFO namenode.NNConf: XAttrs enabled? true
15/02/02 17:03:06 INFO namenode.NNConf: Maximum size of an xattr: 16384
15/02/02 17:03:07 INFO namenode.FSImage: Allocated new BlockPoolId: BP-711086735-172.17.225.61-1422867787014
15/02/02 17:03:07 INFO common.Storage: Storage directory /bigdata/hadoop/dfs/name has been successfully formatted.
15/02/02 17:03:07 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/02/02 17:03:07 INFO util.ExitUtil: Exiting with status 0
15/02/02 17:03:07 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn1.hadoop/172.17.225.61
************************************************************/
[hadoop@nn1 micmiu]$

On the primary NN node start the NameNode: hadoop-daemon.sh start namenode

Compare the processes on the NN node before and after starting it:

 
 
 
# before starting the NameNode
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
8771 DFSZKFailoverController
8988 Jps
8837 JournalNode
[hadoop@nn1 micmiu]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-namenode-nn1.hadoop.out
# after starting the NameNode
[hadoop@nn1 micmiu]$ jps
8499 QuorumPeerMain
9134 Jps
8771 DFSZKFailoverController
8837 JournalNode
9017 NameNode

4.6. Sync the primary NN's metadata to the standby NN
hdfs namenode -bootstrapStandby

 
 
[hadoop@nn2 ~]$ hdfs namenode -bootstrapStandby
15/02/02 17:04:43 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = nn2.hadoop/172.17.225.121
STARTUP_MSG:   args = [-bootstrapStandby]
STARTUP_MSG:   version = 2.6.0
STARTUP_MSG:   classpath = /usr/local/share/hadoop-2.6.0/etc/hadoop:/usr/local/share/hadoop-2.6.0......
STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'hadoop' on 2015-01-29T15:07Z
STARTUP_MSG:   java = 1.7.0_75
************************************************************/
15/02/02 17:04:43 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/02/02 17:04:43 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: mycluster
        Other Namenode ID: nn1
  Other NN's HTTP address: http://nn1.hadoop:50070
  Other NN's IPC  address: nn1.hadoop/172.17.225.61:53310
             Namespace ID: 263802668
            Block pool ID: BP-711086735-172.17.225.61-1422867787014
               Cluster ID: CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
           Layout version: -60
=====================================================
15/02/02 17:04:44 INFO common.Storage: Storage directory /bigdata/hadoop/dfs/name has been successfully formatted.
15/02/02 17:04:45 INFO namenode.TransferFsImage: Opening connection to http://nn1.hadoop:50070/imagetransfer?getimage=1&txid=0&storageInfo=-60:263802668:0:CID-237f7c54-db75-470c-8baf-d4dcfaddaf2f
15/02/02 17:04:45 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
15/02/02 17:04:45 INFO namenode.TransferFsImage: Transfer took 0.00s at 0.00 KB/s
15/02/02 17:04:45 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 352 bytes.
15/02/02 17:04:45 INFO util.ExitUtil: Exiting with status 0
15/02/02 17:04:45 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn2.hadoop/172.17.225.121
************************************************************/
[hadoop@nn2 ~]$

4.7. Start the standby NN

On the standby NN run: hadoop-daemon.sh start namenode

 
 
 
[hadoop@nn2 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-namenode-nn2.hadoop.out
[hadoop@nn2 ~]$ jps
7828 QuorumPeerMain
8198 JournalNode
8082 DFSZKFailoverController
8394 NameNode
8491 Jps

4.8. Set and confirm the active NN

Since this article configures automatic failover, ZK has already elected one node as the active NN, so this step can be skipped; just check the node states:

 
 
 
 
[hadoop@nn1 ~]$ hdfs haadmin -getServiceState nn1
active
[hadoop@nn1 ~]$ hdfs haadmin -getServiceState nn2
standby

If you configured manual failover instead, this step cannot be skipped: the system does not yet know which NN should be active, and both NameNodes are in Standby state. Manually activate the primary NN with: hdfs haadmin -transitionToActive nn1
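A minimal sketch of that manual activation and its verification (only relevant when automatic failover is not enabled):

hdfs haadmin -transitionToActive nn1
hdfs haadmin -getServiceState nn1    # should now report "active"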

4.9. Start the DataNodes from the primary NN

Start all DataNodes with: hadoop-daemons.sh start datanode

Note the difference between hadoop-daemons.sh and hadoop-daemon.sh: the plural form uses the slaves file to start the daemon on every listed host over SSH, while the singular form only acts on the local node.

 
 
 
[hadoop@nn1 ~]$ hadoop-daemons.sh start datanode
dn3.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn3.hadoop.out
dn1.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn1.hadoop.out
dn2.hadoop: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-hadoop-datanode-dn2.hadoop.out
[hadoop@nn1 ~]$
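To confirm that all DataNodes have registered with the active NameNode, a quick check (hdfs dfsadmin -report is the standard admin report; in this cluster it should list 3 live DataNodes):

hdfs dfsadmin -report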

4.10. Start YARN

Method 1: start the ResourceManager and the NodeManagers in one go with: start-yarn.sh
Note that start-yarn.sh only starts the ResourceManager on the node where it is run, so the standby ResourceManager on the other RM node still needs to be started separately with yarn-daemon.sh start resourcemanager.

Method 2: start the ResourceManager and the NodeManagers separately:

  • yarn-daemon.sh start resourcemanager
  • yarn-daemon.sh start nodemanager (use yarn-daemons.sh if there are multiple DataNodes)

The ResourceManager is also configured for HA; check each RM's state with:

yarn rmadmin -getServiceState <rm-id>

 
 
 
[hadoop@nn1 ~]$ yarn rmadmin -getServiceState rm1
active
[hadoop@nn1 ~]$ yarn rmadmin -getServiceState rm2
standby
[hadoop@nn1 ~]$

4.11. Start the MR JobHistory Server

Run the JobHistory Server on dn1.hadoop: mr-jobhistory-daemon.sh start historyserver

 
 
 
 
# before starting the JobHistory Server
[hadoop@dn1 ~]$ jps
14625 Jps
3568 NodeManager
748 QuorumPeerMain
1008 JournalNode
1194 DataNode
[hadoop@dn1 ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/share/hadoop-2.6.0/logs/mapred-hadoop-historyserver-dn1.hadoop.out
# after starting the JobHistory Server
[hadoop@dn1 ~]$ jps
14745 JobHistoryServer
3568 NodeManager
748 QuorumPeerMain
1008 JournalNode
1194 DataNode
14786 Jps

4.12. Verify that NameNode and ResourceManager HA actually work

Kill the relevant process on the current active node and watch the states switch over on the other nodes; a sketch follows.
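A minimal sketch of the NameNode part of this test, assuming nn1 is currently active (the PID 9017 is just the value from the earlier jps listing on nn1.hadoop; use whatever jps prints on your node):

# on nn1.hadoop
jps | grep NameNode                  # e.g. "9017 NameNode"
kill -9 9017                         # kill the active NameNode
hdfs haadmin -getServiceState nn2    # after a few seconds this should report "active"

The ResourceManager can be tested the same way: kill the active ResourceManager process and check the states again with yarn rmadmin -getServiceState rm1 / rm2.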

4.13. Verify that NN HA is transparent to clients
Check that hdfs dfs -ls / and hdfs dfs -ls hdfs://mycluster/ return the same result:

 
 
 
[hadoop@nn1 ~]$ hdfs dfs -ls /
Found 2 items
drwx------   - hadoop supergroup          0 2015-02-02 23:42 /tmp
drwxr-xr-x   - hadoop supergroup          0 2015-02-02 23:39 /user
[hadoop@nn1 ~]$ hdfs dfs -ls hdfs://mycluster/
Found 2 items
drwx------   - hadoop supergroup          0 2015-02-02 23:42 hdfs://mycluster/tmp
drwxr-xr-x   - hadoop supergroup          0 2015-02-02 23:39 hdfs://mycluster/user
[hadoop@nn1 ~]$

[4] Running the wordcount demo

For the demo itself, refer to the wordcount walkthrough in hadoop 2.2.0 集群模式安装配置和测试; the steps are not repeated here, but a condensed sketch follows.
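A condensed, hedged sketch of such a run (the paths under /user/hadoop and the choice of /etc/hosts as sample input are assumptions; the examples jar ships with the Hadoop 2.6.0 distribution):

hdfs dfs -mkdir -p /user/hadoop/wordcount/in
hdfs dfs -put /etc/hosts /user/hadoop/wordcount/in
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/hadoop/wordcount/in /user/hadoop/wordcount/out
hdfs dfs -cat /user/hadoop/wordcount/out/part-r-00000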



