Hadoop Distributed Cluster Setup
Date: 2014-12-21 19:41  Source: linux.it.net.cn  Author: IT
Hadoop version: hadoop-0.20.205.0-1.i386.rpm
Download: http://www.fayea.com/apache-mirror/hadoop/common/hadoop-0.20.205.0/
JDK version: jdk-6u35-linux-i586-rpm.bin
Download: http://www.oracle.com/technetwork/java/javase/downloads/jdk6u35-downloads-1836443.html
Environment: Red Hat 6.2, 32-bit
master: 192.169.1.133
slave1: 192.169.1.134
slave2: 192.169.1.135
Overall steps:
1. Edit the hostnames and /etc/hosts (if they no longer match after the VMs are cloned, fix them and redistribute the file).
2. Create a regular user account (hadoop); Hadoop will run under this account.
3. Install the JDK as root.
4. Set the environment variables.
5. Install Hadoop and edit the configuration files.
6. Clone the VM twice to create slave1 and slave2.
7. Configure SSH so that every pair of machines, and each machine to itself, can log in without a password.
8. Format the namenode as the regular user.
9. Start the cluster and check that it is running normally.
Two errors to watch out for:
1. Suppressing the warning "Warning: $HADOOP_HOME is deprecated."
Fix: add export HADOOP_HOME_WARN_SUPPRESS=TRUE to the /etc/hadoop/hadoop-env.sh configuration file on every node.
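A minimal sketch of that change (my own illustration, not part of the original transcript; run as root on every node, and skip it if the line is already there):
echo 'export HADOOP_HOME_WARN_SUPPRESS=TRUE' >> /etc/hadoop/hadoop-env.sh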
2. The "Could not create the Java virtual machine" error
#[root@master ~]# /usr/bin/start-all.sh
namenode running as process 26878. Stop it first.
slave2: starting datanode, logging to /var/log/hadoop/root/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /var/log/hadoop/root/hadoop-root-datanode-slave1.out
slave2: Unrecognized option: -jvm
slave2: Could not create the Java virtual machine.
slave1: Unrecognized option: -jvm
slave1: Could not create the Java virtual machine.
master: secondarynamenode running as process 26009. Stop it first.
jobtracker running as process 25461. Stop it first.
slave2: starting tasktracker, logging to /var/log/hadoop/root/hadoop-root-tasktracker-slave2.out
slave1: starting tasktracker, logging to /var/log/hadoop/root/hadoop-root-tasktracker-slave1.out
Fix: Hadoop must not be started as root; start it as the regular (hadoop) user.
Note: hdfs-site.xml should be configured as shown below, otherwise the datanodes on the two slave nodes will not start. You also need to create the /var/hadoop/data directory and change its owner and group to hadoop.
[root@master hadoop]# cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/var/hadoop/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
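One way to prepare that data directory (a sketch; run as root on each of the three machines, using the same path as the dfs.data.dir value above):
mkdir -p /var/hadoop/data
chown -R hadoop:hadoop /var/hadoop/data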
------------------------1. Map host names to IP addresses in /etc/hosts-----------------------------
------------------------2. Add the hadoop user that will run Hadoop---------------------
------------------------3. Install the JDK and set the environment variables------------------------------------
[chen@master Desktop]$ su - root
Password:
[root@master ~]# useradd hadoop
[root@master ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: too short
BAD PASSWORD: too simple
Retype new password:
passwd: all authentication tokens updated successfully.
[root@master ~]# vim /etc/hosts
[root@master ~]# cat /etc/hosts
192.169.1.133 master
192.169.1.134 slave1
192.169.1.135 slave2
[root@master ~]# cd /home/chen/
[root@master chen]# ls
hadoop-0.20.205.0-1.i386.rpm  Public     Videos    Documents  Music
jdk-6u35-linux-i586-rpm.bin   Templates  Pictures  Downloads  Desktop
[root@master chen]# chmod 744 jdk-6u35-linux-i586-rpm.bin  #make the .bin executable
[root@master chen]# ./jdk-6u35-linux-i586-rpm.bin
Unpacking...
Checksumming...
Extracting...
UnZipSFX 5.50 of 17 February 2002, by Info-ZIP (Zip-Bugs@lists.wku.edu).
inflating: jdk-6u35-linux-i586.rpm
inflating: sun-javadb-common-10.6.2-1.1.i386.rpm
inflating: sun-javadb-core-10.6.2-1.1.i386.rpm
inflating: sun-javadb-client-10.6.2-1.1.i386.rpm
inflating: sun-javadb-demo-10.6.2-1.1.i386.rpm
inflating: sun-javadb-docs-10.6.2-1.1.i386.rpm
inflating: sun-javadb-javadoc-10.6.2-1.1.i386.rpm
Preparing... ########################################### [100%]
1:jdk ########################################### [100%]
Unpacking JAR files...
rt.jar...
jsse.jar...
charsets.jar...
tools.jar...
localedata.jar...
plugin.jar...
javaws.jar...
deploy.jar...
Installing JavaDB
Preparing... ########################################### [100%]
1:sun-javadb-common ########################################### [ 17%]
2:sun-javadb-core ########################################### [ 33%]
3:sun-javadb-client ########################################### [ 50%]
4:sun-javadb-demo ########################################### [ 67%]
5:sun-javadb-docs ########################################### [ 83%]
6:sun-javadb-javadoc ########################################### [100%]

Java(TM) SE Development Kit 6 successfully installed.

Product Registration is FREE and includes many benefits:
* Notification of new versions, patches, and updates
* Special offers on Oracle products, services and training
* Access to early releases and documentation

Product and system data will be collected. If your configuration
supports a browser, the JDK Product Registration form will
be presented. If you do not register, none of this information
will be saved. You may also register your JDK later by
opening the register.html file (located in the JDK installation
directory) in a browser.

For more information on what data Registration collects and
how it is managed and used, see:
http://java.sun.com/javase/registration/JDKRegistrationPrivacy.html

Press Enter to continue.....

Done.
[root@master chen]# vim /etc/profile
[root@master chen]# ls /usr/java/jdk1.6.0_35/
bin lib register.html THIRDPARTYLICENSEREADME.txt
COPYRIGHT LICENSE register_ja.html
include man register_zh_CN.html
jre README.html src.zip
[root@master chen]# tail -3 /etc/profile  #set the environment variables
export JAVA_HOME=/usr/java/jdk1.6.0_35
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
[root@master chen]#
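To pick up the new variables in the current shell and confirm the JDK is visible, something like the following should work (a sketch; the exact version banner depends on the install):
source /etc/profile
java -version      #should report 1.6.0_35
echo $JAVA_HOME    #should print /usr/java/jdk1.6.0_35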
-----------------------Install Hadoop and edit the configuration files-----------------------
#After this step, clone the VM twice to create slave1 and slave2
#If the clones come up with IP addresses or /etc/hosts entries that do not match, change them to the actual values
[root@master chen]# ls
hadoop-0.20.205.0-1.i386.rpm           Public
jdk-6u35-linux-i586.rpm                Templates
jdk-6u35-linux-i586-rpm.bin            Videos
sun-javadb-client-10.6.2-1.1.i386.rpm  Pictures
sun-javadb-common-10.6.2-1.1.i386.rpm  Documents
sun-javadb-core-10.6.2-1.1.i386.rpm    Downloads
sun-javadb-demo-10.6.2-1.1.i386.rpm    Music
sun-javadb-docs-10.6.2-1.1.i386.rpm    Desktop
sun-javadb-javadoc-10.6.2-1.1.i386.rpm
[root@master chen]# rpm -ivh hadoop-0.20.205.0-1.i386.rpm
Preparing... ########################################### [100%]
1:hadoop ########################################### [100%]
[root@master chen]# cd /etc/hadoop/
[root@master hadoop]# ls
capacity-scheduler.xml hadoop-policy.xml slaves
configuration.xsl hdfs-site.xml ssl-client.xml.example
core-site.xml log4j.properties ssl-server.xml.example
fair-scheduler.xml mapred-queue-acls.xml taskcontroller.cfg
hadoop-env.sh mapred-site.xml
hadoop-metrics2.properties masters
[root@master hadoop]# vim hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.6.0_35
[root@master hadoop]# vim core-site.xml
[root@master hadoop]# cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>

[root@master hadoop]# vim hdfs-site.xml
[root@master hadoop]# cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>

[root@master hadoop]# vim mapred-site.xml
[root@master hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://master:9001</value>
</property>
</configuration>

[root@master hadoop]# cat masters
master
[root@master hadoop]# cat slaves
slave1
slave2
[root@master hadoop]#
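If you would rather not clone the whole VM, a rough alternative is to install the same Hadoop and JDK RPMs on each slave and then push the configuration directory over (a sketch; it assumes root can already SSH to the slaves, as it does later when copying authorized_keys):
scp -r /etc/hadoop slave1:/etc/
scp -r /etc/hadoop slave2:/etc/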
------------------------Switch to the hadoop user and set up passwordless SSH login------------------------------

[hadoop@master ~]$ ssh-keygen -t dsa  #do this step on both slave nodes as well
Generating public/private dsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_dsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_dsa.
Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
The key fingerprint is:
6f:88:68:8a:d6:c7:b0:c7:e2:8b:b7:fa:7b:b4:a1:56 hadoop@master
The key's randomart image is:
+--[ DSA 1024]----+
| |
| |
| |
| |
| S |
| . E . o |
| . @ + . o |
| o.X B . |
|ooB*O |
+-----------------+
[hadoop@master ~]$ cd .ssh/
[hadoop@master .ssh]$ ls
id_dsa id_dsa.pub
[hadoop@master .ssh]$ cp id_dsa.pub authorized_keys  #the public key must be renamed to authorized_keys

#Edit authorized_keys and paste in the contents of the id_dsa.pub generated on each of the two slave nodes
[hadoop@master .ssh]$ vim authorized_keys
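A rough alternative to hand-editing the file: copy each slave's id_dsa.pub over to the master first (the /tmp file names below are only placeholders for wherever you put them), append them, and make sure the permissions are what sshd expects:
cat /tmp/slave1_id_dsa.pub /tmp/slave2_id_dsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys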
[hadoop@master .ssh]$ exit
logout
[chen@master .ssh]$ su - root
Password:
[root@master ~]# cd /home/hadoop/
[root@master hadoop]# ls
[root@master hadoop]# cd .ssh/
[root@master .ssh]# ls
authorized_keys id_dsa id_dsa.pub

#As root, copy authorized_keys into /home/hadoop/.ssh on each of the two slave nodes
#No password is needed for these root copies because passwordless login was already set up for root earlier
[root@master .ssh]# scp authorized_keys slave1:/home/hadoop/.ssh/
authorized_keys 100% 1602 1.6KB/s 00:00
[root@master .ssh]# scp authorized_keys slave2:/home/hadoop/.ssh/
authorized_keys 100% 1602 1.6KB/s 00:00
[root@master .ssh]#

#After these copies, the three machines can SSH to each other as the hadoop user without a password.
#Note: the very first login to each host still asks for confirmation, so walk through every pair,
#including each machine logging in to itself, as shown in the sketch below.
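A quick sketch of that pairwise check (run as the hadoop user on each of the three machines; the first connection to every host still prompts you to accept its host key):
for h in master slave1 slave2; do ssh $h hostname; done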
-------------------------------Format and start Hadoop---------------------------
#Note: disable the firewall on all three machines while testing.
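On RHEL 6 the firewall and SELinux can also be disabled persistently instead of being flushed by hand after each boot; roughly, as root on all three machines (my own addition; the transcript below only runs iptables -F):
service iptables stop
chkconfig iptables off
setenforce 0    #SELinux to permissive for the current boot, as done on the slaves later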
[hadoop@master ~]$ /usr/bin/hadoop namenode -format
12/09/01 16:52:24 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.169.1.133
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.205.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-205 -r 1179940; compiled by 'hortonfo' on Fri Oct 7 06:19:16 UTC 2011
************************************************************/
12/09/01 16:52:24 INFO util.GSet: VM type = 32-bit
12/09/01 16:52:24 INFO util.GSet: 2% max memory = 2.475 MB
12/09/01 16:52:24 INFO util.GSet: capacity = 2^19 = 524288 entries
12/09/01 16:52:24 INFO util.GSet: recommended=524288, actual=524288
12/09/01 16:52:24 INFO namenode.FSNamesystem: fsOwner=hadoop
12/09/01 16:52:24 INFO namenode.FSNamesystem: supergroup=supergroup
12/09/01 16:52:24 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/09/01 16:52:24 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/09/01 16:52:24 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/09/01 16:52:24 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/09/01 16:52:24 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/09/01 16:52:25 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/09/01 16:52:25 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.169.1.133
************************************************************/
[hadoop@master ~]$ /usr/bin/start-all.sh  #if the startup messages look like the following, the daemons were launched
starting namenode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-datanode-slave1.out
master: starting secondarynamenode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-secondarynamenode-master.out
starting jobtracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-jobtracker-master.out
slave2: starting tasktracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-tasktracker-slave2.out
slave1: starting tasktracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-tasktracker-slave1.out
#Yet jps shows none of the Hadoop processes; there are two likely causes:
#1. the firewall is still on
#2. if the firewall is already off and this still happens, reboot
[hadoop@master ~]$ /usr/java/jdk1.6.0_35/bin/jps
28499 Jps
[root@master ~]# iptables -F
[root@master ~]# exit
logout
[hadoop@master ~]$ /usr/bin/start-all.sh
starting namenode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-datanode-slave1.out
master: starting secondarynamenode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-secondarynamenode-master.out
starting jobtracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-jobtracker-master.out
slave2: starting tasktracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-tasktracker-slave2.out
slave1: starting tasktracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-tasktracker-slave1.out
[hadoop@master ~]$ /usr/java/jdk1.6.0_35/bin/jps
30630 Jps
---------------------------Everything came up normally after a reboot----------------------
------------------------master node---------------------------
[hadoop@master ~]$ /usr/bin/start-all.sh
starting namenode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-datanode-slave1.out
master: starting secondarynamenode, logging to /var/log/hadoop/hadoop/hadoop-hadoop-secondarynamenode-master.out
starting jobtracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-jobtracker-master.out
slave2: starting tasktracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-tasktracker-slave2.out
slave1: starting tasktracker, logging to /var/log/hadoop/hadoop/hadoop-hadoop-tasktracker-slave1.out
[hadoop@master ~]$ /usr/java/jdk1.6.0_35/bin/jps
3388 JobTracker
3312 SecondaryNameNode
3159 NameNode
3533 Jps
------------------------slave1---------------------------------

[hadoop@master ~]$ ssh slave1
Last login: Sat Sep 1 16:51:48 2012 from slave2
[hadoop@slave1 ~]$ su - root
Password:
[root@slave1 ~]# iptables -F
[root@slave1 ~]# setenforce 0
[root@slave1 ~]# exit
logout
[hadoop@slave1 ~]$ /usr/java/jdk1.6.0_35/bin/jps
3181 TaskTracker
3107 DataNode
3227 Jps
--------------------------slave2------------------------------

[hadoop@master ~]$ ssh slave2
Last login: Sat Sep 1 16:52:02 2012 from slave2
[hadoop@slave2 ~]$ su - root
Password:
[root@slave2 ~]# iptables -F
[root@slave2 ~]# setenforce 0
[root@slave2 ~]# exit
logout
[hadoop@slave2 ~]$ /usr/java/jdk1.6.0_35/bin/jps
3165 DataNode
3297 Jps
3241 TaskTracker
[hadoop@slave2 ~]$
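Besides jps, a couple of ways to confirm the cluster is healthy (a sketch; the ports are the 0.20 defaults): ask the namenode for a datanode report, or open the built-in web consoles.
/usr/bin/hadoop dfsadmin -report    #should list two live datanodes
#HDFS web UI:       http://master:50070
#JobTracker web UI: http://master:50030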