This article walks through setting up a Hadoop cluster, from configuring the environment to starting it with a few simple commands. Hopefully it is useful if you are just getting started with Hadoop. First, the environment: three machines.
192.168.30.149  hadoop149  namenode and jobtracker    ### 149 has slightly better hardware
192.168.30.150  hadoop150  datanode and tasktracker
192.168.30.148  hadoop148  datanode and tasktracker

Configure passwordless SSH login:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
My master is 149, so copy 149's .pub file over to 150 and 148 and then run the cat command above on each of them to append the key.
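A minimal sketch of that copy step, assuming root logins on all three machines and that ~/.ssh already exists on 150 and 148 (the /tmp/master.pub staging name is arbitrary):

$ scp ~/.ssh/id_dsa.pub root@192.168.30.150:/tmp/master.pub
$ ssh root@192.168.30.150 "cat /tmp/master.pub >> ~/.ssh/authorized_keys"
$ scp ~/.ssh/id_dsa.pub root@192.168.30.148:/tmp/master.pub
$ ssh root@192.168.30.148 "cat /tmp/master.pub >> ~/.ssh/authorized_keys"

These first runs will still prompt for passwords; once the key is appended, ssh from 149 to the slaves should no longer ask for one.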
The Hadoop version I use is hadoop-0.20.2. Download it from:
http://labs.renren.com/apache-mirror/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
or
http://download.csdn.net/detail/wangyq_412/2834991
http://download.csdn.net/detail/wangyq_412/2835017
Or just Google it; in a couple of days I will put everything on a network drive and add the link here.
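Once you have the tarball, unpack it into /root on each machine so that it matches the /root/hadoop-0.20.2 path used below; a sketch:

$ tar -xzf hadoop-0.20.2.tar.gz -C /root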
After downloading and unpacking, edit a few files:
In /root/hadoop-0.20.2/conf (note that the Hadoop path must be identical on all machines), make the following edits:
[root@localhost conf]# vim core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.30.149:9000</value>
  </property>
</configuration>
### what these settings mean in detail is explained later

[root@localhost conf]# vim mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://192.168.30.149:9004</value>
  </property>
</configuration>

[root@localhost conf]# vim hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

[root@localhost conf]# vim masters

hadoop149

[root@localhost conf]# vim slaves

hadoop150
hadoop148
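Since the path must be identical everywhere, one simple approach (a sketch, assuming the SSH setup above) is to edit the files once on 149 and copy the whole tree to the other two machines:

$ scp -r /root/hadoop-0.20.2 root@192.168.30.150:/root/
$ scp -r /root/hadoop-0.20.2 root@192.168.30.148:/root/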
Five files were edited in total; what each of them means will be covered later. Also note that you must edit the /etc/hosts file, as follows (shown on 192.168.30.149):
[root@localhost conf]# vim /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.30.149  hadoop149
192.168.30.150  hadoop150
192.168.30.148  hadoop148
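A quick sanity check that name resolution and passwordless SSH both work, using the hostnames configured above:

# ping -c 1 hadoop150
# ssh hadoop150 hostname     ### should print "hadoop150" without asking for a password
# ssh hadoop148 hostname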
4. Start Hadoop. Here we start it with a few simple commands.
A. Format the filesystem:
# bin/hadoop namenode -format
B. Start Hadoop:
# bin/start-all.sh
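Before running anything else you can confirm the daemons came up with the JDK's jps tool (a sketch; the remote call assumes jps is on the PATH for non-interactive shells):

# jps                    ### on hadoop149: expect NameNode, SecondaryNameNode and JobTracker
# ssh hadoop150 jps      ### on each slave: expect DataNode and TaskTracker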
C. Use the examples bundled with Hadoop to test whether it started successfully:
# bin/hadoop fs -mkdir input            ### create the input directory in the filesystem
# bin/hadoop fs -put README.txt input   ### upload the local README.txt into input
# bin/hadoop fs -lsr                    ### list all files in the filesystem
If the files exist and their size is not 0, the Hadoop filesystem was set up successfully.
# bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input/README.txt output   ### write the results to output
# bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input/1.txt output
11/12/02 17:47:14 INFO input.FileInputFormat: Total input paths to process : 1
11/12/02 17:47:14 INFO mapred.JobClient: Running job: job_201112021743_0001
11/12/02 17:47:15 INFO mapred.JobClient:  map 0% reduce 0%
11/12/02 17:47:22 INFO mapred.JobClient:  map 100% reduce 0%
11/12/02 17:47:34 INFO mapred.JobClient:  map 100% reduce 100%
11/12/02 17:47:36 INFO mapred.JobClient: Job complete: job_201112021743_0001
11/12/02 17:47:36 INFO mapred.JobClient: Counters: 17
11/12/02 17:47:36 INFO mapred.JobClient:   Job Counters
11/12/02 17:47:36 INFO mapred.JobClient:     Launched reduce tasks=1
11/12/02 17:47:36 INFO mapred.JobClient:     Launched map tasks=1
11/12/02 17:47:36 INFO mapred.JobClient:     Data-local map tasks=1
11/12/02 17:47:36 INFO mapred.JobClient:   FileSystemCounters
11/12/02 17:47:36 INFO mapred.JobClient:     FILE_BYTES_READ=32523
11/12/02 17:47:36 INFO mapred.JobClient:     HDFS_BYTES_READ=44253
11/12/02 17:47:36 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=65078
11/12/02 17:47:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=23148
11/12/02 17:47:36 INFO mapred.JobClient:   Map-Reduce Framework
11/12/02 17:47:36 INFO mapred.JobClient:     Reduce input groups=2367
11/12/02 17:47:36 INFO mapred.JobClient:     Combine output records=2367
11/12/02 17:47:36 INFO mapred.JobClient:     Map input records=734
11/12/02 17:47:36 INFO mapred.JobClient:     Reduce shuffle bytes=32523
11/12/02 17:47:36 INFO mapred.JobClient:     Reduce output records=2367
11/12/02 17:47:36 INFO mapred.JobClient:     Spilled Records=4734
11/12/02 17:47:36 INFO mapred.JobClient:     Map output bytes=73334
11/12/02 17:47:36 INFO mapred.JobClient:     Combine input records=7508
11/12/02 17:47:36 INFO mapred.JobClient:     Map output records=7508
11/12/02 17:47:36 INFO mapred.JobClient:     Reduce input records=2367
You can also check the cluster status from a local browser on ports 50070 (HDFS) and 50030 (MapReduce); note that you need to add the hostnames to the local C:\Windows\System32\drivers\etc\hosts file:
192.168.30.150  hadoop150
192.168.30.149  hadoop149
192.168.30.148  hadoop148
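Finally, to look at the wordcount results on the cluster itself, a minimal sketch (part-r-00000 is the usual output file name for the 0.20 examples, so check the -ls listing first):

# bin/hadoop fs -ls output
# bin/hadoop fs -cat output/part-r-00000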