Installing Hadoop on Ubuntu (standalone mode)
Date: 2015-10-20 15:44  Source: linux.it.net.cn  Author: IT
There are plenty of similar articles online, but most of them leave some detail unexplained, so it took me quite a while to get the installation working. I therefore decided to write up the clearest set of steps I could manage, hence this article.
-----------------------------------------------------------------------------------------------------------------------------------
Hadoop is an open-source implementation of MapReduce; this article covers its standalone (single-node) installation.
Test platform: Ubuntu 9.04, Hadoop 0.20, JDK 1.6
1. I installed on the Desktop edition of Ubuntu, so the SSH server must be installed first. It is easy to find: open Synaptic, search for ssh, and it is the first result.
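Equivalently, from a terminal (openssh-server is the standard Ubuntu package name):
$ sudo apt-get install openssh-server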
2. Install Sun JDK 6; make sure it is Java 6 or newer. First add the Canonical partner repository in the Update Manager's software sources, then:
sudo apt-get update
3. sudo apt-get install sun-java6-jdk
4. sudo update-java-alternatives -s java-6-sun
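To confirm the Sun JDK is now the default, check the reported version (it should print 1.6.x):
$ java -version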
5. Add a group and a user for running and accessing Hadoop.
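A minimal sketch; the hadoop user and group names are the conventional choice and match the chown step later in this article:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hadoop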
6. Generate an SSH key pair and configure passwordless SSH login.
Once configured, test it; a sketch of both steps follows.
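A minimal sketch, run as the hadoop user. The empty passphrase (-P "") lets the Hadoop start/stop scripts log in over SSH without prompting:
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost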
>> Next comes installing Hadoop itself.
First: download, extract, and assign permissions.
Download Hadoop 0.20 and unpack the archive under /opt:
$ tar zxvf hadoop-0.20.0.tar.gz
$ sudo mv hadoop-0.20.0 /opt/
$ sudo chown -R hadoop:hadoop /opt/hadoop-0.20.0
$ sudo ln -sf /opt/hadoop-0.20.0 /opt/hadoop
>> Configure hadoop-env.sh
Enter the hadoop directory for the remaining setup. Two files need editing; the first is hadoop-env.sh, where three environment variables must be set: JAVA_HOME, HADOOP_HOME, and PATH.
/opt$ cd hadoop/
/opt/hadoop$ cat >> conf/hadoop-env.sh << 'EOF'
export JAVA_HOME=/usr/lib/jvm/java-6-sun   # change this to your own JAVA_HOME
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:/opt/hadoop/bin
EOF
Note the quoted 'EOF' delimiter: it keeps $PATH literal in the file, so it is expanded when hadoop-env.sh is sourced rather than frozen to your current PATH at append time.
OK, that part is done.
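As a quick check that the variables take effect (hadoop version is a standard subcommand in this release line):
/opt/hadoop$ source conf/hadoop-env.sh
/opt/hadoop$ hadoop version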
>> Configure the Hadoop configuration files
Edit /opt/hadoop/conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>
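Here fs.default.name is the URI of the default filesystem (the NameNode's RPC address), and hadoop.tmp.dir is the base directory under which Hadoop keeps its local and HDFS data.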
Edit /opt/hadoop/conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
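dfs.replication is set to 1 because a single node can hold only one replica of each block.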
Edit /opt/hadoop/conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
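mapred.job.tracker tells MapReduce clients where to reach the JobTracker.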
>> Format HDFS
The single-node Hadoop environment is now configured. Next, format the NameNode to initialize HDFS (this formats only the NameNode's storage, not the other daemons):
$ cd /opt/hadoop
$ source /opt/hadoop/conf/hadoop-env.sh
$ hadoop namenode -format
Sample output:
09/03/23 20:19:47 INFO dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = /localhost
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
************************************************************/
09/03/23 20:19:47 INFO fs.FSNamesystem: fsOwner=hadooper,hadooper
09/03/23 20:19:47 INFO fs.FSNamesystem: supergroup=supergroup
09/03/23 20:19:47 INFO fs.FSNamesystem: isPermissionEnabled=true
09/03/23 20:19:47 INFO dfs.Storage: Image file of size 82 saved in 0 seconds.
09/03/23 20:19:47 INFO dfs.Storage: Storage directory /tmp/hadoop-hadooper/dfs/name has been successfully formatted.
09/03/23 20:19:47 INFO dfs.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at /localhost
************************************************************/
>> Start Hadoop
Then use start-all.sh to start all the services: namenode, datanode, secondarynamenode, jobtracker, and tasktracker.
/opt/hadoop$ bin/start-all.sh
Sample output:
starting namenode, logging to /opt/hadoop/logs/hadoop-hadooper-namenode-vPro.out
localhost: starting datanode, logging to /opt/hadoop/logs/hadoop-hadooper-datanode-vPro.out
localhost: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadooper-secondarynamenode-vPro.out
starting jobtracker, logging to /opt/hadoop/logs/hadoop-hadooper-jobtracker-vPro.out
You can also check with the jps command; if the installation succeeded, jps shows something like this:
4706 JobTracker
4582 SecondaryNameNode
4278 NameNode
4413 DataNode
4853 TaskTracker
4889 Jps
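Another sanity check is the built-in web interfaces; in the 0.20 series the NameNode UI listens on http://localhost:50070 and the JobTracker UI on http://localhost:50030.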
>> The script that stops all the services is bin/stop-all.sh (its output is shown at the end of this article).
The content above is mainly based on: http://bbs.chinacloud.cn/archiver/showtopic-589.aspx and
http://www.hadoopor.com/thread-2674-1-1.html
Below are the steps for running the wordcount example that ships with Hadoop (mainly adapted from http://nlslzf.iteye.com/blog/810948).
>> Prepare the data for the wordcount job
a. Create an input directory (on the local disk; it is uploaded to HDFS below):
/opt/hadoop$ mkdir input
b. Create two input files, file01 and file02, on the local disk:
/opt/hadoop$ echo "Hello World Bye World" > file01
/opt/hadoop$ echo "Hello World Bye World" > file02
c. Copy file01 and file02 into the local input directory:
/opt/hadoop$ cp file01 ./input
/opt/hadoop$ cp file02 ./input
Then upload the local input directory to HDFS with the following command:
[root@localhost hadoop-0.19.2]# bin/hadoop fs -put input/ input   (a crucial step)
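To confirm the upload succeeded, list the directory on HDFS:
$ bin/hadoop fs -ls input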
>> Run wordcount:
The Hadoop directory contains several jar files; hadoop-0.19.2-examples.jar is the one we need, since it includes wordcount. We run it with the command below; replace hadoop-0.19.2-examples.jar with the jar matching your own version.
(4) Launch the wordcount job
Run the following command:
[root@localhost hadoop-0.19.2]# bin/hadoop jar hadoop-0.19.2-examples.jar wordcount input output
The input data directory is input and the output directory is output.
The job's progress log looks like this:
10/08/01 19:06:15 INFO mapred.FileInputFormat: Total input paths to process : 4
10/08/01 19:06:15 INFO mapred.JobClient: Running job: job_201008011904_0002
10/08/01 19:06:16 INFO mapred.JobClient: map 0% reduce 0%
10/08/01 19:06:22 INFO mapred.JobClient: map 20% reduce 0%
10/08/01 19:06:24 INFO mapred.JobClient: map 40% reduce 0%
10/08/01 19:06:25 INFO mapred.JobClient: map 60% reduce 0%
10/08/01 19:06:27 INFO mapred.JobClient: map 80% reduce 0%
10/08/01 19:06:28 INFO mapred.JobClient: map 100% reduce 0%
10/08/01 19:06:38 INFO mapred.JobClient: map 100% reduce 26%
10/08/01 19:06:40 INFO mapred.JobClient: map 100% reduce 100%
10/08/01 19:06:41 INFO mapred.JobClient: Job complete: job_201008011904_0002
10/08/01 19:06:41 INFO mapred.JobClient: Counters: 16
10/08/01 19:06:41 INFO mapred.JobClient: File Systems
10/08/01 19:06:41 INFO mapred.JobClient: HDFS bytes read=301489
10/08/01 19:06:41 INFO mapred.JobClient: HDFS bytes written=113098
10/08/01 19:06:41 INFO mapred.JobClient: Local bytes read=174004
10/08/01 19:06:41 INFO mapred.JobClient: Local bytes written=348172
10/08/01 19:06:41 INFO mapred.JobClient: Job Counters
10/08/01 19:06:41 INFO mapred.JobClient: Launched reduce tasks=1
10/08/01 19:06:41 INFO mapred.JobClient: Launched map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient: Data-local map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient: Map-Reduce Framework
10/08/01 19:06:41 INFO mapred.JobClient: Reduce input groups=8997
10/08/01 19:06:41 INFO mapred.JobClient: Combine output records=10860
10/08/01 19:06:41 INFO mapred.JobClient: Map input records=7363
10/08/01 19:06:41 INFO mapred.JobClient: Reduce output records=8997
10/08/01 19:06:41 INFO mapred.JobClient: Map output bytes=434077
10/08/01 19:06:41 INFO mapred.JobClient: Map input bytes=299871
10/08/01 19:06:41 INFO mapred.JobClient: Combine input records=39193
10/08/01 19:06:41 INFO mapred.JobClient: Map output records=39193
10/08/01 19:06:41 INFO mapred.JobClient: Reduce input records=10860
(5) View the job results
Use the following command:
bin/hadoop fs -cat output/*
A partial excerpt of the results:
vijayarenu 20
violations. 1
virtual 3
vis-a-vis 1
visible 1
visit 1
volume 1
volume, 1
volumes 2
volumes. 1
w.r.t 2
wait 9
waiting 6
waiting. 1
waits 3
want 1
warning 7
warning, 1
warnings 12
warnings. 3
warranties 1
warranty 1
warranty, 1
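To copy the results out of HDFS onto the local disk (./wordcount-output here is just an illustrative local path):
$ bin/hadoop fs -get output ./wordcount-output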
(6) Stop the Hadoop background processes
Run the following command:
[root@localhost hadoop-0.19.2]# bin/stop-all.sh
Output:
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
This stops the five processes listed above: jobtracker, tasktracker, namenode, datanode, and secondarynamenode.
The text above mainly follows http://blog.csdn.net/yangchao228/article/details/6646977 and http://nlslzf.iteye.com/blog/810948