Linux version
CentOS 6.5 (64-bit)
Hadoop version
Hadoop 2.7.3
Other required software
Java (1.7+) [installation and configuration: see http://blog.csdn.net/molaifeng/article/details/50160929]
ssh [installation and configuration: see http://blog.csdn.net/molaifeng/article/details/51684086]
vim [install with yum install -y vim]
Cluster overview
All four nodes are virtual machines on a bridged network
10.254.21.30 Master
10.254.21.122 Slave1
10.254.21.29 Slave2
10.254.21.35 Slave3
Adding a user
Add the user:
useradd hadoop
Set its password:
passwd hadoop
Give it administrator privileges:
vim /etc/sudoers
hadoop ALL=(ALL) ALL
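Since /etc/sudoers is read-only even for root by default, a safer route is visudo, which syntax-checks the file before saving. A minimal sketch:
sudo visudo    # opens /etc/sudoers with validation on save
# add this line below the "root ALL=(ALL) ALL" entry:
hadoop  ALL=(ALL)       ALL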
Configuring the network and hostnames
Network configuration
Add the following to /etc/hosts:
10.254.21.30 Master
10.254.21.122 Slave1
10.254.21.29 Slave2
10.254.21.35 Slave3
Also check that localhost maps to 127.0.0.1:
127.0.0.1 localhost
so the complete file looks like:
127.0.0.1 localhost
10.254.21.30 Master
10.254.21.122 Slave1
10.254.21.29 Slave2
10.254.21.35 Slave3
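A quick way to confirm that name resolution works on every node before going further:
ping -c 3 Master
ping -c 3 Slave1   # repeat for Slave2 and Slave3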
Changing the hostname
Change the hostnames of the four servers to Master, Slave1, Slave2, and Slave3 respectively:
vim /etc/sysconfig/network
HOSTNAME=Master
Reboot:
reboot
Check the hostname:
hostname
On the Master server, if it prints Master, the change succeeded.
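If you would rather not reboot, the hostname command changes the running hostname immediately (the /etc/sysconfig/network edit still makes it persist across the next reboot):
sudo hostname Master    # takes effect for new shells right away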
Passwordless SSH login
Switch to the hadoop user
su hadoop
Set up passwordless login
cd ~/.ssh/    # if the directory does not exist yet, run ssh localhost once first
ssh-keygen -t rsa # just keep pressing Enter
cat id_rsa.pub >> authorized_keys # authorize this machine's own key
chmod 600 authorized_keys
Once this is configured,
ssh localhost
no longer asks for a password. Do the same on the other three servers. Finally, append the contents of Master's id_rsa.pub to authorized_keys on each of the three Slave servers, so that Master can log in to all three Slaves without a password.
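One way to append Master's key to each Slave is ssh-copy-id; where it is unavailable, piping over ssh works too. A sketch, run on Master as the hadoop user:
ssh-copy-id hadoop@Slave1    # repeat for Slave2 and Slave3
# equivalent without ssh-copy-id:
cat ~/.ssh/id_rsa.pub | ssh hadoop@Slave1 'cat >> ~/.ssh/authorized_keys'
Afterwards, ssh Slave1 from Master should log in without a password prompt.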
Note: if every ssh login prints a "Permanently added (RSA) to the list of known hosts" message, open /etc/ssh/ssh_config and comment out the lines StrictHostKeyChecking no and UserKnownHostsFile /dev/null.
Disabling the firewall
sudo service iptables stop
or keep it from starting at boot:
sudo chkconfig iptables off
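To confirm the firewall is really down and will stay down after a reboot:
sudo service iptables status    # should report that iptables is not running
chkconfig --list iptables       # every runlevel should show "off"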
Installing and configuring Hadoop
Installation
You can also download the stable release from the official site.
cd /usr/local
sudo wget http://mirrors.cnnic.cn/apache/hadoop/common/stable/hadoop-2.7.3.tar.gz
sudo tar zxf hadoop-2.7.3.tar.gz
sudo mv hadoop-2.7.3 hadoop
sudo chown -R hadoop:hadoop hadoop
Since this is a pre-built binary release, it works out of the box. Check the version first to confirm it runs:
cd /usr/local/hadoop
./bin/hadoop version
Version output like the following means it is usable:
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
Configuration
- Edit the environment variables to add Hadoop; the other three servers need this too
sudo vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Then run . /etc/profile (or source /etc/profile) on the command line to make it take effect.
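A quick check that the variables took effect in the current shell:
echo $HADOOP_HOME    # should print /usr/local/hadoop
hadoop version       # should now resolve via $PATH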
- Edit the configuration files in /usr/local/hadoop/etc/hadoop/
slaves: remove localhost and add the Slave hostnames; the NameNode runs on the Master server and the DataNodes on the three Slave servers
Slave1
Slave2
Slave3
hadoop-env.sh
export JAVA_HOME='/usr/lib/jvm/java-1.8.0-openjdk.x86_64'
core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
</configuration>
If hadoop.tmp.dir is not set, the default temporary directory /tmp/hadoop-hadoop is used, and since /tmp may be wiped by the system on reboot, you would be forced to re-run the NameNode format. The tarball ships without a tmp directory, so create it manually: sudo mkdir /usr/local/hadoop/tmp && sudo chown hadoop:hadoop /usr/local/hadoop/tmp
hdfs-site.xml, where dfs.replication is 3 because there are three DataNode nodes (as with the files below, the properties go inside <configuration> tags):
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
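Note that the 2.7.3 tarball ships only a mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template first:
cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml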
yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Pack up the hadoop directory and send it to the three Slave servers
cd /usr/local
sudo tar -zcf ~/hadoop.master.tar.gz ./hadoop
scp ~/hadoop.master.tar.gz hadoop@Slave1:/home/hadoop
On the Slave1 server, unpack it and fix the ownership
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
sudo chown -R hadoop:hadoop /usr/local/hadoop
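To handle all three Slaves from Master in one shot, a loop sketch (assuming the hadoop user may run sudo over ssh on the Slaves; otherwise run the sudo steps on each host by hand):
for host in Slave1 Slave2 Slave3; do
  scp ~/hadoop.master.tar.gz hadoop@$host:/home/hadoop
  ssh -t hadoop@$host "sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local && sudo chown -R hadoop:hadoop /usr/local/hadoop"
done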
The other two Slave servers are handled the same way (or use the loop sketch above). At this point the environment is ready on Master and all Slaves; what remains is to start Hadoop on the Master server. On the very first start, format the NameNode once with hdfs namenode -format, then start the daemons:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Run jps on the Master and Slave servers to see the running Java processes
Master
[hadoop@Master hadoop]$ jps
6096 NameNode
6704 JobHistoryServer
6281 SecondaryNameNode
6427 ResourceManager
29535 Jps
Slave1
[hadoop@Slave1 hadoop]$ jps
83617 Jps
56530 DataNode
56635 NodeManager
On the Master server, run hdfs dfsadmin -report
to check whether the DataNodes started properly:
Configured Capacity: 55574487040 (51.76 GB)
Present Capacity: 16329748480 (15.21 GB)
DFS Remaining: 16329654272 (15.21 GB)
DFS Used: 94208 (92 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (3):
Name: 10.254.21.35:50010 (Slave3)
Hostname: Slave3
Decommission Status : Normal
Configured Capacity: 18569568256 (17.29 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 11105554432 (10.34 GB)
DFS Remaining: 7463985152 (6.95 GB)
DFS Used%: 0.00%
DFS Remaining%: 40.19%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Oct 27 11:16:56 CST 2016
Name: 10.254.21.29:50010 (Slave2)
Hostname: Slave2
Decommission Status : Normal
Configured Capacity: 18569568256 (17.29 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 11105423360 (10.34 GB)
DFS Remaining: 7464112128 (6.95 GB)
DFS Used%: 0.00%
DFS Remaining%: 40.20%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Oct 27 11:16:56 CST 2016
Name: 10.254.21.122:50010 (Slave1)
Hostname: Slave1
Decommission Status : Normal
Configured Capacity: 18435350528 (17.17 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 17033760768 (15.86 GB)
DFS Remaining: 1401556992 (1.31 GB)
DFS Used%: 0.00%
DFS Remaining%: 7.60%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Oct 27 11:16:56 CST 2016
You can also open http://10.254.21.30:50070 in a browser to view NameNode and DataNode information.

Running a demo
On the Master server, create the user directory on HDFS
hdfs dfs -mkdir -p /user/hadoop
Check:
[hadoop@Master hadoop]$ hdfs dfs -ls /user
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2016-10-21 17:46 /user/hadoop
Create the hadoop user's input directory (relative HDFS paths resolve under /user/hadoop):
hdfs dfs -mkdir -p input
Upload the Hadoop configuration files to the input directory (run from /usr/local/hadoop):
./bin/hdfs dfs -put ./etc/hadoop/*.xml input
Check:
[hadoop@Master hadoop]$ hdfs dfs -ls input
Found 9 items
-rw-r--r-- 1 hadoop supergroup 4436 2016-10-24 10:03 input/capacity-scheduler.xml
-rw-r--r-- 1 hadoop supergroup 1072 2016-10-24 10:03 input/core-site.xml
-rw-r--r-- 1 hadoop supergroup 9683 2016-10-24 10:03 input/hadoop-policy.xml
-rw-r--r-- 1 hadoop supergroup 1322 2016-10-24 10:03 input/hdfs-site.xml
-rw-r--r-- 1 hadoop supergroup 620 2016-10-24 10:03 input/httpfs-site.xml
-rw-r--r-- 1 hadoop supergroup 3518 2016-10-24 10:03 input/kms-acls.xml
-rw-r--r-- 1 hadoop supergroup 5511 2016-10-24 10:03 input/kms-site.xml
-rw-r--r-- 1 hadoop supergroup 1134 2016-10-24 10:03 input/mapred-site.xml
-rw-r--r-- 1 hadoop supergroup 930 2016-10-24 10:03 input/yarn-site.xml
Run the example:
./bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
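Once the job finishes, the matches land in the output directory on HDFS:
hdfs dfs -cat output/*
# a rerun needs the old output removed first, or the job aborts:
hdfs dfs -rm -r output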

Shut down the Hadoop cluster on the Master server
stop-yarn.sh
stop-dfs.sh
mr-jobhistory-daemon.sh stop historyserver
Pitfalls encountered
Since everything is installed and configured on virtual machines, the clocks drift out of sync and need to be synchronized:
ntpdate pool.ntp.org
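To keep the clocks from drifting again, one option is a periodic sync from cron (the ntpdate path below is the usual CentOS location; adjust if yours differs):
sudo crontab -e
# sync every 30 minutes:
*/30 * * * * /usr/sbin/ntpdate pool.ntp.org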
If the console prints the warning WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable,
there is no need to worry; it comes from using the pre-built binary package. If it bothers you, download the source package from the official site and compile it yourself.
If a DataNode fails to come up when Hadoop starts, delete the tmp directory on the Slave servers and start the cluster again. The usual cause is that re-running the NameNode format produced a new clusterID that no longer matches the one the DataNodes recorded under tmp. In /usr/local/hadoop (note: this wipes all HDFS data):
sudo rm -rf tmp
sudo rm -rf logs/*
then reformat the NameNode on Master:
hdfs namenode -format