
Building a Hadoop Cluster for a Big Data Platform

 路人甲Java 2022-03-01

  1. Concepts

  Hadoop is an open-source framework, written in Java, for storing massive amounts of data on clusters of distributed servers and running distributed analysis applications on that data. Its core components are HDFS and MapReduce.

  HDFS is a distributed file system. It is similar to MogileFS but differs in important ways: HDFS consists of a namenode, which holds file metadata, and datanodes, which hold the file data. Unlike MogileFS, which stores metadata in a database, HDFS keeps its metadata in memory, and persisting that metadata relies on the secondary namenode. The secondary namenode does not actually act as a namenode; its main job is to periodically merge the edit log into the namespace image file so the edit log does not grow too large. It can run on its own physical host and needs roughly as much memory as the namenode to perform the merge. It also keeps a copy of the namespace image as insurance against losing data if the namenode dies; however, because of how it works it always lags behind the primary namenode, so some data loss is unavoidable when the primary namenode fails. The point of the snn (secondary namenode) keeping an image copy is therefore to lose as little data as possible.

  MapReduce is a computation framework with two phases: a map phase and a reduce phase. The map phase turns the input into key-value pairs and, through the shuffle, all records that share a key are routed to the same reducer; the reduce phase folds (you can think of it as merging) the map output into a final result. This was the architecture of Hadoop v1; in v2 the MapReduce framework was split into the YARN framework plus MapReduce, and computation jobs run on top of YARN. So the core of Hadoop v1 is the two clusters HDFS + MapReduce, while the v2 architecture is HDFS + YARN + MapReduce.

  HDFS architecture

  Note: as the HDFS architecture figure shows, when a client accesses a file on HDFS it first asks the namenode for the file's metadata; the nn then tells the client where on the datanodes the requested file lives, the client fetches the data from those datanodes in turn, and finally reassembles the complete file. One concept to keep in mind is that a datanode stores file data split according to the file size and the block size. For example, for a 100 MB file and a block size of 10 MB on the dn (datanode), the file is cut into ten 10 MB blocks and those ten blocks are stored on different dn's; each block also has replicas on other dn's, so the blocks of a single file end up spread redundantly across many dn's. The nn mainly maintains which files' blocks live on which nodes, and which dn's hold which blocks of which files (the latter is reported to the nn periodically by the dn's). You can think of the nn as keeping two tables internally: one recording which dn's hold the blocks of each file (file-centric), and one recording which blocks of which files each dn holds (node-centric). From this description it is easy to see that if the nn dies, none of the files stored on HDFS can be located anymore, so in production we use a zk (ZooKeeper) cluster to make the nn highly available. HDFS itself is not a kernel file system; it relies on the local Linux file system underneath.

  The MapReduce computation process

  Note: as the figure shows, MapReduce first splits the given data into multiple parts (before the split, programmer-written code divides the data and extracts it into key-value pairs), then launches multiple mappers to run the map computation. The results of the mappers are merged by a combiner (the combiner is written by the programmer and implements the merge rules), which combines values with the same key according to some rule. The results are then handed to a partitioner (also written by the programmer; it decides which reducer each piece of map output goes to) and sent to different reducers, and each reducer finally produces one result. In short: a mapper reads key-value pairs and emits new key-value pairs, so new kv pairs are produced; a combiner takes the kv pairs produced by the current mapper and merges those that share a key (how they are merged is up to the programmer), so it reads kv pairs and emits kv pairs without producing new keys; the partitioner dispatches the combined kv pairs to the reducers, and how they are dispatched, which reducer they go to, and how many reducers there are is defined by the programmer; finally the reducers fold the data and produce new key-value pairs.
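
  As a very rough analogy (plain shell, not Hadoop code; input.txt is just a placeholder file), the same map → shuffle → reduce flow for a word count can be sketched like this:

# "map": emit one word (key) per line
# "shuffle": sort so that identical keys end up next to each other
# "reduce": fold each group of identical keys into a count
tr -s ' ' '\n' < input.txt | sort | uniq -c | sort -rn | head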

  Hadoop v1 vs. v2 architecture

  Note: in the Hadoop v1 architecture all computation jobs run on MapReduce, so MapReduce plays two roles at once: cluster resource manager and data processor. In Hadoop v2 the architecture becomes HDFS + YARN + a collection of jobs; you can think of that collection of jobs as the MapReduce of v1, except that in v2 MapReduce is only responsible for data processing and no longer for cluster resource management, which is handled by YARN. In v2 all computation jobs run on YARN. HDFS plays the same role in v1 and v2: it stores the files.

  Resource scheduling for a computation job in Hadoop v2

  Note: when the rm (ResourceManager) receives a job request from a client, it decides which nm (NodeManager) to schedule the job onto based on the status information that the nm's running on the dn's report periodically. Once the rm has picked an nm it sends the job there; that nm starts an appmaster (am) container, which acts as the controller for this job. When the appmaster needs containers to actually run the job it asks the rm, and the rm launches one or more containers on the appropriate nm's. Finally the results from the containers are sent to the am, the am returns them to the rm, and the rm returns them to the client. In this flow the rm mainly receives node status information from the nm's, schedules resources, and collects the results of the computation jobs from the am's to feed back to the clients; the nm manages the resources on its node and reports status to the rm; the am manages the resource requests for its job and returns the job's results to the rm.

  The Hadoop ecosystem

  Note: the figure above shows the Hadoop v2 ecosystem architecture. HDFS and YARN are the core Hadoop components; every framework running on top of them depends on Hadoop and must be able to call the MapReduce interface.

  2. Hadoop cluster deployment

  Environment

Name    Role          IP
node01  nn, snn, rm   192.168.0.41
node02  dn, nm        192.168.0.42
node03  dn, nm        192.168.0.43
node04  dn, nm        192.168.0.44

  Synchronize the time on all nodes

  Configure /etc/hosts on every node so that each node's hostname can be resolved
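
  For example (the node names and IPs come from the table above; the chrony setup is just one way to keep time in sync and assumes a reachable NTP source), each node could be prepared like this:

# time synchronization (assumes the distribution's default NTP servers are reachable)
yum install -y chrony
systemctl enable --now chronyd

# hostname resolution for all four nodes
cat >> /etc/hosts << 'EOF'
192.168.0.41 node01
192.168.0.42 node02
192.168.0.43 node03
192.168.0.44 node04
EOF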

  Install the JDK on all nodes

yum install -y java-1.8.0-openjdk-devel

  Note: the jps command is only available when the devel package is installed

  Verify that the JDK is installed and the version is correct, and find where the java command lives

  Add the JAVA_HOME environment variable

  Verify that the JAVA_HOME variable is configured correctly
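
  A minimal sketch of these three steps (the profile.d file name and the JDK path are assumptions; use whatever path readlink reports on your system):

java -version                       # check the installed JDK version
readlink -f $(which java)           # resolve the real path of the java binary

# assumed location of the CentOS 7 openjdk package; adjust to what readlink reported
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' > /etc/profile.d/java.sh
source /etc/profile.d/java.sh
echo $JAVA_HOME                     # should print the path set above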

  Create a directory to hold the Hadoop installation

mkdir /bigdata

  With that, the base environment is ready; next, download the Hadoop binary package

[root@node01 ~]# wget https://mirror./apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
--2020-09-27 22:50:16--  https://mirror./apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
Resolving mirror. (mirror.)... 202.204.80.77, 219.143.204.117, 2001:da8:204:1205::22
Connecting to mirror. (mirror.)|202.204.80.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 366447449 (349M) [application/octet-stream]
Saving to: ‘hadoop-2.9.2.tar.gz’

100%[============================================================================>] 366,447,449 1.44MB/s   in 2m 19s 

2020-09-27 22:52:35 (2.51 MB/s) - ‘hadoop-2.9.2.tar.gz’ saved [366447449/366447449]

[root@node01 ~]# ls
hadoop-2.9.2.tar.gz
[root@node01 ~]#

  Extract hadoop-2.9.2.tar.gz into the /bigdata/ directory and link the extracted directory to hadoop
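
  A sketch of the extract-and-link step:

tar xf hadoop-2.9.2.tar.gz -C /bigdata/
ln -sv /bigdata/hadoop-2.9.2 /bigdata/hadoop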

  Export the Hadoop environment variables

[root@node01 ~]# cat /etc/profile.d/hadoop.sh
export HADOOP_HOME=/bigdata/hadoop
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
[root@node01 ~]# 

  Create a hadoop user and set its password to admin

[root@node01 ~]# useradd hadoop
[root@node01 ~]# echo "admin" |passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
[root@node01 ~]# 

  Set up passwordless SSH logins between the nodes for the hadoop user

[hadoop@node01 ~]$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:6CNhqdagySJXc4iRBVSoLENddO7JLZMCsdjQzqSFnmw hadoop@node01.test.org
The key's randomart image is:
+---[RSA 2048]----+
| o*==o .         |
| o=Bo o          |
|=oX+   .         |
|+E =.oo.+        |
|o.o B.oBS.       |
|.o * =. o        |
|=.+ o o          |
|oo   . .         |
|                 |
+----[SHA256]-----+
[hadoop@node01 ~]$ ssh-copy-id node01
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'node01 (192.168.0.41)' can't be established.
ECDSA key fingerprint is SHA256:lE8/Vyni4z8hsXaa8OMMlDpu3yOIRh6dLcIr+oE57oE.
ECDSA key fingerprint is MD5:14:59:02:30:c0:16:b8:6c:1a:84:c3:0f:a7:ac:67:b3.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@node01's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'node01'"
and check to make sure that only the key(s) you wanted were added.

[hadoop@node01 ~]$ scp -r ./.ssh node02:/home/hadoop/
The authenticity of host 'node02 (192.168.0.42)' can't be established.
ECDSA key fingerprint is SHA256:lE8/Vyni4z8hsXaa8OMMlDpu3yOIRh6dLcIr+oE57oE.
ECDSA key fingerprint is MD5:14:59:02:30:c0:16:b8:6c:1a:84:c3:0f:a7:ac:67:b3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node02,192.168.0.42' (ECDSA) to the list of known hosts.
hadoop@node02's password: 
id_rsa                                                                                  100% 1679   636.9KB/s   00:00    
id_rsa.pub                                                                              100%  404   186.3KB/s   00:00    
known_hosts                                                                             100%  362   153.4KB/s   00:00    
authorized_keys                                                                         100%  404   203.9KB/s   00:00    
[hadoop@node01 ~]$ scp -r ./.ssh node03:/home/hadoop/
The authenticity of host 'node03 (192.168.0.43)' can't be established.
ECDSA key fingerprint is SHA256:lE8/Vyni4z8hsXaa8OMMlDpu3yOIRh6dLcIr+oE57oE.
ECDSA key fingerprint is MD5:14:59:02:30:c0:16:b8:6c:1a:84:c3:0f:a7:ac:67:b3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node03,192.168.0.43' (ECDSA) to the list of known hosts.
hadoop@node03's password:  
id_rsa                                                                                  100% 1679   755.1KB/s   00:00    
id_rsa.pub                                                                              100%  404   165.7KB/s   00:00    
known_hosts                                                                             100%  543   350.9KB/s   00:00    
authorized_keys                                                                         100%  404   330.0KB/s   00:00    
[hadoop@node01 ~]$ scp -r ./.ssh node04:/home/hadoop/
The authenticity of host 'node04 (192.168.0.44)' can't be established.
ECDSA key fingerprint is SHA256:lE8/Vyni4z8hsXaa8OMMlDpu3yOIRh6dLcIr+oE57oE.
ECDSA key fingerprint is MD5:14:59:02:30:c0:16:b8:6c:1a:84:c3:0f:a7:ac:67:b3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node04,192.168.0.44' (ECDSA) to the list of known hosts.
hadoop@node04's password: 
id_rsa                                                                                  100% 1679   707.0KB/s   00:00    
id_rsa.pub                                                                              100%  404   172.8KB/s   00:00    
known_hosts                                                                             100%  724   437.7KB/s   00:00    
authorized_keys                                                                         100%  404   165.2KB/s   00:00    
[hadoop@node01 ~]$ 

  Verification: connect from node01 to node02, node03 and node04 and confirm that the logins are now passwordless
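
  A quick way to check all three at once (run as the hadoop user on node01):

for n in node02 node03 node04; do ssh $n hostname; done    # each hostname should print without a password prompt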

  Create the data directories /data/hadoop/hdfs/{nn,snn,dn} and change their owner and group to hadoop

  Go into the Hadoop installation directory, create its logs directory, and change the owner and group of the installation directory to hadoop
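
  A sketch of these two steps (run as root; the paths follow the layout used above):

mkdir -p /data/hadoop/hdfs/{nn,snn,dn}
chown -R hadoop:hadoop /data/hadoop

cd /bigdata/hadoop
mkdir logs
chown -R hadoop:hadoop /bigdata/hadoop/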

  Note: every one of the steps above must be repeated on each node;

  Configure Hadoop's core-site.xml

  Note: Hadoop's configuration files are all XML. <property> and </property> form a pair of tags; inside them, the name tag holds the key of the option being configured and the value tag holds the value for that key. The configuration above sets the default file system address; hdfs://node01:8020 is the address through which the HDFS file system is accessed;

  The full configuration

[root@node01 hadoop]# cat core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:8020</value>
        <final>true</final>
    </property>
</configuration>
[root@node01 hadoop]# 

  Configure hdfs-site.xml

  Note: this configuration mainly specifies the HDFS data directories, the web UI ports, and the number of replicas;

  The full configuration

[root@node01 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
                    <property>
                        <name>dfs.replication</name>
                        <value>3</value>
                    </property>
                    <property>
                        <name>dfs.namenode.name.dir</name>
                        <value>file:///data/hadoop/hdfs/nn</value>
                    </property>
                    <property>
                         <name>dfs.namenode.secondary.http-address</name>
                         <value>node01:50090</value>
                    </property>
                    <property>
                        <name>dfs.namenode.http-address</name>
                        <value>node01:50070</value>
                    </property>
                    <property>
                        <name>dfs.datanode.data.dir</name>
                        <value>file:///data/hadoop/hdfs/dn</value>
                    </property>
                    <property>
                        <name>fs.checkpoint.dir</name>
                        <value>file:///data/hadoop/hdfs/snn</value>
                    </property>
                    <property>
                        <name>fs.checkpoint.edits.dir</name>
                        <value>file:///data/hadoop/hdfs/snn</value>
                    </property>

</configuration>
[root@node01 hadoop]# 

  Configure mapred-site.xml

  Note: this configuration simply sets the MapReduce framework to yarn. There is no mapred-site.xml by default; copy mapred-site.xml.template to mapred-site.xml. Since the copy is made as root, the new file ends up owned by root, so do not forget to change its owner and group back to hadoop;
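
  The copy described above can be done like this (from the configuration directory, /bigdata/hadoop/etc/hadoop):

cp mapred-site.xml.template mapred-site.xml
chown hadoop:hadoop mapred-site.xml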

  The full configuration

[root@node01 hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
                    <property>
                        <name>mapreduce.framework.name</name>
                        <value>yarn</value>
                    </property>

</configuration>
[root@node01 hadoop]# 

  Configure yarn-site.xml

  Note: this configuration mainly sets the addresses of the YARN rm and nm components and specifies the related classes;

  The full configuration

[root@node01 hadoop]# cat yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

                    <property>
                        <name>yarn.resourcemanager.address</name>
                        <value>node01:8032</value>
                    </property>
                    <property>
                        <name>yarn.resourcemanager.scheduler.address</name>
                        <value>node01:8030</value>
                    </property>
                    <property>
                        <name>yarn.resourcemanager.resource-tracker.address</name>
                        <value>node01:8031</value>
                    </property>
                    <property>
                        <name>yarn.resourcemanager.admin.address</name>
                        <value>node01:8033</value>
                    </property>
                    <property>
                        <name>yarn.resourcemanager.webapp.address</name>
                        <value>node01:8088</value>
                    </property>
                    <property>
                        <name>yarn.nodemanager.aux-services</name>
                        <value>mapreduce_shuffle</value>
                    </property>
                    <property>
                        <name>yarn.nodemanager.auxservices.mapreduce_shuffle.class</name>
                        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
                    </property>
                    <property>
                        <name>yarn.resourcemanager.scheduler.class</name>
                        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
                    </property>

</configuration>
[root@node01 hadoop]# 

  Configure the slaves file

[root@node01 hadoop]# cat slaves 
node02
node03
node04
[root@node01 hadoop]# 

  Copy the configuration files to the other nodes
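
  A sketch of the copy (run from node01; it assumes the same /bigdata/hadoop layout already exists on the other nodes):

cd /bigdata/hadoop/etc/hadoop
for n in node02 node03 node04; do
    scp core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves $n:/bigdata/hadoop/etc/hadoop/
done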

  This completes the Hadoop configuration;

  Next, switch to the hadoop user and format HDFS

hdfs namenode -format

  Note: if hdfs namenode -format finishes by reporting that the storage directory has been successfully formatted, the HDFS format succeeded;

  Start the HDFS cluster
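
  HDFS is started with the helper script shipped in sbin (run as the hadoop user on node01), and the processes can then be checked with jps on each node:

start-dfs.sh     # starts the NameNode/SecondaryNameNode on node01 and a DataNode on node02-node04
jps              # node01 should show NameNode and SecondaryNameNode; the dn nodes should show DataNode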

  Note: HDFS consists mainly of the namenode, secondarynamenode and datanodes; as long as the corresponding process is up on each node, there is not much to worry about;

  At this point the HDFS cluster has started normally

  Verification: upload /etc/passwd to the /test directory on HDFS and see whether the upload works
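
  For example, as the hadoop user:

hdfs dfs -mkdir /test
hdfs dfs -put /etc/passwd /test/
hdfs dfs -ls /test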

  Note: you can see that the /etc/passwd file has been uploaded to the /test directory on HDFS;

  Verification: read the passwd file under /test on HDFS and see whether its content matches /etc/passwd
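
  The content can be read back with:

hdfs dfs -cat /test/passwd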

  Note: you can see that the content of /test/passwd on HDFS is identical to /etc/passwd;

  Verification: on a dn node, look at the file content under the corresponding data directory and see whether it matches /etc/passwd

[root@node02 ~]# tree /data
/data
└── hadoop
    └── hdfs
        ├── dn
        │   ├── current
        │   │   ├── BP-157891879-192.168.0.41-1601224158145
        │   │   │   ├── current
        │   │   │   │   ├── finalized
        │   │   │   │   │   └── subdir0
        │   │   │   │   │       └── subdir0
        │   │   │   │   │           ├── blk_1073741825
        │   │   │   │   │           └── blk_1073741825_1001.meta
        │   │   │   │   ├── rbw
        │   │   │   │   └── VERSION
        │   │   │   ├── scanner.cursor
        │   │   │   └── tmp
        │   │   └── VERSION
        │   └── in_use.lock
        ├── nn
        └── snn

13 directories, 6 files
[root@node02 ~]# cat /data/hadoop/hdfs/dn/current/BP-157891879-192.168.0.41-1601224158145/
current/        scanner.cursor  tmp/            
[root@node02 ~]# cat /data/hadoop/hdfs/dn/current/BP-157891879-192.168.0.41-1601224158145/current/finalized/subdir0/subdir0/blk_1073741825
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:997:User for polkitd:/:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
hadoop:x:1000:1000::/home/hadoop:/bin/bash
[root@node02 ~]# 

  Note: the passwd file we uploaded can be found under the dn directory on the dn node;

  Verification: check whether the other nodes hold the same file, and whether the number of replicas we specified is present
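
  Besides looking at the directories on node03 and node04 directly, fsck on the namenode side reports the replica count and the locations of each block:

hdfs fsck /test/passwd -files -blocks -locations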

  Note: node03 and node04 have the same directory and file, which shows that the replica count of 3 we configured has taken effect;

  Start the YARN cluster
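
  Again via the sbin helper script, run as the hadoop user on node01:

start-yarn.sh    # starts the ResourceManager on node01 and a NodeManager on node02-node04
jps              # node01 should show ResourceManager; the other nodes should show NodeManager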

  Note: you can see that the nm has started on the corresponding nodes, and the rm on the master node has started normally as well;

  Open ports 50070 and 8088 on the nn and see whether the corresponding web pages can be reached

  Note: 50070 is the HDFS web UI; on this page you can see the storage status of HDFS and operate on the files stored in it;

  Note: 8088 is the management UI of the YARN cluster; on this page you can see the status of running computation jobs, the cluster configuration, logs, and so on;

  Verification: run a computation job on YARN that counts the words in the /test/passwd file, and see whether the job actually runs

[hadoop@node01 hadoop]$ yarn jar /bigdata/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar   
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hadoop@node01 hadoop]$ yarn jar /bigdata/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount
Usage: wordcount <in> [<in>...] <out>
[hadoop@node01 hadoop]$ yarn jar /bigdata/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/passwd /test/passwd-word-count
20/09/28 00:58:01 INFO client.RMProxy: Connecting to ResourceManager at node01/192.168.0.41:8032
20/09/28 00:58:01 INFO input.FileInputFormat: Total input files to process : 1
20/09/28 00:58:01 INFO mapreduce.JobSubmitter: number of splits:1
20/09/28 00:58:01 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
20/09/28 00:58:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1601224871685_0001
20/09/28 00:58:02 INFO impl.YarnClientImpl: Submitted application application_1601224871685_0001
20/09/28 00:58:02 INFO mapreduce.Job: The url to track the job: http://node01:8088/proxy/application_1601224871685_0001/
20/09/28 00:58:02 INFO mapreduce.Job: Running job: job_1601224871685_0001
20/09/28 00:58:08 INFO mapreduce.Job: Job job_1601224871685_0001 running in uber mode : false
20/09/28 00:58:08 INFO mapreduce.Job:  map 0% reduce 0%
20/09/28 00:58:14 INFO mapreduce.Job:  map 100% reduce 0%
20/09/28 00:58:20 INFO mapreduce.Job:  map 100% reduce 100%
20/09/28 00:58:20 INFO mapreduce.Job: Job job_1601224871685_0001 completed successfully
20/09/28 00:58:20 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=1144
                FILE: Number of bytes written=399079
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1053
                HDFS: Number of bytes written=1018
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=2753
                Total time spent by all reduces in occupied slots (ms)=2779
                Total time spent by all map tasks (ms)=2753
                Total time spent by all reduce tasks (ms)=2779
                Total vcore-milliseconds taken by all map tasks=2753
                Total vcore-milliseconds taken by all reduce tasks=2779
                Total megabyte-milliseconds taken by all map tasks=2819072
                Total megabyte-milliseconds taken by all reduce tasks=2845696
        Map-Reduce Framework
                Map input records=22
                Map output records=30
                Map output bytes=1078
                Map output materialized bytes=1144
                Input split bytes=95
                Combine input records=30
                Combine output records=30
                Reduce input groups=30
                Reduce shuffle bytes=1144
                Reduce input records=30
                Reduce output records=30
                Spilled Records=60
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=87
                CPU time spent (ms)=620
                Physical memory (bytes) snapshot=444997632
                Virtual memory (bytes) snapshot=4242403328
                Total committed heap usage (bytes)=285212672
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=958
        File Output Format Counters 
                Bytes Written=1018
[hadoop@node01 hadoop]$ 

  View the report generated by the computation

[hadoop@node01 hadoop]$ hdfs dfs -ls -R /test
-rw-r--r--   3 hadoop supergroup        958 2020-09-28 00:32 /test/passwd
drwxr-xr-x   - hadoop supergroup          0 2020-09-28 00:58 /test/passwd-word-count
-rw-r--r--   3 hadoop supergroup          0 2020-09-28 00:58 /test/passwd-word-count/_SUCCESS
-rw-r--r--   3 hadoop supergroup       1018 2020-09-28 00:58 /test/passwd-word-count/part-r-00000
[hadoop@node01 hadoop]$ hdfs dfs -cat /test/passwd-word-count/part-r-00000
Management:/:/sbin/nologin      1
Network 1
SSH:/var/empty/sshd:/sbin/nologin       1
User:/var/ftp:/sbin/nologin     1
adm:x:3:4:adm:/var/adm:/sbin/nologin    1
bin:x:1:1:bin:/bin:/sbin/nologin        1
bus:/:/sbin/nologin     1
chrony:x:998:996::/var/lib/chrony:/sbin/nologin 1
daemon:x:2:2:daemon:/sbin:/sbin/nologin 1
dbus:x:81:81:System     1
for     1
ftp:x:14:50:FTP 1
games:x:12:100:games:/usr/games:/sbin/nologin   1
hadoop:x:1000:1000::/home/hadoop:/bin/bash      1
halt:x:7:0:halt:/sbin:/sbin/halt        1
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin        1
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin  1
message 1
nobody:x:99:99:Nobody:/:/sbin/nologin   1
ntp:x:38:38::/etc/ntp:/sbin/nologin     1
operator:x:11:0:operator:/root:/sbin/nologin    1
polkitd:/:/sbin/nologin 1
polkitd:x:999:997:User  1
postfix:x:89:89::/var/spool/postfix:/sbin/nologin       1
root:x:0:0:root:/root:/bin/bash 1
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown    1
sshd:x:74:74:Privilege-separated        1
sync:x:5:0:sync:/sbin:/bin/sync 1
systemd-network:x:192:192:systemd       1
tcpdump:x:72:72::/:/sbin/nologin        1
[hadoop@node01 hadoop]$ 

  Check the job's status information on the 8088 page

  This completes the Hadoop v2 cluster setup.
