Hadoop HA: Overview

Why does Hadoop have an HA mechanism at all? HA stands for High Availability. Before Hadoop 2.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. In a cluster with only one NameNode, if the NameNode machine failed (a crash, or a software or hardware upgrade), the entire cluster was unusable until the NameNode was restarted.

How is this solved? The HDFS HA feature addresses the problem by configuring two NameNodes, one Active and one Standby, giving the cluster a hot standby for the NameNode. If a failure occurs, such as a machine crash or planned maintenance, the NameNode role can be switched to the other machine quickly.

In a typical HA HDFS cluster, two separate machines are configured as NameNodes. At any point in time exactly one NameNode is in the Active state and the other is in Standby. The Active NameNode serves all client operations in the cluster, while the Standby NameNode acts purely as a hot backup, ready to take over quickly if the Active NameNode fails.

To keep the metadata (in practice, the edit log) of the Active and Standby NameNodes synchronized in real time, a shared storage system is needed; this can be NFS, QJM (Quorum Journal Manager), or ZooKeeper. The Active NameNode writes edits to the shared storage, and the Standby watches it: whenever new data is written, the Standby reads it and applies it to its own in-memory state, keeping that state essentially consistent with the Active NameNode so that in an emergency the Standby can be switched to Active quickly. For fast failover the Standby also needs up-to-date block information for the cluster, so the DataNodes are configured with the locations of both NameNodes and send block reports and heartbeats to both.

Cluster plan

A Hadoop HA cluster depends on ZooKeeper, so three of the machines are chosen to form the ZooKeeper cluster. Four hosts are prepared in total: hadoop1, hadoop2, hadoop3, and hadoop4. hadoop1 and hadoop2 host the active/standby NameNodes, and hadoop3 and hadoop4 host the active/standby ResourceManagers.

Server preparation

1. Set the hostname
2. Set the IP address
3. Add hostname/IP mappings
4. Create an ordinary user, hadoop, and grant it sudo privileges
5. Set the system run level
6. Disable the firewall and SELinux
7. Install the JDK

Two ways to prepare the machines:
1. Configure each node individually, which is tedious; in a production environment this can be scripted.
2. In a virtual-machine environment, clone the machine after completing the seven steps above, then finish by configuring passwordless SSH and time synchronization across the cluster.

8. Configure passwordless SSH login
9. Synchronize the server clocks

For the detailed steps, see the ordinary distributed setup guide: http://www.cnblogs.com/qingyunzong/p/8496127.html
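Steps 1 through 9 above can be partly automated. Below is a minimal sketch of generating the hostname/IP mapping for step 3; the IP addresses are hypothetical placeholders (not from the original post) and should be replaced with the real addresses of your four hosts:

```shell
#!/usr/bin/env bash
# Sketch only: the addresses below are made-up placeholders.
declare -A HOSTS=(
  [hadoop1]=192.168.1.101
  [hadoop2]=192.168.1.102
  [hadoop3]=192.168.1.103
  [hadoop4]=192.168.1.104
)

# Print one "/etc/hosts"-style line per node, sorted by hostname.
gen_hosts_entries() {
  local h
  for h in "${!HOSTS[@]}"; do
    printf '%s\t%s\n' "${HOSTS[$h]}" "$h"
  done | sort -k2
}

gen_hosts_entries
# On a real cluster you would then append the output to /etc/hosts on
# every node, e.g.:  gen_hosts_entries | sudo tee -a /etc/hosts
```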
Cluster installation

1. Install the ZooKeeper cluster

For the detailed installation steps, see the earlier document: http://www.cnblogs.com/qingyunzong/p/8619184.html

2. Install the Hadoop cluster

(1) Get the installation package

Download it from the official site or a mirror.

(2) Upload and extract it

[hadoop@hadoop1 ~]$ ls
apps  hadoop-2.7.5-centos-6.7.tar.gz  movie2.jar  users.dat  zookeeper.out  data  log  output2  zookeeper-3.4.10.tar.gz
[hadoop@hadoop1 ~]$ tar -zxvf hadoop-2.7.5-centos-6.7.tar.gz -C apps/

(3) Modify the configuration files

Configuration directory: /home/hadoop/apps/hadoop-2.7.5/etc/hadoop

Edit hadoop-env.sh and set JAVA_HOME to the JDK path:

[hadoop@hadoop1 ~]$ cd apps/hadoop-2.7.5/etc/hadoop/
[hadoop@hadoop1 hadoop]$ echo $JAVA_HOME
/usr/local/jdk1.8.0_73
[hadoop@hadoop1 hadoop]$ vi hadoop-env.sh

Edit core-site.xml:

[hadoop@hadoop1 hadoop]$ vi core-site.xml
<configuration>
    <!-- Set the HDFS nameservice to myha01 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://myha01/</value>
    </property>

    <!-- Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/data/hadoopdata/</value>
    </property>

    <!-- ZooKeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop1:2181,hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
    </property>

    <!-- Timeout for Hadoop's ZooKeeper sessions -->
    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>1000</value>
        <description>ms</description>
    </property>
</configuration>

Edit hdfs-site.xml:

[hadoop@hadoop1 hadoop]$ vi hdfs-site.xml
<configuration>

    <!-- Replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <!-- Working directories (data storage) for the NameNode and DataNode -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/data/hadoopdata/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/data/hadoopdata/dfs/data</value>
    </property>

    <!-- Enable WebHDFS -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

    <!-- The HDFS nameservice is myha01; it must match core-site.xml.
         dfs.ha.namenodes.[nameservice id] assigns a unique identifier to each
         NameNode in the nameservice: a comma-separated list of NameNode IDs,
         which is how the DataNodes identify all the NameNodes.
         Here "myha01" is the nameservice ID and "nn1" and "nn2" are the
         NameNode identifiers. -->
    <property>
        <name>dfs.nameservices</name>
        <value>myha01</value>
    </property>

    <!-- myha01 has two NameNodes, nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.myha01</name>
        <value>nn1,nn2</value>
    </property>

    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.myha01.nn1</name>
        <value>hadoop1:9000</value>
    </property>

    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.myha01.nn1</name>
        <value>hadoop1:50070</value>
    </property>

    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.myha01.nn2</name>
        <value>hadoop2:9000</value>
    </property>

    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.myha01.nn2</name>
        <value>hadoop2:50070</value>
    </property>

    <!-- Shared storage for the NameNode's edits metadata, i.e. the JournalNode list.
         URL format: qjournal://host1:port1;host2:port2;host3:port3/journalId
         The nameservice is recommended as the journalId; the default port is 8485. -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/myha01</value>
    </property>

    <!-- Local disk location where the JournalNodes store their data -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/data/journaldata</value>
    </property>

    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- Implementation clients use to locate the active NameNode -->
    <property>
        <name>dfs.client.failover.proxy.provider.myha01</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>

    <!-- The sshfence method requires passwordless SSH -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>

    <!-- sshfence connection timeout -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

    <property>
        <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
        <value>60000</value>
    </property>
</configuration>

Edit mapred-site.xml:

[hadoop@hadoop1 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@hadoop1 hadoop]$ vi mapred-site.xml

<configuration>
    <!-- Use YARN as the MapReduce framework -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- MapReduce JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop1:10020</value>
    </property>

    <!-- JobHistory server web address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop1:19888</value>
    </property>
</configuration>

Edit yarn-site.xml:

[hadoop@hadoop1 hadoop]$ vi yarn-site.xml
<configuration>
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- Cluster id of the RM pair -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>

    <!-- IDs of the two RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- Address of each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop3</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop4</value>
    </property>

    <!-- ZooKeeper cluster addresses -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>

    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- Store the ResourceManager's state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
</configuration>

Edit slaves:

[hadoop@hadoop1 hadoop]$ vi slaves
hadoop1
hadoop2
hadoop3
hadoop4

(4) Distribute the Hadoop installation to the other cluster nodes

Important: the Hadoop installation directory must be the same on every server, and the configuration inside it must be identical on every node.

[hadoop@hadoop1 apps]$ scp -r hadoop-2.7.5/ hadoop2:$PWD
[hadoop@hadoop1 apps]$ scp -r hadoop-2.7.5/ hadoop3:$PWD
[hadoop@hadoop1 apps]$ scp -r hadoop-2.7.5/ hadoop4:$PWD

(5) Configure the Hadoop environment variables

Note carefully:
1. If you install as root, edit /etc/profile (system-wide variables).
2. If you install as an ordinary user, edit ~/.bashrc (per-user variables).

This setup was installed as the hadoop user:

[hadoop@hadoop1 ~]$ vi .bashrc
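Since the configuration must be byte-for-byte identical on every node, it helps to compare a fingerprint of the config directory after distributing it. A small sketch (the helper name is ours; it assumes standard coreutils — run it on each node and compare the printed hashes):

```shell
# Print a single hash summarizing every file in a directory tree.
# Identical output on two nodes means identical configuration.
# (Sketch only: filenames containing whitespace are not handled.)
config_fingerprint() {
  local dir=$1
  (cd "$dir" && find . -type f | sort | xargs -r sha256sum | sha256sum | awk '{print $1}')
}
```

Usage on a real cluster would be something like comparing `config_fingerprint ~/apps/hadoop-2.7.5/etc/hadoop` across hadoop1 through hadoop4.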
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.7.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload the file so the variables take effect:

[hadoop@hadoop1 bin]$ source ~/.bashrc
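If you script this environment setup, it is worth making the .bashrc edit idempotent so repeated runs do not duplicate the export lines. A small sketch (the helper name is ours, not from the post):

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
  local line=$1 file=$2
  grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Example: safe to run any number of times.
# append_once 'export HADOOP_HOME=/home/hadoop/apps/hadoop-2.7.5' ~/.bashrc
# append_once 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' ~/.bashrc
```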
(6) Check the Hadoop version

[hadoop@hadoop4 ~]$ hadoop version
Hadoop 2.7.5
Subversion Unknown -r Unknown
Compiled by root on 2017-12-24T05:30Z
Compiled with protoc 2.5.0
From source with checksum 9f118f95f47043332d51891e37f736e9
This command was run using /home/hadoop/apps/hadoop-2.7.5/share/hadoop/common/hadoop-common-2.7.5.jar
[hadoop@hadoop4 ~]$

Initializing the Hadoop HA cluster

Important: the following steps must be performed in exactly this order.

1. Start ZooKeeper

Start the zookeeper service on all four servers.

hadoop1
[hadoop@hadoop1 conf]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@hadoop1 conf]$ jps
2674 Jps
2647 QuorumPeerMain
[hadoop@hadoop1 conf]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[hadoop@hadoop1 conf]$

hadoop2
[hadoop@hadoop2 conf]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@hadoop2 conf]$ jps
2592 QuorumPeerMain
2619 Jps
[hadoop@hadoop2 conf]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[hadoop@hadoop2 conf]$

hadoop3
[hadoop@hadoop3 conf]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@hadoop3 conf]$ jps
16612 QuorumPeerMain
16647 Jps
[hadoop@hadoop3 conf]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader
[hadoop@hadoop3 conf]$

hadoop4
[hadoop@hadoop4 conf]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ...
STARTED
[hadoop@hadoop4 conf]$ jps
3596 Jps
3567 QuorumPeerMain
[hadoop@hadoop4 conf]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: observer
[hadoop@hadoop4 conf]$

2. Start the JournalNode process on each configured journalnode node

Per the plan above, the JournalNodes run on hadoop1, hadoop2, and hadoop3; start them as follows.

hadoop1
[hadoop@hadoop1 conf]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-journalnode-hadoop1.out
[hadoop@hadoop1 conf]$ jps
2739 JournalNode
2788 Jps
2647 QuorumPeerMain
[hadoop@hadoop1 conf]$

hadoop2
[hadoop@hadoop2 conf]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-journalnode-hadoop2.out
[hadoop@hadoop2 conf]$ jps
2592 QuorumPeerMain
3049 JournalNode
3102 Jps
[hadoop@hadoop2 conf]$

hadoop3
[hadoop@hadoop3 conf]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-journalnode-hadoop3.out
[hadoop@hadoop3 conf]$ jps
16612 QuorumPeerMain
16712 JournalNode
16766 Jps
[hadoop@hadoop3 conf]$

3. Format a NameNode

Pick one NameNode node (hadoop1) and format it:

[hadoop@hadoop1 ~]$ hadoop namenode -format
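As a side note on the JournalNode quorum started above: the dfs.namenode.shared.edits.dir value follows the pattern qjournal://host1:8485;host2:8485;host3:8485/journalId. A tiny helper to assemble such a URI (the function name is ours; 8485 is the default JournalNode port):

```shell
# Build a qjournal:// URI from a journal id and a list of JournalNode hosts.
build_qjournal_uri() {
  local journal_id=$1; shift
  local uri="" h
  for h in "$@"; do
    uri+="${uri:+;}${h}:8485"   # join host:port pairs with ';'
  done
  printf 'qjournal://%s/%s\n' "$uri" "$journal_id"
}

build_qjournal_uri myha01 hadoop1 hadoop2 hadoop3
```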
4. Copy the metadata generated on hadoop1 to the other NameNode node (hadoop2)

[hadoop@hadoop1 ~]$ cd data/

(The copy itself is shown in a screenshot in the original post: the name directory configured above, /home/hadoop/data/hadoopdata, is copied from hadoop1 to the same path on hadoop2, for example with scp -r. Alternatively, running `hdfs namenode -bootstrapStandby` on hadoop2 copies the metadata over for you.)

5. Format ZKFC

Important: this must be run on a NameNode node, and only there.

[hadoop@hadoop1 data]$ hdfs zkfc -formatZK
Starting the cluster

1. Start HDFS

The startup log shows which processes are started:

[hadoop@hadoop1 ~]$ start-dfs.sh
Starting namenodes on [hadoop1 hadoop2]
hadoop2: starting namenode, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-namenode-hadoop2.out
hadoop1: starting namenode, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-namenode-hadoop1.out
hadoop3: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-datanode-hadoop3.out
hadoop4: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-datanode-hadoop4.out
hadoop2: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-datanode-hadoop2.out
hadoop1: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-datanode-hadoop1.out
Starting journal nodes [hadoop1 hadoop2 hadoop3]
hadoop3: journalnode running as process 16712. Stop it first.
hadoop2: journalnode running as process 3049. Stop it first.
hadoop1: journalnode running as process 2739. Stop it first.
Starting ZK Failover Controllers on NN hosts [hadoop1 hadoop2]
hadoop2: starting zkfc, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-zkfc-hadoop2.out
hadoop1: starting zkfc, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-zkfc-hadoop1.out
[hadoop@hadoop1 ~]$

Check that the processes on each node are normal (run jps on hadoop1 through hadoop4; the original post shows screenshots of each).

2. Start YARN

Start it on either of the two ResourceManager machines:

[hadoop@hadoop4 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/apps/hadoop-2.7.5/logs/yarn-hadoop-resourcemanager-hadoop4.out
hadoop3: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-hadoop3.out
hadoop2: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-hadoop2.out
hadoop4: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-hadoop4.out
hadoop1: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-hadoop1.out
[hadoop@hadoop4 ~]$

After a normal start, check the processes on each node (hadoop1 through hadoop4). If the standby ResourceManager did not start, start it manually; here that is done on hadoop3:

[hadoop@hadoop3 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/hadoop/apps/hadoop-2.7.5/logs/yarn-hadoop-resourcemanager-hadoop3.out
[hadoop@hadoop3 ~]$ jps
17492 ResourceManager
16612 QuorumPeerMain
16712 JournalNode
17532 Jps
17356 NodeManager
16830 DataNode
[hadoop@hadoop3 ~]$
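Rather than eyeballing jps output on four machines, the process check can be scripted. A sketch of a helper that, given jps output, verifies the expected daemons are present (function name and usage are ours, not from the post):

```shell
# check_daemons "<jps output>" Daemon1 Daemon2 ...
# Prints OK if every named daemon appears in the jps output,
# otherwise prints the missing daemon names and returns non-zero.
check_daemons() {
  local jps_out=$1; shift
  local missing=() d
  for d in "$@"; do
    grep -qw "$d" <<<"$jps_out" || missing+=("$d")
  done
  if ((${#missing[@]})); then
    echo "MISSING: ${missing[*]}"
    return 1
  fi
  echo "OK"
}

# On a real cluster, e.g.:
# check_daemons "$(ssh hadoop1 jps)" NameNode DataNode JournalNode DFSZKFailoverController
```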
3. Start the MapReduce job history server

[hadoop@hadoop1 ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/hadoop/apps/hadoop-2.7.5/logs/mapred-hadoop-historyserver-hadoop1.out
[hadoop@hadoop1 ~]$ jps
4016 NodeManager
2739 JournalNode
4259 Jps
3844 DFSZKFailoverController
2647 QuorumPeerMain
3546 DataNode
4221 JobHistoryServer
3407 NameNode
[hadoop@hadoop1 ~]$

4. Check the state of each master node

HDFS:

[hadoop@hadoop1 ~]$ hdfs haadmin -getServiceState nn1
standby
[hadoop@hadoop1 ~]$ hdfs haadmin -getServiceState nn2
active
[hadoop@hadoop1 ~]$

YARN:

[hadoop@hadoop4 ~]$ yarn rmadmin -getServiceState rm1
standby
[hadoop@hadoop4 ~]$ yarn rmadmin -getServiceState rm2
active
[hadoop@hadoop4 ~]$

5. Check the web interfaces

HDFS: the hadoop1 and hadoop2 pages (screenshots in the original post).
YARN: the standby node's page automatically redirects to the active node.
MapReduce history server web interface (screenshot in the original post).
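The haadmin and rmadmin checks above print one state per query. If you pair each id with its reported state, a one-line filter can pick out the active node (a sketch; the function name is ours):

```shell
# Reads "id state" pairs on stdin and prints the id whose state is "active".
active_nn() {
  awk '$2=="active"{print $1}'
}

# On a real cluster, e.g.:
# { echo "nn1 $(hdfs haadmin -getServiceState nn1)"
#   echo "nn2 $(hdfs haadmin -getServiceState nn2)"; } | active_nn
```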
Cluster failover testing

1. Kill the active NameNode and observe the cluster

The NameNode on hadoop2 is currently active. Kill its process and check whether the standby NameNode on hadoop1 automatically switches to active.

[hadoop@hadoop2 ~]$ jps
4032 QuorumPeerMain
4400 DFSZKFailoverController
4546 NodeManager
4198 DataNode
4745 Jps
4122 NameNode
4298 JournalNode
[hadoop@hadoop2 ~]$ kill -9 4122

(The web pages for hadoop2 and hadoop1 are shown as screenshots in the original post.)
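Failover is not instantaneous, so a test script should poll rather than check once. Below is a generic retry helper (ours, not from the post); on a real cluster something like `wait_until 30 2 sh -c 'hdfs haadmin -getServiceState nn1 | grep -q active'` would wait up to a minute for nn1 to report active:

```shell
# wait_until <attempts> <delay-seconds> <command...>
# Re-runs the command until it succeeds or the attempts are exhausted.
wait_until() {
  local attempts=$1 delay=$2; shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    sleep "$delay"
  done
  return 1
}
```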
The automatic failover succeeded.

2. Kill the active NameNode during a file upload and observe

First start the NameNode on hadoop2 again manually:

[hadoop@hadoop2 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/hadoop/apps/hadoop-2.7.5/logs/hadoop-hadoop-namenode-hadoop2.out
[hadoop@hadoop2 ~]$ jps
4032 QuorumPeerMain
4400 DFSZKFailoverController
4546 NodeManager
4198 DataNode
4823 NameNode
4298 JournalNode
4908 Jps
[hadoop@hadoop2 ~]$

Pick a reasonably large file and upload it; about 5 seconds in, kill the active NameNode and see whether the upload still succeeds.

Upload from hadoop2:

[hadoop@hadoop2 ~]$ ll
total 194368
drwxrwxr-x 4 hadoop hadoop      4096 Mar 23 19:48 apps
drwxrwxr-x 5 hadoop hadoop      4096 Mar 23 20:38 data
-rw-rw-r-- 1 hadoop hadoop 199007110 Mar 24 09:51 hadoop-2.7.5-centos-6.7.tar.gz
drwxrwxr-x 3 hadoop hadoop      4096 Mar 21 19:47 log
-rw-rw-r-- 1 hadoop hadoop      9935 Mar 24 09:48 zookeeper.out
[hadoop@hadoop2 ~]$ hadoop fs -put hadoop-2.7.5-centos-6.7.tar.gz /hadoop/

On hadoop1, ready to kill the namenode at any moment:

[hadoop@hadoop1 ~]$ jps
4128 DataNode
4498 DFSZKFailoverController
3844 QuorumPeerMain
4327 JournalNode
5095 Jps
4632 NodeManager
4814 JobHistoryServer
4015 NameNode
[hadoop@hadoop1 ~]$ kill -9 4015

When the namenode process on hadoop1 is killed, hadoop2 logs an error (shown in a screenshot in the original post).

Check in HDFS or the web interface whether the upload succeeded.

Via the command line:

[hadoop@hadoop1 ~]$ hadoop fs -ls /hadoop/
Found 1 items
-rw-r--r--   2 hadoop supergroup  199007110 2018-03-24 09:54 /hadoop/hadoop-2.7.5-centos-6.7.tar.gz
[hadoop@hadoop1 ~]$

Downloading through the web interface confirms that the file in HDFS has the same size as the file we uploaded, 199007110 bytes, so the upload succeeded even though the active NameNode was killed mid-transfer: HA did its job.

3. Kill the active ResourceManager and observe the cluster

The ResourceManager on hadoop4 is currently active; kill its process and observe:

[hadoop@hadoop4 ~]$ jps
3248 ResourceManager
3028 QuorumPeerMain
3787 Jps
3118 DataNode
3358 NodeManager
[hadoop@hadoop4 ~]$ kill -9 3248

The hadoop4 web page no longer loads. Opening the YARN web page on hadoop3 shows that the ResourceManager on hadoop3 has become active.

4. Kill the active ResourceManager while a job is running and observe

Upload a reasonably large file to HDFS:
[hadoop@hadoop1 output2]$ hadoop fs -mkdir -p /words/input/
[hadoop@hadoop1 output2]$ ll
total 82068
-rw-r--r--. 1 hadoop hadoop 84034300 Mar 21 22:18 part-r-00000
-rw-r--r--. 1 hadoop hadoop        0 Mar 21 22:18 _SUCCESS
[hadoop@hadoop1 output2]$ hadoop fs -put part-r-00000 /words/input/words.txt
[hadoop@hadoop1 output2]$

Run wordcount; while the map phase is running, kill the active ResourceManager and watch what happens.

First start the ResourceManager on hadoop4 again:

[hadoop@hadoop4 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/hadoop/apps/hadoop-2.7.5/logs/yarn-hadoop-resourcemanager-hadoop4.out
[hadoop@hadoop4 ~]$ jps
3028 QuorumPeerMain
3847 ResourceManager
3884 Jps
3118 DataNode
3358 NodeManager
[hadoop@hadoop4 ~]$

Run the word count on hadoop1:

[hadoop@hadoop1 ~]$ cd apps/hadoop-2.7.5/share/hadoop/mapreduce/
[hadoop@hadoop1 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.5.jar wordcount /words/input/ /words/output/

On hadoop3, ready to kill the resourcemanager process at any moment:

[hadoop@hadoop3 ~]$ jps
3488 JournalNode
3601 NodeManager
4378 Jps
3291 QuorumPeerMain
3389 DataNode
3757 ResourceManager
[hadoop@hadoop3 ~]$ kill -9 3757

The resourcemanager process was killed when the map phase had reached 43% (screenshot in the original post). The computation finished without any errors, and the web interface also shows the job as successful.
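To sanity-check the wordcount result on a small sample, the same counting can be reproduced locally with standard tools. A sketch (the helper name is ours) producing tab-separated word/count lines in the usual wordcount output format:

```shell
# Local reference word count: split on whitespace, count occurrences,
# print "word<TAB>count" sorted by word.
wordcount_local() {
  tr -s '[:space:]' '\n' | sort | grep -v '^$' | uniq -c | awk '{print $2"\t"$1}'
}

# Example: compare against the job's output for a small sample, e.g.
# head -n 100 words.txt | wordcount_local
```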