Hadoop Distributed Cluster Deployment

This post records the deployment of a Hadoop YARN cluster with one master node and three slave nodes. Every node in the cluster must have a unique hostname and IP address and be able to reach the other nodes by hostname; name resolution is handled through the hosts file. In addition, starting and stopping the whole cluster from the master node requires that the users running the services (hdfs, yarn, and so on) on master can connect to each slave node over SSH with key-based authentication.

Host plan:

  • 192.168.214.143 : master
  • 192.168.214.148 : node-1
  • 192.168.214.152 : node-2
  • 192.168.214.173 : node-3

Pre-deployment environment preparation

  • Synchronize time on all nodes
[root@localhost ~]# yum install ntpdate -y

[root@localhost ~]# ntpdate gudaoyufu.com
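
To keep the clocks in sync after this one-off adjustment, a periodic resync can be scheduled; a minimal sketch, run as root on every node (gudaoyufu.com is just the time source used above, any reachable NTP server works):

# resync the clock every 30 minutes
[root@localhost ~]# echo '*/30 * * * * root /usr/sbin/ntpdate gudaoyufu.com &>/dev/null' >> /etc/crontab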
  • Install the JDK on all nodes
yum  install  java-1.7.0-openjdk.x86_64  java-1.7.0-openjdk-devel.x86_64 -y
  • Add the environment variable
vim /etc/profile.d/java.sh

export JAVA_HOME=/usr

source /etc/profile.d/java.sh
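
A quick check that the JDK and the variable took effect (JAVA_HOME=/usr works here because yum places the java binaries under /usr/bin):

java -version          # should report OpenJDK 1.7.0
echo $JAVA_HOME        # should print /usr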

  • Set up key-based (passwordless) SSH between the hosts
[root@master ~]# ssh-keygen -t rsa

[root@master ~]# ssh-copy-id  -i ~/.ssh/id_rsa.pub root@192.168.214.148
[root@master ~]# ssh-copy-id  -i ~/.ssh/id_rsa.pub root@192.168.214.152
[root@master ~]# ssh-copy-id  -i ~/.ssh/id_rsa.pub root@192.168.214.173
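
To confirm key-based login works before going on, each node should answer without a password prompt; a quick check using the IPs from the host plan:

[root@master ~]# for ip in 148 152 173; do ssh root@192.168.214.${ip} hostname; done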

  • Configure master to reach the node hosts by hostname
[root@master ~]# vim /etc/hosts

192.168.214.143 master
192.168.214.148 node-1
192.168.214.152 node-2
192.168.214.173 node-3

  • Copy the hosts file on master to each of the other nodes
[root@master ~]# scp /etc/hosts node-1:/etc/hosts
[root@master ~]# scp /etc/hosts node-2:/etc/hosts
[root@master ~]# scp /etc/hosts node-3:/etc/hosts
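
A quick check that the names now resolve from master (and, via the copied file, from every node):

[root@master ~]# for i in 1 2 3; do ping -c 1 node-${i}; done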

Configure master

  • Create a hadoop user on every node (useradd hadoop) and set its password, as sketched below
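
A minimal sketch for this step, run from master over the root SSH trust set up earlier ('hadooppass' is only a placeholder password; passwd --stdin is CentOS/RHEL-specific):

[root@master ~]# useradd hadoop && echo 'hadooppass' | passwd --stdin hadoop
[root@master ~]# for i in 1 2 3; do ssh root@node-${i} "useradd hadoop && echo 'hadooppass' | passwd --stdin hadoop"; done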

  • The hadoop user on master must be able to reach the nodes with key-based SSH. Switch to the hadoop user on master, generate a key pair, and copy the public key to the hadoop user's home directory on each node

[hadoop@master ~]$ ssh-keygen -t rsa -P ''

[hadoop@master ~]$ for i in 1 2 3; do ssh-copy-id -i .ssh/id_rsa.pub hadoop@node-${i};done
  • Create the data directories and set their ownership
[root@master ~]# mkdir -p /bdapps /data/hadoop/hdfs/{nn,snn,dn}
[root@master ~]# chown -R hadoop:hadoop /data/hadoop/hdfs/

  • Unpack the Hadoop distribution tarball into the installation directory
[root@master ~]# tar zxf hadoop-2.6.2.tar.gz -C /bdapps/
  • Symlink the unpacked directory to a stable name, create a logs directory, and change the owner and group of the installation to hadoop
[root@master bdapps]# ln -s hadoop-2.6.2/ hadoop
[root@master bdapps]# cd hadoop
[root@master hadoop]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[root@master hadoop]# mkdir logs
[root@master hadoop]# chown -R hadoop:hadoop ./*
  • Configure core-site.xml (the configuration files live under /bdapps/hadoop/etc/hadoop/)
<configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://master:8020</value>
          <final>true</final>
        </property>
</configuration>
  • Configure yarn-site.xml
<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
  </property>

  <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
  </property>

  <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
  </property>

  <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
  </property>

  <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
  </property>

  <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
  </property>

  <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>

</configuration>

  • Configure hdfs-site.xml
<configuration>
        <property>
          <name>dfs.replication</name>
          <value>2</value>
        </property>

        <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:///data/hadoop/hdfs/nn</value>
        </property>

        <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:///data/hadoop/hdfs/dn</value>
        </property>


        <property>
          <name>fs.checkpoint.dir</name>
          <value>file:///data/hadoop/hdfs/snn</value>
        </property>

        <property>
          <name>fs.checkpoint.edits.dir</name>
          <value>file:///data/hadoop/hdfs/snn</value>
        </property>


</configuration>

  • Configure mapred-site.xml (copy it from mapred-site.xml.template if the file does not exist yet)
<configuration>
  <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
  </property>
</configuration>

Configure slaves

[root@master hadoop]# vim etc/hadoop/slaves
# hostnames are preferred here
node-1
node-2
node-3

Configure the datanode nodes

The node hosts only need Hadoop installed and the data directories prepared (same layout as on master).

  • Create the Hadoop installation and data directories on each node
[root@node-1 ~]# mkdir -p /bdapps /data/hadoop/hdfs/{nn,snn,dn}
[root@node-1 ~]# chown -R hadoop:hadoop /data/hadoop/hdfs/

...same on node-2 and node-3...
  • Unpack the Hadoop tarball into /bdapps on each node
[root@node-1 ~]# tar zxf hadoop-2.6.2.tar.gz  -C /bdapps/
[root@node-1 ~]# cd /bdapps/ 
[root@node-1 bdapps]# ln -s hadoop-2.6.2/ hadoop
[root@node-1 bdapps]# cd hadoop
[root@node-1 hadoop]# mkdir logs
[root@node-1 hadoop]# chmod g+w logs
[root@node-1 hadoop]# chown -R hadoop:hadoop ./*

# run the steps above on every node
  • Copy the configuration files from master to each node. It is enough to copy just the files modified above, though the whole directory can be copied as well; the owner and group remain hadoop (the cluster uses only the single hadoop user)
[root@master ~]# su - hadoop

[hadoop@master ~]$ scp /bdapps/hadoop/etc/hadoop/* node-1:/bdapps/hadoop/etc/hadoop/
[hadoop@master ~]$ scp /bdapps/hadoop/etc/hadoop/* node-2:/bdapps/hadoop/etc/hadoop/
[hadoop@master ~]$ scp /bdapps/hadoop/etc/hadoop/* node-3:/bdapps/hadoop/etc/hadoop/
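
The three copies can equally be written as one loop:

[hadoop@master ~]$ for i in 1 2 3; do scp /bdapps/hadoop/etc/hadoop/* node-${i}:/bdapps/hadoop/etc/hadoop/; done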

Configure the Hadoop runtime environment variables on all nodes

  • Configure this on master and on every node
[root@master ~]# vim /etc/profile.d/hadoop.sh


export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}

# re-read the profile
[root@master ~]# source /etc/profile.d/hadoop.sh 

Copy the file above to the node hosts

[root@master ~]# scp /etc/profile.d/hadoop.sh node-1:/etc/profile.d/

[root@master ~]# scp /etc/profile.d/hadoop.sh node-2:/etc/profile.d/

[root@master ~]# scp /etc/profile.d/hadoop.sh node-3:/etc/profile.d/

# re-read it on each node
 source /etc/profile.d/hadoop.sh 
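
A quick check that the variables are in effect (run on any node after sourcing):

hadoop version        # should report Hadoop 2.6.2
which hdfs            # should resolve to /bdapps/hadoop/bin/hdfs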

Format the filesystem

  • On the master node

Switch to the hadoop user

[hadoop@master ~]$ hdfs namenode -format

If this prints: bash: hdfs: command not found

The profile script has not been sourced in the hadoop user's session yet; either source /etc/profile.d/hadoop.sh as hadoop, or add the variables globally:

vim /etc/profile

export HADOOP_HOME=/bdapps/hadoop   # the local Hadoop installation path
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

source /etc/profile

Re-run the format

[hadoop@master ~]$ hdfs namenode -format


18/10/14 23:10:40 INFO namenode.FSImage: Allocated new BlockPoolId: BP-963308474-192.168.214.143-1539529840197
18/10/14 23:10:40 INFO common.Storage: Storage directory /data/hadoop/hdfs/nn has been successfully formatted.
18/10/14 23:10:40 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/10/14 23:10:40 INFO util.ExitUtil: Exiting with status 0
18/10/14 23:10:40 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.214.143
************************************************************/
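
A successful format writes the initial metadata under the name directory; a quick sanity check:

[hadoop@master ~]$ ls /data/hadoop/hdfs/nn/current/
# expect fsimage_* files, seen_txid and VERSION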

  • Start the cluster with the startup scripts
[hadoop@master ~]$ start-dfs.sh 
Starting namenodes on [master]
The authenticity of host 'master (192.168.214.143)' can't be established.
ECDSA key fingerprint is SHA256:Pv8ulDd/yyzksPi1EoMc62arN7dUwwd8wmD60EKmqoo.
ECDSA key fingerprint is MD5:66:61:29:7d:71:75:d2:af:4d:5b:38:49:df:e1:c4:7d.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,192.168.214.143' (ECDSA) to the list of known hosts.
hadoop@master's password: 

master: starting namenode, logging to /bdapps/hadoop/logs/hadoop-hadoop-namenode-master.out
node-2: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-node-2.out
node-3: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-node-3.out
node-1: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-node-1.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:Pv8ulDd/yyzksPi1EoMc62arN7dUwwd8wmD60EKmqoo.
ECDSA key fingerprint is MD5:66:61:29:7d:71:75:d2:af:4d:5b:38:49:df:e1:c4:7d.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
hadoop@0.0.0.0's password: 
0.0.0.0: starting secondarynamenode, logging to /bdapps/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
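
The password prompts above appear because the hadoop user's public key was only copied to the node-* hosts, while start-dfs.sh also SSHes back into master itself (as master and as 0.0.0.0) for the namenode and secondary namenode. Copying the key to the local account once removes the prompts on later runs:

[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master
# the same authorized_keys also covers the 0.0.0.0 login, since it is the same account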
  • Log in as the hadoop user on a node and check the running Java processes with jps
[hadoop@node-2 ~]$ jps
1804 Jps
1748 DataNode
# the datanode on this host is up
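
The master side can be checked the same way; at this point jps there should list NameNode and SecondaryNameNode:

[hadoop@master ~]$ jps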
  • Start the YARN cluster
[hadoop@master ~]$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-resourcemanager-master.out
node-2: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-node-2.out
node-1: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-node-1.out
node-3: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-node-3.out
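
Once YARN is up, the registered NodeManagers can be listed from master; all three nodes should show as RUNNING:

[hadoop@master ~]$ yarn node -list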

  • Upload a file as a test (the target directory has to exist first)
[hadoop@master ~]$ hdfs dfs -mkdir /test
[hadoop@master ~]$ hdfs dfs -put /etc/rc.d/init.d/functions /test/
[hadoop@master ~]$ hdfs dfs -ls /test
Found 1 items
-rw-r--r--   2 hadoop supergroup      17500 2018-10-14 23:39 /test/functions

  • Run a test job that counts the words in /test/functions (the input path is given twice in this run, which is why the logs below report two input paths)
[hadoop@master ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /test/functions /test/functions /test/wc

18/10/14 23:53:26 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.214.143:8032
18/10/14 23:53:28 INFO input.FileInputFormat: Total input paths to process : 2
18/10/14 23:53:28 INFO mapreduce.JobSubmitter: number of splits:2
18/10/14 23:53:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1539531528337_0001
18/10/14 23:53:30 INFO impl.YarnClientImpl: Submitted application application_1539531528337_0001
18/10/14 23:53:30 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1539531528337_0001/
18/10/14 23:53:30 INFO mapreduce.Job: Running job: job_1539531528337_0001

# once "Running job" appears, the job can be tracked in the web UI at http://master:8088

[Screenshot: the running job in the YARN web UI]
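
When the job completes, the word counts can be read back from HDFS (the reducer writes part-r-00000 by default):

[hadoop@master ~]$ hdfs dfs -ls /test/wc
[hadoop@master ~]$ hdfs dfs -cat /test/wc/part-r-00000 | head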

  • Check the node status

[Screenshot: node status in the web UI]

  • Stop the cluster
[hadoop@master ~]$ stop-yarn.sh 
stopping yarn daemons
stopping resourcemanager
node-2: stopping nodemanager
node-1: stopping nodemanager
node-3: stopping nodemanager
node-2: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
node-3: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
node-1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
[hadoop@master ~]$ stop-dfs.sh 
Stopping namenodes on [master]
hadoop@master's password: 
master: stopping namenode
node-2: stopping datanode
node-1: stopping datanode
node-3: stopping datanode
Stopping secondary namenodes [0.0.0.0]
