Environment preparation:
The setup uses one primary NameNode host, one secondary NameNode host, and two DataNode hosts. OS: Red Hat Enterprise Linux Server release 6.6 (Santiago)
Configure /etc/hosts:
192.168.83.11 hd1
192.168.83.22 hd2
192.168.83.33 hd3
192.168.83.44 hd4
Change the hostname:
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hd1
Create the hadoop runtime user:
groupadd -g 10010 hadoop
useradd -u 1001 -g 10010 -d /home/hadoop hadoop
SSH passwordless (key-based) login setup:
This mainly involves setting up passwordless SSH from hd1 to every node. Generate a key pair and an authorized_keys file on each node, copy them all to hd1 and merge them there, then distribute the merged authorized_keys from hd1 back to each node.
hd1:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
hd2:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hd1:~/.ssh/authorized_keys2
hd3:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hd1:~/.ssh/authorized_keys3
hd4:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hd1:~/.ssh/authorized_keys4
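The merge-and-distribute step on hd1 described above is not shown; a minimal sketch, assuming the per-node copies arrived as authorized_keys2/3/4 exactly as in the scp commands above:
hd1:
cat ~/.ssh/authorized_keys2 ~/.ssh/authorized_keys3 ~/.ssh/authorized_keys4 >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hd2:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hd3:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hd4:~/.ssh/authorized_keys
Afterwards, a command such as ssh hd2 date run from hd1 should complete without prompting for a password.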
Configure the hadoop user's environment variables:
vi ~/.bash_profile
export JAVA_HOME=/usr/java/jdk1.8.0_11
export JRE_HOME=/usr/java/jdk1.8.0_11/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export HADOOP_INSTALL=/usr/hadoop/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
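After editing, reload the profile so the variables take effect in the current shell:
source ~/.bash_profile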
Install the JDK
[hadoop@hd1 ~]$ java -version
java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
Configure Hadoop's basic environment settings, such as the JDK location and the paths for configuration files and logs. These live in /usr/hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh; modify the following line:
export JAVA_HOME=/usr/java/jdk1.8.0_11
Install Hadoop
Download hadoop-2.7.1.tar.gz from the official site http://hadoop.apache.org and extract it on the master only; my extraction path is /usr/hadoop/hadoop-2.7.1/.
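A minimal sketch of the extraction step, assuming the tarball was downloaded to the current directory and /usr/hadoop already exists:
tar -zxvf hadoop-2.7.1.tar.gz -C /usr/hadoop/
chown -R hadoop:hadoop /usr/hadoop/hadoop-2.7.1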
Configure Hadoop:
Hadoop runs in three modes: standalone (local), pseudo-distributed, and fully distributed (cluster); the following sections configure each in turn. You can pass the --config option at startup to point at a specific configuration directory, which lets several modes coexist side by side. We usually also set the HADOOP_INSTALL environment variable to the Hadoop root directory. Standalone mode: runs on a single node; Hadoop's default configuration is standalone, so it can be started directly. Pseudo-distributed mode: simulates a cluster on a single host. Hadoop is divided into core, HDFS, and MapReduce parts, and the configuration is split accordingly across core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
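The pseudo-distributed start commands later in this walkthrough point at an etc/hadoop_pseudo directory. One way to create such a per-mode configuration directory (a sketch, mirroring the hadoop_cluster step in the cluster section below) is to copy the default directory and then edit the copies:
cp -p -r $HADOOP_INSTALL/etc/hadoop $HADOOP_INSTALL/etc/hadoop_pseudo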
Next, configure pseudo-distributed mode:
Edit the Hadoop core configuration file core-site.xml; it sets the address and port of the HDFS master (i.e. the NameNode).
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>
Configure the hdfs-site.xml file
Edit the HDFS settings. dfs.replication controls the number of block replicas and defaults to 3; for pseudo-distributed mode with a single DataNode, set it to 1.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Configure the mapred-site.xml file
Edit the MapReduce settings. In Hadoop 2.x this no longer points at a JobTracker address and port; instead it tells MapReduce to run on the YARN framework.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Configure the yarn-site.xml file
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Format the HDFS filesystem:
hadoop namenode -format
STARTUP_MSG:   java = 1.8.0_11
************************************************************/
18/07/23 17:04:32 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/07/23 17:04:32 INFO namenode.NameNode: createNameNode [-format]
18/07/23 17:04:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-7ebdb3d2-19c1-4c1a-a64f-c3c149d1c07f
18/07/23 17:04:34 INFO namenode.FSNamesystem: No KeyProvider found.
18/07/23 17:04:34 INFO namenode.FSNamesystem: fsLock is fair:true
18/07/23 17:04:34 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
18/07/23 17:04:34 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
18/07/23 17:04:34 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
18/07/23 17:04:34 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Jul 23 17:04:34
18/07/23 17:04:34 INFO util.GSet: Computing capacity for map BlocksMap
18/07/23 17:04:34 INFO util.GSet: VM type       = 64-bit
18/07/23 17:04:34 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
18/07/23 17:04:34 INFO util.GSet: capacity      = 2^21 = 2097152 entries
18/07/23 17:04:34 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
18/07/23 17:04:34 INFO blockmanagement.BlockManager: defaultReplication         = 3
18/07/23 17:04:34 INFO blockmanagement.BlockManager: maxReplication             = 512
18/07/23 17:04:34 INFO blockmanagement.BlockManager: minReplication             = 1
18/07/23 17:04:34 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
18/07/23 17:04:34 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
18/07/23 17:04:34 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
18/07/23 17:04:34 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
18/07/23 17:04:34 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
18/07/23 17:04:34 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
18/07/23 17:04:34 INFO namenode.FSNamesystem: supergroup          = supergroup
18/07/23 17:04:34 INFO namenode.FSNamesystem: isPermissionEnabled = true
18/07/23 17:04:34 INFO namenode.FSNamesystem: HA Enabled: false
18/07/23 17:04:34 INFO namenode.FSNamesystem: Append Enabled: true
18/07/23 17:04:35 INFO util.GSet: Computing capacity for map INodeMap
18/07/23 17:04:35 INFO util.GSet: VM type       = 64-bit
18/07/23 17:04:35 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
18/07/23 17:04:35 INFO util.GSet: capacity      = 2^20 = 1048576 entries
18/07/23 17:04:35 INFO namenode.FSDirectory: ACLs enabled? false
18/07/23 17:04:35 INFO namenode.FSDirectory: XAttrs enabled? true
18/07/23 17:04:35 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
18/07/23 17:04:35 INFO namenode.NameNode: Caching file names occuring more than 10 times
18/07/23 17:04:35 INFO util.GSet: Computing capacity for map cachedBlocks
18/07/23 17:04:35 INFO util.GSet: VM type       = 64-bit
18/07/23 17:04:35 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
18/07/23 17:04:35 INFO util.GSet: capacity      = 2^18 = 262144 entries
18/07/23 17:04:35 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
18/07/23 17:04:35 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
18/07/23 17:04:35 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
18/07/23 17:04:35 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/07/23 17:04:35 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/07/23 17:04:35 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/07/23 17:04:35 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/07/23 17:04:35 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/07/23 17:04:35 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/07/23 17:04:35 INFO util.GSet: VM type       = 64-bit
18/07/23 17:04:35 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
18/07/23 17:04:35 INFO util.GSet: capacity      = 2^15 = 32768 entries
Re-format filesystem in Storage Directory /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y
18/07/23 17:07:57 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1239596151-192.168.83.11-1532336877181
18/07/23 17:07:57 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
18/07/23 17:07:57 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/07/23 17:07:57 INFO util.ExitUtil: Exiting with status 0
18/07/23 17:07:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hd1/192.168.83.11
************************************************************/
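Note that the format above ran against the default configuration directory (hence the /tmp/hadoop-hadoop/dfs/name storage path). If you keep the pseudo-distributed settings in the separate hadoop_pseudo directory used by the start commands below, the format command can be pointed at it as well; for the hadoop launcher script the --config option goes right after the command name, e.g.:
hadoop --config /usr/hadoop/hadoop-2.7.1/etc/hadoop_pseudo namenode -format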
Start HDFS:
[hadoop@hd1 hadoop_pseudo]$ start-dfs.sh --config /usr/hadoop/hadoop-2.7.1/etc/hadoop_pseudo/
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-namenode-hd1.out
localhost: starting datanode, logging to /usr/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-hd1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-secondarynamenode-hd1.out
Start YARN:
[hadoop@hd1 hadoop_pseudo]$ start-yarn.sh --config /usr/hadoop/hadoop-2.7.1/etc/hadoop_pseudo/
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop/hadoop-2.7.1/logs/yarn-hadoop-resourcemanager-hd1.out
localhost: starting nodemanager, logging to /usr/hadoop/hadoop-2.7.1/logs/yarn-hadoop-nodemanager-hd1.out
Start the MapReduce job history server: mr-jobhistory-daemon.sh start historyserver
Alternatively, start all of the pseudo-distributed Hadoop components with a single command:
[hadoop@hd1 ~]$ start-all.sh --config /usr/hadoop/hadoop-2.7.1/etc/hadoop_pseudo
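After startup, running jps on this host should list the HDFS and YARN daemons started above (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager), plus JobHistoryServer if the history server was started separately; the PIDs will of course differ from run to run.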
Of course, you can also use HADOOP_CONF_DIR to specify the Hadoop configuration directory, as follows:
export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop_pseudo
[hadoop@hd1 ~]$ hadoop fs -ls /
18/07/24 07:29:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 24 items
-rw-r--r--   1 root root     0 2018-07-23 16:37 /.autofsck
dr-xr-xr-x   - root root  4096 2018-07-19 20:35 /bin
dr-xr-xr-x   - root root  1024 2018-07-19 19:06 /boot
drwxr-xr-x   - root root  4096 2014-08-07 13:29 /cgroup
drwxr-xr-x   - root root  3800 2018-07-23 16:37 /dev
drwxr-xr-x   - root root 12288 2018-07-23 16:37 /etc
drwxr-xr-x   - root root  4096 2018-07-20 16:37 /home
dr-xr-xr-x   - root root  4096 2018-07-19 19:05 /lib
dr-xr-xr-x   - root root 12288 2018-07-19 20:35 /lib64
drwx------   - root root 16384 2018-07-19 19:00 /lost+found
drwxr-xr-x   - root root  4096 2011-06-28 22:13 /media
drwxr-xr-x   - root root     0 2018-07-23 16:37 /misc
drwxr-xr-x   - root root  4096 2011-06-28 22:13 /mnt
drwxr-xr-x   - root root     0 2018-07-23 16:37 /net
drwxr-xr-x   - root root  4096 2018-07-19 19:05 /opt
dr-xr-xr-x   - root root     0 2018-07-23 16:37 /proc
dr-xr-x---   - root root  4096 2018-07-19 22:53 /root
dr-xr-xr-x   - root root 12288 2018-07-19 20:35 /sbin
drwxr-xr-x   - root root     0 2018-07-23 16:37 /selinux
drwxr-xr-x   - root root  4096 2011-06-28 22:13 /srv
drwxr-xr-x   - root root     0 2018-07-23 16:37 /sys
drwxrwxrwt   - root root  4096 2018-07-24 07:27 /tmp
drwxr-xr-x   - root root  4096 2018-07-20 18:29 /usr
drwxr-xr-x   - root root  4096 2018-07-19 19:05 /var
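As a quick sanity check that HDFS itself is serving requests, you can create a directory and copy a small file into it (a sketch; the paths are only examples):
hadoop fs -mkdir -p /user/hadoop
hadoop fs -put /etc/hosts /user/hadoop/
hadoop fs -ls /user/hadoop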
Postscript: if you install Hadoop 2.7.1 on Windows, a few additional files are needed (link: https://pan.baidu.com/s/1w1-cmTDTLWC_sFNWpxrOQA password: ozzw), otherwise startup fails. Simply copy those files into the bin directory under the Hadoop installation directory.
Cluster (fully distributed) mode deployment:
cp -p -r $HADOOP_INSTALL/etc/hadoop $HADOOP_INSTALL/etc/hadoop_cluster
Make hadoop_cluster the default configuration directory by replacing etc/hadoop with a symlink (move the original directory aside first, otherwise ln -s would create the link inside it):
mv /usr/hadoop/hadoop-2.7.1/etc/hadoop /usr/hadoop/hadoop-2.7.1/etc/hadoop.orig
ln -s /usr/hadoop/hadoop-2.7.1/etc/hadoop_cluster /usr/hadoop/hadoop-2.7.1/etc/hadoop
Five files need to be modified: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hd1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/hadoop/hadoop-2.7.1/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hd2:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/hadoop/hadoop-2.7.1/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/hadoop/hadoop-2.7.1/tmp/dfs/data</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hd1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hd1:19888</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hd1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
slaves
vi slaves
hd3
hd4
Copy the hadoop_cluster directory to hd2, hd3, and hd4:
[hadoop etc]$ scp -p -r hadoop_cluster/ hadoop@hd2:/usr/hadoop/hadoop-2.7.1/etc/
[hadoop etc]$ scp -p -r hadoop_cluster/ hadoop@hd3:/usr/hadoop/hadoop-2.7.1/etc/
[hadoop etc]$ scp -p -r hadoop_cluster/ hadoop@hd4:/usr/hadoop/hadoop-2.7.1/etc/
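Since the Hadoop distribution was only extracted on hd1 (see the install step above), the installation directory itself also has to exist on hd2, hd3, and hd4 before their daemons can start, and the JDK and hadoop user environment must be set up on each node as well. A sketch of copying the installation:
scp -p -r /usr/hadoop/hadoop-2.7.1 hadoop@hd2:/usr/hadoop/
scp -p -r /usr/hadoop/hadoop-2.7.1 hadoop@hd3:/usr/hadoop/
scp -p -r /usr/hadoop/hadoop-2.7.1 hadoop@hd4:/usr/hadoop/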
Format:
hadoop --config /usr/hadoop/hadoop-2.7.1/etc/hadoop_cluster/ namenode -format
Start:
[hadoop@hd1 ~]$ start-dfs.sh
[hadoop@hd1 ~]$ start-yarn.sh
[hadoop@hd1 ~]$ mr-jobhistory-daemon.sh start historyserver
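Once the daemons are up, you can also check the web UIs (Hadoop 2.7.x default ports): the NameNode at http://hd1:50070, the ResourceManager at http://hd1:8088, and the job history server at http://hd1:19888.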
hd1 runs the NameNode instance; hd2 runs the SecondaryNameNode and a DataNode instance.
hd3 and hd4 run DataNode instances.
| Host | NameNode | SecondaryNameNode | DataNode |
| ---- | -------- | ----------------- | -------- |
| hd1  | Y        |                   |          |
| hd2  |          | Y                 | Y        |
| hd3  |          |                   | Y        |
| hd4  |          |                   | Y        |
hd1:
[hadoop@hd1 sbin]$ jps
7764 Jps
7017 ResourceManager
6734 NameNode
hd2:
[root@hd2 ~]# jps
3222 NodeManager
3142 SecondaryNameNode
3962 Jps
3035 DataNode
hd3:
[root@hd3 ~]# jps
3600 DataNode
3714 NodeManager
4086 Jps
hd4:
[root@hd4 ~]# jps
3024 NodeManager
3373 Jps
2909 DataNode
YARN is the resource management framework and consists of the NodeManager and ResourceManager processes: the NodeManagers run on the DataNode hosts, while the ResourceManager runs on the NameNode host.
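To verify the cluster end to end, you can run one of the example jobs bundled with the distribution, e.g. the pi estimator (a sketch; the jar lives at the standard path inside the 2.7.1 tarball):
[hadoop@hd1 ~]$ hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 10
The job should appear in the ResourceManager web UI while it runs and print an estimate of Pi when it finishes.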