1. Prerequisites (both Master and Slaves)
- Install openssh-server
$ sudo apt-get install openssh-server
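check that the SSH daemon is running (on Ubuntu the service is named ssh)
$ sudo service ssh status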
- Install Java (Oracle JDK 7)
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
$ cd /usr/lib/jvm
$ sudo ln -s java-7-oracle jdk
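check that the symlink resolves to a working Java 7 (the version string should start with 1.7)
$ /usr/lib/jvm/jdk/bin/java -version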
- Add hadoop group and user
$ sudo addgroup hadoop
$ sudo adduser hduser
$ sudo usermod -a -G hadoop hduser
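check the group membership
$ groups hduser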
- Configure SSH (run as hduser on the master)
$ ssh-keygen -t rsa -P ""
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@[slave_ip]
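the configuration below refers to nodes by hostname, so add the cluster hosts to /etc/hosts on every node - a minimal sketch with placeholder addresses (replace the IPs and the slave1 name with your own)
$ sudo gedit /etc/hosts
192.168.0.1 master
192.168.0.2 slave1
then confirm passwordless login works
$ ssh hduser@[slave_ip]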
- Disable IPv6 - Hadoop is not supported on IPv6 and can mistakenly bind to IPv6 addresses instead of IPv4
$ sudo gedit /etc/sysctl.conf
add the following lines to the end of the file
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
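apply the settings without rebooting and verify (a value of 1 means IPv6 is disabled)
$ sudo sysctl -p
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6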
- Download and extract Hadoop
download Hadoop (http://apache.tt.co.kr/hadoop/common/)
$ cd ~/Downloads
$ sudo tar xvzf hadoop-2.2.0.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.2.0 hadoop
$ sudo chown -R hduser:hadoop hadoop
2. Hadoop Configuration (both Master and Slaves)
- Configure .bashrc
$ gedit ~/.bashrc
add the following lines to the end of the file
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
#end of paste
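reload the shell so the new variables take effect in the current session
$ source ~/.bashrc
$ echo $HADOOP_INSTALL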
- Modify hadoop-env.sh
$ gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
modify the JAVA_HOME line
export JAVA_HOME=/usr/lib/jvm/jdk/
save the file, then verify the installation
$ hadoop version
- Create the tmp folder
$ mkdir -p $HADOOP_INSTALL/tmp
- core-site.xml
$ gedit /usr/local/hadoop/etc/hadoop/core-site.xml
add the following lines between the <configuration> tags (fs.defaultFS replaces the deprecated fs.default.name)
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/tmp</value>
</property>
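to confirm the value is picked up, query the loaded configuration (hdfs getconf is available in Hadoop 2.x; this should print hdfs://master:9000)
$ hdfs getconf -confKey fs.defaultFS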
- hdfs-site.xml
$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
$ gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
add the following lines between the <configuration> tags (dfs.replication sets how many copies HDFS keeps of each block; dfs.permissions.enabled replaces the deprecated dfs.permissions name)
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
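only the master reads the namenode directory and only slaves write the datanode directory, though creating both everywhere is harmless - confirm hduser owns them
$ ls -ld ~/mydata/hdfs/namenode ~/mydata/hdfs/datanode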
- mapred-site.xml
Hadoop 2.2 ships only a template for this file, so copy it first
$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
$ gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
add the following lines between the <configuration> tags
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
- yarn-site.xml
$ gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
add the following lines between the <configuration> tags (in Hadoop 2.2 the aux-service is named mapreduce_shuffle, and the class key must use that same name)
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
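start-dfs.sh and start-yarn.sh launch the slave daemons over SSH, so list every slave hostname in the master's slaves file, one per line - a minimal sketch assuming a single slave named slave1 (use your actual hostnames)
$ gedit /usr/local/hadoop/etc/hadoop/slaves
slave1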
- format the namenode (master only)
$ hdfs namenode -format
or the older, deprecated form
$ hadoop namenode -format
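on success the log should end with a line similar to "has been successfully formatted"; do not re-format a cluster that already holds data, as this wipes the HDFS metadata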
3. Starting and Stopping Hadoop (Master only)
- start-dfs.sh
- starts the NameNode, SecondaryNameNode, and DataNodes
- $ start-dfs.sh
- $ jps on the master should show
- Jps
- SecondaryNameNode
- NameNode
- $ jps on a slave should show
- Jps
- DataNode
- start-yarn.sh
- starts the ResourceManager and NodeManagers
- $ start-yarn.sh
- $ jps on the master should show (in addition to the HDFS daemons)
- Jps
- ResourceManager
- $ jps on a slave should show
- Jps
- NodeManager
- stop-dfs.sh
- stop-yarn.sh
- start-all.sh (deprecated)
- stop-all.sh (deprecated)
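once both scripts are running, the web UIs give a quick health check (default Hadoop 2.2 ports)
- NameNode: http://master:50070
- ResourceManager: http://master:8088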
4. Running a Hadoop Job (Master only)
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter out
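when the job finishes, list the generated files in HDFS
$ hadoop fs -ls out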