Install hadoop on CentOS

October 29, 2014

1. In VMware, set the network connection to Bridged mode. This way the virtual machine is on the same network as your own computer: it can reach the Internet directly, and the virtual machines can reach each other.

In NAT mode, the next hop of the virtual machine is your own computer, so the virtual machine reaches the Internet through it. In this mode the virtual machine is not visible from outside the host.

2. Configure the NIC (network interface card) by setting a DNS server and a static IP.

#su            //enter root mode for convenience.

#vi /etc/sysconfig/network-scripts/ifcfg-eth0
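
A minimal sketch of what "ifcfg-eth0" might contain for the master node, assuming the addressing used later in this guide (192.168.1.160 with gateway 192.168.1.254); the DNS server shown is only an example, use your own:

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.160
NETMASK=255.255.255.0
GATEWAY=192.168.1.254
DNS1=8.8.8.8        # example public DNS; replace with your network's DNS server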

3. Configure the hostname

#vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=centmaster
GATEWAY=192.168.1.254

4. Restart NIC

#service network restart

#ping google.com           //should succeed, confirming the network is working

 

5. Configure /etc/hosts:

192.168.1.160      centmaster
192.168.1.161      centslave1
192.168.1.162      centslave2
192.168.1.163      centslave3

6. Check if ssh is installed. If not, install it.

#rpm -qa | grep ssh          //check whether it is installed
If it is not installed, run "yum install openssh-server".
#chkconfig --list sshd       //check whether sshd is started when the virtual machine boots
#chkconfig --level 2345 sshd on   //if not, set it to start by default
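
If sshd is not running yet, it can be started immediately (standard CentOS 6 service command):

#service sshd start        //start sshd now without rebooting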

7. Configure the SSH key
#su                    //enter root mode
#ssh-keygen -t rsa     //press Enter at each prompt; here it is run from "/home/hadoop"

After this command you will find "id_rsa" and "id_rsa.pub" in "/root/.ssh".
#cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys

Run "ssh localhost" to check whether you can log in to your own virtual machine without entering a password. If you see the following prompt, you still need to type something before the connection is made:

The authenticity of host ‘localhost (127.0.0.1)’ can’t be established.

ECDSA key fingerprint is 40:2b:3c:88:b0:34:f9:cd:6d:15:b8:7b:c4:f7:02:f9.

Are you sure you want to continue connecting (yes/no)?

Answer "yes" once; the host key is then remembered in "known_hosts". If you are still asked for a password afterwards, fix the permissions of the .ssh directory:

#chmod 700 /home/hadoop/.ssh
#chmod 644 /home/hadoop/.ssh/authorized_keys
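
For the multi-node setup implied by the hosts file above, the master will also need passwordless login to each slave. One way to do this, assuming the slaves are reachable and the same user exists on them, is to copy the public key with ssh-copy-id (run as the user that generated the key):

#ssh-copy-id centslave1        //repeat for centslave2 and centslave3, entering each password once
#ssh centslave1                //should now log in without a password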

8. Install Java. Download jdk-8u20-linux-x64.rpm to "/home/hadoop/Downloads".

#rpm -ihv jdk-8u20-linux-x64.rpm    //install
#java -version           //if version shows, it means the installation is successful.
#vi /etc/profile          //add “export JAVA_HOME=/usr/java/jdk1.8.0_20”
#source /etc/profile         //refresh the modified configuration file
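
The lines appended to /etc/profile might look like the following; the PATH line is an extra convenience not mentioned above, so that this JDK's "java" is found first:

export JAVA_HOME=/usr/java/jdk1.8.0_20
export PATH=$JAVA_HOME/bin:$PATH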

9. Install hadoop
#wget http://www.apache.org/dist/hadoop/core/hadoop-2.3.0/hadoop-2.3.0.tar.gz

Download hadoop into "/home/hadoop/Downloads".
#tar -xvf hadoop-2.3.0.tar.gz -C /home/hadoop    //extract it into "/home/hadoop/hadoop-2.3.0"

10. Set JAVA_HOME in "hadoop-2.3.0/etc/hadoop/hadoop-env.sh":
export JAVA_HOME=/usr/java/jdk1.8.0_20     //add this line to the file (without a leading "#", which would comment it out)

11. Use vi to edit "etc/hadoop/core-site.xml", and add the following (the hostname must match the one set in step 3):
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://centmaster:9000</value>
</property>
</configuration>

12. Use vi to edit “~/.bashrc”, and configure “HADOOP_HOME”.

#cd ~

"~" is the user's home directory; it contains ".bashrc", which is sourced automatically every time a new shell starts.

Add the following environment variables to ".bashrc" (the path must match where hadoop was extracted in step 9):

export HADOOP_HOME=/home/hadoop/hadoop-2.3.0
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
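
So that the hadoop commands in the next steps can be run from any directory, it can also help to append this line (an addition for convenience, not part of the original list), then run "source ~/.bashrc":

export PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin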

13. Format namenode

#$HADOOP_HOME/bin/hadoop namenode -format

14. Launch hadoop

#$HADOOP_HOME/sbin/start-all.sh

A "permission denied" error may occur here if your user does not have sufficient rights. Use "su" to switch to root and run the command again.

#jps           //check the java processes

29888 NameNode
30146 SecondaryNameNode
28725 ResourceManager
29981 DataNode
30382 NodeManager
30654 Jps

15. Check that iptables does not block ports 9000, 50070, 50020, 50090, and 50075.
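
A quick way to inspect the firewall and, on a throwaway lab VM, open one of these ports or disable filtering entirely (adjust to your own security requirements):

#iptables -L -n                                      //list the current rules
#iptables -I INPUT -p tcp --dport 50070 -j ACCEPT    //allow the namenode web UI port, for example
#service iptables stop                               //or stop the firewall entirely (lab environments only)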

16. Open "http://192.168.1.160:50070"; it shows the namenode's home page. At this point, hadoop is installed successfully.

If the error "Node 192.168.1.163:50010 is expected to serve this storage" appears, it means the datanode did not start successfully:

Cause: the datanode's VERSION file does not match the namenode's. The VERSION file is located at "/tmp/hadoop-%username%/dfs/data/current/VERSION"; open it and you can see that the IDs are different.

How to solve it:

1. Delete the "dfs.data.dir" directory (/tmp/hadoop-%username%/dfs/data), or

2. delete only "/dfs/data/current/VERSION" inside it.

By default, hadoop data is stored here:

dfs.name.dir             /tmp/hadoop-%username%/dfs/name

dfs.data.dir             /tmp/hadoop-%username%/dfs/data
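
Because /tmp may be cleaned on reboot, one way to avoid this class of problem (an extra step not in the original write-up; it reuses the older dfs.name.dir/dfs.data.dir property names shown above) is to point these directories at a persistent location in "etc/hadoop/hdfs-site.xml":

<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/dfs/data</value>
</property>
</configuration>

Reformat the namenode after changing these paths.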