Tag Archives: hadoop

Create a Hadoop work environment in MyEclipse

Hadoop requires a lot of JARs, and it is not always obvious which ones to include. Sorting this out by hand took me a very long time, and I got tired of it. Finally, I found Maven, which can build the Hadoop environment easily and quickly. Here we go: 1. Create a Maven project 2. Open the pom.xml, put the… Read More »

HA Federation on Hadoop 2.0

In Hadoop 1.0, there is only one NameNode in the whole cluster, which brings the risk of a “single point of failure”. To solve this problem, Hadoop 2 introduces an HA (high availability) mechanism. In the picture, there are 4 DataNodes and 4 NameNodes: 2 NameNodes are active and 2 are standby. Let’s look at the left group of NameNodes,… Read More »

Hadoop ecosystem

Starting with Hadoop 2.0, YARN is introduced. YARN is a resource management layer, similar in role to the JobTracker. Different computation models can be implemented on YARN, such as MapReduce, Tez, Storm, and Spark. Systems like HBase and Hive are also supported on YARN. MapReduce uses Map and Reduce functions over <Key, Value> pairs to compute. Storm: there is constant input to the computation model. Once there… Read More »
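To make the <Key, Value> idea a bit more concrete, here is a minimal sketch of the Map, shuffle, and Reduce steps as a word count in plain Python. It does not use Hadoop or YARN at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real job would run these steps distributed across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single pair."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, only for demonstration.
    print(word_count(["yarn runs mapreduce", "yarn runs spark"]))
```

Each stage only ever sees keys and values, which is why the same pattern can be split across many machines.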

MapReduce process on Yarn

1. The client submits a task to the ResourceManager (RM). 2. The ApplicationsManager in the RM assigns this task to a NodeManager, which launches a MapReduceApplicationMaster (MRApplicationMaster). The MRApplicationMaster on this NodeManager is fully responsible for this task. 3. The MRApplicationMaster calculates how much CPU and memory the task needs and reports this to the ApplicationsManager in the RM. 4. Resource Scheduler… Read More »

Install Hadoop on CentOS

1. In VMware, set the network connection to Bridged mode. This way, the virtual machine is on the same network as your own computer: it can access the internet directly, and virtual machines can reach each other. In NAT mode, the next hop of the virtual machine is your own computer. Virtual machine… Read More »

Install Hive

The JDK, MySQL, and Hadoop should be correctly installed and running before you install Hive. 1. Download Hive. You can use wget to download Hive from hive.apache.org/downloads.html 2. Uncompress it to /home/hadoop/hive-0.13.1 3. Edit hive-site.xml. In the hive-0.13.1/conf directory, create a new hive-site.xml from the template: #cp hive-default.xml.template hive-site.xml Add or change the part below in hive-site.xml, to let it adapt… Read More »