Category Archives: database

MapReduce process on YARN

1. The client submits a job to the ResourceManager (RM).
2. The ApplicationsManager in the RM assigns the job to a NodeManager, which launches a MapReduce ApplicationMaster (MRApplicationMaster). The MRApplicationMaster on this NodeManager is fully responsible for the job.
3. The MRApplicationMaster calculates how much CPU and memory the job needs and reports this to the ApplicationsManager in the RM.
4. Resource Scheduler…
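Once a job has been submitted, the applications the ResourceManager is tracking can be inspected from the command line. The following is only a minimal sketch, assuming the `yarn` CLI from a Hadoop 2.x install is on the PATH; `show_running_apps` is a hypothetical wrapper name, not part of Hadoop.

```shell
# Hypothetical wrapper around the yarn CLI (assumes Hadoop 2.x is installed).
show_running_apps() {
    if ! command -v yarn >/dev/null 2>&1; then
        echo "yarn command not found on PATH" >&2
        return 1
    fi
    # Ask the ResourceManager for applications currently in the RUNNING state.
    yarn application -list -appStates RUNNING
}
```

The listing includes each application's ID, which can then be passed to `yarn application -status <application id>` for more detail.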

Install Hadoop on CentOS

1. In VMware, set the virtual machine's network connection to bridged mode. This puts the virtual machine on the same network as your host computer, so it can reach the internet directly and virtual machines can reach each other. In NAT mode, the virtual machine's next hop is your host computer. Virtual machine…

Install Hive

JDK, MySQL and Hadoop must be correctly installed and running before you install Hive.
1. Download Hive. You can use wget to fetch it from the downloads page at hive.apache.org/downloads.html.
2. Uncompress it to /home/hadoop/hive-0.13.1.
3. Edit hive-site.xml. In the hive-0.13.1/conf directory, create a new hive-site.xml from the template:
# cp hive-default.xml.template hive-site.xml
Add or change the part below in hive-site.xml, to let it adapt…
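The prerequisite check and the template copy from the steps above could be scripted roughly as follows. This is a sketch under assumptions: `check_prereqs` and `create_hive_site` are hypothetical helper names, and the check only confirms that the commands are on the PATH, not that MySQL or Hadoop are actually running.

```shell
# Hypothetical helper: report any required command that is missing from PATH.
# Being on PATH does not prove a service is running; it only catches missing installs.
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing prerequisite: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: create conf/hive-site.xml from the shipped template.
# $1 is the Hive install directory, e.g. /home/hadoop/hive-0.13.1.
create_hive_site() {
    conf_dir="$1/conf"
    if [ -f "$conf_dir/hive-site.xml" ]; then
        # Never overwrite a file that may already contain your edits.
        echo "hive-site.xml already exists, leaving it untouched" >&2
        return 0
    fi
    if [ ! -f "$conf_dir/hive-default.xml.template" ]; then
        echo "template not found in $conf_dir" >&2
        return 1
    fi
    cp "$conf_dir/hive-default.xml.template" "$conf_dir/hive-site.xml"
}
```

With the layout described above, a typical call would be `check_prereqs java mysql hadoop && create_hive_site /home/hadoop/hive-0.13.1`. Skipping the copy when hive-site.xml already exists keeps a rerun from discarding earlier edits.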