Tag Archives: mapreduce

Solve the problem of Hive hanging for a long time on a MapReduce select

I have a table named partition_table in Hive. A plain select runs fine:
Hive> select * from partition_table;
But when I ran
Hive> select count(*) from partition_table;
the output stopped after these lines:
Starting Job = job_1414213419655_0001, Tracking URL = http://centmaster:8088/proxy/application_1414213419655_0001/
Kill Command = /home/hadoop/hadoop-2.3.0/bin/hadoop job  -kill job_1414213419655_0001
After that, Hive hung for 40 minutes, until I pressed Ctrl+C to stop it. I checked… Read More »

Format of Mapper and Reducer

The formula of MapReduce can be described as follows, together with the Job configuration calls that set each type.
map: (K1, V1) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
(K1, V1): jobConf.setInputKeyClass(K1.class); jobConf.setInputValueClass(V1.class);
list(K2, V2): job.setMapOutputKeyClass(K2.class); job.setMapOutputValueClass(V2.class);
list(K3, V3): jobConf.setOutputKeyClass(K3.class); jobConf.setOutputValueClass(V3.class);
Normally, (K2, V2) equals (K3,… Read More »
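The type flow map: (K1, V1) → list(K2, V2) and reduce: (K2, list(V2)) → list(K3, V3) can be illustrated without a cluster. The following is only a minimal in-memory sketch in plain Java and does not use the Hadoop API. The class MiniMapReduce, its Pair holder, and the run and wordCount methods are hypothetical names made up for this illustration. It assumes a word count where K1 is a position (Long), V1 is a line of text (String), and both (K2, V2) and (K3, V3) are (String, Integer).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

public class MiniMapReduce {

    /** Hypothetical (key, value) holder for this sketch; not a Hadoop class. */
    public static final class Pair<K, V> {
        public final K key;
        public final V value;
        public Pair(K key, V value) { this.key = key; this.value = value; }
    }

    /**
     * mapper:  (K1, V1)       -> list(K2, V2)
     * reducer: (K2, list(V2)) -> list(K3, V3)
     */
    public static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<Pair<K3, V3>> run(
            List<Pair<K1, V1>> input,
            BiFunction<K1, V1, List<Pair<K2, V2>>> mapper,
            BiFunction<K2, List<V2>, List<Pair<K3, V3>>> reducer) {
        // "Shuffle" step: group every V2 under its K2, with keys kept in sorted order.
        Map<K2, List<V2>> grouped = new TreeMap<>();
        for (Pair<K1, V1> record : input) {
            for (Pair<K2, V2> out : mapper.apply(record.key, record.value)) {
                grouped.computeIfAbsent(out.key, k -> new ArrayList<>()).add(out.value);
            }
        }
        // Reduce step: call the reducer once per distinct K2.
        List<Pair<K3, V3>> result = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : grouped.entrySet()) {
            result.addAll(reducer.apply(group.getKey(), group.getValue()));
        }
        return result;
    }

    /** Word count: (Long, String) -> list(String, Integer) -> list(String, Integer). */
    public static List<Pair<String, Integer>> wordCount(List<String> lines) {
        // Build (K1, V1) records: a running character position and the line text.
        List<Pair<Long, String>> input = new ArrayList<>();
        long position = 0;
        for (String current : lines) {
            input.add(new Pair<Long, String>(position, current));
            position += current.length() + 1;
        }
        // map: emit (word, 1) for every word in the line.
        BiFunction<Long, String, List<Pair<String, Integer>>> mapper = (offset, text) -> {
            List<Pair<String, Integer>> out = new ArrayList<>();
            for (String word : text.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(new Pair<String, Integer>(word, 1));
                }
            }
            return out;
        };
        // reduce: sum the ones collected for each word.
        BiFunction<String, List<Integer>, List<Pair<String, Integer>>> reducer = (word, ones) -> {
            int sum = 0;
            for (int one : ones) {
                sum += one;
            }
            return Collections.singletonList(new Pair<String, Integer>(word, sum));
        };
        return run(input, mapper, reducer);
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("a b a");
        lines.add("b c");
        for (Pair<String, Integer> p : wordCount(lines)) {
            System.out.println(p.key + "\t" + p.value);
        }
    }
}
```

In this sketch the reducer's output types are the same as the mapper's output types, which is the common case for a simple aggregation such as counting.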

Run a remote WordCount MapReduce job from Eclipse on Windows 8

I have already deployed a 4-node Hadoop cluster in VMware. I will run the WordCount application from MyEclipse, which runs on Windows 8 on my own laptop.
1. Set up a Hadoop work environment in Eclipse. (You can refer to “Build hadoop work environment in MyEclipse” in my blog.)
2. Import hadoop-mapreduce-client-common-2.3.0.jar and hadoop-mapreduce-client-jobclient-2.3.0.jar to… Read More »

MapReduce process on Yarn

1. The client submits a task to the ResourceManager (RM).
2. The ApplicationsManager in the RM assigns this task to a NodeManager, which launches a MapReduceApplicationMaster (MRApplicationMaster). The MRApplicationMaster on this NodeManager is fully responsible for the task.
3. The MRApplicationMaster calculates how much CPU and memory the task needs, and responds to the ApplicationsManager in the RM.
4. Resource Scheduler… Read More »