I have already deployed a 4-node Hadoop cluster in VMware. I will run the WordCount application from MyEclipse, which runs on Windows 8 on my own laptop.
1. Set up a Hadoop work environment in Eclipse. (You can refer to “Build hadoop work environment in MyEclipse” in my blog.)
2. Import hadoop-mapreduce-client-common-2.3.0.jar and hadoop-mapreduce-client-jobclient-2.3.0.jar into the workspace. They are in the HADOOP/share/hadoop/mapreduce directory.
3. You still need a Hadoop directory on Windows, and you must set the HADOOP_HOME environment variable. This is mandatory.
4. Download hadoop-common-2.2.0-bin-master.rar, extract it, and overwrite the files in hadoop/bin. Two files are essential here: hadoop.dll and winutils.exe.
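Steps 3 and 4 can be done from a Windows command prompt. This is only a sketch: C:\hadoop is a hypothetical path, so substitute wherever you extracted Hadoop.

```shell
:: Set HADOOP_HOME for the current user (C:\hadoop is an example path;
:: use the directory where you extracted Hadoop on Windows).
setx HADOOP_HOME "C:\hadoop"

:: After extracting hadoop-common-2.2.0-bin-master and overwriting
:: hadoop\bin, these two files must exist:
::   %HADOOP_HOME%\bin\hadoop.dll
::   %HADOOP_HOME%\bin\winutils.exe
```

Note that setx only affects new processes, so restart MyEclipse after setting the variable.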
5. Copy file1.txt and file2.txt to hdfs://centmaster:9000/input.
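For reference, the upload can be done from the NameNode like this (a sketch; it assumes file1.txt and file2.txt are in the current directory on centmaster):

```shell
# Create the input directory on HDFS and upload the two files.
hadoop fs -mkdir -p /input
hadoop fs -put file1.txt file2.txt /input
# Verify the upload.
hadoop fs -ls /input
```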
6. Modify WordCount.java and run it in Eclipse.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {

    public static class TokenizerMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = { "hdfs://centmaster:9000/input",
                "hdfs://centmaster:9000/output" };
        // Very important: run the job as the user that owns HDFS on the
        // NameNode (centmaster); otherwise you get permission errors.
        System.setProperty("HADOOP_USER_NAME", "root");
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
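To see what the Mapper and Reducer above compute without touching the cluster, the same tokenize-and-sum logic can be sketched in plain Java (LocalWordCount is a made-up name for this sketch, not part of Hadoop):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Tokenize each line on whitespace (like TokenizerMapper) and sum the
    // counts per word (like IntSumReducer), but in a single local pass.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("a b a", "b c")));
        // prints {a=2, b=2, c=1}
    }
}
```

This is handy for checking the expected counts of small inputs before running the real job.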
7. Check the result with cat:
[root@centmaster bin]# bash hadoop fs -cat /output/part-r-00000
2012-3-1 2
2012-3-2 2
2012-3-3 4
2012-3-4 2
2012-3-5 2
2012-3-6 2
2012-3-7 2
a 4
b 4
c 5
d 3
Some differences from online articles:
1. Some articles say you should configure hadoop/etc/hadoop/hadoop-env.cmd and set JAVA_HOME there. In my case it was not necessary: I renamed “etc” to “etc2” and MapReduce still ran fine.
2. Some articles say you should put hadoop.dll in windows/system32. I copied it there, but I didn’t test whether it is actually necessary. You can try it, but you need to restart your computer afterwards.
3. I did not add %HADOOP_HOME%\bin to the PATH environment variable, and everything still runs fine.
Debugging experience:
Eclipse doesn’t show much error information. The best approach is to look in the logs/hadoop-root-namenode-centmaster.log file on the NameNode and google the error message.
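For example, after a failed run you can pull the tail of that log on the NameNode and search for the first exception (the line count here is arbitrary):

```shell
# Show the last 100 lines of the NameNode log; the path assumes the log
# directory sits under the Hadoop install on centmaster.
tail -n 100 $HADOOP_HOME/logs/hadoop-root-namenode-centmaster.log
# Or watch it live while re-running the job from Eclipse:
tail -f $HADOOP_HOME/logs/hadoop-root-namenode-centmaster.log
```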