Category Archives: database

Solve the problem that Hive hangs for a long time on a MapReduce select

I have a partition_table in Hive. A plain select runs well: hive> select * from partition_table; But when I ran hive> select count(*) from partition_table; it printed Starting Job = job_1414213419655_0001, Tracking URL = http://centmaster:8088/proxy/application_1414213419655_0001/ Kill Command = /home/hadoop/hadoop-2.3.0/bin/hadoop job -kill job_1414213419655_0001 and then Hive hung there for 40 minutes, until I pressed Ctrl+C to stop it. I checked… Read More »

Build a JDBC connection to Hive

Before starting, Hive should be correctly installed. This means not only that you can enter the Hive shell, but also that Hive can interact with the MySQL you configured. 1. Jars needed. It is best to import all the .jar files under /Hive/lib. In addition, you need hadoop-common-2.3.0.jar and slf4j-api-1.6.6.jar. The list includes: activation-1.1.jar ant-1.9.1.jar ant-launcher-1.9.1.jar antlr-2.7.7.jar antlr-runtime-3.4.jar asm-commons-3.1.jar asm-tree-3.1.jar avro-1.7.5.jar… Read More »
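As a hedged sketch of what the connection code might look like once those jars are on the classpath: the host centmaster, port 10000, user hadoop, and the HiveServer2 driver class/URL scheme are assumptions based on a default Hadoop 2.x-era setup, not details from the post.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcSketch {

    // Hypothetical host, port, and database; adjust to your cluster.
    static String buildUrl(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) {
        String url = buildUrl("centmaster", 10000, "default");
        System.out.println("Connecting to " + url);
        try {
            // Requires hive-jdbc and its dependencies on the classpath.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(url, "hadoop", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "select * from partition_table limit 5")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        } catch (ClassNotFoundException | SQLException e) {
            // Without the Hive jars or a running HiveServer2, we end up here.
            System.out.println("Hive JDBC not available: " + e.getMessage());
        }
    }
}
```

Only java.sql (standard library) is imported directly, so the class compiles without the Hive jars; the driver itself is loaded reflectively at run time.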

Create a Hadoop work environment in MyEclipse

Hadoop requires a lot of jars, and sometimes we don’t know which ones to include. This took me a very long time, and I got bored of it. Finally I found Maven, which can build the Hadoop environment easily and quickly. Here we go: 1. Create a Maven project. 2. Open the pom.xml and put the… Read More »
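As an illustration, a minimal dependency section for that pom.xml might look like the following; the single hadoop-client artifact pulls in the common client jars transitively. The exact artifact choice is an assumption on my part, with the version matched to the 2.3.0 cluster used elsewhere on this blog.

```xml
<!-- Hypothetical minimal dependency set; version matches the 2.3.0 cluster. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.3.0</version>
  </dependency>
</dependencies>
```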

Format of Mapper and Reducer

The formula of MapReduce can be described as follows, together with the corresponding Job configuration. map: (K1, V1) → list(K2, V2) reduce: (K2, list(V2)) → list(K3, V3) (K1, V1): jobConf.setInputKeyClass(K1.class); jobConf.setInputValueClass(V1.class); list(K2, V2): job.setMapOutputKeyClass(K2.class); job.setMapOutputValueClass(V2.class); list(K3, V3): jobConf.setOutputKeyClass(K3.class); jobConf.setOutputValueClass(V3.class); Normally, (K2, V2) equals (K3,… Read More »
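The type contract above can be sketched in plain Java, without any Hadoop dependency, as a tiny word-count pipeline; the method names map, shuffle, and reduce are illustrative only, not the Hadoop API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the MapReduce type contract (no Hadoop needed):
//   map:    (K1, V1)       -> list(K2, V2)
//   reduce: (K2, list(V2)) -> list(K3, V3)
public class WordCountSketch {

    // map: (Long offset, String line) -> list((word, 1))
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) out.add(Map.entry(w, 1));
        }
        return out;
    }

    // The framework's shuffle step: group map output values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapped) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // reduce: (word, list(counts)) -> (word, sum)
    static Map.Entry<String, Integer> reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return Map.entry(word, sum);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = map(0L, "a b a");
        shuffle(mapped).forEach((k, v) -> System.out.println(reduce(k, v)));
    }
}
```

Here (K1, V1) is (Long, String), (K2, V2) is (String, Integer), and, as the excerpt notes is the normal case, (K3, V3) equals (K2, V2).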

HA and Federation in Hadoop 2.0

In Hadoop 1.0 there is only one NameNode in the whole cluster, which brings the risk of a single point of failure. To solve this problem, Hadoop 2 introduces an HA mechanism. In the picture there are 4 DataNodes and 4 NameNodes: 2 NameNodes are active and 2 are standby. Let’s look at the left group of NameNodes,… Read More »
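A hedged sketch of what the hdfs-site.xml for such a layout might contain: two federated nameservices, each an active/standby HA pair. The property names follow the standard HDFS HA/Federation configuration keys; the nameservice names and hostnames are hypothetical.

```xml
<!-- Sketch: two federated nameservices (ns1, ns2), each an HA pair of
     NameNodes. Hostnames are hypothetical; ns2 is configured the same way. -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>master1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn2</name>
  <value>master2:8020</value>
</property>
```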

Hadoop ecosystem

Since Hadoop 2.0, YARN has been introduced. YARN is a resource manager, similar to the JobTracker. Different computation models can be implemented on YARN, such as MapReduce, Tez, Storm, and Spark. Databases like HBase and Hive are also supported on YARN. MapReduce uses Map, Reduce, and <Key, Value> pairs to compute. In Storm, there is a constant stream of input to the computation model. Once there… Read More »

Run a remote WordCount MapReduce job from Eclipse on Windows 8

I have already deployed a 4-node Hadoop cluster in VMware. I will run the WordCount application from MyEclipse. MyEclipse runs on Windows 8, on my own laptop. 1. Deploy a Hadoop work environment in Eclipse (you can refer to “Build hadoop work environment in MyEclipse” on my blog). 2. Import hadoop-mapreduce-client-common-2.3.0.jar and hadoop-mapreduce-client-jobclient-2.3.0.jar to… Read More »

HQL3

hive.mapred.mode=strict mode. By default, “order by” is handled by only one reducer. If the amount of data is huge, it may exhaust the resources of that single reducer, so it is suggested to use the “limit” keyword to restrict the output size. When hive.mapred.mode=strict is set, Hive forces you to use “limit” whenever “order by” is used. Otherwise… Read More »
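To illustrate, assuming a hypothetical column id on partition_table, strict mode behaves roughly like this (a sketch, not run against a real cluster):

```sql
-- In strict mode, an ORDER BY without LIMIT is rejected.
set hive.mapred.mode=strict;

-- Fails in strict mode:
-- select * from partition_table order by id;

-- Accepted: the single reducer only has to keep 10 rows.
select * from partition_table order by id limit 10;
```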

HQL2

Where: select * from partition_table where dt='2014-04-01' and dep='R&D'; Limit: this cannot be used like “limit 1,3”. select * from partition_table where dt='2014-04-01' and dep='R&D' limit 5; “select *” with only partition fields after “where” does not require MapReduce, which improves query efficiency. For example, salary is not a partition field, so this select requires… Read More »
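The two cases can be put side by side as a sketch; the salary threshold below is made up for illustration.

```sql
-- Partition-column filter plus "select *": Hive can answer this by reading
-- the files in the matching partition directory, so no MapReduce job runs.
select * from partition_table where dt='2014-04-01' and dep='R&D' limit 5;

-- salary is an ordinary (non-partition) column, so this one launches
-- a MapReduce job to scan and filter the rows.
select * from partition_table where salary > 10000;
```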

HQL1

Create an internal table: create table hive_1_1(id string, name string, gender string) row format delimited fields terminated by ',' stored as textfile; Create an external table: create external table hive_1_1(id string, name string, gender string) row format delimited fields terminated by ',' stored as textfile; Load data from the local computer: load data local inpath '/home/hadoop/hive-0.13.1/student.txt'… Read More »
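Collected as one sketch below. Note that I renamed the external table to hive_1_2 so both statements could run in the same session; the original excerpt reuses the name hive_1_1 for both.

```sql
-- Internal (managed) table: dropping it also deletes the data.
create table hive_1_1 (id string, name string, gender string)
row format delimited fields terminated by ','
stored as textfile;

-- External table: dropping it removes only the metadata, not the files.
-- (hive_1_2 is my renaming to avoid a name clash.)
create external table hive_1_2 (id string, name string, gender string)
row format delimited fields terminated by ','
stored as textfile;

-- Load from the local filesystem into the managed table.
load data local inpath '/home/hadoop/hive-0.13.1/student.txt'
into table hive_1_1;
```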