HQL2 | allenlipeng48

Share the joy

Where
select * from partition_table where dt=’2014-04-01′ and dep=’R&D’;

Limit. This can’t be used like “limit 1,3”;
select * from partition_table where dt=’2014-04-01′ and dep=’R&D’ limit 5;

“select *”, and the partition fields after where. They don’t require MapReduce, which improve the query efficiency.
For example, salary is not a partition field, so this select requires MapReduce: select * from partition_table where salary>10000;

Case
select name, salary,
case
when salary<5000 then ‘L1-poor’
when salary>=5000 and salary<10000 then ‘L2-middle’
when salary<=10000 and salary<15000 then ‘L3-rich’
when salary>=15000 then ‘L4-super’
else ‘L0’
end as salary_level, gender, level, dt, dep from partition_table limit 4;

Group by, Having They are the same as the usage in sql.
select gender, sum(salary) from partition_table group by gender having count(*)>4;

set hive.map.aggr=true; This command can make the query much efficient, without more cost of memory.

Left Join
select partition_table.name, dep, class from partition_table left join student on partition_table.name=student.name;

Map Join，When a small table joins with another big table, use the small table in mapjoin. This can improve the efficiency.
select /*+ mapjoin(c) */ * from orders o join cities c on (o.city_id = c.id);
For more, read https://www.facebook.com/notes/facebook-engineering/join-optimization-in-apache-hive/470667928919

M	T	W	T	F	S	S
				Nov »
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31