Where
select * from partition_table where dt=’2014-04-01′ and dep=’R&D’;
Limit. This can’t be used like “limit 1,3”;
select * from partition_table where dt=’2014-04-01′ and dep=’R&D’ limit 5;
“select *”, and the partition fields after where. They don’t require MapReduce, which improve the query efficiency.
For example, salary is not a partition field, so this select requires MapReduce: select * from partition_table where salary>10000;
Case
select name, salary,
case
when salary<5000 then ‘L1-poor’
when salary>=5000 and salary<10000 then ‘L2-middle’
when salary<=10000 and salary<15000 then ‘L3-rich’
when salary>=15000 then ‘L4-super’
else ‘L0’
end as salary_level, gender, level, dt, dep from partition_table limit 4;
Group by, Having They are the same as the usage in sql.
select gender, sum(salary) from partition_table group by gender having count(*)>4;
set hive.map.aggr=true; This command can make the query much efficient, without more cost of memory.
Left Join
select partition_table.name, dep, class from partition_table left join student on partition_table.name=student.name;
Map Join,When a small table joins with another big table, use the small table in mapjoin. This can improve the efficiency.
select /*+ mapjoin(c) */ * from orders o join cities c on (o.city_id = c.id);
For more, read https://www.facebook.com/notes/facebook-engineering/join-optimization-in-apache-hive/470667928919