HQL3

By | October 29, 2014

hive.mapred.mode=strict
By default, “order by” sends every row through a single reducer so that the result is totally ordered. If the data set is huge, it may exhaust the resources of that one reducer, so it is advisable to add the “limit” keyword to cap the output.
When hive.mapred.mode=strict is set, Hive requires “limit” whenever “order by” is used; otherwise the query fails with an error.
When hive.mapred.mode=strict is set, Cartesian products are rejected, so a join must state its condition with “join..on..” rather than in the “where” clause.
When hive.mapred.mode=strict is set, a query on a partitioned table must filter on a partition column in its “where” clause.
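The three restrictions above can be sketched as follows. This is only an illustration: it assumes the emp table defined at the end of this post, and the dept table, its columns, and the partition value are hypothetical names made up for the example. The rejected statements are left commented out.

```sql
-- Sketch only: dept, dept_name, emp_id and the date value are hypothetical.
SET hive.mapred.mode=strict;

-- Rejected in strict mode: "order by" without "limit"
-- SELECT id, salary FROM emp WHERE `date` = '2014-10-29' ORDER BY salary DESC;

-- Accepted: "limit" bounds what the single reducer has to emit
SELECT id, salary
FROM emp
WHERE `date` = '2014-10-29'
ORDER BY salary DESC
LIMIT 100;

-- Rejected in strict mode: no filter on the partition column
-- SELECT id, name FROM emp;

-- Rejected in strict mode: join without "on" (Cartesian product)
-- SELECT e.name, d.dept_name FROM emp e JOIN dept d;

-- Accepted: join condition in "on", partition filter in "where"
SELECT e.name, d.dept_name
FROM emp e
JOIN dept d ON (e.id = d.emp_id)
WHERE e.`date` = '2014-10-29';
```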

Sort by
When Hive uses more than one reducer to sort the result, “sort by” guarantees that the data within each reducer is sorted.

But the result as a whole is not guaranteed to be sorted. The final output is the concatenation of the individual reducers' outputs, and their value ranges may overlap.
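A minimal sketch of the difference, assuming the emp table defined at the end of this post and a hypothetical partition value. The reducer count of 3 is arbitrary and is set only so that more than one reducer takes part.

```sql
-- Sketch only: the reducer count and the date value are illustrative.
SET mapred.reduce.tasks=3;

-- Each reducer writes its own rows in descending salary order,
-- but the outputs are not merged into one global ordering.
SELECT id, name, salary
FROM emp
WHERE `date` = '2014-10-29'
SORT BY salary DESC;

-- For a total order, "order by" funnels everything through one reducer.
SELECT id, name, salary
FROM emp
WHERE `date` = '2014-10-29'
ORDER BY salary DESC
LIMIT 100;
```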

Create a bucketed table. Bucketing makes sampling more efficient.
-- "date" is a reserved word in newer Hive versions, so it is quoted with backticks
create table emp(id int, name string, salary int, gender string, level string)
partitioned by(`date` string)
clustered by(id) sorted by(salary asc) into 3 buckets
row format delimited fields terminated by ','
stored as textfile;
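To show why bucketing helps sampling, here is a hedged sketch of loading the table and then reading a single bucket. The staging table emp_staging and the partition value are hypothetical. The two "enforce" settings are needed on older Hive releases (before 2.0) so that inserts actually produce 3 sorted bucket files.

```sql
-- Sketch only: emp_staging and the date value are hypothetical.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Populate one partition; Hive hashes id to pick one of the 3 buckets.
INSERT OVERWRITE TABLE emp PARTITION (`date` = '2014-10-29')
SELECT id, name, salary, gender, level
FROM emp_staging;

-- Sample roughly one third of the partition by reading only bucket 1.
SELECT *
FROM emp TABLESAMPLE(BUCKET 1 OUT OF 3 ON id)
WHERE `date` = '2014-10-29';
```

Because the sample is taken on the clustering column id, Hive can read just the matching bucket file instead of scanning the whole partition.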