In kettle, we can implement type 2 slowly change dimension easily by using “Dimension lookup/update”.
We have below table in a CSV file:
COUNTRY_ID,COUNTRY_NAME,REGION,LAST_UPDATE_DATE
11,Baharian,A,2015-02-15
12,China,Beijing,2013-02-15
13,USA,AZ,2014-02-20
14,CUBA,CIUDAD,2011-02-19
15,USA,Los Angeles,2011-02-18
16,Japan,TOKYO,2011-02-15
We want to put into table with below structure:
First, we have 2 elements: Text file input, Dimension lookup/update.
Text file input configuration:
Dimension lookup / update configuration:
In the Dimension Lookup / Update tab, there are several fields that we need to pay attention:
Key: This is the natural key. We need to both specify them in source and target table.
Technical key field: In target table, the surrogate key.
Version field: In target table, version in dimension.
Stream Datefield: Source table, the last_update_date which is used to indicates start / end date.
Date range start / end field: In target table, indicate to fill out the start / end date.
Lookup / update field: This indicate which field do we update, and the way we update. Insert, type 2.(Once changed then create a new row with new version. Set the end_date in old row and start_date in new row). Update, only update the value in latest version. Punch through, update all the fields with natural key.
When we run kettle for the first time, the table is like below:
Let’s change the id15 from “15,USA,Los Angeles,2011-02-18” to “15,China,Los Angeles,2011-02-19”. Run the kettle again, the table changed to below. We see a new country_id 15 is generated with new version and eff_date. Old one has end_date set. This is a typical type 2 change.
Later, let’s change id15 “15,China,Los Angeles,2011-02-19” to “15,China,Shenzhen,2011-02-20”. Table changed to below. It only updated the region on the latest version.