Monday, February 28, 2011

Slowly Changing Dimensions: A Data Warehouse Administrator’s Ongoing Challenge

A key pillar of data mart building capabilities is a dimensional framework. Ralph Kimball, a leading data warehouse consultant, describes building data marts with shared or conformed dimensions as a very effective method for rolling out an enterprise solution. For the business user, dimensions such as "customer" or "location" can be described in terms of the business rules that an organization needs to define in order to make decisions about their business (for example, "customers by location" or "sales over time"). Building data marts with common definitions delivers a consistent view of the data. The industry’s most effective ETL solutions are able to define dimensions once and make them available for reuse in subsequent data marts. This best practice methodology quickly facilitates an enterprise roll out of subsequent data marts. A dimensional framework allows data mart designers to quickly prototype and deploy a data mart solution using its dimensional framework.
Tracking Changes Over Time

In the real world, though, many dimensions change over time. But what happens to a dimension when, for example, a customer or product dimension changes? Tracking changes in dimensions, referred to as slowly changing dimensions, can raise interesting questions. Data mart designers need to add new customers or change the addresses of existing customers, the company assigns new regional hierarchies, or product descriptions and packaging change. These sorts of changes need to be reflected in the data mart dimension tables. In several cases, not only does the change need to be reflected, but the history of the changes also needs to be tracked. By remembering history, businesses are able to look at historical data and compare it to their current situation.
Slowly Changing Dimensions

The issues of maintaining the history of this data has profound implications for the design of the data warehouse. Effective ETL solutions are able to simplify the support for Type 1 and Type 2 (as per Ralph Kimball terminology) slowly changing dimensions through automated dimensional management. Furthermore, these dimensions can contain a mixture of both Type 1 and Type 2 dimensions.


Data mart designers have two choices: they can either allow dimension records to be updated in place (no history preservation), or they can create a new dimension record when changes in designated dimension attributes occur (preserving history). The designer must choose based on the users’ needs.


Figure 1: Type 1 (Overwrite Dimension Records)

The first technique does not maintain any history, but is a simple and fast procedure. Overwriting dimensional records is used when the old values are incorrect or unimportant – when there is no value in keeping the old description. For example, when a sales representative changes her last name due to marriage, there is no business reason to keep the representative’s maiden name in history. Thus, her maiden name can be overwritten.


Figure 2: Type 2 (Add Dimension Record)

Type 2 slowly changing dimensions are used to create a new row of dimension data when there are changes in an attribute whose history needs preserving. For example, do the sales for Mary Smith get credited to the new office when she transfers, or do they stay with the old office? In this case it is necessary to keep the history so Mary’s sales (and her commission check) will be assigned to the correct branch.
Surrogates and Slowly Changing Dimensions

Surrogates refer to artificial keys used to connect a dimension table to a fact table and are always used with slowly changing dimensions. Leading data mart creation tools automatically manage the creation of new surrogates and synchronization of these values in fact and dimension tables. The process of surrogate and dimension history management are now reduced to a set of rules that are associated with the dimension table columns in the warehouse. Once these rules are defined, dimensionally friendly ETL solutions automatically maintain the surrogates and updates or adds dimension table records as necessary when changes are made in the corresponding reference sources in the operational system.

Data mart/warehouse administrators that want to rid themselves of the daily headache of slowly changing dimensions need to look to the automation of this important data warehouse function by simplifying the process of handling slowly changing dimensions. For the data mart designer this translates into the ability to respond to changes and provides an efficient method for tracking history.

Peter Griffiths is vice president of Cognos, Inc.

For more information on related topics, visit the following channels:

    * DW Administration, Mgmt., Performance
    * DW Design, Methodology
    * Databases

SOURCE:http://www.information-management.com/infodirect/20010126/2998-1.html

1 comments:

Anonymous said...

So what would you recommend for a good data cleansing tool? I have heard of Google Refine but I am looking for something better..

Post a Comment

newer post older post Home