Monday, February 28, 2011

Slowly Changing Dimensions: A Data Warehouse Administrator’s Ongoing Challenge

1 comments
A key pillar of data mart building capabilities is a dimensional framework. Ralph Kimball, a leading data warehouse consultant, describes building data marts with shared or conformed dimensions as a very effective method for rolling out an enterprise solution. For the business user, dimensions such as "customer" or "location" can be described in terms of the business rules that an organization needs to define in order to make decisions about their business (for example, "customers by location" or "sales over time"). Building data marts with common definitions delivers a consistent view of the data. The industry’s most effective...
newer post

Slowly Changing Dimensions Are Not Always as Easy as 1, 2, 3

0 comments
To kick off our first column of the year, we're going to take on a challenging subject that all designers face: how to deal with changing dimensions. Unlike most OLTP systems, a major objective of a data warehouse is to track history. So, accounting for change is one of the analyst's most important responsibilities. A sales force region reassignment is a good example of a business change that may require you to alter the dimensional data warehouse. We'll discuss how to apply the right technique to account for the change historically. Hang on to your hats — this is not an easy topic. warehouse design presumes that facts,...
newer post

How to handle Slowly Changing Dimensions (SCDs) in data model design?

0 comments
There are multiple methods to handle the slowly changing dimensions. Which technique to use depends on your business requirements. The choice among these three methods are not a technical design decision since their behaviors are different. Type One: Overwite the old data with new data Using this method, you do not store the histoy. For example, that say each customer can have one salesrep at any given point in time. When the salerep of ABC Inc., changes from Sandy to Laura, Sandy was a salerep of ABC will not be kept anywhere. Any report by salesrep will assume that Laura is the salereps of ABC Inc. forever and count...
newer post

Data Warehousing - Slowly Changing DimensionsBy nature of the data warehouse, fact and dimension tables are likely to have very different transactional activity. Fact tables will be mostly inserted into as you load data. Very infrequently you might have to update the facts that were loaded incorrectly. It is even less likely that you'll ever delete rows from the fact table; the only time this might happen is if you wanted to archive the old data that is no longer relevant for the current portion of the data warehouse. Dimension tables, on the other hand are likely to see frequent updates. A classic example of this is a consumer dimension which tracks people that buy your products or services. People can change names due to marriage, divorce or another reason, they can change their title of courtesy from Ms to Mrs or to Dr. As people age they also fit a different age group and perhaps even a different education and income level. Therefore you are likely to change dimension table values often. Dimensions that are changing over time are referred to as slowly changing dimensions (SCD). There are several ways to handle SCD; most common approaches are referred to as type 1, type 2, type 3 and type 4. The easiest way to manage SCD is to simply override the existing value with the new value type 1. Indeed if a department changes the title from "Finance" to "Financial" overwriting such a value isn't likely to do much harm. However, you must weigh the consequences of overriding existing values carefully to ensure that your reports don't lose credibility. For example, suppose Mr. Smith who has near perfect credit marries Ms. Jones who has recently declared bankruptcy. Since Mr. Smith had near perfect credit before his marriage your bank has issued him a loan for $200,000. If you overwrite Mr. Smith's credit score with his new, considerably lower credit score you will make decisions from the not very distant past look very incorrect. If your data analysts happen to run reports immediately before and immediately after the credit rating was changed your data warehouse credibility will be compromised. Here is what type 1 SCD records would look like before and after change: Customer_key Customer_name Customer_city Customer_state 123 Ms. Brown Houston Texas 123 Mrs. Green Houston Texas As you can see type 1 SCD does not allow tracking of changes. Once you overwrite the value all historical reports will display Mrs. Green instead of Ms. Brown. The second and more common way of handling SCD is to create a new record when dimension attributes change - type 2. For example, when Ms. Brown becomes Mrs. Green you create a new record for that customer and mark the first record as obsolete. The second approach adds effective date and obsolete date columns to dimension tables. Doing so allows a data warehouse architect to maintain a history of SCD changes. After a change occurs the warehouse will have the following two records (one of them marked as obsolete): Customer_key Customer_name Customer_city Customer_state Effective_date Obsolete_date 123 Ms.Brown Houston Texas 1/1/2000 1/1/2005 234 Mrs.Green Houston Texas 1/1/2005 NULL The third approach is to add columns to the dimension tables in order to maintain the history of changes in a single row. In the case of consumer dimension you would create an old last name and current last name columns along with the date column that indicates when the change occurred. This type of auditing of changes is typically reserved for very few dimensions where change history tracking is critical. For example if your product prices tend to change over time and you must compare revenues based on old and new price then you might wish to use this method. Type 3 of maintaining SCD carries much overhead and is not used very frequently. After type 3 change the warehouse will still have a single record shown below: Customer_key Customer_name Customer_city Customer_state Change_date Old_name 123 Mrs.Green Houston Texas 1/1/2005 Ms.Brown Most environments that use type 3 slowly changing dimensions track 2 iterations of changes; that is you would record the current version, previous version and version before last for the changed column. Type 4 is a special way of handling type 2 changes. Type 2 usually tracks unlimited history by creating new records each time the change occurs. If you know how far back you wish to track history you can add logic to your code to delete any records that go too far back in history. For example, suppose we created an exam and offered it to our students. As time goes on we might decide to edit the exam. We could manage these edits using type 2 method, but if we accumulate 10 versions of the same exam the data might get confusing. So we might decide that we only store up-to 9 records per each exam; any time 10th version of an exam is created we'll delete the 1st version from the warehouse. SOURCE:http://sqlserverpedia.com/wiki/Data_Warehousing_-_Slowly_Changing_Dimensions

0 comments
By nature of the data warehouse, fact and dimension tables are likely to have very different transactional activity. Fact tables will be mostly inserted into as you load data. Very infrequently you might have to update the facts that were loaded incorrectly. It is even less likely that you'll ever delete rows from the fact table; the only time this might happen is if you wanted to archive the old data that is no longer relevant for the current portion of the data warehouse. Dimension tables, on the other hand are likely to see frequent updates. A classic example of this is a consumer dimension which tracks people that buy...
newer post

Slowly Changing Dimensions SCD in Dimensional Modeling

0 comments
Slowly Changing Dimensions Entities change over time. Customer demographics, product characteristics, classification rules, status of customers etc. lead to changes in the attributes of dimensions. In a transaction system, many a times the change is overwritten and track of change is lost. For example a source system may have only the latest customer PIN Code, as it is needed to send the marketing and billing statements. However, a data warehouse needs to maintain all the previous PIN Codes as well, because we need to track on how many customers move to new locations over what frequency. A key benefit for Data Warehouse...
newer post

Slowly changing dimension

0 comments
Dimension is a term in data management and data warehousing that refers to logical groupings of data such as geographical location, customer information, or product information. Slowly Changing Dimensions (SCDs) are dimensions that have data that changes slowly, rather than changing on a time-based, regular schedule.[1] For example, you may have a dimension in your database that tracks the sales records of your company's salespeople. Creating sales reports seems simple enough, until a salesperson is transferred from one regional office to another. How do you record such a change in your sales dimension? You could sum or...
newer post
newer post older post Home