Tuesday, January 11, 2011

Data Warehousing Concepts

The IBM Cognos tool, Decision Stream or by it's latest version name Data Manager, is one of the ETL development tools available to support data delivery. Data Manager has all of the functionality needed to extract the data from the source system, create the necessary programming to transform the data, and deliver it to the warehouse.

Extract the source data

Once the BI analyst has worked with the business user to determine the requirements necessary, an ETL programmer begins by writing a SQL statement against the source system(s) to gather all the required elements. It is a best practice to simply pull the data from the source system without attempting to make any changes to it. In the extract process, the best performance will be from a simple select statement. The Decision Stream tool calls this the data stream.

Transformations of extracted data

The temporary work table has all the data elements required to support the transformation step. The transformation step is the "meat and potatoes" of building a structured data warehouse. All of the elements extracted from the source system(s) are used in this step. The Decision Stream tool has derivations, dimensional lookups, and functions, to provide the programmer with all the tools needed to transform the data to meet the business requirements. The tool calls this the transformation model.

A derivation is a calculation type of operation. An example would be adding 19000000 to dates that are in the CYYMMDD format or multiplying the quantity by the unit price to get a total gross price. There are many supported calculation types within a Decision Stream derivation. This include but are not limited to date math, type conversion, contol structures such as if..then..else, and boolean operators.

Dimensional lookups are used to create a link to the necessary control files. An example would be using the transaction date and the transaction currency to include the active foreign exchange rate for the transaction. Dimensional lookups often use the actual value of the key data to bring back the surrogate identification of the record containing the key data in the master or control files.

A function does everything else. Often a function is used at the beginning of the process to get literal or variable data used within the process or at the end to validate the transformations were successful. An example of this would be to go get the beginning and ending date ranges used in the current process, then update those date ranges at the end of the transformation. Generating the total extracted against the total transformed is another common function.

Load the final delivery table(s)

Lastly the delivery module provides the method for the data to be delivered to a table or multiple tables. While this seems like an afterthought, it is important to structure the columns so that the most used columns are at the beginning of the delivery. It is also a best practice to avoid delivering transition data. A field that is only used as a source (the date in CYYMMDD format) should not be delivered to the final table.

Utilizing the IBM Cognos Decision Stream tool gives the ETL developer all the necessary tools to gather, manipulate, and deliver the data needed to meet the reporting needs of the business.

Sources:
IBM Cognos Decision Stream

Ralph Kimball

0 comments:

Post a Comment

newer post older post Home