Monday, April 25, 2011

Data Warehousing ETL tutorial

The ETL and Data Warehousing tutorial is organized into lessons representing various business intelligence scenarios, each of which describes a typical data warehousing challenge.
This guide can be considered an ETL process and Data Warehousing knowledge base, with a series of examples illustrating how to manage and implement the ETL process in a data warehouse environment.

The purpose of this tutorial is to outline and analyze the most widely encountered real-life data warehousing problems and challenges that need to be addressed during the design and architecture phases of a successful data warehouse project deployment.

Going through the sample implementations of the business scenarios is also a good way to compare Business Intelligence and ETL tools and to get to know different approaches to designing the data integration process. It also helps identify the strong and weak points of various ETL and data warehousing applications.

This tutorial shows how to use the following BI, ETL and data warehousing tools: Datastage, SAS, Pentaho, Cognos and Teradata.
Data Warehousing & ETL Tutorial lessons

    Surrogate key generation - an example covering business keys and surrogate keys, showing how to design an ETL process that manages surrogate keys in a data warehouse environment (a minimal sketch of the technique follows this list). Sample design in Pentaho Data Integration
    Header and trailer processing - considerations on processing files arranged in blocks consisting of a header record, body items and a trailer. Files of this type usually come from mainframes; the pattern also applies to EDI and EPIC files (see the parsing sketch after this list). Solution examples in Datastage, SAS and Pentaho Data Integration
    Loading customers - a data extract is placed on an FTP server. It is copied to an ETL server and loaded into the data warehouse. Sample loading in Teradata MultiLoad
    Data allocation - an ETL process case study for allocating data. Examples in Pentaho Data Integration and Cognos PowerPlay
    Data masking and scrambling - algorithms and ETL deployments. Sample Kettle implementation
    Site traffic analysis - a guide to creating a data warehouse with data marts for website traffic analysis and reporting. Sample design in Pentaho Kettle
    Data Quality - ETL process design aimed to test and cleanse data in a Data Warehouse. Sample outline in PDI
    XML ETL processing
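
The surrogate key lesson boils down to looking up a business (natural) key and either reusing the surrogate key already assigned to it or issuing a new one. Below is a minimal sketch of that idea in plain Python with an in-memory lookup; the column names (customer_id, customer_sk) and the sequence counter are assumptions for illustration, not part of the Pentaho sample.

    # A minimal sketch, assuming an in-memory lookup table keyed on the
    # business key; a real ETL job would keep this lookup in a database
    # table or a dedicated key-management step. Column names are illustrative.
    surrogate_lookup = {}   # business key -> surrogate key
    next_surrogate = 1      # simple sequence counter

    def get_surrogate_key(business_key):
        """Return the existing surrogate key or assign the next one."""
        global next_surrogate
        if business_key not in surrogate_lookup:
            surrogate_lookup[business_key] = next_surrogate
            next_surrogate += 1
        return surrogate_lookup[business_key]

    # Incoming source rows identified by their natural key
    for row in [{"customer_id": "C100"}, {"customer_id": "C200"}, {"customer_id": "C100"}]:
        row["customer_sk"] = get_surrogate_key(row["customer_id"])
        print(row)
    # C100 is assigned surrogate key 1 the first time and reuses it afterwards.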
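
For the header and trailer scenario, the core of the job is splitting each block into its header, body items and trailer, and reconciling the trailer record count against the number of body items. The sketch below assumes a one-character record-type code ("H", "D", "T") and a trailer carrying only the detail count; real mainframe, EDI or EPIC layouts will differ.

    # A sketch only: the record layout ("H"/"D"/"T" type codes, semicolon-separated
    # payload, trailer holding the detail count) is an assumption for illustration.
    def process_block(lines):
        header, body, trailer = None, [], None
        for line in lines:
            record_type, payload = line[0], line[1:].rstrip("\n")
            if record_type == "H":
                header = payload
            elif record_type == "D":
                body.append(payload)
            elif record_type == "T":
                trailer = payload
        # Reconciliation: the trailer count should match the number of body items
        expected = int(trailer) if trailer is not None else 0
        if expected != len(body):
            raise ValueError(f"trailer count {expected} does not match {len(body)} body items")
        return header, body

    sample = ["H20110425;BATCH01", "D1;widget;10", "D2;gadget;5", "T2"]
    print(process_block(sample))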

What is Business Intelligence?

Business intelligence is a broad set of applications, technologies and knowledge for gathering and analyzing data for the purpose of helping users make better business decisions.
The main challenge of Business Intelligence is to gather and serve organized information regarding all relevant factors that drive the business, and to let end users access that knowledge easily and efficiently, which in effect maximizes the success of an organization.
Business intelligence produces analysis and provides in-depth knowledge about performance drivers such as a company's customers, competitors, business counterparts, economic environment and internal operations, helping users make effective, good-quality business decisions.

From a technical standpoint, the most important areas that Business Intelligence (BI) covers are:
    DW - Data warehousing - architecture, modeling, managing, processing
    ETL process and data integration
    Reporting, information visualization and dashboards
    OLAP - Online Analytical Processing and multidimensional analysis
    Data cleansing and data quality management
    Performance management
    Data mining, statistical analysis, forecasting
    MIS - Management Information Systems
    CRM - Customer Relationship Management

ETL Tools Info portal

The ETL-Tools.Info portal provides information about different business intelligence tools and data warehousing solutions, with a main focus on the ETL process and ETL tools. On our pages you will find both general articles with high-level information on various Business Intelligence applications and architectures, and technical documents with low-level descriptions of the presented solutions and detailed tutorials.
Particular attention is paid to the Datastage ETL tool: we provide a number of Datastage examples, tutorials, best practices and resolved problems with real-life examples.
There is also a wide range of information on the rapidly growing Open Source Business Intelligence (OSBI) market, with emphasis on applications from the Pentaho BI family, including a Pentaho tutorial.
We also provide a SAS guide and tutorial, which illustrates the SAS approach to Business Intelligence, Data Warehousing and the ETL process.
We have recently added the ETL case study (ETL and data warehousing course) section, which presents a set of business cases, each illustrating a typical data warehousing problem followed by sample implementations. We analyze the cases thoroughly and propose the most efficient and appropriate approach to solving those problems by showing sample ETL process designs and DW architectures.
Microsoft users may be interested in exploring our Excel BI crosstabs section with FAQ and sample solutions.

Buy vs. Build

When it comes to ETL tool selection, it is not always necessary to purchase a third-party tool. This determination largely depends on three things:

    Complexity of the data transformation: The more complex the data transformation is, the more suitable it is to purchase an ETL tool.
    Data cleansing needs: Does the data need to go through a thorough cleansing exercise before it is suitable to be stored in the data warehouse? If so, it is best to purchase a tool with strong data cleansing functionalities. Otherwise, it may be sufficient to simply build the ETL routine from scratch.
    Data volume: Available commercial tools typically have features that can speed up data movement. Therefore, buying a commercial product is a better approach if the volume of data transferred is large.

ETL Tool Functionalities

While the selection of a database and a hardware platform is a must, the selection of an ETL tool is highly recommended, but not strictly necessary. When you evaluate ETL tools, it pays to look for the following characteristics:

Functional capability: This includes both the 'transformation' piece and the 'cleansing' piece. In general, the typical ETL tools are either geared towards having strong transformation capabilities or having strong cleansing capabilities, but they are seldom very strong in both. As a result, if you know your data is going to be dirty coming in, make sure your ETL tool has strong cleansing capabilities. If you know there are going to be a lot of different data transformations, it then makes sense to pick a tool that is strong in transformation.
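
To make the distinction concrete, the sketch below separates a cleansing step (repairing dirty values) from a transformation step (deriving new values). The field names and rules are assumptions for illustration only; they are not taken from any particular ETL tool.

    # Illustrative only: field names and rules are assumed, not from a real feed.
    def cleanse(row):
        # Cleansing: trim whitespace, normalise case, default a missing country
        row["name"] = row.get("name", "").strip().title()
        row["country"] = (row.get("country") or "UNKNOWN").upper()
        return row

    def transform(row):
        # Transformation: derive revenue from quantity and unit price
        row["revenue"] = row["quantity"] * row["unit_price"]
        return row

    source_row = {"name": "  acme corp ", "country": None, "quantity": 3, "unit_price": 9.5}
    print(transform(cleanse(source_row)))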

Ability to read directly from your data source: For each organization, there is a different set of data sources. Make sure the ETL tool you select can connect directly to your source data.

Metadata support: The ETL tool plays a key role in your metadata because it maps the source data to the destination, which is an important piece of the metadata. In fact, some organizations have come to rely on the documentation of their ETL tool as their metadata source. As a result, it is very important to select an ETL tool that works with your overall metadata strategy.
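
One lightweight way to keep that source-to-target mapping usable outside the tool is to record it as plain data. The sketch below writes an assumed mapping (the table and column names are made up for illustration) as CSV, which can then be loaded into a metadata repository or simply versioned with the project.

    # Hypothetical source-to-target mapping; names are examples, not a real model.
    import csv, sys

    source_to_target = [
        # (source column, target column, transformation rule)
        ("crm.cust_no",   "dw.dim_customer.customer_bk",   "copy"),
        ("crm.cust_name", "dw.dim_customer.customer_name", "trim + title case"),
        ("orders.amount", "dw.fact_sales.revenue",         "sum per order"),
    ]

    writer = csv.writer(sys.stdout)
    writer.writerow(["source", "target", "rule"])
    writer.writerows(source_to_target)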

Popular Tools

    IBM WebSphere Information Integration (Ascential DataStage)
    Ab Initio
    Informatica
    Talend