Friday, January 14, 2011

Slow Running ETL process (read side) & Table Statistics (Oracle).

Sometimes an ETL process runs considerably slower than expected. During testing with a small result set it might fly, but when a million rows are applied the performance takes a nosedive. There can be many reasons for a slow ETL process: it can be slow on the read side, in the transformation, or in the load. Let's eliminate transformation and load for the sake of this discussion.
For an ETL process to be slow on the read side, here are some common reasons: 1. No indexes on join and/or WHERE clause columns. 2. A badly written query. 3. The source tables not analyzed. Out of these three, let's rule out 1 and 2.
In the past most databases had the RULE-based optimizer set in the INIT.ORA file, but for new development, and especially for data warehouses, the 'CHOOSE' optimizer setting is preferred. With the 'CHOOSE' option the query uses the cost-based optimizer (CBO) if statistics are available for the tables in the query.
There are two methods to gather statistics: 1. the DBMS_STATS package, and 2. the ANALYZE command. Oracle does not recommend the ANALYZE command going forward, for various reasons: it is a command rather than a package, it cannot analyze external tables, it gathers statistics that are not essential for the optimizer, it produces inaccurate statistics on partitioned tables and indexes, in the future it will not support the cost-based optimizer, it does not support monitoring for stale statistics, and so on.
DBMS_STATS.GATHER_SCHEMA_STATS (ownname => 'DWH', options => 'GATHER AUTO');
This package call will gather all necessary statistics automatically (there is no need to write a process to check for tables that have stale or missing stats). Oracle implicitly determines which objects need new statistics and how to gather them. So once it is put in the ETL flow you can sleep at home and everybody will be happy.
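To see whether the gather actually did anything, one option (a minimal sketch, assuming Oracle 10g or later, where USER_TAB_STATISTICS exposes a STALE_STATS column) is to check the dictionary for tables that still have stale or missing statistics:

SELECT table_name, num_rows, last_analyzed, stale_stats
FROM   user_tab_statistics
WHERE  object_type = 'TABLE'
AND    (stale_stats = 'YES' OR last_analyzed IS NULL)
ORDER  BY table_name;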
Bottom Line:
After a major data load the tables should be analyzed by an automated process. Also, please do not use ANALYZE, as in the future it will not collect the statistics needed by the CBO. The code can be found here.
This package should be used as part of the ETL workflow so that, as batch processes run, the benefit of fresh statistics is available to the next process in the queue.
ETL programmers can also call this package at the beginning of an ETL process/procedure so that, if the tables required by the process are not analyzed, they will be analyzed automatically. Likewise, if the procedure is going to modify a lot of data in a table, the package can be called just before the end of the procedure.
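As a rough illustration of that last point, an ETL procedure might refresh statistics on the table it has just loaded so the next process in the queue gets cost-based plans against current data (a minimal sketch only; the procedure LOAD_SALES_FACT, the schema DWH, and the tables SALES_FACT and STG_SALES are hypothetical):

CREATE OR REPLACE PROCEDURE load_sales_fact AS
BEGIN
  -- read/transform/load step for this batch (placeholder logic)
  INSERT INTO sales_fact (sale_id, sale_dt, amount)
    SELECT sale_id, sale_dt, amount FROM stg_sales;
  COMMIT;
  -- refresh statistics on the table that was just modified
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'DWH', tabname => 'SALES_FACT');
END load_sales_fact;
/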

SOURCE:http://etlguru.com/blog/2006/04/

Informatica Product Line

Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration and ETL software.

The important products offered by Informatica Corporation are listed below:

    Power Center
    Power Mart
    Power Exchange
    Power Center Connect
    Power Channel
    Metadata Exchange
    Power Analyzer
    Super Glue

   

Power Center & Power Mart: Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power Center is used for a corporate enterprise data warehouse, while Power Mart is used for departmental data warehouses such as data marts. Power Center supports global and networked repositories and can be connected to many sources. Power Mart supports a single repository and can be connected to fewer sources than Power Center. Power Mart can grow into an enterprise implementation, and its codeless environment supports developer productivity.

Power Exchange: Informatica Power Exchange, as a standalone service or along with Power Center, helps organizations leverage data by avoiding manual coding of data extraction programs. Power Exchange supports batch, real-time, and changed data capture options for mainframe sources (DB2, VSAM, IMS, etc.), midrange sources (AS/400 DB2, etc.), relational databases (Oracle, SQL Server, DB2, etc.), and flat files on Unix, Linux, and Windows systems.

Power Center Connect: This is an add-on to Informatica Power Center. It helps to extract data and metadata from ERP and other third-party applications such as PeopleSoft, SAP, and Siebel, and from messaging systems such as IBM's MQSeries.

Power Channel: This helps to transfer large amounts of encrypted and compressed data over LAN or WAN, through firewalls, transfer files over FTP, etc.

Metadata Exchange: Metadata Exchange enables organizations, when it is used with Power Center, to take advantage of the time and effort already invested in defining data structures within their IT environment. For example, an organization may be using data modeling tools such as Erwin, Embarcadero, Oracle Designer, or Sybase PowerDesigner for developing data models. The functional and technical teams will have spent much time and effort creating the data model's structures (tables, columns, data types, procedures, functions, triggers, etc.). By using Metadata Exchange, these data structures can be imported into Power Center to identify source and target mappings, which leverages that time and effort. There is no need for an Informatica developer to create these data structures once again.

Power Analyzer: PowerAnalyzer provides organizations with reporting facilities. It makes accessing, analyzing, and sharing enterprise data simple and easily available to decision makers, enabling them to gain insight into business processes and develop business intelligence.

With PowerAnalyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage model. PowerAnalyzer works best with a dimensional data warehouse in a relational database, but it can also run reports on data in any relational table that does not conform to the dimensional model.

Super Glue: SuperGlue is used for loading metadata from several sources into a centralized place. Reports can be run against SuperGlue to analyze that metadata.


Note: This is not a complete tutorial on Informatica. We will add more tips and guidelines on Informatica in the near future, so please check back soon. To know more about Informatica, visit its official website, www.informatica.com.

Informatica Software Architecture illustrated

The Informatica ETL product, known as Informatica PowerCenter, consists of 3 main components.
1. Informatica PowerCenter Client Tools:
These are the development tools installed at the developer's end. These tools enable a developer to:
  • Define the transformation process, known as a mapping (Designer)
  • Define run-time properties for a mapping, known as sessions (Workflow Manager)
  • Monitor the execution of sessions (Workflow Monitor)
  • Manage the repository, useful for administrators (Repository Manager)
  • Report on metadata (Metadata Reporter)
2. Informatica PowerCenter Repository:
The repository is the heart of the Informatica tools. It is a kind of data inventory where all the data related to mappings, sources, targets, etc. is kept. This is the place where all the metadata for your application is stored. All the client tools and the Informatica Server fetch data from the repository. The Informatica client and server without a repository are like a PC without memory or a hard disk: able to process data but with no data to process. The repository can be treated as the backend of Informatica.
3. Informatica PowerCenter Server:
The server is where all the execution takes place. It makes physical connections to sources/targets, fetches data, applies the transformations defined in the mapping, and loads the data into the target system.
This architecture is visually explained in the diagram below:
[Architecture diagram]
Sources: Standard (RDBMS, Flat Files, XML, ODBC); Applications (SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2); EAI (MQ Series, Tibco, JMS, Web Services); Legacy (Mainframes: DB2, VSAM, IMS, IDMS, Adabas; AS400: DB2, Flat File); Remote Sources
   --> Informatica Client Tools, PowerCenter Repository, PowerCenter Server -->
Targets: Standard (RDBMS, Flat Files, XML, ODBC); Applications (SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2); EAI (MQ Series, Tibco, JMS, Web Services); Legacy (Mainframes: DB2; AS400: DB2); Remote Targets
This is sufficient knowledge to start with Informatica, so let's go straight to development.

 source:http://informatica.techtiks.com/informatica_architecture.html

Informatica PowerCenter Components

Informatica PowerCenter is not just a tool but an end-to-end data processing and data integration environment. It enables organizations to collect, centrally process, and redistribute data. It can be used simply to integrate two different systems, like SAP and MQ Series, or to load data warehouses or Operational Data Stores (ODS).
Informatica PowerCenter now also includes many add-on tools to report on the data being processed, the business rules applied, and the quality of data before and after processing.
To facilitate this, PowerCenter is divided into different components:

  • PowerCenter Domain: As Informatica says, “The PowerCenter domain is the primary unit for management and administration within PowerCenter.” Doesn't make much sense? Right... So here is a simpler version: the PowerCenter domain is the collection of all the servers required to support PowerCenter functionality. Each domain has gateway hosts (called domain servers). Whenever you want to use a PowerCenter service you send a request to the domain server, and based on the request type it redirects your request to one of the PowerCenter services.
  • PowerCenter Repository: Repository is nothing but a relational database which stores all the metadata created in Power Center. Whenever you develop mapping, session, workflow, execute them or do anything meaningful (literally), entries are made in the repository.
  • Integration Service: Integration Service does all the real job. It extracts data from sources, processes it as per the business logic and loads data to targets.
  • Repository Service: The Repository Service is the one that understands the content of the repository; it fetches data from the repository and sends it back to the requesting components (mostly the client tools and the Integration Service).
  • PowerCenter Client Tools: The PowerCenter Client consists of multiple tools. They are used to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create workflows to run the mapping logic. The PowerCenter Client connects to the repository through the Repository Service to fetch details. It connects to the Integration Service to start workflows. So essentially client tools are used to code and give instructions to PowerCenter servers.
  • PowerCenter Administration Console: This is simply a web-based administration tool you can use to administer the PowerCenter installation.
[Diagram: Informatica PowerCenter components, including the Repository Manager, Integration Service, and Informatica Domain]
There are some more not-so-essential-to-know components discussed below:

  • Web Services Hub: Web Services Hub exposes PowerCenter functionality to external clients through web services.
  • SAP BW Service: The SAP BW Service extracts data from and loads data to SAP BW.
  • Data Analyzer: Data Analyzer is like a reporting layer to perform analytics on data warehouse or ODS data.
  • Metadata Manager: Metadata Manager is a metadata management tool that you can use to browse and analyze metadata from disparate metadata repositories. It shows how the data is acquired, what business rules are applied and where data is populated in readable reports.
  • PowerCenter Repository Reports: PowerCenter Repository Reports are a set of prepackaged Data Analyzer reports and dashboards to help you analyze and manage PowerCenter metadata.

Informatica Transformations

Normalizer Transformation
Active & Connected. The Normalizer transformation processes multiple-occurring columns or multiple-occurring groups of columns in each source row and returns a row for each instance of the multiple-occurring data. It is used mainly with COBOL sources where most of the time data is stored in de-normalized format.

You can create the following types of Normalizer transformation:
*VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. VSAM stands for Virtual Storage Access Method, a file access method for IBM mainframes.
*Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational tables or flat files. This is the default when you create a Normalizer transformation.
Components: Transformation, Ports, Properties, Normalizer, Metadata Extensions.

Rank Transformation

Active & Connected. It is used to select the top or bottom rank of data. You can use it to return the largest or smallest numeric value in a port or group, or to return the strings at the top or bottom of a session sort order, for example to select the top 10 regions where the sales volume was highest or the 10 lowest-priced products. As an active transformation, it might change the number of rows passed through it: if you pass 100 rows to the Rank transformation but select to rank only the top 10, only those 10 rows pass from the Rank transformation to the next transformation. You can connect ports from only one transformation to the Rank transformation. You can also create local variables and write non-aggregate expressions.
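For intuition only, the "top 10 regions by sales volume" example is roughly what the following SQL computes (the Rank transformation itself is configured in the Designer rather than written as SQL, and the SALES table and its columns here are hypothetical):

SELECT region, sales_volume
FROM (
       SELECT region, SUM(amount) AS sales_volume
       FROM   sales
       GROUP  BY region
       ORDER  BY sales_volume DESC
     )
WHERE  ROWNUM <= 10;   -- keep only the top 10 rows of the sorted set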

Router Transformation

Active & Connected. It is similar to the Filter transformation because both allow you to apply a condition to test data. The difference is that the Filter transformation drops the rows that do not meet the condition, whereas the Router transformation can capture the rows that do not meet any condition and route them to a default output group.
If you need to test the same input data based on multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task. The Router transformation is more efficient.
       

Sequence Generator Transformation
Passive & Connected transformation. It is used to create unique primary key values, cycle through a sequential range of numbers, or replace missing primary keys.

It has two output ports: NEXTVAL and CURRVAL. You cannot edit or delete these ports, and you cannot add ports to the transformation. The NEXTVAL port generates a sequence of numbers when you connect it to a transformation or target. CURRVAL is NEXTVAL plus the Increment By value (with the default Increment By of 1, that is NEXTVAL plus one).
You can make a Sequence Generator reusable, and use it in multiple mappings. You might reuse a Sequence Generator when you perform multiple loads to a single target.
For non-reusable Sequence Generator transformations, Number of Cached Values is set to zero by default, and the Integration Service does not cache values during the session. For non-reusable Sequence Generator transformations, setting Number of Cached Values greater than zero can increase the number of times the Integration Service accesses the repository during the session. It also causes gaps of skipped values, since unused cached values are discarded at the end of each session.

For reusable Sequence Generator transformations, you can reduce Number of Cached Values to minimize discarded values, however it must be greater than one. When you reduce the Number of Cached Values, you might increase the number of times the Integration Service accesses the repository to cache values during the session.

Sorter Transformation

Active & Connected transformation. It is used to sort data in either ascending or descending order according to a specified sort key. You can also configure the Sorter transformation for case-sensitive sorting and specify whether the output rows should be distinct. When you create a Sorter transformation in a mapping, you specify one or more ports as the sort key and configure each sort key port to sort in ascending or descending order.

Source Qualifier Transformation
Active & Connected transformation. When adding a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier is used to join data originating from the same source database, filter rows when the Integration Service reads source data, specify an outer join rather than the default inner join, and specify sorted ports.
It is also used to select only distinct values from the source and to create a custom query that issues a special SELECT statement for the Integration Service to read source data.
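To illustrate what such a SQL override amounts to, the custom query is simply a SELECT that the Integration Service issues against the source (a sketch only; the CUSTOMERS and ORDERS tables and their columns are hypothetical):

SELECT c.customer_id,
       c.customer_name,
       o.order_id,
       o.order_total
FROM   customers c
       LEFT OUTER JOIN orders o
         ON  o.customer_id = c.customer_id        -- outer join instead of the default inner join
         AND o.order_date >= DATE '2010-01-01'    -- filter rows as the source is read
ORDER  BY c.customer_id;                          -- sorted ports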

SQL Transformation
Active/Passive & Connected transformation. The SQL transformation processes SQL queries midstream in a pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database connection information to the SQL transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors.

Stored Procedure Transformation
Passive & Connected or Unconnected transformation. It is useful for automating time-consuming tasks, and it is also used for error handling, to drop and re-create indexes, to determine free space in the database, to perform a specialized calculation, etc. The stored procedure must exist in the database before creating a Stored Procedure transformation, and it can exist in a source, target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script with SQL statements, control statements, user-defined variables, and conditional statements.
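As a small example of the "drop and re-create indexes" use case, the database-side procedure that a Stored Procedure transformation would call might look like this (a sketch only; the procedure and index names are hypothetical):

CREATE OR REPLACE PROCEDURE rebuild_sales_indexes AS
BEGIN
  -- rebuild an index around a large load; a fuller version might drop
  -- and re-create several indexes instead
  EXECUTE IMMEDIATE 'ALTER INDEX sales_fact_dt_idx REBUILD';
END rebuild_sales_indexes;
/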

Transaction Control Transformation
Active & Connected. You can control commit and roll back of transactions based on a set of rows that pass through a Transaction Control transformation. Transaction control can be defined within a mapping or within a session.
Components: Transformation, Ports, Properties, Metadata Extensions.

Union Transformation
Active & Connected. The Union transformation is a multiple input group transformation that you use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources similar to the UNION ALL SQL statement to combine the results from two or more SQL statements. Similar to the UNION ALL statement, the Union transformation does not remove duplicate rows.
Rules
1) You can create multiple input groups, but only one output group.
2) All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all groups.
3) The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add another transformation such as a Router or Filter transformation.
4) You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
5) The Union transformation does not generate transactions.
Components: Transformation tab, Properties tab, Groups tab, Group Ports tab.
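For intuition, the effect on the data is the same as a UNION ALL in SQL (a sketch only; in a mapping the merge is configured through input groups rather than written as SQL, and the table names here are hypothetical):

SELECT customer_id, customer_name FROM customers_north
UNION ALL
SELECT customer_id, customer_name FROM customers_south;
-- Like the Union transformation, UNION ALL keeps duplicate rows;
-- matching ports correspond to matching column lists, datatypes, precision, and scale.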

Unstructured Data Transformation
Active/Passive and connected. The Unstructured Data transformation is a transformation that processes unstructured and semi-structured file formats, such as messaging formats, HTML pages and PDF documents. It also transforms structured formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT.
Components: Transformation, Properties, UDT Settings, UDT Ports, Relational Hierarchy.

Update Strategy Transformation
Active & Connected transformation. It is used to control how rows are applied to the target table, either to maintain a full history of data or only the most recent changes. It flags rows for insert, update, delete, or reject within a mapping.

XML Generator Transformation
Active & Connected transformation. It lets you create XML inside a pipeline. The XML Generator transformation accepts data from multiple ports and writes XML through a single output port.

XML Parser Transformation
Active & Connected transformation. The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases. The XML Parser transformation functionality is similar to the XML source functionality, except it parses the XML in the pipeline.

XML Source Qualifier Transformation
Active & Connected transformation. The XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources, and it has one input or output port for every column in the XML source.

External Procedure Transformation

Active & Connected/Unconnected transformation. Sometimes the standard transformations, such as the Expression transformation, may not provide the functionality you want. In such cases an External Procedure transformation is useful: you can develop complex functions within a dynamic link library (DLL) or UNIX shared library instead of creating the necessary Expression transformations in a mapping.
       

Advanced External Procedure Transformation
Active & Connected transformation. It operates in conjunction with procedures, which are created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful in creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.

SOURCE:http://informatica.techtiks.com/informatica_transformations_3.html

Tuesday, January 11, 2011

Sigmaways Inc

Who we are:
Sigmaways is a high-end consulting and business-driven IT training establishment founded by a team of veterans who are passionate about sharing knowledge.
Our team has more than twenty years of experience in the design, implementation, and management of enterprise systems and in technical training.
Our instructors are the finest in the industry, with prior work experience ranging from management consulting firms to Fortune 500 companies to startups.
We prepare individuals to lead, and to lead with skill. We mentor and train individuals to become the very best in their field of study by providing practical guidance from industry experts.
Our workshop-based training continually assesses student progress through lab exercises, quizzes, assigned presentations, and one-on-one interviews.
Our training is based on real-world business use cases, using the very technologies that industry experts implement in real-world projects.
We provide a suite of technical and business courses to technical and non-technical managers in accelerated formats. Our lab is a virtual training ground where we simulate all IT business scenarios.
Our OBIEE training program will be starting in the first week of October.
Please check the course details for OBIEE training program
Oracle BI Platform (Server Architect / Repository Developer)
Introduction to Dimensional Modeling
Overview of OBIEE Platform
Platform Vs Applications Overview
Installation & Configuration
Building Physical layer
Building Business Model Layer
Building Presentation Layer
Creating sample reports
Using flat files and Databases
Building Fact Measures
Building Dimensional Hierarchies
Time Series
Building Level based measures
Adding multiple sources
Aggregate Awareness
Use of Variables
Using Initialization Blocks
Configuring Object Level Security
Cache Management
Concepts of Partition and Fragmentation
Filtering Data
Best practices
Implementation Procedures
Oracle BI Report Developer
Introduction to Dimensional Modeling
Overview of OBIEE Platform
Platform Vs Applications Overview
Installation & Configuration
Exploring Answers
Creating Reports
Type of Reports
Guided Navigation
Manage Web Administration
Configuring Delivers / iBots
Best practices
Oracle BI Application Tools (Ware House Developer)
Introduction to Data warehouse
Overview of OBIEE Platform
Platform Vs Applications Overview
Introduction to Informatica
Introduction to Data Warehouse Admin Console (DAC)
Installing & Configuring Informatica
Creating Informatica Mappings
Creating Informatica Workflows
Extracting and Loading the Data Warehouse Using Informatica
Data Modelling
Extract Mappings (SDE)
Change Data Capture (CDC)
Executing Mappings (Extract into Staging)
Load Mappings (SIL)
Load into Data warehouse Targets
Using Mapping Parameters & Session Parameters
Introduction to Slowly changing Dimensions
Creating SCD Mappings
Introduction to Aggregates
Installing and Configuring DAC
Exploring DAC Meta Data
Building Custom DAC Meta Data (Stand alone)
Running ETL using DAC
Best Practices
Implementation Procedures
Exploring Oracle BI Applications (DW Application Developer)
Introduction to Data warehouse
Overview of OBIEE Platform
Platform Vs Applications Overview
Installation & Configuration
Exploring OOTB Informatica Metadata
Exploring DAC Meta Data
System Configuration
Source Specific Configuration (EBS)
Functional Configuration (Order Management)
Running ETL Workshop
Exploring OOTB Repository
Exploring OOTB Reports / Dashboards
Upgrade Best Practices
Implementation Procedures
The above course can also be divided into 2 separate programs
1. OBIEE Platform
2. OBIEE Applications
The fee for each of these programs will be $750.
If you want to take both programs there will be around a 20% discount, and the fee will be $1200. The batch starts soon, so please get in touch with me ASAP.

source:http://www.training-classes.com/providers/00/68/6859_sigmaways_inc.php

8 Understanding Performance and Advanced ETL Concepts

This chapter includes the following topics:

Best Practices for Designing PL/SQL Mappings

This section addresses PL/SQL mapping design and includes:
Warehouse Builder generates code for PL/SQL mappings that meet the following criteria:
  • The output code of each operator satisfies the input code requirement of its next downstream operator.
  • If the mapping contains an operator that generates only PL/SQL output, all downstream dataflow operators must also be implemented by PL/SQL. You can use SQL operators in such a mapping only after loading the PL/SQL output to a target.
As you design a mapping, you can evaluate its validity by taking note of the input and output code types for each operator in the mapping. For example, you can see that the mapping in Figure 8-1 is invalid because the Match-Merge operator MM generates PL/SQL output but the subsequent Join operator accepts SQL input only.
Figure 8-1 Mapping Violates Input Requirement for Join Operator
To achieve the desired results for the mapping, consider joining the source tables before performing the Match-Merge, as shown in Figure 8-2, or loading the results from the Match-Merge to a staging table before performing the join, as shown in Figure 8-3.
Figure 8-2 Valid Mapping Design with Sources Joined Before Match-Merge
Figure 8-3 Valid Mapping Design with Staging Table
Table 8-1 and Table 8-2 list the implementation types for each Warehouse Builder operator. These tables also indicate whether or not PL/SQL code includes the operation associated with the operator in the cursor. This information is relevant in determining which operating modes are valid for a given mapping design. It also determines what auditing details are available during error handling.
Table 8-1 Source-Target Operators Implementation in PL/SQL Mappings

Operator | Implementation Types | Valid in Set Based Mode | Valid in Row Based Mode | Valid in Row Based (Target Only) Mode
Source Operators (Tables, Cubes, Views, External Tables) | SQL | Yes | Yes | Yes; part of cursor
Target Operators (Tables, Cubes, Views) | SQL, PL/SQL | Yes, except when loading = DELETE, or loading = UPDATE and the database is not 10g or higher | Yes | Yes; not part of cursor
Flat File as source | For PL/SQL, create an External Table | Yes | Yes | Yes; part of cursor
Flat File as target | SQL | Yes, except when loading = DELETE, or loading = UPDATE and the database is not 10g or higher | Yes | Yes; not part of cursor
Advanced Queue as source | SQL | Yes | Yes | Yes; part of cursor
Advanced Queue as target | SQL | Yes, except when loading = DELETE, or loading = UPDATE and the database is not 10g or higher | Yes | Yes; not part of cursor
Sequence as source | SQL | Yes | Yes | Yes; part of cursor
Table 8-2 Data Flow Operator Implementation in PL/SQL Mappings

Operator Name | Implementation Types | Valid in Set Based Mode | Valid in Row Based Mode | Valid in Row Based (Target Only) Mode
Aggregator | SQL | Yes | Yes, only if part of the cursor | Yes, only if part of the cursor
Constant | PL/SQL, SQL | Yes | Yes | Yes
Data Generator | SQL*Loader only | N/A | N/A | N/A
Deduplicator | SQL | Yes | Yes, only if part of the cursor | Yes, only if part of the cursor
Expression | SQL, PL/SQL | Yes | Yes | Yes
Filter | SQL, PL/SQL | Yes | Yes | Yes
Joiner | SQL | Yes | Yes, only if part of the cursor | Yes, only if part of the cursor
Key Lookup | SQL | Yes | Yes, only if part of the cursor | Yes, only if part of the cursor
Mapping Input Parameter | SQL, PL/SQL | Yes | Yes | Yes
Mapping Output Parameter | SQL, PL/SQL | Yes | Yes | Yes
Match-Merge | SQL input; PL/SQL output (PL/SQL input from XREF group only) | No | Yes | Yes; not part of cursor
Name-Address | PL/SQL | No | Yes | Yes; not part of cursor
Pivot | SQL, PL/SQL | Yes | Yes | Yes
Post-Mapping Process | Irrelevant | Yes, independent of dataflow | Yes | Yes
Pre-Mapping Process | Irrelevant | Yes, independent of dataflow | Yes | Yes
Set | SQL | Yes | Yes, only if part of the cursor | Yes, only if part of the cursor
Sorter | SQL | Yes | Yes, only if part of the cursor | Yes, as part of the cursor
Splitter | SQL, PL/SQL | Yes | Yes | Yes
Table Function | SQL or PL/SQL input; SQL output only | Yes | Yes | Yes
Transformation as a procedure | PL/SQL | No | Yes | Yes; not part of cursor
Transformation as a function that does not perform DML | SQL, PL/SQL | Yes | Yes | Yes, included in the cursor

Set Based Versus Row Based Operating Modes

For mappings with a PL/SQL implementation, select one of the following operating modes:
The default operating mode you select depends upon the performance you expect, the amount of auditing data you require, and how you design the mapping. Mappings have at least one and as many as three valid operating modes, excluding the options for failing over to row based modes. During code generation, Warehouse Builder generates code for the specified default operating mode as well as the deselected modes. Therefore, at runtime, you can select to run in the default operating mode or any one of the other valid operating modes.
The types of operators in the mapping may limit the operating modes you can select. As a general rule, mappings run in set based mode can include any of the operators except for Match-Merge, Name-Address, and Transformations used as procedures. Although you can include any of the operators in row based and row based (target only) modes, there are important restrictions on how you use SQL based operators such as Aggregators, Joins, and Key Lookups. To use SQL based operators in either of the row based modes, ensure that the operation associated with the operator can be included in the cursor.
These general rules are explained in the following sections.

Set based

In set based mode, Warehouse Builder generates a single SQL statement that processes all data and performs all operations. Although processing data as a set improves performance, the available auditing information is limited. Runtime auditing is limited to reporting the execution error only; in set based mode, you cannot view details on which rows contain errors.
Figure 8-4 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in set based operating mode. TAB1, FLTR, and TAB2 are processed as a set using SQL.
Figure 8-4 Simple Mapping Run in Set Based Mode
To correctly design a mapping for the set based mode, avoid operators that require row by row processing such as Match-Merge and Name-Address operators. If you include an operator in the dataflow that cannot be performed in SQL, Warehouse Builder does not generate set based code and issues an error when you execute the package in set based mode.
For target operators in a mapping, the loading types INSERT/UPDATE and UPDATE/INSERT are always valid for set based mode. Warehouse Builder supports UPDATE loading in set based mode only when the Oracle Database is 10g or higher. Warehouse Builder does not support DELETE loading in set based mode. For a complete listing of how Warehouse Builder handles operators in set based mappings, see Table 8-2.
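To make the idea concrete, the set based code for the simple mapping in Figure 8-4 boils down to a single SQL statement of roughly this shape (a sketch only, not the actual generated code; TAB1, TAB2, the columns, and the filter condition are placeholders taken from the figure):

INSERT INTO tab2 (col1, col2)
SELECT col1, col2
FROM   tab1
WHERE  col2 > 0;   -- the FLTR operator becomes part of the single SQL statement

All rows are processed by this one statement, which is why only limited per-row audit detail is available.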

Row based

In row based mode, Warehouse Builder generates statements that process data row by row. The select statement is in a SQL cursor. All subsequent statements are PL/SQL. You can access full runtime auditing information for all operators performed in PL/SQL and only limited information for operations performed in the cursor.
Figure 8-5 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in row based operating mode. TAB1 is included in the cursor and processed as a set using SQL. FLTR and TAB2 are processed row by row using PL/SQL.
Figure 8-5 Simple Mapping Run in Row Based Mode
If the mapping includes any SQL based operators, which cannot be performed in PL/SQL, Warehouse Builder attempts to generate code with those operations in the cursor. To generate valid row based code, design your mapping such that, if you include any of the following SQL based operators, Warehouse Builder can include the operations in the cursor:
  • Aggregation
  • Deduplicator
  • Join
  • Key Lookup
  • Sequence
  • Set
  • Sorter
For the preceding operators to be included in the cursor, do not directly precede them with an operator that generates PL/SQL code. In other words, you cannot run the mapping in row based mode if it contains a Transformation implemented as a procedure, a Flat File used as a source, or a Match-Merge or Name-Address operator directly followed by any of the seven SQL based operators. For the design to be valid, include a staging table between the PL/SQL generating operator and the SQL based operator.
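Conceptually, the row based code for the mapping in Figure 8-5 looks like a cursor loop of roughly this shape (a sketch only, not the actual generated code; TAB1, TAB2, the columns, and the filter logic are placeholders):

DECLARE
  CURSOR src_cur IS
    SELECT col1, col2 FROM tab1;   -- TAB1 stays in the cursor (SQL)
BEGIN
  FOR r IN src_cur LOOP
    -- FLTR is applied row by row in PL/SQL, so full audit detail is available here
    IF r.col2 > 0 THEN
      INSERT INTO tab2 (col1, col2) VALUES (r.col1, r.col2);
    END IF;
  END LOOP;
END;
/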

Row based (Target Only)

In row based (Target Only) mode, Warehouse Builder generates a cursor select statement and attempts to include as many operations as possible in the cursor. For each target, Warehouse Builder inserts each row into the target separately. You can access full runtime auditing information for all operators performed in PL/SQL and only limited information for operations performed in the cursor. Use this mode when you expect fast set based operations to extract and transform the data but need extended auditing for loading the data, which is where errors are most likely to occur.
Figure 8-6 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in row based (target only) operating mode. TAB1 and FLTR are included in the cursor and processed as a set using SQL. TAB2 is processed row by row.
Figure 8-6 Simple Mapping Run in Row Based (Target Only) Mode
Row based (target only) places the same restrictions on SQL based operators as the row based operating mode. Additionally, for mappings with multiple targets, Warehouse Builder generates code with a cursor for each target.

About Committing Data in Warehouse Builder

There are two major approaches to committing data in Warehouse Builder. You can commit or roll back data based on the mapping design; to do this, use one of the commit control methods described in "Committing Data Based on Mapping Design".
Alternatively, for PL/SQL mappings, you can commit or roll back data independently of the mapping design. Use a process flow to commit the data or establish your own method as described in "Committing Data Independently of Mapping Design".

Committing Data Based on Mapping Design

By default, Warehouse Builder loads and then automatically commits data based on the mapping design. For PL/SQL mappings you can override the default setting and control when and how Warehouse Builder commits data. You have the following options for committing data in mappings:
Automatic: This is the default setting and is valid for all mapping types. Warehouse Builder loads and then automatically commits data based on the mapping design. If the mapping has multiple targets, Warehouse Builder commits and rolls back each target separately and independently of other targets. Use the automatic commit when the consequences of multiple targets being loaded unequally are not great or are irrelevant.
Automatic Correlated: Automatic correlated commit is a specialized type of automatic commit that applies to PL/SQL mappings with multiple targets only. Warehouse Builder considers all targets collectively and commits or rolls back data uniformly across all targets. Use the correlated commit when it is important to ensure that every row in the source impacts all affected targets uniformly. For more information about correlated commit, see "Committing Data from a Single Source to Multiple Targets".
Manual: Select manual commit control for PL/SQL mappings when you want to interject complex business logic, perform validations, or run other mappings before committing data. For examples, see "Embedding Commit Logic into the Mapping" and "Committing Data Independently of Mapping Design".

Committing Data from a Single Source to Multiple Targets

If you want to populate multiple targets based on a common source, you may also want to ensure that every row from the source impacts all affected targets uniformly.
Figure 8-7 shows a PL/SQL mapping that illustrates this case. The target tables all depend upon the source table. If a row from SOURCE causes changes in multiple targets, for instance TARGET_1 and TARGET_2, then Warehouse Builder should commit the appropriate data to both affected targets at the same time. If this relationship is not maintained when you run the mapping again, the data can become inaccurate and possibly unusable.
Figure 8-7 Mapping with Multiple Targets Dependent on One Source
If the number of rows from SOURCE is relatively small, maintaining the three targets may not be difficult. Manually maintaining targets dependent on a common source, however, becomes more tedious as you increase the number of rows from the source, or as you design more complex mappings with more targets and transformations.
To ensure that every row in the source properly impacts every target, configure the mapping to use the correlated commit strategy.
Using the Automatic Correlated Commit Strategy
In set based mode, correlated commit may impact the size of your rollback segments. Space for rollback segments may be a concern when you merge data (insert/update or update/insert).
Correlated commit operates transparently with PL/SQL bulk processing code.
The correlated commit strategy is not available for mappings run in any mode that are configured for Partition Exchange Loading or include an Advanced Queue, Match-Merge, or Table Function operator.

Automatic Commit versus Automatic Correlated Commit

The combination of the commit strategy and operating mode determines mapping behavior. Table 8-3 shows the valid combinations you can select.
Table 8-3 Valid Commit Strategies for Operating Modes

Operating Mode | Automatic Correlated Commit | Automatic Commit
Set based | Valid | Valid
Row based | Valid | Valid
Row based (target only) | Not Applicable | Valid
Correlated commit is not applicable for row based (target only). By definition, this operating mode places the cursor as close to the target as possible. In most cases, this results in only one target for each select statement and negates the purpose of committing data to multiple targets. If you design a mapping with the row based (target only) and correlated commit combination, Warehouse Builder runs the mapping but does not perform the correlated commit.
To understand the effects each operating mode and commit strategy combination has on a mapping, consider the mapping from Figure 8-7. Assume the data from source table equates to 1,000 new rows. When the mapping runs successfully, Warehouse Builder loads 1,000 rows to each of the targets. If the mapping fails to load the 100th new row to Target_2, you can expect the following results, ignoring the influence from other configuration settings such as Commit Frequency and Number of Maximum Errors:
  • Set based/ Correlated Commit: A single error anywhere in the mapping triggers the rollback of all data. When Warehouse Builder encounters the error inserting into Target_2, it reports an error for the table and does not load the row. Warehouse Builder rolls back all the rows inserted into Target_1 and does not attempt to load rows to Target_3. No rows are added to any of the target tables. For error details, Warehouse Builder reports only that it encountered an error loading to Target_2.
  • Row based/ Correlated Commit: Beginning with the first row, Warehouse Builder evaluates each row separately and loads it to all three targets. Loading continues in this way until Warehouse Builder encounters an error loading row 100 to Target_2. Warehouse Builder reports the error and does not load the row. It rolls back the row 100 previously inserted into Target_1 and does not attempt to load row 100 to Target_3. Next, Warehouse Builder continues loading the remaining rows, resuming with loading row 101 to Target_1. Assuming Warehouse Builder encounters no other errors, the mapping completes with 999 new rows inserted into each target. The source rows are accurately represented in the targets.
  • Set based/ Automatic Commit: When Warehouse Builder encounters the error inserting into Target_2, it does not load any rows and reports an error for the table. It does, however, continue to insert rows into Target_3 and does not roll back the rows from Target_1. Assuming Warehouse Builder encounters no other errors, the mapping completes with one error message for Target_2, no rows inserted into Target_2, and 1,000 rows inserted into Target_1 and Target_3. The source rows are not accurately represented in the targets.
  • Row based/Automatic Commit: Beginning with the first row, Warehouse Builder evaluates each row separately for loading into the targets. Loading continues in this way until Warehouse Builder encounters an error loading row 100 to Target_2 and reports the error. Warehouse Builder does not roll back row 100 from Target_1, does insert it into Target_3, and continues to load the remaining rows. Assuming Warehouse Builder encounters no other errors, the mapping completes with 999 rows inserted into Target_2 and 1,000 rows inserted into each of the other targets. The source rows are not accurately represented in the targets.

Embedding Commit Logic into the Mapping

For PL/SQL mappings only, you can embed commit logic into the mapping design by adding a pre- or post-mapping operator with SQL statements to commit and roll back data. When you run the mapping, Warehouse Builder commits or rolls back data based solely on the SQL statements you provide in the pre- or post-mapping operator.
Use these instructions to implement a business rule that is tedious or impossible to design given existing Warehouse Builder mapping operators. For example, you may want to verify the existence of a single row in a target. Write the required logic in SQL and introduce that logic to the mapping through a pre or post mapping operator.
To include commit logic in the mapping design:
  1. Design the mapping to include a pre or post mapping operator. Use one of these operators to introduce commit and rollback SQL statements.
  2. Configure the mapping with Commit Control set to Manual.
    In the Project Explorer, right-click the mapping and select Configure. Under Code Generation Options, set Commit Control to Manual.
    To understand the implications of selecting to commit data manually, refer to "About Manual Commit Control".
  3. Deploy the mapping.
  4. Run the mapping.
    Warehouse Builder executes the mapping but does not commit data until processing the commit logic you wrote in the pre or post mapping operator.
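As a rough illustration of the kind of logic step 1 refers to, the single-row verification mentioned above could be implemented in a procedure invoked from a post-mapping operator (a minimal sketch; the procedure name, the LOAD_CONTROL table, and its columns are hypothetical, and the actual check depends on your business rule):

CREATE OR REPLACE PROCEDURE post_map_commit_check AS
  v_cnt NUMBER;
BEGIN
  -- verify the business rule, e.g. that exactly one control row was loaded
  SELECT COUNT(*) INTO v_cnt FROM load_control WHERE batch_status = 'OK';
  IF v_cnt = 1 THEN
    COMMIT;
  ELSE
    ROLLBACK;
  END IF;
END post_map_commit_check;
/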

Committing Data Independently of Mapping Design

You may want to commit data independently of the mapping design for any of the following reasons:
  • Running Multiple Mappings Before Committing Data: You may want to run multiple mappings without committing data until successfully running and validating all mappings. This can be the case when you have separate mappings for loading dimensions and cubes.
  • Maintaining targets more efficiently: If bad data is loaded and committed to a very large target, it can be difficult and time consuming to repair the damage. To avoid this, first check the data and then decide whether to issue a commit or rollback command.
The first step to achieve these goals is to configure the mapping with commit control set to Manual.
About Manual Commit Control
Manual commit control enables you to specify when Warehouse Builder commits data regardless of the mapping design. Manual commit control does not affect auditing statistics. This means that you can view the number of rows inserted and other auditing information before issuing the commit or rollback command.
When using manual commit, be aware that this option may have performance implications. Mappings that you intend to run in parallel may be executed serially if the design requires a target to be read after being loaded. This occurs when moving data from a remote source or loading to two targets bound to the same table.
When you enable manual commit control, Warehouse Builder runs the mapping with Partition Exchange Loading (PEL) switched off.

Running Multiple Mappings Before Committing Data

This section provides two sets of instructions for committing data independent of the mapping design. The first set describes how to run mappings and then commit data in a SQL Plus session. Use these instructions to test and debug your strategy of running multiple mappings and then committing the data. Then, use the second set of instructions to automate the strategy.
Both sets of instructions rely upon the use of the main procedure generated for each PL/SQL mapping.
Main Procedure
The main procedure is a procedure that exposes the logic for launching mappings in Warehouse Builder. You can employ this procedure in PL/SQL scripts or use it in interactive SQL Plus sessions.
When you use the main procedure, you must specify one required parameter, p_status. And you can optionally specify other parameters relevant to the execution of the mapping as described in Table 8-4. Warehouse Builder uses the default setting for any optional parameters that you do not specify.
Table 8-4 Parameters for the Main Procedure

p_status: Use this required parameter to write the status of the mapping upon completion. It operates in conjunction with the predefined variable called status. The status variable is defined such that OK indicates the mapping completed without errors, OK_WITH_WARNINGS indicates the mapping completed with user errors, and FAILURE indicates the mapping encountered a fatal error.
p_operating_mode: Use this optional parameter to pass in the Default Operating Mode, such as SET_BASED.
p_bulk_size: Use this optional parameter to pass in the Bulk Size.
p_audit_level: Use this optional parameter to pass in the Default Audit Level, such as COMPLETE.
p_max_no_of_errors: Use this optional parameter to pass in the permitted Maximum Number of Errors.
p_commit_frequency: Use this optional parameter to pass in the Commit Frequency.

Committing Data at Runtime

For PL/SQL mappings only, you can run mappings and issue commit and rollback commands from the SQL Plus session. Based on your knowledge of SQL Plus and the Main Procedure, you can manually run and validate multiple mappings before committing data.
To commit data manually at runtime:
  1. Design the PL/SQL mappings. For instance, create one mapping to load dimensions and create a separate mapping to load cubes.
    These instructions are not valid for SQL*Loader and ABAP mappings.
  2. Configure both mappings with Commit Control set to Manual.
    In the Project Explorer, right-click the mapping and select Configure. Under Code Generation Options, set Commit Control to Manual.
  3. Generate each mapping.
  4. From a SQL Plus session, issue the following command to execute the first mapping called map1 in this example:
    var status varchar2(30);
    execute map1.main(:status);
    The first line declares the predefined status variable described in Table 8-4. In the second line, p_status is set to the status variable. When map1 completes, SQL Plus displays the mapping status such as OK.
  5. Execute the second mapping, in this example, the cubes mappings called map2.
    You can run the second in the same way you ran the previous map. Or, you can supply additional parameters listed in Table 8-4 to dictate how to run the map2 in this example:
    execute map2.main (p_status => :status, -
      p_operating_mode => 'SET_BASED', -
      p_audit_level => 'COMPLETE');
  6. Verify the results from the execution of the two mappings and send either the commit or rollback command.
  7. Automate your commit strategy as described in "Committing Mappings Using the Process Flow Editor".

Committing Mappings Using the Process Flow Editor

For PL/SQL mappings only, you can commit or roll back mappings together. Based on your knowledge of the Sqlplus activity, the Main Procedure, and writing PL/SQL scripts, you can use process flows to automate logic that commits data after all mappings complete successfully or rolls back the data if any mapping fails.
To commit multiple mappings using a process flow:
  1. Design the PL/SQL mappings.
    These instructions are not valid for SQL*Loader and ABAP mappings.
  2. Ensure each mapping is deployed to the same schema.
    All mappings must have their locations pointing to the same schema. You can achieve this by designing the mappings under the same target module. Or, for multiple target modules, ensure that the locations point to the same schema.
  3. Configure each mapping with Commit Control set to Manual.
    In the Project Explorer, right-click the mapping and select Configure. Under Code Generation Options, set Commit Control to Manual.
  4. Design a process flow using a sqlplus activity instead of multiple mapping activities.
    In typical process flows, you add a mapping activity for each mapping and the process flow executes an implicit commit after each mapping activity. However, in this design, do not add mapping activities. Instead, add a single sqlplus activity.
  5. Write a PL/SQL script that uses the main procedure to execute each mapping. The following script demonstrates how to run the next mapping only if the initial mapping succeeds.
    declare
      status varchar2(30);
    begin
      map1.main(status);
      if status != 'OK' then
        rollback;
      else
        map2.main(status);
        if status != 'OK' then
          rollback;
        else
          commit;
        end if;
      end if;
    end;
    /
  6. Paste your PL/SQL script into the sqlplus activity.
    In the editor explorer, select SCRIPT under the sqlplus activity and then double-click Value in the object inspector shown in Figure 8-8.
    Figure 8-8 Specifying a Script in the Sqlplus Activity
  7. Optionally apply a schedule to the process flow as described in "Process for Defining and Using Schedules".
  8. Deploy the mappings, process flow, and schedule if you defined one.

Ensuring Referential Integrity in PL/SQL Mappings

When you design mappings with multiple targets, you may want to ensure that Warehouse Builder loads the targets in a specific order. This is the case when a column in one target derives its data from another target.
To ensure referential integrity in PL/SQL mappings:
  1. Design a PL/SQL mapping with multiple targets.
  2. Optional step: Define a parent/child relationship between two of the targets by specifying a foreign key.
    A foreign key in the child table must refer to a primary key in the parent table. If the parent does not have a column defined as a primary key, you must add a column and define it as the primary key. For an example of how to do this, see "Using Conventional Loading to Ensure Referential Integrity in SQL*Loader Mappings"; a minimal SQL sketch of such a parent/child constraint also follows these steps.
  3. In the mapping properties, view the Target Load Order property.
    If you defined a foreign key relationship in the previous step, Warehouse Builder calculates a default loading order that loads parent targets before children. If you did not define a foreign key, use the Target Load Order dialog shown in Figure 8-9 to define the loading order.
    Figure 8-9 Target Load Order Dialog
    For more information, see "Target Load Order".
  4. Ensure that the Use Target Load Ordering configuration property is set to its default value of true.
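As a rough illustration of step 2, the parent/child relationship is an ordinary primary key/foreign key pair on the target tables (a sketch only; the DIM_CUSTOMER and FACT_ORDERS tables and their columns are hypothetical):

CREATE TABLE dim_customer (
  customer_key  NUMBER PRIMARY KEY,
  customer_name VARCHAR2(100)
);

CREATE TABLE fact_orders (
  order_key    NUMBER PRIMARY KEY,
  customer_key NUMBER,
  order_total  NUMBER,
  CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_key) REFERENCES dim_customer (customer_key)
);

-- With the foreign key in place, the default Target Load Order loads
-- DIM_CUSTOMER (the parent) before FACT_ORDERS (the child).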

Best Practices for Designing SQL*Loader Mappings

This section includes the following topics:

Using Conventional Loading to Ensure Referential Integrity in SQL*Loader Mappings

If you are extracting data from a multiple-record-type file with a master-detail structure and mapping to tables, add a Mapping Sequence operator to the mapping to retain the relationship between the master and detail records through a surrogate primary key or foreign key relationship. A master-detail file structure is one where a master record is followed by its detail records. In Example 8-1, records beginning with "E" are master records with Employee information and records beginning with "P" are detail records with Payroll information for the corresponding employee.
Example 8-1 A Multiple-Record-Type Flat File with a Master-Detail Structure
E 003715 4 153 09061987 014000000 "IRENE HIRSH" 1 08500
P 01152000 01162000 00101 000500000 000700000
P 02152000 02162000 00102 000300000 000800000
E 003941 2 165 03111959 016700000 "ANNE FAHEY" 1 09900 
P 03152000 03162000 00107 000300000 001000000
E 001939 2 265 09281988 021300000 "EMILY WELLMET" 1 07700
P 01152000 01162000 00108 000300000 001000000
P 02152000 02162000 00109 000300000 001000000
In Example 8-1, the relationship between the master and detail records is inherent only in the physical record order: payroll records correspond to the employee record they follow. However, if this is the only means of relating detail records to their masters, this relationship is lost when Warehouse Builder loads each record into its target table.

Maintaining Relationships Between Master and Detail Records

You can maintain the relationship between master and detail records if both types of records share a common field. If Example 8-1 contains a field Employee ID in both Employee and Payroll records, you can use it as the primary key for the Employee table and as the foreign key in the Payroll table, thus associating Payroll records to the correct Employee record.
However, if your file does not have a common field that can be used to join master and detail records, you must add a sequence column to both the master and detail targets (see Table 8-5 and Table 8-6) to maintain the relationship between the master and detail records. Use the Mapping Sequence operator to generate this additional value.
Table 8-5 represents the target table containing the master records from the file in Example 8-1. The target table for the master records in this case contains employee information. Columns E1-E10 contain data extracted from the flat file. Column E11 is the additional column added to store the master sequence number. Notice that the number increments by one for each employee.
Table 8-5 Target Table Containing Master Records
E1  E2      E3  E4   E5        E6         E7       E8       E9  E10    E11
E   003715  4   153  09061987  014000000  "IRENE   HIRSH"   1   08500  1
E   003941  2   165  03111959  016700000  "ANNE    FAHEY"   1   09900  2
E   001939  2   265  09281988  021300000  "EMILY   WELSH"   1   07700  3
Table 8-6 represents the target table containing the detail records from the file in Example 8-1. The target table for the detail records in this case contains payroll information, with one or more payroll records for each employee. Columns P1-P6 contain data extracted from the flat file. Column P7 is the additional column added to store the detail sequence number. Notice that the number for each payroll record matches the corresponding employee record in Table 8-5.
Table 8-6 Target Table Containing Detail Records
P1  P2        P3        P4     P5         P6         P7
P   01152000  01162000  00101  000500000  000700000  1
P   02152000  02162000  00102  000300000  000800000  1
P   03152000  03162000  00107  000300000  001000000  2
P   01152000  01162000  00108  000300000  001000000  3
P   02152000  02162000  00109  000300000  001000000  3
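As a rough sketch only (Warehouse Builder creates or reconciles these objects for you), target tables shaped like Table 8-5 and Table 8-6 could be declared as follows. The table names and data types are assumptions; E11 and P7 are the added sequence columns.

CREATE TABLE EMPLOYEE_MASTER (
  E1  CHAR(1),        -- record type indicator ('E')
  E2  VARCHAR2(10),
  E3  VARCHAR2(10),
  E4  VARCHAR2(10),
  E5  VARCHAR2(10),
  E6  VARCHAR2(10),
  E7  VARCHAR2(20),
  E8  VARCHAR2(20),
  E9  VARCHAR2(10),
  E10 VARCHAR2(10),
  E11 NUMBER          -- added column for the Mapping Sequence NEXTVAL value
);

CREATE TABLE PAYROLL_DETAIL (
  P1 CHAR(1),         -- record type indicator ('P')
  P2 VARCHAR2(10),
  P3 VARCHAR2(10),
  P4 VARCHAR2(10),
  P5 VARCHAR2(10),
  P6 VARCHAR2(10),
  P7 NUMBER           -- added column for the Mapping Sequence CURRVAL value
);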

Extracting and Loading Master-Detail Records

This section contains instructions on creating a mapping that extracts records from a master-detail flat file and loads those records into two different tables. One target table stores the master records and the other stores the detail records from the flat file. The Mapping Sequence operator is used to maintain the master-detail relationship between the two tables.
Note:
These instructions are for conventional path loading. For instructions on using direct path loading for master-detail records, see "Using Direct Path Loading to Ensure Referential Integrity in SQL*Loader Mappings".
This procedure outlines the general steps for building such a mapping. Additional detailed instructions are available in the related topics referenced throughout the steps.
To extract from a master-detail flat file and maintain master-detail relationships:
  1. Import and sample a flat file source that consists of master and detail records.
    When naming the record types as you sample the file, assign descriptive names to the master and detail records, as shown in Figure 8-10. This makes it easier to identify those records in the future.
    Figure 8-10 shows the Flat File Sample Wizard for a multiple-record-type flat file containing department and employee information. The master record type (for employee records) is called EmployeeMaster, while the detail record type (for payroll information) is called PayrollDetail.
Figure 8-10 Naming Flat File Master and Detail Record Types
  2. Drop a Mapping Flat File operator onto the mapping editor canvas and specify the master-detail file from which you want to extract data.
  3. Drop a Mapping Sequence operator onto the mapping canvas.
  4. Drop a Table operator for the master records onto the mapping canvas.
    You can either select an existing repository table that you created earlier or create a new unbound table operator with no attributes. You can then map or copy all required fields from the master record of the file operator to the master table operator (creating columns) and perform an outbound reconciliation to define the table later.
    The table must contain all the columns required for the master fields you want to load plus an additional numeric column for loading sequence values, as shown in Figure 8-11.
Figure 8-11 Adding a Sequence Column to the Master and Detail Target Tables
  5. Drop a Table operator for the detail records onto the mapping canvas.
    You can either select an existing repository table that you created earlier or create a new unbound table operator with no attributes. You can then map or copy all required fields from the detail record of the file operator to the detail table operator (creating columns) and perform an outbound synchronize to define the table later.
    The table must contain all the columns required for the detail fields you want to load plus an additional numeric column for loading sequence values.
  6. Map all of the necessary flat file master fields to the master table and detail fields to the detail table, as shown in Figure 8-12.
  7. Map the Mapping Sequence NEXTVAL attribute to the additional sequence column in the master table, as shown in Figure 8-12.
  8. Map the Mapping Sequence CURRVAL attribute to the additional sequence column in the detail table, as shown in Figure 8-12.
    Figure 8-12 shows a completed mapping with the flat file master fields mapped to the master target table, the detail fields mapped to the detail target table, and the NEXTVAL and CURRVAL attributes from the Mapping Sequence mapped to the master and detail target tables, respectively.
Figure 8-12 Completed Mapping from Master-Detail Flat File to Two Target Tables
  9. Configure the mapping with the following parameters (a control-file sketch follows):
    Direct Mode: False
    Errors Allowed: 0
    Row: 1
    Trailing Nullcols: True (for all tables)
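These settings correspond roughly to the following SQL*Loader control-file options; this is a sketch, not the exact code that Warehouse Builder generates.

OPTIONS (DIRECT=FALSE, ERRORS=0, ROWS=1)
-- ...with TRAILING NULLCOLS specified in each INTO TABLE clause, as in the
-- generated control file example shown later in this section.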

Error Handling Suggestions

This section contains error handling recommendations for files with varying numbers of errors.
If your data file almost never contains errors:
  1. Create a mapping with a Sequence operator (see "Sequence Operator").
  2. Configure the mapping with the following parameters:
    Direct Mode = false
    ROW = 1
    ERRORS ALLOWED = 0
  3. Generate the code and run the SQL*Loader script.
    If the data file has errors, loading stops at the first error.
  4. Fix the data file and run the control file again with the following configuration values:
    CONTINUE_LOAD=TRUE
    SKIP=number of records already loaded
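For example, if the first run loaded 150 records before stopping (a hypothetical count), the re-run could be driven by an OPTIONS clause such as the following sketch; SKIP bypasses the rows that are already in the target.

OPTIONS (SKIP=150, ERRORS=0, ROWS=1)
-- Loading resumes with the record that caused the failure, after it is corrected.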
If your data file is likely to contain a moderate number of errors:
  1. Create a primary key (PK) for the master record based on the seq_nextval column.
  2. Create a foreign key (FK) for the detail record based on the seq_currval column, which references the master table PK (see the DDL sketch after this list).
    In this case, master records with errors will be rejected with all their detail records. You can recover these records by following these steps.
  3. Delete all failed detail records that have no master records.
  4. Fix the errors in the bad file and reload only those records.
  5. If there are very few errors, you may choose to load the remaining records and manually update the table with correct sequence numbers.
  6. In the log file, you can identify the records that were rejected because they violate an integrity constraint. The following is an example of log file records with errors:
    Record 9: Rejected - Error on table "MASTER_T", column "C3". 
    ORA-01722: invalid number 
    Record 10: Rejected - Error on table "DETAIL1_T". 
    ORA-02291: integrity constraint (SCOTT.FK_SEQ) violated - parent key not found
    Record 11: Rejected - Error on table "DETAIL1_T". 
    ORA-02291: integrity constraint (SCOTT.FK_SEQ) violated - parent key not found 
    Record 21: Rejected - Error on table "DETAIL2_T". 
    ORA-02291: integrity constraint (SCOTT.FK_SEQ) violated - parent key not found
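A minimal DDL sketch for the keys described in steps 1 and 2, reusing the MASTER_T and DETAIL1_T table names and the FK_SEQ constraint from the log excerpt; the SEQ_COL column name is an assumption.

ALTER TABLE MASTER_T
  ADD CONSTRAINT MASTER_T_PK PRIMARY KEY (SEQ_COL);

ALTER TABLE DETAIL1_T
  ADD CONSTRAINT FK_SEQ FOREIGN KEY (SEQ_COL)
  REFERENCES MASTER_T (SEQ_COL);

-- Detail rows whose master record was rejected then fail with ORA-02291
-- (parent key not found), as shown in the log excerpt above.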
    
If your data file always contains many errors:
  1. Load all records without using the Mapping Sequence operator.
    Load the records into independent tables. You can load the data in Direct Mode, with the following parameters that increase loading speed:
    ROW>1
    ERRORS ALLOWED=MAX
  2. Correct all rejected records.
  3. Reload the file again with a Sequence operator (see "Sequence Operator").

Subsequent Operations

After the initial loading of the master and detail tables, you can use the loaded sequence values to further transform, update, or merge master table data with detail table data. For example, if your master records have a column that acts as a unique identifier (such as an Employee ID), and you want to use it as the key to join master and detail rows (instead of the sequence field you added for that purpose), you can update the detail table(s) to use this unique column. You can then drop the sequence column you created for the purpose of the initial load. Operators such as the Aggregator, Filter, or Match and Merge operator can help you with these subsequent transformations.
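For example, assuming master and detail targets named MASTER_T and DETAIL1_T, a natural key column EMPLOYEE_ID in both tables, and an added sequence column SEQ_COL (all hypothetical names), the post-load switch to the natural key might look like this sketch:

UPDATE DETAIL1_T d
   SET d.EMPLOYEE_ID = (SELECT m.EMPLOYEE_ID
                          FROM MASTER_T m
                         WHERE m.SEQ_COL = d.SEQ_COL);

-- Once the natural key is populated, the sequence columns used only for the
-- initial load can be dropped.
ALTER TABLE DETAIL1_T DROP COLUMN SEQ_COL;
ALTER TABLE MASTER_T  DROP COLUMN SEQ_COL;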

Using Direct Path Loading to Ensure Referential Integrity in SQL*Loader Mappings

If you are using a master-detail flat file where the master record has a unique field (or if the concatenation of several fields can result in a unique identifier), you can use Direct Path Load as an option for faster loading.
For direct path loading, the record number (RECNUM) of each record is stored in the master and detail tables. A post-load procedure uses the RECNUM to update each detail row with the unique identifier of the corresponding master row.
This procedure outlines the general steps for building such a mapping. Additional detailed instructions are available in the related topics referenced throughout the steps.
To extract from a master-detail flat file using direct path load to maintain master-detail relationships:
  1. Import and sample a flat file source that consists of master and detail records.
    When naming the record types as you sample the file, assign descriptive names to the master and detail records, as shown in Figure 8-10. This will make it easier to identify those records in the future.
  2. Drop a Mapping Flat File operator onto the mapping canvas and specify the master-detail file from which you want to extract data.
  3. Drop a Data Generator and a Constant operator onto the mapping canvas.
  4. Drop a Table operator for the master records onto the mapping canvas.
    You can either select an existing repository table that you created earlier, or create a new unbound table operator with no attributes and perform an outbound synchronize to define the table later.
    The table must contain all the columns required for the master fields you plan to load plus an additional numeric column for loading the RECNUM value.
  5. Drop a Table Operator for the detail records onto the mapping canvas.
    You can either select an existing repository table that you created earlier, or create a new unbound table operator with no attributes and perform an outbound synchronize to define the table later.
    The table must contain all the columns required for the detail fields you plan to load plus an additional numeric column for loading a RECNUM value, and a column that will be updated with the unique identifier of the corresponding master table row.
  6. Map all of the necessary flat file master fields to the master table and detail fields to the detail table, as shown in Figure 8-14.
  7. Map the Data Generator operator's RECNUM attribute to the RECNUM columns in the master and detail tables, as shown in Figure 8-14.
  8. Add a constant attribute in the Constant operator.
    If the master row unique identifier column is of a CHAR data type, make the constant attribute a CHAR type with the expression '*'.
    If the master row unique identifier column is a number, make the constant attribute a NUMBER with the expression '0'. Figure 8-13 shows the expression property of the constant attribute set to '0'. This constant marks all data rows as "just loaded."
Figure 8-13 Constant Operator Properties
  9. Map the constant attribute from the Constant operator to the detail table column that will later store the unique identifier for the corresponding master table record.
    Figure 8-14 shows a completed mapping with the flat file's master fields mapped to the master target table, the detail fields mapped to the detail target table, the RECNUM attributes from the Data Generator operator mapped to the master and detail target tables, respectively, and the constant attribute mapped to the detail target table.
Figure 8-14 Completed Mapping from Master-Detail Flat File with a Direct Path Load
  10. Configure the mapping with the following parameters:
    Direct Mode: True
    Errors Allowed: 0
    Trailing Nullcols: True (for each table)
  11. After you validate the mapping and generate the SQL*Loader script, create a post-update PL/SQL procedure and add it to the Warehouse Builder library.
  12. Run the SQL*Loader script.
  13. Execute an UPDATE SQL statement by running a PL/SQL post-update procedure or manually executing a script.
The following is an example of the generated SQL*Loader control file script:
OPTIONS ( DIRECT=TRUE,PARALLEL=FALSE, ERRORS=0, BINDSIZE=50000, ROWS=200, READSIZE=65536)
LOAD DATA
CHARACTERSET WE8MSWIN1252
                INFILE 'g:\FFAS\DMR2.dat'
                READBUFFERS 4
                INTO TABLE "MASTER_TABLE"
                APPEND
                REENABLE DISABLED_CONSTRAINTS
                WHEN 
                "REC_TYPE"='E'
                FIELDS
                TERMINATED BY ','
                OPTIONALLY ENCLOSED BY '"'
                TRAILING NULLCOLS
    
                (
                "REC_TYPE" POSITION (1) CHAR ,
                "EMP_ID" CHAR ,
                "ENAME" CHAR ,
                "REC_NUM" RECNUM
                )
  
INTO TABLE "DETAIL_TABLE"
                APPEND
                REENABLE DISABLED_CONSTRAINTS
                WHEN 
                "REC_TYPE"='P'
                FIELDS
                TERMINATED BY ','
                OPTIONALLY ENCLOSED BY '"'
                TRAILING NULLCOLS
        (
                "REC_TYPE" POSITION (1) CHAR ,
                "C1" CHAR ,
                "C2" CHAR ,
                "C3" CHAR ,
                "EMP_ID" CONSTANT '*',
                "REC_NUM" RECNUM
                )
The following is an example of the post-update PL/SQL procedure:
create or replace procedure wb_md_post_update( 
     master_table varchar2 
    ,master_recnum_column varchar2 
    ,master_unique_column varchar2 
    ,detail_table varchar2 
    ,detail_recnum_column varchar2 
    ,detail_masterunique_column varchar2 
    ,detail_just_load_condition varchar2) 
  IS 
     v_SqlStmt VARCHAR2(1000); 
  BEGIN 
     v_SqlStmt := 'UPDATE '||detail_table||' l '|| 
                  ' SET l.'||detail_masterunique_column||' = (select i.'||master_unique_column|| 
                  ' from '||master_table||' i '|| 
                  ' WHERE i.'||master_recnum_column||' IN '|| 
                  ' (select max(ii.'||master_recnum_column||') '|| 
                  ' from '||master_table||' ii '|| 
                  ' WHERE ii.'||master_recnum_column||' < l.'||detail_recnum_column||') '|| 
                  ' ) '|| 
                  ' WHERE l.'||detail_masterunique_column||' = '||''''||detail_just_load_condition||''''; 
     dbms_output.put_line(v_sqlStmt); 
     EXECUTE IMMEDIATE  v_SqlStmt; 
  END; 
  /
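A hypothetical invocation of this procedure, using the table and column names from the control file example above, where the detail EMP_ID column was loaded with the constant '*':

BEGIN
  wb_md_post_update(
     master_table               => 'MASTER_TABLE'
    ,master_recnum_column       => 'REC_NUM'
    ,master_unique_column       => 'EMP_ID'
    ,detail_table               => 'DETAIL_TABLE'
    ,detail_recnum_column       => 'REC_NUM'
    ,detail_masterunique_column => 'EMP_ID'
    ,detail_just_load_condition => '*');
END;
/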

Improved Performance Through Partition Exchange Loading

Data partitioning can improve performance when loading or purging data in a target system. This practice is known as Partition Exchange Loading (PEL).
PEL is recommended when loading a relatively small amount of data into a target containing a much larger volume of historical data. The target can be a table, a dimension, or a cube in a data warehouse.
This section includes the following topics:

About Partition Exchange Loading

By manipulating partitions in your target system, you can use Partition Exchange Loading (PEL) to instantly add or delete data. When a table is exchanged with an empty partition, new data is added.
You can use PEL to load new data by exchanging it into a target table as a partition. For example, a table that holds the new data assumes the identity of a partition from the target table and this partition assumes the identity of the source table. This exchange process is a DDL operation with no actual data movement. Figure 8-15 illustrates this example.
Figure 8-15 Overview of Partition Exchange Loading
In Figure 8-15, data from a source table Source is inserted into a target table consisting of four partitions (Target_P1, Target_P2, Target_P3, and Target_P4). If the new data needs to be loaded into Target_P3, the partition exchange operation only exchanges the names on the data objects without moving the actual data. After the exchange, the formerly labeled Source is renamed to Target_P3, and the former Target_P3 is now labeled as Source. The target table still contains four partitions: Target_P1, Target_P2, Target_P3, and Target_P4. The partition exchange operation available in Oracle9i completes the loading process without data movement.
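Under the hood, the exchange amounts to a single DDL statement along the lines of the following sketch (object names match the example above; Warehouse Builder generates the actual statement):

ALTER TABLE TARGET
  EXCHANGE PARTITION TARGET_P3 WITH TABLE SOURCE
  INCLUDING INDEXES WITHOUT VALIDATION;

-- Only data dictionary entries are updated; no rows are physically moved.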

Configuring a Mapping for PEL

To configure a mapping for partition exchange loading, complete the following steps:
  1. In the Project Explorer, right-click a mapping and select Configure.
    Warehouse Builder displays the Configuration Properties dialog.
  2. By default, PEL is disabled for all mappings. Select PEL Enabled to use Partition Exchange Loading.
  3. Use Data Collection Frequency to specify the amount of new data to be collected for each run of the mapping. Set this parameter to specify if you want the data collected by Year, Quarter, Month, Day, Hour, or Minute. This determines the number of partitions.
  4. Select Direct if you want Warehouse Builder to swap the source table directly into the target table as a partition, without staging the data in a temporary table. If you do not select this parameter, Warehouse Builder creates a temporary table to stage the collected data before performing the partition exchange. For more information, see "Direct and Indirect PEL".
  5. If you select Replace Data, Warehouse Builder replaces the existing data in the target partition with the newly collected data. If you do not select it, Warehouse Builder preserves the existing data in the target partition. The new data is inserted into a non-empty partition. This parameter affects the local partition and can be used to remove or swap a partition out of a target table. At the table level, you can set Truncate/Insert properties.

Direct and Indirect PEL

When you use Warehouse Builder to load a target by exchanging partitions, you can load the target indirectly or directly.
  • Indirect PEL: By default, Warehouse Builder creates and maintains a temporary table that stages the source data before initiating the partition exchange process. For example, use Indirect PEL when the mapping includes a remote source or a join of multiple sources.
  • Direct PEL: You design the source for the mapping to match the target structure. For example, use Direct PEL in a mapping to instantaneously publish fact tables that you loaded in a previously executed mapping.

Using Indirect PEL

If you design a mapping using PEL and it includes remote sources or a join of multiple sources, Warehouse Builder must perform source processing and stage the data before partition exchange can proceed. Therefore, configure such mappings with Direct PEL set to False. Warehouse Builder transparently creates and maintains a temporary table that stores the results from source processing. After performing the PEL, Warehouse Builder drops the table.
Figure 8-16 shows a mapping that joins two sources and performs an aggregation. If all new data loaded into the ORDER_SUMMARY table is always loaded into the same partition, then you can use Indirect PEL on this mapping to improve load performance. In this case, Warehouse Builder transparently creates a temporary table after the Aggregator and before ORDER_SUMMARY.
Figure 8-16 Mapping with Multiple Sources
Warehouse Builder creates the temporary table using the same structure as the target table, with the same columns, indexes, and constraints. For the fastest performance, Warehouse Builder loads the temporary table using a parallel direct-path INSERT. After the INSERT, Warehouse Builder indexes and constrains the temporary table in parallel.

Example: Using Direct PEL to Publish Fact Tables

Use Direct PEL when the source table is local and the data is of good quality. You must design the mapping such that the source and target are in the same database and have exactly the same structure. The source and target must have the same indexes and constraints, the same number of columns, and the same column types and lengths.
For example, assume that you have the same mapping from Figure 8-16 but would like greater control over when data is loaded into the target. Depending on the amount of data, it could take hours to load and you would not know precisely when the target table would be updated.
To instantly load data to a target using Direct PEL:
  1. Design one mapping to join source data, if necessary, transform data, ensure data validity, and load it to a staging table. Do not configure this mapping to use PEL.
    Design the staging table to exactly match the structure of the final target that you will load in a separate mapping. For example, the staging table in Figure 8-16 is ORDER_SUMMARY and should have the same structure as the final target, ORDER_CUBE, shown in Figure 8-17.
  2. Create a second mapping that loads data from the staging table to the final target, as shown in Figure 8-17. Configure this mapping to use Direct PEL.
    Figure 8-17 Publish_Sales_Summary Mapping
  3. Use either the Warehouse Builder Process Flow Editor or Oracle Workflow to launch the second mapping after the completion of the first.

Using PEL Effectively

You can use PEL effectively for scalable loading performance if the following conditions are true:
  • Table partitioning and tablespace: The target table must be Range partitioned by one DATE column. All partitions must be created in the same tablespace. All tables are created in the same tablespace.
  • Existing historical data: The target table must contain a huge amount of historical data. An example use of PEL is a clickstream application where the target collects data every day from an OLTP database or Web log files. New data is transformed and loaded into the target that already contains historical data.
  • New data: All new data must be loaded into the same partition in a target table. For example, if the target table is partitioned by day, then the daily data should be loaded into one partition.
  • Loading Frequency: The loading frequency should be equal to or less than the data collection frequency.
  • No global indexes: There must be no global indexes on the target table.

Configuring Targets in a Mapping

To configure targets in a mapping for PEL:

Step 1: Create All Partitions

Warehouse Builder does not automatically create partitions during runtime. Before you can use PEL, you must create all partitions as described in "Using Partitions".
For example, if you select Month as the frequency of new data collection, you need to create all the required partitions for each month of new data. Use the Data Object Editor to create partitions for a table, dimension, or cube. Figure 8-18 shows the property inspector window for table ORDER_SUMMARY. This figure shows six partitions that have been added for this table.
To use PEL, all partition names must follow a naming convention. For example, for a partition that will hold data for May 2002, the partition name must be in the format Y2002_Q2_M05.
For PEL to recognize a partition, its name must fit one of the following formats.
Ydddd
Ydddd_Qd
Ydddd_Qd_Mdd
Ydddd_Qd_Mdd_Ddd
Ydddd_Qd_Mdd_Ddd_Hdd
Ydddd_Qd_Mdd_Ddd_Hdd_Mdd
Where d represents a decimal digit. All the letters must be in upper case. Lower case is not recognized.
Figure 8-18 Configuration Properties for Table ORDER_SUMMARY
If you correctly name each partition, Warehouse Builder automatically computes the Value Less Than property for each partition. Otherwise, you must manually configure Value Less Than for each partition for Warehouse Builder to generate a DDL statement. The following is an example of a DDL statement generated by Warehouse Builder:
. . .
PARTITION A_PARTITION_NAME 
      VALUES LESS THAN (TO_DATE('01-06-2002','DD-MM-YYYY')),
. . .
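For example, a table partitioned by month for the second quarter of 2002 could be declared with partition names that follow the convention, as in the following sketch (the column list is illustrative; ORDER_DATE is the DATE partition key used in the steps that follow):

CREATE TABLE ORDER_SUMMARY (
  ORDER_DATE  DATE,
  ITEM_ID     NUMBER,
  ORDER_TOTAL NUMBER
)
PARTITION BY RANGE (ORDER_DATE) (
  PARTITION Y2002_Q2_M04 VALUES LESS THAN (TO_DATE('01-05-2002','DD-MM-YYYY')),
  PARTITION Y2002_Q2_M05 VALUES LESS THAN (TO_DATE('01-06-2002','DD-MM-YYYY')),
  PARTITION Y2002_Q2_M06 VALUES LESS THAN (TO_DATE('01-07-2002','DD-MM-YYYY'))
);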
Figure 8-19 shows automatically generated configuration values for the Value Less Than parameter.
Figure 8-19 Automatically Generated "Value Less Than" Setting

Step 2: Create All Indexes Using the LOCAL Option

Add an index (ORDER_SUMMARY_PK_IDX) to the ORDER_SUMMARY table. This index has two columns, ORDER_DATE and ITEM_ID. Configure the following:
  • Set the Index Type parameter to UNIQUE.
  • Set the Local Index parameter to True.
Now Warehouse Builder can generate a DDL statement for a unique local index on table ORDER_SUMMARY.
Using local indexes provides the most important PEL performance benefit. A local index is partitioned in the same way as its table, so when the temporary table is swapped into the target table using PEL, the identities of the index segments are swapped as well.
If an index is created as a local index, the Oracle server requires that the partition key column must be the leading column of the index. In the preceding example, the partition key is ORDER_DATE and it is the leading column in the index ORDER_SUMMARY_PK_IDX.
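The DDL that results from these settings is roughly equivalent to the following sketch:

CREATE UNIQUE INDEX ORDER_SUMMARY_PK_IDX
  ON ORDER_SUMMARY (ORDER_DATE, ITEM_ID)
  LOCAL;

-- ORDER_DATE, the partition key, is the leading column of the unique local index.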

Step 3: Primary/Unique Keys Use "USING INDEX" Option

In this step you must specify that all primary key and unique key constraints are created with the USING INDEX option.
With the USING INDEX option, a constraint does not trigger automatic index creation when it is added to the table. Instead, the server searches existing indexes for an index with the same column list as the constraint. Thus, each primary or unique key constraint must be backed by a user-defined unique local index. The index required by the constraint ORDER_SUMMARY_PK is ORDER_SUMMARY_PK_IDX, which was created in "Step 2: Create All Indexes Using the LOCAL Option".
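A sketch of the corresponding constraint definition, reusing the index created in "Step 2: Create All Indexes Using the LOCAL Option":

ALTER TABLE ORDER_SUMMARY
  ADD CONSTRAINT ORDER_SUMMARY_PK PRIMARY KEY (ORDER_DATE, ITEM_ID)
  USING INDEX ORDER_SUMMARY_PK_IDX;

-- The constraint reuses the existing unique local index rather than creating
-- its own index.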

Restrictions for Using PEL in Warehouse Builder

These are the restrictions for using PEL in Warehouse Builder:
  • Only One Date Partition Key: Only one partition key column of DATE data type is allowed. Numeric partition keys are not supported in Warehouse Builder.
  • Only Natural Calendar System: The current PEL method supports only the natural calendar system adopted worldwide. Specific business calendar systems with user-defined fiscal and quarter endings are currently not supported.
  • All Data Partitions Must Be In The Same Tablespace: All partitions of a target (table, dimension, or cube) must be created in the same tablespace.
  • All Index Partitions Must Be In The Same Tablespace: All indexes of a target (table, dimension, or cube) must be created in the same tablespace. However, the index tablespace can be different from the data tablespace.

High Performance Data Extraction from Remote Sources

Although you can design mappings to access remote sources through database links, performance is likely to be slow when you move large volumes of data. For mappings that move large volumes of data between sources and targets of the same Oracle Database version, you can dramatically improve performance by using transportable modules. For instructions, see the Warehouse Builder documentation on transportable modules.

SOURCE:http://download.oracle.com/docs/cd/B31080_01/doc/owb.102/b28223/concept_etl_performance.htm