Sunday, August 14, 2011

Lookup Transformation


  1. Lookup Transformation

 
We can use lookup transformation to lookup data in a flat file, relational table , view or synonym.

 
Tasks:

 
à
Get a related value: We can retrieve a value from lookup table based on value in the source.

 
à
Perform a calculation: We can retrieve a value from lookup table and use it in calculation.

 
à
Update slowly changing dimension table: Using lookup we can check whether rows exist in a target or not.

 

 

 
Connected lookup

 
    à Receives input values directly from the pipeline.

 
    à Use a dynamic or static cache.

 
    à Can return multiple columns from the same row or insert into dynamic cache.

 
    à If there is no match, the IS returns default value for all output ports, for dynamic cache IS inserts row into cache.

 
    à If there is a match, the IS returns a result from the lookup condition for all the lookup/output ports.

 
    à Pass multiple output values to another transformation.

 
    à Supports user-defined default values.

 
Un-connected lookup

 
    à Receives input value from the result of :LKP expression in another transformation.

 
    à Use a static cache.

 
    à Returns one column from each row.

 
    à If there is no match, the IS returns NULL.

 
    à If there is match, the IS returns the result of lookup condition.

 
    à Pass one output value to another transformation.

 
    à Does not support user-defined default values.

 
We can perform the following tasks with un-connected lookup:

 
    à Test the result of a lookup in an expression.

 
    à Filter rows based on the lookup results.

 
    à Mark rows for update based on the result of a lookup and update SCD's.

 
    à Call the same lookup multiple times in a mapping.

 

 

 

 

 
Cached lookup:

 
    When we enable lookup caching, the IS queries the lookup source once, caches the values , and lookup values in the cache during the session. Caching lookup values can improve session performance.

 

 
Un-cached lookup:

 
    When we disable caching, each time a row passes into the transformation , the IS issues a select statement to the lookup source for lookup values.

 

 
Lookup policy on multiple match:

 
    Determines which rows the lookup transformation returns when it finds multiple rows that match the lookup condition. We can select the first row or last row returned from the cache or lookup source, or report an error. Or, we can allow the lookup to return any matching value the transformation returns the first value that matches the lookup condition.

 
If we do not enable the output old value on update option, the lookup policy on multiple match option is set to Report an error for dynamic lookups.

 

 
Lookup query:

 
    The IS queries the lookup based on the ports and properties we configure in the lookup transformation. The IS runs a default SQL statement when the first row enters the lookup transformation.

 
Default lookup query contains:

 
à
SELECT statement includes all the lookup ports in the mapping. Do not add or delete any ports columns from the default SELECT statement.

 
à ORDER BY clause orders the columns in the same order they appear in the lookup transformation. We cannot view this when we generate the default SQL using the lookup SQL override.

 
Overriding the LOOKUP Query:

 
    We can override the lookup query for a relational lookup.

 
à Override the ORDER BY clause: Create order by clause with fewer columns to increase performance. When we override the ORDER BY clause we must suppress the generated ORDER BY clause with a comment notation (--).

 
à If the table name or column name in the query contains any reserved words we must enclose them in the quotes.

 
à Add a WHERE clause: Use a lookup SQL override to add a WHERE clause to the default SQL statement. We can use the WHERE clause to reduce the number of rows included in the cache.

 
à Use a lookup SQL override to query the lookup data from multiple tables.

 

 
Notes:

 
è Lookup table can be a single table or we can join multiple tables in the same database using a lookup SQL override.

 
è The Designer designates each column in the lookup source as a lookup(L) and output(O) port.

 
è If we delete ports from flat file lookup, the session fails.

 
è We can delete ports from a relational lookup if the mapping does not use the lookup port. This reduces the amount of memory the IS needs to run the session.

 
è The IS always caches flat files and pipeline lookups.

 
è If we use pushdown optimization, we cannot override the ORDER BY clause or suppress the order by clause with the comment notation.

 
è The IS matches null values for lookup transformation. For example, if an input lookup condition column is NULL, the IS evaluates the NULL equal to NULL in the lookup.

 
è If we configure flat file lookup for sorted input, the IS fails the session if the condition columns are not grouped. If the condition columns are grouped, but not sorted, the IS processes the lookup as if we did not configure sorted input.

 
è Lookup condition contains the following operators: =,>,<,>=,<=,!=

 
è We can use a dynamic cache for relational or flat file lookups.

 
è The IS builds caches for un-connected lookup sequentially regardless of how we configure cache building.

 

 

 

 

 

 

 

 

 
Lookup caches:

 
    The IS builds a cache in memory when it processes the first row of data in a cached lookup transformation. The IS stores condition values in the index cache and output values in the data cache. The IS queries the cache for each row that enters the transformation.

 
Building caches:

 
    We can configure the session to build caches sequentially or concurrently. When we build sequential caches, the IS creates cache as the source rows enter the lookup.
When we configure the session to build concurrent caches, the IS does not wait for the first row to enter the lookup before it creates cache. Instead, it builds multiple caches concurrently.

 
Persistent cache:

 
    We can save the lookup cache files and reuse them the next time the IS processes a lookup. If the lookup table does not change between sessions we can the lookup to use persistent cache. The first time the IS runs a session using a persistent lookup cache, it saves the cache files to disk instead of deleting them. Then next time we run the session it builds the cache memory from cache files. If the lookup table changes occasionally, we can override the lookup property to Recache the lookup from the database.

 

 
Recache from source:    

 
    If the persistent cache is not synchronized with the lookup table, we can configure the lookup transformation to rebuild the lookup cache.
We can instruct the IS to rebuild the cache if we think that the lookup source changed since the last time the IS built the persistent cache.

 

 
Static cache:

 
    The IS does not update the cache while it processes the lookup transformation.

 
Dynamic cache:

 
    The IS dynamically inserts or updates data in the lookup cache and passes data to the target.

 

 

 

 

 

 

 

 

 
Properties:

 
NewLookupRow:

 
    The designer adds this port to a lookup configured to use a dynamic cache. Indicates with a numeric value whether the IS inserts or updates the row in the cache, or makes no changes to the cache.

 
0 – IS does not update or insert the row in the cache.

 
1 – IS inserts the row into the cache.

 
2 – IS updates the row in the cache.

 
Associated port:

 
    The IS uses the data in the associated port to insert or update rows in the cache. If we associate a sequence ID, the IS generates a primary key for inserted rows in the lookup cache.

 
Ignore Null inputs for updates:

 
    We can enable this property when we don't want the IS to update the column in the cache when the data in this column contains a null value.

 
Ignore in Comparison:

 
    The IS compares the values in all lookup ports with the values in their associated input ports by default. We can use this property when we want the IS to ignore the port when it compares values before updating a row.

 
    When we add a WHERE clause in a lookup SQL override, the IS uses the WHERE clause to build the cache from the database and to perform a lookup on the database table for an un-cached lookup. It does not use the WHERE clause to insert rows into a dynamic cache when it runs a session.

 

 
Shared cache:

 
    We can share the cache between multiple transformations. We can share the unnamed cache between transformations in the same mapping. We can share the named cache between transformations in same or different mappings.

0 comments:

Post a Comment

newer post older post Home