- Joiner Transformation
Joiner transformation joins two related heterogeneous sources residing in different locations or file systems.
- We use the joiner to join two sources with at least one matching port.
- Joiner typically combine information from two different sources that do not have matching keys such as flat file sources.
- Joiner allows to use join sources that contain binary data.
There are some limitations on the pipelines we connect to the joiner. We cannot use a joiner in the following situations.
- Both input pipelines originate from the same Source Qualifier transformation.
- Both input pipelines originate from the same Normalizer or Joiner.
- Either input pipeline contains an Update Strategy transformation.
- We connect a sequence generator transformation directly before joiner.
Master and Detail Source
When we add the ports of a transformation to a Joiner, the ports from the first source are automatically set as detail sources. Adding the ports from the second transformation automatically sets them as master sources.
Join Types
- Normal (Default)
- Master Outer
- Detail Outer
- Full Outer
Master – Detail Join Rules
If a session contains a mapping with multiple joiner transformations, the IS reads rows in the following order.
- For each joiner, the IS reads all the master rows before it reads the first detail row.
- For each joiner, the IS produces output rows as soon as it reads the first detail row.
If you create a mapping with two joiners in the same target load order group, make sure each joiner receives detail rows from a different source pipeline, so that IS reads the rows according to the master-detail join rules.
Joiner Caches
When we run a session with joiner, the IS reads all the rows from the master source and builds index and data caches based on the master source. Since the cache read only the master source rows, we should specify the source with fewer rows as the master source.
During a workflow, the joiner compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
Join Condition
Both ports in a condition must have the same data type. We need to convert the data types for non-matching data types.
If the data types do not match, the mapping will be invalid.
Join condition only supports equality between fields.
The joiner does not match null values.
To join rows with null values, we can replace null input with default values, and then join the default values.
Join Types
Normal Join
The IS discards all rows of data from the master detail source that do not match based on the condition.
Master Outer join
It keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
Detail Outer join
It keeps all rows of data from the master source and the matching rows from the detail source. It discards unmatched rows from the detail source.
Full Outer Join
It keeps all rows of data from both the master and detail sources.
- A normal or master outer join performs faster than a full outer or detail outer join.
- If a result set includes fields that do not contain data in either of the sources, the joiner populates the empty fields with null values.
Performing a join in the database
Performing a join in the database is faster than performing a join in the session. In some cases, its not possible, such as joining tables from two different databases or flat file systems.
To perform a join in the database
- Create a pre-session stored procedure.
- Use the Source Qualifier to perform the join.
0 comments:
Post a Comment