Tuesday 26 June 2012

All about Inforamtica Lookup

All about Inforamtica Lookup

 A Lookup is a Passive, Connected or Unconnected Transformation used to look up data in a relational table, view, synonym or flat file. The integration service queries the lookup table to retrieve a value based on the input source value and the lookup condition.  

A connected lookup recieves source data, performs a lookup and returns data to the pipeline;
While an unconnected lookup is not connected to source or target and is called by a transformation in the pipeline by :LKP expression which in turn returns only one column value to the calling transformation.

Lookup can be Cached or Uncached. If we cache the lookup then again we can further go for static or dynamic or persistent cache,named cache or unnamed cache .
By default lookup transformations are cached and static.

Advanced Properties of Lookup Transformation:

  • Lookup policy on multiple match:  "Lookup Policy on Multiple Match option"  is used to determine which rows that the lookup transformation returns when it finds multiple rows that match the lookup condition. You can select lookup to return first or last row or any matching row or to report an error. 
  • Connection Information: Query the lookup table from the source or target connection. In case of flat file lookup we can give the file path and name, whether direct or indirect.

  • Output Old Value On Update: This option is used when dynamic cache is enabled. When this option is enabled, the integration service outputs old values out of the lookup/output ports. When the Integration Service updates a row in the cache, it outputs the value that existed in the lookup cache before it updated the row based on the input data. When the Integration Service inserts a new row in the cache, it outputs null values. When you disable this property, the Integration Service outputs the same values out of the lookup/output and input/output ports.
  • Re-cache from lookup source  When checked, integration service rebuilds lookup cache from lookup source when the lookup instance is called in the session.
    Insert Else Update: Insert Else Update option applies to rows entering the lookup transformation with the row type of insert. When this option is enabled the integration service inserts new rows in the cache and updates existing rows when disabled, the Integration Service does not update existing rows.
    Update Else Insert:Update Else Insert option applies to rows entering the lookup transformation with the row type of update. When this option is enabled, the Integration Service updates existing rows, and inserts a new row if it is new. When disabled, the Integration Service does not insert new rows.

    Dynamic Lookup cache: 

    A Lookup cache does not change its data once built. But  if the underlying table upon which lookup was done changes the data after the lookup cache is created,  Is there a way so that the cache always remain up-to-date even if the underlying table changes, at this scenario we use dynamic lookup cache.

    Dynamic lookup cache: It updates/ inserts data into cache before updating the target table. It maintains the cache up to date.

    When the lookup is configured to Dynamic lookup cache by default a NewLookup Row port will be created. which in turn returns 0,1,2

    When NewLookup Row port returns
    • 0 = Integration Service does not update or insert the row in the cache.
    • 1 = Integration Service inserts the row into the cache.
    • 2 = Integration Service updates the row in the cache. 
     Note: While using a dynamic lookup cache, we must associate each lookup/output port with an input/output port or a sequence ID. The Integration Service uses the data in the associated port to insert or update rows in the lookup cache. The Designer associates the input/output ports with the lookup/output ports used in the lookup condition. 



     The properties of dynamic lookup are shown in below image


    The lookup/output port output value depends on whether we choose to output old or new values when the Integration Service updates a row:
  • Output old values on update: The Integration Service outputs the value that existed in the cache before it updated the row.
  • Output new values on update: The Integration Service outputs the updated value that it writes in the cache. The lookup/output port value matches the input/output port value ( we don't have any property like  Output new values on update, if we unchecked Output old values on update,  then the integration services works as Output new values on update) 

Handling Null Values in dynamic lookup: 

Suppose in our source  there are some null values and we had checked Ignore Null inputs for Update property.  Than the lookup ignores the null value which is not inserted into cache and it inserts into target,

In this case the cache and that target table get unsynchronize, to eradicate this we need to sync the target table and the cache, by inserting null value or to ignore it, for that  we have two options.

  • Insert null values: The Integration Service uses null values from the source and updates the lookup cache and target table using all values from the source.
  • Ignore Null inputs for Update property : The Integration Service ignores the null values in the source and updates the lookup cache and target table using only the not null values from the source. 

More details on Dynamic Lookup:

Dynamic lookup cache compares the values in all lookup ports with the values in their associated input ports by default and determine whether or not to update the row in the lookup cache. When a value in an input port changes from the value in the lookup port, the Integration Service updates/inserts  the row in the cache as well as in a table,
But if we need to compare only certain ports not all the ports
 This scenario comes  when the source data includes a column that indicates whether or not the row contains data we need to update.
  • We can choose the ports that we want the Integration Service to ignore when it compares ports.
  • The Designer only enables this property for lookup/output ports when the port is not used in the lookup condition. 
  • We can improve performance by ignoring some ports during comparison.
  • Select the Ignore in Comparison property for all lookup ports except the port that indicates whether or not to update the row in the cache and target table.

 

 More on informatica lookup cache

  • Sequential cache: The Integration Service builds lookup caches sequentially. The Integration Service builds the cache in memory when it processes the first row of the data in a cached lookup transformation. The Integration Service builds caches for unconnected Lookup transformations as sequentially.
  • Concurrent caches: The Integration Service builds lookup caches concurrently. It does not need to wait for data to reach the Lookup transformation. 
  • Pipeline lookup transformation:A pipeline lookup transformation is used to perform lookup on application sources such as JMS, MSMQ or SAP. A pipeline lookup transformation has a source qualifier as the lookup source.
  • Persistent Cache: If the lookup source does not change between session runs, then you can improve the performance by creating a persistent cache for the source. When a session runs for the first time, the integration service creates the cache files and saves them to disk instead of deleting them. The next time when the session runs, the integration service builds the memory from the cache file.
  • Shared Cache: We can configure multiple Lookup transformations in a mapping to share a single lookup cache. The Integration Service builds the cache when it processes the first Lookup transformation. It uses the same cache to perform lookup for subsequent Lookup transformations that share the cache.
  • Unnamed cache: When Lookup transformations in a mapping have compatible caching structures, the Integration Service shares the cache by default. You can only share static unnamed caches.
  • Named cache: Use a persistent named cache when you want to share a cache file across mappings or share a dynamic and a static cache. The caching structures must match or be compatible with a named cache. You can share static and dynamic named caches.
Some steps on improving performance tuning
  • Create an index on the columns used in the lookup condition
  • Place conditions with equality operator first
  • Cache small lookup tables.
  • Join tables in the database: If the source and the lookup table are in the same database, join the tables in the database rather than using a lookup transformation.
  • Use persistent cache for static lookup.
  • Avoid ORDER BY on all columns in the lookup source. Specify explicitly the ORDER By clause on the required columns.
  • For flat file lookup, provide Sorted files as lookup source.
That's all for today, will be posting more on persistent caches and performance tuning in my coming posts.