Skip Ribbon Commands
Skip to main content
Navigate Up
Sign In

Quick Launch

Average Rating:

facebook Twitter
Email
Print Bookmark Alert me when this article is updated

Feedback

MAPPING TEMPLATES: Aggregation using sorted input
Content

Overview 

You can import the performance of mappings that perform aggregation using the Sorted Input option when source data can be sorted according to the Aggregator's Group By ports.

In a typical PowerCenter mapping that performs aggregation (without the Sorted Input option), the Informatica server must read the entire data source before it begins performing calculations, in order to ensure that it has received every record that belongs to each unique group. While the server is optimized to perform aggregate calculations, the time required to read the entire data source can be a bottleneck in mappings that load very large amounts of data.

In a mapping that uses Sorted Input, the Informatica server assumes that all data entering an Aggregator transformation are already sorted into groups that correspond to the Aggregator's "Group By" ports. As a result, the server does not have to read the entire data source before performing calculations. As soon as the Aggregator detects a new unique group, it performs all of the calculations required for the current group, and then passes the current group's record on to the next transformation. Selecting Sorted Input often provides dramatic increases in Aggregator performance.

Implementation Guidelines

In order for Sorted Input to work, you must be able to sort the data in your source by the Aggregator's Group By columns.

The key concepts illustrated in this mapping template can be found in two transformation objects, the Source Qualifier transformation ( SQ_ORDER_ITEMS ) and the Aggregator transformation ( agg_CALC_PROFIT_and_MARGIN ):

SQ_ORDER_ITEMS contains a SQL Override statement that pulls data from the ORDER_ITEMS table. The select statement in this SQL Override contains an ORDER BY clause that orders the source data by the ITEM_ID column. In addition, on the Properties tab of SQ_ORDER_ITEMS , the "Number of Sorted Ports" option is set to "1." Configuring the Source Qualifier in this way prepares the data for Sorted Input aggregation.

In agg_CALC_PROFIT_and_MARGIN , the ITEM_ID port is selected as the "Group By" port in the transformation (the number of "Group By" ports must correspond to the "Number of Sorted Ports" indicated in the Source Qualifier object). In addition, on the Properties tab of agg_CALC_PROFIT_and_MARGIN , the "Sorted Input" option is selected.

No other configuration is required in order for Sorted Input functionality to work in this mapping. When a session is created to implement this mapping, it will automatically detect that the Sorted Input option has been selected.

Please refer to the metadata descriptions in the mapping m_AGG_Sorted_Input for more details on the functionality provided by this mapping template.

Pros


  • Improves performance of mappings that perform aggregation.
  • Can be used with relational and file-based sources, as long as data can be sorted according to the Aggregator's "Group By" ports.

Cons

  • With relational sources, additional overhead is exerted on the database when "Order By" clauses are used.


More Information


Reference


Attachments


Applies To
Product: PowerCenter
Problem Type:
User Type: Administrator
Project Phase:
Product Version: PowerCenter
Database:
Operating System:
Other Software:

Last Modified Date: 11/12/2009 7:55 AM ID: 16297
People who viewed this also viewed

Feedback

Did this KB document help you?



What can we do to improve this information (2000 or fewer characters)