You can import the performance of mappings that perform aggregation using the Sorted Input option when source data can be sorted according to the Aggregator's Group By ports.
In a typical PowerCenter mapping that performs aggregation (without the Sorted Input option), the Informatica server must read the entire data source before it begins performing calculations, in order to ensure that it has received every record that belongs to each unique group. While the server is optimized to perform aggregate calculations, the time required to read the entire data source can be a bottleneck in mappings that load very large amounts of data.
In a mapping that uses Sorted Input, the Informatica server assumes that all data entering an Aggregator transformation are already sorted into groups that correspond to the Aggregator's "Group By" ports. As a result, the server does not have to read the entire data source before performing calculations. As soon as the Aggregator detects a new unique group, it performs all of the calculations required for the current group, and then passes the current group's record on to the next transformation. Selecting Sorted Input often provides dramatic increases in Aggregator performance.
In order for Sorted Input to work, you must be able to sort the data in your source by the Aggregator's Group By columns.
The key concepts illustrated in this mapping template can be found in two transformation objects, the Source Qualifier transformation ( SQ_ORDER_ITEMS ) and the Aggregator transformation ( agg_CALC_PROFIT_and_MARGIN ):
SQ_ORDER_ITEMS contains a SQL Override statement that pulls data from the ORDER_ITEMS table. The select statement in this SQL Override contains an ORDER BY clause that orders the source data by the ITEM_ID column. In addition, on the Properties tab of SQ_ORDER_ITEMS , the "Number of Sorted Ports" option is set to "1." Configuring the Source Qualifier in this way prepares the data for Sorted Input aggregation.
In agg_CALC_PROFIT_and_MARGIN , the ITEM_ID port is selected as the "Group By" port in the transformation (the number of "Group By" ports must correspond to the "Number of Sorted Ports" indicated in the Source Qualifier object). In addition, on the Properties tab of agg_CALC_PROFIT_and_MARGIN , the "Sorted Input" option is selected.
No other configuration is required in order for Sorted Input functionality to work in this mapping. When a session is created to implement this mapping, it will automatically detect that the Sorted Input option has been selected.
Please refer to the metadata descriptions in the mapping m_AGG_Sorted_Input for more details on the functionality provided by this mapping template.
FAQ: What are the limitations of the Aggregator sorted input option? (20369)
MAPPING TEMPLATES: Using a Sorter with a Flat File Source (16313)
MAPPING TEMPLATES: Removing duplicate source records (16307)
Handling duplicate rows using a PowerCenter mapping (15175)
What can we do to improve this information (2000 or fewer characters)