FAQ: What are the pre-requisite configurations that should be performed for executing mappings with Informatica Blaze Engine?
Answer

Informatica 'Data Engineering Integration' (DEI), earlier known as 'Big Data Management' (BDM), supports the execution of mappings in the Hadoop Environment. For running mappings in Hadoop Environments, one of the following execution engines can be chosen: 

  1. Spark Engine
  2. Blaze Engine
  3. Hive Engine ('Map Reduce' or 'Tez' modes). Available only in versions earlier than Informatica 10.2.2.

The Informatica Blaze engine integrates with Apache Hadoop YARN to provide intelligent data pipelining, job partitioning, job recovery, and high-performance scaling. The Blaze Engine consists of the 'Blaze Grid Manager' and 'Blaze Job Monitor' applications, which run on one of the Hadoop data node machines.

 

The following pre-requisite configurations are required to run DEI mappings using the 'Blaze Engine':

 

  • In versions earlier than Informatica 10.2.0, ensure that the Data Integration Service (DIS) used for running Blaze Engine mappings has proper values configured for the attributes 'Hadoop Distribution Directory' and 'Informatica Home Directory on Hadoop'. For more information, refer to the following article:

 

​https://kb.informatica.com/solution/23/Pages/61/511389.aspx 


Note: Starting with Informatica DEI 10.2.0, the '1-click auto install' feature automatically transfers and sets up the Informatica DEI packages on the Hadoop cluster data node(s) as part of the first Hadoop pushdown job run from Informatica DEI. Because of this feature, the 'Hadoop Distribution Directory' and 'Informatica Home Directory on Hadoop' attributes no longer need to be configured in the 'Data Integration Service' (DIS), as they were in earlier Informatica 10.x versions. From Informatica DEI 10.2.0 onwards, neither attribute is available under the 'Execution Options' section of the DIS.

 

  • In versions earlier than Informatica 10.2.1, verify that the 'Data Integration Service Hadoop Distribution Directory' attribute points to a valid Hadoop distribution directory under the '$INFA_HOME/services/shared/hadoop' location on the DIS machine. The configured Hadoop distribution directory should be the same as, or as close as possible to, the actual Hadoop distribution version on which the Blaze jobs are executed.


[Screenshot: infa_1020_dis_hadoop_distribution_directory.png]



Note:  

        • From Informatica DEI 10.2.1 onwards, the 'Data Integration Service Hadoop Distribution Directory' attribute is not available in the 'Data Integration Service' (DIS). The DIS picks its Hadoop distribution directory at runtime, based on the 'Cluster Configuration Object' (CCO) associated with the Hadoop pushdown connection. The DIS uses the 'Distribution type' and 'Distribution version' attributes of the CCO to select the correct Hadoop distribution directory. For more information, refer to KB 533148.

  • Ensure that impersonation proxyuser entries are added to the 'core-site.xml' file of the Hadoop cluster for the user specified in the 'Blaze User Name' attribute of the 'Hadoop Pushdown Connection'. If the 'Blaze User Name' attribute is left blank in the Hadoop connection, configure the proxyuser impersonation entries for the 'Operating System' (OS) user who started the Informatica Domain, and hence the DIS.

[Screenshot: infa_bdm_1022_blaze_user_name_hadoop_connection.png]


Example


If the 'Operating System' user who started the DIS is 'infadei', then after the proxyuser configurations are added in the cluster, entries similar to the following should be present in the 'core-site.xml' file:

 

<property>
  <name>hadoop.proxyuser.infadei.groups</name>
  <value>*</value>
  <description>Allows impersonation from any group.</description>
</property>

<property>
  <name>hadoop.proxyuser.infadei.hosts</name>
  <value>*</value>
  <description>Allows impersonation from any host.</description>
</property>

<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
  <description>Allows impersonation from any group.</description>
</property>

<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
  <description>Allows impersonation from any host.</description>
</property>
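
Whether the proxyuser entries exist for the right user can be checked with a short script. The following Python sketch is only an illustration and not an Informatica utility: the function names 'effective_proxy_user' and 'missing_proxyuser_entries' are hypothetical, and the sketch assumes that 'core-site.xml' uses the standard Hadoop <configuration>/<property> layout.

```python
import xml.etree.ElementTree as ET


def effective_proxy_user(blaze_user_name, dis_os_user):
    """Return the user that needs proxyuser entries (hypothetical helper).

    If 'Blaze User Name' is blank in the Hadoop connection, the OS user
    who started the DIS is used instead.
    """
    name = (blaze_user_name or "").strip()
    return name if name else dis_os_user


def missing_proxyuser_entries(core_site_xml, user):
    """List the hadoop.proxyuser.<user>.* properties absent from the XML text."""
    root = ET.fromstring(core_site_xml)
    present = {
        (prop.findtext("name") or "").strip()
        for prop in root.iter("property")
    }
    required = [
        f"hadoop.proxyuser.{user}.groups",
        f"hadoop.proxyuser.{user}.hosts",
    ]
    return [name for name in required if name not in present]


# Sample content mirroring the 'infadei' example in this article.
sample = """<configuration>
  <property><name>hadoop.proxyuser.infadei.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.infadei.hosts</name><value>*</value></property>
</configuration>"""

user = effective_proxy_user("", "infadei")
print(user, missing_proxyuser_entries(sample, user))
```

An empty list means both entries are present for that user. The sketch only checks that the property names exist; it does not validate the values.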


For more information on configuring the impersonation proxyuser settings in the Hadoop cluster, refer to KB 561734.

 

  • Make sure that the resource configuration of the YARN service in the Hadoop cluster meets the following minimum requirements for the Blaze Grid Manager application:

 

Configuration in yarn-site.xml                Minimum value
yarn.scheduler.minimum-allocation-mb          1024
yarn.scheduler.maximum-allocation-mb          6144
yarn.scheduler.minimum-allocation-vcores      1
yarn.scheduler.maximum-allocation-vcores      4

 

Configuration values for the YARN service can be viewed from the Hadoop administration manager (Ambari in the case of Hortonworks, or Cloudera Manager) or in the 'yarn-site.xml' file. If any configured value is lower than the specified minimum, update it to meet the requirement. After the change, restart the affected services, including YARN, in the Hadoop cluster.
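
When the values are read from 'yarn-site.xml', the comparison against the minimum requirements can be scripted. The Python sketch below is a rough illustration: the function name 'yarn_settings_below_minimum' is hypothetical, the thresholds are the ones listed in this article, and a property that is missing from the file is reported rather than assumed to have a default value.

```python
import xml.etree.ElementTree as ET

# Minimum values listed in this article for the Blaze Grid Manager application.
BLAZE_YARN_MINIMUMS = {
    "yarn.scheduler.minimum-allocation-mb": 1024,
    "yarn.scheduler.maximum-allocation-mb": 6144,
    "yarn.scheduler.minimum-allocation-vcores": 1,
    "yarn.scheduler.maximum-allocation-vcores": 4,
}


def yarn_settings_below_minimum(yarn_site_xml):
    """Return {property: (configured, required)} for values below the minimum.

    A property that is missing or non-numeric is reported with a configured
    value of None, because this sketch does not know the cluster default.
    """
    root = ET.fromstring(yarn_site_xml)
    configured = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in BLAZE_YARN_MINIMUMS and value.isdigit():
            configured[name] = int(value)
    problems = {}
    for name, required in BLAZE_YARN_MINIMUMS.items():
        value = configured.get(name)
        if value is None or value < required:
            problems[name] = (value, required)
    return problems


# Hypothetical cluster settings where only the maximum memory is too low.
sample = """<configuration>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>4096</value></property>
  <property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value></property>
  <property><name>yarn.scheduler.maximum-allocation-vcores</name><value>4</value></property>
</configuration>"""

for name, (found, required) in yarn_settings_below_minimum(sample).items():
    print(f"{name}: configured={found}, required>={required}")
```

On clusters managed through Ambari or Cloudera Manager, the effective values may differ from a local copy of the file, so the administration manager remains the authoritative source.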


In general, a Hadoop cluster with 'n' data nodes needs at least '(n*2)+3' YARN containers available during Blaze Engine startup. For instance, on a Hadoop cluster with 10 data nodes, Blaze Engine requires 23 YARN containers to start successfully. For more information, refer to KB 533143.
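
The container estimate is simple arithmetic and can be expressed as a small helper. The function name 'min_blaze_startup_containers' is hypothetical; the formula is the '(n*2)+3' rule stated above.

```python
def min_blaze_startup_containers(data_nodes):
    """Minimum YARN containers needed at Blaze Engine startup: (n * 2) + 3."""
    if data_nodes < 1:
        raise ValueError("a cluster needs at least one data node")
    return data_nodes * 2 + 3


print(min_blaze_startup_containers(10))  # 23, matching the 10-node example
```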

 

  • Ensure that the following memory and vcore properties of the Blaze engine components are configured according to the resources available in the Hadoop cluster, and not set to higher values:


Default settings


# Blaze Job Monitor component memory configuration
infagrid.blaze.console.memory=2048

# Blaze 'Data Exchange Framework' component memory configuration
infagrid.def.max.memory=4096

# Blaze 'DTM/Tasklet' component memory configuration
infagrid.orch.scheduler.oop.container.pref.memory=5120

# Blaze 'DTM/Tasklet' component vcore configuration
infagrid.orch.scheduler.oop.container.pref.vcore=4


Note: 

    • For the Blaze Engine components 'Blaze Grid Manager', 'Orchestrator', and 'DTM Process/OOP Container Manager', memory usage is by default equal to the 'yarn.scheduler.minimum-allocation-mb' setting of the YARN service.
    • 'vcore' usage of all Blaze Engine components except DTM/Tasklet is equal to the 'yarn.scheduler.minimum-allocation-vcores' setting of the YARN service.
    • To reconfigure the Blaze Tasklet/YARN scheduler 'memory' and 'vcore' settings based on the resources available in the Hadoop cluster, refer to KB 533265.
    • In versions earlier than Informatica 10.2.1, the 'memory' and 'vcore' settings of the configurable Blaze Engine components are available in the 'hadoopEnv.properties' file at the '$INFA_HOME/services/shared/hadoop/[distribution]/infaConf' location.
    • Starting with Informatica 10.2.1, the 'hadoopEnv.properties' file is not available on the Informatica server machine. For ease of use, all Hadoop pushdown job settings have been moved to the 'Hadoop Pushdown connection', so they can be configured directly from either the Informatica Developer client or the Informatica Administrator console. For more information, refer to KB 532971.
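
One way to sanity-check the Blaze memory properties listed above is to compare each of them with 'yarn.scheduler.maximum-allocation-mb', since YARN rejects a container request larger than its maximum allocation. The Python sketch below rests on two assumptions that this article does not state explicitly: that each of the three memory properties corresponds to a YARN container request, and that the values are in MB. The function names are hypothetical, and the input is plain 'key=value' text regardless of whether the settings come from 'hadoopEnv.properties' or from the Hadoop connection.

```python
# Memory properties named in this article (assumed to be in MB).
BLAZE_MEMORY_KEYS = (
    "infagrid.blaze.console.memory",
    "infagrid.def.max.memory",
    "infagrid.orch.scheduler.oop.container.pref.memory",
)


def parse_properties(text):
    """Parse simple key=value lines; anything after '#' is treated as a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def memory_settings_over_limit(settings, yarn_max_allocation_mb):
    """Return the Blaze memory settings that exceed the YARN maximum allocation."""
    return {
        key: int(settings[key])
        for key in BLAZE_MEMORY_KEYS
        if key in settings and int(settings[key]) > yarn_max_allocation_mb
    }


# Default values from this article.
defaults = """
infagrid.blaze.console.memory=2048
infagrid.def.max.memory=4096
infagrid.orch.scheduler.oop.container.pref.memory=5120
infagrid.orch.scheduler.oop.container.pref.vcore=4
"""

settings = parse_properties(defaults)
print(memory_settings_over_limit(settings, 6144))
print(memory_settings_over_limit(settings, 4096))
```

With the default values, nothing exceeds a 6144 MB maximum allocation, while a 4096 MB maximum flags the DTM/Tasklet memory setting of 5120.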

​​​​


More Information
For more information on the Blaze architecture, refer to the following document, which explains the Blaze Engine architecture and its core components:

 

​​​https://docs.informatica.com/big-data-management/data-engineering-integration/10-4-0/administrator-guide/introduction-to-data-engineering-administration/hadoop-integration/run-time-process-on-the-blaze-engine.html​


Applies To
Product: Data Engineering Integration (Big Data Management); Data Engineering Quality (Big Data Quality); Enterprise Data Preparation; Enterprise Data Catalog
Problem Type: Configuration; Sizing
User Type: Administrator
Project Phase: Configure; Onboard
Product Version: Informatica 10.1; Informatica 10.1.1; Informatica 10.2; HotFix; Informatica 10.2.1; Informatica 10.2.1 Service Pack 1; Informatica 10.2.2; Informatica 10.4
Database:
Operating System: Linux
Other Software:

Last Modified Date: 3/31/2020 12:25 AM
ID: 524728