FAQ: Why is the 'Data Integration Service Hadoop Distribution Directory' attribute not available under the 'Execution Options' section of the DIS from Informatica BDM 10.2.1 onwards?
Answer

Informatica BDM supports running mappings in the Hadoop environment. One of the following execution engines can be chosen for such mappings:

 

  1. Spark Engine
  2. Blaze Engine
  3. Hive Engine ('MapReduce' or 'Tez' mode; available in versions earlier than Informatica 10.2.2 and not available from Informatica 10.2.2 onwards)

 

Starting with Informatica 10.2.1, it is no longer necessary to configure the 'Hadoop Distribution Directory' on the 'Data Integration Service' (DIS) used for running pushdown mappings. The DIS picks the Hadoop distribution directory at run time, based on the 'Cluster Configuration Object' (CCO) associated with the Hadoop pushdown connection. The 'Distribution type' and 'Distribution version' attributes of the CCO are used to select the correct Hadoop distribution directory.

 

For instance, if the 'Distribution type' of the CCO is 'Hortonworks' and its 'Distribution version' is '2.6', the Data Integration Service uses '$INFA_HOME/services/shared/hadoop/HDP_2.6' as its Hadoop distribution directory when running Hadoop pushdown jobs.
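
The selection described above can be pictured with a small Python sketch. It is only an illustration of the naming pattern, not Informatica's implementation: the function name `hadoop_distribution_dir` and the `DISTRIBUTION_PREFIX` table are hypothetical, and the two prefixes are inferred solely from the directory names that appear in this article (HDP_2.6 and CDH_5.13). Other distribution types are deliberately left out.

```python
import posixpath

# Assumption: prefixes inferred from the directory names shown in this
# article only. Other distribution types are not covered here.
DISTRIBUTION_PREFIX = {
    "hortonworks": "HDP",
    "cloudera": "CDH",
}


def hadoop_distribution_dir(infa_home, distribution_type, distribution_version):
    """Build the expected distribution directory path from the CCO attributes."""
    prefix = DISTRIBUTION_PREFIX.get(distribution_type.strip().lower())
    if prefix is None:
        raise ValueError(
            f"No known directory prefix for distribution type {distribution_type!r}"
        )
    # Pattern taken from the article: <INFA_HOME>/services/shared/hadoop/<PREFIX>_<VERSION>
    return posixpath.join(
        infa_home, "services", "shared", "hadoop", f"{prefix}_{distribution_version}"
    )


if __name__ == "__main__":
    print(hadoop_distribution_dir("/data/informatica/1021", "Hortonworks", "2.6"))
```

Comparing the result with the contents of '$INFA_HOME/services/shared/hadoop' on the DIS machine is a quick way to check that a directory exists for the distribution type and version set in the CCO.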

 

1021_dis_cco_distribution_type_version.png 

1021_dis_hadoop_distribution_directory.png 

Because of this change, the 'Data Integration Service Hadoop Distribution Directory' attribute, which was used in earlier Informatica versions to configure the distribution directory, is no longer available under the 'Execution Options' properties of the DIS.

More Information

Sample Mapping log trace for 'HDP' Cluster

2018-04-23 19:11:28.118 <LdtmCompile-pool-2-thread-31> INFO:  Completed creating static ports for the generated ports in the transformation instance [TARGET].

2018-04-23 19:11:28.121 <LdtmCompile-pool-2-thread-31> INFO: [LDTM_0027] LDTM: Mapping compilation done.

2018-04-23 19:11:28.847 <LdtmCompile-pool-2-thread-31> INFO: [LDTM_0117] Choose mapping execution engine [Spark engine].

2018-04-23 19:11:29.087 <LdtmCompile-pool-2-thread-31> INFO: [LDTMCMN_0037] The Hadoop distribution directory is defined in Data Integration Service properties at the path [/data/informatica/1021/services/shared/hadoop/HDP_2.6].

2018-04-23 19:11:29.108 <LdtmCompile-pool-2-thread-31> INFO: [CLUSTERCONF_10024] The cluster configuration [HDP_26_Multi_Node_Kerberos_Cluster_infagcs] is unchanged from the last export. Using the existing export file [/data/informatica/1021/tomcat/bin/disTemp/D_Delphinus/DIS_Delphinus_HDP/hdp_26_multi_node_kerberos_cluster_infagcs/SPARK/infacco-site.xml].

 

2018-04-23 19:20:45.583 <LdtmCompile-pool-2-thread-31> INFO: [AUTOINST_3029] The cluster Hadoop distribution type is: [HDP_2.6].

2018-04-23 19:20:45.583 <SyncInfaBinaries> INFO: [AUTOINST_3001] The MD5 hexadecimal value for the Informatica archive is [b8b51cf517428af4d3ecedab6b4bebe5].

2018-04-23 19:27:46.733 <SyncInfaBinaries> INFO: [AUTOINST_3026] The Informatica archive already exists on the Data Integration Service machine. No archiving necessary.

  

Sample Mapping log trace for 'CDH' Cluster

 

2018-04-24 15:39:31.953 <LdtmCompile-pool-2-thread-11> INFO: [LDTMCMN_0037] The Hadoop distribution directory is defined in Data Integration Service properties at the path [/data/informatica/1021/services/shared/hadoop/CDH_5.13].

2018-04-24 15:39:31.953 <LdtmCompile-pool-2-thread-11> INFO: [CLUSTERCONF_10024] The cluster configuration [CDH_513_Multi_Node_Cluster_ths] is unchanged from the last export. Using the existing export file [/data/informatica/1021/tomcat/bin/disTemp/D_Delphinus/DIS_Delphinus_CDH/cdh_513_multi_node_cluster_ths/HADOOP/infacco-site.xml].

2018-04-24 15:39:31.953 <LdtmCompile-pool-2-thread-11> INFO: [CLUSTERCONF_10028] Based on the Hadoop distribution [CLOUDERA] and the run-time engine [HADOOP], the Data Integration Service will override the following cluster configuration properties at run time: null
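
To confirm which distribution directory the DIS picked for a particular run, the mapping log can be searched for the LDTMCMN_0037 message shown in the sample traces above. The following Python sketch assumes the message text has the same shape as in those samples; the function name `find_distribution_dirs` is hypothetical and the log text is passed in as a plain string.

```python
import re

# Matches the LDTMCMN_0037 message and captures the bracketed path that follows
# "at the path". Assumes the wording seen in the sample traces of this article.
LDTMCMN_0037 = re.compile(r"\[LDTMCMN_0037\].*?at the path \[(?P<path>[^\]]+)\]")


def find_distribution_dirs(log_text):
    """Return every Hadoop distribution directory reported in the log text."""
    return [match.group("path") for match in LDTMCMN_0037.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "2018-04-24 15:39:31.953 <LdtmCompile-pool-2-thread-11> INFO: "
        "[LDTMCMN_0037] The Hadoop distribution directory is defined in Data "
        "Integration Service properties at the path "
        "[/data/informatica/1021/services/shared/hadoop/CDH_5.13]."
    )
    for path in find_distribution_dirs(sample):
        print(path)
```

For a real log, the file contents can be read into a string and passed to the function in the same way.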


Applies To
Product: Data Engineering Integration (Big Data Management); Data Engineering Quality (Big Data Quality); Data Engineering Streaming (Big Data Streaming); Enterprise Data Preparation
Problem Type: Configuration; Product Feature
User Type: Developer
Project Phase: Implement; Onboard
Product Version: Informatica 10.2.1; Informatica 10.2.1 Service Pack 1; Informatica 10.2.2

Last Modified Date: 7/21/2019 7:30 AM
ID: 533148