HOW TO: Configure PowerExchange for Hadoop for PowerCenter to read data from or write data to Hadoop clusters
Solution

You can configure PowerCenter and the PowerCenter Integration Service to read data from and write data to a Hadoop cluster. The Hadoop cluster can be a High Availability (HA), non-HA, Kerberos-enabled, or non-Kerberos cluster.


Perform the following steps to configure PowerCenter for Cloudera, Hortonworks, IBM BigInsights, and MapR distributions (a combined command-line sketch of steps 1, 2, 3, 5, and 7 appears after the list):

  1. On the Informatica node where the PowerCenter Integration Service runs, create a directory. The PowerCenter Administrator user must have read access to this directory. For example: <infa_home>/pwx-hadoop/conf
  2. Copy the following files from the Hadoop cluster to the directory created in step 1:
    /etc/hadoop/conf/core-site.xml
    /etc/hadoop/conf/mapred-site.xml
    /etc/hadoop/conf/hdfs-site.xml
    /etc/hive/conf/hive-site.xml
  3. Optional. Applicable to Kerberos-enabled clusters. Run the kinit command on the Informatica node where the PowerCenter Integration Service runs to create the Kerberos ticket cache file.
     For example: /tmp/krb5cc_<UID>
  4. Optional. Applicable to Kerberos-enabled clusters except MapR. Edit the core-site.xml file in the directory created in step 1 and add the following property:
    <property>
        <name>hadoop.security.kerberos.ticket.cache.path</name>
        <value>/tmp/REPLACE_WITH_CACHE_FILENAME</value>
        <description>Path to the Kerberos ticket cache.</description>
    </property>
  5. In the Administrator tool, go to the Services and Nodes tab. Select the Processes view for the required PowerCenter Integration Service and add the CLASSPATH environment variable with the directory created in step 1 as its value.
  6. Restart the PowerCenter Integration Service.
  7. In the Workflow Manager, create the HDFS connection, assign it to the source or target, and run the workflow. When you create the HDFS connection, use the value of the fs.default.name property as the NameNode URI. You can find the fs.default.name property in the core-site.xml file.
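
The commands below are a minimal sketch of steps 1, 2, 3, 5, and 7 on the Integration Service node. The installation path, cluster host name, and Kerberos principal are placeholders only; substitute the values for your environment, and transfer the files in whatever way your site allows.

    # Step 1: create the configuration directory (example installation path)
    INFA_HOME=/opt/informatica
    mkdir -p "$INFA_HOME/pwx-hadoop/conf"

    # Step 2: copy the Hadoop client configuration files from a cluster node
    # ("hadoop-node" is a placeholder host; scp is only one way to copy the files)
    scp hadoop-node:/etc/hadoop/conf/core-site.xml   "$INFA_HOME/pwx-hadoop/conf/"
    scp hadoop-node:/etc/hadoop/conf/mapred-site.xml "$INFA_HOME/pwx-hadoop/conf/"
    scp hadoop-node:/etc/hadoop/conf/hdfs-site.xml   "$INFA_HOME/pwx-hadoop/conf/"
    scp hadoop-node:/etc/hive/conf/hive-site.xml     "$INFA_HOME/pwx-hadoop/conf/"

    # Step 3 (Kerberos-enabled clusters only): create the ticket cache file
    kinit -c /tmp/krb5cc_$(id -u) hadoopuser@EXAMPLE.COM

    # Step 5: the CLASSPATH environment variable added in the Administrator tool
    # points to the directory created in step 1, for example:
    #   CLASSPATH=/opt/informatica/pwx-hadoop/conf

    # Step 7: look up the NameNode URI for the HDFS connection
    grep -A 1 "fs.default.name" "$INFA_HOME/pwx-hadoop/conf/core-site.xml"
    # expected output is similar to:
    #   <name>fs.default.name</name>
    #   <value>hdfs://namenode.example.com:8020</value>
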
More Information

Depending on the encryption used in the Kerberos tickets, you might need to install the Java Cryptography Extension (JCE) policy files and update the corresponding files in the JVM bundled with Informatica.
Refer to KB 166746 for more information.
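
As a quick check before following KB 166746, you can confirm that the JCE policy files are present in the bundled JVM. The directory below assumes the default layout of the JVM shipped with Informatica and may differ in your installation:

    # location of the JCE policy files in the bundled JVM (path is an assumption)
    ls "$INFA_HOME/java/jre/lib/security/local_policy.jar" \
       "$INFA_HOME/java/jre/lib/security/US_export_policy.jar"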

If you attempt to read from or write to a Kerberos-enabled HDFS without making these configuration changes, an error similar to the following appears:

2015-07-30 17:07:56 : ERROR : (30775 | WRITER_1_*_1) : (IS | is) : node01 : HDFS_66008 : 
File [/user/sample.txt] could not be opened because of the following error: [org.apache.hadoop.security.AccessControlException:
SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
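
Before rerunning the workflow, one quick check on the Integration Service node (klist options and output vary slightly by Kerberos distribution) is to confirm that the ticket cache referenced in core-site.xml exists and holds a valid ticket:

    # verify the Kerberos ticket cache created in step 3
    klist -c /tmp/krb5cc_$(id -u)
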
Reference
CR 384771
Applies To
Product: PowerCenter
Problem Type:
User Type: Developer
Project Phase:
Product Version: Informatica 9.5.1; Informatica 9.6.0; Informatica 10.0; Informatica 10.1; Informatica 10.1.1; Informatica 10.2
Database:
Operating System: Linux; Windows
Other Software:
Last Modified Date: 11/6/2017 10:19 PM
ID: 161419