HOW TO: Configure Sqoop for Oracle Databases in Informatica Developer
Solution
Effective in version 10.1, you can use Sqoop in Informatica Developer to import and export data. Sqoop is a command-line program that transfers data between relational databases and HDFS through MapReduce jobs. When you use Sqoop, you do not need to install the relational database client software on any node in the Hadoop cluster.

To use Sqoop with Informatica Developer, you must configure Sqoop properties in a JDBC connection and run the mapping in the Hadoop environment. You can configure Sqoop connectivity for relational data objects and customized data objects.

The Model Repository Service uses JDBC to import metadata. The Data Integration Service runs the mapping in the Hadoop run-time environment and pushes the job processing to Sqoop. The Sqoop program then creates MapReduce jobs in the Hadoop cluster.
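For reference, a standalone Sqoop import against Oracle looks like the following sketch. The host, port, service name, table, and directory values are placeholders; when you use Informatica Developer, the Data Integration Service generates and runs the equivalent command for you, so you do not invoke Sqoop directly.
    sqoop import --connect jdbc:oracle:thin:@//dbhost.example.com:1521/orclsvc --username scott --password-file /user/scott/.oracle_password --table EMPLOYEES --target-dir /data/employees --num-mappers 4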

To configure Sqoop for processing data between an Oracle database and HDFS, perform the following steps:
  1. Install Informatica Developer version 10.1 and complete the prerequisite tasks.
  2. Install Informatica Big Data Management version 10.1 and complete the prerequisite tasks.
  3. Set the infapdo.aux.jars.path property in the hadoopEnv.properties file if you use an Informatica version earlier than 10.2. You do not need to set this property in version 10.2 and later.
  4. Set the Hadoop_node_jdk_home property in the hadoopEnv.properties file or in the Hadoop connection, depending on the Informatica version. For more information, refer to KB 527642.
  5. Download the Oracle JDBC driver jar files. You can use any Type 4 JDBC driver that the database vendor recommends for Sqoop connectivity.
    Note: The DataDirect JDBC drivers that Informatica ships are not licensed for Sqoop connectivity.
  6. On the node where the Data Integration Service runs, copy the JDBC driver jar files to the following directory:
    <Informatica installation directory>\externaljdbcjars 
    At run time, the Data Integration Service copies the jar files to the Hadoop distribution cache so that the jar files are accessible to all nodes in the Hadoop cluster.
    If the Data Integration Service runs on a grid, repeat this step on all nodes in the grid.
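    For example, on a Linux node, assuming the driver file is named ojdbc8.jar and Informatica is installed under /opt/Informatica (both placeholder paths; substitute your own):
    cp /tmp/ojdbc8.jar /opt/Informatica/externaljdbcjars/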
  7. Create a JDBC connection to connect to the Oracle database.
    JDBC Driver Class Name: com.informatica.jdbc.oracle.OracleDriver (use the DataDirect driver, not the vendor JDBC driver)
    This driver class applies to metadata access.
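    For reference, a DataDirect connection string for metadata access typically takes the following form (host, port, and SID are placeholders):
    jdbc:informatica:oracle://<database_host_name>:<database_port>;SID=<database_sid>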
  8. In the JDBC connection, select Sqoop v1.x from the Use Sqoop Connector list.
  9. In the Sqoop Arguments field, enter the Sqoop arguments to connect to the Oracle database.
    The --connect argument is mandatory.
    Syntax:
    - Using SID: --connect jdbc:oracle:thin:@<database_host_name>[:<database_port>]:<database_sid>
    - Using service name: --connect jdbc:oracle:thin:@//<database_host_name>[:<database_port>]/<service_name>
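    For example, with placeholder values (host dbhost.example.com, port 1521, SID ORCL, service name orclsvc):
    --connect jdbc:oracle:thin:@dbhost.example.com:1521:ORCL
    --connect jdbc:oracle:thin:@//dbhost.example.com:1521/orclsvc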
    [Screenshot (oracle_sqoop.png): Sqoop connector and Sqoop arguments configured in the JDBC connection]
  10. Alternatively, to use the OraOop connector with Sqoop, specify the --direct argument in the Sqoop Arguments field. In that case, you must also use a different syntax for multiple arguments: -D<argument_1>=<value_1> -D<argument_2>=<value_2>
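    For example, combining --direct with the OraOop timestamp attribute that is described under More Information below:
    --direct -Doraoop.timestamp.string=false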
  11. Save the JDBC connection.
  12. Create a mapping with the required source, transformations, and target.
    You can then specify additional Sqoop arguments and disable the Sqoop connector at the mapping level, if required. 
    The Sqoop arguments that you specify at the mapping level take precedence over the arguments that you specified in the JDBC connection.
  13. Select the Read or Write transformation that is associated with the data object.
  14. Click the Advanced tab.
  15. To disable the Sqoop connector for the data object, select the Disable Sqoop Connector check box.
  16. To specify additional Sqoop import arguments for the data object, enter the import arguments in the Additional Sqoop Import Arguments text box.
    When you import data from an Oracle database and the source table does not have a primary key defined, you must use the --split-by argument to split the import workload on a column, as shown in the example below.
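    For example, assuming a numeric source column named EMPLOYEE_ID (a placeholder column name):
    --split-by EMPLOYEE_ID --num-mappers 4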
  17. To specify additional Sqoop export arguments for the data object, enter the export arguments in the Additional Sqoop Export Arguments text box.
  18. Run the mapping in the Hadoop run-time environment. You must use a Hadoop connection and run the mapping on the Hive engine.
    You can view log events for a Sqoop mapping job in the Monitoring tab of the Administrator tool. You can view the results and statistics of the Sqoop mapping job under the stderr section of the Hadoop cluster logs.
More Information
Consider the following conditions when you configure the Sqoop connector for an Oracle database:
  • When you specify the --driver oracle.jdbc.driver.OracleDriver Sqoop argument along with the connect string, an error occurs while reading data. Hence, do not specify the --driver argument. When the --driver argument is not specified, the Sqoop program selects the driver automatically at run time.
  • When you use the --append argument, the Sqoop program ignores the argument during import and export. 
  • When you write data to a flat file through Sqoop and use the double, real, or float data types, the Sqoop program writes only the integer portion of the value to the flat file.
  • When you recreate the target table for a data object that uses Sqoop, the Sqoop program honors only the ANSI SQL standard syntax.
  • If the metadata from the Developer tool does not match the metadata in the database, or some of the columns are not linked between the Read and Write transformations in a Sqoop mapping, the mapping fails.
Consider the following conditions when you configure OraOop:
  • The Sqoop program does not honor the oraoop-site.xml file for import and export. You must specify the OraOop arguments in the Sqoop mapping as Sqoop arguments.
  • When you run a Sqoop mapping to import or export Date and Timestamp values, you might encounter Oracle errors. To avoid such errors, you must set the OraOop attribute -Doraoop.timestamp.string=false.
  • When you use OraOop, the table name that is passed to the --table argument is always converted to uppercase. Hence, when you recreate the target table by using the OraOop template table, case-sensitive table names cannot be created.
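Putting these conditions together, a typical Sqoop Arguments value for an OraOop import might look like the following sketch (host, port, and service name are placeholders):
    --connect jdbc:oracle:thin:@//dbhost.example.com:1521/orclsvc --direct -Doraoop.timestamp.string=false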
Applies To
Product: Data Engineering Integration (Big Data Management); Data Engineering Quality (Big Data Quality); Enterprise Data Preparation
Problem Type: Configuration; Connectivity
User Type: Architect; Business Analyst
Project Phase: Configure; Onboard
Product Version: Big Data Management 10.1; Big Data Management 10.2; Big Data Management 10.2.1; Big Data Management 10.2.2; Big Data Management 10.2.2 Service Pack 1
Database: Oracle
Operating System:
Other Software:
Last Modified Date: 12/31/2019 2:37 AM
ID: 500711