ERROR: "java.net.URISyntaxException: Expected scheme-specific part" while running mappings in Spark Engine mode using Informatica DEI
Problem Description

When running mappings with the 'Spark' execution engine in Informatica Data Engineering Integration (DEI), formerly known as Big Data Management (BDM), the mapping execution fails. The mapping run log contains the following error trace:

Log Trace

2018-02-19 02:37:02.220 <CmdExecInProcessTasks-pool-2-thread-2> SEVERE: [Cleanup] [HadoopFSRmRfTask]java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 5: hdfs:
        at com.informatica.platform.dtm.executor.hadoop.fs.impl.AbstractFileSystemImpl.globStatus(AbstractFileSystemImpl.java:312)
        at com.informatica.platform.dtm.executor.hadoop.impl.cmdtasks.HadoopFSRmRfTask.call(HadoopFSRmRfTask.java:35)
        at com.informatica.platform.dtm.executor.hadoop.impl.cmdtasks.HadoopFSRmRfTask.call(HadoopFSRmRfTask.java:1)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511

Cause

This issue occurs when the HDFS staging and event log directories for the Spark engine are specified with the HDFS NameNode details in the 'Hadoop Pushdown' connection used for execution. These directories must be specified as plain HDFS paths, without the 'hdfs://<name_node_service>' prefix.
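The exception in the log comes straight from java.net.URI parsing: a value that ends at the scheme (a bare 'hdfs:' with nothing after it) has no scheme-specific part. A minimal sketch reproducing the same parse failure, independent of Informatica or Hadoop:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeCheck {
    public static void main(String[] args) throws URISyntaxException {
        // "hdfs:" has a scheme but no scheme-specific part, so
        // java.net.URI rejects it -- the same message seen in the log.
        try {
            new URI("hdfs:");
        } catch (URISyntaxException e) {
            System.out.println(e.getMessage());
            // Expected scheme-specific part at index 5: hdfs:
        }
        // A plain path with no scheme parses without error, which is
        // why the connection accepts the directory without hdfs://.
        URI ok = new URI("/user/spark/workdir");
        System.out.println("path = " + ok.getPath());
    }
}
```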

Solution

Perform the following steps to resolve the issue:

  1. Log in to the Informatica Administrator console or the Informatica Developer client.
  2. Edit the 'Hadoop Pushdown' connection being used for running Spark jobs.
  3. Navigate to the 'Spark Engine' section of the connection.
  4. Update the values of 'Spark Staging Directory' and 'Spark Event Log Directory' as follows:

 

Before:

  Spark Staging Directory: hdfs://<name_node_service>/user/spark/workdir
  Spark Event Log Directory: hdfs://<name_node_service>/user/spark/eventlog

After:

  Spark Staging Directory: /user/spark/workdir
  Spark Event Log Directory: /user/spark/eventlog

  5. Once updated, save the changes made to the connection.

  6. Ensure that the folders specified as the Spark staging and event log directories exist in HDFS, and that the impersonation user specified under the 'Common Attributes' section of the connection has the required permissions on those folders.

  7. Once verified, re-run the mapping in Spark execution mode.
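In URI terms, the corrected directory values above are simply the path component of the old 'hdfs://' values, with the scheme and NameNode authority dropped. A small sketch of that transformation (the host 'namenode.example.com:8020' is a placeholder for an actual NameNode service):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathOnly {
    public static void main(String[] args) throws URISyntaxException {
        // Old-style value with scheme and a placeholder NameNode authority.
        URI before = new URI("hdfs://namenode.example.com:8020/user/spark/workdir");
        // The connection expects only the path component.
        String after = before.getPath();
        System.out.println(after); // /user/spark/workdir
    }
}
```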

More Information
Applies To
Product: Data Engineering Integration (Big Data Management); Data Engineering Quality (Big Data Quality); Data Engineering Streaming (Big Data Streaming); Enterprise Data Preparation
Problem Type: Configuration; Connectivity
User Type: Administrator; Developer
Project Phase: Onboard; Configure
Product Version: Informatica 10.1; Informatica 10.1.1; Informatica 10.1.1 HotFix; Informatica 10.2; Informatica 10.2.1; Informatica 10.2.1 Service Pack 1; Informatica 10.2.2; Informatica 10.4

Last Modified Date: 3/31/2020 4:36 AM    ID: 526526