Mapping reading/writing from/to Hive on S3 fails with "java.io.IOException: No CredentialProviderFactory" in Spark execution
Problem Description
A mapping reading from or writing to Hive on S3 fails with the following stack trace in Spark execution:

19/03/06 13:17:15 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, dev-hdp-edw.corporate.abc.com, executor 1): java.io.IOException: Cannot find password option fs.s3a.access.key
at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:489)
at org.apache.hadoop.fs.s3a.S3AUtils.getPassword(S3AUtils.java:468)
at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:451)
at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:341)
at org.apache.hadoop.fs.s3a.S3ClientFactory$DefaultS3ClientFactory.createS3Client(S3ClientFactory.java:73)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:185)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:306)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:237)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1204)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1113)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:251)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:250)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Configuration problem with provider path.
at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:1999)
at org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:1959)
at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:484)
... 42 more
Caused by: java.io.IOException: No CredentialProviderFactory for jceks://hdfs@dev-hdp-edw.corporate.abc.com:8020/user/s3/keys/hdp_dev.jceks in hadoop.security.credential.provider.path
at org.apache.hadoop.security.alias.CredentialProviderFactory.getProviders(CredentialProviderFactory.java:66)
at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:1979)
... 44 more

Cause
The issue occurs because the S3 credentials are not passed properly to the Spark executor. In this environment, the credentials are supplied through the JCEKS approach rather than as access and secret keys configured in the site XML, so the executor cannot resolve the credential provider and the S3A filesystem initialization fails.
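For context, a JCEKS credential store of this kind is typically created with the standard `hadoop credential` CLI. The commands below are a sketch: the provider URI is the one shown in the stack trace above, and the key values are placeholders to be replaced with your own.

```shell
# Store the S3A access and secret keys in a JCEKS credential store on HDFS.
# The provider URI matches the one from the stack trace; key values are placeholders.
hadoop credential create fs.s3a.access.key \
    -value YOUR_ACCESS_KEY \
    -provider jceks://hdfs@dev-hdp-edw.corporate.abc.com:8020/user/s3/keys/hdp_dev.jceks

hadoop credential create fs.s3a.secret.key \
    -value YOUR_SECRET_KEY \
    -provider jceks://hdfs@dev-hdp-edw.corporate.abc.com:8020/user/s3/keys/hdp_dev.jceks

# Verify that both aliases are present in the store:
hadoop credential list \
    -provider jceks://hdfs@dev-hdp-edw.corporate.abc.com:8020/user/s3/keys/hdp_dev.jceks
```

The store itself is not enough: the Spark executor must also be told where to find it via `hadoop.security.credential.provider.path`, which is what the Solution below configures.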
Solution

To resolve the issue when S3 credentials are passed through JCEKS (rather than as access and secret keys configured in the site XML), configure the JCEKS file path in the Hive configuration file (hive-site.xml) of the Cluster Configuration Object (CCO) in Informatica.

Adding the JCEKS path to hive-site.xml resolves the issue.
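As a sketch, the relevant hive-site.xml entry might look like the following. The property name `hadoop.security.credential.provider.path` is the standard Hadoop setting being looked up in the stack trace above, and the JCEKS URI shown is the one from this environment's error message; substitute your own NameNode host, port, and store path.

```xml
<!-- hive-site.xml in the Informatica Cluster Configuration Object (CCO) -->
<property>
  <name>hadoop.security.credential.provider.path</name>
  <!-- JCEKS path taken from the stack trace; replace with your own store location -->
  <value>jceks://hdfs@dev-hdp-edw.corporate.abc.com:8020/user/s3/keys/hdp_dev.jceks</value>
</property>
```

With this property in place, the S3A filesystem can resolve `fs.s3a.access.key` and `fs.s3a.secret.key` from the credential store instead of expecting them in plain text in the site XML.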

More Information
Applies To
Product: Data Engineering Integration (Big Data Management)
Problem Type: Configuration
User Type: Developer
Project Phase: Configure; Implement

Last Modified Date: 11/19/2019 1:18 AM
ID: 572140