ERROR: "Cluster does not have sufficient resources for Blaze services, please decrease the memory and CPU requirements for services." while starting Blaze Engine in Informatica BDM
Problem Description

When mappings are run on the 'Blaze' execution engine in Informatica BDM, the mapping execution fails with the following error message in the mapping run log:

 

Mapping Log Trace

 

2018-06-19 21:32:33.956 <LdtmWorkflowTask-pool-1-thread-10> INFO: [task MAINSESSION_task1]: Set log4j root logger's logging level back to INFO

2018-06-19 21:32:33.957 <LdtmWorkflowTask-pool-1-thread-10> SEVERE: The Integration Service failed to run the task [MAINSESSION_task1]. See the additional error messages for more information.

com.informatica.sdk.dtm.ExecutionException: [[GRIDDTM_1011] The Integration Service failed to execute grid mapping.]

[[CAL_API_1] The Integration Service encountered an unexpected error condition: [java.lang.RuntimeException: Service [Blaze_Grid_Manager_Service] has stopped executing; indthsbde002.informatica.com:12493]..

at com.informatica.dtm.executor.grid.cal.yarn.svc.CADIRuntimeRegistry.checkCADIInitialization(CADIRuntimeRegistry.java:661)

at com.informatica.dtm.executor.grid.cal.yarn.svc.CADIRuntimeRegistry.validateGridMgrService(CADIRuntimeRegistry.java:542)

at com.informatica.dtm.executor.grid.cal.yarn.svc.CADIRuntimeRegistry.startCADIGridManager(CADIRuntimeRegistry.java:459)

at com.informatica.dtm.executor.grid.cal.yarn.svc.CADIRuntimeRegistry.startCADIServices(CADIRuntimeRegistry.java:367)

at com.informatica.dtm.executor.grid.cal.yarn.svc.CADIRuntimeRegistry.access$1(CADIRuntimeRegistry.java:318)

at com.informatica.dtm.executor.grid.cal.yarn.svc.CADIRuntimeRegistry$StartCADIHandler.run(CADIRuntimeRegistry.java:812)

at com.informatica.dtm.executor.grid.cal.yarn.svc.CADIRuntimeRegistry$StartCADIHandler.run(CADIRuntimeRegistry.java:1)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at com.informatica.platform.dtm.executor.hadoop.impl.AbstractIUserGroupInformationImpl.doAs(AbstractIUserGroupInformationImpl.java:133)

at com.informatica.dtm.executor.grid.cal.yarn.svc.CADIRuntimeRegistry.initCADIServices(CADIRuntimeRegistry.java:151)

at com.informatica.dtm.executor.grid.cal.yarn.client.YarnClusterServicesCnxFactory.initCADIServices(YarnClusterServicesCnxFactory.java:78)

at com.informatica.platform.dtm.executor.grid.GridExecutor.initCADIServices(GridExecutor.java:993)

at com.informatica.platform.dtm.executor.grid.GridExecutor.preProcessing(GridExecutor.java:412)

at com.informatica.platform.dtm.executor.grid.GridExecutor.runAsync(GridExecutor.java:274)

at com.informatica.platform.dtm.executor.grid.task.impl.GridMappingTaskHandlerImpl.executeMainScriptAsync(GridMappingTaskHandlerImpl.java:143)

at com.informatica.executor.workflow.taskhandler.impl.BaseTaskHandlerImpl.startTaskAsync(BaseTaskHandlerImpl.java:212)

at com.informatica.executor.workflow.taskhandler.impl.BaseTaskHandlerImpl.runAsync(BaseTaskHandlerImpl.java:188)

at com.informatica.executor.workflow.taskhandler.impl.BaseTaskHandlerImpl.run(BaseTaskHandlerImpl.java:115)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

].]

 

In the 'stderr' log of the 'Blaze Grid Manager' application, events similar to the following can be observed:

 

'stderr' Log Trace

 

2018-06-20 01:32:28.031    INFO: Resource calibration performed: min-vcores-allocation [1], min-memory-allocation-mb [4112]..
2018-06-20 01:32:28.031    INFO: Resource calibration performed: increment-allocation-vcores[1], increment-allocation-mb [4112]..

2018-06-27 03:08:55.635    INFO: Resource calibration performed: increment-allocation-vcores[1], increment-allocation-mb [32]..

2018-06-27 03:08:55.657 WARNING: Maximum resource availability per node: Memory = 11520MB , Vcores = 8

2018-06-27 03:08:55.658 WARNING: Total minimum resource requirement for Blaze on each node: Memory = 13312MB , Vcores = 9

2018-06-27 03:08:55.658 WARNING: Minimum normalized resource requirement configuration for Blaze services on each node:

Grid Manager Service: Memory Requirement = 1024MB , Vcores Requirement = 1

Orchestrator Service: Memory Requirement = 1024MB , Vcores Requirement = 1

Container Manager Service: Memory Requirement = 1024MB , Vcores Requirement = 1

Data Exchange Service: Memory Requirement = 5120MB , Vcores Requirement = 1

Tasklet Container: Memory Requirement = 3072MB , Vcores Requirement = 4

Blaze Job Monitor Service: Memory Requirement = 2048MB , Vcores Requirement = 1

2018-06-27 03:08:55.659  SEVERE: Cluster does not have sufficient resources for Blaze services, please decrease the memory and cpu requirements for services.

2018-06-27 03:08:55.661  SEVERE: Service [Blaze_Grid_Manager_Service] initialization method failed:

java.lang.RuntimeException: Cluster does not have sufficient resources for Blaze services.

​ ​​

Cause

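This issue occurs either when the Hadoop cluster as a whole does not have sufficient free resources, or when the resources that the Blaze Engine requires on each Hadoop data node exceed the resources available on that node. In the 'stderr' log above, for example, Blaze requires 13312 MB of memory and 9 vcores on each node, while each node offers at most 11520 MB and 8 vcores.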

 

The YARN service manages resources for all applications that run in the Hadoop cluster. When an application runs in the cluster, it requests resources (CPU and memory) from the YARN scheduler. YARN allocates those resources to the requesting application in the form of containers, which can come from any data node in the cluster. A container is the basic unit of resource allocation and encapsulates both CPU cores and memory.
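
The per-node check described above can be reproduced with a few lines of Python. This is only an illustrative sketch, not Blaze's actual implementation: the figures are copied from the 'stderr' log trace in the Problem Description, and the function names are hypothetical.

```python
# Per-node limits and per-service requirements copied from the 'stderr' log
# trace above (memory in MB, vcores). Names are illustrative assumptions.
NODE_MAX = {"memory_mb": 11520, "vcores": 8}

BLAZE_SERVICES = {
    "Grid Manager Service": (1024, 1),
    "Orchestrator Service": (1024, 1),
    "Container Manager Service": (1024, 1),
    "Data Exchange Service": (5120, 1),
    "Tasklet Container": (3072, 4),
    "Blaze Job Monitor Service": (2048, 1),
}


def total_requirement(services):
    """Sum the memory and vcore requirements of all services."""
    memory_mb = sum(mem for mem, _ in services.values())
    vcores = sum(vc for _, vc in services.values())
    return memory_mb, vcores


def node_shortfall(services, node_max):
    """Return (memory_mb, vcores) missing on a node; (0, 0) means it fits."""
    need_mem, need_vc = total_requirement(services)
    return (max(0, need_mem - node_max["memory_mb"]),
            max(0, need_vc - node_max["vcores"]))


print(total_requirement(BLAZE_SERVICES))         # (13312, 9)
print(node_shortfall(BLAZE_SERVICES, NODE_MAX))  # (1792, 1)
```

With the logged values, each node is short by 1792 MB of memory and 1 vcore, which matches the SEVERE message in the log.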

Solution

Perform the following steps to resolve the encountered issue:

 

  1. Estimate the number of YARN containers required for Blaze Engine startup. In general, a cluster with 'n' data nodes requires '(2*n)+3' YARN containers for Blaze Engine startup. For example, a cluster with 10 data nodes requires 23 YARN containers. For more information, refer to KB 533143.
  2. Find the minimum 'Memory' and 'vcore' settings for a container in the Hadoop cluster, using one of the following methods:

 

  • The following attributes in the 'yarn-site.xml' file of the Hadoop cluster:

    'yarn-site.xml' attribute                   Remarks
    yarn.scheduler.minimum-allocation-mb        Minimum memory allotted to a YARN container
    yarn.scheduler.minimum-allocation-vcores    Minimum vCores (CPU cores) allotted to a YARN container

 

  • An entry similar to the following in the 'stderr' log of the failed 'Blaze Grid Manager' application:

 

2018-06-20 01:32:28.031    INFO: Resource calibration performed: min-vcores-allocation [1], min-memory-allocation-mb [4096]..

 

In this example, the minimum number of vCores for a YARN container is 1 and the minimum memory allotted is 4096 MB (4 GB).
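
If the 'stderr' log is large, the calibration entry can be extracted programmatically. The following is a minimal sketch that assumes the entry has exactly the format quoted above; the function name and the regular expression are illustrative and not part of any Informatica tooling.

```python
import re

# Matches the calibration entry quoted above and captures the two bracketed
# numbers. The pattern assumes the log format shown in this article.
CALIBRATION = re.compile(
    r"min-vcores-allocation\s*\[(\d+)\],\s*min-memory-allocation-mb\s*\[(\d+)\]"
)


def parse_min_allocation(log_text):
    """Return (min_vcores, min_memory_mb) from the first matching entry, or None."""
    match = CALIBRATION.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = ("2018-06-20 01:32:28.031    INFO: Resource calibration performed: "
        "min-vcores-allocation [1], min-memory-allocation-mb [4096]..")
print(parse_min_allocation(line))  # (1, 4096)
```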

 

  3. Log in to the YARN Resource Manager Web UI and find the available memory and vCores in the Hadoop cluster.

 

 

Available Memory = Memory Total - Memory Used

Available VCores = VCores Total - Vcores Used

 

For instance, consider the following 'YARN Resource Manager' details corresponding to one of the Hadoop clusters:

 

[Screenshot: YARN Resource Manager Web UI showing the cluster's Memory Total, Memory Used, VCores Total, and VCores Used]

 

For this Hadoop cluster, the available memory and vCores are as follows:

 

Available Memory = 84 GB - 15 GB = 69 GB

Available vCores = 75 - 9 = 66
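
The same figures can also be read from the YARN ResourceManager REST API instead of the Web UI. The sketch below assumes an unsecured ResourceManager reachable over HTTP; the URL passed to 'fetch_cluster_metrics' is a placeholder, and a Kerberos-secured cluster would need additional authentication that is not shown. The sample values are the ones from the example above, converted to MB.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Fetch 'clusterMetrics' from the ResourceManager REST API.

    rm_url is a placeholder such as 'http://<rm-host>:<port>'.
    """
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/metrics") as response:
        return json.load(response)["clusterMetrics"]


def available_resources(metrics):
    """Apply 'Available = Total - Used' to memory (MB) and vcores."""
    memory_mb = metrics["totalMB"] - metrics["allocatedMB"]
    vcores = metrics["totalVirtualCores"] - metrics["allocatedVirtualCores"]
    return memory_mb, vcores


# Sample values matching the example: 84 GB total, 15 GB used, 75 and 9 vcores.
sample = {
    "totalMB": 84 * 1024,
    "allocatedMB": 15 * 1024,
    "totalVirtualCores": 75,
    "allocatedVirtualCores": 9,
}
print(available_resources(sample))  # (70656, 66), that is 69 GB and 66 vcores
```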

 

 

  4. Estimate the number of YARN containers that can be created from the available memory and vCores in the Hadoop cluster, using the following formula:

 

Minimum available YARN containers based on available memory = INT ( Available_Memory_in_MB / yarn.scheduler.minimum-allocation-mb )

Minimum available YARN containers based on available vCores = INT ( Available_vCores / yarn.scheduler.minimum-allocation-vcores )

 

Available YARN containers = MIN ( Minimum available YARN containers based on available memory , Minimum available YARN containers based on available vCores )

 

Example

Consider the following settings and resource availability in the Hadoop cluster:

 

In 'yarn-site.xml'

    'yarn-site.xml' attribute                   Value
    yarn.scheduler.minimum-allocation-mb        2048
    yarn.scheduler.minimum-allocation-vcores    2

 

From Resource Manager Web UI

 

Available Memory =  69 GB = 69 * 1024 MB

Available vCores =  66

 

Minimum available YARN containers based on available memory = INT( 69 * 1024 / 2048 )

      = INT( 69*0.5)

      = INT(34.5)

      = 34

 

Minimum available YARN containers based on available vCores = INT ( 66 / 2 )

    = INT(33)

    = 33

 

Available YARN containers = MIN ( 34,33 ) = 33 

 

Hence, for the aforementioned YARN settings and resource availability in Hadoop cluster, 33 containers are available with YARN service for running applications in the Hadoop cluster.
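
The estimates from Step 1 and Step 4 can be sketched in Python as follows. The function names are hypothetical, and the 10-node cluster is only the illustration used in Step 1, not necessarily the cluster from this example.

```python
def required_containers(data_nodes):
    """Step 1: YARN containers needed for Blaze Engine startup, (2*n)+3."""
    return 2 * data_nodes + 3


def available_containers(available_memory_mb, available_vcores,
                         min_allocation_mb, min_allocation_vcores):
    """Step 4: containers that fit into the free memory and free vcores."""
    by_memory = available_memory_mb // min_allocation_mb   # INT(memory / min-mb)
    by_vcores = available_vcores // min_allocation_vcores  # INT(vcores / min-vcores)
    return min(by_memory, by_vcores)


available = available_containers(69 * 1024, 66, 2048, 2)
required = required_containers(10)

print(available)              # 33
print(required)               # 23
print(available >= required)  # True
```

Integer division ('//') plays the role of INT in the formula, so 34.5 is truncated to 34 before the minimum is taken.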


Note: When the Blaze Engine is configured to start up in a specific 'YARN Queue' or 'Node Label', calculate the available YARN containers from the free resources in that 'YARN Queue' or 'Node Label', rather than from the total free resources of the Hadoop cluster.

 ​


Depending on how the number of available YARN containers in the cluster (obtained in Step 4) compares with the number of YARN containers estimated for 'Blaze Engine' startup (estimated in Step 1), perform one of the following sets of actions:

 

Number of available YARN containers < Estimated number of YARN containers for 'Blaze Engine' startup

  • Work with the Hadoop administration team to reconfigure the following attributes in the 'yarn-site.xml' file, which helps increase the number of available YARN containers:

 

'yarn-site.xml' attribute

yarn.scheduler.minimum-allocation-mb

yarn.scheduler.increment-allocation-mb

yarn.scheduler.maximum-allocation-mb

yarn.scheduler.increment-allocation-vcores

yarn.scheduler.minimum-allocation-vcores

yarn.scheduler.maximum-allocation-vcores


Note: For more information on configuring these parameters and on how they affect the total number of available YARN containers, refer to the YARN documentation for the Hadoop distribution in use.

 

  

  • Once sufficient YARN containers are confirmed to be available in the Hadoop cluster, re-run the mapping in 'Blaze' mode. The 'Blaze Engine' should then start successfully.


Number of available YARN containers >= Estimated number of YARN containers for 'Blaze Engine' startup

 

Perform the following steps depending upon the Informatica BDM version, to re-configure the resources requested by 'Blaze Engine' and to resolve the issue:


Informatica 10.2.1 and later versions

  1. Log in to the Informatica Administrator console or launch the Informatica Developer client.
  2. In the Administrator console, navigate to the 'Connections' tab. In the Developer client, navigate to 'Windows > Preferences > Connections > [Domain] > Cluster'.
  3. Select the 'Hadoop Pushdown' connection used for running the jobs.
  4. Navigate to the 'Blaze Engine' section.
  5. Edit the 'Advanced Properties' attribute in that section.
  6. In the 'Advanced Properties' window, update the values of the following properties:

 

    infagrid.orch.scheduler.oop.container.pref.memory    <memory_values_to_match_requirement>
    infagrid.orch.scheduler.oop.container.pref.vcore     <cpu_core_values_to_match_requirement>

 

For instance, if 2 GB of memory and 2 CPU cores are planned to be used, then update the values as below:

 

    infagrid.orch.scheduler.oop.container.pref.memory    2048
    infagrid.orch.scheduler.oop.container.pref.vcore     2


[Screenshot: 'Advanced Properties' window of the 'Blaze Engine' section in Informatica 10.2.1]

   

  7. Save the changes made to the Hadoop connection.

 ​

Pre-Informatica 10.2.1 versions
    1. Log in to the Informatica server machine where the Data Integration Service (DIS) used for mapping execution is running.
    2. Navigate to the '$INFA_HOME/services/shared/hadoop/<distribution>/infaConf' directory.
    3. Take a backup of the existing 'hadoopEnv.properties' file.
    4. Edit the 'hadoopEnv.properties' file as shown below:

 

Before

 

#OOP Container memory and Vcores config (1 Vcore per EDTM) (Memory in MB)

infagrid.orch.scheduler.oop.container.pref.memory=5120

infagrid.orch.scheduler.oop.container.pref.vcore=4

 

After

#OOP Container memory and Vcores config (1 Vcore per EDTM) (Memory in MB)

infagrid.orch.scheduler.oop.container.pref.memory=<memory_values_to_match_requirement>

infagrid.orch.scheduler.oop.container.pref.vcore=<cpu_core_values_to_match_requirement>

 

For instance, if 2 GB of memory and 2 CPU cores are planned to be used, then update the values as below:

 

infagrid.orch.scheduler.oop.container.pref.memory=2048

infagrid.orch.scheduler.oop.container.pref.vcore=2

 

​     5. Once updated, save the changes.

  


Once configuration changes are made, re-run the mapping in 'Blaze' mode and it should start successfully.
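
For the pre-10.2.1 procedure, the edit to 'hadoopEnv.properties' can also be scripted. The sketch below is an illustration only: it works on the file's text, leaves reading, backing up, and writing the file to the caller, and does not handle properties-file escapes or line continuations. The function name 'set_properties' is hypothetical.

```python
MEMORY_KEY = "infagrid.orch.scheduler.oop.container.pref.memory"
VCORE_KEY = "infagrid.orch.scheduler.oop.container.pref.vcore"


def set_properties(text, updates):
    """Return properties text with the given keys set.

    Comments and unrelated settings are kept; keys that are not present
    in the text are appended at the end.
    """
    seen = set()
    lines = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in updates:
            lines.append(f"{key}={updates[key]}")
            seen.add(key)
        else:
            lines.append(line)
    lines.extend(f"{k}={v}" for k, v in updates.items() if k not in seen)
    return "\n".join(lines) + "\n"


before = (
    "#OOP Container memory and Vcores config (1 Vcore per EDTM) (Memory in MB)\n"
    f"{MEMORY_KEY}=5120\n"
    f"{VCORE_KEY}=4\n"
)
after = set_properties(before, {MEMORY_KEY: 2048, VCORE_KEY: 2})
print(after)
```

Take the backup described in Step 3 before writing the result back to the file.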

 

More Information

'infagrid.orch.scheduler.oop.container.pref.memory' and 'infagrid.orch.scheduler.oop.container.pref.vcore' set the memory and CPU cores that the Blaze Engine requests for each 'DTM' process, which executes the submitted mappings in the Hadoop cluster. When the cluster has spare YARN resources, configure these settings based on the data volume that the mappings process.

  • By default, the Blaze Engine uses 5 GB of memory and 4 CPU cores for each 'DTM' process.
  • When the mappings submitted to the Blaze Engine process large volumes of data, use more memory (pref.memory) and fewer CPU cores (pref.vcore).
  • When concurrency is important during mapping execution, give preference to increasing the CPU cores (pref.vcore) of the Blaze Engine.
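
To get a feel for how these two settings trade off, the following sketch computes a rough upper bound on how many 'DTM' containers of a given size fit on one node. It is an approximation under stated assumptions: it ignores the other Blaze services running on the node and any rounding that YARN applies to container requests. The node capacity (11520 MB, 8 vcores) is taken from the 'stderr' log in the Problem Description, and the function name is hypothetical.

```python
def dtm_containers_per_node(node_memory_mb, node_vcores,
                            pref_memory_mb, pref_vcore):
    """Upper bound on DTM containers that fit on one node."""
    by_memory = node_memory_mb // pref_memory_mb
    by_vcores = node_vcores // pref_vcore
    return min(by_memory, by_vcores)


# Default settings (5120 MB, 4 vcores) on a node with 11520 MB and 8 vcores.
print(dtm_containers_per_node(11520, 8, 5120, 4))  # 2

# Reduced settings from the example above (2048 MB, 2 vcores).
print(dtm_containers_per_node(11520, 8, 2048, 2))  # 4
```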

For more information on 'Blaze Architecture', refer to the following document, which explains about 'Blaze Engine Architecture' and its core components:

 

https://docs.informatica.com/big-data-management/big-data-management/10-2-2-service-pack-1/big-data-management-administrator-guide/introduction-to-big-data-management-administration/hadoop-integration/run-time-process-on-the-blaze-engine.html

Applies To
Product: Big Data Management; Big Data Quality; Enterprise Data Preparation; Enterprise Data Catalog
Problem Type: Configuration; Sizing; Stability; Crash/Hang
User Type: Administrator; Architect; Developer
Project Phase: Configure; Implement; Onboard
Product Version: Informatica 10.1.1; HotFix; Informatica 10.2; Informatica 10.2.1; Informatica 10.2.1 Service Pack 1; Informatica 10.2.2
Last Modified Date: 10/9/2019 9:24 PM
ID: 533265