Any fault-tolerant enterprise software must provide the following functionality for mission-critical deployment. Businesses have relied on high-availability clustering offerings such as HP ServiceGuard or Sun Cluster to deploy mission-critical enterprise software. A high-availability clustering solution is expensive to set up and maintain, hence it is generally used only for mission-critical applications.
The key elements of high availability are
In high-availability clusters, when a node in the cluster fails, the service processes running on the failed node are automatically migrated to a backup node and the services are restarted. This capability is known as failover.
By migrating the service to a new node and restarting it, the cluster minimizes the downtime caused by the failure of one or more nodes and thereby delivers greater uptime for the software application.
A shared-disk cluster allows every node in the cluster to access the data on any disk. Data synchronization is achieved via a distributed lock manager (DLM).
When a node fails in a shared-disk cluster, the node assuming the role of the failed one can easily gain access to the data because of the shared-disk architecture.
A shared-nothing cluster does not require a DLM, because each node has exclusive ownership of its disks. When a node fails, ownership of its disks must be transferred to the new node.
Resiliency is a measure of how well a software application handles transient failures. Software applications often must connect to other computing resources such as a network, an RDBMS, or a file system. A network connection can fail temporarily. In such an event, a highly resilient software service must be able to cope by retrying, with an upper limit on the timeout value.
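The retry-with-timeout behavior described above can be sketched as follows. This is a minimal illustration, not PowerCenter's actual implementation; the function names and backoff parameters are assumptions:

```python
import time

def with_retry(operation, max_elapsed=30.0, initial_delay=0.5, max_delay=8.0):
    """Retry an operation that may fail transiently, with exponential backoff.
    Gives up once the total elapsed time would exceed max_elapsed, which plays
    the role of the upper limit on the timeout value.
    (Illustrative sketch; parameter values are assumptions.)"""
    start = time.monotonic()
    delay = initial_delay
    while True:
        try:
            return operation()
        except ConnectionError:
            # Treat the error as transient; stop retrying once the budget is spent.
            if time.monotonic() - start + delay > max_elapsed:
                raise
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

The upper limit matters: without it, a service would hang forever on a resource that never comes back, instead of surfacing the failure so that failover or recovery can take over.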
Recovery is a technique used by stateful software to restart successfully after a failure. Software applications can be either stateless or stateful. An FTP server is an example of a stateless service, whereas an RDBMS server is a stateful service. The state of the service is recorded on the original node, and that state is reconstructed when the service starts up on a new node. An RDBMS uses its database log to undo or redo transactions upon restart to provide recovery.
The PowerCenter Domain (simply called the domain) is the fundamental unit of PowerCenter Services administration. A domain can be likened to a cluster or a grid and consists of:
A node is a logical representation of a physical machine. Each node runs a Service Manager (SM) process to control the services running on that node. A node is considered unavailable if its SM process is not up and running; this can happen, for example, if the administrator shuts down the machine or the SM process. SM processes periodically exchange heartbeat signals among themselves to detect node or network failures. Upon detecting a primary (or backup) node failure, the remaining nodes determine the new primary (or backup) node via a distributed voting algorithm. Typically, the administrator will configure the OS to start the SM automatically when the OS boots, or to restart it if the SM fails unexpectedly. The SM is the primary point of control for PowerCenter Services on the node.
The administrator designates one or more nodes of the domain as gateway nodes. While specifying a single gateway node is sufficient, the administrator must specify multiple gateway nodes if high availability is desired. When multiple gateways are specified, one of them is elected as the current primary gateway.
The primary gateway is responsible for
The remaining gateway nodes are backup gateways. If the primary gateway, or the node it runs on, fails, one of the backup gateways automatically becomes the primary. This ensures availability of the domain and the services it hosts as long as at least one of the gateway nodes is up. If none of the gateway nodes is up, the domain becomes nonfunctional. If the administrator specifies only a single gateway and that gateway goes down, the administrator must manually start a new gateway, and all client applications must be told to connect to the new gateway node.
The election algorithm uses the database specified at domain creation time as the arbitrator. The database stores the configuration information for the domain and is also used to resolve the primary gateway. The primary gateway periodically updates a row in the database with its latest timestamp. This update interval is typically in the range of single-digit to low double-digit seconds; the current default is 8 seconds and can change from release to release.
When a new gateway node is started, it reads this row in the database and tries to contact the primary gateway over a network connection to join the domain. If the connection is not successful, the new node waits for a predefined time interval to check whether the timestamp value in the row is changing.
If the primary gateway is unable to update the database row within some multiple of the periodic update interval (the current default is 32 seconds and may change in future releases), it gives up the primary role and shuts itself down. This prevents multiple nodes from becoming masters simultaneously in a split-brain scenario, since the database could be unavailable only to the current primary node.
If multiple gateway nodes are started at the same time, the first one to obtain a row lock in the database becomes the primary gateway.
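The database-arbitrated election described above can be sketched roughly as follows, simulated here with SQLite. The schema, function names, and polling logic are illustrative assumptions, not the product's actual implementation, and the network-contact step is omitted:

```python
import sqlite3
import time

UPDATE_INTERVAL = 8    # seconds; document's current default, may change
TAKEOVER_WAIT = 32     # a multiple of the update interval (document's default)

def init_domain_db(conn):
    """Create the single heartbeat row the gateways arbitrate over (assumed schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS master "
                 "(id INTEGER PRIMARY KEY, node TEXT, ts REAL)")
    conn.execute("INSERT OR IGNORE INTO master VALUES (1, NULL, 0)")
    conn.commit()

def refresh_mastership(conn, node):
    """Called periodically by the primary gateway to prove it is alive."""
    conn.execute("UPDATE master SET node = ?, ts = ? WHERE id = 1",
                 (node, time.time()))
    conn.commit()

def try_become_primary(conn, node, wait=TAKEOVER_WAIT, poll=1.0):
    """A starting gateway watches the heartbeat row; if the timestamp stops
    changing for the takeover window, it claims the row and becomes primary.
    The UPDATE's row lock serializes simultaneous claimants."""
    _, last_ts = conn.execute(
        "SELECT node, ts FROM master WHERE id = 1").fetchone()
    deadline = time.time() + wait
    while time.time() < deadline:
        (ts,) = conn.execute("SELECT ts FROM master WHERE id = 1").fetchone()
        if ts != last_ts:
            return False          # timestamp moved: a primary is alive
        time.sleep(poll)
    refresh_mastership(conn, node)
    return True
```

In this sketch, as in the text, the database plays the arbitrator: liveness is inferred from the changing timestamp, and ties between simultaneous starters are broken by whoever wins the row lock on the update.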
Every node in the domain sends a heartbeat to the primary gateway at a periodic interval; the default is 15 seconds (and may change in a future release). The heartbeat is a tiny message sent over a network connection. As part of the heartbeat, each node also updates the gateway with the service processes currently running on the node. If a node fails to send a heartbeat within the default timeout, which is a multiple of the heartbeat interval (currently 90 seconds), the primary gateway marks the node unavailable and fails over any services running on that node. This gives a node six chances to update the master before it is marked as down, which avoids false alarms caused by a single packet loss or by heavy network load, where packet delivery can take longer.
While updating the primary gateway, a node could detect that the primary gateway is not responding. If that node is a gateway node, it initiates the master election algorithm described above. If it is not a gateway node, it stands by and keeps trying to connect to the same or a newly elected primary gateway.
In either case, if a node cannot contact a primary gateway within the same predefined time interval of 90 seconds, it stops all the service processes running on that node.
The Domain currently supports the following service types:
The domain can host zero or more services of each of the above types. The administrator creates, configures, and manages these services.
A PowerCenter Service may depend on one or more other PowerCenter Services. This is the concept of a dependent service. The following lists the current dependencies:
The dependent service may exist in the same domain or in a different domain.
The administrator must specify one primary node and one or more backup nodes to configure each PowerCenter service. The part of a service that runs on a node is referred to as a service process. By default, the service process representing the service runs on the service's primary node. If the primary node becomes unavailable, the service is automatically migrated to one of the backup nodes, i.e., the service process is started on the backup node. If that backup node then fails, the service is migrated to the primary node if it is available, or to another backup node, and so on. At any point in time, only one service process runs for a given service in the domain. The service remains available as long as one of the nodes in the list is available and the service can be started on it. The act of migrating a service from a failed node to an active node is referred to as failover. A service process crash is another type of failure; when this happens, a service process is started on the primary node if it is available, or on a backup node otherwise.
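The node-selection order described above (primary first, then backups, with at most one service process running) can be reduced to a small sketch; the function name is illustrative:

```python
def pick_failover_node(primary, backups, available):
    """Choose where to (re)start the single service process: the primary node
    if it is available, otherwise the first available backup node in the
    configured order, otherwise None (service stays down).
    (Illustrative sketch of the selection order, not product code.)"""
    for node in [primary, *backups]:
        if node in available:
            return node
    return None
```

For example, with primary `n1` and backups `n2, n3`, the service runs on `n1` whenever it is up; if only `n3` survives, the service process is started there; if no listed node is available, the service is unavailable until one returns.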
Automatic failover is one element of high availability, but does not guarantee it by itself. PowerCenter services are stateful. To provide high availability for the various services, the service state must be recorded on an ongoing basis. Upon any failure, the service state must be reconstructed and the service must be recovered before allowing new service requests.
Workflow execution can be interrupted by failures or by explicit stop/abort requests from the user. While workflows execute, the execution state is recorded in shared storage. For workflows, this state includes which tasks have completed, which are still running, and the values of workflow variables.
Upon a node failure, the DI Service starts up on a new node. The shared storage recording the workflow state is read to recreate the execution state as it was immediately before the failure. Any interrupted command tasks are rerun from scratch; interrupted session tasks are handled as explained below. Workflow recovery can be invoked manually (using pmcmd or the Workflow Manager/Monitor) or automatically as part of the failover mechanism. If the current primary of the DI Service fails, the workflows running on it are automatically recovered on the new primary.
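The workflow recovery behavior above (skip completed tasks, rerun interrupted ones from scratch) can be sketched as follows; the function and the state dictionary are illustrative stand-ins for the execution state kept in shared storage:

```python
def recover_workflow(tasks, state, run_task):
    """Resume a workflow after failover using the execution state recorded
    in shared storage. Completed tasks are skipped; tasks that were running
    or never started are (re)run from scratch, as with command tasks.
    (Illustrative sketch; session-task recovery strategies are not modeled.)"""
    for task in tasks:
        if state.get(task) == "completed":
            continue          # already done before the failure; do not rerun
        run_task(task)        # rerun interrupted / not-yet-started task
        state[task] = "completed"
    return state
```

The key point the sketch captures is that recovery is driven entirely by the recorded state, so the new node can pick up where the failed node left off without rerunning finished work.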
The user specifies the strategy for recovering an interrupted session task in the session properties.
For the Metadata service, the state information includes the various metadata objects and their states, e.g., objects locked by specific Designer or Workflow Manager applications. The repository state information is recorded in the repository database.
When the secondary node restarts the Metadata service (upon failure of the primary node), the state of the Metadata service as it was immediately before the failure is reconstructed before new requests are handled. The entire process appears completely transparent to all active repository clients.
The Web Services Provider Service is a stateless service; hence, no recovery is needed for it.
During execution, PowerCenter jobs generate log files such as the service log, workflow log, and session log. These log files are written to the file system local to the node. To ensure high availability, the administrator should use a cluster file system for storing the log files. In the event of a node failure, the cluster file system will continue to provide access to the log files.
To perform advanced recovery, a PowerCenter service writes checkpoint records to the file system. The administrator must use a cluster file system for storing checkpoint information to safeguard against node failure.
There are many commercial cluster file systems that provide high availability. In addition, the high-availability offerings from vendors such as HP and Sun include a highly available file system.
The PowerCenter Service Manager and the Repository Service store their metadata in an RDBMS. The administrator must ensure that the underlying RDBMS is configured for high availability.
All PowerCenter service processes are made resilient to transient network errors. This includes connections between all PowerCenter components, as well as connections to database or FTP servers.
If a connection terminates due to a network failure, the system will try to reestablish the connection and restore the original communication state, so that it appears as if the connection did not fail but merely paused for some time.
With operational data integration initiatives, data integration services have become mission critical, and a loss of integration services is not tolerable. If a service does not provide failover and high availability, the business must buy an expensive clustering solution or build one from scratch.
Neither solution is cost effective in the long run. Hence, the PowerCenter 8 High Availability Option provides robust, fault-tolerant capabilities for continuity in the event of node or network failure: