Real Application Clusters
Oracle Real Application Clusters (RAC) is a database clustering solution that allows more than one instance to mount and open an Oracle database. RAC can only be used on special clustered systems with shared disk storage and a private network interconnect.
A normal Oracle installation consists of a single Oracle instance that accesses a database on the same computer system. In contrast, RAC allows multiple instances on different computer systems (nodes in a cluster) to access the same database files simultaneously. Communication between instances is managed by the Distributed Lock Manager (DLM). To address the possibility of two or more instances attempting to modify the same information simultaneously, Oracle uses up to ten additional background processes, named LCK0 through LCK9, to lock the resources in use by these instances.
RAC is available with Oracle Enterprise Edition, and under certain conditions, with Standard Edition as well. These restrictions (for Standard Edition) include:
- Must use Oracle Clusterware (no third-party clusterware allowed);
- Must use ASM to store database files; and
- Can only use a max of 4 CPU sockets in the cluster (either 2 nodes with 2 CPUs each, or 4 nodes with 1 CPU each).
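The socket limit is a simple product of nodes and sockets per node. A minimal sketch of that arithmetic follows; the function name `se_rac_within_limit` is hypothetical and is not part of any Oracle tool.

```shell
# Hypothetical helper: succeeds when nodes x sockets-per-node stays
# within the 4-socket limit described above.
se_rac_within_limit() {
  nodes=$1
  sockets_per_node=$2
  total=$(( nodes * sockets_per_node ))
  [ "$total" -le 4 ]
}

# Print a verdict for one cluster layout.
report_layout() {
  if se_rac_within_limit "$1" "$2"; then
    echo "$1 node(s) x $2 socket(s): within the limit"
  else
    echo "$1 node(s) x $2 socket(s): exceeds the limit"
  fi
}

report_layout 2 2
report_layout 4 1
report_layout 3 2
```

The first two layouts are the ones named in the list above; the third is included only to show a configuration that would exceed the limit.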
RAC was previously called Oracle Parallel Server (OPS). Oracle re-branded OPS as RAC when they released Oracle 9i.
Running Oracle in RAC mode can benefit you in the following ways:
- High availability - If some nodes fail, the remaining nodes are still available to process requests. Failover support has been available since Oracle 8 through Oracle's Transparent Application Failover (TAF) technology, and server-side load balancing since Oracle 10g.
- Speedup (reduced transaction response time) - RAC normally adds some overhead. However, some DSS applications can achieve better performance by running operations in parallel.
- Scale-up (increased transaction volume) - RAC can be used to provide increased application scalability (mainly used for OLTP applications).
RAC Storage Options
The database's data and control files are shared between the instances. However, each instance must have its own UNDO and REDO:
- UNDO: Each instance must have its own UNDO_TABLESPACE
- REDO: Each instance must have its own redo log files (together called a thread)
Shared files can be stored on:
Oracle Clusterware (previously called Cluster Ready Services or CRS) provides Cluster Management Services and High Availability Services to Oracle and 3rd party software that wants to hook into it. For example, if you kill your LGWR process, CRS will detect the failure and automatically restart the database.
Oracle Clusterware eliminates the need for 3rd party clusterware software like Sun Cluster, IBM HACMP and HP Serviceguard. Oracle Clusterware is provided at no additional cost with the 10g and 11g database.
Oracle clusterware needs to be installed on all nodes of the cluster before installing the database software. During installation, you will be prompted to configure a virtual IP, voting disk and cluster registry.
A virtual IP (VIP) is an IP address that fails over to another node in the cluster when a failure is detected. This allows connected sessions to fail over (almost) immediately to another node when a problem is experienced.
RAC requires one or more private interconnects, and two or more public network interfaces.
A Voting Disk is a shared disk device or file used to determine node availability (establishes quorum) and resolve split-brain scenarios. All instances write to the voting disk (check in) to indicate that they are still active. This is required as instances may not always be able to communicate across the network with each other.
The voting disk, like the OCR, should be multiplexed and backed up to protect it against media failures. Always implement 3 or more voting disks, and use an odd number. Voting disks can be stored in a raw partition, in a regular file in a clustered filesystem (like OCFS), or in ASM (preferred).
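The advice to use an odd number can be illustrated with a little arithmetic. The sketch below assumes the usual quorum rule, namely that a node must be able to reach a strict majority (more than half) of the configured voting disks to stay in the cluster; the function names are hypothetical.

```shell
# Strict majority of voting disks a node must be able to access.
voting_majority() {
  echo $(( $1 / 2 + 1 ))
}

# Number of voting disks that can be lost while a majority remains.
voting_failures_tolerated() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "$n voting disk(s): majority=$(voting_majority "$n"), tolerated failures=$(voting_failures_tolerated "$n")"
done
```

Under that assumption, 4 voting disks tolerate no more failures than 3, and 2 tolerate no more than 1, so an even count adds cost without adding protection.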
The Cluster Registry (OCR) is used to store cluster wide settings and status information. For example: node names, IP and VIP addresses, voting disk locations, node applications, database names, instance names, listener names, etc.
The Oracle Cluster Registry (OCR) is a binary file that is maintained by the CRS daemon. The OCR must be stored in a raw partition or regular file in a clustered filesystem (not on ASM!).
To see what is in the OCR, use the "ocrdump" command to dump its contents to a flat file.
When clusterware is started, the following processes will be running:
- crsd – Cluster Resource Services Daemon
- cssd – Cluster Synchronization Services Daemon
- evmd – Event Manager Daemon
The corresponding executables are located in $ORA_CRS_HOME/bin/.
Stop and start
Oracle Clusterware is started at boot with the /etc/rc.d/init.d/init.crs script.
Commands to manually start or stop:
/etc/init.d/init.crs start
/etc/init.d/init.crs stop
Commands to manually disable or enable start at boot time:
/etc/init.d/init.crs disable
/etc/init.d/init.crs enable
Check status of components registered in the OCR:
Status in tabular format:
Starting and stopping resources
Stops all RAC instances:
$ srvctl stop database -d myracdb
Stops Listener, VIP, GSD, ONS:
$ srvctl stop nodeapps -n racnode1
Starts ASM on racnode1 and all required dependencies:
$ srvctl start asm -n racnode1
Starts one instance and all required dependencies:
$ srvctl start instance -d myracdb -i mydb1
For more info, read article srvctl.
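When taking a whole node down for maintenance, the commands above are typically combined in dependency order: instance first, then ASM, then the node applications. A minimal dry-run sketch follows. The function `node_stop_plan` is hypothetical and only prints the commands; the names myracdb, mydb1 and racnode1 are the examples used above.

```shell
# Print (do not run) the srvctl commands that stop everything on one node.
# Arguments: database name, instance name, node name.
node_stop_plan() {
  db=$1
  inst=$2
  node=$3
  echo "srvctl stop instance -d $db -i $inst"   # the instance depends on ASM
  echo "srvctl stop asm -n $node"               # ASM depends on the node apps
  echo "srvctl stop nodeapps -n $node"          # listener, VIP, GSD, ONS
}

node_stop_plan myracdb mydb1 racnode1
```

Printing the plan first makes it easy to review the order before anything is stopped; once it looks right, the output can be piped to `sh` to execute it.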
Some of the INIT.ORA/SPFILE parameters required for RAC:
- CLUSTER_DATABASE=TRUE -- start the database in RAC mode
- INSTANCE_NUMBER=n -- a unique instance number
- THREAD=n -- each instance must have its own thread of redo logs
- UNDO_TABLESPACE=... -- each instance must have its own undo tablespace
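In a shared parameter file, instance-specific values are usually written with a `sid.parameter=value` prefix, while `*.` marks a value that applies to all instances. The sketch below generates such entries for a given number of instances. The function `rac_params`, the instance naming scheme (mydb1, mydb2, ...) and the undo tablespace names (UNDOTBS1, UNDOTBS2, ...) are assumptions chosen for illustration.

```shell
# Print example parameter-file entries for a RAC database.
# Arguments: instance name prefix, number of instances.
rac_params() {
  prefix=$1
  count=$2
  echo "*.cluster_database=TRUE"                # shared by all instances
  i=1
  while [ "$i" -le "$count" ]; do
    sid="${prefix}${i}"
    echo "${sid}.instance_number=$i"            # unique per instance
    echo "${sid}.thread=$i"                     # own redo thread
    echo "${sid}.undo_tablespace='UNDOTBS$i'"   # own undo tablespace (name assumed)
    i=$(( i + 1 ))
  done
}

rac_params mydb 2
```

The redo threads and undo tablespaces named here must already exist in the database; the script only prints the parameter entries that point at them.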
The V$ views only show details from the currently connected instance. Use the GV$ views to query values across all instances (note the INST_ID column).
For example: instead of using V$THREAD, query GV$THREAD to see all threads across all nodes.
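As a rough sketch of the GV$THREAD example, the shell function below prints a query that can be fed to SQL*Plus. The function name `gv_thread_query` is hypothetical, and the column list is a small selection chosen for illustration.

```shell
# Print a query that lists redo threads as seen from every instance.
# The heredoc delimiter is quoted so the shell leaves "$" and "#" alone.
gv_thread_query() {
  cat <<'SQL'
SELECT inst_id, thread#, status, enabled, instance
  FROM gv$thread
 ORDER BY inst_id, thread#;
SQL
}

gv_thread_query
```

On a cluster node with SQL*Plus available, the output could be run with something like `gv_thread_query | sqlplus -s "/ as sysdba"`. Each row carries the INST_ID of the instance that reported it.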
Since each instance has its own thread of redo logs, all instances need to archive their own redo logs. For convenience, it is recommended that each instance archives to a shared cluster file system.
One can use DBCA or srvctl to configure a service with "primary" instances and "available" instances. Client connections will be routed to the "primary" instances. When one of the "primary" instances becomes unavailable, one of the "available" instances will be promoted to "primary".
NOTE: while fail-over to an "available" instance happens automatically, you need to manually switch the service back to the original "primary" instance when it becomes available again.
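A dry-run sketch of the srvctl side of this follows. It assumes the 10g/11g single-letter flag syntax, where srvctl calls the "primary" instances "preferred" (-r) and lists the "available" instances with -a; verify against `srvctl add service -h` on your release. The function names, the service name oltp and the instance mydb2 are hypothetical.

```shell
# Print (do not run) the commands that create and start a service.
# Arguments: database, service, preferred instance(s), available instance(s).
service_create_plan() {
  echo "srvctl add service -d $1 -s $2 -r $3 -a $4"
  echo "srvctl start service -d $1 -s $2"
}

# Print the command that moves a service back after a fail-over.
# Arguments: database, service, instance it runs on now, instance to move it to.
service_failback_plan() {
  echo "srvctl relocate service -d $1 -s $2 -i $3 -t $4"
}

service_create_plan myracdb oltp mydb1 mydb2
service_failback_plan myracdb oltp mydb2 mydb1
```

The last line corresponds to the manual switch-back described in the note: after mydb1 returns, the service is relocated from mydb2 to mydb1.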
- RAC FAQ
- Extended Distance Clusters (also called stretch clusters)
- Transparent Application Failover (TAF)
- Fail Safe