Sunday, November 25, 2007
Oracle 10gR2 - Oracle Clusterware World
Source: Oracle Clusteware documentation (http://tahiti.oracle.com)
Oracle Clusterware Configuration
Note: 1) Oracle Clusterware supports up to 100 nodes in a cluster on configurations running Oracle Database 10g Release 2 and later releases.
2) Cluster-aware storage may also be referred to as a multihost device.
Terms used in this post:
RAC: Real Application Clusters
OCR: Oracle Cluster Registry
CRS: Cluster Ready Services
Oracle Real Application Clusters (Oracle RAC) uses Oracle Clusterware as the infrastructure that binds together multiple nodes that then operate as a single server. Oracle Clusterware is a portable cluster management solution that is integrated with Oracle Database. In an Oracle RAC environment, Oracle Clusterware monitors all Oracle components (such as instances and Listeners). If a failure occurs, Oracle Clusterware automatically attempts to restart the failed component and also redirects operations to a surviving component.
Oracle Clusterware includes two important components: the voting disk and the OCR. The voting disk is a file that manages information about node membership, and the OCR is a file that manages cluster and Oracle RAC database configuration information. The Oracle Clusterware installation process creates the voting disk and the OCR on shared storage.
Oracle Clusterware processes on Linux and UNIX systems include the following:
crsd—Performs high availability recovery and management operations such as maintaining the OCR and managing application resources. This process runs as the root user and restarts automatically upon failure.
evmd—Event manager daemon. This process also starts the racgevt process to manage FAN server callouts.
ocssd—Manages cluster node membership and runs as the oracle user; failure of this process results in a node restart.
oprocd—Process monitor for the cluster. Note that this process only appears on platforms that do not use third-party vendor clusterware with Oracle Clusterware.
When to Back Up Voting Disks:
1) After Installation
2) After adding nodes to or deleting nodes from the cluster
3) After performing voting disk add or delete operations
Syntax for backing up a voting disk using the dd command:
dd if=voting_disk_name of=backup_file_name
Use the device name when the voting disk resides on a raw partition:
dd if=/dev/sde1 of=/tmp/voting.dmp
Note: There is no need to stop CRS (the crsd.bin daemon) before taking a backup of the voting disk.
Voting Disk Recovery from Backup:
dd if=backup_file_name of=Active_voting_disk_name
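To illustrate the dd backup-and-restore round trip, here is a minimal sketch that stands an ordinary scratch file in for the voting disk device. The /tmp paths are illustrative only; on a real cluster, if= would point at the actual voting disk (for example the raw partition shown above):

```shell
# A minimal sketch of the dd backup/restore round trip, using an
# ordinary scratch file in place of a real voting disk device.
printf 'fake voting disk contents\n' > /tmp/votedisk_demo

# Back up the "voting disk"
dd if=/tmp/votedisk_demo of=/tmp/voting_backup.dmp 2>/dev/null

# Simulate losing the original, then restore it from the backup
rm /tmp/votedisk_demo
dd if=/tmp/voting_backup.dmp of=/tmp/votedisk_demo 2>/dev/null

# Confirm the restored file matches the backup
cmp -s /tmp/votedisk_demo /tmp/voting_backup.dmp && echo "restore verified"
```

The same if=/of= pattern applies to a real voting disk; only the source and destination names change.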
Add and Remove Voting Disk:
You can dynamically add and remove voting disks after installing Oracle RAC.
As the root user, add a voting disk:
crsctl add css votedisk path
As the root user, remove a voting disk:
crsctl delete css votedisk path
Note: a) path is the fully qualified path of the voting disk being added or removed.
b) Do not use the -force option while the cluster is active. Use -force with either of these commands only when the Oracle Clusterware daemon (crsd.bin) is not running; otherwise you risk corrupting the cluster configuration.
c) Oracle Clusterware automatically creates OCR backups every 4 hours. At any one time, Oracle Clusterware retains the latest 3 backup copies of the OCR, which are 4 hours old, 1 day old, and 1 week old. Use other backup software to copy the automatically generated OCR backups to a different device at least once a day.
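The daily off-device copy suggested above can be as simple as a cron-driven cp. The sketch below uses scratch directories in place of the real locations (on a typical installation the automatic backups live under CRS_HOME/cdata/cluster_name; every path and file name here is a stand-in):

```shell
# Hypothetical daily job: copy the automatic OCR backups to a second
# device. The /tmp paths stand in for CRS_HOME/cdata/cluster_name and
# for the mount point of the other device.
SRC=/tmp/ocr_demo/cdata/mycluster
DEST=/tmp/ocr_demo/second_device

mkdir -p "$SRC" "$DEST"
# Fake the automatically generated backup files for the demo
touch "$SRC/backup00.ocr" "$SRC/day.ocr" "$SRC/week.ocr"

# The actual copy step; schedule this once a day (e.g. via cron)
cp "$SRC"/*.ocr "$DEST"/

ls "$DEST"
```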
Viewing Available OCR Backups:
To check the most recent backups on any node of the cluster:
# ocrconfig -showbackup
Backing Up the OCR
Use the ocrconfig tool to make copies of the automatically created backup files at least once a day. You must be logged in as root to use the ocrconfig tool.
You can also export the OCR contents to a file, in case a configuration change causes errors:
ocrconfig -export backup_file_name
Recovering the OCR
Before you proceed with recovering the OCR, first confirm that the OCR really is unavailable.
There are two methods for recovering the OCR:
a) from the automatically generated OCR backup copies.
b) from manually created OCR export files.
Ensure that the OCR is unavailable:
# ocrcheck
This command should report 'Device/File integrity check succeeded' for at least one copy of the OCR. If it does not, both the primary OCR and the OCR mirror have failed. The only option left is to restore from a backup.
a) Restoring the OCR (first method mentioned above):
1. Check whether a backup is available:
# ocrconfig -showbackup
2. Verify the contents of the OCR backup:
ocrdump -backupfile backup_file_name
3. Stop the clusterware (run this command on all RAC nodes as root):
# crsctl stop crs
4. Restore from the OCR backup (as root):
ocrconfig -restore backup_file_name
5. Start the Oracle Clusterware after the restore:
# crsctl start crs (run this command on all RAC nodes as root)
6. Verify the OCR integrity after the restore:
$ cluvfy comp ocr -n all [-verbose]
Note: -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster.
b) Recovering the OCR (second method mentioned above):
1. Place the OCR export file in an accessible directory.
2. Stop the Oracle Clusterware
crsctl stop crs (run this command on all nodes of cluster as root)
3. Import the contents from the backup OCR export file:
ocrconfig -import export_file_name
4. Start the Oracle Clusterware
crsctl start crs (run this command on all nodes of cluster as root)
5. Verify OCR Integrity after restore
cluvfy comp ocr -n all [-verbose]
Note: a)-n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster.
b) You cannot use the ocrconfig -import command to restore an automatically generated OCR backup file; automatic backups must be restored with ocrconfig -restore.