check_serviceguard is designed to be used with NRPE, i.e. to run
locally on a cluster node. Example:
# check_serviceguard
OK - Cluster 'pgprod' is up, 3 nodes, 12 packages
If something is wrong, the plugin will report it:
# check_serviceguard
[imap-cluster] Package 'lister-prod' is down (halted)
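Nagios and NRPE only look at the text a plugin prints and its exit status. The following is a minimal sketch of that reporting step. It assumes the standard Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and assumes that the overall status is the most severe state among the individual alerts. The helper `summarize` and its severity ranking are hypothetical and are not taken from check_serviceguard itself.

```python
# Minimal sketch: turn (state, message) alerts into Nagios-style output
# plus an exit code. Hypothetical helper, not check_serviceguard's own code.

# Standard Nagios plugin exit codes.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

# Assumed severity ranking used to pick the overall state (worst wins).
SEVERITY = ["OK", "UNKNOWN", "WARNING", "CRITICAL"]


def summarize(cluster, nodes, packages, alerts):
    """Return (output_text, exit_code) for a list of (state, message) alerts."""
    if not alerts:
        text = f"OK - Cluster '{cluster}' is up, {nodes} nodes, {packages} packages"
        return text, EXIT_CODES["OK"]
    # Pick the most severe state according to the assumed ranking.
    worst = max((state for state, _ in alerts), key=SEVERITY.index)
    # One line per alert, each prefixed with the cluster name.
    text = "\n".join(f"[{cluster}] {message}" for _, message in alerts)
    return text, EXIT_CODES[worst]
```

A plugin built this way would print the text and then call `sys.exit()` with the returned code, which is all NRPE passes back to Nagios.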
The --state option prefixes each alert with the service state the
plugin assigns to it:
# check_serviceguard --state
CRITICAL: [odont-cluster] Package 'odxray' is down (halted)
WARNING: [odont-cluster] Service 'ODPROD_mon' for package 'odprod' status is unknown
Alternatively, you can use the option --short-state to get an
abbreviated, one-letter service state:
# check_serviceguard --short-state
C: [odont-cluster] Package 'odxray' is down (halted)
W: [odont-cluster] Service 'ODPROD_mon' for package 'odprod' status is unknown
The Nagios plugin development guidelines recommend this as good
practice. I'm not a fan of it myself, but I've included these options
for those who disagree.
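The two prefix styles can be sketched as a small formatting helper. This is an illustration under assumptions, not the plugin's code: the function `format_alert` and its `mode` parameter are hypothetical, and the one-letter forms for OK and UNKNOWN ("O", "U") are guesses, since the examples above only show "C" and "W".

```python
# Minimal sketch of the --state / --short-state prefixes.
# Hypothetical helper; "O" and "U" are assumed, not shown in the examples.

SHORT = {"OK": "O", "WARNING": "W", "CRITICAL": "C", "UNKNOWN": "U"}


def format_alert(state, cluster, message, mode=None):
    """Format one alert line; mode is None, "state" or "short-state"."""
    line = f"[{cluster}] {message}"
    if mode == "state":
        # Full state name, as with --state.
        return f"{state}: {line}"
    if mode == "short-state":
        # One-letter state, as with --short-state.
        return f"{SHORT[state]}: {line}"
    # Default: no prefix at all.
    return line
```

For example, `format_alert("CRITICAL", "odont-cluster", "Package 'odxray' is down (halted)", "state")` yields the first line of the --state example above.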
check_serviceguard uses the ServiceGuard command cmviewcl for
all its work, and needs permission to run this command. The best way
to accomplish this is to use sudo. Edit the file /etc/sudoers
(e.g. by running visudo as root) and add the following line:
nagios ALL=NOPASSWD:/usr/local/cmcluster/bin/cmviewcl
If you run NRPE as a user other than nagios, replace "nagios" with
the appropriate user name.
check_serviceguard will automatically try to use sudo unless it is run
as root.
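The "use sudo unless running as root" behaviour can be sketched as follows. The cmviewcl path is the one from the sudoers line above. The function name `cmviewcl_command` is hypothetical, and the sketch deliberately does not say which cmviewcl options the plugin actually passes.

```python
import os

# Path taken from the sudoers example above; adjust for your installation.
CMVIEWCL = "/usr/local/cmcluster/bin/cmviewcl"


def cmviewcl_command(args, euid=None):
    """Build the argv for cmviewcl, prefixing sudo unless running as root.

    args holds whatever cmviewcl options the caller needs (not specified here).
    euid can be passed explicitly for testing; otherwise it is read from the OS.
    """
    if euid is None:
        euid = os.geteuid()  # Unix only
    base = [CMVIEWCL, *args]
    # Root can run cmviewcl directly; everyone else goes through sudo.
    return base if euid == 0 else ["sudo", *base]
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)`. Because the sudoers line names the command without arguments, sudo permits it with any arguments.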
A package can run on its primary node or on one of its alternate
nodes. The plugin can give a warning about packages that aren't
running on their primary nodes. This is turned off by default, but can
be activated with the --primary switch:
# check_serviceguard --primary
[imap-cluster] Package 'lister-prod' is down (halted)
[imap-cluster] Package 'mail-mgmt' is running on alternate node mail-imap6 (primary=mail-imap4)
[imap-cluster] Package 'imap-sg17' is running on alternate node mail-imap6 (primary=mail-imap5)
[imap-cluster] Package 'imap-sg14' is running on alternate node mail-imap5 (primary=mail-imap4)
[imap-cluster] Package 'imap-sg19' is running on alternate node mail-imap6 (primary=mail-imap5)
[imap-cluster] Package 'imap-sg15' is running on alternate node mail-imap5 (primary=mail-imap4)
[imap-cluster] Package 'imap-sg18' is running on alternate node mail-imap6 (primary=mail-imap5)
[imap-cluster] Package 'imap-sg09' is running on alternate node mail-imap4 (primary=mail-imap2)
[imap-cluster] Package 'imap-sg13' is running on alternate node mail-imap5 (primary=mail-imap4)
[imap-cluster] Package 'imap-sg05' is running on alternate node mail-imap2 (primary=mail-imap1)
[imap-cluster] Package 'lister-test' is running on alternate node mail-imap4 (primary=mail-imap5)
[imap-cluster] Package 'jabber' is running on alternate node mail-imap2 (primary=mail-imap1)
[imap-cluster] Package 'imap-sg20' is running on alternate node mail-imap6 (primary=mail-imap5)
We have found that the important thing is that the package is up and
running, not that it's running on its primary node. Reporting this by
default therefore seemed like overkill.
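The logic behind the package messages can be sketched as below: a package that is down is critical, as in the --state example, and a package on an alternate node only raises a warning when --primary is given. The `Package` record and its field names are hypothetical and do not reflect how cmviewcl actually reports this data.

```python
from dataclasses import dataclass


@dataclass
class Package:
    # Hypothetical record; field names are illustrative, not cmviewcl's.
    name: str
    status: str   # e.g. "up" or "down"
    state: str    # e.g. "running" or "halted"
    node: str     # node the package currently runs on
    primary: str  # configured primary node


def check_packages(cluster, packages, check_primary=False):
    """Return (state, message) alerts in the style of the examples above."""
    alerts = []
    for pkg in packages:
        if pkg.status != "up":
            # A package that is down is always reported as critical.
            alerts.append(("CRITICAL",
                           f"[{cluster}] Package '{pkg.name}' is down ({pkg.state})"))
        elif check_primary and pkg.node != pkg.primary:
            # Running on an alternate node: a warning, only with --primary.
            alerts.append(("WARNING",
                           f"[{cluster}] Package '{pkg.name}' is running on "
                           f"alternate node {pkg.node} (primary={pkg.primary})"))
    return alerts
```

Keeping the primary-node comparison behind a flag mirrors the default described above: by default, a package that is up counts as healthy wherever it runs.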
If the cluster contains unimportant packages whose status you're not
interested in (e.g. test packages), you can blacklist them with the
-b|--blacklist option:
# check_serviceguard
[imap-cluster] Package 'lister-prod' is down (halted)
# check_serviceguard -b lister-prod
OK - Cluster 'imap-cluster' is up, 6 nodes, 30 packages
Blacklisted packages are skipped entirely: they are not checked at
all and do not show up in the verbose output. The argument to the
-b|--blacklist option is either a comma-separated list of package
names or the name of a file that contains such a list.
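Here is a rough sketch of how the -b|--blacklist argument could be interpreted and applied. The helper names are hypothetical. The sketch also assumes that an argument naming an existing file is read as a file, that newlines in the file act like commas, and that surrounding whitespace is ignored. The plugin's real parsing may differ.

```python
import os


def parse_blacklist(arg):
    """Return the set of blacklisted package names.

    arg is either a comma-separated list ("a,b") or the name of a file
    containing such a list. Whitespace/newline handling is an assumption.
    """
    if os.path.isfile(arg):
        # The argument names an existing file: read the list from it.
        with open(arg) as fh:
            arg = fh.read()
    # Treat newlines like commas and drop empty entries (assumed behaviour).
    names = arg.replace("\n", ",").split(",")
    return {name.strip() for name in names if name.strip()}


def filter_packages(package_names, blacklist):
    """Drop blacklisted packages before any checks run."""
    return [name for name in package_names if name not in blacklist]
```

Filtering before the checks, rather than after, matches the behaviour described above: blacklisted packages are never examined, so they cannot appear in any output.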