check_hp_bladechassis

HP™ Blade Enclosure Monitoring with Nagios®

Author: Trond Hasle Amundsen
Contact: t.h.amundsen@usit.uio.no
Date: 2010-06-30
Latest version: 1.0.1, released Fri Jan 22 2010

1   Basic Overview

check_hp_bladechassis is a plugin for the Nagios monitoring software which checks the hardware health of HP blade enclosures via SNMP. The plugin is only tested with the c7000 enclosure.

[Figure: HP c7000 BladeSystem]

This plugin is designed as a companion to check_dell_bladechassis in terms of supported options and functionality. The information that can be gathered via SNMP from HP enclosures differs from what Dell enclosures provide, so the output of the two plugins differs.

2   Prerequisites

check_hp_bladechassis is written in Perl and needs a Perl interpreter. Nagios' embedded Perl interpreter (ePN) can be used, but be aware that the plugin is not well tested against ePN. The plugin assumes that Perl is available as /usr/bin/perl; if it lives elsewhere, edit the first line of the script.

Since this plugin uses SNMP, you'll also need the Perl module Net::SNMP on the Nagios server (or whichever server runs the queries). This module is not part of Perl itself, but it is packaged for all modern Linux distributions, so installing it with the distribution's package manager is usually easy.

If no such package exists for your server, search your OS repositories for Net::SNMP. If all else fails, try installing from CPAN.
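
As a rough sketch of the install step, the helper below prints an install command for a few common package managers and reports whether Perl can load the module. The function names are hypothetical, and the package names (perl-Net-SNMP on Red Hat-style systems, libnet-snmp-perl on Debian-style systems) are the names these distributions commonly use; check them against your own repositories before relying on them.

```shell
# Hypothetical helper: print an install command for Net::SNMP.
# The package names are assumptions based on common distribution naming.
netsnmp_install_hint() {
    case "$1" in
        yum|dnf)     echo "$1 install perl-Net-SNMP" ;;
        apt|apt-get) echo "$1 install libnet-snmp-perl" ;;
        cpan)        echo "cpan Net::SNMP" ;;
        *)           echo "no hint for package manager: $1" >&2; return 1 ;;
    esac
}

# Report whether the perl found in PATH can load Net::SNMP.
# A missing perl binary is reported the same way as a missing module.
netsnmp_status() {
    if perl -MNet::SNMP -e 1 2>/dev/null; then
        echo "Net::SNMP is installed"
    else
        echo "Net::SNMP is missing"
    fi
}

netsnmp_install_hint yum
netsnmp_status
```

Run the printed command as root, then call netsnmp_status again to confirm that the module loads.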

3   Getting started

Attention!

This is a short HOWTO that describes how to get started with using check_hp_bladechassis. This HOWTO assumes that the prerequisites are met, and that you have a Nagios server up and running. Nagios version 3.x is assumed.

The examples below show only very basic usage of check_hp_bladechassis. There are many more or less advanced options that you might find useful. See the usage section for info.

3.1   Creating a hostgroup

The first thing you want to do is create a hostgroup that contains your blade enclosures. If you have very few enclosures you can skip this step and use hosts in the service definition instead, but I think hostgroups are always better:

# hostgroup for HP blade enclosures
define hostgroup {
    hostgroup_name  hp-bladecenters
    alias           HP bladecenters
}

3.2   Defining the hosts

You'll need a host definition for each of the enclosures. If you are an experienced Nagios admin you already know this, of course:

define host {
    host_name       my-bladecenter1.foo.org
    alias           my-bladecenter1
    address         192.168.10.12
    use             generic-host
    hostgroups      hp-bladecenters
    contact_groups  example@foo.org
}

3.3   Creating a servicegroup

Next you want to create a servicegroup for this service. This is not required, but it makes things easier when you want to inspect your HP enclosures via Nagios' web interface. Creating a servicegroup is simple:

# Servicegroup for HP blade enclosures
define servicegroup {
    servicegroup_name         hp-bladechassis
    alias                     HP server health status
}

The servicegroup is used later in the service definition.

3.4   Defining a command

The next step is to define a command for check_hp_bladechassis:

# HP blade enclosure check
define command {
    command_name    check_hp_bladechassis
    command_line    /path/to/check_hp_bladechassis -H $HOSTADDRESS$
}

Note that this is a very basic example of check_hp_bladechassis usage. Refer to the usage section for info about the options that alter the behaviour of check_hp_bladechassis.
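
Before wiring the command into a service, it can help to see the command line Nagios would end up running for a given host and try it by hand. The sketch below substitutes the $HOSTADDRESS$ macro in a command_line template. The helper name is hypothetical, and it only previews the result; Nagios itself performs the real macro expansion.

```shell
# Hypothetical helper: substitute $HOSTADDRESS$ in a command_line template.
# This is only a preview; Nagios does the actual macro expansion.
preview_command_line() {
    # $1 = command_line template, $2 = host address (must not contain "/")
    printf '%s\n' "$1" | sed 's/\$HOSTADDRESS\$/'"$2"'/g'
}

preview_command_line '/path/to/check_hp_bladechassis -H $HOSTADDRESS$' 192.168.10.12
```

The printed line can be pasted into a shell, preferably as the user Nagios runs as, to confirm that the plugin works before Nagios schedules it.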

3.5   Defining the service

Finally, you define the service:

define service {
    use                       generic-service
    hostgroup_name            hp-bladecenters
    service_description       HP blade enclosure health
    servicegroups             hp-bladechassis
    check_command             check_hp_bladechassis
    action_url                https://$HOSTNAME$/
    notes_url                 http://folk.uio.no/trondham/software/check_hp_bladechassis.html
}

The action_url and notes_url directives are optional.
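
Once the hostgroup, hosts, servicegroup, command and service are in place, it is worth checking the configuration before reloading Nagios. The sketch below wraps Nagios' own verification mode (nagios -v followed by the main config file). The helper name is hypothetical, and the binary and config paths in the usage comment are assumptions that vary between installations.

```shell
# Hypothetical helper: run Nagios' built-in config verification.
# $1 = path to the nagios binary, $2 = path to the main config file.
verify_nagios_config() {
    nagios_bin=$1
    main_cfg=$2
    if "$nagios_bin" -v "$main_cfg" >/dev/null 2>&1; then
        echo "config OK"
    else
        echo "config has errors; run '$nagios_bin -v $main_cfg' for details"
        return 1
    fi
}

# Example call (both paths are assumptions, adjust to your installation):
#   verify_nagios_config /usr/sbin/nagios /etc/nagios/nagios.cfg
```

Reload Nagios only after the verification succeeds, so that a typo in one of the new definitions does not stop the daemon from starting.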

4   Usage

The plugin queries the monitored host remotely via SNMP. This requires that the monitored host is running SNMP, and that the Nagios server is allowed to communicate with the enclosure over SNMP. The -H|--hostname option is required and gives the hostname or IP address of the enclosure you want to check.

$ check_hp_bladechassis -H my-bladecenter1
OK - System: 'BladeSystem c7000 Enclosure', SN: 'XXXXXXXXXX', Firmware: '2.41', hardware working fine, 14 blades, 6 i/o modules

You can specify the SNMP community string (for SNMP version 1 and 2c) with the -C|--community option. The community defaults to "public" if the option is not present:

$ check_hp_bladechassis -H my-bladecenter2 -C mycommunity
OK - System: 'BladeSystem c7000 Enclosure G2', SN: 'XXXXXXXXXX', Firmware: '2.52', hardware working fine, 2 blades, 6 i/o modules

For other SNMP options, refer to the manual page.
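
If the plugin cannot reach an enclosure, a plain SNMP query from the Nagios server helps tell a plugin problem from a connectivity or community problem. The sketch below assumes the Net-SNMP command line tools (snmpget) are installed, which the plugin itself does not require. It prints a query for the standard sysDescr.0 object (OID 1.3.6.1.2.1.1.1.0) over SNMP version 2c instead of running it, so you can review the command first. The helper name is hypothetical.

```shell
# Hypothetical helper: print an snmpget command that fetches sysDescr.0.
# $1 = hostname or IP, $2 = community string (defaults to "public").
snmp_probe_cmd() {
    host=$1
    community=${2:-public}
    echo "snmpget -v 2c -c $community $host 1.3.6.1.2.1.1.1.0"
}

snmp_probe_cmd my-bladecenter1
snmp_probe_cmd my-bladecenter2 mycommunity
```

If the printed command times out when run from the Nagios server, check the SNMP settings on the enclosure and any firewalls in between before debugging the plugin.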

4.1   Output control

The default behaviour of the plugin is to print all alerts on separate lines with no extra decoration:

$ check_hp_bladechassis -H my-bladecenter1
Fan 2 condition is Failed

There are several options that allow you to alter this, as listed below.

4.1.1   Prefix alerts with the service state

The -s|--state option will prefix each alert with the full service state:

$ check_hp_bladechassis -H my-bladecenter1 -s
CRITICAL: Fan 2 condition is Failed
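
When you run the plugin by hand, the service state is also visible in its exit status. By Nagios plugin convention the exit status is 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN; the sketch below assumes check_hp_bladechassis follows that convention, as Nagios plugins are expected to. The helper name is hypothetical.

```shell
# Hypothetical helper: translate a plugin exit status into a state name,
# following the usual Nagios plugin convention.
nagios_state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "unexpected exit status: $1" >&2; return 1 ;;
    esac
}

nagios_state_name 2

# Typical manual use (not run here):
#   /path/to/check_hp_bladechassis -H my-bladecenter1 -s; nagios_state_name $?
```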

4.1.2   Prefix alerts with the service state (abbreviated)

The --short-state option does the same, except that the service state is abbreviated to a single letter, i.e. C for CRITICAL, W for WARNING etc.:

$ check_hp_bladechassis -H my-bladecenter1 --short-state
C: Fan 2 condition is Failed

4.1.3   Prefix alerts with the serial number

The option -i|--info will prefix all alerts with the enclosure's serial number:

$ check_hp_bladechassis -H my-bladecenter1 -i
[XXXXXXXXXX] Fan 2 condition is Failed

4.1.4   Extra info about failed component

The option -v|--verbose will append the part number, spare part number and serial number of the failed component:

$ check_hp_bladechassis -H my-bladecenter1 -v
Fan 2 condition is Failed [part: n/a, spare: n/a, sn: n/a]

In the above example the fan is missing, so the information is not available.

4.1.5   System info after the alert(s)

The option -e|--extinfo will print the enclosure model, serial number and firmware revision on a separate line after the alert(s):

$ check_hp_bladechassis -H my-bladecenter1 -e
Fan 2 condition is Failed
------ SYSTEM: BladeSystem c7000 Enclosure G2, SN: XXXXXXXXXX, FW: 2.52

4.1.6   Combination of output options

You can combine any of these options. Example:

$ check_hp_bladechassis -H my-bladecenter1 -s -e
CRITICAL: Fan 2 condition is Failed
------ SYSTEM: BladeSystem c7000 Enclosure G2, SN: XXXXXXXXXX, FW: 2.52

Which (combination) of these options you choose to use, if any, depends on how you use Nagios and your personal preference.

4.2   Debug output

If given the -d or --debug option, check_hp_bladechassis will output messages about all checked components, along with their respective part numbers and alert states. If supported by the enclosure, the plugin will also output total power usage.

Warning

The option -d|--debug is intended for diagnostics and debugging purposes only. Do not use this option from within Nagios, i.e. in your Nagios config.

4.3   Multiple line output, turn off escaping HTML tags

The output from this plugin contains multiple lines separated by HTML linebreaks (<br/>) if run as a command within Nagios, via NRPE etc. If run from a console which has a TTY, i.e. if you log in via SSH or similar and run the plugin manually, the linebreaks will be regular linebreaks.

Nagios 3.x allows the following option in cgi.cfg:

# ESCAPE HTML TAGS
# This option determines whether HTML tags in host and service
# status output is escaped in the web interface.  If enabled,
# your plugin output will not be able to contain clickable links.

escape_html_tags=1

The default, as seen above in the sample cgi.cfg from the distribution, is that HTML tags are escaped. My advice is to turn this off. If not, you will see output like this in your Nagios console:

CRITICAL: example error message 1<br/>WARNING: example error message 2

instead of this:

CRITICAL: example error message 1
WARNING: example error message 2

With Nagios 3.x, plugins are allowed to output multiple lines with regular linebreaks, but only the first line is shown in the web interface (status.cgi).
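
As a small sketch of the two points above, the first helper prints a copy of cgi.cfg with HTML escaping turned off (escape_html_tags=0), and the second turns HTML linebreaks back into regular ones, which is handy when reading plugin output copied from a log or notification. Both helper names are hypothetical, and the location of cgi.cfg depends on your installation.

```shell
# Hypothetical helper: print the content of a cgi.cfg file ($1) with
# HTML escaping turned off. The file itself is not modified.
disable_html_escaping() {
    sed 's/^escape_html_tags=.*/escape_html_tags=0/' "$1"
}

# Hypothetical helper: read plugin output on stdin and replace HTML
# linebreaks (<br/>) with regular linebreaks.
br_to_newline() {
    awk '{ gsub(/<br\/>/, "\n"); print }'
}

printf '%s\n' 'CRITICAL: example error message 1<br/>WARNING: example error message 2' | br_to_newline
```

Review the output of disable_html_escaping before replacing the real cgi.cfg with it.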

4.4   Full usage information

Usage information gathered with check_hp_bladechassis -h:

Usage: check_hp_bladechassis -H <HOSTNAME> [OPTION]...

OPTIONS:
   -H, --hostname      Hostname or IP of the enclosure
   -C, --community     SNMP community string
   -P, --protocol      SNMP protocol version
   --port              SNMP port number
   -p, --perfdata      Ouput performance data
   -t, --timeout       Plugin timeout in seconds
   -i, --info          Prefix alerts with the enclosure's serial number
   -v, --verbose       Append extra info to alerts (part no. etc.)
   -e, --extinfo       Append system info to alerts
   -s, --state         Prefix alerts with alert state
   --short-state       Prefix alerts with alert state (abbreviated)
   -d, --debug         Debug output, reports everything
   -h, --help          Display this help text
   -V, --version       Display version info

For more information and advanced options, see the manual page or URL:
http://folk.uio.no/trondham/software/check_hp_bladechassis.html

5   Performance data

check_hp_bladechassis will output performance data (power consumption in Watts) if the --perfdata or -p option is used. An example graph using PNP4Nagios is given below.

[Figure: example PNP4Nagios graph]

The template used to generate these graphs is available as check_hp_bladechassis.php in the downloadable ZIP archive and tarball.
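
Nagios plugins conventionally put performance data after a "|" character in their output, and graphing tools read that part. The sketch below extracts it from a line of plugin output, which is useful for checking by hand what -p actually emits. The helper name is hypothetical, and the sample line, including the label total_watt, is made up for illustration and is not taken from the plugin.

```shell
# Hypothetical helper: print the performance data part of a plugin
# output line, i.e. everything after the first "|". Fails if there is none.
perfdata_of() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*|}" ;;
        *)     return 1 ;;
    esac
}

# Made-up sample line; the real label and value come from the plugin.
perfdata_of "OK - hardware working fine|total_watt=1500"
```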

6   Download

6.1   Latest version

You can also download the plugin and the manpage separately.

6.2   Changelog / Old versions

Version 1.0.1 (2010-01-22)
  • Minor feature enhancements
  • Ignore components that are absent, such as power supplies and fans. If your chassis isn't fully stacked with blades, it may not be fully stacked with power supplies and fans either.
  • Collect Perl warnings and output them in a Nagios-friendly way
  • Plugin version information is now available in the debug output
  • Manpage moved from section 3 to section 8
  • Plugin now installs in <libdir>/nagios, not <libdir>/nagios/contrib

Version 1.0.0 (2009-08-04)
  • Initial release

7   Reporting bugs, proposing new features etc.

Please let me know if you are experiencing bugs, have feature requests, or suggestions on how to improve check_hp_bladechassis. We use this plugin in production at the University of Oslo, but we don't use all the different features of the plugin. While the plugin is bug-free for us, it might not be for you, so let me know if you have problems.

You can also send bug reports or feature requests to the Nagios users mailing list. I read postings to this list frequently:

nagios-users@lists.sourceforge.net

You can email me directly, but then other users won't benefit from the discussion. Unless security issues or other concerns prevent you from using the mailing list, it is better to discuss problems in a public forum.

8   Disclaimer

This is free software. Use at your own risk.