| Author: | Trond Hasle Amundsen |
|---|---|
| Contact: | t.h.amundsen@usit.uio.no |
| Date: | 2010-06-30 |
Important
When using this plugin to monitor 1855/1955 enclosures, it is important to use SNMP version 1. To accomplish this, use the -P option, like this:
check_dell_bladechassis -H myhostname -P 1
The management module on the 1855/1955 chassis dies or otherwise becomes unavailable after a while if it is probed with SNMP version 2c. This is merely an annoyance, since reseating (removing and reinserting) the management controller rectifies the issue. But to avoid the problem altogether, use SNMP version 1.
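If you run the plugin from your own scripts, it can help to encode the "SNMP version 1 for 1855/1955" rule in one place. The following is a minimal Python sketch, not part of the plugin; the function name `build_check_args`, the `legacy_chassis` flag and the plugin path are hypothetical. It only uses the -H, -C and -P options listed in the usage section.

```python
from typing import List, Optional

# Hypothetical install path; adjust to where you put the plugin.
PLUGIN = "/path/to/check_dell_bladechassis"


def build_check_args(host: str, legacy_chassis: bool,
                     community: Optional[str] = None) -> List[str]:
    """Build an argument list for check_dell_bladechassis.

    legacy_chassis should be True for 1855/1955 enclosures, which
    should be polled with SNMP version 1 (-P 1).
    """
    args = [PLUGIN, "-H", host]
    if community is not None:
        # Without -C the plugin uses the community "public".
        args += ["-C", community]
    if legacy_chassis:
        args += ["-P", "1"]
    return args


print(" ".join(build_check_args("myhostname", legacy_chassis=True)))
```

For an M1000e you would pass `legacy_chassis=False` and let the plugin use its default protocol version.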
check_dell_bladechassis is a plugin for the Nagios monitoring software which checks the hardware health of Dell blade enclosures via SNMP. The plugin supports both the new M1000e enclosure and the old 1855/1955 enclosures.

This plugin is designed to be a companion plugin to check_openmanage in terms of supported options and functionality. The information that can be gathered via SNMP from these enclosures is limited, so the plugin can't be as detailed as check_openmanage can for Dell servers. In particular, this applies to the old 1855/1955 chassis.
check_dell_bladechassis is written in Perl and requires a Perl interpreter. Nagios' embedded Perl interpreter (ePN) can be used, but be aware that the plugin is not well tested against ePN. The plugin assumes that perl is available as /usr/bin/perl; you can change this by editing the first line of the script.
Since this plugin uses SNMP, you'll also need the perl module Net::SNMP on the Nagios server (or the server running the queries). This module is not part of perl itself, but is available in all modern Linux distributions. Installing Net::SNMP is quite easy:
For RHEL/CentOS 5.x the best way is to use EPEL:
rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
yum install perl-Net-SNMP
For Fedora:
yum install perl-Net-SNMP
For SuSE:
rug install perl-Net-SNMP
For Debian and Ubuntu:
aptitude install libnet-snmp-perl
If this does not apply to your server, consult your OS repository to find Net::SNMP. If all else fails, try installing from CPAN.
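To verify the installation, you can ask perl to load the module: `perl -MNet::SNMP -e 1` exits with status 0 only if Net::SNMP can be loaded. Below is a small Python sketch that wraps that check; the function name `has_net_snmp` is hypothetical, and the default interpreter path matches the /usr/bin/perl the plugin assumes.

```python
import shutil
import subprocess


def has_net_snmp(perl: str = "/usr/bin/perl") -> bool:
    """Return True if the given perl interpreter can load Net::SNMP."""
    if shutil.which(perl) is None:
        # No executable interpreter at that path.
        return False
    # "perl -MNet::SNMP -e 1" exits with status 0 only if the module loads.
    result = subprocess.run(
        [perl, "-MNet::SNMP", "-e", "1"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("Net::SNMP available:", has_net_snmp())
```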
Attention!
This is a short HOWTO that describes how to get started with using check_dell_bladechassis. This HOWTO assumes that the prerequisites are met, and that you have a Nagios server up and running. Nagios version 3.x is assumed.
The examples below show only very basic usage of check_dell_bladechassis. There are many other options, some fairly advanced, that you might find useful. See the usage section for info.
The first thing you want to do is create a hostgroup that contains your blade enclosures. If you have very few enclosures, you can skip this step and list the hosts in the service definition instead, but I think hostgroups are always better:
# hostgroup for Dell blade enclosures
define hostgroup {
hostgroup_name dell-bladecenters
alias Dell bladecenters
}
You'll need a host definition for each of the enclosures. If you are an experienced Nagios admin you already know this, of course:
define host {
host_name my-bladecenter1.foo.org
alias my-bladecenter1
address 192.168.10.12
use generic-host
hostgroups dell-bladecenters
contact_groups example@foo.org
}
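With many enclosures, writing host definitions by hand gets tedious. A minimal Python sketch that renders one definition per enclosure is shown below; the template mirrors the example above, the helper name `host_definitions` is hypothetical, and the second enclosure and its address are made up. The contact_groups line is left out, so add it with your own contact group if you need it.

```python
from typing import Iterable, Tuple

# Template mirroring the hand-written host definition (contact_groups omitted).
HOST_TEMPLATE = """define host {{
    host_name       {name}
    alias           {alias}
    address         {address}
    use             generic-host
    hostgroups      dell-bladecenters
}}
"""


def host_definitions(enclosures: Iterable[Tuple[str, str]]) -> str:
    """Render one Nagios host definition per (host_name, address) pair."""
    blocks = []
    for name, address in enclosures:
        alias = name.split(".")[0]  # short name, e.g. my-bladecenter1
        blocks.append(HOST_TEMPLATE.format(name=name, alias=alias,
                                           address=address))
    return "\n".join(blocks)


print(host_definitions([
    ("my-bladecenter1.foo.org", "192.168.10.12"),
    ("my-bladecenter2.foo.org", "192.168.10.13"),  # hypothetical second enclosure
]))
```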
Next you want to create a servicegroup for this service. This is not required, but it makes things easier when you want to inspect your Dell blade enclosures via Nagios' web interface. Creating a servicegroup is simple:
# Servicegroup for Dell blade enclosures
define servicegroup {
servicegroup_name dell-bladechassis
alias Dell server health status
}
The servicegroup is used later in the service definition.
The next step is to define a command for check_dell_bladechassis:
# Dell blade enclosure check
define command {
command_name check_dell_bladechassis
command_line /path/to/check_dell_bladechassis -H $HOSTADDRESS$
}
Note that this is a very basic example of check_dell_bladechassis usage. Refer to the usage section for info about the different options that alter the behaviour of check_dell_bladechassis.
Finally, you define the service:
define service {
use generic-service
hostgroup_name dell-bladecenters
service_description Dell blade enclosure health
servicegroups dell-bladechassis
check_command check_dell_bladechassis
action_url https://$HOSTNAME$/
notes_url http://folk.uio.no/trondham/software/check_dell_bladechassis.html
}
The action_url and notes_url directives are optional.
The plugin queries the monitored enclosure remotely via SNMP. This requires that SNMP is enabled on the enclosure, and that the Nagios server is allowed to communicate with the enclosure over SNMP. Use the -H|--hostname option to specify the hostname or IP address you want to check:
$ check_dell_bladechassis -H my-bladecenter1
OK - System: 'PowerEdge M1000e', SN: 'XXXXXXX', Firmware: '2.00', hardware working fine
You can specify the SNMP community string (for SNMP version 1 and 2c) with the -C|--community option. Default community is set to "public" if the option is not present:
$ check_dell_bladechassis -H my-bladecenter2 -C mycommunity
OK - System: 'DRAC/MC', SN: 'XXXXXXX', Firmware: '1.5.0 (Build 10.01)', hardware working fine
For other SNMP options, refer to the manual page.
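When calling the plugin from a script rather than from Nagios, the exit status carries the service state. The Python sketch below assumes the plugin follows the standard Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); the helper names `interpret` and `run_check` are hypothetical.

```python
import subprocess
from typing import List, Tuple

# Standard Nagios plugin return codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def interpret(returncode: int, stdout: str) -> Tuple[str, str]:
    """Map a plugin exit status and output to (state, first output line)."""
    state = STATES.get(returncode, "UNKNOWN")
    lines = stdout.splitlines()
    first_line = lines[0] if lines else ""
    return state, first_line


def run_check(args: List[str], timeout: int = 60) -> Tuple[str, str]:
    """Run the plugin with the given argument list and interpret the result."""
    proc = subprocess.run(args, capture_output=True, text=True,
                          timeout=timeout)
    return interpret(proc.returncode, proc.stdout)


# Example (requires the plugin and a reachable enclosure):
# run_check(["/path/to/check_dell_bladechassis", "-H", "my-bladecenter1"])
```

Unrecognized exit statuses are treated as UNKNOWN, which is the conservative choice for monitoring.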
The default behaviour of the plugin is to print all alerts on separate lines with no extra fuss:
$ check_dell_bladechassis -H my-bladecenter1
Blade subsystem health status is Critical
Global system health status is Critical
Several options let you alter this, as described below.
The -s|--state option will prefix each alert with the full service state:
$ check_dell_bladechassis -H my-bladecenter1 -s
CRITICAL: Blade subsystem health status is Critical
CRITICAL: Global system health status is Critical
Example output with the --short-state option, which does the same, except that the service state is abbreviated to a single letter, e.g. C for CRITICAL and W for WARNING:
$ check_dell_bladechassis -H my-bladecenter1 --short-state
C: Blade subsystem health status is Critical
C: Global system health status is Critical
The option -i|--info will prefix all alerts with the service tag:
$ check_dell_bladechassis -H my-bladecenter1 -i
[XXXXXXX] Blade subsystem health status is Critical
[XXXXXXX] Global system health status is Critical
The option -e|--extinfo will print the server model and service tag on a separate line at the end of the alert:
$ check_dell_bladechassis -H my-bladecenter1 -e
Blade subsystem health status is Critical
Global system health status is Critical
------ SYSTEM: PowerEdge M1000e, SN: XXXXXXX, FW: 2.00
You can combine any of these options. Example:
$ check_dell_bladechassis -H my-bladecenter1 -s -e
CRITICAL: Blade subsystem health status is Critical
CRITICAL: Global system health status is Critical
------ SYSTEM: PowerEdge M1000e, SN: XXXXXXX, FW: 2.00
Which (combination) of these options you choose to use, if any, depends on how you use Nagios and your personal preference.
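To make the effect of these options concrete, here is a Python sketch that approximates how -s, --short-state, -i and -e decorate the alert lines in the examples above. It is an illustration, not the plugin's Perl code: the function name `format_alerts` is hypothetical, and the order of the prefixes when -i is combined with -s or --short-state is a guess, since no such example is shown.

```python
from typing import List, Optional, Tuple

Alert = Tuple[str, str]  # (state, message), e.g. ("CRITICAL", "...")


def format_alerts(alerts: List[Alert], state: bool = False,
                  short_state: bool = False,
                  service_tag: Optional[str] = None,
                  extinfo: Optional[str] = None) -> List[str]:
    """Approximate how -s, --short-state, -i and -e decorate alert lines."""
    out = []
    for level, message in alerts:
        prefix = ""
        if service_tag is not None:    # -i / --info: "[TAG] "
            prefix += f"[{service_tag}] "
        if short_state:                # --short-state: first letter, "C: "
            prefix += f"{level[0]}: "
        elif state:                    # -s / --state: "CRITICAL: "
            prefix += f"{level}: "
        out.append(prefix + message)
    if extinfo is not None:            # -e / --extinfo: extra line at the end
        out.append(f"------ {extinfo}")
    return out


alerts = [("CRITICAL", "Blade subsystem health status is Critical"),
          ("CRITICAL", "Global system health status is Critical")]
print("\n".join(format_alerts(
    alerts, state=True,
    extinfo="SYSTEM: PowerEdge M1000e, SN: XXXXXXX, FW: 2.00")))
```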
With the -d or --debug option, check_dell_bladechassis outputs messages about all the checked components, along with their respective alert states. If the enclosure supports it (i.e. the M1000e), the plugin also outputs power supply data and total power usage. Example debug output from an M1000e is given below.
$ check_dell_bladechassis -H my-bladecenter1 -d
System: PowerEdge M1000e
ServiceTag: XXXXXXX
Firmware: 2.00
-----------------------------------------------------------------------------
System Component Status
=============================================================================
STATE | MESSAGE TEXT
---------+-------------------------------------------------------------------
OK | IO Module (IOM) subsytem health status is Ok
OK | KVM subsystem health status is Ok
OK | Redundancy status is Ok
OK | Power subsystem health status is Ok
OK | Fan subsystem health status is Ok
CRITICAL | Blade subsystem health status is Critical
OK | Temperature sensor subsystem health status is Ok
OK | Chassis Management Controller (CMC) health status is Ok
CRITICAL | Global system health status is Critical
-----------------------------------------------------------------------------
System Power Readings
=============================================================================
Power Supply 1 (PS-1) voltage reading: 231.5 V
Power Supply 2 (PS-2) voltage reading: 233.8 V
Power Supply 3 (PS-3) voltage reading: 229.2 V
Power Supply 4 (PS-4) voltage reading: 240.5 V
Power Supply 5 (PS-5) voltage reading: 240.5 V
Power Supply 6 (PS-6) voltage reading: 241.8 V
------------------------------------------------------------
Power Supply 1 (PS-1) amperage reading: 4.66 A
Power Supply 2 (PS-2) amperage reading: 0.25 A
Power Supply 3 (PS-3) amperage reading: 0.25 A
Power Supply 4 (PS-4) amperage reading: 4.61 A
Power Supply 5 (PS-5) amperage reading: 0.31 A
Power Supply 6 (PS-6) amperage reading: 0.27 A
------------------------------------------------------------
Total chassis power usage: 2300 W
Total chassis current usage: 10.406 A
Debug output from an 1855/1955 chassis is depressing in comparison:
$ check_dell_bladechassis -H my-bladecenter2 -d
System: DRAC/MC
ServiceTag: XXXXXXX
Firmware: 1.5.0 (Build 10.01)
-----------------------------------------------------------------------------
System Component Status
=============================================================================
STATE | MESSAGE TEXT
---------+-------------------------------------------------------------------
OK | Global system health status is Ok
The limited output from the 1855/1955 is due to limitations in available information via SNMP.
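If you save debug output to a file for diagnostics, a few regular expressions are enough to pull out the component states and the per-PSU amperage readings. The Python sketch below is based only on the line formats shown in the examples above; the name `parse_debug` is hypothetical, and the WARNING and UNKNOWN states in the pattern are assumed rather than seen in the samples. Note that in the M1000e sample the per-PSU amperage readings add up to 10.35 A, while the reported chassis total is 10.406 A, so don't assume the total is simply their sum.

```python
import re
from typing import Dict, List, Tuple

# Component rows such as "CRITICAL | Blade subsystem health status is Critical".
ROW = re.compile(r"^(OK|WARNING|CRITICAL|UNKNOWN)\s*\|\s*(.+)$")
# Per-PSU lines such as "Power Supply 1 (PS-1) amperage reading: 4.66 A".
AMPS = re.compile(r"^Power Supply (\d+) \(PS-\d+\) amperage reading: ([\d.]+) A$")


def parse_debug(text: str) -> Tuple[List[Tuple[str, str]], Dict[int, float]]:
    """Extract (state, message) rows and PSU amperage from -d output."""
    components = []
    amperage = {}
    for line in text.splitlines():
        line = line.strip()
        m = ROW.match(line)
        if m:
            components.append((m.group(1), m.group(2).strip()))
            continue
        m = AMPS.match(line)
        if m:
            amperage[int(m.group(1))] = float(m.group(2))
    return components, amperage
```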
Warning
The option -d|--debug is intended for diagnostics and debugging purposes only. Do not use this option from within Nagios, i.e. in your Nagios config.
The output from check_dell_bladechassis contains multiple lines separated by HTML linebreaks (<br/>) if run as a command within Nagios, via NRPE etc. If run from a console which has a TTY, i.e. if you log in via SSH or similar and run check_dell_bladechassis manually, the linebreaks will be regular linebreaks.
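The separator choice described above boils down to a TTY test. The following Python sketch illustrates the idea; it is not the plugin's Perl code, and the function name `join_alerts` is hypothetical. The `is_tty` parameter lets you override the detection, which also makes the behaviour easy to test.

```python
import sys
from typing import List, Optional


def join_alerts(alerts: List[str], is_tty: Optional[bool] = None) -> str:
    """Join alert lines with newlines on a terminal, and with HTML <br/>
    linebreaks otherwise (e.g. when run by Nagios or via NRPE)."""
    if is_tty is None:
        is_tty = sys.stdout.isatty()
    return ("\n" if is_tty else "<br/>").join(alerts)


print(join_alerts(["Blade subsystem health status is Critical",
                   "Global system health status is Critical"]))
```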
Nagios 3.x allows the following option in cgi.cfg:
# ESCAPE HTML TAGS
# This option determines whether HTML tags in host and service
# status output is escaped in the web interface. If enabled,
# your plugin output will not be able to contain clickable links.

escape_html_tags=1
The default, as seen above in the sample cgi.cfg from the distribution, is that HTML tags are escaped. My advice is to turn this off (escape_html_tags=0). If not, you will see output like this in your Nagios console:
Blade subsystem health status is Critical<br/>Global system health status is Critical
instead of this:
Blade subsystem health status is Critical
Global system health status is Critical
With Nagios 3.x, plugins are allowed to output multiple lines with regular linebreaks, but only the first line is shown in the web interface (status.cgi).
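Why the linebreaks show up literally with escape_html_tags=1 can be illustrated with Python's `html.escape`, used here as a stand-in for the escaping the Nagios CGIs perform: the tag is turned into entities, so the browser displays the text "<br/>" instead of breaking the line.

```python
import html

raw = ("Blade subsystem health status is Critical<br/>"
       "Global system health status is Critical")

# Escaped markup is rendered as literal text by the browser.
escaped = html.escape(raw)
print(escaped)
```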
Usage information gathered with check_dell_bladechassis -h:
Usage: check_dell_bladechassis -H <HOSTNAME> [OPTION]...

OPTIONS:

  -H, --hostname      Hostname or IP of the enclosure
  -C, --community     SNMP community string
  -P, --protocol      SNMP protocol version
  --port              SNMP port number
  -p, --perfdata      Ouput performance data
  -t, --timeout       Plugin timeout in seconds
  -i, --info          Prefix any alerts with the service tag
  -e, --extinfo       Append system info to alerts
  -s, --state         Prefix alerts with alert state
  --short-state       Prefix alerts with alert state (abbreviated)
  -d, --debug         Debug output, reports everything
  -h, --help          Display this help text
  -V, --version       Display version info

For more information and advanced options, see the manual page.
check_dell_bladechassis will output performance data if the --perfdata or -p option is used. Performance data is only available on the M1000e enclosure. An example graph using PNP4Nagios is given below.

The template used to generate these graphs is available here: check_dell_bladechassis.template. Right-click on the link and choose "Save As". Rename it to check_dell_bladechassis.php.
Note
The PNP4Nagios template is included in the tarball and zip archive.
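If you want to consume the performance data outside PNP4Nagios, note that Nagios perfdata follows the text after the first "|" in the plugin output, as space-separated `'label'=value[UOM];warn;crit;min;max` items. The Python sketch below parses that generic format; the name `parse_perfdata` is hypothetical, it assumes single-line output, and the labels in the usage example are invented, since the plugin's actual labels are not listed here.

```python
import re
import shlex
from typing import Dict, Tuple

# A numeric value followed by an optional unit of measurement.
VALUE = re.compile(r"^(-?[0-9.]+)([a-zA-Z%]*)$")


def parse_perfdata(output: str) -> Dict[str, Tuple[float, str]]:
    """Return {label: (value, unit)} from the perfdata part of plugin output."""
    _, sep, perf = output.partition("|")
    result: Dict[str, Tuple[float, str]] = {}
    if not sep:
        return result  # no perfdata present
    # shlex handles single-quoted labels that contain spaces.
    for token in shlex.split(perf):
        label, _, data = token.partition("=")
        # Only the value field is used; warn/crit/min/max are ignored here.
        m = VALUE.match(data.split(";")[0])
        if m:
            result[label] = (float(m.group(1)), m.group(2))
    return result


# Invented labels, for illustration only.
print(parse_perfdata("OK - hardware working fine | 'total power'=2300W;;; ps1_amps=4.66A"))
```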
You can also download the plugin and the manpage separately.
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2009-08-04 | |
Please let me know if you are experiencing bugs, have feature requests, or suggestions on how to improve check_dell_bladechassis. We use this plugin in production at the University of Oslo, but we don't use all the different features of the plugin. While the plugin is bug-free for us, it might not be for you, so let me know if you have problems.
Please send bug reports or feature requests to the Nagios users mailing list. I read postings to this list frequently:
nagios-users@lists.sourceforge.net
You can also email me directly, but then other users won't benefit from the discussion. Unless security issues or other concerns prevent you from using the mailing list, it is better to discuss problems in a public forum.
Depending on the time of day etc., you can also reach me on IRC, as trondham on the #nagios channel on Freenode.
This is free software. Use at your own risk.