check_linux_bonding

Linux® Bonded Network Monitoring with Nagios®

Author: Trond Hasle Amundsen
Contact: t.h.amundsen@usit.uio.no
Date: 2014-05-13
Latest version: 1.4, released Tue May 13 2014

1   About

check_linux_bonding is a plugin for the Nagios monitoring software that checks bonding interfaces on Linux. The plugin is fairly simple and will report any interfaces that are down (both masters and slaves). It will also warn about bonding interfaces with only one slave, since that usually points to a misconfiguration. If no bonding interfaces are detected, the plugin will exit with an OK value by default. It is therefore safe to run this plugin on all your Linux machines.

The plugin will first try to use the sysfs (/sys) filesystem to detect bonding interfaces. If that does not work, i.e. the kernel or bonding module is too old for the necessary files to exist, the plugin will use procfs (/proc) as a fallback.
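
The sysfs-first, procfs-fallback order described above can be sketched roughly as follows. This is a minimal illustration in Python, not the plugin's own code. It assumes the standard kernel locations /sys/class/net/bonding_masters and /proc/net/bonding/; the function name and the `root` parameter are hypothetical and exist only so the sketch can be tried against a scratch directory.

```python
from pathlib import Path


def find_bonding_masters(root="/"):
    """Return (source, names) for the bonding masters found under root.

    `root` is a hypothetical parameter, added so the function can be
    pointed at a test directory instead of the live filesystem.
    """
    sysfs_file = Path(root) / "sys/class/net/bonding_masters"
    procfs_dir = Path(root) / "proc/net/bonding"

    # Preferred source: sysfs lists all masters in a single file.
    if sysfs_file.is_file():
        return "sysfs", sysfs_file.read_text().split()

    # Fallback: procfs has one file per bonding master.
    if procfs_dir.is_dir():
        return "procfs", sorted(p.name for p in procfs_dir.iterdir())

    # Neither location exists, so there is nothing to check.
    return "none", []
```

On a machine without the bonding module loaded, this returns ("none", []), which corresponds to the "no bonding interfaces" case the plugin treats as OK by default.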

The plugin supports an arbitrary number of bonding interfaces.

2   Usage

check_linux_bonding is designed to be used with NRPE, i.e. run locally. Example:

$ ./check_linux_bonding
Interface bond0 is up: mode=1 (active-backup), 2 slaves: eth0!, eth1

If something is wrong, the plugin will report it:

$ ./check_linux_bonding
Bonding interface bond0 [mode=1 (active-backup)]: Slave eth1 is down
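
To illustrate where such a report can come from, here is a small Python sketch that reads the text of a /proc/net/bonding/<bond> file and lists slaves whose MII status is not "up". It is an assumption-laden illustration rather than the plugin's actual logic: the function names are hypothetical, the sample text is abbreviated, and only the "Slave Interface" and "MII Status" fields are considered.

```python
# Abbreviated, illustrative contents of /proc/net/bonding/bond0.
SAMPLE = """
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def parse_proc_bonding(text):
    """Return the master's MII status and a {slave: status} mapping."""
    master_status = None
    slaves = {}
    current_slave = None

    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value" form
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current_slave = value
        elif key == "MII Status":
            if current_slave is None:
                master_status = value  # section before the first slave
            else:
                slaves[current_slave] = value
    return {"status": master_status, "slaves": slaves}


def down_slaves(parsed):
    """Names of slaves whose MII status is anything other than 'up'."""
    return sorted(n for n, s in parsed["slaves"].items() if s != "up")
```

With the sample text above, down_slaves(parse_proc_bonding(SAMPLE)) yields ["eth1"], matching the kind of alert shown in the example.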

2.1   Active and primary slaves

In the OK output, the plugin will indicate which of the slaves is active with an exclamation mark "!", if applicable. If one of the slaves is configured as primary, this is indicated with an asterisk "*":

$ ./check_linux_bonding
Interface bond0 is up: mode=1 (active-backup), 2 slaves: eth0*, eth1!

In the above example, eth0 is configured as the primary slave, and eth1 is the currently active slave.
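
If you post-process the OK output in your own tooling, the markers can be decoded along these lines. This is a hypothetical Python helper, not part of the plugin; it assumes the slave list is comma-separated and that the markers are appended directly to the interface name, as in the examples above.

```python
def parse_slave_list(field):
    """Split a slave list such as 'eth0*, eth1!' into per-slave flags."""
    result = {}
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue
        # Trailing "!" and "*" are markers, the rest is the name.
        name = item.rstrip("!*")
        markers = item[len(name):]
        result[name] = {
            "active": "!" in markers,    # currently active slave
            "primary": "*" in markers,   # configured primary slave
        }
    return result
```

For the list "eth0*, eth1!" this marks eth0 as primary and eth1 as active.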

2.2   Prefix alerts with service state

The option -s or --state can be used to prefix each alert with its corresponding service state as reported by the plugin:

$ check_linux_bonding -s
CRITICAL: Bonding interface bond1 [mode=1 (active-backup)] is down
WARNING: Bonding interface bond0 [mode=4 (802.3ad)]: Slave eth2 is down

Alternatively, you can use the option -S or --short-state to get an abbreviated, one-letter service state:

$ check_linux_bonding -S
C: Bonding interface bond1 [mode=1 (active-backup)] is down
W: Bonding interface bond0 [mode=4 (802.3ad)]: Slave eth2 is down

The Nagios plugin development guidelines suggest that this is good practice. I'm not a fan of this, but I've included these options for those who disagree.
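
The difference between the two prefix styles can be shown with a short Python sketch. The function and its `mode` parameter are hypothetical, chosen only to mirror the -s and -S behaviour seen in the examples above.

```python
def format_alert(state, message, mode=None):
    """Prefix a message with its service state.

    mode is None (no prefix), "long" (like -s) or "short" (like -S).
    """
    if mode == "long":
        return f"{state}: {message}"
    if mode == "short":
        # One-letter abbreviation: first letter of the state name.
        return f"{state[0]}: {message}"
    return message
```

For example, format_alert("WARNING", "Slave eth2 is down", "short") gives "W: Slave eth2 is down".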

2.3   Multiple line output, turn off escaping HTML tags

The output from check_linux_bonding contains multiple lines separated by HTML linebreaks (<br/>) if run as a command within Nagios, via NRPE etc. If run from a console which has a TTY, i.e. if you log in via SSH or similar and run check_linux_bonding manually, the linebreaks will be regular linebreaks.
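
One way to get this TTY-dependent behaviour is to ask the output stream whether it is attached to a terminal. The Python sketch below shows the idea only; the helper names are hypothetical and the plugin itself may detect the TTY differently.

```python
import sys


def line_separator(stream=None):
    """Return a newline for terminals and an HTML linebreak otherwise."""
    stream = sys.stdout if stream is None else stream
    return "\n" if stream.isatty() else "<br/>"


def join_alerts(alerts, stream=None):
    """Join several alert lines with the separator suited to the stream."""
    return line_separator(stream).join(alerts)
```

When the output is captured by NRPE or a pipe, isatty() is false and the alerts are joined with <br/>.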

Nagios 3.x allows the following option in cgi.cfg:

# ESCAPE HTML TAGS
# This option determines whether HTML tags in host and service
# status output is escaped in the web interface.  If enabled,
# your plugin output will not be able to contain clickable links.

escape_html_tags=1

The default, as seen above in the sample cgi.cfg from the distribution, is that HTML tags are escaped. My advice is to turn this off (escape_html_tags=0). Otherwise, you will see output like this in your Nagios console:

CRITICAL: Bonding interface bond1 [mode=1 (active-backup)] is down<br/>WARNING: Bonding interface bond0 [mode=4 (802.3ad)]: Slave eth2 is down

instead of this:

CRITICAL: Bonding interface bond1 [mode=1 (active-backup)] is down
WARNING: Bonding interface bond0 [mode=4 (802.3ad)]: Slave eth2 is down

With Nagios 3.x, plugins are allowed to output multiple lines with regular linebreaks, but only the first line is shown in the web interface (status.cgi).

2.4   Blacklisting

You may choose to blacklist one or more interfaces. This is done with the option -b or --blacklist, which can be specified multiple times. The argument can also be a file, in which case the file is expected to contain a single line with the same syntax, i.e.:

interface1,interface2,...

Examples:

check_linux_bonding -b bond1 -b eth1
check_linux_bonding -b bond1,eth1
check_linux_bonding -b /etc/check_linux_bonding.black
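
The three forms above all reduce to the same set of interface names. A rough Python sketch of that expansion follows; it is an illustration under assumptions, not the plugin's code. In particular, treating an argument as a file whenever a regular file with that name exists is a guess at the rule, and the function name is hypothetical.

```python
import os


def parse_blacklist(args):
    """Expand a list of -b/--blacklist arguments into a set of names."""
    names = set()
    for arg in args:
        # Assumption: an argument naming an existing file is read as a
        # blacklist file holding a single comma-separated line.
        if os.path.isfile(arg):
            with open(arg) as handle:
                arg = handle.readline()
        names.update(n.strip() for n in arg.split(",") if n.strip())
    return names
```

Both parse_blacklist(["bond1", "eth1"]) and parse_blacklist(["bond1,eth1"]) produce the set {"bond1", "eth1"}.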

2.5   Exit value when no bonding interfaces are found

By default, the plugin will exit with an OK value if no bonding interfaces are found. This can be modified with the --no-bonding or -n option. Default behaviour:

$ ./check_linux_bonding
OK: No bonding interfaces found

Warning message:

$ ./check_linux_bonding --no-bonding=warning
WARNING: No bonding interfaces found

Critical message:

$ ./check_linux_bonding --no-bonding=critical
CRITICAL: No bonding interfaces found

Unknown message:

$ ./check_linux_bonding --no-bonding=unknown
UNKNOWN: No bonding interfaces found
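
Each of these levels corresponds to one of the standard Nagios plugin exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The mapping can be sketched in Python as below; the function name is hypothetical and the sketch only mirrors the messages shown above.

```python
# Standard Nagios plugin exit codes.
EXIT_CODES = {"ok": 0, "warning": 1, "critical": 2, "unknown": 3}


def no_bonding_result(level="ok"):
    """Return (exit_code, message) for the 'no bonding interfaces' case."""
    level = level.lower()
    if level not in EXIT_CODES:
        raise ValueError(f"unknown alert level: {level}")
    return EXIT_CODES[level], f"{level.upper()}: No bonding interfaces found"
```

A wrapper script would print the message and pass the code to sys.exit().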

2.6   Full usage information

Usage output gathered with check_linux_bonding -h:

Usage: check_linux_bonding [OPTION]...

OPTIONS:

   -t, --timeout       Plugin timeout in seconds [5]
   -s, --state         Prefix alerts with alert state
   -S, --short-state   Prefix alerts with alert state abbreviated
   -n, --no-bonding    Alert level if no bonding interfaces found [ok]
   --slave-down        Alert level if a slave is down [warning]
   --disable-sysfs     Don't use sysfs (default), use procfs
   --ignore-num-ad     (IEEE 802.3ad) Don't warn if num_ad_ports != num_slaves
   -b, --blacklist     Blacklist failed interfaces
   -d, --debug         Debug output, reports everything
   -h, --help          Display this help text
   -V, --version       Display version info

For more information and advanced options, see the manual page or URL:
  http://folk.uio.no/trondham/software/check_linux_bonding.html

See also the man page.

3   Download

3.1   Latest version

Packaged

Single files

3.2   Changelog / Old versions

Version Date Changes
1.4 2014-05-13
  • Major bugfixes
  • Fix for Linux kernel version 3.13 or above
1.3.2 2012-12-14
  • Minor feature enhancement, minor bugfixes
  • Added an option "--ignore-num-ad" which tells the plugin not to warn if the number of AD ports is not equal to the number of slaves. This is useful, for example, if the IEEE 802.3ad bonding device spans multiple switches.
  • Manual pages are now written in Docbook XML
  • Bugfix for primary slave when using procfs
1.3.1 2010-10-26
  • Minor feature enhancements
  • For 802.3ad bonding interfaces, the plugin now checks that the number of aggregated ports is equal to the number of slaves assigned to the bonding interface.
1.3.0 2010-08-26
  • Minor feature enhancements
  • The plugin is now ePN compatible. This is useful if the plugin is run on the Nagios server without the use of agents such as NRPE.
  • A couple of minor fixes.
1.2.1 2010-02-17
  • Minor feature enhancements
  • New option --slave-down to set the alert state reported when a slave interface is down
1.2.0 2010-02-16
  • Major feature enhancements
  • Option -S is now short for --short-state
  • Better handling of perl warnings during execution
  • Added option --disable-sysfs to disable the use of sysfs (/sys), i.e. only use the /proc filesystem
1.1.0 2009-10-09
  • Major feature enhancements
  • New blacklist option, thanks to Giles Westwood for a patch
  • New option --no-bonding to specify return value when no bonding interfaces are found
  • Redirect STDERR to STDOUT if the plugin is run without a TTY
  • Man page is moved to section 8
1.0.1 2009-07-23
  • Minor feature enhancements
  • A couple of unimportant cosmetic changes
1.0.0 2009-06-12
  • Initial release

4   Known bugs & limitations

None known at present.

5   Reporting bugs, proposing new features etc.

Please send me a note if you experience bugs, or have feature requests or suggestions on how to improve check_linux_bonding. We use this plugin in production at the University of Oslo, on many Linux servers, but only with RHEL. While the plugin is bug-free for us, it might not be for you, so let me know if you have problems.

6   Disclaimer

This is free software. Use at your own risk.