Blogger Digest: Nagios

Nagios is the answer!

It is licensed under the terms of the GNU General Public License Version 2 as published by the Free Software Foundation.

Nagios is a powerful system and network monitoring application. It monitors the hosts and services you specify, alerting administrators when a threshold is crossed and again when the host or service recovers to a healthy state.

Nagios itself runs only on Linux and UNIX variants. However, it can also monitor Windows servers through the Windows version of the Nagios client.

Nagios features:

Monitors network services such as SMTP, POP3, HTTP, NNTP, PING, etc.
Monitors server resources such as processor load, disk usage, etc.
Simple plugin design that allows users to easily develop their own service checks.
Parallelized service checks.
Ability to define a network device hierarchy using "parent" hosts, allowing detection of and distinction between devices that are down and those that are unreachable.
Notifications to contacts by email, pager, or a user-defined method when a service or host status changes.
Ability to define event handlers to be run during service or host events for proactive problem resolution.
Automatic log file rotation.
Support for implementing redundant monitoring hosts.
Optional web interface for viewing current network status, notification and problem history, log file, etc.
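
To illustrate the plugin design listed above: a Nagios plugin is any executable that prints a line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following is a minimal sketch of a disk usage check in Python. The thresholds, the function names and the checked path are illustrative assumptions, not part of the Nagios distribution.

```python
"""Minimal sketch of a Nagios-style disk usage plugin (illustrative only)."""
import shutil
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to an exit code (thresholds are assumptions)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/"):
    """Return (exit_code, status_line) for the filesystem containing path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    used_percent = 100.0 * usage.used / usage.total
    code = evaluate(used_percent)
    return code, f"DISK {LABELS[code]} - {used_percent:.1f}% used on {path}"


def main(argv):
    """Print the status line and return the exit code for Nagios."""
    code, line = check_disk(argv[1] if len(argv) > 1 else "/")
    print(line)   # Nagios shows this line as the plugin output
    return code   # Nagios derives the service state from the exit code

# A real plugin would end with: sys.exit(main(sys.argv))
```

Keeping the threshold logic in its own function makes it easy to test without touching a real filesystem.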

Related information:

Nagios official site
Nagios 2.0 documentation
Nagios Plugins is the open project at SourceForge that develops Nagios plugins
Nagios Exchange is the official repository for third-party Nagios plugins
Nagios plugin development guidelines

The official Nagios documentation suggests the following optimization tips. They may be useful for fine-tuning Nagios for optimum performance and effective monitoring.

Enable aggregated status updates with the aggregate_status_updates option to greatly reduce the load on the monitoring host, especially when monitoring a large number of services. The downside of this approach is that changes in host and service states are not reflected immediately in the status file.

If the standard status log is used instead of aggregated status updates, consider putting the directory where the status log is stored on a ramdisk. A ramdisk speeds things up by saving a lot of interrupts and disk thrashing.

Use the max_concurrent_checks option to restrict the maximum number of concurrently executing service checks. Watch the latency values shown in the extinfo CGI: if the majority of service checks show high latency, say more than 10 seconds, the limit is probably too low and Nagios is being starved of the checks it needs.
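
As a rough illustration of the latency rule of thumb above, the sketch below takes a list of service check latencies (in seconds) and reports whether the majority exceed a threshold. The function name and the sample values are hypothetical; in practice the figures would be read from the extinfo CGI.

```python
def majority_high_latency(latencies, threshold=10.0):
    """Return True when more than half of the latencies exceed threshold seconds."""
    if not latencies:
        return False
    high = sum(1 for value in latencies if value > threshold)
    return high * 2 > len(latencies)


# Hypothetical latencies, as might be copied from the extinfo CGI.
sample = [0.4, 12.5, 15.1, 11.0, 2.3]
if majority_high_latency(sample):
    print("Most checks are late: review max_concurrent_checks")
else:
    print("Latency looks acceptable")
```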

The overhead needed to process the results of passive service checks is much lower than that of normal active checks. However, passive service checks are only really useful if some external application is already doing some type of monitoring.
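
For context, an external application submits a passive service check result by writing one line to the Nagios external command file. The sketch below formats such a line with the PROCESS_SERVICE_CHECK_RESULT command. The host name, service description and command file path are placeholders that depend on the local installation.

```python
import time


def passive_result_line(host, service, return_code, output, timestamp=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    if timestamp is None:
        timestamp = int(time.time())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def submit(command_file, line):
    """Write the line to the external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Hypothetical host and service names; return code 0 means OK.
line = passive_result_line("web01", "Backup Job", 0, "Backup finished",
                           timestamp=1158624000)
print(line, end="")
# submit("/usr/local/nagios/var/rw/nagios.cmd", line)  # path is an assumption
```

The submit call is left commented out because it blocks until Nagios has the pipe open for reading.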

Compiled plugins (C/C++) run faster and more efficiently than interpreted script plugins (Perl, etc.). If you really want to use Perl plugins, consider compiling them into true executables with the perlcc utility, which is part of the standard Perl distribution, or compiling Nagios with an embedded Perl interpreter.

To compile in the embedded Perl interpreter, pass the --enable-embedded-perl option to the configure script before compiling Nagios. In addition, use the --with-perlcache option so that the embedded interpreter caches compiled Perl scripts for later reuse.

Host checks that use the check_ping plugin will be performed much faster if you break up the checks. This is because Nagios can often determine the status of a host after executing the plugin once, so the first check should be as fast as possible.

Hence, it is much faster to set the max_attempts value (the max_check_attempts directive) to 10 in the host definition and send only 1 ICMP packet each time than to specify a max_attempts value of 1 and have the check_ping plugin send 10 ICMP packets to the host.

However, the pitfall of this arrangement is that hosts that are slow to respond may be assumed to be down. Another option is to use a faster plugin, such as check_fping, as the host_check_command instead of check_ping.
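
The reasoning above can be made concrete with a simplified timing model. It assumes, purely for illustration, that each ICMP packet costs about one second and that a host which is up is recognised on the first plugin run. Real timings depend on the network and on plugin timeouts, and the function names are hypothetical.

```python
def seconds_until_up_verdict(packets_per_check, seconds_per_packet=1.0):
    """Time for the first plugin run, after which an up host is recognised."""
    return packets_per_check * seconds_per_packet


def seconds_until_down_verdict(packets_per_check, max_attempts,
                               seconds_per_packet=1.0):
    """Time to exhaust every attempt before a host is declared down."""
    return packets_per_check * max_attempts * seconds_per_packet


# Compare 10 packets in 1 attempt with 1 packet in each of 10 attempts.
for packets, attempts in [(10, 1), (1, 10)]:
    up = seconds_until_up_verdict(packets)
    down = seconds_until_down_verdict(packets, attempts)
    print(f"{packets} packet(s) x {attempts} attempt(s): "
          f"up after ~{up:.0f}s, down after ~{down:.0f}s")
```

Under these assumptions both configurations take about the same time to declare a host down, but the second one confirms a healthy host roughly ten times sooner.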

Do not schedule regular checks of hosts unless absolutely necessary. Set the check_interval directive in the host definition to 0 to disable regular checks of a host. If you really need regularly scheduled host checks, use a longer check interval.

Disable the use_aggressive_host_checking option to speed up host checks. The trade-off is that host recoveries can be missed under certain circumstances.

If Nagios processes a lot of external commands (e.g. passive checks in a distributed setup), set the command_check_interval variable to -1 so that Nagios checks for external commands as often as possible. This is important because most systems have small pipe buffers (4KB). If Nagios doesn't read the data from the pipe fast enough, applications that write to the external command file (such as the NSCA daemon) will block until there is enough free space in the pipe to write their data.
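
To illustrate why an unread or full pipe matters to writers, the sketch below opens a command file in non-blocking mode so that the caller gets an immediate answer instead of hanging. It is an example for POSIX systems under stated assumptions, not a description of how the NSCA daemon is implemented. The function name is hypothetical, and the line should be shorter than the pipe buffer so that it is written in one piece.

```python
import errno
import os


def try_submit(command_file, line):
    """Try to write one line to a named pipe without blocking.

    Returns True if the whole line was written, False if no reader has
    the pipe open or the pipe buffer is currently full.
    """
    data = line.encode()
    try:
        fd = os.open(command_file, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:   # nobody is reading the pipe
            return False
        raise
    try:
        return os.write(fd, data) == len(data)
    except BlockingIOError:            # pipe buffer is full right now
        return False
    finally:
        os.close(fd)
```

A caller that receives False can retry later or queue the result, rather than stalling while it waits for Nagios to drain the pipe.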

System configuration and hardware setup directly affect how the operating system (and therefore Nagios) performs. CPU and memory speed are obvious factors, but disk access is the biggest bottleneck. Don't store plugins, the status log, etc. on slow storage such as old IDE drives or NFS mounts. Use UltraSCSI drives or fast IDE drives whenever possible.

Note! Many Linux installations do not attempt to optimize IDE disk access. Use hdparm to change the IDE disk access parameters and take advantage of the faster features of newer IDE drives.

Monday, September 25, 2006

System And Network Monitoring Freeware

Tuesday, September 19, 2006

Fine Tuning Nagios Performance