These optimization tips come from the official Nagios documentation. They may be useful for fine-tuning Nagios for optimum performance and an effective monitoring service.
- Enable aggregated status updates with the aggregate_status_updates option to greatly reduce the load on the monitoring host, especially when monitoring a large number of services. The downside of this approach is that status information is written with a slight delay.
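A minimal sketch of what this looks like in the main configuration file; the directive names are from the legacy Nagios documentation this tip refers to, and the interval value is just an example:

```
# nagios.cfg
aggregate_status_updates=1
# rewrite the status log every 30 seconds instead of after every check
status_update_interval=30
```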
- If the standard status log is used instead of aggregated status updates, consider putting the directory where the status log is stored on a ramdisk. A ramdisk helps speed things up by saving a lot of interrupts and disk thrashing.
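One way to do this on Linux is a tmpfs mount; the paths and size below are assumptions, not values from the original article. Note that tmpfs contents do not survive a reboot:

```
# /etc/fstab -- mount a small tmpfs for the status log
tmpfs  /usr/local/nagios/var/status  tmpfs  size=16m,mode=0755  0 0

# nagios.cfg -- point the status log at the ramdisk
status_file=/usr/local/nagios/var/status/status.log
```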
- Use the max_concurrent_checks option to restrict the maximum number of concurrently executing service checks. Nagios is overloaded if the extinfo CGI shows high latency values, say more than 10 seconds, for the majority of service checks.
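The directive lives in the main configuration file; the limit shown here is only an illustrative starting point to tune against your observed latency:

```
# nagios.cfg
# cap simultaneous active service checks (0 would mean no limit)
max_concurrent_checks=30
```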
- The overhead needed to process the results of passive service checks is much lower than that of normal active checks. Passive service checks are only really useful if you have external applications doing some type of monitoring.
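For illustration, an external application submits a passive result by writing a PROCESS_SERVICE_CHECK_RESULT line to Nagios's external command file. The host name, service description, and command-file path below are examples, not values from the article:

```shell
# Build a passive service check result line (sketch).
now=$(date +%s)
cmdline="[$now] PROCESS_SERVICE_CHECK_RESULT;web01;HTTP;0;HTTP OK - 0.12s response"
echo "$cmdline"
# In production you would append it to the command pipe instead, e.g.:
# echo "$cmdline" > /usr/local/nagios/var/rw/nagios.cmd
```

The format is `[timestamp] PROCESS_SERVICE_CHECK_RESULT;<host>;<service>;<return code>;<plugin output>`, with return code 0 meaning OK.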
- Compiled plugins (C/C++) run more efficiently and faster than interpreted script plugins (Perl, etc.). If you really want to use Perl plugins, consider compiling them into true executables using the perlcc utility, which is part of the standard Perl distribution, or compiling Nagios with an embedded Perl interpreter.
To compile in the embedded Perl interpreter, pass the --enable-embedded-perl option to the configure script before compiling Nagios. In addition, use the --with-perlcache option so that the embedded interpreter caches compiled Perl scripts for later reuse.
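Putting the two flags together, the build step from the Nagios source directory would look roughly like this:

```
./configure --enable-embedded-perl --with-perlcache
make all
```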
- The check_ping plugin used to check host states will perform much faster if you break up the checks. This is because Nagios determines the status of a host after the plugin has executed once.
Hence, it is much faster to set the max_attempts value to 10 and send out only 1 ICMP packet each time than to specify a max_attempts value of 1 in the host definition and have the check_ping plugin send 10 ICMP packets to the host.
The pitfall of this arrangement is that hosts which are slow to respond may be assumed to be down. Another option is to use the faster check_fping plugin as the host_check_command instead of check_ping.
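A sketch of the recommended arrangement in Nagios object configuration; the host name, address, and ping thresholds are examples (in object definitions the directive is spelled max_check_attempts):

```
define command {
    command_name  check-host-alive
    ; -p 1 sends a single ICMP packet per attempt
    command_line  $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
}

define host {
    host_name           web01
    address             192.168.1.10
    check_command       check-host-alive
    max_check_attempts  10
}
```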
- Do not schedule regular checks of hosts unless absolutely necessary. Set the check_interval directive to 0 in the host definition to disable regular checks of a host. If you really need regularly scheduled host checks, use a longer check interval.
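In a host definition this is a one-line change; the host name and address are placeholders:

```
define host {
    host_name       db01
    address         192.168.1.20
    check_command   check-host-alive
    check_interval  0    ; disable regularly scheduled checks of this host
}
```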
- Disable the use_aggressive_host_checking option to speed up host checks. The trade-off is that host recoveries can be missed under certain circumstances.
- If you are running a lot of external commands, i.e. passive checks in a distributed setup, set the command_check_interval variable to -1. This causes Nagios to check for external commands as often as possible, which is important because most systems have small pipe buffer sizes (e.g. 4KB). If Nagios doesn't read the data from the pipe fast enough, applications that write to the external command file (such as the NSCA daemon) will block and wait until there is enough free space in the pipe to write their data.
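The corresponding main configuration entry is simply:

```
# nagios.cfg
# -1 = check the external command file as often as possible
command_check_interval=-1
```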
- System configuration and hardware setup directly affect how the operating system (and the Nagios application) performs. CPU and memory speed are obviously factors that affect system performance, but disk access is the biggest bottleneck. Don't store plugins, the status log, etc. on a slow storage medium such as old IDE drives or NFS mounts. Use UltraSCSI drives or fast IDE drives whenever possible.
Note! Many Linux installations do not attempt to optimize IDE disk access. Use hdparm to change the IDE hard disk access parameters and take advantage of the speed features of newer IDE drives.
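As a rough sketch (must be run as root, and /dev/hda is an example device; wrong hdparm settings can cause data corruption, so consult your drive's documentation first):

```
# enable 32-bit I/O support and DMA on the first IDE disk
hdparm -c1 -d1 /dev/hda
# benchmark buffered read performance to verify the change helped
hdparm -t /dev/hda
```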