Though it's an old post, I'm replying now because I know check_load threshold values are a big headache for newbies. ;)
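On a single-core box: a warning alert if the CPU load averages 70% over 1 min, 60% over 5 mins or 50% over 15 mins. A critical alert if it averages 90% over 1 min, 80% over 5 mins or 70% over 15 mins. (check_load always compares the 1-, 5- and 15-minute load averages, in that order.)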
command[check_load]=/usr/local/nagios/libexec/check_load -w 0.7,0.6,0.5 -c 0.9,0.8,0.7
All my findings about CPU load:
What's meant by "the load"? Wikipedia says:
All Unix and Unix-like systems generate a metric of three "load average" numbers in the kernel. Users can easily query the current result from a Unix shell by running the uptime command:
$ uptime
14:34:03 up 10:43, 4 users, load average: 0.06, 0.11, 0.09
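The load average of 0.06, 0.11, 0.09 in the above output means (on a single-CPU system):
- during the last minute, the CPU was busy only about 6% of the time on average (underloaded by about 94%)
- during the last 5 minutes, the CPU was busy about 11% of the time (underloaded by about 89%)
- during the last 15 minutes, the CPU was busy about 9% of the time (underloaded by about 91%)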
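If you want the same three numbers in a script rather than reading them by eye, here is a minimal Python sketch. parse_uptime is a hypothetical helper written just for this post; it assumes the "load average:" wording shown above (plus the "load averages:" variant some BSD/macOS builds print). Python's standard os.getloadavg() returns the triplet directly on Unix-like systems.

```python
import os
import re

# Matches "load average: 0.06, 0.11, 0.09" (Linux) as well as the
# "load averages: 0.06 0.11 0.09" form printed by some BSD/macOS builds.
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_uptime(line: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from an uptime line."""
    match = _LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load average found in: {line!r}")
    one, five, fifteen = (float(group) for group in match.groups())
    return one, five, fifteen


if __name__ == "__main__":
    sample = "14:34:03 up 10:43, 4 users, load average: 0.06, 0.11, 0.09"
    print(parse_uptime(sample))  # (0.06, 0.11, 0.09)

    # The standard library exposes the same triplet directly (Unix only).
    if hasattr(os, "getloadavg"):
        print(os.getloadavg())
```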
$ uptime
14:34:03 up 10:43, 4 users, load average: 1.73, 0.50, 7.98
The above load average of 1.73, 0.50, 7.98 can be interpreted on a single-CPU system as:
- during the last minute, the CPU was overloaded by 73% (1 CPU with 1.73 runnable processes, so that 0.73 processes had to wait for a turn)
- during the last 5 minutes, the CPU was underloaded by 50% (no processes had to wait for a turn)
- during the last 15 minutes, the CPU was overloaded by 698% (1 CPU with 7.98 runnable processes, so that 6.98 processes had to wait for a turn)
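The rule of thumb behind both examples can be written as a small Python function. This is only a sketch, assuming that load divided by core count is a fair proxy for CPU demand; on Linux the load average also counts tasks stuck in uninterruptible sleep (usually waiting on disk I/O), so a high load does not always mean a busy CPU. describe_load is a hypothetical name.

```python
def describe_load(load: float, cores: int = 1) -> str:
    """Interpret one load-average value the way the examples above do.

    A load equal to the core count means the CPUs were exactly fully used;
    anything above it is the average number of processes left waiting.
    """
    ratio = load / cores
    if ratio > 1:
        waiting = load - cores
        return (f"overloaded by {round((ratio - 1) * 100)}% "
                f"({waiting:.2f} processes waiting on average)")
    return f"underloaded by {round((1 - ratio) * 100)}% (no processes waiting)"


if __name__ == "__main__":
    # The single-CPU triplet from the second uptime example.
    for value in (1.73, 0.50, 7.98):
        print(value, "->", describe_load(value))
    # On a 4-core box the same 1.73 is comfortably below capacity.
    print(describe_load(1.73, cores=4))
```

The cores parameter is what the Nagios formula below builds on: the same load number means very different things on 1 core and on 4.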
Nagios threshold value calculation:
For a Nagios CPU load check with both warning and critical thresholds, compute each value as:
y = c * p / 100
where y = Nagios threshold value
c = number of cores
p = desired load percentage
For a 4-core system:
time        1 min   5 min   15 min
warning:     90%     70%     50%
critical:   100%     80%     60%
command[check_load]=/usr/local/nagios/libexec/check_load -w 3.6,2.8,2.0 -c 4.0,3.2,2.4
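The arithmetic for any core count can be scripted. The sketch below applies y = c * p / 100 to each of the three windows and prints the -w/-c arguments; check_load_args is a hypothetical helper, values are rounded to one decimal to match the config lines in this post, and os.cpu_count() reports logical CPUs (hyperthreads included), which may not be what you want to count as "cores".

```python
import os


def check_load_args(cores: int,
                    warn_pct: tuple[float, float, float],
                    crit_pct: tuple[float, float, float]) -> str:
    """Build check_load -w/-c arguments from per-window load percentages.

    Applies y = c * p / 100 to the 1-, 5- and 15-minute windows, in that
    order, rounding each value to one decimal place.
    """
    def fmt(pcts: tuple[float, float, float]) -> str:
        return ",".join(f"{cores * p / 100:.1f}" for p in pcts)

    return f"-w {fmt(warn_pct)} -c {fmt(crit_pct)}"


if __name__ == "__main__":
    # 4-core example from the table above.
    print(check_load_args(4, (90, 70, 50), (100, 80, 60)))
    # -w 3.6,2.8,2.0 -c 4.0,3.2,2.4

    # Same percentages scaled to this machine (cpu_count() may return None).
    cores = os.cpu_count() or 1
    print(check_load_args(cores, (90, 70, 50), (100, 80, 60)))
```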
For a single-core system:
y = p / 100
where y = Nagios threshold value
p = desired load percentage
time        1 min   5 min   15 min
warning:     70%     60%     50%
critical:    90%     80%     70%
command[check_load]=/usr/local/nagios/libexec/check_load -w 0.7,0.6,0.5 -c 0.9,0.8,0.7
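To see what these single-core thresholds do with the load triplets from the uptime examples, here is a rough Python model of the comparison. It assumes a window trips its threshold when it is strictly greater than it, and that CRITICAL takes priority over WARNING; the real plugin's boundary handling may differ, so treat load_state (a hypothetical helper) as an approximation, not the plugin's source. The 0, 1, 2 values are the standard Nagios plugin exit codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def load_state(loads: tuple[float, ...],
               warn: tuple[float, ...],
               crit: tuple[float, ...]) -> int:
    """Roughly model how check_load grades a (1, 5, 15 min) load triplet.

    Assumption: CRITICAL if any window is above its -c value, otherwise
    WARNING if any window is above its -w value, otherwise OK.
    """
    if any(load > limit for load, limit in zip(loads, crit)):
        return CRITICAL
    if any(load > limit for load, limit in zip(loads, warn)):
        return WARNING
    return OK


if __name__ == "__main__":
    warn = (0.7, 0.6, 0.5)  # -w 0.7,0.6,0.5 (single core)
    crit = (0.9, 0.8, 0.7)  # -c 0.9,0.8,0.7
    print(load_state((0.06, 0.11, 0.09), warn, crit))  # 0 (OK)
    print(load_state((1.73, 0.50, 7.98), warn, crit))  # 2 (CRITICAL)
    print(load_state((0.75, 0.40, 0.30), warn, crit))  # 1 (WARNING)
```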
A great white paper about CPU load analysis by Dr. Gunther: http://www.teamquest.com/pdfs/whitepaper/ldavg1.pdf
In this online article, Dr. Gunther digs down into the UNIX kernel to find out how load averages (the “LA Triplets”) are calculated and how appropriate they are as capacity-planning metrics.
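As a taste of what the paper covers: the kernel does not keep a plain average but an exponentially damped one, updated roughly every 5 seconds. The sketch below is a floating-point approximation of that update (Linux actually does it in fixed-point integer arithmetic), written for this post rather than taken from the paper; update_loads is a hypothetical name and the constant workload of 2 runnable processes is made up for illustration.

```python
import math

SAMPLE_INTERVAL = 5.0            # seconds between kernel samples (Linux)
WINDOWS = (60.0, 300.0, 900.0)   # 1-, 5- and 15-minute averages, in seconds


def update_loads(loads: tuple[float, float, float],
                 runnable: int) -> tuple[float, float, float]:
    """Apply one 5-second tick of exponential damping to each average."""
    new = []
    for load, window in zip(loads, WINDOWS):
        decay = math.exp(-SAMPLE_INTERVAL / window)
        # Old value fades out, current task count fades in.
        new.append(load * decay + runnable * (1.0 - decay))
    return new[0], new[1], new[2]


if __name__ == "__main__":
    loads = (0.0, 0.0, 0.0)
    # An idle machine that suddenly has 2 runnable processes for 5 minutes.
    for _ in range(60):  # 60 ticks * 5 s = 300 s
        loads = update_loads(loads, runnable=2)
    print(tuple(round(x, 2) for x in loads))  # (1.99, 1.26, 0.57)
```

After 5 minutes at a constant load of 2, the 1-minute value has almost caught up while the 15-minute value is still well below 2, which is why the three numbers from uptime can disagree so much.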