Channel: THWACK: Message List

Re: CPU load never shows the correct process

Then that's your problem. You are collecting CPU stats every 5 minutes unless you've shortened the "statistics collection" interval for the entire server (or for specific nodes).

 

So what you REALLY have done in this alert is inject a 2-minute delay before collecting the CPU statistics that go in the email. More than enough time for the offending process to disappear down a dark alley (sorry, been binge-watching Gotham this week).

 

Here's the timing:

 

12:00 - poll server, get stats, everything is OK

12:01 - CPU on server spikes

12:05 - poll server, get stats, CPU is still high

          CPU alert timer starts

12:06 - CPU calms down

12:07 - CPU alert timer completes 2 minute delay

          CPU alert trigger action collects CPU stats

12:09 - CPU alert email is sent

12:10 - poll server, get stats
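
If it helps to see the arithmetic, here is a rough sketch of that timeline in Python. It is a toy model of the timing described above, not of how the product evaluates alerts internally; the function names are made up for illustration, and times are plain minutes after 12:00.

```python
# Toy model of the timeline above. All names here are hypothetical;
# times are minutes after 12:00, and polls happen at 0, 5, 10, ...

def first_detecting_poll(spike_start, spike_end, poll_interval=5):
    """Return the first poll time that lands inside the spike, or None."""
    # Round spike_start up to the next poll boundary.
    poll = -(-spike_start // poll_interval) * poll_interval
    return poll if poll < spike_end else None

def action_sees_spike(spike_start, spike_end, alert_delay, poll_interval=5):
    """True if the CPU is still high when the trigger action collects stats."""
    poll = first_detecting_poll(spike_start, spike_end, poll_interval)
    if poll is None:
        return False  # the spike fell between two polls and was never seen
    action_time = poll + alert_delay  # the delay timer starts at the detecting poll
    return spike_start <= action_time < spike_end

# Spike from 12:01 to 12:06, detected by the 12:05 poll.
print(action_sees_spike(1, 6, alert_delay=2))  # action at 12:07, spike ended at 12:06
print(action_sees_spike(1, 6, alert_delay=0))  # action at 12:05, spike still running
```

With the 2 minute delay the action runs after the spike is over, so the process list in the email no longer contains the culprit. With no delay the action runs while the CPU is still high.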

 

Your choice is to either set this alert's delay to 0 seconds (very aggressive; you will probably get a lot of false alerts) or set it to 5 minutes. Better still, set it to 10. Seriously. A high CPU for 7 minutes is NOT a huge deal. Lots of processes spike CPU for a relatively short period of time. 7 minutes is survivable in most cases.

