Solved

services unknown on overloaded servers

  • 17 March 2022
  • 4 replies
  • 227 views

Userlevel 1
Badge +5

Hello everyone,

I have a problem with two monitored servers that often run under a fairly high load.

It seems that when a server is very busy, SNMP requests go unanswered, which puts all of my services in the UNKNOWN state. This is quite painful, because spurious notifications then arrive one after another.

Do you have any idea what could be done to prevent this?

 


Best answer by sims24 17 March 2022, 12:55


4 replies

Userlevel 6
Badge +19

Hello, 

 

You may want to try increasing the snmp-timeout value. The default is one second, and an overloaded server may not be able to answer within that timeframe.

 

Add for example --snmp-timeout=3 in the EXTRAOPTIONS macro at the Host level. 

Let me know if it’s better.

 

Userlevel 1
Badge +5

Thank you for your answer. Setting the timeout to 3 greatly reduces the number of emails Centreon sends me, although it still sends a few false positives.

[Screenshot: mail notifications]


Would increasing the timeout value further help?
------------------------------------------------------------------------------

Regarding the service results, however, things have not really improved.
 

[Screenshot: UNKNOWN results]

Does this timeout problem on overloaded servers make sense to you?

Userlevel 6
Badge +19

Yes it makes sense. 

You can also try to use the snmp-retries option. I would recommend setting it to --snmp-retries=10 (twice the default value). 
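To get a feel for how the two options combine, here is a minimal Python sketch of the arithmetic. It assumes one initial attempt plus one retransmission per retry, each waiting the full timeout with no backoff; that is an assumption about the SNMP client, not something confirmed in this thread. The function name `worst_case_wait` is made up for illustration.

```python
def worst_case_wait(timeout_s: float, retries: int) -> float:
    """Upper bound, in seconds, on the wait for one unanswered SNMP request.

    Assumption: 1 initial attempt + `retries` retransmissions,
    each waiting the full timeout, with no backoff between them.
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    if retries < 0:
        raise ValueError("retries must be zero or more")
    return timeout_s * (retries + 1)


if __name__ == "__main__":
    # (timeout, retries) pairs: the defaults mentioned in this thread,
    # then the values suggested in the replies.
    for timeout_s, retries in [(1, 5), (3, 5), (3, 10)]:
        bound = worst_case_wait(timeout_s, retries)
        print(f"timeout={timeout_s}s retries={retries} -> up to {bound}s per request")
```

Under that assumption, a single request that never gets an answer could block for a while with the larger values, so it may be worth comparing the result with whatever check timeout your monitoring engine enforces.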

 

Also, I see that the last check time is the same for every check. I would recommend not forcing a check on every item at the same time.

 

So, if I summarize: 

  • add --snmp-retries=10 to your Host(s), after the --snmp-timeout option added previously.
  • don’t force all checks at the same time: force one check per service, leaving an interval of at least 30 seconds between them.

That should do the trick.
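The 30-second spacing can be planned with a small Python sketch like the one below. It only computes when each forced check should be launched; it does not talk to Centreon. The function name `stagger_checks` and the service names are hypothetical.

```python
from datetime import datetime, timedelta


def stagger_checks(services, start, interval_s=30):
    """Return (service, launch_time) pairs spaced `interval_s` seconds apart.

    This only builds a plan; forcing the checks themselves is left
    to the operator or to whatever tooling is already in place.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return [
        (name, start + timedelta(seconds=index * interval_s))
        for index, name in enumerate(services)
    ]


if __name__ == "__main__":
    # Hypothetical service names, used purely as an example.
    plan = stagger_checks(["Cpu", "Memory", "Load"], datetime.now())
    for name, when in plan:
        print(f"{when:%H:%M:%S}  force check: {name}")
```

With three services and the default interval, the last check is launched one minute after the first, so the overloaded host never has to answer all of them at once.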

Userlevel 1
Badge +5

OK for me.
 

Thanks a lot!
