Solved

services unknown on overloaded servers

  • March 17, 2022
  • 4 replies
  • 366 views

Intence-Tech

Hello everyone,

I have a problem with two monitored servers that often run under a fairly high load.

It seems that when a server is very busy, SNMP responses are lost, which puts all of my services in the UNKNOWN state. This is quite painful because spurious notifications follow one another without much relevance.

Do you have any idea what could be done to prevent this?

 


 

4 replies

sims24
  • Ranger ***
  • Answer
  • March 17, 2022

Hello, 

 

You may want to try increasing the snmp-timeout value. The default is one second, and an overloaded server may struggle to answer within that timeframe.

For example, add --snmp-timeout=3 to the EXTRAOPTIONS macro at the Host level.

Let me know if it’s better.
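For anyone editing the macro value by hand, here is a minimal sketch of how the option string could be updated without leaving a duplicate behind. The helper name `set_option` is hypothetical, and it assumes options are written in the `--name=value` form used in this thread; it is not part of Centreon itself.

```python
import shlex


def set_option(extra_options: str, name: str, value) -> str:
    """Return the macro string with name=value set exactly once.

    Assumption: options use the --name=value form. Any earlier
    --name=... token is dropped and the new one is appended.
    """
    tokens = shlex.split(extra_options)
    kept = [t for t in tokens if not t.startswith(name + "=")]
    kept.append(f"{name}={value}")
    return shlex.join(kept)


# Replace an existing timeout, then add a second option after it.
macro = set_option("--snmp-timeout=1", "--snmp-timeout", 3)
macro = set_option(macro, "--snmp-retries", 10)
print(macro)
```

The resulting string is what would be pasted into the EXTRAOPTIONS macro of the Host.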

 


Intence-Tech
  • Author
  • Steward **
  • March 17, 2022

Thank you for your answer. Setting the timeout to 3 greatly reduces the number of emails Centreon sends me, although it still sends a few false positives.

[Screenshot: mail notifications]


Would increasing the timeout value further help?
------------------------------------------------------------------------------

The service results themselves, however, have not really improved.
 

[Screenshot: services with UNKNOWN results]

Does this timeout problem on overloaded servers make sense to you?


sims24
  • Ranger ***
  • March 17, 2022

Yes, it makes sense.

You can also try to use the snmp-retries option. I would recommend setting it to --snmp-retries=10 (twice the default value). 
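To get a feel for how the two options combine, here is a rough sketch of the worst-case wait for a single SNMP request. It assumes the usual Net-SNMP convention: one initial attempt plus `retries` re-sends, each waiting a full timeout, with no backoff. The function name is hypothetical, and the defaults used (1 second, 5 retries) are the ones quoted in this thread.

```python
def worst_case_wait(timeout_s: float, retries: int) -> float:
    """Longest time one SNMP request can block before giving up.

    Assumption: one initial attempt plus `retries` re-sends,
    each waiting the full timeout (no backoff).
    """
    attempts = retries + 1
    return timeout_s * attempts


# Defaults quoted in this thread, then the two suggested changes.
for timeout_s, retries in [(1, 5), (3, 5), (3, 10)]:
    wait = worst_case_wait(timeout_s, retries)
    print(f"timeout={timeout_s}s retries={retries} -> up to {wait}s per request")
```

Under that assumption the three settings give up to 6, 18 and 33 seconds per request. It is worth checking that the result stays below the check timeout configured on the poller, otherwise the check may be killed before the retries are exhausted.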

 

Also, I see that the last check time is the same for every check. I would recommend not forcing a check on every item at the same time.

 

To summarize:

  • Add the --snmp-retries option to your Host(s), after the --snmp-timeout option added earlier.
  • Don’t force all checks at the same time. Run one forced check per service, leaving at least 30 seconds between them.

That should do the trick.
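As an illustration of the 30-second spacing, here is a small sketch that assigns each service a start offset instead of firing everything at once. The function name and the service names are hypothetical; it only computes a schedule and does not talk to Centreon.

```python
from typing import Iterable, List, Tuple


def stagger(services: Iterable[str], interval_s: int = 30) -> List[Tuple[str, int]]:
    """Pair each service with a start offset in seconds.

    The first service starts at 0 and each following one starts
    interval_s seconds after the previous one.
    """
    return [(name, index * interval_s) for index, name in enumerate(services)]


# Hypothetical service names, for illustration only.
for name, offset in stagger(["Cpu", "Memory", "Disk"]):
    print(f"force check of {name} at +{offset}s")
```

With three services this spreads the forced checks over one minute, so the loaded server only has to answer one batch of SNMP requests at a time.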


Intence-Tech
  • Author
  • Steward **
  • March 17, 2022

OK for me.
 

Thanks a lot!