Solved

[os::windows::snmp::plugin] --mode=service: Unknown state instead of Critical when service is stopped

  • 4 September 2023
  • 12 replies

Badge +2

Hi,

 

I've been facing an issue since the last update of this plugin. A service is reported as Unknown when it is in a stopped state (after a crash), even though it was running correctly before the crash.

OS: Alma 8

 

Plugin version : 20230706-100638.el8
/usr/lib/centreon/plugins//centreon_windows_snmp.pl --plugin=os::windows::snmp::plugin --mode=service --hostname=192.168.7.16 --snmp-version='2c' --snmp-community='cloudmon' --snmp-autoreduce --snmp-timeout=30 --service='ControlUp Agent' --warning= --critical=1: --state='' --regexp --verbose
CRITICAL: Service problem 'ControlUp Agent'

 

Plugin version : 20230810-100132.el8
/usr/lib/centreon/plugins//centreon_windows_snmp.pl --plugin=os::windows::snmp::plugin --mode=service --hostname=192.168.7.16 --snmp-version='2c' --snmp-community='cloudmon' --snmp-autoreduce --snmp-timeout=30 --filter-name='ControlUp Agent' --warning-active= --critical-active=1:
UNKNOWN: No service found.

 

This service exists. I've also tried with a generic service such as Themes, and on many versions of Windows (srv 2008, 2013, 2016, 2019).

I've tried different options, such as --critical-status='%{installed_state} !~ /installed/i' or --warning-status= --critical-status='%{operating_state} !~ /active/' --warning-active= --warning-continue-pending= --warning-pause-pending= --warning-paused= --critical-active= --critical-continue-pending= --critical-pause-pending= --critical-paused=

 

The result is still the same. I think it fails because of this part:
if (scalar(keys %{$self->{services}}) <= 0) {
    $self->{output}->add_option_msg(short_msg => "No service found.");
    $self->{output}->option_exit();
}

 

Could you please give me a hint on how to adapt the check so that a service that exists but has crashed is shown in a Critical state instead of Unknown?

 

Sébastien
 


Best answer by omercier 28 September 2023, 17:36


12 replies

Userlevel 5
Badge +14

Hello

There is a space in the service name in your filter: --filter-name='ControlUp Agent'

 

By design, a Windows service has a display name (description) and a “real” name (name) in the registry, and as a rule the real name of a Windows service cannot contain a space.

The Service Manager only displays the description, but you can get both the name and the description in the Windows Task Manager.

As you can see here for example, there are no spaces in the left column.

The Centreon SNMP plugin works with the real name, not the description, from the OID .1.3.6.1.4.1.77.1.2.3.1
 

That being said, the SNMP “service” mode was apparently fully rewritten last month to bring the plugin in line with the new development method. Comparing both versions, the previous plugin used “--service” and was “matching” text; maybe it was matching “controlup” and “agent”?

In the new version, the option is “--filter-name”, which is a regexp by default.

 

So, first thing: look at your Task Manager (or double-click on your service in the Service Manager) to check whether the name really is “ControlUp Agent” with a space.

(I said the rule was not to use spaces, but some devs ignore the rule, and Windows will accept the space anyway.)

If there is no space, then match the real name in your check.

If there really is a space, then try writing the regular expression with either a prefix match or a strict match:

“^ControlUp”  <= match anything starting with ControlUp (whatever comes next)

“^ControlUp Agent$” <== match exactly “ControlUp Agent”
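
To make the difference between the two patterns concrete, here is a minimal sketch in Python rather than the plugin's Perl (for patterns this simple, both regex engines behave the same way). The list of service names and the helper `filter_names` are hypothetical and only serve the illustration; this is not the plugin's code.

```python
import re

# Hypothetical service names, as they might come back from the SNMP name table.
SERVICE_NAMES = ["ControlUp Agent", "ControlUp Agent Updater", "cuAgent", "Themes"]


def filter_names(pattern, names=SERVICE_NAMES):
    """Return the names an unanchored regexp filter would keep."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


if __name__ == "__main__":
    for pattern in ("ControlUp Agent", "^ControlUp", "^ControlUp Agent$"):
        print(f"{pattern!r:24} -> {filter_names(pattern)}")
```

Without anchors, the pattern keeps every name that contains the text, so the strict form with `^` and `$` is the safer choice when exactly one service is expected.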

Badge +2

Hi,

 

Thanks for your comment. As I said previously, it’s the same with the service “Themes”. It makes no difference whether the name contains a space or not, or whether I use a regex or not. Even with this service, the result is Unknown.

 

I’ve also tried with the name “cuAgent” (^cuAgent$) and with the description “ControlUp Agent”; it’s still the same.

 

I think the issue here is this part:

if (scalar(keys %{$self->{services}}) <= 0) {
    $self->{output}->add_option_msg(short_msg => "No service found.");
    $self->{output}->option_exit();
}

If the service has crashed, no OID is found for it over SNMP, and the result is then Unknown because of this part.
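
The behaviour described here can be sketched as follows. This is a simplified Python model, not the plugin's Perl code: the function `evaluate`, its `not_found_status` parameter, and the messages other than "No service found." are assumptions made for the illustration.

```python
def evaluate(services, not_found_status="UNKNOWN"):
    """Return (status, message) for a dict mapping service name -> operating state.

    An empty dict models the case where SNMP returned no entry for the service.
    """
    if len(services) <= 0:
        # Same test as the quoted Perl snippet: nothing matched the filter.
        return not_found_status, "No service found."
    not_active = sorted(name for name, state in services.items() if state != "active")
    if not_active:
        return "CRITICAL", "Service problem: " + ", ".join(not_active)
    return "OK", "All services are ok"


if __name__ == "__main__":
    print(evaluate({"Themes": "active"}))             # service present and running
    print(evaluate({}))                               # entry vanished: UNKNOWN
    print(evaluate({}, not_found_status="CRITICAL"))  # what the poster is asking for
```

The point of the sketch is that no threshold option can help once the dictionary is empty, because the thresholds are only evaluated after the "not found" test.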

 

Sébastien

Userlevel 5
Badge +14

I realize I don’t really use SNMP for Windows, except for NICs; I use NSClient for services. So this led me down a rabbit hole…

TL;DR: using SNMP to monitor services in Windows seems broken (maybe)

 

You have 3 tables: one with the name, one with the installed state (value 4 = installed), and one with the operating state (value 1 = active/running).

On Windows 2019+, whatever you do to a service (start, stop, etc.), none of the 3 values change.

On Windows 2012, when you stop a service, its 3 entries (1 in each table) disappear. Even if you restart the service, you need to start/stop another service to get the tables updated.

(I couldn’t try on a 2016.)

So basically, on old Windows versions the entry disappears, so you get an error with the old code, or Unknown with the new code. (I also tried with an old Nagios plugin that does approximately the same thing.)

 

On 2019+ this check simply does nothing, as the 3 entries in the tables stay the same (name, operstate=1, installstate=4) whatever the status of the service (running or stopped).
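
As an illustration of the three tables, here is a small Python sketch that joins them into one record per service. Only installed state 4 (installed) and operating state 1 (active) come from this thread; the other enum values are an assumption based on the LAN Manager MIB, and the index keys and the `decode` helper are hypothetical.

```python
# Value 4 = installed and value 1 = active are from the thread;
# the remaining values are assumed from the LAN Manager MIB.
INSTALLED_STATE = {1: "uninstalled", 2: "install-pending", 3: "uninstall-pending", 4: "installed"}
OPERATING_STATE = {1: "active", 2: "continue-pending", 3: "pause-pending", 4: "paused"}


def decode(names, installed, operating):
    """Join three index -> value columns into one record per service name."""
    rows = {}
    for idx, name in names.items():
        rows[name] = {
            "installed_state": INSTALLED_STATE.get(installed.get(idx), "unknown"),
            "operating_state": OPERATING_STATE.get(operating.get(idx), "unknown"),
        }
    return rows


if __name__ == "__main__":
    # Service running: one entry in each of the three tables.
    print(decode({"1": "Themes"}, {"1": 4}, {"1": 1}))
    # Windows 2012 after a stop, as described above: the entries are gone.
    print(decode({}, {}, {}))
```

With the entries gone, the decoded result is empty, which is the situation that produces "No service found." On 2019+, as described above, the first call would keep returning the same record even after the service is stopped.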

 

Maybe it takes time for the values to change, and the SNMP tables are not updated in real time; I didn’t test that much.

If that is the case, then the SNMP check for services should indeed be modified to return “CRITICAL” and not “UNKNOWN” when the service is not found.

This should be done by opening an issue on the Git repository.

 

However, if the SNMP table is never updated, then this check should be deprecated and retired. It needs more testing.

 

Microsoft did say that SNMP was still there for monitoring purposes but that it would be “altered” (= deprecated, not usable anymore?)

New in SNMP - Win32 apps | Microsoft Learn

 

If you really can’t use NSClient (which works fine) or WSMAN, I strongly advise monitoring your critical services with the “processcount” check, counting the number of .exe processes of your applications.

 

Badge +2

Hi,

 

Thanks for your comment & analysis. You confirm what I was thinking about this mode.

Maybe I will switch to WSMAN or, more likely, NSClient. For now, I have downgraded the plugin to keep the old behaviour.

 

Sébastien

Badge +1

Hi,

I have the same problem.

Can you tell me which version of the plugin worked?

 

thanks.

Badge +2

Hi,

 

This one works for sure: Plugin version : 20230706-100638.el8

You will find an example in my first post with the options I use.

 

Sébastien

Userlevel 4
Badge +12

Hi, 

Sorry for my short answer, we can talk a bit more about it later, but in short:

  • Microsoft’s implementation of SNMP does not match what the MIBs describe.
  • We have to work around this.
  • There has been a change to the Windows SNMP connector that should have been listed as a “breaking change” (we’ll fix that).
  • Adding the --snmp-errors-exit=critical option to the EXTRAOPTIONS macro of the service template should get you the right status when the service is stopped.

Sorry for the inconvenience!

Badge +2

Hi,

 

Thank you for your explanation. I will try the EXTRAOPTIONS change tomorrow.

 

Kind regards,

Sébastien

Badge +1

Hello,

I tried adding the "--snmp-errors-exit=critical" option in EXTRAOPTIONS, but the problem is still the same. Did you find a solution?
Userlevel 4
Badge +12

You’re right, we need to fix that. It should be done in the next Connector Release (2023-11-14).

Userlevel 4
Badge +12

Hi there,

The new Monitoring Connectors Release is out!

Updating the pack is enough if you are using Centreon 22.04 or higher with automatic plugin installation; if not, you will need to update the plugin as well.

Badge +2

Hello,

 

Thank you for the information. I confirm it’s working now.

 

Kind regards,

Sébastien RIVIERE.
