[os::windows::snmp::plugin] --mode=service: Unknown state instead of Critical when service is stopped
Hi,
I've had an issue since the last update of this plugin: a service that was running correctly and then stopped (crashed) is reported as Unknown instead of Critical.
OS: Alma 8
Plugin version: 20230706-100638.el8
/usr/lib/centreon/plugins//centreon_windows_snmp.pl --plugin=os::windows::snmp::plugin --mode=service --hostname=192.168.7.16 --snmp-version='2c' --snmp-community='cloudmon' --snmp-autoreduce --snmp-timeout=30 --service='ControlUp Agent' --warning= --critical=1: --state='' --regexp --verbose
CRITICAL: Service problem 'ControlUp Agent'
Plugin version: 20230810-100132.el8
/usr/lib/centreon/plugins//centreon_windows_snmp.pl --plugin=os::windows::snmp::plugin --mode=service --hostname=192.168.7.16 --snmp-version='2c' --snmp-community='cloudmon' --snmp-autoreduce --snmp-timeout=30 --filter-name='ControlUp Agent' --warning-active= --critical-active=1:
UNKNOWN: No service found.
This service exists. I've also tried with a generic service such as Themes, on several versions of Windows (srv 2008, 2013, 2016, 2019).
I've tried different options, such as:
--critical-status='%{installed_state} !~ /installed/i'
or:
--warning-status= --critical-status='%{operating_state} !~ /active/' --warning-active= --warning-continue-pending= --warning-pause-pending= --warning-paused= --critical-active= --critical-continue-pending= --critical-pause-pending= --critical-paused=
The result is still the same. I think it fails because of this part:
if (scalar(keys %{$self->{services}}) <= 0) {
    $self->{output}->add_option_msg(short_msg => "No service found.");
    $self->{output}->option_exit();
}
Could you please give me a hint, or a way to adapt the check, so that a service that exists and then crashes is reported as Critical instead of Unknown?
Sébastien
Hello
there is a space in the service name in your filter --filter-name='ControlUp Agent'
by design a windows service has a display name (description) and a “real” name (name) in the registry, and by rule a windows service real name cannot contain a space
the Service manager only displays the description, but you can get the name & description in the Windows Task Manager
as you can see here for example, there is no space in the left column.
Centreon snmp plugin works with the real name, not the description from the oid .1.3.6.1.4.1.77.1.2.3.1
that being said, there was apparently a full rewrite of the snmp “service” plugin last month, to bring it in line with the new development method. comparing both versions, the previous plugin used “--service” and was “matching” text; maybe it was matching “controlup” and “agent”?
now in the new version it is “--filter-name”, which is a regexp by default
so, first thing, look at your task manager (or double-click on your service in the service manager) to check whether the name really is “ControlUp Agent” with a space
(I said it was the rule to not use space, but some devs ignore the rule, and windows will accept the space anyway)
if there is no space, then match the real name in your check,
if there really is a space, then try writing the regular expression as either a prefix match or a strict match
“^ControlUp” <== match anything starting with ControlUp (whatever comes next)
“^ControlUp Agent$” <== match exactly “ControlUp Agent”
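To see the difference between those two patterns, here is a minimal shell sketch using grep -E on a made-up list of service names. It assumes the plugin's Perl regexp behaves like an extended regexp for these simple anchored patterns; "ControlUp Monitor" is a hypothetical name added for contrast, and match_services is only an illustrative helper, not part of the plugin.

```shell
# Hypothetical list of service names, one per line (not real plugin output).
services='ControlUp Agent
ControlUp Monitor
cuAgent
Themes'

# Print every name in the list that matches the extended regexp given as $1.
match_services() {
    printf '%s\n' "$services" | grep -E -- "$1"
}

match_services '^ControlUp'          # prefix match: both ControlUp entries
match_services '^ControlUp Agent$'   # strict match: only "ControlUp Agent"
```

The prefix form is convenient but can pick up more services than intended, so the strict form is usually the safer choice for a single service.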
Hi,
Thanks for your comment. As I said previously, it’s the same with the service “Themes”. It makes no difference whether the name contains a space or not, or whether I use a regex or not. Even this service is reported as Unknown.
I’ve also tried with the name “cuAgent” (^cuAgent$) and with the description “ControlUp Agent”; the result is still the same.
I think the issue is this part:
if (scalar(keys %{$self->{services}}) <= 0) {
    $self->{output}->add_option_msg(short_msg => "No service found.");
    $self->{output}->option_exit();
}
If the service has crashed, no OID is found for it over SNMP, so this part returns Unknown.
Sébastien
I realize I don’t really use snmp for windows, except for NICs; I use nsclient for services. so this led me into a rabbit hole…
TL;DR : snmp to monitor services in windows seems broken (maybe)
you have 3 tables: one with the name, one with installedstate (value 4 = installed), and one with operatingstate (1 = active/running)
on windows 2019+ whatever you do to a service (start, stop, etc…), none of the 3 values change
on windows 2012, when you stop a service, the 3 entries (1 in each table) disappear. even if you restart the service, you need to start/stop another service to get the tables updated
(couldn’t try on a 2016)
so basically, on old windows, the entries disappear, so you get an error with the old code, or unknown with the new code (I also tried with an old nagios plugin that does approximately the same thing)
on 2019+ this check simply does nothing, as the 3 entries in the tables stay the same (name, operstate=1, installstate=4) whatever the status of the service (running or stopped)
maybe it takes time to change the values, and the snmptables are not updated realtime, I didn’t try that much.
if that is the case, then the snmp service check should indeed be modified to return “CRITICAL” and not “UNKNOWN” when the service is not found
this should be done by opening an issue on the git
however if the snmptable is not updated, then this check should be deprecated and retired. it needs more testing
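for that extra testing, a minimal sketch, assuming net-snmp's snmpwalk is installed on the poller: walk the OID quoted earlier in this thread before and after stopping a service, then compare. walk_services is only an illustrative helper name; the host and community are the ones from the first post.

```shell
# Walk the Windows service table over SNMP v2c (requires net-snmp's snmpwalk).
# $1 = host, $2 = community; the OID is the one quoted earlier in this thread.
walk_services() {
    snmpwalk -v2c -c "$2" "$1" .1.3.6.1.4.1.77.1.2.3.1
}

# Example (not run here): compare the table before and after stopping a service.
# walk_services 192.168.7.16 cloudmon > before.txt
# ... stop the service on the Windows host, wait a little ...
# walk_services 192.168.7.16 cloudmon > after.txt
# diff before.txt after.txt
```

if the diff stays empty even after waiting a while, the table is not being refreshed and the check cannot see the stop at all.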
microsoft did say the snmp was still here for monitoring purpose but that it would be “altered” (=deprecated, not usable anymore?)
if you really can’t use nsclient (which works fine) or WSMAN, I strongly advise monitoring your critical services with the “processcount” check, counting the number of .exe processes of your applications
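as a stopgap for the case where the entry disappears from the table, the “CRITICAL instead of UNKNOWN” behaviour could be approximated outside the plugin. this is only a sketch under assumptions: remap_missing_service is a hypothetical helper, it relies on the “No service found.” message and exit code 3 shown in the first post, and it does not help on systems where the table never changes.

```shell
# Remap a plugin result: UNKNOWN (exit 3) with "No service found" becomes
# CRITICAL (exit 2). Any other result is passed through unchanged.
# $1 = plugin exit code, $2 = plugin output.
remap_missing_service() {
    code=$1
    output=$2
    case "$code:$output" in
        3:*"No service found"*)
            printf 'CRITICAL: service not listed in SNMP table (was: %s)\n' "$output"
            return 2
            ;;
    esac
    printf '%s\n' "$output"
    return "$code"
}

# Example wrapper usage (not run here; plugin options elided):
# output=$(/usr/lib/centreon/plugins/centreon_windows_snmp.pl ...)
# remap_missing_service "$?" "$output"
# exit $?
```

note that this also turns a mistyped service name into CRITICAL, so the filter should be verified against a running service first.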
Hi,
Thanks for your comment & analysis. You confirm what I suspected about this mode.
I will probably switch to WSMAN or, more likely, NSClient. For now, I have downgraded the plugin to keep the old behaviour.
Sébastien
Hi,
I have the same problem.
Can you tell me which version of the plugin worked?
thanks.
Hi,
Sure, this one works: Plugin version: 20230706-100638.el8
You will find an example in my first post, with the options I use.
Sébastien
Hi,
Sorry for my short answer, we can talk a bit more about it later, but in short:
Microsoft’s implementation of SNMP does not match what the MIBs describe.
We have to work around this.
There has been a change on Windows SNMP connector that should have been listed as a “breaking change” (we’ll fix that).
Adding the --snmp-errors-exit=critical option to the EXTRAOPTIONS macro of the service template should get you the right status when the service is stopped.
Sorry for the inconvenience!
Hi,
Thank you for your explanation. I will try the EXTRAOPTIONS change tomorrow.
Kind regards,
Sébastien
Hello,
I tried adding the "--snmp-errors-exit=critical" option to EXTRAOPTIONS, but the problem is still the same. Did you find a solution?
You’re right, we need to fix that. It should be done in the next Connector Release (2023-11-14).
Updating the pack is enough if you are using Centreon 22.04 or higher with automatic plugin installation; if not, you will need to update the plugin as well.
Hello,
Thank you for the information. I confirm it’s working now.