Is there a way to configure a service check so that its status does not go critical or warning until all of the retry checks have been performed?
We are using the VMware vCenter connectors, and one of the checks reports VMs where VMware Tools is not running. Unfortunately, it does not take into account that VMware Tools is legitimately not running for a short time while a VM starts up or reboots. Thankfully, the check does take into account whether the VM is shut down, so VMs don’t go critical when they are switched off. We have 500 VDI workstations that are regularly starting up or being rebooted, and having them all go critical makes our dashboards look worse than things really are. A VM should only go critical if the check fails several times in a row (for example, 5), not on the first failure.
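To make the behaviour I’m after concrete, here is a minimal Python sketch. It is only an illustration of the logic, not how the connector or the monitoring server actually works; the function name `displayed_status` and the `max_attempts` parameter are made up for this example.

```python
def displayed_status(results, max_attempts=5):
    """Return the status that would be shown after each check result.

    results: iterable of raw check results, "OK" or "CRITICAL".
    The shown status only becomes CRITICAL once the check has failed
    max_attempts times in a row; any OK result resets the counter.
    (Hypothetical helper, for illustration only.)
    """
    shown = []
    failures = 0
    for result in results:
        if result == "OK":
            failures = 0
        else:
            failures += 1
        shown.append("CRITICAL" if failures >= max_attempts else "OK")
    return shown


# A VM that reboots: VMware Tools is down for three checks, then recovers.
reboot = ["OK", "CRITICAL", "CRITICAL", "CRITICAL", "OK"]
print(displayed_status(reboot))

# A VM where VMware Tools stays down.
broken = ["CRITICAL"] * 6
print(displayed_status(broken))
```

With this logic the rebooting VM never shows as critical on the dashboard, while the VM with a persistent failure turns critical on the fifth consecutive failed check.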
I’ve played with the various retry intervals and delays, but they just affect the notifications sent out, not the actual service status.