
Hello,

 

Since updating from 21.10 to 22.04 on CentOS 7, many of our services keep disappearing from the web interface.

Both the host and the service are enabled in the configuration (centreon database), but after digging, we found that the service gets disabled at the centreon_storage database level with “enabled = 0”.

 

There isn’t any issue collecting metrics: the service graphs keep being updated, notifications are still sent, and everything works well on the Poller.

 

We’ve enabled debug at broker level for sql and bbdo, but for the services we sampled we found nothing beyond the usual logs (perfdata, processing service messages, etc.).

 

The bug has impacted about 20% of active services, which is significant.

 

Restarting the impacted pollers solves the issue for some time (maybe hours) but it’s not efficient to restart pollers a few times a day.

 

We’ve scripted something to :

  • get the disabled services from centreon_storage database
  • check one by one if they are supposed to be enabled in centreon configuration database
  • update the centreon_storage entry to put “enabled = 1” again for the service if it must be enabled
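The three steps above can be sketched roughly as follows. This is only an illustration, not the actual script: the `host_id` and `service_id` column names, the `%s` placeholder style (used by common MySQL/MariaDB Python drivers) and both function names are assumptions. How the list of services enabled in the configuration database is obtained is deliberately left out, since it depends on the configuration schema.

```python
from typing import Iterable, List, Tuple

ServiceKey = Tuple[int, int]  # (host_id, service_id)

# ASSUMPTION: centreon_storage.services identifies a service by host_id and
# service_id. Only the table name and the "enabled" flag come from the thread.
SELECT_DISABLED = (
    "SELECT host_id, service_id FROM centreon_storage.services WHERE enabled = 0"
)
UPDATE_ENABLED = (
    "UPDATE centreon_storage.services SET enabled = 1 "
    "WHERE host_id = %s AND service_id = %s"
)


def services_to_reenable(
    disabled_in_storage: Iterable[ServiceKey],
    enabled_in_config: Iterable[ServiceKey],
) -> List[ServiceKey]:
    """Return services disabled in storage that the configuration says are enabled."""
    enabled = set(enabled_in_config)
    # Deduplicate and sort so the result is stable from one run to the next.
    return sorted(key for key in set(disabled_in_storage) if key in enabled)


def build_updates(keys: Iterable[ServiceKey]) -> List[Tuple[str, ServiceKey]]:
    """Pair the parameterized UPDATE statement with each service key."""
    return [(UPDATE_ENABLED, key) for key in keys]
```

The rows returned by `SELECT_DISABLED` would be passed as `disabled_in_storage`, and each `(statement, params)` pair from `build_updates` would be handed to the driver's `cursor.execute`. Keeping the comparison separate from the database calls makes it possible to log what would be changed before touching centreon_storage.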

 

However, this workaround is not really satisfying: we would rather understand what causes this “enabled = 0” SQL update, so we can prevent services from being disabled instead of fixing them every X minutes with a homemade script.

 

Feel free to ask for more information; to be honest, I don’t know what kind of details I could give :(

Hi @Snk

Usually, enabled is set to 0 for a resource when the poller monitoring this resource is stopped (centengine sends a message to Broker saying that it will stop the monitoring). I cannot be sure, but maybe there is a “ghost” poller “haunting” your resources status?

Can you send a screenshot of your **Configuration  >  Pollers** page and the result of this command run on your central server?

netstat -plantu | grep cbd

Can you also try stopping all your pollers for a minute and make sure there are no resources left in Resources Status page?


I have the same issue. It’s not a ghost poller for my case. I have logged all SQL requests and some requests are missing on the restart to enable some services.


Hi @Snk,

Maybe you have error messages like this one in your `/var/log/centreon-broker/central-broker.log` log file: 

[2022-09-18T06:53:41.362+02:00] [sql] [error] mysql_connection: could not store host status:  Out of range value for column 'notification_number' at row 1

If so, you can fix the issue by running the following queries on your MariaDB server:

ALTER TABLE centreon_storage.hosts MODIFY notification_number  bigint(20) DEFAULT NULL;
ALTER TABLE centreon_storage.services MODIFY notification_number bigint(20) DEFAULT NULL;

It should fix your problem.
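To check whether the broker log contains this kind of error, and for which columns, a small scan can be used. This is a minimal sketch: it matches only the "Out of range value for column" wording quoted above, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the MariaDB error text quoted in the log line above and captures
# the column name between the single quotes.
OUT_OF_RANGE = re.compile(r"Out of range value for column '([^']+)'")


def out_of_range_columns(lines: Iterable[str]) -> Counter:
    """Count 'Out of range value' errors per column name."""
    counts: Counter = Counter()
    for line in lines:
        match = OUT_OF_RANGE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It can be fed an open file object, for example `out_of_range_columns(open("/var/log/centreon-broker/central-broker.log"))`. An empty result means this particular error is not present in that log file.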

 

EDIT: seems to be the same as 

 


Thanks @omercier but I’ve already tried this fix on a few columns :

ALTER TABLE centreon_storage.hosts MODIFY check_attempt integer;
ALTER TABLE centreon_storage.hosts MODIFY max_check_attempts integer;
ALTER TABLE centreon_storage.hosts MODIFY notification_number integer;
ALTER TABLE centreon_storage.services MODIFY check_attempt integer;
ALTER TABLE centreon_storage.services MODIFY max_check_attempts integer;
ALTER TABLE centreon_storage.services MODIFY notification_number integer;

It didn’t help.

I’ve looked for a ghost poller and found nothing. However, the issue hasn’t happened again since, so I can’t tell whether it is related to that kind of root cause.

 

I keep monitoring the number of displayed services to see if some of them disappear again but so far so good, I don’t know why :/


Ok. Tell us if it happens again.

FYI: two fixes are in progress, cf here.


I’ve checked this morning: about 4% of our checked services are missing again, all of them from the same Poller (out of 8).

Some of them don’t even have any perfdata so it cannot be a parsing issue.

Of course there isn’t any ghost poller on the given poller.


I’ve tried to look into central-master-broker.log for these services but found no error of any kind. The issue doesn’t affect 100% of the services on most of the hosts involved: some have a few services missing, some almost all of them. It looks really random :/

