
Hello,

For some time now, the platform's status graphs page has only been working for certain periods.


Has anyone already run into a similar problem, and perhaps found a solution?

Hello,

Yes, it happens a lot on my platform, mostly when I have to restart a service such as gorgoned or cbd on the central (for example, when a poller's DNS name or IP address changes, these services need a restart).

A reboot of the central usually recovers that function, but I never found a way to check this.

A reboot is usually transparent since it only takes a few seconds, so I never bothered to open an incident.
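For the "I never found a way to check this" part, here is a minimal sketch of one possible check. It assumes the graphs are fed from RRD files (status RRDs typically live under something like /var/lib/centreon/status and metric RRDs under /var/lib/centreon/metrics, but verify the paths on your own install) and that an RRD file not modified for a while means its graph has stopped updating. The function name stale_rrd_files and the 15-minute default are my own choices, not anything provided by Centreon.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Centreon): list RRD files that have not
# been updated recently.
# ASSUMPTION: a graph that stopped updating shows up as an RRD file whose
# modification time is older than the threshold.

# stale_rrd_files DIR [MINUTES]
# Prints every *.rrd file under DIR last modified more than MINUTES ago
# (default: 15 minutes). Returns 2 if DIR is not a directory.
stale_rrd_files() {
  local dir="$1"
  local minutes="${2:-15}"
  if [ ! -d "$dir" ]; then
    echo "stale_rrd_files: '$dir' is not a directory" >&2
    return 2
  fi
  # -mmin +N matches files modified more than N minutes ago
  find "$dir" -type f -name '*.rrd' -mmin +"$minutes"
}
```

Usage: `stale_rrd_files /var/lib/centreon/status 15 | wc -l`. Running it before and after restarting the services should show whether the graphs have resumed. If you prefer checking an individual file, `rrdtool last <file>` prints the timestamp of its last update.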


>A reboot of the central usually recovers that function, but I never found a way to check this.

That could explain why it starts working again after an upgrade, which involves a reboot. Sadly, this is not always the case.

>A reboot is usually transparent since it only takes a few seconds, so I never bothered to open an incident.

Well, in my case I reboot the platform's hosts (VMs) only when upgrading (system and Centreon packages), for two reasons. First, I must take a cold snapshot of our central and of the database so that we have a rollback procedure in case of a serious problem with the new version or during the upgrade. Second, kernel upgrades require a reboot anyway.

We have six external pollers and about 60,000 services, and although the VMs restart quite quickly, the Centreon platform needs about twenty minutes to get back to a nominal state. We even avoid "restarting" the engine(s) as much as possible and "reload" them most of the time instead; otherwise some downtimes may be missed and alerts raised when they shouldn't be.

Just to be sure: when you write "recovers that function", do we agree that it starts producing graphs again, but past data is not recovered?


>Just to be sure: when you write "recovers that function", do we agree that it starts producing graphs again, but past data is not recovered?

=> Yes, I'm only talking about the graphs being up to date. As far as I know, this data is only stored in the RRD files on the central, not in the SQL database (I may be wrong).


If you can't reboot the central outside of upgrades, I understand your issue, but restarting all the Centreon services may solve it. I haven't tried it yet.

I *think* this list of services should be enough:

systemctl restart centengine cbd gorgoned 

(This is the command from the upgrade documentation; it is usually enough to fix any issue.)
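If you do try the restart, a small wrapper like the sketch below (the function name restart_and_check is my own, hypothetical) restarts the three units from the upgrade documentation and then uses `systemctl is-active` to confirm each one actually came back. It has to run as root on the central, and "active" only means the process is running, not that the graphs are being written again.

```shell
#!/usr/bin/env bash
# Sketch: restart the given systemd units, then verify each one is active.
# Intended for the three Centreon services mentioned in this thread; run as root.

# restart_and_check UNIT...
# Prints "<unit>: active" or "<unit>: NOT active" for each unit and returns
# non-zero if the restart failed or any unit is not active afterwards.
restart_and_check() {
  local failed=0 unit
  # systemctl accepts several units in one restart call
  systemctl restart "$@" || failed=1
  for unit in "$@"; do
    # --quiet suppresses output; only the exit status is used
    if systemctl is-active --quiet "$unit"; then
      echo "$unit: active"
    else
      echo "$unit: NOT active"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage, after sourcing the script in a root shell: `restart_and_check centengine cbd gorgoned`. A non-zero return code means at least one unit did not come back up.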

As I don't host the DB on the central, rebooting is not a real issue on my setup. Nothing is lost, since all the pollers cache the data during the few seconds of interruption, so nothing needs to be synchronized and it doesn't take 20 minutes to get back on track.

Maybe that's because I have fewer services per poller: the load is spread over 40+ pollers, and I have fewer services than you overall. I don't know much more than that.


Anyway, just restarting these three services should reset all the Centreon services that manage communication, data flow, and graph generation.


Thank you for your time.

