All pollers down and all service statuses Unknown periodically after backup

  • 16 January 2022

Prerequisites

Versions

$ rpm -qa | grep centreon | egrep -v "(plugin|pack)" | sort
centreon-21.10.1-1.el7.centos.noarch
centreon-auto-discovery-server-21.10.1-1.el7.centos.noarch
centreon-base-config-centreon-engine-21.10.1-1.el7.centos.noarch
centreon-broker-21.10.0-6.el7.x86_64
centreon-broker-cbd-21.10.0-6.el7.x86_64
centreon-broker-cbmod-21.10.0-6.el7.x86_64
centreon-broker-core-21.10.0-6.el7.x86_64
centreon-broker-storage-21.10.0-6.el7.x86_64
centreon-clib-21.10.0-6.el7.x86_64
centreon-common-21.10.1-1.el7.centos.noarch
centreon-connector-21.10.0-6.el7.x86_64
centreon-connector-perl-21.10.0-6.el7.x86_64
centreon-connector-ssh-21.10.0-6.el7.x86_64
centreon-database-21.10.1-1.el7.centos.noarch
centreon-engine-21.10.0-6.el7.x86_64
centreon-engine-daemon-21.10.0-6.el7.x86_64
centreon-engine-extcommands-21.10.0-6.el7.x86_64
centreon-gorgone-21.10.0-3.el7.centos.noarch
centreon-gorgone-centreon-config-21.10.0-3.el7.centos.noarch
centreon-license-manager-21.10.0-1.el7.centos.noarch
centreon-license-manager-common-21.10.0-1.el7.centos.noarch
centreon-perl-libs-21.10.1-1.el7.centos.noarch
centreon-poller-centreon-engine-21.10.1-1.el7.centos.noarch
centreon-pp-manager-21.10.0-2.el7.centos.noarch
centreon-release-21.10-4.el7.centos.noarch
centreon-trap-21.10.1-1.el7.centos.noarch
centreon-web-21.10.1-1.el7.centos.noarch
centreon-widget-engine-status-21.10.0-2.el7.centos.noarch
centreon-widget-global-health-21.10.0-2.el7.centos.noarch
centreon-widget-graph-monitoring-21.10.0-2.el7.centos.noarch
centreon-widget-grid-map-21.10.0-2.el7.centos.noarch
centreon-widget-hostgroup-monitoring-21.10.0-2.el7.centos.noarch
centreon-widget-host-monitoring-21.10.0-2.el7.centos.noarch
centreon-widget-httploader-21.10.0-2.el7.centos.noarch
centreon-widget-live-top10-cpu-usage-21.10.0-2.el7.centos.noarch
centreon-widget-live-top10-memory-usage-21.10.0-2.el7.centos.noarch
centreon-widget-servicegroup-monitoring-21.10.0-2.el7.centos.noarch
centreon-widget-service-monitoring-21.10.0-2.el7.centos.noarch
centreon-widget-tactical-overview-21.10.0-2.el7.centos.noarch

Operating System

$ cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)

Browser used

  •  Google Chrome

Description

Every few days, our Centreon platform appears to freeze.
All pollers are shown in red, and every host and service is in Unknown status.
Performance data is still collected correctly, however; only host and service statuses stop being updated.

It seems to happen after some runs of the Centreon backup process (which runs at 03:30).

Each time, we have to restart the cbd process manually to get everything back to normal.

$ systemctl restart cbd 
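
Until the bug is fixed, the manual restart could in principle be automated. The following is only a minimal sketch, not a tested fix. It assumes that the `conflict_manager: events loop interrupted` line found in the broker log on a failing night (see the Logs section) is a reliable sign of the stall, which the excerpts suggest but do not prove. The function name `restart_cbd_if_stalled` and the state file are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical watchdog sketch (assumption: "events loop interrupted" marks the stall).
# Usage: restart_cbd_if_stalled LOG_FILE STATE_FILE COMMAND [ARGS...]

restart_cbd_if_stalled() {
    local log="$1" state="$2" last
    shift 2

    # Most recent occurrence of the suspected stall signature, if any.
    last=$(grep -F 'conflict_manager: events loop interrupted' "$log" 2>/dev/null | tail -n 1)

    # Signature never seen: nothing to do.
    [ -n "$last" ] || return 0

    # Same occurrence already handled on a previous run: do not restart again.
    if [ -f "$state" ] && [ "$last" = "$(cat "$state")" ]; then
        return 0
    fi

    # Remember this occurrence, then run the restart command passed by the caller.
    printf '%s\n' "$last" > "$state"
    "$@"
}
```

It could be called from cron a few minutes after the backup, for example as `restart_cbd_if_stalled /var/log/centreon-broker/central-broker-master.log /var/tmp/cbd-watchdog.last systemctl restart cbd`. The state file prevents the same log line from triggering a restart on every run.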

Logs

centreon-engine logs

=> OK: events are still reaching the Central server.

$ tail -f /var/log/centreon-engine/centengine.log
[1641292136] [29118] SERVICE ALERT: IIS-FRAIS-3;ApplicationPool-VITACENTER-LIGHT-EIR;OK;SOFT;2;OK: Application pool 'VITACENTER-LIGHT-EIR' status: started [auto start: true], requests: 4.89/s
[1641292166] [29118] SERVICE ALERT: MSSQL-1;Cpu;OK;SOFT;2;OK: 10 CPU(s) average usage is 72.00 %
[1641292196] [29118] SERVICE ALERT: IIS-FRAIS-3;ApplicationPool-VITACENTER-LIGHT-EIR;OK;HARD;1;OK: Application pool 'VITACENTER-LIGHT-EIR' status: started [auto start: true], requests: 5.77/s
[1641292226] [29118] SERVICE ALERT: MSSQL-1;Cpu;OK;HARD;1;OK: 10 CPU(s) average usage is 71.40 %
[1641292256] [29118] SERVICE ALERT: Agences;fw_Paris;OK;HARD;1;OK - 31.32.32.29 rta 8.865ms lost 0%

centreon-broker logs

This is where the issue shows up. It seems to occur after the backup process.
When we have the problem, the log looks like this:

$ tail -f /var/log/centreon-broker/central-broker-master.log
[2022-01-02T03:30:02.329+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-02T03:30:02.329+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-02T03:30:02.331+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.331+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-02T03:30:02.831+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-02T03:30:02.831+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-02T03:30:03.332+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-02T03:30:03.332+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-02T03:30:05.356+01:00] [sql] [error] conflict_manager: error in the main loop: statement -1320387967 not prepared
[2022-01-03T03:30:03.005+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-03T03:30:03.005+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-03T03:30:03.005+01:00] [sql] [error] mysql_connection: could not insert data in data_bin: MySQL server has gone away
[2022-01-03T03:30:03.005+01:00] [sql] [error] mysql_connection: could not update metrics: MySQL server has gone away
[2022-01-03T03:30:03.505+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-03T03:30:03.505+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-03T03:30:03.525+01:00] [sql] [error] conflict_manager: error in the main loop: could not insert data in data_bin: MySQL server has gone away
[2022-01-03T03:30:04.077+01:00] [core] [error] failover: global error: conflict_manager: events loop interrupted
[2022-01-03T03:30:04.077+01:00] [core] [info] sql stream stopped with 0 ackowledged events
[2022-01-03T03:30:04.077+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-03T03:30:04.077+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-03T03:30:04.077+01:00] [core] [error] failover: global error: conflict_manager: events loop interrupted
[2022-01-03T03:30:04.080+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-03T03:30:04.080+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-03T03:30:04.080+01:00] [core] [info] storage stream stopped with 0 acknowledged events

When we do NOT have the problem, the log looks like this:

$ tail -f /var/log/centreon-broker/central-broker-master.log
[2022-01-04T03:30:02.475+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-04T03:30:02.475+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-04T03:30:02.975+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-04T03:30:02.975+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-04T03:30:02.976+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:02.976+01:00] [sql] [error] mysql_connection: no statement to execute
[2022-01-04T03:30:03.477+01:00] [sql] [error] mysql_connection: The mysql/mariadb database seems not started.
[2022-01-04T03:30:03.477+01:00] [sql] [error] SQL: Reconnection failed.
[2022-01-04T03:30:06.484+01:00] [sql] [error] conflict_manager: error in the main loop: statement -1320387967 not prepared
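
To compare a failing night with a healthy one without reading every line, the timestamps can be stripped and the remaining messages compared. This is a rough sketch assuming the log format shown above; the function names `broker_messages_for_day` and `messages_only_on` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers to diff broker messages between two days.

# Print the distinct messages logged on DAY (YYYY-MM-DD), without timestamps.
broker_messages_for_day() {
    local log="$1" day="$2"
    grep -F "[${day}T" "$log" | sed -E 's/^\[[^]]*\] //' | sort -u
}

# Print the messages that appear on BAD_DAY but not on GOOD_DAY.
messages_only_on() {
    local log="$1" bad_day="$2" good_day="$3" good
    good=$(mktemp)
    broker_messages_for_day "$log" "$good_day" > "$good"
    # -v -x -F -f: drop every line that exactly matches a message from the good day.
    broker_messages_for_day "$log" "$bad_day" | grep -vxF -f "$good"
    rm -f "$good"
}
```

For the excerpts in this post, `messages_only_on /var/log/centreon-broker/central-broker-master.log 2022-01-03 2022-01-04` would list the messages specific to 3 January, such as the `events loop interrupted` error.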

centreon gorgone logs

=> Looks OK.

$ tail -f /var/log/centreon-gorgone/gorgoned.log
2022-01-04 11:35:00 - INFO - [proxy] Received setlogs for '2'
2022-01-04 11:35:00 - INFO - [proxy] Received setlogs for '2'
2022-01-04 11:35:03 - INFO - [autodiscovery] -class- host discovery - sync started
2022-01-04 11:35:06 - INFO - [proxy] Received setlogs for '2'
2022-01-04 11:35:16 - INFO - [proxy] Pong received from '2'
2022-01-04 11:35:16 - INFO - [proxy] Pong received from '3'
2022-01-04 11:35:56 - INFO - [proxy] Received setlogs for '2'
2022-01-04 11:36:16 - INFO - [proxy] Pong received from '2'
2022-01-04 11:36:16 - INFO - [proxy] Pong received from '3'

Centreon backup logs

$ tail -f /var/log/centreon/centreon-backup.log
[2022-01-02 03:30:01] Start central backup processus
[2022-01-02 03:30:01] Finish central backup processus
[2022-01-02 03:30:01] Start monitoring engine backup processus
No SSH keys for Centreon
cp: cannot stat ‘/var/lib/centreon-engine//.ssh/*’: No such file or directory
[2022-01-02 03:30:01] Finish monitoring engine backup processus
[2022-01-02 03:30:01] Start database backup processus
Dumping Db with LVM snapshot (full)
[2022-01-02 03:34:39] Finish database backup processus
Delete file: 2021-12-23-centreon-engine.tar.gz
Delete file: 2021-12-23-central.tar.gz
Delete file: 2021-12-23-mysql-partial.tar.gz
[2022-01-03 03:30:01] Start central backup processus
[2022-01-03 03:30:01] Finish central backup processus
[2022-01-03 03:30:01] Start monitoring engine backup processus
No SSH keys for Centreon
cp: cannot stat ‘/var/lib/centreon-engine//.ssh/*’: No such file or directory
[2022-01-03 03:30:02] Finish monitoring engine backup processus
[2022-01-03 03:30:02] Start database backup processus
Dumping Db with LVM snapshot (partial)
[2022-01-03 03:30:42] Finish database backup processus
Delete file: 2021-12-24-centreon-engine.tar.gz
Delete file: 2021-12-24-mysql-partial.tar.gz
Delete file: 2021-12-24-central.tar.gz
[2022-01-04 03:30:01] Start central backup processus
[2022-01-04 03:30:02] Finish central backup processus
[2022-01-04 03:30:02] Start monitoring engine backup processus
No SSH keys for Centreon
cp: cannot stat ‘/var/lib/centreon-engine//.ssh/*’: No such file or directory
[2022-01-04 03:30:02] Finish monitoring engine backup processus
[2022-01-04 03:30:02] Start database backup processus
Dumping Db with LVM snapshot (partial)
[2022-01-04 03:30:47] Finish database backup processus
Delete file: 2021-12-25-centreon-engine.tar.gz
Delete file: 2021-12-25-mysql-partial.tar.gz
Delete file: 2021-12-25-central.tar.gz
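
To check whether the stall relates to how long the database backup takes, the duration of each database backup can be computed from the Start/Finish lines. This is a small sketch assuming the log format above; `db_backup_durations` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<date> <seconds>s" for each database backup
# found in a centreon-backup.log file.

db_backup_durations() {
    awk '
        # Convert "HH:MM:SS" to seconds since midnight.
        function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }

        /Start database backup/ {
            day   = substr($1, 2)            # "[2022-01-02" -> "2022-01-02"
            start = secs(substr($2, 1, 8))   # "03:30:01]"   -> "03:30:01"
        }
        /Finish database backup/ {
            d = secs(substr($2, 1, 8)) - start
            if (d < 0) d += 86400            # the backup crossed midnight
            print day, d "s"
        }
    ' "$1"
}
```

Run as `db_backup_durations /var/log/centreon/centreon-backup.log`. On the excerpt above, the full dump of 2 January takes 278 seconds, while the partial dumps of 3 and 4 January take 40 and 45 seconds.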

Additional relevant information (e.g. frequency, ...)

It occurs once every 5 to 10 days.


2 replies

Hello @tlier,

I saw that you've opened a GitHub issue, thank you for that.

@Laurent has asked you for some clarifications on the GitHub issue. It's a known issue, and our developers will work on it very soon.

We will inform you as soon as we have more information about this bug.


Associated GitHub issue.
