Question

Centreon broker - Monitoring can't start

  • 9 March 2022

Hello,

I need some help with my Centreon monitoring, which stopped working after the disk filled up on the central poller.

After this crash, I found that the poller server was running with its date set to 2043 instead of 2022.
I have corrected the date and synced both servers with an NTP server.

But now, when I restart the poller (systemctl restart centreon.service), I get these errors in the log (/var/log/centreon-broker/central-broker-master.log):

[1646820491] error:   failover: 'connection-to-CENTREON.POLLER' cannot connect endpoint.
Centreon Broker 20.04.17 log file closed
Centreon Broker 20.04.17 log file opened
[1646820551] error:   conflict_manager: error in the main loop: could not prepare update query for event 'host_check' on table 'hosts': could not update poller:  Out of range value for column 'last_alive' at row 1
[1646820552] error:   failover: 'connection-to-CENTREON.POLLER' cannot connect endpoint.
[1646820555] error:   failover: 'connection-to-CENTREON.POLLER' cannot connect endpoint.

Also, the centreon service is reported as active (exited):

systemctl status centreon
● centreon.service - One Service to rule them all.
   Loaded: loaded (/usr/lib/systemd/system/centreon.service; enabled; vendor preset: disabled)
   Active: active (exited) since mer. 2022-03-09 11:09:09 CET; 1min 49s ago
  Process: 15694 ExecReload=/bin/true (code=exited, status=0/SUCCESS)
  Process: 31547 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 31547 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/centreon.service

mars 09 11:09:09 srv-centreon systemd[1]: Stopping One Service to rule them all....
mars 09 11:09:09 srv-centreon systemd[1]: Starting One Service to rule them all....
mars 09 11:09:09 srv-centreon systemd[1]: Started One Service to rule them all..

How can I fix this?

Thanks for your help


9 replies

Userlevel 5
Badge +16

Hello @GSA69006 

The errors seem to come from the database, with an incorrect size for the last_alive column.

Could you send us the result of this command:

mysql -u centreon -p centreon_storage -e "DESC instances"

You can find the password of the centreon database user in /etc/centreon/centreon.conf.php

Badge +2

Hello @Kriko,

Thank you for your answer.

Here is the result of the command:

mysql -u centreon -p centreon_storage -e "DESC instances"
Enter password:
+------------------------------+--------------+------+-----+-----------+-------+
| Field                        | Type         | Null | Key | Default   | Extra |
+------------------------------+--------------+------+-----+-----------+-------+
| instance_id                  | int(11)      | NO   | PRI | NULL      |       |
| name                         | varchar(255) | NO   |     | localhost |       |
| active_host_checks           | tinyint(1)   | YES  |     | NULL      |       |
| active_service_checks        | tinyint(1)   | YES  |     | NULL      |       |
| address                      | varchar(128) | YES  |     | NULL      |       |
| check_hosts_freshness        | tinyint(1)   | YES  |     | NULL      |       |
| check_services_freshness     | tinyint(1)   | YES  |     | NULL      |       |
| daemon_mode                  | tinyint(1)   | YES  |     | NULL      |       |
| description                  | varchar(128) | YES  |     | NULL      |       |
| end_time                     | int(11)      | YES  |     | NULL      |       |
| engine                       | varchar(64)  | YES  |     | NULL      |       |
| event_handlers               | tinyint(1)   | YES  |     | NULL      |       |
| failure_prediction           | tinyint(1)   | YES  |     | NULL      |       |
| flap_detection               | tinyint(1)   | YES  |     | NULL      |       |
| global_host_event_handler    | text         | YES  |     | NULL      |       |
| global_service_event_handler | text         | YES  |     | NULL      |       |
| last_alive                   | int(11)      | YES  |     | NULL      |       |
| last_command_check           | int(11)      | YES  |     | NULL      |       |
| last_log_rotation            | int(11)      | YES  |     | NULL      |       |
| modified_host_attributes     | int(11)      | YES  |     | NULL      |       |
| modified_service_attributes  | int(11)      | YES  |     | NULL      |       |
| notifications                | tinyint(1)   | YES  |     | NULL      |       |
| obsess_over_hosts            | tinyint(1)   | YES  |     | NULL      |       |
| obsess_over_services         | tinyint(1)   | YES  |     | NULL      |       |
| passive_host_checks          | tinyint(1)   | YES  |     | NULL      |       |
| passive_service_checks       | tinyint(1)   | YES  |     | NULL      |       |
| pid                          | int(11)      | YES  |     | NULL      |       |
| process_perfdata             | tinyint(1)   | YES  |     | NULL      |       |
| running                      | tinyint(1)   | YES  |     | NULL      |       |
| start_time                   | int(11)      | YES  |     | NULL      |       |
| version                      | varchar(16)  | YES  |     | NULL      |       |
| deleted                      | tinyint(1)   | NO   |     | 0         |       |
| outdated                     | tinyint(1)   | NO   |     | 0         |       |
+------------------------------+--------------+------+-----+-----------+-------+

Regards

Badge +2

The values in the last_alive column are correct:

Central = 1646726855 (Tuesday 8 March 2022 08:07:35 GMT)
Poller = 1646663218 (Monday 7 March 2022 14:26:58 GMT)

There is no trace of a date in 2043 that could explain the problem.
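
A note on why a clock set to 2043 matters here, offered as a sketch rather than a confirmed diagnosis: in MySQL and MariaDB, int(11) is a signed 32-bit integer whose maximum is 2147483647 (19 January 2038 03:14:07 UTC), so a Unix timestamp taken in 2043 cannot be stored in such a column. Events produced while the poller clock was wrong could still be waiting to be written even though the rows already in the table look fine. The helper names below (fits_in_int, to_utc) are hypothetical, and GNU date is assumed.

```shell
# Upper bound of a signed 32-bit INT column in MySQL/MariaDB.
INT_MAX=2147483647

# fits_in_int EPOCH: succeed if EPOCH can be stored in a signed INT column.
fits_in_int() {
  [ "$1" -le "$INT_MAX" ]
}

# to_utc EPOCH: print EPOCH as a UTC date (assumes GNU date).
to_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

to_utc "$INT_MAX"    # prints 2038-01-19 03:14:07
to_utc 1646726855    # last_alive of the central: 2022-03-08 08:07:35

# Any timestamp taken in 2043 is past the limit.
ts_2043=$(date -u -d '2043-01-01 00:00:00' +%s)
if fits_in_int "$ts_2043"; then
  echo "$ts_2043 fits"
else
  echo "$ts_2043 is out of range for int(11)"
fi
```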

Userlevel 5
Badge +16

I must be blind or tired ><
Wrong table, my bad.

Could you send the output of this command:

 mysql -u centreon -p centreon_storage -e "desc hosts"

And could you tell me whether you have any .queue. files in /var/lib/centreon-broker?
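
A minimal sketch for checking this, assuming the leftover files contain `.queue.` somewhere in their name (the exact naming depends on the broker configuration). The function name list_queue_files is hypothetical.

```shell
# list_queue_files DIR: print regular files directly in DIR
# whose name contains ".queue." (sorted, one per line).
list_queue_files() {
  find "$1" -maxdepth 1 -type f -name '*.queue.*' 2>/dev/null | sort
}

list_queue_files /var/lib/centreon-broker
```

An empty result means no such file was found (or the directory could not be read).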

 

Badge +2

Here’s the result of the command:

mysql -u centreon -p centreon_storage -e "desc hosts"
Enter password:
+-------------------------------+--------------+------+-----+---------+-------+
| Field                         | Type         | Null | Key | Default | Extra |
+-------------------------------+--------------+------+-----+---------+-------+
| host_id                       | int(11)      | NO   | PRI | NULL    |       |
| name                          | varchar(255) | NO   | MUL | NULL    |       |
| instance_id                   | int(11)      | NO   | MUL | NULL    |       |
| acknowledged                  | tinyint(1)   | YES  |     | NULL    |       |
| acknowledgement_type          | smallint(6)  | YES  |     | NULL    |       |
| action_url                    | varchar(255) | YES  |     | NULL    |       |
| active_checks                 | tinyint(1)   | YES  |     | NULL    |       |
| address                       | varchar(75)  | YES  |     | NULL    |       |
| alias                         | varchar(100) | YES  |     | NULL    |       |
| check_attempt                 | smallint(6)  | YES  |     | NULL    |       |
| check_command                 | text         | YES  |     | NULL    |       |
| check_freshness               | tinyint(1)   | YES  |     | NULL    |       |
| check_interval                | double       | YES  |     | NULL    |       |
| check_period                  | varchar(75)  | YES  |     | NULL    |       |
| check_type                    | smallint(6)  | YES  |     | NULL    |       |
| checked                       | tinyint(1)   | YES  |     | NULL    |       |
| command_line                  | text         | YES  |     | NULL    |       |
| default_active_checks         | tinyint(1)   | YES  |     | NULL    |       |
| default_event_handler_enabled | tinyint(1)   | YES  |     | NULL    |       |
| default_failure_prediction    | tinyint(1)   | YES  |     | NULL    |       |
| default_flap_detection        | tinyint(1)   | YES  |     | NULL    |       |
| default_notify                | tinyint(1)   | YES  |     | NULL    |       |
| default_passive_checks        | tinyint(1)   | YES  |     | NULL    |       |
| default_process_perfdata      | tinyint(1)   | YES  |     | NULL    |       |
| display_name                  | varchar(100) | YES  |     | NULL    |       |
| enabled                       | tinyint(1)   | NO   |     | 1       |       |
| event_handler                 | varchar(255) | YES  |     | NULL    |       |
| event_handler_enabled         | tinyint(1)   | YES  |     | NULL    |       |
| execution_time                | double       | YES  |     | NULL    |       |
| failure_prediction            | tinyint(1)   | YES  |     | NULL    |       |
| first_notification_delay      | double       | YES  |     | NULL    |       |
| flap_detection                | tinyint(1)   | YES  |     | NULL    |       |
| flap_detection_on_down        | tinyint(1)   | YES  |     | NULL    |       |
| flap_detection_on_unreachable | tinyint(1)   | YES  |     | NULL    |       |
| flap_detection_on_up          | tinyint(1)   | YES  |     | NULL    |       |
| flapping                      | tinyint(1)   | YES  |     | NULL    |       |
| freshness_threshold           | double       | YES  |     | NULL    |       |
| high_flap_threshold           | double       | YES  |     | NULL    |       |
| icon_image                    | varchar(255) | YES  |     | NULL    |       |
| icon_image_alt                | varchar(255) | YES  |     | NULL    |       |
| last_check                    | int(11)      | YES  |     | NULL    |       |
| last_hard_state               | smallint(6)  | YES  |     | NULL    |       |
| last_hard_state_change        | int(11)      | YES  |     | NULL    |       |
| last_notification             | int(11)      | YES  |     | NULL    |       |
| last_state_change             | int(11)      | YES  |     | NULL    |       |
| last_time_down                | int(11)      | YES  |     | NULL    |       |
| last_time_unreachable         | int(11)      | YES  |     | NULL    |       |
| last_time_up                  | int(11)      | YES  |     | NULL    |       |
| last_update                   | int(11)      | YES  |     | NULL    |       |
| latency                       | double       | YES  |     | NULL    |       |
| low_flap_threshold            | double       | YES  |     | NULL    |       |
| max_check_attempts            | smallint(6)  | YES  |     | NULL    |       |
| modified_attributes           | int(11)      | YES  |     | NULL    |       |
| next_check                    | int(11)      | YES  |     | NULL    |       |
| next_host_notification        | int(11)      | YES  |     | NULL    |       |
| no_more_notifications         | tinyint(1)   | YES  |     | NULL    |       |
| notes                         | varchar(255) | YES  |     | NULL    |       |
| notes_url                     | varchar(255) | YES  |     | NULL    |       |
| notification_interval         | double       | YES  |     | NULL    |       |
| notification_number           | smallint(6)  | YES  |     | NULL    |       |
| notification_period           | varchar(75)  | YES  |     | NULL    |       |
| notify                        | tinyint(1)   | YES  |     | NULL    |       |
| notify_on_down                | tinyint(1)   | YES  |     | NULL    |       |
| notify_on_downtime            | tinyint(1)   | YES  |     | NULL    |       |
| notify_on_flapping            | tinyint(1)   | YES  |     | NULL    |       |
| notify_on_recovery            | tinyint(1)   | YES  |     | NULL    |       |
| notify_on_unreachable         | tinyint(1)   | YES  |     | NULL    |       |
| obsess_over_host              | tinyint(1)   | YES  |     | NULL    |       |
| output                        | text         | YES  |     | NULL    |       |
| passive_checks                | tinyint(1)   | YES  |     | NULL    |       |
| percent_state_change          | double       | YES  |     | NULL    |       |
| perfdata                      | text         | YES  |     | NULL    |       |
| process_perfdata              | tinyint(1)   | YES  |     | NULL    |       |
| retain_nonstatus_information  | tinyint(1)   | YES  |     | NULL    |       |
| retain_status_information     | tinyint(1)   | YES  |     | NULL    |       |
| retry_interval                | double       | YES  |     | NULL    |       |
| scheduled_downtime_depth      | smallint(6)  | YES  |     | NULL    |       |
| should_be_scheduled           | tinyint(1)   | YES  |     | NULL    |       |
| stalk_on_down                 | tinyint(1)   | YES  |     | NULL    |       |
| stalk_on_unreachable          | tinyint(1)   | YES  |     | NULL    |       |
| stalk_on_up                   | tinyint(1)   | YES  |     | NULL    |       |
| state                         | smallint(6)  | YES  |     | NULL    |       |
| state_type                    | smallint(6)  | YES  |     | NULL    |       |
| statusmap_image               | varchar(255) | YES  |     | NULL    |       |
| timezone                      | varchar(64)  | YES  |     | NULL    |       |
| real_state                    | smallint(6)  | YES  |     | NULL    |       |
+-------------------------------+--------------+------+-----+---------+-------+

But there is no last_alive column in this table.

Regards

Userlevel 5
Badge +16

That’s what I thought when I checked on my platform.

*mumble*

Could you stop cbd with the `systemctl stop cbd` command, then remove all the files in /var/lib/centreon-broker (with rm -rf /var/lib/centreon-broker/*), and start it again?
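
The same steps can be sketched as a small script. As an assumption on my side, it moves the files to a backup directory instead of deleting them, so they can be put back if needed; the backup path and the function names (move_contents, reset_broker_files) are hypothetical, and GNU find and mv are assumed. It is meant to be run as root on the central server.

```shell
# move_contents SRC DEST: move every entry of SRC (hidden files included)
# into DEST, which is created if missing.
move_contents() {
  mkdir -p "$2" || return 1
  find "$1" -mindepth 1 -maxdepth 1 -exec mv -t "$2" {} +
}

# reset_broker_files: stop cbd, set its files aside, then start it again.
reset_broker_files() {
  # Hypothetical backup location; pick one with enough free space.
  backup="/var/tmp/centreon-broker-backup.$(date +%Y%m%d%H%M%S)"
  systemctl stop cbd &&
    move_contents /var/lib/centreon-broker "$backup" &&
    systemctl start cbd
}

# Not called automatically; run it explicitly once the paths are checked:
# reset_broker_files
```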

Badge +2

Done, but the centreon service is still reported as exited after the restart.

The content of central-broker-master.log has changed:

Centreon Broker 20.04.17 log file closed
Centreon Broker 20.04.17 log file opened
[1646919200] error:   conflict_manager: error in the main loop: could not store host status:  Out of range value for column 'last_time_up' at row 1
[1646919201] error:   conflict_manager: error in the main loop: could not store host status:  Out of range value for column 'last_time_down' at row 1
[1646919201] error:   failover: 'connection-to-CENTREON.POLLER' cannot connect endpoint.

 

Regards

Badge +2

Here are the maximum values in the last_time_up and last_time_down columns:

MAX(last_time_down) = 1646920764 (Thursday 10 March 2022 13:59:24 GMT)

MAX(last_time_up) = 1646920761 (Thursday 10 March 2022 13:59:21 GMT)
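
These maxima only show what is already stored. A value that the database refuses as out of range never reaches the table, so the offending timestamps would be in the incoming events rather than in hosts. For reference, a sketch of how such a query might be built from the shell; the function name max_query is hypothetical, and the table and column names are those from the `desc hosts` output.

```shell
# max_query TABLE COLUMN...: print a SELECT returning the maximum of each column.
max_query() {
  table=$1
  shift
  cols=""
  for c in "$@"; do
    # Append "MAX(col)", separated by ", " after the first one.
    cols="${cols:+$cols, }MAX($c)"
  done
  printf 'SELECT %s FROM %s;\n' "$cols" "$table"
}

max_query hosts last_time_up last_time_down
```

The printed statement can be passed to the client used earlier in the thread, for example `mysql -u centreon -p centreon_storage -e "$(max_query hosts last_time_up last_time_down)"`.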

Regards

Userlevel 5
Badge +16

It’s going to be a little overkill, but could you do the following, in this order:

  • stop cbd on the central,
  • stop centengine on the central and on the poller,
  • remove all the files under /var/lib/centreon-broker/ (on all your servers),
  • delete all the retention.dat and status.dat files in /var/log/centreon-engine/,
  • remove all the .queue. files under /var/lib/centreon-engine/,
  • and restart cbd and centengine.
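
The sequence above can be sketched as a dry-run script that only prints the commands for each server role, so they can be reviewed before being run by hand. The function name cleanup_plan and its role argument are hypothetical; the paths are those given in the list, and the `'*.queue.*'` pattern is my reading of ".queue. files".

```shell
# cleanup_plan ROLE: print, in order, the commands for ROLE (central or poller).
# Nothing is executed; the output is meant to be reviewed first.
cleanup_plan() {
  role=$1
  # cbd is stopped first, and only on the central.
  if [ "$role" = central ]; then
    echo "systemctl stop cbd"
  fi
  echo "systemctl stop centengine"
  echo "rm -rf /var/lib/centreon-broker/*"
  echo "rm -f /var/log/centreon-engine/retention.dat /var/log/centreon-engine/status.dat"
  echo "find /var/lib/centreon-engine -type f -name '*.queue.*' -delete"
  if [ "$role" = central ]; then
    echo "systemctl start cbd"
  fi
  echo "systemctl start centengine"
}

cleanup_plan central
cleanup_plan poller
```

Since the list asks for the services to be stopped on every server before anything is removed, the stop commands of both plans should be run first, then the removals, then the starts.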

 
