This tutorial on troubleshooting poller issues is the first installment of a series devoted to the resolution of common issues reported by clients and managed by our teams.
Symptom
Poller issues generally take two forms:
#1 The poller is stated as “not running” on the poller configuration page:
#2 The last update time is older than 15 minutes and highlighted in yellow:
Solution
Solve this issue in Centreon 21.04.
For the purpose of this tutorial, we will use a simple distributed Centreon platform with the following assets:
- a central server with an embedded DB instance (IP 192.168.56.125)
- a poller (IP 192.168.56.126)
Note that in the following examples, commands will be shown as executed with the root user.
Quick tour: Overview of a Centreon typical architecture
For efficient and targeted troubleshooting, you need to know what the different components of a Centreon platform are, and how they interact with each other. As a quick reminder, here’s a platform example comprising a central server and a poller.
- The poller’s monitoring engine (centreon-engine) executes a bunch of monitoring scripts (probes) to query the monitored devices.
- Results are returned by centreon-engine’s cbmod module to the centreon-broker instance on the central (BBDO protocol, TCP 5669).
- The central server’s centreon-broker module handles and processes the results, populating DB tables or generating graphical RRD files.
- The web interface (Apache server and PHP backend) displays the monitoring information to the user.
- The centreon-gorgone module exports to the pollers the configurations made on the interface and launches external commands; centreon-gorgone is a client/server module, active both on the central server and in the pollers (ZMQ protocol TCP 5556).
Now that we know who does what in a Centreon architecture, let’s get to the troubleshooting.
Check 1: Network connections
The poller must be connected to the TCP port 5669 of the Central server (BBDO flows). Conversely, the central server must be connected to the TCP port 5556 of the poller (ZMQ centreon-gorgone flows).
The netstat
command is helpful to check this, but ss
can also be used.
When executing this command on the poller, we should get the following results:
troot@centreon-poller ~]# netstat -plant | egrep '5556|5669'
tcp 0 0 0.0.0.0:5556 0.0.0.0:* LISTEN 3761/perl
tcp 0 0 192.168.56.126:5556 192.168.56.125:57466 ESTABLISHED 3761/perl
tcp 0 0 192.168.56.126:33598 192.168.56.125:5669 ESTABLISHED 3554/centengine
From the Central, the results will be the same, but in the opposite direction:
oroot@centreon-central ~]# netstat -plant | egrep '5669|5556'
tcp 0 0 0.0.0.0:5669 0.0.0.0:* LISTEN 1436/cbd
tcp 0 0 192.168.56.125:57466 192.168.56.126:5556 ESTABLISHED 1781/gorgone-proxy
tcp 0 0 192.168.56.125:5669 192.168.56.126:33574 ESTABLISHED 1436/cbd
tcp 0 0 127.0.0.1:5669 127.0.0.1:58200 ESTABLISHED 1436/cbd
tcp 0 0 127.0.0.1:58200 127.0.0.1:5669 ESTABLISHED 1430/centengine
Should the status be different from “ESTABLISHED”, checking data flows and firewall logs will usually help locate the problem.
Another common reason of the network connection failure is that the Linux firewall may be running on either the central or the poller.
This can be easily checked by running the following command:
oroot@centreon-central ~]# systemctl status firewalld
firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
If the firewalld process is still active, stop and disable it:
sroot@centreon-central ~]# systemctl stop firewalld
eroot@centreon-central ~]# systemctl disable firewalld
If a configuration error is identified at this stage (for example: an erroneous IP address when creating a Poller), it will be necessary to check and correct the entry at various levels of the Centreon configuration, we will see that in a future article.
Check 2: centreon-engine process
This step aims to check that the monitoring engine is up and running on the poller. First let’s see if the centengine process (daemon of the centreon-engine module) is acting as it should:
troot@centreon-poller ~]# systemctl status centengine
centengine.service - Centreon Engine
Loaded: loaded (/usr/lib/systemd/system/centengine.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2021-02-10 09:56:35 CET; 1h 6min ago
Main PID: 4463 (centengine)
CGroup: /system.slice/centengine.service
└─4463 /usr/sbin/centengine /etc/centreon-engine/centengine.cfg
Current status should be “active (running)”. Ensuring the “enabled” state is active is also important, as it allows the daemon to automatically start with the system, in case of a power outage or an unexpected reboot (did someone just pull the plug?).
If status is “disabled”, activate the service at startup with the following command:
sroot@centreon-poller ~]# systemctl enable centengine
Created symlink from /etc/systemd/system/centreon.service.wants/centengine.service to /usr/lib/systemd/system/centengine.service.
Sometimes the centengine daemon can be reported as dead as in the following example:
aroot@poller ~]# systemctl status centengine
centengine.service - Centreon Engine
Loaded: loaded (/usr/lib/systemd/system/centengine.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Wed 2021-02-10 09:56:35 CET; 22s ago
Process: 22846 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
Process: 4547 ExecStart=/usr/sbin/centengine /etc/centreon-engine/centengine.cfg (code=exited, status=0/SUCCESS)
Main PID: 4547 (code=exited, status=0/SUCCESS)
Just try to force the restart: systemctl restart centengine
and check the logs in /var/log/centreon-engine/centengine.log for clues about how it ended up this way. The most common reasons would be:
- SELinux configuration
- Rights on folders and files
- Missing libraries or dependencies…
Check 3: centreon-gorgone process
As you briefly saw above, the centreon-gorgone module (carried by the gorgoned process) was introduced in Centreon 21.04, replacing the former centcore module.
The centreon-gorgone module provides a client/server connection between central and pollers whereas centcore only knew how to copy files using SSH (pretty much).
A defaulting poller can sometimes be attributed to a fault in the centreon-gorgone module, as it is responsible for collecting the “Is running or not” status of pollers. It may happen, for example, that the monitoring is running normally, returning current and real-time data, but the poller displays as “not running.” Here’s how to run the check in that case.
- First, identify and save the ID of the problematic poller. Go to Configuration > Pollers and hover the mouse on the poller’s name:
On the URL link at the bottom of the page, the related poller’s ID will show.
- On both sides (central and poller), check that the gorgoned daemon is up and running:
lroot@centreon-central ~]# systemctl status gorgoned
gorgoned.service - Centreon Gorgone
Loaded: loaded (/etc/systemd/system/gorgoned.service; enabled; vendor preset: disabled)
Active: active (running) since mer. 2021-02-10 09:56:03 CET; 2h 10min ago
Main PID: 3413 (perl)
CGroup: /system.slice/gorgoned.service
├─3413 /usr/bin/perl /usr/bin/gorgoned --config=/etc/centreon-gorgone/config.yaml --logfile=/var/log/centreon-gorgone/gorgoned.log --severity=info
├─3421 gorgone-nodes
├─3428 gorgone-dbcleaner
├─3435 gorgone-autodiscovery
├─3442 gorgone-cron
├─3443 gorgone-engine
├─3444 gorgone-statistics
├─3445 gorgone-action
├─3446 gorgone-httpserver
├─3447 gorgone-legacycmd
├─3478 gorgone-proxy
├─3479 gorgone-proxy
├─3486 gorgone-proxy
├─3505 gorgone-proxy
└─3506 gorgone-proxy
févr. 10 09:56:03 cent7-2010-ems systemdb1]: Started Centreon Gorgone.rroot@centreon-poller centreon-engine]# systemctl status gorgoned
gorgoned.service - Centreon Gorgone
Loaded: loaded (/etc/systemd/system/gorgoned.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2021-02-10 09:53:16 CET; 2h 13min ago
Main PID: 4384 (perl)
CGroup: /system.slice/gorgoned.service
├─4384 /usr/bin/perl /usr/bin/gorgoned --config=/etc/centreon-gorgone/config.yaml --logfile=/var/log/centreon-gorgone/gorgoned.log --severity=info
├─4398 gorgone-dbcleaner
├─4399 gorgone-engine
└─4400 gorgone-action - On the central server, check that messages related to the ID are present and do not throw error messages:
- On the poller, check that the gorgoned configuration file /etc/centreon-gorgone/config.d/40-gorgoned.yaml exists and is similar to this one :
aroot@centreon-poller centreon-engine]# cat /etc/centreon-gorgone/config.d/40-gorgoned.yaml
name: gorgoned-Poller
description: Configuration for poller Poller
gorgone:
gorgonecore:
id: 2
external_com_type: tcp
external_com_path: "*:5556"
authorized_clients:
- key: 3H2jXp7D7PC7OTM1ifosCO0l7iqJkf60lHWGWYnR5qY
privkey: "/var/lib/centreon-gorgone/.keys/rsakey.priv.pem"
pubkey: "/var/lib/centreon-gorgone/.keys/rsakey.pub.pem"
modules:
- name: action
package: gorgone::modules::core::action::hooks
enable: true
- name: engine
package: gorgone::modules::centreon::engine::hooks
enable: true
command_file: "/var/lib/centreon-engine/rw/centengine.cmd"
If not, check this link to create it again.
Check 4: centreon-broker process
Centreon-broker flows between the central and the poller are essential for a fully operational and efficient Centreon platform.
Beyond network issues (lost connection BBDO TCPTCP/5669, real-time data processing can potentially be interrupted, which will display the poller as “Not updated.” Here are the steps to check the Centreon broker.
- First, check the version consistency between the cbmod module (on the poller) and the centreon-broker module on the central. They should always be the same:
oroot@centreon-central ~]# rpm -qa | grep centreon-broker
centreon-broker-cbd-21.04.3-5.el7.centos.x86_64
centreon-broker-storage-21.04.3-5.el7.centos.x86_64
centreon-broker-21.04.3-5.el7.centos.x86_64
centreon-broker-cbmod-21.04.3-5.el7.centos.x86_64
centreon-broker-core-21.04.3-5.el7.centos.x86_64
rroot@centreon-poller ~]# rpm -qa | grep cbmod
centreon-broker-cbmod-21.04.3-5.el7.centos.x86_64 - On the central server, check that the cbd daemon is up and running:
/root@centreon-central ~]# systemctl status cbd
cbd.service - Centreon Broker watchdog
Loaded: loaded (/usr/lib/systemd/system/cbd.service; enabled; vendor preset: disabled)
Active: active (running) since mer. 2021-02-10 09:41:01 CET; 5h 13min ago
Main PID: 1529 (cbwd)
CGroup: /system.slice/cbd.service
├─1529 /usr/sbin/cbwd /etc/centreon-broker/watchdog.json
├─1537 /usr/sbin/cbd /etc/centreon-broker/central-broker.json
└─1538 /usr/sbin/cbd /etc/centreon-broker/central-rrd.jsonLike the other vital daemons, it should appear as “active (running)” and have its 3 children processes: cbwd, cbd broker et cbd rrd.
- Check the logs: they’re located in the /var/log/centreon-broker directory. On the Central, look for the central-broker-master.log file; on the poller, look for the module-poller.log file:
oroot@centreon-central ~]# ls -ltr /var/log/centreon-broker/
total 8
-rw-rw-r-- 1 centreon-broker centreon-broker 1039 5 févr. 10:43 central-broker-master.log-20210205
-rw-rw-r-- 1 centreon-broker centreon-broker 360 5 févr. 10:43 central-module-master.log-20210205
-rw-rw-r-- 1 centreon-broker centreon-broker 240 5 févr. 10:43 central-rrd-master.log-20210205
-rw-rw-r-- 1 centreon-broker centreon-broker 3610 5 févr. 10:43 watchdog.log-20210205
-rw-rw-r-- 1 centreon-broker centreon-broker 1402 10 févr. 09:41 watchdog.log
-rw-rw-r-- 1 centreon-broker centreon-broker 669 10 févr. 09:41 central-broker-master.log
-rw-rw-r-- 1 centreon-broker centreon-broker 200 10 févr. 09:41 central-rrd-master.log
-rw-rw-r-- 1 centreon-broker centreon-broker 400 10 févr. 10:00 central-module-master.log
9root@centreon-poller ~]# ls -ltr /var/log/centreon-broker/
total 4
-rw-r--r--. 1 centreon-engine centreon-engine 1330 Feb 10 11:12 module-poller.logThese files should not have recent error messages.
In older versions of Centreon we could find these kind of entries:Error : conflict_manager : error in the main loop
If so, simply restart the centreon-broker daemon on the Central server:
froot@centreon-central ~]# systemctl restart cbd
- Last but not least, on the poller, check that the cbmod module is successfully loaded when centreon-engine starts; this part of information should be in the centengine.log file:
root@centreon-poller ~]# grep -i cbmod.so /var/log/centreon-engine/centengine.log
e1612947395] e4463] Event broker module '/usr/lib64/nagios/cbmod.so' initialized successfully
What’s Next?
In this tutorial, we did a tour of the primary checks that should be done whenever a poller seems to be in a funny state. Of course, other causes of issues exist and sometimes the solutions described here wouldn’t do the trick. Hopefully, this will however give you clues on how to start investigating.
Still blocked and running out of coffee? Keep calm and ask questions! There will always be someone keen to give you a hand (if not running out of coffee). Suggestions are also always welcomed!