
Hello,

 

We are having an issue with one of our pollers. To give you some details about our monitoring infrastructure: it is composed of 1 central server with 2 pollers on our client’s site.

They are all on the same version, 21.10:

 

Most of the time the poller works correctly and does its checking / notification job. But when we lose the whole client infrastructure (~70 hosts and ~300 services), the poller doesn’t send the notifications on time. The scheduling also seems to have a problem: some hosts are stuck in a SOFT state:

Also, as you can see in the first screenshot, the UI shows that the poller has some latency.

For the notification part, we receive 2 mails every minute, so sometimes we receive the mail for a check one hour after the check changed state.

I checked centengine.log and found some interesting things:

But I didn’t find anything relevant on the web.

We also tried reinstalling the poller, but it didn’t solve the problem.

The other poller is working correctly; we ran a notification test on it and it behaved normally.

 

Has anyone encountered this problem?

 

Thx in advance
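
As a rough sanity check on the delay described above, here is a minimal sketch of how long a backlog takes to drain at the observed rate of 2 mails per minute. It assumes exactly one notification per host and per service, which is a simplification and not something confirmed in the thread; the function name `drain_minutes` is hypothetical.

```python
def drain_minutes(pending_notifications: int, mails_per_minute: float) -> float:
    """Minutes needed to send a backlog at a fixed delivery rate."""
    if mails_per_minute <= 0:
        raise ValueError("mails_per_minute must be positive")
    return pending_notifications / mails_per_minute


hosts = 70      # approximate figure from the post
services = 300  # approximate figure from the post
rate = 2        # mails per minute, as observed

# Assumption: one notification per host and one per service.
backlog = hosts + services
minutes = drain_minutes(backlog, rate)
print(f"{backlog} notifications at {rate}/min take about {minutes:.0f} minutes")
```

Under that assumption the queue would take roughly three hours to empty, which is at least consistent with mails arriving an hour or more after the state change.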

 

Hello @abii 

Thank you for your troubleshooting.

Could you give us the centengine.cfg of your poller so we can check that you’re up to date with the optimisations and that you have a “standard” configuration?

Maybe check_for_orphaned_hosts and check_for_orphaned_services are not enabled?


Hello @Kriko 

Thx for your quick answer and sorry for my late reply.

 

It looks like check_for_orphaned_hosts and check_for_orphaned_services are enabled:

 

###################################################################
#                                                                 #
#                       GENERATED BY CENTREON                     #
#                                                                 #
#               Developed by :                                    #
#                   - Julien Mathis                               #
#                   - Romain Le Merlus                            #
#                                                                 #
#                           www.centreon.com                      #
#                For information : contact@centreon.com           #
###################################################################
#                                                                 #
#         Last modification 2022-05-18 15:00                      #
#         By                                       #
#                                                                 #
###################################################################
cfg_file=/etc/centreon-engine/hostTemplates.cfg
cfg_file=/etc/centreon-engine/hosts.cfg
cfg_file=/etc/centreon-engine/serviceTemplates.cfg
cfg_file=/etc/centreon-engine/services.cfg
cfg_file=/etc/centreon-engine/commands.cfg
cfg_file=/etc/centreon-engine/contactgroups.cfg
cfg_file=/etc/centreon-engine/contacts.cfg
cfg_file=/etc/centreon-engine/hostgroups.cfg
cfg_file=/etc/centreon-engine/servicegroups.cfg
cfg_file=/etc/centreon-engine/timeperiods.cfg
cfg_file=/etc/centreon-engine/escalations.cfg
cfg_file=/etc/centreon-engine/dependencies.cfg
cfg_file=/etc/centreon-engine/connectors.cfg
cfg_file=/etc/centreon-engine/meta_commands.cfg
cfg_file=/etc/centreon-engine/meta_timeperiod.cfg
cfg_file=/etc/centreon-engine/meta_host.cfg
cfg_file=/etc/centreon-engine/meta_services.cfg
broker_module=/usr/lib64/nagios/cbmod.so /etc/centreon-broker/aaaaaaaaaaa-module.json
broker_module=/usr/lib64/centreon-engine/externalcmd.so
interval_length=60
use_timezone=:Europe/Paris
resource_file=/etc/centreon-engine/resource.cfg
log_file=/var/log/centreon-engine/centengine.log
status_file=/var/log/centreon-engine/status.dat
status_update_interval=30
external_command_buffer_slots=4096
command_check_interval=2s
command_file=/var/lib/centreon-engine/rw/centengine.cmd
state_retention_file=/var/log/centreon-engine/retention.dat
retention_update_interval=60
sleep_time=0.5
service_inter_check_delay_method=s
host_inter_check_delay_method=s
service_interleave_factor=2
max_concurrent_checks=0
max_service_check_spread=5
max_host_check_spread=5
check_result_reaper_frequency=10
max_check_result_reaper_time=30
auto_rescheduling_interval=30
auto_rescheduling_window=180
low_service_flap_threshold=25.0
high_service_flap_threshold=50.0
low_host_flap_threshold=25.0
high_host_flap_threshold=50.0
service_check_timeout=60
host_check_timeout=10
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
ochp_timeout=5
perfdata_timeout=5
host_perfdata_file_processing_interval=0
service_perfdata_file_processing_interval=0
service_freshness_check_interval=60
host_freshness_check_interval=60
date_format=euro
illegal_object_name_chars=~!$%^&*"|'<>?,()=
illegal_macro_output_chars=`~$^&"|'<>
admin_email=admin
admin_pager=admin@localhost
event_broker_options=-1
translate_passive_host_checks=0
cached_host_check_horizon=15
cached_service_check_horizon=15
passive_host_checks_are_soft=0
additional_freshness_latency=15
debug_file=/var/log/centreon-engine/centengine.debug
debug_level=-1
debug_verbosity=2
max_debug_file_size=10000000
log_pid=1
enable_macros_filter=0
grpc_port=50004
instance_heartbeat_interval=30
enable_notifications=1
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_event_handlers=1
check_external_commands=1
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
log_notifications=1
log_service_retries=0
log_host_retries=0
log_event_handlers=1
log_external_commands=1
log_passive_checks=1
auto_reschedule_checks=0
soft_state_dependencies=0
obsess_over_services=0
obsess_over_hosts=0
process_performance_data=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
check_host_freshness=0
enable_flap_detection=1
use_regexp_matching=0
use_true_regexp_matching=0
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
use_large_installation_tweaks=0
enable_environment_macros=0
use_setpgid=1
naaaaaaa@aaaaaaaaa ~]$ sudo cat /etc/centreon-engine/centengine.cfg | grep orphaned
check_for_orphaned_services=1
check_for_orphaned_hosts=1

Thx for your time

 

Abi.
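
For anyone comparing their own poller against the configuration posted above, here is a minimal sketch that pulls the timeout and orphan-check settings out of a centengine.cfg-style file. The helper names `parse_engine_cfg` and `pick` are hypothetical, the sample text is only an excerpt of the values shown in the post, and the parser assumes plain `key=value` lines with `#` comments.

```python
def parse_engine_cfg(text: str) -> dict[str, list[str]]:
    """Parse 'key=value' lines; keys such as cfg_file repeat, so values are lists."""
    settings: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, since values may contain '=' themselves.
        key, value = line.split("=", 1)
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def pick(settings: dict[str, list[str]], keys: list[str]) -> dict[str, str]:
    """Return the last value seen for each requested key that is present."""
    return {k: settings[k][-1] for k in keys if k in settings}


# Excerpt of the values from the configuration posted above.
sample = """\
# excerpt
service_check_timeout=60
host_check_timeout=10
notification_timeout=30
check_for_orphaned_services=1
check_for_orphaned_hosts=1
"""

KEYS = [
    "host_check_timeout",
    "service_check_timeout",
    "notification_timeout",
    "check_for_orphaned_hosts",
    "check_for_orphaned_services",
]

report = pick(parse_engine_cfg(sample), KEYS)
for key, value in report.items():
    print(f"{key} = {value}")
```

To run it against a real poller, read /etc/centreon-engine/centengine.cfg (the path used in the post) and pass its contents to `parse_engine_cfg` instead of `sample`.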


Thank you for your answer.

Sorry about the delay.

After a discussion with the dev team, it could be an issue with the timeout of the hosts.

Can you give me the configuration of the command you use to check the host status?


Hello Kriko,

 

To check the host status we use the command check_ICMP.

And here you can see the details of the check : 

Thanks in advance

 

Abi.


Hello,

 

A little UP 🙂 if someone could help us.

 

Thx in advance.

 

Abi.


Is it possible that your platform is overwhelmed with all the services going down at once? Could a dependency be added?

 

https://docs.centreon.com/docs/alerts-notifications/notif-dependencies/


Hi hmorales,

Thx for your answer. I thought about that too, but if the platform were overwhelmed, I should see it during the notification test with ps -aux or top.

We didn’t add any hosts or services; the problem appeared after we upgraded from 20.04 to 20.10.

We can add dependencies, but I’m sure it won’t solve the problem.

Abi.

