How to handle stale processes

Question

What is the best practice for handling stale processes left by service checks?

Most of my services poll every 5 minutes, but for some of them the last check is hours old.

When I list the processes run by centreon-engine, some of them seem to be stuck, even with max_execution_time set to 60s.

These scripts basically query large MIB tables. They run in 5 seconds on average, but for some reason they pile up in the process list.

I’d like to schedule a script to clean up those stuck processes. Since I don’t want to mess with the system crontab, a Centreon service might be the best way to schedule it. How can I give centreon-engine the right permissions to kill these processes through this script?

christophe.niel-ACT · Accepted Answer

I realize I didn’t read the last part of your question correctly and missed the bit about the rights needed to kill the processes.

All check commands are run as the user “centreon-engine”.

So the script that kills your stale processes will run as the same user and should have no problem killing them: you do not need any specific rights, sudo or root to run “kill” against your own processes.
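To confirm this on the poller before writing the cleanup script, here is a small sketch that tells whether the current user may signal a given PID. The function name check_signal_permission is made up for this example. It relies on “kill -0”, which sends no signal and only tests that the process exists and that you are allowed to signal it (same user, or root). Run it as centreon-engine, for instance from a shell opened with “sudo -u centreon-engine sh”, and pass the PID of one of the stuck checks.

```shell
#!/bin/sh
# Sketch (hypothetical helper): report whether the current user may signal a PID.
# "kill -0" sends no signal; it only checks that the process exists and that
# we are allowed to signal it (same user, or root).

check_signal_permission() {
    # ps exits non-zero when no process has this PID
    if ! csp_uid=$(ps -o uid= -p "$1" 2>/dev/null); then
        echo "pid $1: no such process"
        return 2
    fi
    csp_uid=$(echo $csp_uid)  # trim the padding ps adds around the number
    if kill -0 "$1" 2>/dev/null; then
        echo "pid $1 (owner uid $csp_uid): can be signalled by uid $(id -u)"
        return 0
    fi
    echo "pid $1 (owner uid $csp_uid): not permitted for uid $(id -u)"
    return 1
}
```

If it reports “can be signalled” for the stuck checks, a plain kill from a script running as centreon-engine will work without sudo.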

But you really should find the root cause of your problem; a kill script should be the last resort.
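A possible starting point for that root-cause search, sketched under the assumption that GNU coreutils “timeout” is installed: run the check by hand with a time limit and look at its output and exit code, which is what the engine waits for. The run_check helper is hypothetical, and the 0/1/2/3 mapping is the usual Nagios-style plugin convention (OK, WARNING, CRITICAL, UNKNOWN); “timeout” itself exits with 124 when it had to stop the command.

```shell
#!/bin/sh
# Sketch (hypothetical helper): run a check with a time limit and show its
# output, exit code and the plugin state that code maps to.
# Assumes GNU coreutils "timeout" (exit code 124 means "timed out").

run_check() {
    rc_limit=$1
    shift
    rc_status=0
    # stderr is merged so that error messages are visible too
    rc_output=$(timeout "$rc_limit" "$@" 2>&1) || rc_status=$?
    case $rc_status in
        0) rc_state=OK ;;
        1) rc_state=WARNING ;;
        2) rc_state=CRITICAL ;;
        3) rc_state=UNKNOWN ;;
        124) rc_state="TIMEOUT after ${rc_limit}s" ;;
        *) rc_state="non-standard exit code" ;;
    esac
    printf 'exit=%s state=%s output=%s\n' "$rc_status" "$rc_state" "${rc_output:-<none>}"
    return "$rc_status"
}
```

Usage, with a placeholder plugin path: run_check 60 /path/to/your/check --your-options. An exit=124 line means the plugin itself hangs; a quick, clean exit here suggests looking at the poller or the engine settings instead.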

christophe.niel-ACT · Answer

Hello,

I had this issue in one specific case: I was using a poller on a “burstable” VM on Azure with a 10% CPU limit (model B1). Once I started monitoring more and more services, I ran out of CPU credits, and processes got stuck and never finished. Is that your case? If so, try allocating more resources.

If not: I have never seen that behaviour, unless your script simply does not exit and never sends its exit code and output text.

Could you send your engine configuration with the “Freshness” settings in “Check options”, and your “Tuning” settings? Maybe something is not set up correctly there. Please also send your check command line.

Does your script exit correctly, with the right exit code, when you run the command manually in a shell on the poller? You can check the exit code with “echo $?” right after your command.

Centreon Engine is a scheduler: it runs the command and waits for the exit code, and if the exit code never arrives, the check should time out at some point. It also should not wait for the previous execution to finish; it should run at each normal check interval even if the previous one has not exited. (I’m not 100% sure about this.)

That reminds me, and you should start there: what is the “Normal check interval” on these services (or on the service template, if you use one)? You can see it in the resource details. For example, my “Next Check” is 5 min after the last check, and my normal check interval is “5”. Is your “Next Check” correctly scheduled according to the “Normal check interval”?

To your question: there is no best way to manage stale processes, because there shouldn’t be any :-/

This will list the PID of any process named “yourprocessname” that has been running for more than 1h (3600s):

ps -C yourprocessname -o pid,etimes,command | awk '(NR>1){if($2>3600) print $1}'

You could kill the processes in that list, but if the name is “perl”, “php” or “python” it could have bad consequences, since -C matches on the command name alone and a generic interpreter name can also match unrelated processes.

Killing a process launched by Centreon Engine should result in an Unknown status with no output. I’m not sure whether that has an impact on the engine; it should not. The processes normally exit on their own, so having something that kills such processes should not cause issues (except the Unknown status).

And the crontab is not really an issue after all; you can add whatever you want there.
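A slightly more defensive variant of the one-liner above, as a sketch only: it matches a substring of the full command line (so you can use your check script’s name rather than “perl” or “python”), only looks at the current user’s processes, defaults to a dry run, and prints a plugin-style status line with exit code 0 (OK) or 1 (WARNING), so it could be scheduled as a Centreon service if you prefer that to crontab. The function names, the argument order and the “kill” keyword are assumptions made for this example; the etimes field requires procps-ng ps, as in the original one-liner.

```shell
#!/bin/sh
# Sketch: find (and optionally SIGTERM) stale processes owned by the user
# running this script. Hypothetical usage:
#   cleanup_stale.sh <pattern> <max_age_seconds> [kill]
# Requires procps-ng ps (for the etimes field).

# Print the PIDs of the current user's processes whose full command line
# contains $1 and whose elapsed time is greater than $2 seconds.
find_stale() {
    # Snapshot first, so the awk filter below is not itself in the list.
    fs_snapshot=$(ps -u "$(id -u)" -o pid= -o etimes= -o args=)
    printf '%s\n' "$fs_snapshot" |
        awk -v pat="$1" -v max="$2" -v self="$$" '{
            cmd = $0
            sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", cmd)  # drop pid and etimes
            # skip this script itself: its own arguments contain the pattern
            if ($1 != self && $2 + 0 > max + 0 && index(cmd, pat) > 0)
                print $1
        }'
}

# Report stale processes; send SIGTERM only when $3 is "kill".
# Prints a plugin-style status line and returns 0 (OK) or 1 (WARNING).
cleanup_stale() {
    cs_pattern=$1
    cs_max_age=$2
    cs_mode=${3:-dry-run}
    cs_pids=$(find_stale "$cs_pattern" "$cs_max_age")
    if [ -z "$cs_pids" ]; then
        echo "OK - no process matching '$cs_pattern' older than ${cs_max_age}s"
        return 0
    fi
    cs_pids=$(echo $cs_pids)  # join the PIDs on one line, space-separated
    if [ "$cs_mode" = kill ]; then
        # Plain SIGTERM, no sudo: we only ever target our own processes.
        # $cs_pids is unquoted on purpose so each PID is a separate argument.
        kill $cs_pids 2>/dev/null
        echo "WARNING - sent SIGTERM to stale process(es): $cs_pids"
    else
        echo "WARNING - stale process(es) found (dry run): $cs_pids"
    fi
    return 1
}

# Entry point when called as a check command with arguments.
if [ "$#" -ge 2 ]; then
    cleanup_stale "$@"
    exit $?
fi
```

As a check command, something like cleanup_stale.sh 'your_check_script_name' 3600 kill (script path and pattern are placeholders) would run at the service’s normal check interval and turn WARNING whenever it had to kill something. Running it first without “kill” shows what would be matched; keep the age threshold well above the normal runtime of your checks.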
