In this article, we will describe what a threshold is, and how to set it up.
What is a threshold?
Centreon uses plugins, run through commands, to check that everything is fine on a host or service. Without the correct configuration, however, a check just gives us information without alerting anyone.
A threshold is a value that defines when an indicator becomes problematic and a status change should be triggered. For example, if we want to check the space on a disk, we can define a threshold that says “if the disk is filled to more than 80% of its total capacity, it’s a problem, so tell me when it happens”.
Put like that it seems trivial, but it is a very important thing to do on hosts and services: without thresholds, we would have to check every host and service by hand to know whether there is a problem, and the monitoring would be pretty useless.
How does it work?
1- You can set it up using the web interface
In the “custom macros” section, you can just enter the value that will be the threshold for warning or critical statuses.
Note that the thresholds we are talking about only work on macros that return numeric values. Macros containing statuses are strings, so they need to be defined in a different way, like this:
CRITICALSTATUS= %{status} !~ /active/i
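To make the string condition above easier to read, here is a minimal Python sketch of what an expression such as `%{status} !~ /active/i` tests. It is only an analogy: the function name `is_critical` is hypothetical, and this is not how the plugin itself evaluates the macro.

```python
import re

def is_critical(status: str) -> bool:
    """Analogue of `%{status} !~ /active/i`.

    Returns True (critical) when the status text does NOT contain
    "active", ignoring case.
    """
    return re.search(r"active", status, flags=re.IGNORECASE) is None

for status in ("active", "Active", "stopped", "inactive"):
    print(status, is_critical(status))
```

Because the pattern is not anchored, a status such as "inactive" also contains "active" and is therefore not reported as critical. If that distinction matters for your service, a stricter pattern may be needed.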
By the way, if a threshold is not specific to one service or host, you can set it directly in the template.
2- Different types of thresholds
- Outside the range {0:X}
First, the most common one: just enter the value on its own, like in the screenshot above. If the returned data is outside the range between 0 and your threshold (X), a status change is triggered. Don’t use this type of threshold if the returned values can be lower than 0, as values below 0 would also trigger a status change.
In the example below, the returned values are percentages. The bounds themselves are part of the range, so the status will be OK if the returned value is between 0 and 80, warning if it is above 80 and up to 90, and critical if it is above 90.
[root@poller1 ~]# /usr/lib/centreon/plugins//centreon_linux_snmp.pl --plugin=os::linux::snmp::plugin --mode=cpu --hostname=svlinuxpar.centreon.training --snmp-version='2c' --snmp-community='os_linux' --snmp-autoreduce --warning-average='80' --critical-average='90'
OK: 4 CPU(s) average usage is 3.00 % | 'total_cpu_avg'=3.00%;0:80;0:90;0;100 'cpu_0'=4.00%;;;0;100 'cpu_1'=3.00%;;;0;100 'cpu_2'=2.00%;;;0;100 'cpu_3'=3.00%;;;0;100
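As a rough illustration of how the plain thresholds 80 and 90 translate into a status, here is a minimal Python sketch. It assumes the usual range convention, in which a bare value X is read as the range 0:X and the bounds belong to the range. The function name `cpu_status` is hypothetical and is not part of any Centreon plugin.

```python
def cpu_status(value: float, warning: float = 80.0, critical: float = 90.0) -> str:
    """Return the status for plain thresholds, read as 0:warning and 0:critical."""

    def outside(v: float, upper: float) -> bool:
        # A value alerts when it falls outside the inclusive range 0..upper.
        return v < 0 or v > upper

    # The critical threshold is checked first because it takes precedence.
    if outside(value, critical):
        return "CRITICAL"
    if outside(value, warning):
        return "WARNING"
    return "OK"

for sample in (3.0, 80.0, 80.5, 90.0, 90.5, -1.0):
    print(sample, cpu_status(sample))
```

The last sample shows why this form is a poor fit for metrics that can go below 0: a value of -1 falls outside 0:90 and is reported as critical.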
- Under the threshold
Next, you can put a colon at the end of your value, like this: X:. The status of the resource will then change if the value is lower than X, i.e. outside the range between X and ∞.
[root@poller1 ~]# /usr/lib/centreon/plugins//centreon_linux_snmp.pl --plugin=os::linux::snmp::plugin --mode=cpu --hostname=svlinuxpar.centreon.training --snmp-version='2c' --snmp-community='os_linux' --snmp-autoreduce --warning-average='' --critical-average='90:'
CRITICAL: 4 CPU(s) average usage is 4.50 % | 'total_cpu_avg'=4.50%;;90:;0;100 'cpu_0'=1.00%;;;0;100 'cpu_1'=6.00%;;;0;100 'cpu_2'=5.00%;;;0;100 'cpu_3'=6.00%;;;0;100
- Above the threshold
Now put a tilde and a colon before your threshold, like this: ~:X
This is the opposite of the previous case: the service or host will change status when the returned data is strictly greater than the threshold, i.e. outside the range between -∞ and X.
It is basically the same as the classic threshold, but the range is larger: the classic one only covers values between 0 and X, so negative values trigger a status change there, whereas here they do not.
The tilde stands for negative infinity, in case you’re wondering.
[root@poller1 ~]# /usr/lib/centreon/plugins//centreon_linux_snmp.pl --plugin=os::linux::snmp::plugin --mode=cpu --hostname=svlinuxpar.centreon.training --snmp-version='2c' --snmp-community='os_linux' --snmp-autoreduce --warning-average='' --critical-average='~:90'
CRITICAL: 4 CPU(s) average usage is 4.50 % | 'total_cpu_avg'=4.50%;;90:;0;100 'cpu_0'=1.00%;;;0;100 'cpu_1'=6.00%;;;0;100 'cpu_2'=5.00%;;;0;100 'cpu_3'=6.00%;;;0;100
- Outside the Range: Two numbers
You can set two numbers in the threshold, like this: X:Y. In this case the resource will change its status if the returned value is outside the range between X and Y. For example, if I set 30:90, the status will change if the value is lower than 30 (between -∞ and 30) or greater than 90 (between 90 and ∞).
[root@poller1 ~]# /usr/lib/centreon/plugins//centreon_linux_snmp.pl --plugin=os::linux::snmp::plugin --mode=cpu --hostname=svlinuxpar.centreon.training --snmp-version='2c' --snmp-community='os_linux' --snmp-autoreduce --warning-average='' --critical-average='30:90'
CRITICAL: 12 CPU(s) average usage is 1.08 % | 'total_cpu_avg'=1.08%;;30:90;0;100 'cpu_0'=1.00%;;;0;100 'cpu_1'=2.00%;;;0;100 'cpu_10'=1.00%;;;0;100 'cpu_11'=1.00%;;;0;100 'cpu_2'=1.00%;;;0;100 'cpu_3'=1.00%;;;0;100 'cpu_4'=1.00%;;;0;100 'cpu_5'=1.00%;;;0;100 'cpu_6'=1.00%;;;0;100 'cpu_7'=1.00%;;;0;100 'cpu_8'=1.00%;;;0;100 'cpu_9'=1.00%;;;0;100
- In range: the power of @
Finally, you can set it up like this: @X:Y. The previous cases defined thresholds for values outside a range, but this time, with the @, the threshold applies to values inside the range! So the status of the resource will change if the returned value is greater than or equal to X and lower than or equal to Y. If the value is outside the range, the status will be OK.
[root@poller1 ~]# /usr/lib/centreon/plugins//centreon_linux_snmp.pl --plugin=os::linux::snmp::plugin --mode=cpu --hostname=svlinuxpar.centreon.training --snmp-version='2c' --snmp-community='os_linux' --snmp-autoreduce --warning-average='' --critical-average='@80:90'
OK: 4 CPU(s) average usage is 4.25 % | 'total_cpu_avg'=4.25%;;@80:90;0;100 'cpu_0'=5.00%;;;0;100 'cpu_1'=4.00%;;;0;100 'cpu_2'=3.00%;;;0;100 'cpu_3'=5.00%;;;0;100
You can also set up negative thresholds just by putting a minus sign before the value, e.g. “-10”. This is useful for metrics that can be negative, such as temperature or the storage service of the Windows SNMP plugin.
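To tie the different notations together, here is a minimal Python sketch of how a threshold string could be turned into a range and checked against a value. It assumes the conventions described in this article (a bare X means 0:X, an empty upper bound means ∞, ~ means -∞, @ inverts the test) and a non-empty threshold string. The names `Range`, `parse_range` and `alerts` are hypothetical; this is an illustration, not the code the Centreon plugins actually run.

```python
import math
from typing import NamedTuple

class Range(NamedTuple):
    start: float
    end: float
    inside: bool  # True when the threshold started with "@"

def parse_range(spec: str) -> Range:
    """Parse a threshold such as '90', '90:', '~:90', '30:90' or '@80:90'."""
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low, high = spec.split(":", 1)
    else:
        low, high = "0", spec  # a bare X is read as 0:X
    if low == "~":
        start = -math.inf      # the tilde stands for negative infinity
    elif low == "":
        start = 0.0
    else:
        start = float(low)
    end = math.inf if high == "" else float(high)
    return Range(start, end, inside)

def alerts(value: float, spec: str) -> bool:
    """Return True when the value should trigger a status change."""
    r = parse_range(spec)
    in_range = r.start <= value <= r.end
    # With "@" the alert fires inside the range, otherwise outside it.
    return in_range if r.inside else not in_range

for value, spec in [(4.5, "90"), (4.5, "90:"), (-5, "90"), (-5, "~:90"),
                    (1.08, "30:90"), (4.25, "@80:90"), (85, "@80:90"),
                    (-12, "-10:")]:
    print(f"{value} against {spec!r}: alert={alerts(value, spec)}")
```

Negative bounds need no special handling in this sketch, since a leading minus sign is simply part of the number, as in `-10:` or `~:-10`.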
See also
- Read more about thresholds and statuses in our official documentation