We install Centreon on a CentOS/Red Hat/Oracle Linux base, so some Linux commands are needed to troubleshoot and administer your platform. When working with our customers and users, we often have to help them troubleshoot performance bottlenecks. This article gives you the fundamental Linux commands to understand your system's health and operating condition efficiently. By the end, you will be able to use common system commands, understand what these commands do, and know some use cases and when they are relevant.
Check your system load
The load is one of the indicators to monitor on a Centreon platform. Knowing how to read and analyze this value lets you respond to increasing load and latency on your Centreon servers, both of which degrade the user experience. Several commands display the system load, including uptime.
The uptime command reports several indicators:
[root@Centreon-Central ~]# uptime
10:41:09 up 192 days, 23:13, 2 users, load average: 1,00, 0,30, 4,00
The output displays:
- The current time
- How long the system has been running
- How many users are currently logged in
- The CPU system load averages for the past 1, 5, and 15 minutes
You can see the different options of the uptime command by running:
[root@Centreon-Central ~]# uptime -h
The load average indicates the average system load over a time period. It counts the processes that are running and the processes that are waiting to run, whether for physical resources such as CPU and disk or on uninterruptible locks (code paths that set TASK_UNINTERRUPTIBLE). In other words, it measures the number of threads that aren't completely idle.
The command output displays the load average values from left to right, one value per period:
- load average over the last 1 minute is 1,00
- load average over the last 5 minutes is 0,30
- load average over the last 15 minutes is 4,00
To display the number of processing units available, we run:
[root@Centreon-Central ~]# nproc
1
Then we can interpret the load like this:
- Over the last minute, the CPU was fully (100%) utilized on average: 1 process was on the CPU (1.00) and none was waiting.
- Over the last 5 minutes, the CPU was idle 70% of the time on average (0.30); no processes were waiting for CPU time.
- Over the last 15 minutes, the CPU was overloaded by 300% on average (4.00): 1 process was on the CPU and 3 processes were waiting for CPU time.
The above calculation is for 1 core. With multiple cores, divide the load value by the number of available cores. For example, with six cores and a load average of 10, your CPU usage is (10 / 6) * 100 = 167%.
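The division by the number of cores can be scripted. The following is a minimal sketch, not part of Centreon; the function name load_percent is hypothetical, and the rounding to a whole percentage is a choice made here.

```shell
# Sketch: express a load average as a percentage of the available CPU capacity.
# Usage: load_percent LOAD CORES  ->  (LOAD / CORES) * 100, rounded
load_percent() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.0f\n", (load / cores) * 100 }'
}

load_percent 10 6     # the six-core example from the text, prints 167
load_percent 4.00 1   # the 15-minute value on a single core, prints 400
```

On a live Linux server, the same function could be fed with real values, for example load_percent "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)".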
If you haven't noticed any problem with the load and the Centreon platform is still unstable, investigate the memory side.
The free command displays the amount of free and used memory in the system. We run free with the -mh flags to get the output in a human-readable format:
[root@Centreon-Central ~]# free -mh
total used free shared buff/cache available
Mem: 3,7G 1,3G 170M 192M 2,3G 2,0G
Swap: 2,0G 398M 1,6G
- The first line is the header: total, used, free, shared, buff/cache (memory used by kernel buffers and the page cache), and available (an estimate of the memory available for starting new applications without swapping)
- The second line (Mem) shows these values for the RAM
- The third line (Swap) shows the total, used, and free swap space.
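To turn these columns into a single figure, one option is to compute the share of RAM that is no longer available. This is only a sketch: the function name mem_pressure is hypothetical, and it assumes the column layout shown above, where "available" is the seventh field of the Mem line (older versions of free do not print that column).

```shell
# Sketch: percentage of RAM that is NOT available, computed from `free -m` output on stdin.
# Assumption: the Mem: line has the fields total, used, free, shared, buff/cache, available.
mem_pressure() {
    awk '/^Mem:/ { printf "%.0f\n", ($2 - $7) / $2 * 100 }'
}

# Usage on a live system:
#   free -m | mem_pressure
```

Using the "available" column rather than "free" avoids counting the page cache, which the kernel can reclaim, as memory under pressure.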
To refresh the result continuously at a fixed interval (for example every 10 seconds):
[root@Centreon-Central ~]# free -s 10
If you want to learn more about using the free command correctly, run:
[root@Centreon-Central ~]# free --help
If there is no memory problem and you are still detecting latency, then look at how centreon-engine is performing. It is the process that updates the information system.
[root@Centreon-Central ~]# /usr/sbin/centenginestats
See the online documentation to interpret the output parameters.
You will also need to look at the log files. In the next part, we give you some Linux commands to analyze files.
Everything that you need to know about files
It will be useful for you to be able to perform basic file operations. In this part we will focus on three operations:
- Search for files
- View Log Files
- Analyze rights / Change user
Search for files
In some situations, you will need to find files or directories and perform operations on them. One of the most powerful and flexible ways to do this is the find command, a command-line utility for finding files in a directory hierarchy. You can find files by name, extension, type, size, modification date, permissions, or owner, and you can delete the files found. To learn more about the find command, run:
[root@Centreon-Central ~]# find --help
For example, you see the following error in the broker's log: MySQL server has gone away. You decide to check the database configuration, but you don't know where the centreon.cnf file is. To search for configuration files from the root (the pattern is quoted so that the shell does not expand it before find sees it), run:
[root@Centreon-Central ~]# find / -name "*.cnf"
/etc/pki/tls/openssl.cnf
/etc/my.cnf
/etc/my.cnf.d/mysql-clients.cnf
/etc/my.cnf.d/server.cnf
/etc/my.cnf.d/spider.cnf
/etc/my.cnf.d/centreon.cnf
/usr/share/mysql/wsrep.cnf
As another example, you have a bug on your web interface and you want to check the logs. Since you know the exact name of the log file, centreon-error.log, run:
[root@Centreon-Central ~]# find / -name centreon-error.log
/var/opt/rh/rh-php73/log/php-fpm/centreon-error.log
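Searching by size and modification date, mentioned earlier, follows the same pattern. Here is a minimal sketch that combines both criteria; the function name big_recent_files is hypothetical and the thresholds in the usage comment are arbitrary examples.

```shell
# Sketch: list regular files under a directory that are larger than SIZE
# and were modified less than DAYS days ago.
# Usage: big_recent_files DIR SIZE DAYS
big_recent_files() {
    # -type f : regular files only
    # -size +N: larger than N (GNU find accepts suffixes such as k, M, G)
    # -mtime -N: modified less than N days ago
    find "$1" -type f -size +"$2" -mtime -"$3" 2>/dev/null
}

# Usage on a live system (example thresholds):
#   big_recent_files /var/log 100M 1
```

This kind of search is a quick way to spot a log file that has grown abnormally since the previous day.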
Another way to search for files and directories is to use the locate command. This is the quickest way to search for files by name. The locate package may not be pre-installed on your server.
[root@Centreon-Central ~]# locate centreon.cnf
bash: locate: command not found
You can easily install it with the package manager and then build the locate database:
[root@Centreon-Central ~]# yum install mlocate
[root@Centreon-Central ~]# updatedb
To learn more about the locate command, run:
[root@Centreon-Central ~]# locate --help
Going back to our previous examples, we find the centreon.cnf and centreon-error.log files as follows:
[root@Centreon-Central ~]# locate centreon.cnf centreon-error.log
/etc/my.cnf.d/centreon.cnf
/var/opt/rh/rh-php73/log/php-fpm/centreon-error.log
View Log Files
On a Centreon platform, as on any Linux platform, you can obtain information on the execution of a process by consulting the logs. Being able to search in the logs or in any other file is crucial. The two main commands for that are tail and grep.
By default, the tail command prints the last ten lines of a file to standard output. We run it with the -f flag, which keeps the file open and prints new lines as they are appended.
[root@Centreon-Central ~]# tail -f /var/log/centreon-broker/central-broker-master.log
To display the last 20 lines of a log file, we run:
[root@Centreon-Central ~]# tail -n 20 /var/log/centreon-broker/central-broker-master.log
We can give more than one file:
[root@Centreon-Central ~]# tail -f /var/log/centreon-engine/centengine.log /var/log/messages
The full list of options is available with:
[root@Centreon-Central ~]# tail --help
The grep command allows you to refine your search: it searches through a file, looking for lines that match the specified pattern. Consider the following example: you restart centreon-engine and want to know whether the cbmod module has been initialized. The cbmod module makes the link between a Poller and the Central.
[root@Centreon-Central ~]# grep cbmod /var/log/centreon-engine/centengine.log*
[1606467199] [14328] Event broker module '/usr/lib64/nagios/cbmod.so' initialized successfully
As a second example, a directive in the centreon-engine configuration file specifies how the library is loaded. You can check it as follows:
[root@Centreon-Central ~]# grep cbmod /etc/centreon-engine/centengine.cfg
broker_module=/usr/lib64/nagios/cbmod.so /etc/centreon-broker/poller-module.json
To learn more about the grep command, run:
[root@Centreon-Central ~]# grep --help
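tail and grep combine naturally when you only care about the most recent occurrences of a message. The following is a small sketch; the function name last_matches is hypothetical, and the default of five lines is arbitrary.

```shell
# Sketch: print the last N lines of a file that match a pattern, ignoring case.
# Usage: last_matches PATTERN FILE [N]   (N defaults to 5)
last_matches() {
    grep -i -- "$1" "$2" | tail -n "${3:-5}"
}

# Usage on a live system:
#   last_matches error /var/log/centreon-engine/centengine.log 10
```

Filtering first and truncating afterwards guarantees that you get N matching lines when they exist, which tail -n followed by grep would not.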
Analyze rights / change user
On a Centreon server, as on any Linux server, permissions are required to access files. Consider the epp license file: you can read its permissions with the ls command as follows:
[root@Centreon-Central ~]# ls -l /etc/centreon/license.d/epp.license
-rw-r--r-- 1 apache apache 1091 Sep 22 12:25 epp.license
Three different classes of users can be associated with a file:
- The owner (u)
- The group owning the file (g)
- Other users (o)
There are three types of file permissions:
- The read permission (r)
- The write permission (w)
- The execution permission (x)
Let's go back to the previous example:
-rw-r--r-- 1 apache apache 1091 Sep 22 12:25 epp.license
- The first character (-) shows the file type: (-) indicates a regular file, (d) a directory, and (l) a symbolic link
- The first triplet (rw-) shows the owner's permissions: the owner can read (r) and write (w), but cannot execute (x)
- The second triplet (r--) shows the permissions of the group owning the file: its members can read (r), but cannot write (w) or execute (x)
- The third triplet (r--) shows everybody else's permissions: they can read (r), but cannot write (w) or execute (x)
You can change file and directory permissions using chmod. There are two methods:
Symbolic or Text Method
For example, reconsider the epp.license file:
-rw-r--r-- 1 apache apache 1091 Sep 22 12:25 epp.license
- Add write permissions for other users:
[root@Centreon-Central ~]# chmod o+w epp.license
- Remove the write permission for the owner
[root@Centreon-Central ~]# chmod u-w epp.license
- Add execution permissions for the group owning the file
[root@Centreon-Central ~]# chmod g+x epp.license
- Add the write permissions for all
[root@Centreon-Central ~]# chmod a+w epp.license
Numeric Method
For example, reconsider the epp.license file:
-rw-r--r-- 1 apache apache 1091 Sep 22 12:25 epp.license
To find out the file’s permissions in numeric mode, do the following calculation:
- Owner: (rw-) = 110 (binary) = 6 (decimal)
- Group: (r--) = 100 (binary) = 4 (decimal)
- Others: (r--) = 100 (binary) = 4 (decimal)
So, the file’s permissions in the numeric notation are 644. You can also check this using the stat command:
[root@Centreon-Central ~]# stat -c "%a" epp.license
644
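The binary-to-decimal calculation above can be written as a short function. This is an illustrative sketch only: the name perm_to_octal is hypothetical, and the function ignores special bits (setuid, setgid, sticky), treating any character other than "-" as a set permission.

```shell
# Sketch: convert a nine-character permission string (as printed by ls -l,
# without the leading file-type character) into its numeric notation.
# Usage: perm_to_octal rw-r--r--
perm_to_octal() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= 9; i += 3) {
            n = 0
            if (substr($0, i, 1)     != "-") n += 4   # read
            if (substr($0, i + 1, 1) != "-") n += 2   # write
            if (substr($0, i + 2, 1) != "-") n += 1   # execute
            out = out n                               # append one digit per triplet
        }
        print out
    }'
}

perm_to_octal rw-r--r--   # prints 644
```

Each triplet contributes one digit: 4 for read, 2 for write, 1 for execute, added together.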
The previous operations can also be performed in numeric mode, each one starting from the original 644 permissions:
- Add write permissions for other users:
[root@Centreon-Central ~]# chmod 646 epp.license
- Remove the write permission for the owner
[root@Centreon-Central ~]# chmod 444 epp.license
- Add execution permissions for the group owning the file
[root@Centreon-Central ~]# chmod 654 epp.license
- Add the write permissions for all
[root@Centreon-Central ~]# chmod 666 epp.license
To learn more about the chmod command, run:
[root@Centreon-Central ~]# chmod --help
The chown command allows you to change the owner and/or the group owning the file. Let's take our example:
-rw-r--r-- 1 apache apache 1091 Sep 22 12:25 epp.license
To change both the owner and the group of the epp.license file, we use the chown command followed by the new owner and the new group separated by a colon (:):
[root@Centreon-Central ~]# chown root:root epp.license
To change only the group, we run the following command:
[root@Centreon-Central ~]# chown :root epp.license
You can see the different options of the chown command by running:
[root@Centreon-Central ~]# chown --help
Understand the use of my disk space
In some situations, it will be helpful to check the disk partitions. For this, you can use the df command. It shows the available and used disk space of the filesystems on your Centreon server. We run it with the -h flag to get the output in a human-readable format.
[root@Centreon-Central ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 908M 0 908M 0% /dev
tmpfs 920M 0 920M 0% /dev/shm
tmpfs 920M 90M 830M 10% /run
tmpfs 920M 0 920M 0% /sys/fs/cgroup
/dev/mapper/centos-root 3.0G 3.0G 25M 100% /
/dev/mapper/centos-var_lib_centreon--broker 2.0G 2.0G 20K 100% /var/lib/centreon-broker
/dev/mapper/centos-var_lib_mysql 2.0G 1.3G 746M 64% /var/lib/mysql
/dev/mapper/centos-var_lib_centreon--engine 2.0G 2.0G 70M 97% /var/lib/centreon-engine
/dev/mapper/centos-var_lib_centreon 2.0G 290M 1.8G 15% /var/lib/centreon
/dev/mapper/centos-var_log 2.0G 559M 1.5G 28% /var/log
/dev/sda1 1014M 192M 823M 19% /boot
tmpfs 184M 0 184M 0% /run/user/0
It is also possible to analyze the graphs of each disk partition through the Centreon web interface.
Running out of free inodes on your Poller can cause problems when writing files or directories. To check inode usage, run the following command:
[root@Centreon-Central ~]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
devtmpfs 225456 281 225175 1% /dev
tmpfs 231170 1 231169 1% /dev/shm
tmpfs 231170 378 230792 1% /run
tmpfs 231170 16 231154 1% /sys/fs/cgroup
/dev/nvme0n1p1 10485232 82844 10402388 1% /
tmpfs 231170 1 231169 1% /run/user/1000
You can see the different options of the df command by running:
[root@Centreon-Central ~]# df --help
By default, df displays the following columns:
- Filesystem - Provides the name of the file system
- Size - Indicates the total size of the filesystem
- Used - Indicates the amount of disk space already used in the file system
- Avail - Indicates the amount of space available in the file system
- Use% - Indicates the amount of disk space already used as a percentage
- Mounted on - Specifies the mount point for this file system.
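Once the columns are understood, picking out the filesystems that are close to full is a matter of filtering on the percentage column. Here is a minimal sketch; the function name full_filesystems and the default threshold of 90% are assumptions made for the example, and mount points containing spaces are not handled.

```shell
# Sketch: print the mount point and usage of every filesystem at or above a threshold.
# Reads df output on stdin; expects the usage percentage in field 5 and the mount point in field 6.
# Usage: df -P | full_filesystems [THRESHOLD]   (THRESHOLD defaults to 90)
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                      # strip the percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage on a live system (-P forces one line per filesystem):
#   df -P | full_filesystems 95
```

Applied to the df -h output shown earlier with a threshold of 95, this would report /, /var/lib/centreon-broker and /var/lib/centreon-engine.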
In this example, the space available in /var/lib/centreon is 1.8G. The RRD files used to display charts are stored in this mount point, so it is worth analyzing the size of its directories:
[root@Centreon-Central ~]# du -shx /var/lib/centreon/metrics/
38M /var/lib/centreon/metrics/
[root@Centreon-Central ~]# du -shx /var/lib/centreon/status/
15M /var/lib/centreon/status/
Know the active processes and the resources they use
On a Centreon server, as on any Linux server, several processes run simultaneously without affecting each other. It is therefore important to be able to get information about the processes running on your system. We will focus on two main commands: top and ps.
The ps (process status) command displays information about the currently running processes. Without arguments, it displays the processes running in the current shell. To view all current processes in BSD format, execute:
[root@Centreon-Central ~]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
It gives us a lot of helpful information about a process:
- USER - The user running a command
- PID - The process ID
- %CPU - CPU usage in percentage
- %MEM - Memory usage in percentage
- VSZ - Virtual Memory Size - The total amount of virtual memory that Linux has allocated to the process
- RSS - Resident Set Size - The amount of physical memory (RAM) that the process currently occupies
- TTY - The terminal associated with the process
- STAT - The process state
- START - The start time of the process
- TIME - The cumulated CPU time
- COMMAND - The executed command
The ps command can be combined with the grep command if you want to filter on a specific process. Consider for example the centengine process, which collects data from monitored hosts. To view information about it, you can run:
[root@Centreon-Central ~]# ps aux | grep centengine
root 18156 0.0 0.0 112812 972 pts/0 R+ 11:40 0:00 grep --color=auto centengine
centreo+ 21727 0.2 2.9 654804 55592 ? Ssl Oct26 202:33 /usr/sbin/centengine /etc/centreon-engine/centengine.cfg
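The %MEM column can also be used to rank processes. The sketch below sorts ps aux output by memory usage; the function name top_mem is hypothetical, and only the first word of the command is kept to keep the output short.

```shell
# Sketch: from `ps aux` output on stdin, print %MEM, PID and command name
# of the N processes using the most memory.
# Usage: ps aux | top_mem [N]   (N defaults to 5)
top_mem() {
    # Field 4 is %MEM, field 2 is PID, field 11 is the first word of COMMAND.
    # LC_ALL=C makes sort read "2.9" as a decimal number whatever the locale.
    awk 'NR > 1 { print $4, $2, $11 }' | LC_ALL=C sort -rn | head -n "${1:-5}"
}

# Usage on a live system:
#   ps aux | top_mem 3
```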
If you want to learn more about using the ps command correctly, run:
[root@Centreon-Central ~]# ps --help all
The top command (table of processes) displays, in real time, the processor activity of your Centreon server and the tasks managed by the kernel.
Consider the following situation: the performance of the Centreon platform is deteriorating (database insertion, writing RRD files, responsiveness of the Centreon web interface, etc.). It can be important to know whether a given process is being delayed because it is waiting for the disk to become available for reading or writing. To list all running Linux processes, execute top on the command line. The output of this command consists of two main parts. The top part contains summary statistics on processes and resource usage:
top - 14:01:45 up 67 days, 4:48, 1 user, load average: 0.00, 0.02, 0.05
Tasks: 130 total, 1 running, 129 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.2 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.5 st
KiB Mem : 1849360 total, 131044 free, 802016 used, 916300 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 752932 avail Mem
In this output, the second line is of particular interest. It shows that 130 tasks are listed: 1 is running, 129 are sleeping (alive but not currently active), and none is stopped or a zombie (a process whose execution is completed but which still has an entry in the process table). The third line is also interesting, especially the wa and si values: the first is the percentage of CPU time spent waiting for I/O completion, and the second is the time spent servicing software interrupts.
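If you want to track the wa value from a script rather than read it on screen, it can be extracted from the summary line. This is a sketch under assumptions: the function name cpu_iowait is hypothetical, and it expects the "%Cpu(s):" line format shown above, which can differ between top versions and locales.

```shell
# Sketch: extract the I/O wait percentage (wa) from top's summary on stdin.
# Usage: top -b -n 1 | cpu_iowait     (-b: batch mode, -n 1: a single iteration)
cpu_iowait() {
    awk '/^%Cpu/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^wa,?$/) {
                v = $(i - 1)
                sub(/.*[,:]/, "", v)   # handle fused fields such as "id,100.0"
                print v
                exit
            }
    }'
}
```

A value that stays high over several samples suggests that processes are spending their time waiting on the disk rather than on the CPU.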
The second half contains a list of the currently running processes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
It provides a lot of useful information about a process:
- PID - The task's unique process ID, which periodically wraps, though never restarting at zero
- USER - The effective username of the task's owner
- PR - The scheduling priority of the task.
- NI - The nice value of the task
- VIRT - The total amount of virtual memory used by the task
- RES - The non-swapped physical memory a task is using.
- SHR - The amount of shared memory available to a task, not all of which is typically resident
- S - The status of the task
- %CPU - The task's share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time
- %MEM - A task's currently used share of available physical memory
- TIME+ - Total CPU time the task has used since it started
- COMMAND - Display the command line used to start a task or the name of the associated program.
To learn more about the top command, run:
[root@Centreon-Central ~]# top -h
Managing the list of current processes is important because a zombie process can make the system unstable or erratic (inconsistent results, duplicate notifications, etc.). In this case, after finding the PID of the process, you can terminate it using the kill command. Since a zombie has already finished executing, the signal usually has to be sent to its parent process.
The syntax for the kill command looks like this:
kill [-SIGNAL] PID
To get a list of all available signals, run:
[root@Centreon-Central ~]# kill -l
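Among these signals, SIGTERM (15) asks a process to stop cleanly, while SIGKILL (9) cannot be caught and should be a last resort. The following sketch tries the first and falls back to the second; the function name stop_process and the default grace period of five seconds are assumptions made for the example.

```shell
# Sketch: ask a process to terminate, and force it only if it is still there
# after a grace period.
# Usage: stop_process PID [SECONDS]   (SECONDS defaults to 5)
stop_process() {
    pid=$1
    grace=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 1    # polite request; fails if the PID is unknown
    # kill -0 sends no signal, it only tests whether the process still exists
    while [ "$grace" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        grace=$((grace - 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid"                        # last resort
    fi
}
```

Giving the process time to handle SIGTERM lets it flush its files and close its connections before exiting.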
Identify inbound and outbound connections
Inbound refers to connections that come from elsewhere and arrive on the Centreon server; outbound refers to connections that originate from the Centreon server and go to a specific device.
In some situations, you will need to analyze network connections. Suppose that the Poller is not being updated. We know that the network flow from the Poller to the Central server uses the TCP protocol, and that the centreon-engine process uses port 5669 for this connection. We therefore need to determine the status of this network flow. To do this, we recommend using the ss command.
The ss command displays information about network sockets on a Linux system. With no option, ss displays a list of open non-listening sockets. Here we are looking for centengine connections, and we want a view of all TCP sockets, listening or not, together with the IDs of the processes that own them. So we run it with the -lpt -a flags:
[root@Centreon-Central ~]# ss -lpt -a | grep centengine
ESTAB 0 0 10.30.2.66:54254 10.30.2.108:5669 users:(("centengine",pid=21993,fd=14))
"ESTAB" means that the connection is established. The IP addresses must match those of the Poller experiencing the problem and of the Central server.
It is also possible to filter by port number or address. Consider this example: user actions, such as changes to the hosts configuration, are not taken into account. We know that the network flow for exporting the Centreon configuration runs from the Central server to the Poller and uses the ZMQ protocol on TCP port 5556. We therefore check the status of this connection by running:
[root@Centreon-Central ~]# ss -nat '( dport = :5556 or sport = :5556 )'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 10.30.2.108:42784 10.30.2.66:5556
We note that the connection is established. The next step is to view the gorgone logs.
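When several Pollers are involved, a count of sockets per state gives a quicker overview than reading each line. The sketch below summarizes ss output for one port; the function name count_by_state is hypothetical, and it assumes the column layout shown above (state in field 1, local address in field 4, peer address in field 5).

```shell
# Sketch: from `ss -nat` output on stdin, count the sockets in each state
# whose local or peer port equals PORT.
# Usage: ss -nat | count_by_state PORT
count_by_state() {
    awk -v port=":$1" 'NR > 1 && ($4 ~ (port "$") || $5 ~ (port "$")) { n[$1]++ }
        END { for (s in n) print s, n[s] }' | sort
}

# Usage on a live system:
#   ss -nat | count_by_state 5556
```

A Central server that shows no ESTAB line for the port of a given Poller points to a network or service problem on that flow.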
If you want to learn more about using the ss command correctly, run:
[root@Centreon-Central ~]# ss --help