How to limit disk I/O usage with RRDCacheD?

  • 21 February 2022



In this article, we will see how to reduce disk I/O when updating performance and status charts (.rrd files). To do that, RRDCacheD groups updates in a memory cache and writes them to disk in batches instead of one at a time.

 


Installation

RRDCacheD is automatically installed with the RRDtool RPM. If it is not the case, you can install it using the following command:

yum install rrdtool-cached


Configuration

Create the RRDCacheD user and create its directory:

mkdir -p /var/rrdtool/rrdcached/
useradd rrdcached -d '/var/rrdtool/rrdcached' -G centreon-broker,centreon -m
chown -R rrdcached: /var/rrdtool
chmod 775 -R /var/rrdtool

Add the users below to the following groups:

usermod -a -G rrdcached centreon-broker
usermod -a -G rrdcached apache
usermod -a -G centreon rrdcached
usermod -a -G centreon-broker rrdcached

Create and edit the file /etc/systemd/system/rrdcached.service:

cat > /etc/systemd/system/rrdcached.service <<EOF
[Unit]
Description=Data caching daemon for rrdtool

[Service]
Type=forking
User=rrdcached
PIDFile=/var/rrdtool/rrdcached/rrdcached.pid
ExecStart=/usr/bin/rrdcached -m 775 -s rrdcached -l unix:/var/rrdtool/rrdcached/rrdcached.sock -p /var/rrdtool/rrdcached/rrdcached.pid -b /var/rrdtool/rrdcached -w 3600 -z 3600 -f 7200

[Install]
WantedBy=default.target
EOF

 

The order of the options matters. If -m 775 is placed after -l unix:/var/rrdtool/rrdcached/rrdcached.sock, the socket will be created with the wrong permissions, because -m only applies to the -l options that follow it.
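
As a quick sanity check on option order, the sketch below tests whether -m appears before -l in a command line. The function name mode_before_listen is hypothetical, and the check is a plain word-position comparison, not a full parser of rrdcached options.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "-m" appears before "-l"
# in the given rrdcached command line (simple word-position check).
mode_before_listen() {
    m_pos=0; l_pos=0; i=0
    for word in $1; do
        i=$((i + 1))
        case "$word" in
            # Remember the position of the first occurrence of each option.
            -m) if [ "$m_pos" -eq 0 ]; then m_pos=$i; fi ;;
            -l) if [ "$l_pos" -eq 0 ]; then l_pos=$i; fi ;;
        esac
    done
    [ "$m_pos" -gt 0 ] && [ "$l_pos" -gt 0 ] && [ "$m_pos" -lt "$l_pos" ]
}
```

It could be called on the unit file, for example: mode_before_listen "$(grep '^ExecStart=' /etc/systemd/system/rrdcached.service)" && echo "order OK"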

 

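Options are as follows:

Option   Description
-w       Data is written to disk every x seconds (for instance, 3600 s is 1 h).
-z       Should be no greater than the -w value. RRDCacheD delays each RRD write by a random number of seconds in the range [0, z) so that the files are not all written at the same time.
-f       Interval, in seconds, at which the whole cache is searched for old values that have not been written yet; these values are then written to disk.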

 

Adjust these values to the needs and constraints of your platform.
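
To get a feel for the effect of -w, here is a back-of-the-envelope sketch. It assumes each RRD file receives one update every 300 seconds (a hypothetical check interval, not a value from this article) and ignores the -z delay and any forced flush. The function name updates_per_write is illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: rough number of updates grouped into one disk
# write per RRD file, ignoring the -z delay and forced flushes.
updates_per_write() {
    write_timeout=$1      # value passed to -w, in seconds
    update_interval=$2    # assumed seconds between two updates of one file
    echo $((write_timeout / update_interval))
}
```

Under these assumptions, updates_per_write 3600 300 prints 12: each file is written roughly once per hour instead of twelve times.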

Apply the changes:

systemctl daemon-reload

Enable RRDCacheD at boot:

systemctl enable rrdcached

Restart Apache and start RRDCacheD:

systemctl restart httpd24-httpd
systemctl start rrdcached

 

RRDCacheD configuration in Centreon web interface

 

Go to the Configuration>Pollers>Broker configuration menu and edit the configuration by enabling RRDCacheD and adding /var/rrdtool/rrdcached/rrdcached.sock to the “RRDCacheD listening socket/port” input box:

[Screenshot: Centreon web interface]

The socket path must be /var/rrdtool/rrdcached/rrdcached.sock to match the -l option set in /etc/systemd/system/rrdcached.service.

Go to the Configuration>Pollers menu and export your configuration, then restart centreon-broker:

systemctl restart cbd

 

Is it working?

 

Check that the PID file and the socket exist in /var/rrdtool/rrdcached/ with the expected permissions:

[root@central ~]$ ll /var/rrdtool/rrdcached/
total 4
-rw-r--r-- 1 rrdcached rrdcached 4 Feb 21 10:24 rrdcached.pid
srwxrwxr-x 1 rrdcached rrdcached 0 Feb 21 10:24 rrdcached.sock
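
The same check can be scripted. This is a minimal sketch that assumes GNU stat (for -c '%a'); the function name has_mode is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file at $1 has exactly the octal
# mode given in $2. Relies on GNU stat and its -c '%a' format.
has_mode() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}
```

For example: [ -S /var/rrdtool/rrdcached/rrdcached.sock ] && has_mode /var/rrdtool/rrdcached/rrdcached.sock 775 && echo "socket OK"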

 

With RRDCacheD, charts are no longer updated in real time, so some charts may show a small gap at the end. This means that the data is still in the daemon's cache: this is normal behavior.
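
Because the charts lag behind, one way to confirm that updates are reaching the daemon is to ask it for its counters with the STATS command of the rrdcached protocol. The sketch below only extracts one counter from that output; the function name stats_value is hypothetical, and using socat to talk to the socket is an assumption (it may not be installed on your platform).

```shell
#!/bin/sh
# Hypothetical helper: read rrdcached STATS output on stdin and print
# the value of one counter, for example UpdatesReceived.
stats_value() {
    awk -v key="$1" -F': ' '$1 == key { print $2 }'
}
```

Possible usage, assuming socat is available: printf 'STATS\nQUIT\n' | socat - UNIX-CONNECT:/var/rrdtool/rrdcached/rrdcached.sock | stats_value UpdatesReceived

If the value grows between two calls, the daemon is receiving updates even though the charts have not been refreshed yet.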

 

If the daemon crashes for any reason, the cached data is lost, and the only way to recover it is to rebuild the graphs with Centreon Broker.

 

If anything goes wrong, errors are logged in the Broker RRD log file in /var/log/centreon-broker/:

# RRDCached is stopped 

error: RRD: error while getting response from rrdcached: QLocalSocket: Remote closed

# Error in the socket name/configuration (check the information in /etc/sysconfig/rrdcached or /etc/systemd/system/rrdcached.service)

error: RRD: could not connect to local socket '/var/rrdtool/rrdcached/rrdcached.sock: QLocalSocket::connectToServer: Invalid name

# Error accessing the socket (probably a problem with rights, groups, mode...)

error: RRD: could not connect to local socket '/var/rrdtool/rrdcached/rrdcached.sock: QLocalSocket::connectToServer: Socket access error
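
To spot these messages quickly, a small filter like the one below can help. It is only a sketch: the function name rrdcached_errors is hypothetical, and the patterns cover just the three messages listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a Broker log file ($1) that
# match the rrdcached-related error messages listed above.
rrdcached_errors() {
    grep -E 'RRD: (error while getting response from rrdcached|could not connect to local socket)' "$1"
}
```

For example: rrdcached_errors /var/log/centreon-broker/central-rrd-master.log (the exact log file name depends on your Broker configuration).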

If the RRDCacheD process crashes, all the data held in the cache is lost!

 

Migration

 

After migrating a platform on which RRDCacheD was operational, there are a few things to check to confirm that RRDCacheD still works properly and to avoid issues in the future.

  1. Check the umask of the centreon-broker user on the new platform:
    1. su -c 'umask' -l centreon-broker

      If the result is 0002, you can skip the rest of the steps.

  2. Make sure that the rights of the folders listed below are 755:
    1. /var/lib/centreon/metrics
    2. /var/lib/centreon/status
  3. Make sure that the .rrd files inside the previously mentioned folders are 664.
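
The last step can be checked in bulk. A minimal sketch, with a hypothetical function name rrd_wrong_mode, that lists the .rrd files whose mode is not exactly 664:

```shell
#!/bin/sh
# Hypothetical helper: list the .rrd files under directory $1 whose
# permissions are not exactly 664.
rrd_wrong_mode() {
    find "$1" -type f -name '*.rrd' ! -perm 664
}
```

For example: rrd_wrong_mode /var/lib/centreon/metrics and rrd_wrong_mode /var/lib/centreon/status. No output means every file already has the expected mode.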

7 replies

Userlevel 4
Badge +13

https://oss.oetiker.ch/rrdtool/doc/rrdcached.en.html#HOW_IT_WORKS

Badge +1

Hello,

 

I am looking to use rrdcached in my HA installation.

I am wondering how to configure it inside the Pacemaker cluster.

Has anyone done it before?

 

Thanks for the help

Userlevel 2
Badge +4

Hi,

You can try adding the rrdcached daemon to Pacemaker as a resource and also adding it to the centreon group, as shown below:

pcs resource create rrdcached \
systemd:rrdcached \
meta multiple-active="stop_start" target-role="started" \
op start interval="0s" timeout="90s" stop interval="0s" timeout="90s" \
monitor interval="5s" timeout="30s" \
--group centreon

 

Badge +1

Hello,

Thanks for the answer

Since cbd-rrd is a clone resource, it is running on both servers.

As a simple resource, rrdcached would only run on the master. Shouldn't it run as a clone too?

Like this:

pcs resource create "rrdcache" systemd:rrdcached meta target-role="started" op start interval="0s" timeout="30s" stop interval="0s" timeout="30s" monitor interval="5s" timeout="30s" clone

I tried that on my test environment, but I have difficulty knowing whether it is actually working since there are very few RRD files in this environment.

Userlevel 2
Badge +4

Hi,

Indeed, it should be a cloned resource. I forgot this detail about cbd-rrd:

pcs resource create "rrdcache" \
systemd:rrdcached \
meta target-role="started" \
op start interval="0s" timeout="90s" \
stop interval="0s" timeout="90s" \
monitor interval="20s" timeout="30s" \
clone

To check if it is working, you can look for the pid, socket, and rights in /var/rrdtool/rrdcached/:

[root@central ~]$ ll /var/rrdtool/rrdcached/
total 4
-rw-r--r-- 1 rrdcached rrdcached 4 Feb 21 10:24 rrdcached.pid
srwxrwxr-x 1 rrdcached rrdcached 0 Feb 21 10:24 rrdcached.sock

 

Badge +1

Hi,

The rights seem right. The service is running, but how can I be sure it is actually doing something?

 

[root@central ~]# ll /var/rrdtool/rrdcached/
total 4
-rw-r--r-- 1 rrdcached rrdcached 6 Aug 21 14:51 rrdcached.pid
srwxrwxr-x 1 rrdcached rrdcached 0 Aug 21 14:51 rrdcached.sock

I tried setting the logs to debug in /etc/systemd/system/rrdcached.service and restarted the service, but there is still nothing in the cbd logs (/var/log/centreon-broker/central-broker-master.log and /var/log/centreon-broker/central-rrd-master.log).

Userlevel 2
Badge +4

Hello,

Only errors are written to the cbd log files, for example:

[2022-11-18T12:21:31.361+01:00] [core] [error] failover: global error: RRD: rrdcached query failed on file '/var/lib/centreon/metrics/2692073.rrd' (UPDATE /var/lib/centreon/metrics/2692073.rrd 1668639863:0.000000
): -1 RRD Error: '/var/lib/centreon/metrics/2692073.rrd' is too small (should be 150568 bytes)

The debug logs go to the system journal. You can read them with the command or in the log file below:

  • journalctl -u rrdcached
  • /var/log/messages

You will find logs similar to these:

Aug 23 11:27:37 ykacher-central systemd[1]: Starting Data caching daemon for rrdtool...
Aug 23 11:27:37 ykacher-central rrdcached[16895]: starting up
Aug 23 11:27:37 ykacher-central rrdcached[16895]: listening for connections
Aug 23 11:27:37 ykacher-central systemd[1]: Started Data caching daemon for rrdtool.
Aug 23 11:27:52 ykacher-central systemd[1]: Stopping Data caching daemon for rrdtool...
Aug 23 11:27:52 ykacher-central rrdcached[16895]: caught SIGTERM
Aug 23 11:27:52 ykacher-central rrdcached[16895]: signal_receiver: Signal 18 was received from process 1.
Aug 23 11:27:53 ykacher-central rrdcached[16895]: starting shutdown
Aug 23 11:27:53 ykacher-central rrdcached[16895]: clean shutdown; all RRDs flushed
Aug 23 11:27:53 ykacher-central rrdcached[16895]: goodbye
Aug 23 11:27:53 ykacher-central systemd[1]: rrdcached.service: Succeeded.
Aug 23 11:27:53 ykacher-central systemd[1]: Stopped Data caching daemon for rrdtool.

To have those messages in a specific log file, you can use the option below:

-o log_file

Log to the given file instead of syslog.

 
