
How to limit disk I/O usage with RRDCacheD?

  • 21 February 2022

SupportGuy


In this article, we will see how to reduce disk I/O when updating performance and status charts (.rrd files). To do that, RRDCacheD caches updates and writes them to disk in batches instead of writing each update as it arrives.

 


Installation

RRDCacheD is automatically installed with the RRDtool RPM. If it is missing, you can install it using the following command:

yum install rrdtool-cached


Configuration

Create the RRDCacheD user and create its directory:

mkdir -p /var/rrdtool/rrdcached/
useradd rrdcached -d '/var/rrdtool/rrdcached' -G centreon-broker,centreon -m
chown -R rrdcached: /var/rrdtool
chmod 775 -R /var/rrdtool

Add the users below to the following groups:

usermod -a -G rrdcached centreon-broker
usermod -a -G rrdcached apache
usermod -a -G centreon rrdcached
usermod -a -G centreon-broker rrdcached

Create and edit the file /etc/systemd/system/rrdcached.service:

cat > /etc/systemd/system/rrdcached.service <<EOF
[Unit]
Description=Data caching daemon for rrdtool

[Service]
Type=forking
User=rrdcached
PIDFile=/var/rrdtool/rrdcached/rrdcached.pid
ExecStart=/usr/bin/rrdcached -m 775 -s rrdcached -l unix:/var/rrdtool/rrdcached/rrdcached.sock -p /var/rrdtool/rrdcached/rrdcached.pid -b /var/rrdtool/rrdcached -w 3600 -z 3600 -f 7200

[Install]
WantedBy=default.target
EOF

 

The order of the options is very important. If -m 775 is placed after -l unix:/var/rrdtool/rrdcached/rrdcached.sock, the socket will be created with the wrong permissions.
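The ordering rule above can be checked mechanically. The following is a minimal sketch, not part of RRDCacheD: check_option_order is a hypothetical helper that only looks at whether -m appears before the first -l in a command line passed as a string.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with rrdcached): report whether -m
# appears before the first -l in the given command line.
check_option_order() {
    seen_m=0
    # Intentionally unquoted so the command line is split into words.
    for arg in $1; do
        case "$arg" in
            -m) seen_m=1 ;;
            -l)
                if [ "$seen_m" -eq 1 ]; then
                    echo "ok: -m precedes -l"
                    return 0
                fi
                echo "warning: -l appears before -m"
                return 1
                ;;
        esac
    done
    echo "warning: no -l option found"
    return 1
}
```

For instance, you could pass it the ExecStart line of the unit file: check_option_order "$(grep '^ExecStart=' /etc/systemd/system/rrdcached.service)".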

 

Options are as follows:

Option    Description
-w        Data is written to disk every x seconds (for instance, 3600 s represents 1 h).
-z        Should not be greater than the -w value. RRDCacheD delays each write by a random value in the range [0:-z] so that the RRDs are not all written at the same time.
-f        Timeout, in seconds, after which values still in the cache are written to disk.

 

Adjust the option values to the needs and constraints of your platform.
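When choosing these values, it helps to see roughly how long an update can stay in memory before it reaches the disk. The sketch below is only an estimate under a stated assumption: the worst case is taken as the -w value plus the -z random delay. check_timers is a hypothetical helper, not an RRDCacheD command.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check the -w and -z values and print a rough
# upper bound on how long an update may sit in the cache.
# Assumption: worst case is about (-w) + (-z) seconds.
check_timers() {
    w=$1
    z=$2
    if [ "$z" -gt "$w" ]; then
        echo "warning: -z ($z) is greater than -w ($w)"
        return 1
    fi
    echo "ok: updates may stay in cache for up to about $((w + z)) seconds"
}
```

With the values used in this article, check_timers 3600 3600 reports about 7200 seconds, which is also the amount of recent data you may lose if the daemon crashes.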

Apply the changes:

systemctl daemon-reload

Enable RRDCacheD to start at boot:

systemctl enable rrdcached

Restart Apache and start RRDCacheD:

systemctl restart httpd24-httpd
systemctl start rrdcached

 

RRDCacheD configuration in Centreon web interface

 

Go to the Configuration>Pollers>Broker configuration menu and edit the configuration by enabling RRDCacheD and adding /var/rrdtool/rrdcached/rrdcached.sock to the “RRDCacheD listening socket/port” input box:


The socket path must be /var/rrdtool/rrdcached/rrdcached.sock, to match the path set in the rrdcached.service unit file.

Go to the Configuration>Pollers menu and export your configuration, then restart centreon-broker:

systemctl restart cbd

 

Is it working?

 

Check that the PID file and the socket exist in /var/rrdtool/rrdcached/ with the expected permissions:

[root@central ~]$ ll /var/rrdtool/rrdcached/
total 4
-rw-r--r-- 1 rrdcached rrdcached 4 Feb 21 10:24 rrdcached.pid
srwxrwxr-x 1 rrdcached rrdcached 0 Feb 21 10:24 rrdcached.sock

 

Because RRDCacheD delays writes, charts are not updated in real time, so a small gap may appear on some charts. This means that the data is still in the daemon's cache: this is normal behavior.
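To confirm that the daemon is receiving updates while they are still cached, you can query its counters over the socket with the STATS command of the RRDCacheD protocol. The sketch below assumes that socat is installed (it is not mentioned elsewhere in this article); stats_value is a hypothetical helper that extracts one counter from the STATS output read on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one counter from the output of
# the rrdcached STATS command, read on stdin ("Name: value" lines).
stats_value() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Example (assumes socat is installed and the socket path used above):
#   echo STATS | socat - UNIX-CONNECT:/var/rrdtool/rrdcached/rrdcached.sock \
#       | stats_value UpdatesReceived
```

If UpdatesReceived increases between two calls, Centreon Broker is sending data to the daemon, even if UpdatesWritten has not moved yet.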

 

If the daemon crashes for any reason, data will be lost and there will be no way to get them back except by rebuilding the graphs with Centreon-Broker.

 

If there is an issue, errors are logged in the Broker RRD log file in /var/log/centreon-broker/:

# RRDCached is stopped 

error: RRD: error while getting response from rrdcached: QLocalSocket: Remote closed

# Error in the socket name/configuration (check the information in /etc/sysconfig/rrdcached or /etc/systemd/system/rrdcached.service)

error: RRD: could not connect to local socket '/var/rrdtool/rrdcached/rrdcached.sock: QLocalSocket::connectToServer: Invalid name

# Error accessing the socket (probably a problem with permissions, groups, mode...)

error: RRD: could not connect to local socket '/var/rrdtool/rrdcached/rrdcached.sock: QLocalSocket::connectToServer: Socket access error
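The three error signatures above can be counted in one pass over a log file. This is a minimal sketch: classify_rrd_errors is a hypothetical helper, and the log file name depends on your Broker configuration.

```shell
#!/bin/sh
# Hypothetical helper: count the three rrdcached-related error signatures
# shown above in a Broker RRD log file given as the first argument.
classify_rrd_errors() {
    log=$1
    printf 'daemon stopped: %s\n' "$(grep -c 'Remote closed' "$log")"
    printf 'bad socket name: %s\n' "$(grep -c 'Invalid name' "$log")"
    printf 'socket access: %s\n' "$(grep -c 'Socket access error' "$log")"
}

# Example (the file name is an assumption, adapt it to your setup):
#   classify_rrd_errors /var/log/centreon-broker/central-rrd-master.log
```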

If the RRDCacheD process crashes, all the data in the cache will be lost!

 

Migration

 

After migrating a platform on which RRDCacheD was operational, check the following points to confirm that it works correctly and to avoid issues in the future.

  1. Check the umask of the centreon-broker user on the new platform:

       su -c 'umask' -l centreon-broker

     If the result is 0002, you can skip the rest of the steps.

  2. Make sure that the permissions of the folders listed below are 755:
    1. /var/lib/centreon/metrics
    2. /var/lib/centreon/status
  3. Make sure that the permissions of the .rrd files inside the previously mentioned folders are 664.
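
Steps 2 and 3 can be checked with find. The sketch below only lists what deviates from the expected modes and changes nothing; find_bad_perms is a hypothetical helper, and -maxdepth assumes GNU or BSD find.

```shell
#!/bin/sh
# Hypothetical helper: list what deviates from the expected permissions
# in one RRD folder (folder itself: 755, .rrd files inside it: 664).
find_bad_perms() {
    dir=$1
    # Print the folder itself if its mode is not exactly 755.
    find "$dir" -maxdepth 0 -type d ! -perm 755
    # Print every .rrd file directly inside it whose mode is not exactly 664.
    find "$dir" -maxdepth 1 -type f -name '*.rrd' ! -perm 664
}

# Example:
#   for d in /var/lib/centreon/metrics /var/lib/centreon/status; do
#       find_bad_perms "$d"
#   done
```

An empty output means that the folder and its .rrd files already have the expected permissions.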

ponchoh
Centreonian
  • Centreonian
  • August 3, 2023

  • Steward *
  • August 21, 2023

Hello,

 

I am looking to use rrdcached in my HA installation.

I am wondering how to configure it inside the Pacemaker cluster.

Has anyone done it before?

 

Thanks for the help


SupportGuy

Hi,

You can try adding the rrdcached daemon to Pacemaker as a resource and adding it to the centreon group, as shown below:

pcs resource create rrdcached \
    systemd:rrdcached \
    meta multiple-active="stop_start" target-role="started" \
    op start interval="0s" timeout="90s" stop interval="0s" timeout="90s" \
    monitor interval="5s" timeout="30s" \
    --group centreon

 


  • Steward *
  • August 22, 2023

Hello,

Thanks for the answer

Since cbd-rrd is a clone resource, it is running on both servers.

As a simple resource, rrdcached would only run on the master. Should it not run as a clone too?

Like this:

pcs resource create "rrdcache" systemd:rrdcached meta target-role="started" op start interval="0s" timeout="30s" stop interval="0s" timeout="30s" monitor interval="5s" timeout="30s" clone

I tried that on my test environment, but I have difficulty knowing whether it is actually working, since there are very few RRD files in this environment.


SupportGuy

Hi,

Indeed, it should be a cloned resource; I forgot this detail about cbd-rrd:

pcs resource create "rrdcache" \
    systemd:rrdcached \
    meta target-role="started" \
    op start interval="0s" timeout="90s" \
    stop interval="0s" timeout="90s" \
    monitor interval="20s" timeout="30s" \
    clone

To check whether it is working, you can look at the PID file, the socket, and their permissions in /var/rrdtool/rrdcached/:

[root@central ~]$ ll /var/rrdtool/rrdcached/
total 4
-rw-r--r-- 1 rrdcached rrdcached 4 Feb 21 10:24 rrdcached.pid
srwxrwxr-x 1 rrdcached rrdcached 0 Feb 21 10:24 rrdcached.sock

 


  • Steward *
  • August 23, 2023

Hi,

The permissions seem right. The service is running, but how can I be sure it is actually doing something?

 

[root@central ~]# ll /var/rrdtool/rrdcached/
total 4
-rw-r--r-- 1 rrdcached rrdcached 6 Aug 21 14:51 rrdcached.pid
srwxrwxr-x 1 rrdcached rrdcached 0 Aug 21 14:51 rrdcached.sock

I tried setting the logs to debug in /etc/systemd/system/rrdcached.service and restarted the service, but there is still nothing in the cbd logs (/var/log/centreon-broker/central-broker-master.log and /var/log/centreon-broker/central-rrd-master.log).


SupportGuy

Hello,

Only errors appear in the cbd log files, for example:

[2022-11-18T12:21:31.361+01:00] [core] [error] failover: global error: RRD: rrdcached query failed on file '/var/lib/centreon/metrics/2692073.rrd' (UPDATE /var/lib/centreon/metrics/2692073.rrd 1668639863:0.000000
): -1 RRD Error: '/var/lib/centreon/metrics/2692073.rrd' is too small (should be 150568 bytes)

The debug logs are in the system journal, which you can read with the command or in the log file below:

  • journalctl -u rrdcached
  • /var/log/messages

You will find logs similar to these:

Aug 23 11:27:37 ykacher-central systemd[1]: Starting Data caching daemon for rrdtool...
Aug 23 11:27:37 ykacher-central rrdcached[16895]: starting up
Aug 23 11:27:37 ykacher-central rrdcached[16895]: listening for connections
Aug 23 11:27:37 ykacher-central systemd[1]: Started Data caching daemon for rrdtool.
Aug 23 11:27:52 ykacher-central systemd[1]: Stopping Data caching daemon for rrdtool...
Aug 23 11:27:52 ykacher-central rrdcached[16895]: caught SIGTERM
Aug 23 11:27:52 ykacher-central rrdcached[16895]: signal_receiver: Signal 18 was received from process 1.
Aug 23 11:27:53 ykacher-central rrdcached[16895]: starting shutdown
Aug 23 11:27:53 ykacher-central rrdcached[16895]: clean shutdown; all RRDs flushed
Aug 23 11:27:53 ykacher-central rrdcached[16895]: goodbye
Aug 23 11:27:53 ykacher-central systemd[1]: rrdcached.service: Succeeded.
Aug 23 11:27:53 ykacher-central systemd[1]: Stopped Data caching daemon for rrdtool.

To send those messages to a specific log file, you can use the option below:

-o log_file

Log to the given file instead of syslog.

 

