
Hi All,

We use WSMAN authentication via Kerberos on our pollers. Cron is running as expected, but the services are still not renewing after the Kerberos ticket expires. Some of our pollers work but others do not, even though their krb5.conf is the same. The cron logs show that it runs the kinit -R command, but the WSMAN services are not renewing automatically.

It works when we renew the Kerberos ticket manually. Our setup uses the WSMAN domain config.

 

Thank you in advance for your inputs/suggestions.

 

 

Does that poller have a different OS than the others (e.g. Alma 9 or Debian)?


Hi @rchauvel  


They have the same OS: CentOS Linux 7 (Core). We can't find any logs for Kerberos. Also, our current setup was working before, and for some unknown reason it is no longer renewing automatically via cron.


Hi @kimpoy0730 

 

Could you share your cron in case there's a typo?

Do you get any error message when running kinit -R manually as the centreon-engine user?

su - centreon-engine
kinit -R

 


Hi @tpo76 

We don't have any issue when renewing manually. Below is our cron.

This config has been working for the last few months. We're not sure why it's not working now.

 

Thanks.


Thank you for that information.

 

Could you make sure your “/var/lib/centreon-engine/wsman_renew.sh” script is still working?

The “kinit -R” command cannot work if the ticket lifetime is not reset, and since the script you are using was not provided by Centreon, it could be a point of failure.


Hi @tpo76 ,

Yes, we know it works, because that script is the one we run to get a fresh ticket lifetime. It points to the location of the keytab:

#!/bin/bash
kinit -k -t /var/lib/centreon-engine/username@domain.keytab username@domain
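One possible way to combine both approaches is to try a renewal first and only fall back to the keytab when the renewal fails. This is a minimal sketch that assumes MIT Kerberos and reuses the keytab path and principal placeholders from the script above. The function name `renew_or_reinit` is hypothetical.

```shell
#!/bin/bash
# Sketch: renew the existing ticket if possible, otherwise fall back to a
# fresh ticket from the keytab. KEYTAB and PRINCIPAL are placeholders copied
# from the script above (assumption: adjust them to the real values).
KEYTAB="${KEYTAB:-/var/lib/centreon-engine/username@domain.keytab}"
PRINCIPAL="${PRINCIPAL:-username@domain}"

renew_or_reinit() {
    local output
    # First choice: renew the ticket already in the default cache.
    if output=$(kinit -R 2>&1); then
        echo "renewed"
        return 0
    fi
    # Keep the reason the renewal failed; it helps troubleshooting.
    echo "kinit -R failed: $output" >&2
    # Fallback: obtain a new ticket non-interactively from the keytab.
    if output=$(kinit -k -t "$KEYTAB" "$PRINCIPAL" 2>&1); then
        echo "reinitialized from keytab"
        return 0
    fi
    echo "kinit -k -t failed: $output" >&2
    return 1
}
```

If you call `renew_or_reinit` from the cron script, it keeps the current ticket while it is still renewable and otherwise gets a new one from the keytab. It also records on stderr why the renewal failed.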


Okay,

I'm not sure what can fail if the manual execution works. From my point of view, it means something is not working from the cron “perspective”.

What we can do is create a script that runs the “kinit -R” command and logs what happens.

Here is an example:

Create a “/var/lib/centreon-engine/cron_kinit.sh” file and make it executable (chmod +x).

Copy/paste the following:

#!/bin/bash

# Function to log messages with timestamps
log_message() {
    echo "$(date +"%Y-%m-%d %H:%M:%S") - $1"
}

# Run the kinit -R command and capture its output (stdout and stderr)
kinit_output=$(kinit -R 2>&1)
# Save kinit's exit status before another command overwrites $?
kinit_status=$?

# Check the exit status
if [ "$kinit_status" -eq 0 ]; then
    log_message "Command run successfully"
else
    log_message "Error: kinit command encountered an issue"
    log_message "Error message: $kinit_output"
fi

 

Then change your cron to this configuration:

0 */9 * * *  centreon-engine /var/lib/centreon-engine/cron_kinit.sh >> /var/log/cron_kinit.log 2>&1

 

When cron runs, the log shows whether things went well. If they did not, it gives more detail about the error.

Example output:

tail /var/log/cron_kinit.log
2023-08-31 16:59:01 - Error: kinit command encountered an issue
2023-08-31 16:59:01 - Error message: kinit: Ticket expired while renewing credentials
2023-08-31 17:07:50 - Command run successfully
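The manual run works and the cron run does not, so it can also help to check whether both runs look at the same credential cache. Cron starts jobs with a minimal environment, so anything a login shell would set is missing. Below is a sketch of a hypothetical `log_kerberos_env` function that could be called at the top of cron_kinit.sh. The choice of variables is an assumption. KRB5CCNAME and KRB5_CONFIG are the standard MIT Kerberos environment variables.

```shell
#!/bin/bash
# Sketch: record who runs the job and which Kerberos-related environment
# variables are set, so a cron run can be compared with a manual run.
log_kerberos_env() {
    local ts
    ts=$(date +"%Y-%m-%d %H:%M:%S")
    echo "$ts - user=$(id -un) uid=$(id -u)"
    # When KRB5CCNAME is unset, default_ccache_name from krb5.conf is used.
    echo "$ts - KRB5CCNAME=${KRB5CCNAME:-<unset>}"
    # When KRB5_CONFIG is unset, the default /etc/krb5.conf is read.
    echo "$ts - KRB5_CONFIG=${KRB5_CONFIG:-<unset>}"
    echo "$ts - PATH=$PATH"
}
```

Run it once from cron and once after `su - centreon-engine`, then compare the lines. This shows whether both runs use the same user and cache settings.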

 


Hi @tpo76 ,

I'll try this out on one of our pollers and update here.

Thanks again for the suggestion.


Hi @tpo76, here are the logs we got. Note that we set the script to make up to 3 attempts, to get more information on the errors.

 

For the 00:00 logs, we configured the kinit command as below:

kinit -k -t /var/lib/centreon-engine/username@domain.keytab username@domain

This works, but we get “Disk quota exceeded when initializing cache” on the first attempt, and the Kerberos ticket is only obtained on the 2nd attempt. Also, this resets the ticket lifetime, which is not what we expect from a Kerberos renewal.

 

For the 08:00 logs, we configured the kinit command as below:

kinit -R

This does not work, so we changed the kinit -R command to the following:

kinit -R -k -t /var/lib/centreon-engine/USER@DOMAIN.keytab

We will check again in 8 hours to see if this works.

--------------------------------------------------------------------------------------------

2023-09-11 00:00:01 - Error: kinit command encountered an issue
2023-09-11 00:00:01 - Error message: kinit: Disk quota exceeded when initializing cache
2023-09-11 00:00:01 - Retrying renewal (Attempt 1)...
2023-09-11 00:00:03 - Command run successfully


2023-09-11 08:00:01 - Error: kinit command encountered an issue
2023-09-11 08:00:01 - Error message: kinit: Disk quota exceeded when initializing cache
2023-09-11 08:00:01 - Retrying renewal (Attempt 1)...
kinit: Can't find client principal USER@DOMAIN in cache collection while renewing credentials
2023-09-11 08:00:03 - Error: kinit command encountered an issue
2023-09-11 08:00:03 - Error message: kinit: Can't find client principal centreon-engine@LOGISTICS.CORP in cache collection while renewing credentials
2023-09-11 08:00:03 - Retrying renewal (Attempt 2)...
kinit: Can't find client principal USER@DOMAIN in cache collection while renewing credentials
2023-09-11 08:00:05 - Error: kinit command encountered an issue
2023-09-11 08:00:05 - Error message: kinit: Can't find client principal USER@DOMAIN in cache collection while renewing credentials
2023-09-11 08:00:05 - Max renewal attempts reached. Exiting.
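The retry version of the script was not posted. As an illustration only, a loop like the hypothetical sketch below would produce log lines of the same shape. The 3 attempts come from the post. The 2-second delay is an assumption based on the timestamps. Unlike the logs above, this sketch captures kinit's stderr, so the error does not also appear as a separate raw line.

```shell
#!/bin/bash
# Hypothetical sketch of a renewal script with retries; not the script
# used in this thread. MAX_ATTEMPTS and RETRY_DELAY are assumptions.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"
RETRY_DELAY="${RETRY_DELAY:-2}"   # seconds between attempts

log_message() {
    echo "$(date +"%Y-%m-%d %H:%M:%S") - $1"
}

renew_with_retries() {
    local attempt=1 output
    while true; do
        # The assignment's exit status is kinit's exit status.
        if output=$(kinit -R 2>&1); then
            log_message "Command run successfully"
            return 0
        fi
        log_message "Error: kinit command encountered an issue"
        log_message "Error message: $output"
        if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
            log_message "Max renewal attempts reached. Exiting."
            return 1
        fi
        log_message "Retrying renewal (Attempt $attempt)..."
        attempt=$((attempt + 1))
        sleep "$RETRY_DELAY"
    done
}
```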


 

 


Hi @kimpoy0730 ,

 

The error message you provided:

kinit: Can't find client principal centreon-engine@LOGISTICS.CORP in cache collection while renewing credentials 

indicates that kinit is unable to find the specified Kerberos principal (centreon-engine@LOGISTICS.CORP) in the Kerberos credential cache while attempting to renew credentials. This error typically occurs when the client's credential cache does not contain the required principal for renewal.

Here are some possible reasons for this error and steps to resolve it:

Kerberos Principal Mismatch: Double-check that the principal name centreon-engine@LOGISTICS.CORP is correct. Make sure it matches the exact principal for which you want to renew the credentials. Check for any typos or case sensitivity issues.

Expired Credentials: If the credentials associated with the centreon-engine@LOGISTICS.CORP principal have already expired, you won't be able to renew them. You can only renew credentials within a certain timeframe before they expire. You may need to obtain a new initial ticket with kinit instead of renewing.

Incorrect Cache: Verify that you are using the correct cache collection. The Kerberos credential cache can be stored in different locations or with different names, depending on your configuration. Ensure that kinit is using the correct cache.

Cache Corruption: If the cache itself is corrupted, it can cause this error. You can try deleting the existing cache and then using kinit to obtain a new ticket.

For example:

kdestroy # Destroy the existing cache
kinit # Obtain a new ticket

Cache Permissions: Check the permissions on the Kerberos credential cache files. The cache files should be accessible to the user running the kinit command.

Kerberos Configuration: Verify that your Kerberos configuration (/etc/krb5.conf or equivalent) is correctly configured with the correct realm (LOGISTICS.CORP) and Kerberos server settings.

Network Issues: Ensure that your client machine can communicate with the Kerberos Key Distribution Center (KDC). Network issues can prevent the renewal process.

Principal Not Cached: If the principal centreon-engine@LOGISTICS.CORP was never cached on the client machine, you cannot renew it. You would need to initially obtain the ticket using kinit.

Check Ticket Lifetime: Confirm the ticket's lifetime and renewal policy in your Kerberos realm. If renewals are not allowed beyond a certain time limit, you might need to obtain a new ticket.

Debugging: You can use the -V option with kinit for more verbose output to help diagnose the issue:

kinit -V
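For the "cache collection" error in particular, it helps to see which caches exist and what kinit actually does. The sketch below assumes MIT Kerberos, the CentOS 7 default. In it, `klist -l` lists the caches in the collection, `klist -s` only sets the exit status (non-zero when the cache cannot be read or is expired), and `KRB5_TRACE` makes the library print a trace of the renewal attempt. The function name `diagnose_kerberos` is hypothetical.

```shell
#!/bin/bash
# Sketch of a diagnostic run for the current user (run it as centreon-engine).
# Assumes the MIT Kerberos client tools.
diagnose_kerberos() {
    echo "== caches in the collection (klist -l) =="
    klist -l 2>&1
    # klist -s prints nothing; its exit status tells if the cache is usable.
    if klist -s 2>/dev/null; then
        echo "== default cache holds a valid ticket =="
    else
        echo "== default cache missing or expired =="
    fi
    echo "== traced renewal (KRB5_TRACE) =="
    KRB5_TRACE=/dev/stderr kinit -R 2>&1
}
```

Running `diagnose_kerberos` both by hand and from cron shows whether the default cache is the one holding the ticket. It also shows which cache kinit -R tries to renew.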

 

Also, the quota error on the first attempt looks weird to me.

It could be a sign of a corrupted cache.

Is your Centreon environment on a specific hosting service (AWS, Azure, other), or is this host on a regular VMware environment?

Is anything displayed when running quota -v as the centreon-engine user?

 

Unfortunately, this is not Centreon related and my knowledge of Kerberos is limited, so I have no other ideas of what you could check to make it work again.


Thank you for your inputs @tpo76.

Our poller is in a normal VM environment, and the quota -v command does not display anything for the centreon-engine user. We'll investigate this further and share the resolution here in case someone has the same issue.

 


Hi @kimpoy0730 

You are welcome!

The quota error, and the situation overall, is indeed really strange.

Please let us know! That is an interesting case.

 


Hi All,

I think we found the root cause of the issue. It has to do with the Linux kernel keyrings cache:

https://man7.org/linux/man-pages/man7/keyrings.7.html#NAME

One of our pollers does not get any error when running the kinit command.

kinit logs

2023-09-12 16:00:03 - Command run successfully
2023-09-13 00:00:02 - Command run successfully
2023-09-13 08:00:01 - Command run successfully
2023-09-13 16:00:02 - Command run successfully
2023-09-14 00:00:02 - Command run successfully

 

klist output

Ticket cache: KEYRING:persistent:993:993
Default principal: USERNAME@REALM


[root@serverpoller ~]# cat /proc/key-users

    0:    21 20/20 9/1000000 100/25000000
  991:     9 9/9 8/200 229/20000
  993:   192 192/192 191/200 13010/20000
  994:     2 2/2 2/200 26/20000
899266518:     3 3/3 2/200 58/20000

The fields shown in each line are as follows:

uid: The user ID.
usage: This is a kernel-internal usage count for the kernel structure used to record key users.
nkeys/nikeys: The total number of keys owned by the user, and the number of those keys that have been instantiated.
qnkeys/maxkeys: The number of keys owned by the user, and the maximum number of keys that the user may own.
qnbytes/maxbytes: The number of bytes consumed in payloads of the keys owned by this user, and the upper limit on the number of bytes in key payloads for that user.

Our uid (993) is at the limit of keys it may own: the maximum is 200, and 191 are already in use in the output above. That's why we get the following error when running the kinit -R command:

kinit: Disk quota exceeded when initializing cache
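To spot this condition before kinit fails, you can check the /proc/key-users fields described above directly. The sketch below uses a 90% threshold, which is an arbitrary choice, and the function name `check_key_quota` is hypothetical.

```shell
#!/bin/bash
# Sketch: flag users whose key count or key payload bytes in /proc/key-users
# are at or above a percentage of their per-user limit.
# Line format: "uid: usage nkeys/nikeys qnkeys/maxkeys qnbytes/maxbytes"
check_key_quota() {
    local file="${1:-/proc/key-users}" threshold="${2:-90}"
    awk -v t="$threshold" '
    {
        uid = $1
        sub(/:$/, "", uid)
        split($4, k, "/")   # k[1] = qnkeys, k[2] = maxkeys
        split($5, b, "/")   # b[1] = qnbytes, b[2] = maxbytes
        if (k[2] + 0 > 0 && k[1] * 100 >= k[2] * t)
            printf "uid %s: keys %s/%s\n", uid, k[1], k[2]
        if (b[2] + 0 > 0 && b[1] * 100 >= b[2] * t)
            printf "uid %s: bytes %s/%s\n", uid, b[1], b[2]
    }' "$file"
}
```

Without arguments, it reads /proc/key-users with the 90% threshold. On the listing posted above, it would report only `uid 993: keys 191/200`.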

 

Hope this helps.

 

Thanks. 

 

 


Hi @kimpoy0730 

 

I think you found it !

After running further tests in my lab, I can see that a new key is generated in the cache each time I connect to a new server.

This means the kernel limits how many servers can be monitored (200 keys by default) and also makes ticket renewal fail, at least if I'm getting that right.

 

I'll try to find how to extend this value.

 


EDIT: adding the following value to “/etc/sysctl.conf” seems to work:

kernel.keys.maxkeys = 1000
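When you raise `kernel.keys.maxkeys`, keep in mind that keyrings(7) documents a second per-user limit, `kernel.keys.maxbytes` (20000 in the output above). With the numbers posted for uid 993 (191 keys using 13010 bytes, roughly 68 bytes per key), 1000 keys would need around 68,000 bytes, so `maxbytes` may have to be raised as well. Below is a rough estimate as a sketch, assuming the average key size stays constant. The function name `estimate_maxbytes` and the sysctl values in the comments are illustrative.

```shell
#!/bin/bash
# Sketch: estimate the kernel.keys.maxbytes value needed for a target number
# of keys, assuming the average payload size per key stays the same.
# Arguments: bytes currently used, keys currently owned, target key count.
estimate_maxbytes() {
    local used_bytes=$1 used_keys=$2 target_keys=$3
    # Integer ceiling of used_bytes * target_keys / used_keys
    echo $(( (used_bytes * target_keys + used_keys - 1) / used_keys ))
}

# Both limits could then be persisted in /etc/sysctl.conf, for example
# (illustrative values, with headroom above the estimate):
#   kernel.keys.maxkeys = 1000
#   kernel.keys.maxbytes = 100000
# and loaded without a reboot with: sysctl -p
```

`estimate_maxbytes 13010 191 1000` prints 68116. How much headroom to leave above that is a judgment call.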

But on my end, exceeding this limit doesn't seem to have any impact.

The difference is that I'm using the KCM credential cache:

klist 
Ticket cache: KCM:991
Default principal: sa_centreon@CONTOSO.LOCAL

cat /proc/key-users
0: 59 58/58 48/1000000 940/25000000
990: 1 1/1 1/1 9/20000
991: 1 1/1 1/1 9/20000
994: 1 1/1 1/1 9/20000
1000: 2 2/2 2/1 14/20000

 

