We use WSMAN authentication via Kerberos on our pollers. Cron is running as expected, but the services still do not renew after the Kerberos ticket expires. Some of our pollers work and others do not, although krb5.conf is the same on all of them. We checked the cron logs: the kinit -R command runs, but the WSMAN services are not renewed automatically.
It works when we renew the Kerberos ticket manually. Our setup uses the WSMAN domain configuration.
Thank you in advance for your input and suggestions.
Does that poller have a different OS than the others (e.g. Alma 9 or Debian)?
Hi @rchauvel
They have the same OS: CentOS Linux 7 (Core). We can’t find any logs for Kerberos. By the way, our current setup was working before, and for some unknown reason it no longer renews automatically via cron.
Hi @kimpoy0730
Could you share your cron entry in case there’s a typo?
Do you get any error message when running kinit -R manually as the centreon-engine user?
su - centreon-engine
kinit -R
Hi @tpo76
We don’t have any issue when renewing manually. Our cron is below.
This configuration has been working for the last few months. We are not sure why it is not working now.
Thanks.
Thank you for that information.
Could you make sure your “/var/lib/centreon-engine/wsman_renew.sh” script is still working?
The “kinit -R” command can’t work if the ticket lifetime is not reset, and since Centreon did not provide the script you’re using, it could be a point of failure.
Hi @tpo76 ,
Yes, we know it is working because that is the script we run to obtain the ticket lifetime. It is also where the keytab location is defined.
I’m not sure what can fail if the manual execution works. From my point of view, it means something is not working from a cron “perspective”.
Maybe we can create a script that executes the “kinit -R” command and logs what is happening.
Here is an example:
Create a “/var/lib/centreon-engine/cron_kinit.sh” file.
Copy and paste the following into it:
#!/bin/bash

# Function to log messages with timestamps
log_message() {
    echo "$(date +"%Y-%m-%d %H:%M:%S") - $1"
}

# Run kinit -R command and capture output
kinit_output=$(kinit -R 2>&1)

# Check the exit status
if [ $? -eq 0 ]; then
    log_message "Command run successfully"
else
    log_message "Error: kinit command encountered an issue"
    log_message "Error message: $kinit_output"
fi
The script is working, but we get “Disk quota exceeded when initializing cache” on the first attempt, and then the Kerberos ticket is renewed successfully on the second attempt. However, this resets the ticket lifetime, which is not expected for a Kerberos renewal.
For the 08:00 log entries, the kinit command was configured as below:
kinit -R
That is not working, so we changed the kinit -R command to the one below and will check again in 8 hours whether it works:
kinit -R -k -t /var/lib/centreon-engine/USER@DOMAIN.keytab
2023-09-11 00:00:01 - Error: kinit command encountered an issue
2023-09-11 00:00:01 - Error message: kinit: Disk quota exceeded when initializing cache
2023-09-11 00:00:01 - Retrying renewal (Attempt 1)...
2023-09-11 00:00:03 - Command run successfully

2023-09-11 08:00:01 - Error: kinit command encountered an issue
2023-09-11 08:00:01 - Error message: kinit: Disk quota exceeded when initializing cache
2023-09-11 08:00:01 - Retrying renewal (Attempt 1)...
kinit: Can't find client principal USER@DOMAIN in cache collection while renewing credentials
2023-09-11 08:00:03 - Error: kinit command encountered an issue
2023-09-11 08:00:03 - Error message: kinit: Can't find client principal centreon-engine@LOGISTICS.CORP in cache collection while renewing credentials
2023-09-11 08:00:03 - Retrying renewal (Attempt 2)...
kinit: Can't find client principal USER@DOMAIN in cache collection while renewing credentials
2023-09-11 08:00:05 - Error: kinit command encountered an issue
2023-09-11 08:00:05 - Error message: kinit: Can't find client principal USER@DOMAIN in cache collection while renewing credentials
2023-09-11 08:00:05 - Max renewal attempts reached. Exiting.
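A renewal with kinit -R needs a renewable ticket that is already in the cache, while a keytab login obtains a brand new one. A minimal sketch of a cron wrapper that tries the renewal first and only then falls back to the keytab could look like this. The keytab path and principal are the placeholders used in this thread, and the KINIT variable and the --run flag are hypothetical additions so the logic can be tried without touching a real cache.

```shell
#!/bin/bash
# Sketch: renew the ticket, and fall back to a keytab login if renewal fails.
# KEYTAB and PRINCIPAL are the placeholders from this thread; adjust them.
KEYTAB="${KEYTAB:-/var/lib/centreon-engine/USER@DOMAIN.keytab}"
PRINCIPAL="${PRINCIPAL:-USER@DOMAIN}"
KINIT="${KINIT:-kinit}"   # hypothetical override, handy for dry runs

log_message() {
    echo "$(date +"%Y-%m-%d %H:%M:%S") - $1"
}

renew_or_login() {
    local output
    # First try a plain renewal of the ticket already in the cache.
    if output=$("$KINIT" -R 2>&1); then
        log_message "Ticket renewed"
        return 0
    fi
    log_message "Renewal failed: $output"
    # No renewable ticket available, so request a new one from the keytab.
    if output=$("$KINIT" -k -t "$KEYTAB" "$PRINCIPAL" 2>&1); then
        log_message "New ticket obtained from keytab"
        return 0
    fi
    log_message "Keytab login failed: $output"
    return 1
}

# Only act when asked to, so the file can also be sourced for testing.
if [ "${1:-}" = "--run" ]; then
    renew_or_login
fi
```

From cron, the script would be called with --run and its output appended to a log file, so both the renewal error and the fallback result end up in one place.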
Hi @kimpoy0730 ,
The error message you provided:
kinit: Can't find client principal centreon-engine@LOGISTICS.CORP in cache collection while renewing credentials
indicates that kinit is unable to find the specified Kerberos principal (centreon-engine@LOGISTICS.CORP) in the Kerberos credential cache while attempting to renew credentials. This error typically occurs when the client's credential cache does not contain the required principal for renewal.
Here are some possible reasons for this error and steps to resolve it:
Kerberos Principal Mismatch: Double-check that the principal name centreon-engine@LOGISTICS.CORP is correct. Make sure it matches the exact principal for which you want to renew the credentials. Check for any typos or case sensitivity issues.
Expired Credentials: If the credentials associated with the centreon-engine@LOGISTICS.CORP principal have already expired, you won't be able to renew them. You can only renew credentials within a certain timeframe before they expire. You may need to obtain a new initial ticket with kinit instead of renewing.
Incorrect Cache: Verify that you are using the correct cache collection. The Kerberos credential cache can be stored in different locations or with different names, depending on your configuration. Ensure that kinit is using the correct cache.
Cache Corruption: If the cache itself is corrupted, it can cause this error. You can try deleting the existing cache and then using kinit to obtain a new ticket.
For example:
kdestroy   # Destroy the existing cache
kinit      # Obtain a new ticket
Cache Permissions: Check the permissions on the Kerberos credential cache files. The cache files should be accessible to the user running the kinit command.
Kerberos Configuration: Verify that your Kerberos configuration (/etc/krb5.conf or equivalent) is correctly configured with the correct realm (LOGISTICS.CORP) and Kerberos server settings.
Network Issues: Ensure that your client machine can communicate with the Kerberos Key Distribution Center (KDC). Network issues can prevent the renewal process.
Principal Not Cached: If the principal centreon-engine@LOGISTICS.CORP was never cached on the client machine, you cannot renew it. You would need to initially obtain the ticket using kinit.
Check Ticket Lifetime: Confirm the ticket's lifetime and renewal policy in your Kerberos realm. If renewals are not allowed beyond a certain time limit, you might need to obtain a new ticket.
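Debugging: You can use the -V option with kinit for more verbose output to help diagnose the issue (with MIT Kerberos, lowercase -v requests ticket validation instead). Setting the KRB5_TRACE environment variable to a file path also produces a detailed trace:
kinit -V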
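For the "Incorrect Cache" point, it can help to print the same information once from an interactive shell and once from cron, then compare the two outputs. The sketch below is only an illustration. The function name kerberos_context is made up, and it assumes the MIT klist tool and /etc/krb5.conf are in their usual places.

```shell
#!/bin/bash
# Sketch: print the Kerberos-related context of the current process so the
# output of a cron run can be compared with an interactive run.
kerberos_context() {
    echo "user=$(id -un) uid=$(id -u)"
    # KRB5CCNAME overrides the default cache location when it is set.
    echo "KRB5CCNAME=${KRB5CCNAME:-<unset>}"
    # default_ccache_name in krb5.conf selects the cache type when unset.
    if [ -r /etc/krb5.conf ]; then
        grep -E '^[[:space:]]*default_ccache_name' /etc/krb5.conf \
            || echo "default_ccache_name not set in /etc/krb5.conf"
    fi
    # klist shows the cache in use and the "renew until" limit of each ticket.
    if command -v klist >/dev/null 2>&1; then
        klist 2>&1 || echo "klist reported no usable cache"
    else
        echo "klist not found in PATH=$PATH"
    fi
}

kerberos_context
```

If the two outputs show different caches, or a cache of type KEYRING in one of them, that difference is a good place to start looking.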
Also, the quota issue on the first attempt looks weird to me.
It could be caused by a corrupted cache.
Is your Centreon environment on a specific hosting service (AWS, Azure, other) or is this host in a regular VMware environment?
Is anything displayed when running quota -v as the centreon-engine user?
Unfortunately, this is not Centreon related and my knowledge of Kerberos is limited, so I have no other ideas about what you can check to make it work again.
Thank you for your inputs @tpo76.
Our poller is in a normal VM environment and the quota -v command does not display anything for the centreon-engine user. We’ll investigate further and share the resolution here in case someone has the same issue.
Hi @kimpoy0730
You are welcome!
That is indeed really strange for the quota and overall situation.
Please let us know! That is an interesting case.
Hi All,
I think we found the root cause of the issue. It has to do with the Linux kernel keyring cache.
The fields shown in each line are as follows:
uid: The user ID.
usage: This is a kernel-internal usage count for the kernel structure used to record key users.
nkeys/nikeys: The total number of keys owned by the user, and the number of those keys that have been instantiated.
qnkeys/maxkeys: The number of keys owned by the user, and the maximum number of keys that the user may own.
qnbytes/maxbytes: The number of bytes consumed in payloads of the keys owned by this user, and the upper limit on the number of bytes in key payloads for that user.
Our uid (993) reached the maximum number of owned keys, which has a limit of 200. That’s why we get the error below when running the kinit -R command.
kinit: Disk quota exceeded when initializing cache
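The field list above matches the format of /proc/key-users as described in the keyrings(7) man page. A small sketch to check how close a given uid is to its quota could look like this. The function name key_quota and its optional file argument are made up for illustration, and the "reached" check is a simple approximation of the kernel's own test.

```shell
#!/bin/bash
# Sketch: report key-quota usage for one uid from /proc/key-users.
# Line format: "<uid>: <usage> <nkeys>/<nikeys> <qnkeys>/<maxkeys> <qnbytes>/<maxbytes>"
key_quota() {
    local uid="$1" file="${2:-/proc/key-users}"
    awk -v uid="$uid" '
        {
            id = $1
            sub(/:$/, "", id)          # strip the trailing colon from the uid
            if (id != uid) next
            split($4, keys, "/")       # qnkeys/maxkeys
            split($5, bytes, "/")      # qnbytes/maxbytes
            printf "uid=%s keys=%d/%d bytes=%d/%d\n", id, keys[1], keys[2], bytes[1], bytes[2]
            if (keys[1] + 0 >= keys[2] + 0) print "key quota reached"
            found = 1
        }
        END { if (!found) exit 1 }
    ' "$file"
}

# Example: check the user running this script.
key_quota "$(id -u)" || echo "no entry found for uid $(id -u)"
```

Running it as centreon-engine (or passing that user's uid as root) shows whether the keys column is at its limit.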
Hope this helps.
Thanks.
Hi @kimpoy0730
I think you found it!
After running further tests in my lab, I can see that a new key is generated in the cache each time I connect to a new server.
This means that the kernel limits how many servers can be monitored (200 by default) and makes the ticket renewal fail as well, at least if I'm getting that right.
I'll try to find out how to raise this value.
EDIT: adding the following value to “/etc/sysctl.conf” seems to work
kernel.keys.maxkeys = 1000
But on my end, exceeding this limit doesn’t seem to have any impact.
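For anyone who wants to script the change, here is a rough sketch. The helper set_sysctl_line is hypothetical, the value 1000 is simply the one from the EDIT above, and writing to /etc/sysctl.conf and applying it needs root, so those two commands are left commented out.

```shell
#!/bin/bash
# Sketch: add or update a "key = value" line in a sysctl configuration file.
set_sysctl_line() {
    local file="$1" key="$2" value="$3" escaped
    # Escape the dots so the key can be used inside a regular expression.
    escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
    touch "$file"
    if grep -Eq "^[[:space:]]*${escaped}[[:space:]]*=" "$file"; then
        # Key already present: rewrite that line in place (GNU sed).
        sed -i -E "s|^[[:space:]]*${escaped}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        echo "${key} = ${value}" >> "$file"
    fi
}

# Show the current per-user limits (readable without root on Linux).
for name in maxkeys maxbytes; do
    f="/proc/sys/kernel/keys/$name"
    if [ -r "$f" ]; then
        echo "kernel.keys.$name = $(cat "$f")"
    fi
done

# As root, persist and apply the new limit:
#   set_sysctl_line /etc/sysctl.conf kernel.keys.maxkeys 1000
#   sysctl -p
```

After sysctl -p, the maxkeys column in /proc/key-users for the centreon-engine uid should reflect the new limit.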