
Hello everyone.

We are currently facing an issue with the Azure Virtual machine plugin.

 

We monitor 13 VMs, and at every check at least 3 of them (seemingly at random, and sometimes 4, 5 or 6) appear as DOWN in our GUI.

Meanwhile, the other services associated with those VMs work like a charm.

When I run the plugin's check command from the poller with the --debug argument, I get the following output:

== Info: TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
== Info: Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
=> Recv header: HTTP/2 200
=> Recv header: cache-control: no-cache
=> Recv header: pragma: no-cache
=> Recv header: content-length: 663
=> Recv header: content-type: application/json; charset=utf-8
=> Recv header: expires: -1
=> Recv header: strict-transport-security: max-age=31536000; includeSubDomains
=> Recv header: x-content-type-options: nosniff
=> Recv header: x-ms-ratelimit-remaining-subscription-resource-requests: 97
=> Recv header: x-ms-request-id: 142788ce-1818-4a30-9503-2bbc3e53260b
=> Recv header: x-ms-correlation-request-id: 142788ce-1818-4a30-9503-2bbc3e53260b
=> Recv header: x-ms-routing-request-id: FRANCESOUTH:20240909T081955Z:142788ce-1818-4a30-9503-2bbc3e53260b
=> Recv header: x-cache: CONFIG_NOCACHE
=> Recv header: x-msedge-ref: Ref A: 24ADA54643B541E9A28D56FB485E1783 Ref B: MRS211050313023 Ref C: 2024-09-09T08:19:54Z
=> Recv header: date: Mon, 09 Sep 2024 08:19:55 GMT
=> Recv header:
=> Recv data: {"id":"/subscriptions/86156cfa-6f0c-4ce4-8950-3510c75d129f/resourcegroups/rg-gpsweb-nonprod-006/providers/microsoft.compute/virtualmachines/vm-gpsweb-nonprod-006-01/providers/Microsoft.ResourceHealth/availabilityStatuses/current","name":"current","type":"Microsoft.ResourceHealth/AvailabilityStatuses","location":"francecentral","properties":{"availabilityState":"Unknown","title":"Unknown","summary":"We are currently unable to determine the health of this virtual machine.","reasonType":"","category":"Not Applicable","context":"Not Applicable","occuredTime":"2024-09-09T07:56:15Z","reasonChronicity":"Persistent","reportedTime":"2024-09-09T08:19:55.3097151Z"}}

 

How can I resolve this?

Thanks,

Regards,

Hello,

Any ideas?

Regards,


Hello,


@Fabrix, can you help us?

Regards

 


Hi @BenjaminL,

Here are my suggestions:

  • Look into what this status means: it comes from Azure and not from our plugin (see the debug output, where the message is returned by the API).
  • If those “Unknown” phases don’t last long and have no actual impact, you may increase the “Max Check Attempts” parameter on the host template so that short-lived occurrences do not raise hard alerts.
  • If you don’t want to be bothered by this status any more, you may add --ok-status='%{status} =~ /^(Available|Unknown)$/' to the host template’s EXTRAOPTIONS macro, but be warned that it may hide some serious incidents.
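To make the effect of the last suggestion concrete, here is a minimal sketch in Python (not the plugin's own Perl) of how the pattern from the --ok-status expression classifies states. The list of states is the set of Resource Health availability states; the helper name is_ok is hypothetical, and what the plugin does with a state that does not match depends on its other status options.

```python
import re

# Pattern copied from the --ok-status expression suggested above.
OK_PATTERN = re.compile(r"^(Available|Unknown)$")


def is_ok(status: str) -> bool:
    """Hypothetical helper: True when the status matches the OK pattern."""
    return OK_PATTERN.search(status) is not None


# Availability states reported by Azure Resource Health.
for state in ("Available", "Unknown", "Unavailable", "Degraded"):
    print(f"{state}: {'matches' if is_ok(state) else 'does not match'}")
```

With this pattern an Unknown state is no longer distinguishable from Available in the check result, which is why it can mask a real incident.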

Hello,

 

Thanks a lot for your answer, I’ll take a closer look.

 

Regards,


That’s a workaround. Where does Centreon get the status from? az cli shows the status as powered on and OK, but we have been facing the same problem since last week, and it’s random. It keeps flapping.


Oh, you are facing this too. I thought we were alone.


Hi, the status comes from the API response. Here it is (extracted from the post that started this thread):

{
  "id": "/subscriptions/XXXXXXXXXXXXXXXXXXXXXXXXXX/resourcegroups/rg-gpsweb-nonprod-006/providers/microsoft.compute/virtualmachines/vm-gpsweb-nonprod-006-01/providers/Microsoft.ResourceHealth/availabilityStatuses/current",
  "name": "current",
  "type": "Microsoft.ResourceHealth/AvailabilityStatuses",
  "location": "francecentral",
  "properties": {
    "availabilityState": "Unknown",
    "title": "Unknown",
    "summary": "We are currently unable to determine the health of this virtual machine.",
    "reasonType": "",
    "category": "Not Applicable",
    "context": "Not Applicable",
    "occuredTime": "2024-09-09T07:56:15Z",
    "reasonChronicity": "Persistent",
    "reportedTime": "2024-09-09T08:19:55.3097151Z"
  }
}
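For anyone who wants to check a saved response by hand, the following is a small Python sketch that pulls the relevant fields out of a payload shaped like the one above. The payload is trimmed to a few fields from this thread, and the function name summarize_health is hypothetical; this is not how the plugin itself is implemented.

```python
import json

# Trimmed copy of the response shown above (a subset of its fields).
RAW = """
{
  "name": "current",
  "properties": {
    "availabilityState": "Unknown",
    "summary": "We are currently unable to determine the health of this virtual machine.",
    "occuredTime": "2024-09-09T07:56:15Z",
    "reportedTime": "2024-09-09T08:19:55.3097151Z"
  }
}
"""


def summarize_health(payload: str) -> dict:
    """Hypothetical helper: extract the state, summary and timestamps."""
    props = json.loads(payload)["properties"]
    return {
        "state": props["availabilityState"],
        "summary": props["summary"],
        # "occuredTime" is spelled this way in the response itself.
        "since": props["occuredTime"],
        "reported": props["reportedTime"],
    }


info = summarize_health(RAW)
print(f"{info['state']} since {info['since']}: {info['summary']}")
```

The "since" and "reported" timestamps make it easier to see how long Azure has been reporting the state when comparing several checks.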

 


Right, so I’ll report back to Microsoft, because they state that everything is OK on their side.


Thanks. Using Invoke-AzRestMethod, I can confirm what you were saying anyway.
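For others who want to repeat that check, here is a rough Python sketch that builds the request path for the same Resource Health resource seen in the "id" field of the response. The subscription ID below is a placeholder, the helper name health_path is hypothetical, and the api-version value is an assumption that should be checked against the Resource Health documentation before use.

```python
from urllib.parse import quote, urlencode


def health_path(subscription_id: str, resource_group: str, vm_name: str,
                api_version: str = "2022-10-01") -> str:
    """Hypothetical helper: path of a VM's current availability status.

    The default api_version is an assumption, not taken from this thread.
    """
    segments = [
        "subscriptions", subscription_id,
        "resourceGroups", resource_group,
        "providers", "Microsoft.Compute",
        "virtualMachines", vm_name,
        "providers", "Microsoft.ResourceHealth",
        "availabilityStatuses", "current",
    ]
    # Percent-encode each segment so unusual names cannot break the path.
    path = "/" + "/".join(quote(segment, safe="") for segment in segments)
    return f"{path}?{urlencode({'api-version': api_version})}"


# Placeholder subscription ID; resource names are the ones from this thread.
print(health_path("00000000-0000-0000-0000-000000000000",
                  "rg-gpsweb-nonprod-006",
                  "vm-gpsweb-nonprod-006-01"))
```

The resulting string can be passed to the -Path parameter of Invoke-AzRestMethod with a GET method to see the raw status that Azure returns.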

 

