Solved

Monitoring Azure Virtual Machine / State : Unknown

  • September 9, 2024
  • 9 replies
  • 174 views


Hello everyone.

We are currently facing an issue with the Azure Virtual Machine plugin.

We monitor 13 VMs, and at any given time at least 3 of them (seemingly at random, and sometimes 4, 5 or 6) appear as DOWN in our GUI.

Meanwhile, the other services associated with these VMs work like a charm.

 

 

 

When I launch the check command from the poller with the --debug argument, I get the following:

== Info: TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
== Info: Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
=> Recv header: HTTP/2 200
=> Recv header: cache-control: no-cache
=> Recv header: pragma: no-cache
=> Recv header: content-length: 663
=> Recv header: content-type: application/json; charset=utf-8
=> Recv header: expires: -1
=> Recv header: strict-transport-security: max-age=31536000; includeSubDomains
=> Recv header: x-content-type-options: nosniff
=> Recv header: x-ms-ratelimit-remaining-subscription-resource-requests: 97
=> Recv header: x-ms-request-id: 142788ce-1818-4a30-9503-2bbc3e53260b
=> Recv header: x-ms-correlation-request-id: 142788ce-1818-4a30-9503-2bbc3e53260b
=> Recv header: x-ms-routing-request-id: FRANCESOUTH:20240909T081955Z:142788ce-1818-4a30-9503-2bbc3e53260b
=> Recv header: x-cache: CONFIG_NOCACHE
=> Recv header: x-msedge-ref: Ref A: 24ADA54643B541E9A28D56FB485E1783 Ref B: MRS211050313023 Ref C: 2024-09-09T08:19:54Z
=> Recv header: date: Mon, 09 Sep 2024 08:19:55 GMT
=> Recv header:
=> Recv data: {"id":"/subscriptions/86156cfa-6f0c-4ce4-8950-3510c75d129f/resourcegroups/rg-gpsweb-nonprod-006/providers/microsoft.compute/virtualmachines/vm-gpsweb-nonprod-006-01/providers/Microsoft.ResourceHealth/availabilityStatuses/current","name":"current","type":"Microsoft.ResourceHealth/AvailabilityStatuses","location":"francecentral","properties":{"availabilityState":"Unknown","title":"Unknown","summary":"We are currently unable to determine the health of this virtual machine.","reasonType":"","category":"Not Applicable","context":"Not Applicable","occuredTime":"2024-09-09T07:56:15Z","reasonChronicity":"Persistent","reportedTime":"2024-09-09T08:19:55.3097151Z"}}

 

How can I resolve this?

Thanks,

Regards,

9 replies

  • Author
  • Steward
  • September 12, 2024

Hello,

Any ideas?

Regards,


  • Author
  • Steward
  • September 16, 2024

Hello,


@Fabrix, can you help us?

Regards

 


omercier
  • Centreonian
  • Answer
  • September 17, 2024

Hi @BenjaminL,

Here are my suggestions:

  • Look for information about this status: it comes from Azure and not from our plugin (see the debug output, the message is returned by the API).
  • If those “unknown” phases don’t last long and have no actual impact, you may increase the “Max Check Attempts” parameter on the host template to avoid raising hard alerts for short-lived episodes.
  • If you don’t want to be bothered by this status any more, you may add --ok-status='%{status} =~ /^(Available|Unknown)$/' to the host template’s EXTRAOPTIONS macro, but be warned that it may hide some serious incidents.
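
To make the last two suggestions more concrete, here is a rough Python sketch. It is not the plugin's code: `is_ok` and `hard_alerts` are hypothetical helpers, the regex is simply the one from the --ok-status expression above, and the soft/hard logic is a simplified model in which a hard alert is raised once the number of consecutive failed checks reaches “Max Check Attempts”.

```python
import re

# Pattern copied from the suggested --ok-status expression.
OK_PATTERN = re.compile(r"^(Available|Unknown)$")


def is_ok(status: str) -> bool:
    """True when the availability state would be treated as OK."""
    return OK_PATTERN.search(status) is not None


def hard_alerts(results, max_check_attempts):
    """Simplified model: return the indexes of the checks at which the
    count of consecutive failures reaches max_check_attempts."""
    alerts, failures = [], 0
    for index, ok in enumerate(results):
        failures = 0 if ok else failures + 1
        if failures == max_check_attempts:
            alerts.append(index)
    return alerts


# States that Azure Resource Health can report for a VM.
for state in ("Available", "Unknown", "Degraded", "Unavailable"):
    print(f"{state:12} -> {'OK' if is_ok(state) else 'not OK'}")

# One short "Unknown" episode, then a longer one (True = check passed).
history = [True, False, False, True, False, False, False, False]
print("hard alerts with 1 attempt :", hard_alerts(history, 1))
print("hard alerts with 3 attempts:", hard_alerts(history, 3))
```

With a higher attempt count the short episode never becomes a hard alert, while the longer one still does. Widening the regex, on the other hand, silences every “Unknown” result regardless of how long it lasts, which is why it can hide real incidents.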

  • Author
  • Steward
  • September 17, 2024

Hello,

 

Thanks a lot for your answer, I’ll take a closer look.

 

Regards,


Forum|alt.badge.img+5

That’s a workaround. Where does Centreon get the status from? az cli shows the VM as powered on and OK, but we have been facing the same problem since last week, and it is random: the status keeps flapping.


  • Author
  • Steward
  • April 3, 2025

Oh, you are facing this too. I thought we were the only ones.


omercier
  • Centreonian
  • April 3, 2025


Hi, the status comes from the API response. Here it is (extracted from the post that started this thread):

{
  "id": "/subscriptions/XXXXXXXXXXXXXXXXXXXXXXXXXX/resourcegroups/rg-gpsweb-nonprod-006/providers/microsoft.compute/virtualmachines/vm-gpsweb-nonprod-006-01/providers/Microsoft.ResourceHealth/availabilityStatuses/current",
  "name": "current",
  "type": "Microsoft.ResourceHealth/AvailabilityStatuses",
  "location": "francecentral",
  "properties": {
    "availabilityState": "Unknown",
    "title": "Unknown",
    "summary": "We are currently unable to determine the health of this virtual machine.",
    "reasonType": "",
    "category": "Not Applicable",
    "context": "Not Applicable",
    "occuredTime": "2024-09-09T07:56:15Z",
    "reportedTime": "2024-09-09T08:19:55.3097151Z",
    "reasonChronicity": "Persistent"
  }
}
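
To judge whether such an “Unknown” phase is short-lived, the two timestamps in the response can be compared. The sketch below is only an illustration: `parse_azure_time` is a hypothetical helper, and it assumes that occuredTime marks the last change of health status and reportedTime the moment the API answered.

```python
from datetime import datetime, timezone

# Values copied from the "properties" object of the response above.
properties = {
    "availabilityState": "Unknown",
    "occuredTime": "2024-09-09T07:56:15Z",
    "reportedTime": "2024-09-09T08:19:55.3097151Z",
}


def parse_azure_time(value: str) -> datetime:
    """Parse a UTC timestamp, trimming fractions longer than 6 digits
    because strptime's %f accepts at most microsecond precision."""
    text = value.rstrip("Z")
    if "." in text:
        head, fraction = text.split(".", 1)
        text = f"{head}.{fraction[:6]}"
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


# Time elapsed between the last status change and the API answer.
unknown_for = (
    parse_azure_time(properties["reportedTime"])
    - parse_azure_time(properties["occuredTime"])
)
minutes = unknown_for.total_seconds() / 60
print(f"{properties['availabilityState']} for about {minutes:.1f} minutes")
```

Under that assumption, the VM in this example had already been reported as “Unknown” for a little under 24 minutes when the check ran.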

 


Forum|alt.badge.img+5

Right, so I’ll report back to Microsoft, because they state that all is OK on their side.


Forum|alt.badge.img+5

Thanks. Querying the API with Invoke-AzRestMethod confirms what you were saying anyway.
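
For anyone who wants to reproduce that check outside PowerShell, here is a minimal Python sketch of the same kind of query. Several things are assumptions and not taken from this thread: the helper names `health_url` and `fetch_availability_state`, the api-version value, and the fact that you already hold a bearer token for the Azure management API (for example one obtained with az account get-access-token).

```python
import json
import urllib.request

# Assumption: an api-version accepted by the Resource Health API;
# adjust it if Azure rejects the request.
API_VERSION = "2020-05-01"


def health_url(vm_resource_id: str, api_version: str = API_VERSION) -> str:
    """Build the 'current availability status' URL for a VM resource ID."""
    return (
        "https://management.azure.com"
        + vm_resource_id.rstrip("/")
        + "/providers/Microsoft.ResourceHealth/availabilityStatuses/current"
        + f"?api-version={api_version}"
    )


def fetch_availability_state(vm_resource_id: str, bearer_token: str) -> str:
    """Call the API and return properties.availabilityState."""
    request = urllib.request.Request(
        health_url(vm_resource_id),
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["properties"]["availabilityState"]


# Placeholder resource ID: replace the bracketed parts with your own values.
example_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Compute/virtualMachines/<vm-name>"
)
print(health_url(example_id))
```

If this returns “Unknown” while the VM itself is running, the value is coming from Azure Resource Health and not from the monitoring side, which matches what was observed in this thread.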