[2.0] Failures to release ESXi 6.0 VMs - Failed talking to node's BMC: <vCenter IP Address>:443 is not a VIM server

Bug #1626334 reported by Larry Michel
Affects    Status   Importance  Assigned to  Milestone
MAAS       Invalid  Undecided   Unassigned
MAAS 2.0   Invalid  Undecided   Unassigned

Bug Description

This may be related to bug 1603563, but it seems to have a different signature: the node is not able to talk to the vCenter, with this error in the event logs:

 Node powered off Thu, 22 Sep. 2016 00:19:54
Node changed status - From 'Releasing' to 'Ready' Thu, 22 Sep. 2016 00:19:54
Powering node off Thu, 22 Sep. 2016 00:19:51
User releasing node - (larry) Thu, 22 Sep. 2016 00:19:51
Failed to power off node - Node could not be powered off: Failed talking to node's BMC: 10.244.192.131:443 is not a VIM server Wed, 14 Sep. 2016 10:21:34
Node changed status - From 'Releasing' to 'Releasing failed' Wed, 14 Sep. 2016 10:21:34
Marking node failed - Node could not be powered off: Failed talking to node's BMC: 10.244.192.131:443 is not a VIM server Wed, 14 Sep. 2016 10:21:34
Powering node off Wed, 14 Sep. 2016 10:21:33
User releasing node - (oil-slave-2) - Released by Juju MAAS provider Wed, 14 Sep. 2016 10:21:33
Failed to query node's BMC - 10.244.192.131:443 is not a VIM server

It could also be an issue with the vCenter, but the error is not very telling, so this issue needs further investigation. Note that I was able to release the VMs manually, but that was hours after the error occurred.
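One way to narrow this down is to probe the version-listing endpoint that pyvmomi consults before opening a session (`/sdk/vimServiceVersions.xml`): a healthy vCenter returns an XML document of supported API versions, while a timeout, HTML error page, or proxy response would explain an "is not a VIM server" failure. The helper names below are hypothetical, not part of MAAS:

```python
# Hypothetical probe of the endpoint pyvmomi checks during version
# negotiation. Anything other than a versions XML document here would
# explain an "is not a VIM server" error from the power driver.
import ssl
import urllib.request
import xml.etree.ElementTree as ET


def fetch_supported_versions(host, port=443, timeout=10):
    """Return the API versions advertised by a vSphere endpoint."""
    url = "https://{}:{}/sdk/vimServiceVersions.xml".format(host, port)
    # Skip certificate checks, matching protocol=https+unverified.
    ctx = ssl._create_unverified_context()
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return parse_versions(resp.read())


def parse_versions(xml_bytes):
    """Extract <version> values from a vimServiceVersions.xml document."""
    root = ET.fromstring(xml_bytes)
    # Tags may be namespaced depending on server version, so match by suffix.
    return [el.text for el in root.iter() if el.tag.endswith("version")]
```

Against the address in the log above, `fetch_supported_versions("10.244.192.131")` should either return a version list or fail in a way that is more telling than the original error.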

ubuntu@maas2-integration:~$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-==============================-============-=================================================
ii maas 2.0.0+bzr5189-0ubuntu1~16.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.0.0+bzr5189-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.0.0+bzr5189-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.0.0+bzr5189-0ubuntu1~16.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.0.0+bzr5189-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)

Also attaching the logs

Tags: oil
Revision history for this message
Andres Rodriguez (andreserl) wrote :

It seems that the release failure is caused by this:

Failed to power off node - Node could not be powered off: Failed talking to node's BMC: 10.244.192.131:443 is not a VIM server Wed, 14 Sep. 2016 10:21:34

I've searched through the MAAS code and "is not a VIM server" doesn't come from MAAS. It actually comes from python-pyvmomi: https://github.com/vmware/pyvmomi/blob/master/pyVim/connect.py

This error happens when the client tries to connect to the server and cannot find a supported version of the API.

What may also be happening, however, is that the client cannot actually connect to the vCenter API at all, which would produce the same error.

So I'd say that this is related to connectivity issues between the Rack and the vCenter.
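The two failure modes described above (no supported API version vs. plain connectivity failure) surface with the same pyvmomi message, so they need different follow-up checks. A hypothetical triage helper (the names are mine, not MAAS's) could separate them from other power-driver errors:

```python
# Hypothetical triage helper for "Failed talking to node's BMC" details.
# "is not a VIM server" is raised by pyvmomi's version negotiation, so it
# covers both "no supported API version" and "endpoint unreachable or
# answered with non-vSphere content".
def classify_power_error(message):
    """Map a power-driver failure detail to a rough cause."""
    text = message.lower()
    if "is not a vim server" in text:
        return "vim-version-negotiation-failed"
    if "timed out" in text or "connection refused" in text:
        return "connectivity"
    return "unknown"
```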

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Larry, can you please show what power parameters have been configured for the machine? Also, can you confirm that if you do it manually (without Juju), it works just fine?

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Without digging deeper, my intuition tells me that this could be one of a few things:

(1) The version of python-pyvmomi we have in the repository may be too old to talk to this server. In that case, the following *might* work around the issue (though you would be getting into 'unsupported' territory):

    sudo pip3 install --system --upgrade pyvmomi

(2) It could be an issue with the power parameters. For historical reasons (I think), ports cannot be specified as "host[:port]" in many of the power drivers. Make sure you are using the 'port' field to enter the port number.

(3) If you do (1) and (2) and it still doesn't work, it might be an issue with the certificate. Newer versions of the python-pyvmomi API verify the server certificate, which can be inconvenient. In that case, first make sure you have the latest version of MAAS (from the stable PPA), and then either ensure your system considers the certificate trusted (and that the certificate's subject matches the VMware server hostname), or change the power parameters to use protocol=https+unverified.
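On the client side, protocol=https+unverified amounts to handing pyvmomi an SSL context with hostname checking and certificate verification disabled. A minimal sketch, assuming pyvmomi's `sslContext` parameter (the `SmartConnect` call is left commented since it needs a live vCenter and credentials):

```python
# Sketch of the SSL context behind protocol=https+unverified: hostname
# checking and chain verification are both switched off, so a self-signed
# or mismatched vCenter certificate no longer blocks the connection.
import ssl


def unverified_context():
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # skip subject/hostname match
    ctx.verify_mode = ssl.CERT_NONE   # skip chain verification
    return ctx


# from pyVim import connect
# si = connect.SmartConnect(host="10.244.192.131", user=..., pwd=...,
#                           sslContext=unverified_context())
```

Note that `check_hostname` must be disabled before `verify_mode` is set to `CERT_NONE`, or the standard library raises a `ValueError`.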

Hope that helps. Please let me know if any of the above solves your issue.

Revision history for this message
Larry Michel (lmic) wrote :

Andres,

When I tried the manual release, it worked. These are not new systems; we have been using them with the CI automated builds and hadn't seen this issue prior to that. We also haven't made any changes since moving to the release version of MAAS. But after looking at the timestamps, I now realize that there was an issue with the vCenter around the time of these failures last week: we were getting 203 back when trying to access it. The issue went away after restarting the vCenter.

So, this is most likely the scenario you mentioned where "the client cannot actually connect to the vCenter API". The power parameters are below, but you can mark this one as invalid and I'll reopen it if I recreate the issue with any data pointing away from the vCenter.

ubuntu@maas2-integration:~$ maas root machines power-parameters id=4y4fma
Success.
Machine-readable output follows:
{
    "4y4fma": {
        "power_uuid": "5007125f-76fe-1e10-a20f-9a9a31b0fa98",
        "power_vm_name": "integrationnedge0",
        "power_protocol": "https+unverified",
        "power_port": null,
        "power_user": "administrator@*******.***",
        "power_pass": "**********",
        "power_address": "10.244.192.131"
    }
}

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Thanks for the confirmation Larry.
