Check that enough nodes are in "available" state with maintenance mode off error

Bug #1712632 reported by Victoria Martinez de la Cruz on 2017-08-23
This bug affects 2 people
Affects: tripleo
Importance: High
Assigned to: Alex Schultz

Bug Description

When trying to deploy the overcloud (minimal deploy, one controller and one compute), I ran into the following error:

(undercloud) [stack@undercloud ~]$ openstack overcloud deploy --templates ~/tripleo-heat-templates/ -e ~/tripleo-heat-templates/environments/docker.yaml -e ~/docker_registry.yaml -e ~/tripleo-heat-templates/environments/manila-netapp-config.yaml -e ~/manila_netapp.yaml -e ~/tripleo-heat-templates/environments/docker-ha.yaml
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: c4775d71-819f-481d-86ec-210260043df0
Waiting for messages on queue 'a7f0b2ec-e581-4e6f-8edd-1e719640ba67' with no timeout.
{u'errors': [u'Only 0 nodes are exposed to Nova of 2 requests. Check that enough nodes are in "available" state with maintenance mode off.'], u'result': {u'enough_nodes': False, u'statistics': {u'count': 0, u'vcpus_used': 0, u'local_gb_used': 0, u'manager': {u'api': {u'server_groups': None, u'keypairs': None, u'servers': None, u'server_external_events': None, u'server_migrations': None, u'agents': None, u'instance_action': None, u'glance': None, u'hypervisor_stats': None, u'virtual_interfaces': None, u'flavors': None, u'availability_zones': None, u'user_id': None, u'cloudpipe': None, u'os_cache': False, u'quotas': None, u'migrations': None, u'usage': None, u'logger': None, u'project_id': None, u'neutron': None, u'quota_classes': None, u'project_name': None, u'aggregates': None, u'flavor_access': None, u'services': None, u'list_extensions': None, u'limits': None, u'hypervisors': None, u'cells': None, u'versions': None, u'client': None, u'hosts': None, u'volumes': None, u'assisted_volume_snapshots': None, u'certs': None}}, u'x_openstack_request_ids': [u'req-da2631dc-4a9c-48a9-871e-ec562f017406'], u'memory_mb': 0, u'current_workload': 0, u'vcpus': 0, u'running_vms': 0, u'free_disk_gb': 0, u'disk_available_least': 0, u'_info': {u'count': 0, u'vcpus_used': 0, u'local_gb_used': 0, u'memory_mb': 0, u'current_workload': 0, u'vcpus': 0, u'running_vms': 0, u'free_disk_gb': 0, u'disk_available_least': 0, u'local_gb': 0, u'free_ram_mb': 0, u'memory_mb_used': 0}, u'local_gb': 0, u'free_ram_mb': 0, u'memory_mb_used': 0, u'_loaded': True}, u'requested_count': 2, u'available_count': 2}, u'warnings': []}
ERRORS
[u'Only 0 nodes are exposed to Nova of 2 requests. Check that enough nodes are in "available" state with maintenance mode off.', u'Only 0 nodes are exposed to Nova of 2 requests. Check that enough nodes are in "available" state with maintenance mode off.']
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.

The Ironic nodes appear to be fine: 2 are available, but Nova doesn't know about them.

(undercloud) [stack@undercloud ~]$ ironic node-list
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
| 51d68c49-57d3-4797-91cb-d0790aa4f23e | control-0 | None | power off | available | False |
| df8f251e-634a-4850-8ea2-4c7518d7229f | compute-0 | None | power off | available | False |
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+

(undercloud) [stack@undercloud ~]$ nova hypervisor-stats
+----------------------+-------+
| Property | Value |
+----------------------+-------+
| count | 0 |
| current_workload | 0 |
| disk_available_least | 0 |
| free_disk_gb | 0 |
| free_ram_mb | 0 |
| local_gb | 0 |
| local_gb_used | 0 |
| memory_mb | 0 |
| memory_mb_used | 0 |
| running_vms | 0 |
| vcpus | 0 |
| vcpus_used | 0 |
+----------------------+-------+

Checking the nova services, we see that nova-compute is disabled:

(undercloud) [stack@undercloud ~]$ nova service-list
+--------------------------------------+----------------+------------+----------+----------+-------+----------------------------+----------------------------------------+-------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason | Forced down |
+--------------------------------------+----------------+------------+----------+----------+-------+----------------------------+----------------------------------------+-------------+
| c49703f6-f4d8-4a97-80f0-30106035431c | nova-conductor | undercloud | internal | enabled | up | 2017-08-17T16:00:14.000000 | - | False |
| 2c008a8b-8bfe-4313-9af7-412c098db8ac | nova-scheduler | undercloud | internal | enabled | up | 2017-08-17T16:00:12.000000 | - | False |
| 85bbdc1c-6613-45ba-9bf5-2c76e901855d | nova-compute | undercloud | nova | disabled | up | 2017-08-17T16:00:13.000000 | Auto-disabled due to 10 build failures | False |
+--------------------------------------+----------------+------------+----------+----------+-------+----------------------------+----------------------------------------+-------------+

So we re-enabled the nova-compute service:

(undercloud) [stack@undercloud ~]$ nova service-enable 85bbdc1c-6613-45ba-9bf5-2c76e901855d
+--------------------------------------+------------+--------------+---------+
| ID | Host | Binary | Status |
+--------------------------------------+------------+--------------+---------+
| 85bbdc1c-6613-45ba-9bf5-2c76e901855d | undercloud | nova-compute | enabled |
+--------------------------------------+------------+--------------+---------+
(undercloud) [stack@undercloud ~]$ nova hypervisor-stats
+----------------------+-------+
| Property | Value |
+----------------------+-------+
| count | 2 |
| current_workload | 0 |
| disk_available_least | 98 |
| free_disk_gb | 98 |
| free_ram_mb | 16384 |
| local_gb | 98 |
| local_gb_used | 0 |
| memory_mb | 16384 |
| memory_mb_used | 0 |
| running_vms | 0 |
| vcpus | 4 |
| vcpus_used | 0 |
+----------------------+-------+

And this solved the issue.

It turns out the undercloud was low on disk space, so the Ironic builds failed, and Nova now auto-disables a compute service after 10 consecutive build failures. Re-enabling the nova-compute service does the trick, but no clear message points to this.
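
To make the behaviour concrete, here is a toy model of a consecutive-failure counter that disables a service once a threshold is reached. It is only a sketch of the idea described above, not nova's code: the class name, its methods, and the choice to reset the counter on re-enable are all assumptions made for illustration.

```python
class BuildFailureTracker:
    """Toy model (hypothetical) of auto-disabling after N consecutive failures."""

    def __init__(self, threshold=10):
        # In this sketch a threshold of 0 means "never auto-disable".
        self.threshold = threshold
        self.failures = 0
        self.enabled = True
        self.disabled_reason = None

    def record_build(self, succeeded):
        if succeeded:
            # A successful build breaks the streak of failures.
            self.failures = 0
            return
        self.failures += 1
        if self.threshold and self.enabled and self.failures >= self.threshold:
            self.enabled = False
            self.disabled_reason = (
                "Auto-disabled due to %d build failures" % self.failures)

    def enable(self):
        # Assumption: re-enabling clears the reason and restarts the count.
        self.enabled = True
        self.disabled_reason = None
        self.failures = 0


tracker = BuildFailureTracker()
for _ in range(10):
    tracker.record_build(succeeded=False)
print(tracker.enabled, tracker.disabled_reason)
```

Once the service is disabled, the scheduler has no usable compute host, which would match the zeroed hypervisor stats seen earlier even though the Ironic nodes themselves are available.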

We should catch this in the validation to avoid a failed overcloud deployment. By "catch" I mean report the actual cause; the validation did flag the problem, but not why.

The validation should be added to tripleo-common.
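
A rough sketch of what such a check could look like follows. It is not the tripleo-common implementation: the function name and the dictionary keys (binary, host, status, disabled_reason) are assumptions modelled on the columns of the nova service-list output above, and the records are passed in as plain dicts rather than fetched from the API.

```python
def find_disabled_computes(services):
    """Return one readable error per disabled nova-compute service.

    `services` is a list of dicts; the key names are assumptions based on
    the `nova service-list` columns, not a real client object.
    """
    errors = []
    for svc in services:
        if svc.get("binary") != "nova-compute":
            continue
        if svc.get("status") == "disabled":
            reason = svc.get("disabled_reason") or "no reason recorded"
            errors.append(
                "nova-compute on host %s is disabled (%s); re-enable it "
                "before deploying." % (svc.get("host"), reason))
    return errors


services = [
    {"binary": "nova-conductor", "host": "undercloud",
     "status": "enabled", "disabled_reason": None},
    {"binary": "nova-compute", "host": "undercloud",
     "status": "disabled",
     "disabled_reason": "Auto-disabled due to 10 build failures"},
]
errors = find_disabled_computes(services)
for line in errors:
    print(line)
```

Reporting the disabled reason alongside the "Only 0 nodes are exposed to Nova" error would point the operator at nova service-enable rather than at the Ironic node states.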

Alex Schultz (alex-schultz) wrote :

Alternatively, is this something we can configure nova not to do?

Changed in tripleo:
importance: Undecided → High
status: New → Triaged
milestone: none → pike-rc1

Fix proposed to branch: master
Review: https://review.openstack.org/496851

Changed in tripleo:
assignee: nobody → Alex Schultz (alex-schultz)
status: Triaged → In Progress
Changed in tripleo:
milestone: pike-rc1 → pike-rc2

Reviewed: https://review.openstack.org/496851
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=b366467f6d596017913167cf3e6c158805c8ae52
Submitter: Jenkins
Branch: master

commit b366467f6d596017913167cf3e6c158805c8ae52
Author: Alex Schultz <email address hidden>
Date: Wed Aug 23 13:04:40 2017 -0600

    Disable compute auto disabling

    As part of Pike, nova introduced a change to have the nova-compute
    process automatically disable the nova-compute instance in the case of
    consecutive build failures. This can lead to odd errors when deploying
    the ironic nodes on the undercloud as you end up with a ComputeFilter
    error. This change disables this functionality for the undercloud since
    we do not want the nova-compute instance running on the undercloud for
    Ironic to be disabled in the case of multiple deployment failures.

    Change-Id: Ia8a4cfcd6b31b496161cba14ee597bc61af2cab4
    Depends-On: If46602fdcac38745e6b6b17d560844bb4f42ba3c
    Closes-Bug: #1712632
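
For readers who want to see what turning this off looks like at the nova level, the sketch below sets the consecutive_build_service_disable_threshold option in the [compute] section of a nova.conf-style text to 0, which nova documents as disabling the behaviour. This is an illustration only: the commit above makes the change through the undercloud's own configuration management and is not reproduced here, the helper name is hypothetical, and Python's configparser does not handle every construct a real nova.conf may contain (repeated keys, for example).

```python
import configparser
import io


def disable_auto_disable(nova_conf_text):
    """Return nova.conf-style text with compute auto-disabling turned off.

    Hypothetical helper for illustration; not how instack-undercloud does it.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(nova_conf_text)
    if not parser.has_section("compute"):
        parser.add_section("compute")
    # 0 disables the consecutive-build-failure auto-disable logic.
    parser.set("compute", "consecutive_build_service_disable_threshold", "0")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = disable_auto_disable("[DEFAULT]\ndebug = false\n")
print(updated)
```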

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/499398
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=b56b898cf43f81444788d69d5714fca6dd771dc5
Submitter: Jenkins
Branch: stable/pike

commit b56b898cf43f81444788d69d5714fca6dd771dc5
Author: Alex Schultz <email address hidden>
Date: Wed Aug 23 13:04:40 2017 -0600

    Disable compute auto disabling

    As part of Pike, nova introduced a change to have the nova-compute
    process automatically disable the nova-compute instance in the case of
    consecutive build failures. This can lead to odd errors when deploying
    the ironic nodes on the undercloud as you end up with a ComputeFilter
    error. This change disables this functionality for the undercloud since
    we do not want the nova-compute instance running on the undercloud for
    Ironic to be disabled in the case of multiple deployment failures.

    Change-Id: Ia8a4cfcd6b31b496161cba14ee597bc61af2cab4
    Depends-On: If46602fdcac38745e6b6b17d560844bb4f42ba3c
    Closes-Bug: #1712632
    (cherry picked from commit b366467f6d596017913167cf3e6c158805c8ae52)

tags: added: in-stable-pike

This issue was fixed in the openstack/instack-undercloud 7.4.0 release.

This issue was fixed in the openstack/instack-undercloud 8.0.0 release.
