Pre-deploy validations should wait for nova resources

Bug #1728650 reported by Dmitry Tantsur
This bug affects 1 person
Affects   Status         Importance   Assigned to      Milestone
tripleo   Fix Released   Medium       Dmitry Tantsur

Bug Description

Currently, if you start a new deploy right after tearing down a previous one, you can hit the following error:

Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 0076d996-cfdf-4903-af41-a9ff241555be
Waiting for messages on queue '684d93c8-120f-4cb2-8ae2-ac1628149cdb' with no timeout.
{u'errors': [u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.'], u'result': {u'enough_nodes': False, u'statistics': {u'count': 0, u'vcpus_used': 0, u'local_gb_used': 0, u'manager': {u'api': {u'server_groups': None, u'keypairs': None, u'servers': None, u'server_external_events': None, u'server_migrations': None, u'agents': None, u'instance_action': None, u'glance': None, u'hypervisor_stats': None, u'virtual_interfaces': None, u'flavors': None, u'availability_zones': None, u'user_id': None, u'cloudpipe': None, u'os_cache': False, u'quotas': None, u'migrations': None, u'usage': None, u'logger': None, u'project_id': None, u'neutron': None, u'quota_classes': None, u'project_name': None, u'aggregates': None, u'flavor_access': None, u'services': None, u'list_extensions': None, u'limits': None, u'hypervisors': None, u'cells': None, u'versions': None, u'client': None, u'hosts': None, u'volumes': None, u'assisted_volume_snapshots': None, u'certs': None}}, u'x_openstack_request_ids': [u'req-e3efba33-ab0a-4d62-a1b6-a7770e349e43'], u'memory_mb': 0, u'current_workload': 0, u'vcpus': 0, u'running_vms': 0, u'free_disk_gb': 0, u'disk_available_least': 0, u'_info': {u'count': 0, u'vcpus_used': 0, u'local_gb_used': 0, u'memory_mb': 0, u'current_workload': 0, u'vcpus': 0, u'running_vms': 0, u'free_disk_gb': 0, u'disk_available_least': 0, u'local_gb': 0, u'free_ram_mb': 0, u'memory_mb_used': 0}, u'local_gb': 0, u'free_ram_mb': 0, u'memory_mb_used': 0, u'_loaded': True}, u'requested_count': 4, u'available_count': 15}, u'warnings': []}
ERRORS
[u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.', u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.']
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.

The error goes away after a minute or two. I think we should retry our workflows instead of making the user retry.

Initially reported in https://bugzilla.redhat.com/show_bug.cgi?id=1506038

Dmitry Tantsur (divius)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.openstack.org/516382

Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: queens-2 → queens-3
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/516382
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=5b61f5a1491a067ec91a2bf29d8bcd64e0d475c1
Submitter: Zuul
Branch: master

commit 5b61f5a1491a067ec91a2bf29d8bcd64e0d475c1
Author: Dmitry Tantsur <email address hidden>
Date: Mon Oct 30 17:28:42 2017 +0100

    Retry the check_default_nodes_count workflow for 2 minutes

    Nova updates its cache of bare metal nodes once in 2 minutes. If a new
    deployment is run right after the previous one, the cache may be
    outdated, and the second deployment may fail validation.

    This change introduces a retry where check_default_nodes_count workflow
    is called.

    Closes-Bug: #1728650
    Change-Id: I43a97f7f9bc4e7062a47003e5d9d06ebe3716450
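
For illustration only, the retry described in this commit message can be sketched in Python. This is not the actual change, which was made in the Mistral workflow that calls check_default_nodes_count. The function name `wait_for_nova_nodes`, the `get_exposed_count` callable, and the 10-second delay are all assumptions; only the 2-minute window comes from the commit message.

```python
import time


def wait_for_nova_nodes(get_exposed_count, requested, timeout=120, delay=10,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until enough nodes are exposed to Nova or the timeout expires.

    get_exposed_count is a hypothetical callable returning how many bare
    metal nodes Nova currently reports. The last observed count is
    returned, so the caller decides whether validation passes.
    """
    deadline = clock() + timeout
    while True:
        count = get_exposed_count()
        # Stop as soon as the cache has caught up, or once the retry
        # window (2 minutes by default) has elapsed.
        if count >= requested or clock() >= deadline:
            return count
        sleep(delay)


if __name__ == "__main__":
    # Simulated Nova cache that only catches up on the third poll.
    readings = iter([0, 0, 15])
    seen = wait_for_nova_nodes(lambda: next(readings), requested=15,
                               sleep=lambda seconds: None)
    print("nodes exposed to Nova:", seen)
```

The validation would then fail only if the returned count is still below the requested number once the window has passed, rather than on the first stale reading.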

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/534325

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 8.4.0

This issue was fixed in the openstack/tripleo-common 8.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/pike)

Reviewed: https://review.openstack.org/534325
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=43aba5e9b4a7f9aeb8eeb98d79a0f2bdeb82d2fd
Submitter: Zuul
Branch: stable/pike

commit 43aba5e9b4a7f9aeb8eeb98d79a0f2bdeb82d2fd
Author: Dmitry Tantsur <email address hidden>
Date: Mon Oct 30 17:28:42 2017 +0100

    Retry the check_default_nodes_count workflow for 2 minutes

    Nova updates its cache of bare metal nodes once in 2 minutes. If a new
    deployment is run right after the previous one, the cache may be
    outdated, and the second deployment may fail validation.

    This change introduces a retry where check_default_nodes_count workflow
    is called.

    Closes-Bug: #1728650
    Change-Id: I43a97f7f9bc4e7062a47003e5d9d06ebe3716450
    (cherry picked from commit 5b61f5a1491a067ec91a2bf29d8bcd64e0d475c1)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 7.6.11

This issue was fixed in the openstack/tripleo-common 7.6.11 release.
