instance stuck in BUILD state if nova-compute is restarted

Bug #1833581 reported by Balazs Gibizer
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | Low | Balazs Gibizer |
Pike | Fix Released | Low | Balazs Gibizer |
Queens | Fix Released | Low | Elod Illes |
Rocky | Fix Committed | Low | Balazs Gibizer |
Stein | Fix Committed | Low | Balazs Gibizer |
Train | Fix Committed | Low | Balazs Gibizer |

Bug Description

Description
===========
An instance is stuck in BUILD state indefinitely if the nova-compute service is restarted while the build is in progress. Even after the instance_build_timeout expires, the instance is not put into ERROR state.

Steps to reproduce
==================

1) Start 10 VMs in parallel to increase the chance of hitting the bug

$ for NUM in `seq 1 1 10`; do openstack server create --flavor c1 --image cirros-0.4.0-x86_64-disk --availability-zone nova:ubuntu vm$NUM & done

2) When the first instance reaches the BUILD state, restart the nova-compute service
$ sudo systemctl restart <email address hidden>

3) Observe the instance states after the compute is up again.

Expected result
===============

All instances are either in ACTIVE or in ERROR state.

Actual result
=============
Some instances are stuck in BUILD state.

Environment
===========

All-in-one devstack built from recent nova master 61558f274842b149044a14bbe7537b9f278035fd

Logs & Configs
==============

stack@ubuntu:~$ openstack server list
+--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+
| 9ee76601-4a61-4682-86f1-743dac2b05e6 | vm3 | BUILD | | cirros-0.4.0-x86_64-disk | cirros256 |
| e459beae-ccb5-4781-b938-2dff68e33bf7 | vm9 | ACTIVE | public=2001:db8::181, 172.24.4.44 | cirros-0.4.0-x86_64-disk | cirros256 |
| 562f44db-cd51-4516-bce9-598bd29c6310 | vm10 | ERROR | public=2001:db8::3a1, 172.24.4.196 | cirros-0.4.0-x86_64-disk | cirros256 |
| 73f1e2c6-78a1-44c5-b178-7adcf9bf58a0 | vm5 | ERROR | public=2001:db8::21, 172.24.4.177 | cirros-0.4.0-x86_64-disk | cirros256 |
| 1b01acfc-b798-48f9-b808-6cfd0d5cd3fb | vm6 | ERROR | public=2001:db8::3e1, 172.24.4.20 | cirros-0.4.0-x86_64-disk | cirros256 |
| c709e3bf-9c71-4f64-bad3-e9e07e911f62 | vm7 | ERROR | public=2001:db8::231, 172.24.4.46 | cirros-0.4.0-x86_64-disk | cirros256 |
| 538d2534-98f1-4e11-9bbb-b4e74bab8c65 | vm4 | ERROR | public=2001:db8::3e9, 172.24.4.157 | cirros-0.4.0-x86_64-disk | cirros256 |
| ed74eb32-00fe-4f24-9379-c57c04ce9af1 | vm2 | ERROR | public=2001:db8::f5, 172.24.4.53 | cirros-0.4.0-x86_64-disk | cirros256 |
| 582b5356-4f3d-42ed-937e-966580303af0 | vm8 | ERROR | public=2001:db8::92, 172.24.4.16 | cirros-0.4.0-x86_64-disk | cirros256 |
| ae36ffca-e4d6-4353-8e7e-41db500a5e0d | vm1 | ERROR | public=2001:db8::1cf, 172.24.4.203 | cirros-0.4.0-x86_64-disk | cirros256 |
+--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+

stack@ubuntu:~$ openstack server show 9ee76601-4a61-4682-86f1-743dac2b05e6
+-------------------------------------+-----------------------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | None |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None |
| OS-EXT-SRV-ATTR:instance_name | instance-0000004c |
| OS-EXT-STS:power_state | NOSTATE |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | None |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | |
| config_drive | |
| created | 2019-06-19T02:30:16Z |
| flavor | cirros256 (c1) |
| hostId | |
| id | 9ee76601-4a61-4682-86f1-743dac2b05e6 |
| image | cirros-0.4.0-x86_64-disk (8b88f518-ab48-4859-8e8c-6988911ce9bd) |
| key_name | None |
| name | vm3 |
| progress | 0 |
| project_id | 2fc0b14ea1e041998f420ec85a89314d |
| properties | |
| status | BUILD |
| updated | 2019-06-19T02:30:18Z |
| user_id | 262d29f5f0c3445abbde89723b5f01ee |
| volumes_attached | |
+-------------------------------------+-----------------------------------------------------------------+
stack@ubuntu:~$

mysql> select uuid, host from instances where instances.uuid='9ee76601-4a61-4682-86f1-743dac2b05e6';
+--------------------------------------+------+
| uuid | host |
+--------------------------------------+------+
| 9ee76601-4a61-4682-86f1-743dac2b05e6 | NULL |
+--------------------------------------+------+
1 row in set (0.00 sec)

Logs for 9ee76601-4a61-4682-86f1-743dac2b05e6: http://paste.openstack.org/show/753228/

Balazs Gibizer (balazs-gibizer) wrote :

Different instance states after the compute restart:

* ERROR: the instance already has instance.host set in the DB, so the compute startup detects it and pushes it to ERROR state.
* ACTIVE: either the instance was already spawned successfully before the compute was stopped, or the build request was still in flight in AMQP when the compute stopped.
* BUILD: the build request reached the compute before it was stopped, but instance.host was not set because the instance_claim did not finish before the compute was stopped. When the compute starts again, it does not detect this instance because it is not assigned to its host.

There is a periodic job in the compute that puts instances into ERROR state according to the instance_build_timeout config [1]. However, it also only checks instances assigned to the compute host, so it does not push the stuck instance to ERROR.

[1]https://github.com/openstack/nova/blob/c18f7f47f628e266e5b69f4b9733a0f25ed4ffdd/nova/compute/manager.py#L1433
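
The gap in the periodic check can be illustrated with a small, self-contained model. This is only a sketch and not nova code: the Instance class, the check_build_timeouts function, the seconds_in_state field and the sample UUIDs are hypothetical; only instance.host, the building state and instance_build_timeout come from this report.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical stand-in for a nova instance record.
    uuid: str
    vm_state: str
    host: Optional[str]
    seconds_in_state: int


def check_build_timeouts(instances: List[Instance], my_host: str,
                         instance_build_timeout: int) -> List[str]:
    """Return UUIDs a host-filtered periodic check would push to ERROR."""
    timed_out = []
    for inst in instances:
        # The check only looks at instances assigned to this compute host.
        if inst.host != my_host:
            continue
        if (inst.vm_state == "building"
                and inst.seconds_in_state > instance_build_timeout):
            timed_out.append(inst.uuid)
    return timed_out


instances = [
    Instance("claimed", "building", "ubuntu", 900),
    # instance_claim never finished, so host was never set.
    Instance("interrupted", "building", None, 900),
]
print(check_build_timeouts(instances, "ubuntu", 600))
```

In this model the instance without a host is never considered, no matter how long it has been building, which matches the behaviour observed for vm3.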

tags: added: compute
Changed in nova:
assignee: nobody → Balazs Gibizer (balazs-gibizer)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/666857

Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/667396

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/667397

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Balazs Gibizer (<email address hidden>) on branch: master
Review: https://review.opendev.org/667397

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/667913

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Balazs Gibizer (<email address hidden>) on branch: master
Review: https://review.opendev.org/667396

Matt Riedemann (mriedem) wrote :

I hit similar issues where the instance was stuck in BUILD status without a host set. The cause was an overloaded cell conductor, so I was getting MessagingTimeout errors during the build. I left the details in a comment on this change: https://review.opendev.org/#/c/667913/. Anyway, just another data point.

Changed in nova:
assignee: Balazs Gibizer (balazs-gibizer) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Balazs Gibizer (balazs-gibizer)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/667913
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d2e0bd81df6a732f9c78df29538db89dda37b246
Submitter: Zuul
Branch: master

commit d2e0bd81df6a732f9c78df29538db89dda37b246
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 17:13:31 2019 +0200

    Functional reproduce for bug 1833581

    Change-Id: Id112098ef7603d0e514120ac9b7ed861dfa32bd3
    Related-Bug: #1833581

Matt Riedemann (mriedem) wrote :

This is extremely latent but I've marked it going back to at least queens since that's currently our oldest non-extended maintenance branch.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/666857
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a1a735bc6efa40d8277c9fc5339f3b74f968b58e
Submitter: Zuul
Branch: master

commit a1a735bc6efa40d8277c9fc5339f3b74f968b58e
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 16:48:14 2019 +0200

    Error out interrupted builds

    If the compute service is restarted while build requests are
    executing the instance_claim or waiting for the COMPUTE_RESOURCE_SEMAPHORE
    then those instances will be stuck forever in BUILDING state. If the instance
    already finished instance_claim then instance.host is set and when the
    compute restarts the instance is put to ERROR state.

    This patch changes compute service startup to put instances into
    ERROR state if they a) are in the BUILDING state, and b) have
    allocations on the compute resource provider, but c) do not have
    instance.host set to that compute.

    Change-Id: I856a3032c83fc2f605d8c9b6e5aa3bcfa415f96a
    Closes-Bug: #1833581
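
The three conditions in the commit message above can be sketched as a simple filter. This is an illustrative model under stated assumptions, not the merged patch: the interrupted_builds function, the dictionary layout and the allocation_consumers set (UUIDs holding allocations on this compute's resource provider) are hypothetical names chosen for the example.

```python
from typing import Dict, Optional, Set


def interrupted_builds(instances: Dict[str, Dict[str, Optional[str]]],
                       allocation_consumers: Set[str],
                       my_host: str) -> Set[str]:
    """Return UUIDs that match conditions a), b) and c)."""
    to_error = set()
    for uuid, inst in instances.items():
        is_building = inst["vm_state"] == "building"   # a)
        has_allocation = uuid in allocation_consumers  # b)
        not_on_this_host = inst["host"] != my_host     # c)
        if is_building and has_allocation and not_on_this_host:
            to_error.add(uuid)
    return to_error


instances = {
    "vm-a": {"vm_state": "building", "host": None},      # interrupted build
    "vm-b": {"vm_state": "building", "host": "ubuntu"},  # host already set
    "vm-c": {"vm_state": "active", "host": "ubuntu"},    # finished build
    "vm-d": {"vm_state": "building", "host": None},      # no allocation here
}
print(interrupted_builds(instances, {"vm-a", "vm-b", "vm-c"}, "ubuntu"))
```

Requiring an allocation on the local resource provider is what ties a host-less instance to this particular compute; in the model, vm-d is left alone because nothing links it to the restarted host.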

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/687216

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/687534

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/687535

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/687564

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/687565

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/687877

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/687878

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.opendev.org/687917

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/687918

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/687216
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=06fd7c730172190d7bf7d52bc9062eecba8d7d27
Submitter: Zuul
Branch: stable/train

commit 06fd7c730172190d7bf7d52bc9062eecba8d7d27
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 16:48:14 2019 +0200

    Error out interrupted builds

    If the compute service is restarted while build requests are
    executing the instance_claim or waiting for the COMPUTE_RESOURCE_SEMAPHORE
    then those instances will be stuck forever in BUILDING state. If the instance
    already finished instance_claim then instance.host is set and when the
    compute restarts the instance is put to ERROR state.

    This patch changes compute service startup to put instances into
    ERROR state if they a) are in the BUILDING state, and b) have
    allocations on the compute resource provider, but c) do not have
    instance.host set to that compute.

    Change-Id: I856a3032c83fc2f605d8c9b6e5aa3bcfa415f96a
    Closes-Bug: #1833581
    (cherry picked from commit a1a735bc6efa40d8277c9fc5339f3b74f968b58e)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/687534
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e4a5516098454775e1c5d5f631308bfa9abf7167
Submitter: Zuul
Branch: stable/stein

commit e4a5516098454775e1c5d5f631308bfa9abf7167
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 17:13:31 2019 +0200

    Functional reproduce for bug 1833581

    Change-Id: Id112098ef7603d0e514120ac9b7ed861dfa32bd3
    Related-Bug: #1833581
    (cherry picked from commit d2e0bd81df6a732f9c78df29538db89dda37b246)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/687535
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=530ad1ae8884e5c87761277bb64fdb17b286e968
Submitter: Zuul
Branch: stable/stein

commit 530ad1ae8884e5c87761277bb64fdb17b286e968
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 16:48:14 2019 +0200

    Error out interrupted builds

    If the compute service is restarted while build requests are
    executing the instance_claim or waiting for the COMPUTE_RESOURCE_SEMAPHORE
    then those instances will be stuck forever in BUILDING state. If the instance
    already finished instance_claim then instance.host is set and when the
    compute restarts the instance is put to ERROR state.

    This patch changes compute service startup to put instances into
    ERROR state if they a) are in the BUILDING state, and b) have
    allocations on the compute resource provider, but c) do not have
    instance.host set to that compute.

    Conflicts:
          nova/tests/unit/compute/test_compute_mgr.py

    Conflict due to Ia1b3ab0b66fdaf569f6c7a09510f208ee28725b2 is not in
    stable/stein

    Change-Id: I856a3032c83fc2f605d8c9b6e5aa3bcfa415f96a
    Closes-Bug: #1833581
    (cherry picked from commit a1a735bc6efa40d8277c9fc5339f3b74f968b58e)
    (cherry picked from commit 06fd7c730172190d7bf7d52bc9062eecba8d7d27)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/687564
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=19ca978bd48be1990db5a09fadbd0eea58f9d6b7
Submitter: Zuul
Branch: stable/rocky

commit 19ca978bd48be1990db5a09fadbd0eea58f9d6b7
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 17:13:31 2019 +0200

    Functional reproduce for bug 1833581

    Change-Id: Id112098ef7603d0e514120ac9b7ed861dfa32bd3
    Related-Bug: #1833581
    (cherry picked from commit d2e0bd81df6a732f9c78df29538db89dda37b246)
    (cherry picked from commit 48d066a4193940815094c2ab8299db543aa514e5)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/687565
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=13bb7ed701121955ba015103c2e44429927e78d4
Submitter: Zuul
Branch: stable/rocky

commit 13bb7ed701121955ba015103c2e44429927e78d4
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 16:48:14 2019 +0200

    Error out interrupted builds

    If the compute service is restarted while build requests are
    executing the instance_claim or waiting for the COMPUTE_RESOURCE_SEMAPHORE
    then those instances will be stuck forever in BUILDING state. If the instance
    already finished instance_claim then instance.host is set and when the
    compute restarts the instance is put to ERROR state.

    This patch changes compute service startup to put instances into
    ERROR state if they a) are in the BUILDING state, and b) have
    allocations on the compute resource provider, but c) do not have
    instance.host set to that compute.

    Conflicts:
          nova/tests/unit/compute/test_compute_mgr.py
          nova/compute/manager.py

    Conflict due to Ia1b3ab0b66fdaf569f6c7a09510f208ee28725b2 and
    I020e7dc47efc79f8907b7bfb753ec779a8da69a1 is not in stable/rocky

    Change-Id: I856a3032c83fc2f605d8c9b6e5aa3bcfa415f96a
    Closes-Bug: #1833581
    (cherry picked from commit a1a735bc6efa40d8277c9fc5339f3b74f968b58e)
    (cherry picked from commit 06fd7c730172190d7bf7d52bc9062eecba8d7d27)
    (cherry picked from commit cb951cbcb246221e04a063cd7b5ae2e83ddfe6dd)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/687877
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=23d65bcf1e82ecdd4eff1a14cc8c8c8eb473d036
Submitter: Zuul
Branch: stable/queens

commit 23d65bcf1e82ecdd4eff1a14cc8c8c8eb473d036
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 17:13:31 2019 +0200

    Functional reproduce for bug 1833581

    Conflicts:
          nova/tests/functional/compute/test_init_host.py

    Conflict is due to Iea283322124cb35fc0bc6d25f35548621e8c8c2f is missing
    from stable/queens

    Change-Id: Id112098ef7603d0e514120ac9b7ed861dfa32bd3
    Related-Bug: #1833581
    (cherry picked from commit d2e0bd81df6a732f9c78df29538db89dda37b246)
    (cherry picked from commit 48d066a4193940815094c2ab8299db543aa514e5)
    (cherry picked from commit 19ca978bd48be1990db5a09fadbd0eea58f9d6b7)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/687878
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4164b96de9f62fdc35a12adf514d767460187d55
Submitter: Zuul
Branch: stable/queens

commit 4164b96de9f62fdc35a12adf514d767460187d55
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 16:48:14 2019 +0200

    Error out interrupted builds

    If the compute service is restarted while build requests are
    executing the instance_claim or waiting for the COMPUTE_RESOURCE_SEMAPHORE
    then those instances will be stuck forever in BUILDING state. If the instance
    already finished instance_claim then instance.host is set and when the
    compute restarts the instance is put to ERROR state.

    This patch changes compute service startup to put instances into
    ERROR state if they a) are in the BUILDING state, and b) have
    allocations on the compute resource provider, but c) do not have
    instance.host set to that compute.

    Change-Id: I856a3032c83fc2f605d8c9b6e5aa3bcfa415f96a
    Closes-Bug: #1833581
    (cherry picked from commit a1a735bc6efa40d8277c9fc5339f3b74f968b58e)
    (cherry picked from commit 06fd7c730172190d7bf7d52bc9062eecba8d7d27)
    (cherry picked from commit cb951cbcb246221e04a063cd7b5ae2e83ddfe6dd)
    (cherry picked from commit 13bb7ed701121955ba015103c2e44429927e78d4)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.1

This issue was fixed in the openstack/nova 20.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.opendev.org/687917
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a84dbeab6245af38886c9da235872209e63a191e
Submitter: Zuul
Branch: stable/pike

commit a84dbeab6245af38886c9da235872209e63a191e
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 17:13:31 2019 +0200

    Functional reproduce for bug 1833581

    Conflicts:
          nova/tests/functional/compute/test_init_host.py
    Note: conflict is due to needed changes in Pike version of patch
    I107d842520c088b4859a3b36621ce6bd8e970475 (the missing last assert)

    Additional changes in test_init_host.py compared to Queens:
    * Notification handling is changed as
      Ie4676eed0039c927b35af7573f0b57fd762adbaa is not in Pike.

    Change-Id: Id112098ef7603d0e514120ac9b7ed861dfa32bd3
    Related-Bug: #1833581
    (cherry picked from commit d2e0bd81df6a732f9c78df29538db89dda37b246)
    (cherry picked from commit 48d066a4193940815094c2ab8299db543aa514e5)
    (cherry picked from commit 19ca978bd48be1990db5a09fadbd0eea58f9d6b7)
    (cherry picked from commit 23d65bcf1e82ecdd4eff1a14cc8c8c8eb473d036)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.opendev.org/687918
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e5892ed61b5f4f4f581384e245d5052e7bf840b2
Submitter: Zuul
Branch: stable/pike

commit e5892ed61b5f4f4f581384e245d5052e7bf840b2
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 21 16:48:14 2019 +0200

    Error out interrupted builds

    If the compute service is restarted while build requests are
    executing the instance_claim or waiting for the COMPUTE_RESOURCE_SEMAPHORE
    then those instances will be stuck forever in BUILDING state. If the instance
    already finished instance_claim then instance.host is set and when the
    compute restarts the instance is put to ERROR state.

    This patch changes compute service startup to put instances into
    ERROR state if they a) are in the BUILDING state, and b) have
    allocations on the compute resource provider, but c) do not have
    instance.host set to that compute.

    Note: changes in manager.py and test_compute_mgr.py compared to Queens:
    * the signature change of the get_allocations_for_resource_provider
      call is due to I7891b98f225f97ad47f189afb9110ef31c810717 is missing from
      stable/pike.
    * the VirtDriverNotReady exception does not exists in pike as
      Ib0ec1012b74e9a9e74c8879f3feed5f9332b711f is missing. In pike ironic
      returns an empty node list instead of raising an exception so the bugfix
      and the test is adapted accordingly.

    Change-Id: I856a3032c83fc2f605d8c9b6e5aa3bcfa415f96a
    Closes-Bug: #1833581
    (cherry picked from commit a1a735bc6efa40d8277c9fc5339f3b74f968b58e)
    (cherry picked from commit 06fd7c730172190d7bf7d52bc9062eecba8d7d27)
    (cherry picked from commit cb951cbcb246221e04a063cd7b5ae2e83ddfe6dd)
    (cherry picked from commit 13bb7ed701121955ba015103c2e44429927e78d4)
    (cherry picked from commit 4164b96de9f62fdc35a12adf514d767460187d55)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.1.0

This issue was fixed in the openstack/nova 19.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.3.0

This issue was fixed in the openstack/nova 18.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova pike-eol

This issue was fixed in the openstack/nova pike-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova queens-eol

This issue was fixed in the openstack/nova queens-eol release.

yule sun (syle87) wrote :

I encountered the same problem on a Yoga cluster when I shut down a control node and then created a virtual machine.
The virtual machine creation was logged only in nova-api.log; nova-scheduler.log and nova-conductor.log contained no entries for it. However, the virtual machine has been stuck in the BUILD state. Where should I start to deal with this problem? Looking forward to your help.

Balazs Gibizer (balazs-gibizer) wrote :

Could you be more specific about the sequence of events? Which services were shut down? You say it was a control node, but which OpenStack services run on that node?

Have you checked the logs of the services that were shut down? Do those logs mention the instance or request_id in question?

If you need to recover from this state, you can use the nova reset-state command to push the instance to ERROR state; then you can try to rebuild it or delete it.

If you are looking for a bug fix, I suggest opening a separate bug report, as this report is about a nova-compute service restart, not a controller restart.
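
For operators who need to find candidates for a manual reset, the check can be sketched in Python. This is a hedged example, not an official tool: the stuck_build_ids function and the max_age threshold are hypothetical, and it assumes server records are available as dictionaries using the field names shown in the openstack server show output earlier in this report (status, created, id, OS-EXT-SRV-ATTR:host), with an unset host represented as None.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


def stuck_build_ids(servers: Iterable[dict], now: datetime,
                    max_age: timedelta) -> List[str]:
    """Return IDs of servers in BUILD with no host that exceed max_age."""
    stuck = []
    for server in servers:
        # Timestamps in the report look like 2019-06-19T02:30:16Z (UTC).
        created = datetime.strptime(
            server["created"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if (server["status"] == "BUILD"
                and server.get("OS-EXT-SRV-ATTR:host") is None
                and now - created > max_age):
            stuck.append(server["id"])
    return stuck


servers = [
    {"id": "9ee76601-4a61-4682-86f1-743dac2b05e6", "status": "BUILD",
     "created": "2019-06-19T02:30:16Z", "OS-EXT-SRV-ATTR:host": None},
    {"id": "e459beae-ccb5-4781-b938-2dff68e33bf7", "status": "ACTIVE",
     "created": "2019-06-19T02:30:16Z", "OS-EXT-SRV-ATTR:host": "ubuntu"},
]
now = datetime(2019, 6, 19, 3, 30, 16, tzinfo=timezone.utc)
print(stuck_build_ids(servers, now, timedelta(minutes=30)))
```

Each returned ID is only a candidate; it should be reviewed before it is passed to nova reset-state.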
