Upgraded nova-compute services fail to start new instances

Bug #1821362 reported by Mark Goddard
Affects          Status          Importance   Assigned to     Milestone
kolla-ansible    Fix Released    High         Mark Goddard
  Rocky          Fix Committed   High         Mark Goddard
  Stein          Fix Released    High         Mark Goddard

Bug Description

After upgrading from Rocky to Stein, nova-compute services fail to start new instances with the following error message:

Failed to allocate the network(s), not rescheduling.

Looking in the nova-compute logs, we also see this:

ERROR nova.virt.libvirt.driver [req-8733cf16-6f89-4664-9595-189dacab8a93 7cbd99b5747146baad20c8c035a64706 b44e356a90d74efcbeea1a4024104337 - default default] [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] Neutron Reported failure on event network-vif-plugged-60c05a0d-8758-44c9-81e4-754551567be5 for instance 32c493c4-d88c-4f14-98db-c7af64bf3324: NovaException: In shutdown, no new events can be scheduled

And this:

2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [req-8733cf16-6f89-4664-9595-189dacab8a93 7cbd99b5747146baad20c8c035a64706 b44e356a90d74efcbeea1a4024104337 - default default] [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] Failed to allocate network(s): VirtualInterfaceCreateException: Virtual Interface creation failed
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] Traceback (most recent call last):
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py", line 2235, in _build_and_run_instance
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] block_device_info=block_device_info)
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3150, in spawn
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] destroy_disks_on_failure=True)
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5686, in _create_domain_and_network
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] destroy_disks_on_failure)
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] self.force_reraise()
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] six.reraise(self.type_, self.value, self.tb)
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5665, in _create_domain_and_network
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] error_callback=self._neutron_failed_callback):
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] return self.gen.next()
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py", line 472, in wait_for_instance_event
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] error_callback(event_name, instance)
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5624, in _neutron_failed_callback
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] raise exception.VirtualInterfaceCreateException()
2019-03-22 13:49:01.177 7 ERROR nova.compute.manager [instance: 32c493c4-d88c-4f14-98db-c7af64bf3324] VirtualInterfaceCreateException: Virtual Interface creation failed

During the upgrade process, we send the nova containers a SIGHUP to cause them to reload their object version state. According to the nova team in IRC, this is a known issue: oslo.service performs a full shutdown in response to a SIGHUP, which breaks nova-compute. There is a patch [1] in review to address this.
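For reference, the reload is delivered by signalling the running container. A minimal sketch of the idea, assuming the default container name "nova_compute" and direct use of the Docker CLI (the actual mechanism lives in the kolla-ansible upgrade handlers):

    # Sketch only: trigger the object-version reload by sending SIGHUP to the
    # nova-compute container.
    docker kill --signal SIGHUP nova_compute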

The workaround is to restart the nova compute service.
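In a kolla-based deployment this means restarting the container rather than signalling it again, since a full restart clears the stuck oslo.service shutdown state. A sketch, assuming the default container name:

    # Workaround sketch: restart the nova-compute container instead of
    # relying on SIGHUP.
    docker restart nova_compute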

[1] https://review.openstack.org/#/c/641907

Mark Goddard (mgoddard)
Changed in kolla-ansible:
assignee: nobody → Mark Goddard (mgoddard)
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.openstack.org/645614

Changed in kolla-ansible:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.openstack.org/645614
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=192dcd1e1b9baf7f3177a694c2b1ce8bd62d9159
Submitter: Zuul
Branch: master

commit 192dcd1e1b9baf7f3177a694c2b1ce8bd62d9159
Author: Mark Goddard <email address hidden>
Date: Fri Mar 22 14:59:41 2019 +0000

    Fix booting instances after nova-compute upgrade

    After upgrading from Rocky to Stein, nova-compute services fail to start
    new instances with the following error message:

    Failed to allocate the network(s), not rescheduling.

    Looking in the nova-compute logs, we also see this:

    Neutron Reported failure on event
    network-vif-plugged-60c05a0d-8758-44c9-81e4-754551567be5 for instance
    32c493c4-d88c-4f14-98db-c7af64bf3324: NovaException: In shutdown, no new
    events can be scheduled

    During the upgrade process, we send nova containers a SIGHUP to cause
    them to reload their object version state. Speaking to the nova team in
    IRC, there is a known issue with this, caused by oslo.service performing
    a full shutdown in response to a SIGHUP, which breaks nova-compute.
    There is a patch [1] in review to address this.

    The workaround employed here is to restart the nova compute service.

    [1] https://review.openstack.org/#/c/641907

    Change-Id: Ia4fcc558a3f62ced2d629d7a22d0bc1eb6b879f1
    Closes-Bug: #1821362

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/647700

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/647701

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/647702

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.openstack.org/647700
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=59d518a75a4d31610dbbf70b0ea18c4104ae1dd9
Submitter: Zuul
Branch: stable/rocky

commit 59d518a75a4d31610dbbf70b0ea18c4104ae1dd9
Author: Mark Goddard <email address hidden>
Date: Fri Mar 22 14:59:41 2019 +0000

    Fix booting instances after nova-compute upgrade

    After upgrading from Rocky to Stein, nova-compute services fail to start
    new instances with the following error message:

    Failed to allocate the network(s), not rescheduling.

    Looking in the nova-compute logs, we also see this:

    Neutron Reported failure on event
    network-vif-plugged-60c05a0d-8758-44c9-81e4-754551567be5 for instance
    32c493c4-d88c-4f14-98db-c7af64bf3324: NovaException: In shutdown, no new
    events can be scheduled

    During the upgrade process, we send nova containers a SIGHUP to cause
    them to reload their object version state. Speaking to the nova team in
    IRC, there is a known issue with this, caused by oslo.service performing
    a full shutdown in response to a SIGHUP, which breaks nova-compute.
    There is a patch [1] in review to address this.

    The workaround employed here is to restart the nova compute service.

    [1] https://review.openstack.org/#/c/641907

    Change-Id: Ia4fcc558a3f62ced2d629d7a22d0bc1eb6b879f1
    Closes-Bug: #1821362
    (cherry picked from commit 192dcd1e1b9baf7f3177a694c2b1ce8bd62d9159)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/647858

OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.openstack.org/647858
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=1e2df9e3dbd676af6209fd15e8f1f7eab24f58e1
Submitter: Zuul
Branch: stable/rocky

commit 1e2df9e3dbd676af6209fd15e8f1f7eab24f58e1
Author: Mark Goddard <email address hidden>
Date: Tue Mar 26 18:22:57 2019 +0000

    Don't send SIGHUP to placement-api

    Seen in Kayobe CI upgrade job, placement-api exits 129 when sent a
    SIGHUP. It doesn't use nova versioned objects, so doesn't need to be
    reloaded.

    This is a follow-up to Ia4fcc558a3f62ced2d629d7a22d0bc1eb6b879f1.

    Change-Id: I38eb8645ee254cc775671c43d3c0fbd5a7402512
    Related-bug: #1821362
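
As an illustration of the change described above, the reload step should signal only the nova services that actually carry versioned object state and skip placement-api entirely. A rough sketch with assumed container names; the real change is in the kolla-ansible Ansible handlers:

    # Sketch only: placement_api is excluded because it exits 129 on SIGHUP
    # and has no nova versioned objects to reload.
    for container in nova_api nova_conductor nova_scheduler; do
        docker kill --signal SIGHUP "$container"
    done
    # nova_compute is handled separately: it is restarted rather than
    # signalled (see the main fix above).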

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/pike)

Reviewed: https://review.openstack.org/647702
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=780e1e4d974f49cce0a3cd6de90405b9a1fe0309
Submitter: Zuul
Branch: stable/pike

commit 780e1e4d974f49cce0a3cd6de90405b9a1fe0309
Author: Mark Goddard <email address hidden>
Date: Fri Mar 22 14:59:41 2019 +0000

    Fix booting instances after nova-compute upgrade

    After upgrading from Rocky to Stein, nova-compute services fail to start
    new instances with the following error message:

    Failed to allocate the network(s), not rescheduling.

    Looking in the nova-compute logs, we also see this:

    Neutron Reported failure on event
    network-vif-plugged-60c05a0d-8758-44c9-81e4-754551567be5 for instance
    32c493c4-d88c-4f14-98db-c7af64bf3324: NovaException: In shutdown, no new
    events can be scheduled

    During the upgrade process, we send nova containers a SIGHUP to cause
    them to reload their object version state. Speaking to the nova team in
    IRC, there is a known issue with this, caused by oslo.service performing
    a full shutdown in response to a SIGHUP, which breaks nova-compute.
    There is a patch [1] in review to address this.

    The workaround employed here is to restart the nova compute service.

    This patch merges in https://review.openstack.org/647858 which was
    applied after the original patch in the rocky branch to avoid sending
    SIGHUP to placement-api, which does not like it.

    [1] https://review.openstack.org/#/c/641907

    Change-Id: Ia4fcc558a3f62ced2d629d7a22d0bc1eb6b879f1
    Closes-Bug: #1821362
    (cherry picked from commit 192dcd1e1b9baf7f3177a694c2b1ce8bd62d9159)

tags: added: in-stable-pike
tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/queens)

Reviewed: https://review.openstack.org/647701
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=177787d5e9a3701a3a5caa6ac5c683ee666409d7
Submitter: Zuul
Branch: stable/queens

commit 177787d5e9a3701a3a5caa6ac5c683ee666409d7
Author: Mark Goddard <email address hidden>
Date: Fri Mar 22 14:59:41 2019 +0000

    Fix booting instances after nova-compute upgrade

    After upgrading from Rocky to Stein, nova-compute services fail to start
    new instances with the following error message:

    Failed to allocate the network(s), not rescheduling.

    Looking in the nova-compute logs, we also see this:

    Neutron Reported failure on event
    network-vif-plugged-60c05a0d-8758-44c9-81e4-754551567be5 for instance
    32c493c4-d88c-4f14-98db-c7af64bf3324: NovaException: In shutdown, no new
    events can be scheduled

    During the upgrade process, we send nova containers a SIGHUP to cause
    them to reload their object version state. Speaking to the nova team in
    IRC, there is a known issue with this, caused by oslo.service performing
    a full shutdown in response to a SIGHUP, which breaks nova-compute.
    There is a patch [1] in review to address this.

    The workaround employed here is to restart the nova compute service.

    This patch merges in https://review.openstack.org/647858 which was
    applied after the original patch in the rocky branch to avoid sending
    SIGHUP to placement-api, which does not like it.

    [1] https://review.openstack.org/#/c/641907

    Change-Id: Ia4fcc558a3f62ced2d629d7a22d0bc1eb6b879f1
    Closes-Bug: #1821362
    (cherry picked from commit 192dcd1e1b9baf7f3177a694c2b1ce8bd62d9159)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 8.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 8.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 5.0.5

This issue was fixed in the openstack/kolla-ansible 5.0.5 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 6.2.0

This issue was fixed in the openstack/kolla-ansible 6.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.1.0

This issue was fixed in the openstack/kolla-ansible 7.1.0 release.
