libvirt virt driver does not wait for network-vif-plugged event during hard reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Balazs Gibizer |
Bug Description
The libvirt virt driver has logic during spawn to create the domain in libvirt, pause it, and only resume it after the network-vif-plugged events are received from neutron for the ports of the instance being spawned. This is in place to avoid starting the guest OS before the networking backend has finished setting up the networking for the ports. Without this, a guest might start and request an IP via DHCP before the networking setup is finished and therefore might not get an IP at all.
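The spawn-time handshake described above can be sketched as follows. This is a minimal toy model, not nova's actual code: `VifPluggedWaiter`, `spawn`, and the callback names are illustrative, and the real driver uses libvirt domain APIs and nova's external-event machinery instead of bare threading primitives.

```python
import threading


class VifPluggedWaiter:
    """Toy stand-in for waiting on neutron's network-vif-plugged events.

    One event per port; neutron (simulated here) signals each port as the
    backend finishes wiring it up.
    """

    def __init__(self, expected_ports):
        self._events = {port: threading.Event() for port in expected_ports}

    def notify_vif_plugged(self, port_id):
        # Called when a network-vif-plugged event arrives for a port.
        self._events[port_id].set()

    def wait_all(self, timeout=1.0):
        # True only if every expected port reported plugged in time.
        return all(ev.wait(timeout) for ev in self._events.values())


def spawn(waiter, create_domain, resume_domain):
    # 1) create the domain paused, 2) wait for neutron, 3) resume the guest.
    create_domain(paused=True)
    if waiter.wait_all():
        resume_domain()
        return "running"
    return "stuck-paused"  # the real driver would raise or retry here
```

Because the guest is created paused, its OS cannot issue a DHCP request until `resume_domain()` runs, which only happens after all plug events arrived.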
In case of hard reboot (and start, as that is a hard reboot too) nova cleans up the instance from the hypervisor (except the local disk), including unplugging the vifs of the instance. Then nova recreates everything, including re-plugging the vifs. This is intentional, as hard reboot is considered an operation that is capable of recovering instances in bad / inconsistent states. However, during the hard reboot nova does not wait for the network-vif-plugged events before it lets the domain start running. In a mass instance startup scenario (e.g. after a compute host recovery) a potentially large number of vif unplug/plug requests hit the networking backend. Processing these replugs takes time. Since nova does not wait for the network-vif-plugged event, the guest OS can start the DHCP request well before the networking backend can catch up with the unplug/plug requests. This leads to connectivity issues in the guest.
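The hard-reboot sequence and the fix it implies can be sketched like this. Again a toy model under stated assumptions: the function and parameter names are hypothetical, and a plain `threading.Event` stands in for the neutron event delivery, which in nova flows through the os-server-external-events API.

```python
import threading


def hard_reboot(unplug_vifs, plug_vifs, start_domain,
                wait_for_vif_plugged=None, timeout=1.0):
    """Toy sketch of the hard-reboot flow described above.

    wait_for_vif_plugged: an optional threading.Event that the (simulated)
    networking backend sets once re-plugging is actually finished.
    """
    unplug_vifs()   # tear everything down first, as hard reboot does
    plug_vifs()     # the backend processes the re-plug asynchronously
    if wait_for_vif_plugged is not None:
        # Proposed behaviour: mirror spawn and block until neutron confirms
        # the ports are wired up before the guest can issue DHCP requests.
        if not wait_for_vif_plugged.wait(timeout):
            raise TimeoutError("network-vif-plugged not received in time")
    start_domain()  # without the wait above, the guest races the backend
```

With `wait_for_vif_plugged=None` this reproduces the buggy behaviour reported here: `start_domain()` runs regardless of whether the backend has caught up.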
Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Medium
assignee: nobody → Balazs Gibizer (balazs-gibizer)
tags: added: compute libvirt reboot
First, thanks a lot for raising this bug.
It would be great to have both vif_plugged and vif_unplugged handshakes between nova and the networking backend. That would enable closer coordination and make it easier to troubleshoot which part of the VM start activity failed during a mass start of VMs on a typical batch of 50 compute hosts.