libvirt: Snapshot and resume does not work for instances with some SR-IOV ports

Bug #1563874 reported by Nikola Đipanov
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Stephen Finucane
Milestone: (none listed)

Bug Description

The libvirt driver methods that determine whether a port is an SR-IOV port do not properly check for all possible SR-IOV port types. The check at

https://github.com/openstack/nova/blob/f15d9a9693b19393fcde84cf4bc6f044d39ffdca/nova/virt/libvirt/driver.py#L3378

should be checking for VNIC_TYPES_SRIOV instead.

This affects the snapshot and suspend/resume functionality provided by the libvirt driver for instances using non-direct flavors of SR-IOV.

Tags: libvirt pci sriov
tags: added: libvirt pci
Moshe Levi (moshele) wrote :

Currently we have 2 SR-IOV port types:
a direct port, which is a direct attachment of the VF to the guest
<interface type='hostdev' managed='yes'>
  <mac address='fa:16:3e:0f:bb:1f'/>
  <driver name='kvm'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x7'/>
  </source>
  <vlan>
    <tag id='140'/>
  </vlan>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</interface>
and macvtap, which is a tap device connected to the VF
<interface type='direct'>
  <mac address='fa:16:3e:5c:6b:21'/>
  <source dev='p1p7' mode='passthrough'/>
  <target dev='macvtap0'/>
  <model type='virtio'/>
  <driver name='vhost'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
For macvtap, the guest doesn't see the PCI device, so we don't need to detach the PCI device on suspend.

I think we just need to check whether it is a direct port, as it was before https://review.openstack.org/#/c/262341/7/nova/virt/libvirt/driver.py
instead of https://github.com/openstack/nova/blob/f15d9a9693b19393fcde84cf4bc6f044d39ffdca/nova/virt/libvirt/driver.py#L3423
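
The distinction drawn in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `needs_pci_detach` is a hypothetical helper, not a nova function, and the XML snippets are trimmed copies of the two interface definitions quoted above. It assumes that only `type='hostdev'` interfaces expose the VF to the guest as a PCI device.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two libvirt interface definitions quoted in the comment.
HOSTDEV_XML = """
<interface type='hostdev' managed='yes'>
  <mac address='fa:16:3e:0f:bb:1f'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x7'/>
  </source>
</interface>
"""

MACVTAP_XML = """
<interface type='direct'>
  <mac address='fa:16:3e:5c:6b:21'/>
  <source dev='p1p7' mode='passthrough'/>
  <model type='virtio'/>
</interface>
"""


def needs_pci_detach(interface_xml):
    """Hypothetical helper: True if the guest sees the VF as a PCI device.

    Assumption: only <interface type='hostdev'> attaches the VF directly,
    so only that type needs a PCI detach before suspend.
    """
    root = ET.fromstring(interface_xml.strip())
    return root.tag == 'interface' and root.get('type') == 'hostdev'


if __name__ == '__main__':
    print('hostdev needs PCI detach:', needs_pci_detach(HOSTDEV_XML))
    print('macvtap needs PCI detach:', needs_pci_detach(MACVTAP_XML))
```

Under this assumption, the macvtap interface is left alone on suspend because the guest only sees a virtio NIC.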

Changed in nova:
assignee: nobody → Moshe Levi (moshele)
Moshe Levi (moshele) wrote :
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
summary: - libvirt: Snapshot and resume wont' work for instances with some SR-IOV ports
+ libvirt: Snapshot and resume does not work for instances with some SR-IOV ports
tags: added: sriov
Matt Riedemann (mriedem) wrote :

I think we should move ahead with enabling the Mellanox macvtap CI on nova changes (the same subset of the nova tree that the direct SR-IOV job tests today). I'd like to see us expand the coverage, fix this bug for at least macvtap, and see it passing that job (which already runs a suspend/resume test, but doesn't test snapshots yet).

Matt Riedemann (mriedem) wrote :

Correction on comment 3: the Mellanox vnic_type=direct CI only runs on the network scenario tests, which don't include suspend/resume/snapshot.

Moshe Levi (moshele) wrote :

Matt,
the network scenario has a test for suspend/resume https://github.com/openstack/tempest/blob/master/tempest/scenario/test_server_advanced_ops.py#L75 but not for snapshot.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/300890

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: Moshe Levi (moshele) → Stephen Finucane (stephenfinucane)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/300890
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b691125b621c781c89f15b421215ebad5085c4d0
Submitter: Jenkins
Branch: master

commit b691125b621c781c89f15b421215ebad5085c4d0
Author: Moshe Levi <email address hidden>
Date: Sun Apr 3 19:11:07 2016 +0300

    pci: Clarify SR-IOV ports vs direct passthrough ports

    This patch clarifies which types of ports need PCI device
    detach/attach and which types of ports need only a PCI
    request. To avoid confusion, this patch introduces two
    lists of port types, VNIC_TYPES_SRIOV and
    VNIC_TYPES_DIRECT_PASSTHROUGH. The VNIC_TYPES_SRIOV ports
    require a PCI request, while VNIC_TYPES_DIRECT_PASSTHROUGH
    ports require PCI device attach/detach from the libvirt
    domain. VNIC_TYPES_DIRECT_PASSTHROUGH is a subset of
    VNIC_TYPES_SRIOV.

    Closes-Bug: #1563874

    Change-Id: I3a45b1fb41e8e446d1f25d7a1d77991c8bf2a1ed
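
As a rough illustration of the split this commit message describes, the two lists and their subset relation might look like the sketch below. The exact membership of each tuple is an assumption (the commit message states only that one list is a subset of the other), and the two helper functions are hypothetical names used for illustration, not nova APIs.

```python
# vNIC type names as used by Neutron port bindings.
VNIC_TYPE_DIRECT = 'direct'
VNIC_TYPE_MACVTAP = 'macvtap'
VNIC_TYPE_DIRECT_PHYSICAL = 'direct-physical'

# Assumed membership: every SR-IOV flavor needs a PCI request ...
VNIC_TYPES_SRIOV = (
    VNIC_TYPE_DIRECT,
    VNIC_TYPE_MACVTAP,
    VNIC_TYPE_DIRECT_PHYSICAL,
)

# ... but only the flavors handed straight to the guest need PCI
# attach/detach on the libvirt domain (assumed membership).
VNIC_TYPES_DIRECT_PASSTHROUGH = (
    VNIC_TYPE_DIRECT,
    VNIC_TYPE_DIRECT_PHYSICAL,
)


def needs_pci_request(vnic_type):
    """Hypothetical helper: does this port type require a PCI request?"""
    return vnic_type in VNIC_TYPES_SRIOV


def needs_pci_attach_detach(vnic_type):
    """Hypothetical helper: must the PCI device be attached/detached?"""
    return vnic_type in VNIC_TYPES_DIRECT_PASSTHROUGH


if __name__ == '__main__':
    for vnic_type in ('normal',) + VNIC_TYPES_SRIOV:
        print(vnic_type,
              needs_pci_request(vnic_type),
              needs_pci_attach_detach(vnic_type))
```

With this layout, a macvtap port still gets a PCI request at scheduling time but is skipped by the detach/attach path used for suspend, resume, and snapshot.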

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0b3

This issue was fixed in the openstack/nova 15.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/841017

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/841017
Committed: https://opendev.org/openstack/nova/commit/51a970af37094f84b7f2ae321b8f74a570609eb4
Submitter: "Zuul (22348)"
Branch: master

commit 51a970af37094f84b7f2ae321b8f74a570609eb4
Author: Sean Mooney <email address hidden>
Date: Sat May 7 21:36:17 2022 +0300

    Fix suspend for non hostdev sriov ports

    Change I3a45b1fb41e8e446d1f25d7a1d77991c8bf2a1ed
    tried to fix bug #1563874 by using _detach_pci_device
    to remove hostdev PCI devices. However, that breaks
    other use cases, so we attempt to fix that by only
    calling _detach_pci_device for devices it can
    handle and using detach_interface for the rest.

    Related-bug: #1563874
    Related-bug: #1970467
    Change-Id: I351d58d6922ca169b641500c12ffd6f91829df90
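
The dispatch this commit message describes could be sketched roughly as follows. This is not the merged code: `detach_sriov_ports` is a hypothetical function, the two callables stand in for `_detach_pci_device` and `detach_interface`, ports are modelled as plain dicts, and the assumption that hostdev-backed `direct` and `direct-physical` ports are the ones `_detach_pci_device` can handle is mine.

```python
# Assumed: port types that libvirt models as hostdev PCI devices.
HOSTDEV_VNIC_TYPES = ('direct', 'direct-physical')


def detach_sriov_ports(vifs, detach_pci_device, detach_interface):
    """Hypothetical dispatcher: pick a detach path per port.

    vifs: iterable of dicts with a 'vnic_type' key (illustrative model).
    detach_pci_device / detach_interface: callables standing in for the
    driver methods named in the commit message.
    """
    for vif in vifs:
        if vif['vnic_type'] in HOSTDEV_VNIC_TYPES:
            # The guest sees a PCI device, so use the PCI detach path.
            detach_pci_device(vif)
        else:
            # For example macvtap: the guest sees a regular interface.
            detach_interface(vif)


if __name__ == '__main__':
    pci_calls, iface_calls = [], []
    ports = [
        {'id': 'port-a', 'vnic_type': 'direct'},
        {'id': 'port-b', 'vnic_type': 'macvtap'},
    ]
    detach_sriov_ports(ports, pci_calls.append, iface_calls.append)
    print('PCI detach:', [p['id'] for p in pci_calls])
    print('Interface detach:', [p['id'] for p in iface_calls])
```

Passing the two detach operations in as callables keeps the sketch testable without a libvirt connection.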
