failed to boot guest with vnic_type direct when rx_queue_size and tx_queue_size are set

Bug #1789074 reported by Moshe Levi
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   High        Moshe Levi
Rocky                     Fix Committed  High        Sahid Orentino

Bug Description

Description of problem:

Nova compute applies the virtio RX/TX queue size settings to SR-IOV devices as well, which causes the VM spawn to fail.
The configurable RX/TX queue size code is similar all the way from OSP10 to OSP13, so the issue may also be present on other releases.

Version-Release number of selected component (if applicable):
OSP13 z3

How reproducible:

(quick and dirty way)
Change nova config file

# crudini --set /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf libvirt rx_queue_size 1024
# crudini --set /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf libvirt tx_queue_size 1024

# restart nova_compute container
docker restart nova_compute

# boot a VM with an SRIOV (PF or VF) interface
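
For hosts where crudini is not available, the same two options can be set with Python's standard configparser. This is only a minimal sketch: the helper name set_queue_sizes and the sample config text are hypothetical, and writing the file back through configparser drops comments, so it is meant for throwaway reproduction environments.

```python
import configparser
import io


def set_queue_sizes(conf_text, rx=1024, tx=1024):
    """Return nova.conf-style text with [libvirt] rx/tx queue sizes set.

    Hypothetical helper for reproducing the bug; not part of nova.
    """
    # strict=False tolerates repeated keys, which nova.conf files may contain
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("libvirt"):
        parser.add_section("libvirt")
    parser.set("libvirt", "rx_queue_size", str(rx))
    parser.set("libvirt", "tx_queue_size", str(tx))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Sample input is an assumption, not taken from a real deployment
sample = "[libvirt]\nvirt_type = kvm\n"
updated = set_queue_sizes(sample)
print(updated)
```

The nova_compute container still has to be restarted afterwards for the change to take effect.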

Actual results:
Nova adds rx_queue_size to the driver element of the SR-IOV port's interface section:

    <interface type="hostdev" managed="yes">
      <mac address="fa:16:3e:9d:f0:52"/>
      <driver name="vhost" rx_queue_size="1024"/>
      <source>
        <address type="pci" domain="0x0000" bus="0x01" slot="0x14" function="0x7"/>
      </source>
      <vlan>
        <tag id="435"/>
      </vlan>

Expected results:

    <interface type='hostdev' managed='yes'>
      <mac address='fa:16:3e:83:b2:84'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x01' slot='0x14' function='0x7'/>
      </source>
      <vlan>
        <tag id='435'/>
      </vlan>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </interface>
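
The difference between the actual and expected XML can be checked mechanically. The sketch below, using only Python's standard library, flags hostdev interfaces whose driver element carries a virtio queue size attribute. The function name and the sample XML strings are illustrative assumptions (the sample is closed with </interface> so that it parses); it is not part of nova or libvirt.

```python
import xml.etree.ElementTree as ET

# Attributes that only make sense for virtio-net interfaces
QUEUE_ATTRS = ("rx_queue_size", "tx_queue_size")


def hostdev_queue_size_violations(xml_text):
    """Return (mac, attribute) pairs for hostdev interfaces that carry
    virtio queue size attributes on their <driver> element."""
    root = ET.fromstring(xml_text)
    problems = []
    # iter() also yields the root itself when it is an <interface>
    for iface in root.iter("interface"):
        if iface.get("type") != "hostdev":
            continue
        driver = iface.find("driver")
        if driver is None:
            continue
        mac = iface.find("mac")
        mac_addr = mac.get("address") if mac is not None else None
        for attr in QUEUE_ATTRS:
            if attr in driver.attrib:
                problems.append((mac_addr, attr))
    return problems


bad = """
<interface type="hostdev" managed="yes">
  <mac address="fa:16:3e:9d:f0:52"/>
  <driver name="vhost" rx_queue_size="1024"/>
</interface>
"""
good = """
<interface type="hostdev" managed="yes">
  <mac address="fa:16:3e:83:b2:84"/>
  <driver name="vfio"/>
</interface>
"""
print(hostdev_queue_size_violations(bad))
print(hostdev_queue_size_violations(good))
```

The same function accepts a full domain XML dump, since it walks every interface element under the root.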

Additional info:

Tags: libvirt
Moshe Levi (moshele)
Changed in nova:
assignee: nobody → Moshe Levi (moshele)
Changed in nova:
status: New → In Progress
Revision history for this message
melanie witt (melwitt) wrote :

Patch is proposed here: https://review.openstack.org/595592

tags: added: libvirt
melanie witt (melwitt) wrote :

Previous discussion about this bug assumed it is a regression in Rocky, but the bug report says it was seen in OSP13 (which is Queens). The bug report also says it could go back as far as Newton. Can anyone confirm how far back this will need to be backported?

Moshe Levi (moshele) wrote :

From GitHub I see that we need to backport to Queens, but it may have been introduced in earlier OpenStack releases.

Moshe Levi (moshele) wrote :

Sorry, I meant backport to Ocata, not Queens.

melanie witt (melwitt) wrote :

Okay, on IRC today, stephenfin reminded me that upstream, the configurable RX/TX Queue Size code is new in Rocky, but that OSP backported it all the way back to Newton. That's why Moshe is seeing the feature code in OSP10-OSP13.

So, indeed upstream this bug is a regression in Rocky, and will have to be backported to stable/rocky.

Changed in nova:
importance: Undecided → High
Noam Angel (noama) wrote :

If the regression is also in OSP13, can you backport to Queens as well?

Moshe Levi (moshele) wrote : RE: [Bug 1789074] Re: failed to boot guest with vnic_type direct when rx_queue_size and tx_queue_size are set

Let's first get it merged in master :)

> -----Original Message-----
> From: <email address hidden> <email address hidden> On Behalf Of
> Noam Angel
> Sent: Thursday, August 30, 2018 7:30 PM
> To: Moshe Levi <email address hidden>
> Subject: [Bug 1789074] Re: failed to boot guest with vnic_type direct when
> rx_queue_size and tx_queue_size are set
>
> If the regression is also in OSP13, can you backport to Queens as well?

melanie witt (melwitt) wrote :

Noam, this bug is for tracking upstream only. The OSP teams will take care of backporting fixes to OSP, separately, after this fix is reviewed and merged on master.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/595592
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=622ebf2fab0a9bf75ee12437bef28f60e083f849
Submitter: Zuul
Branch: master

commit 622ebf2fab0a9bf75ee12437bef28f60e083f849
Author: Moshe Levi <email address hidden>
Date: Thu Aug 23 14:25:04 2018 +0300

    libvirt: skip setting rx/tx queue sizes for non-virtio interfaces

    It seems that if the driver name is None, nova tries to set the
    rx_queue_size and tx_queue_size config (they define the
    virtio-net rx/tx queue sizes). Direct/Physical Direct
    vnic_types are not virtio, so this kind of config is
    invalid and causes booting the guest to fail. To avoid
    issues, we skip such configuration for these vnic_types.

    Closes-Bug: #1789074

    Change-Id: I45532896690ad9505f2b09c98d8d86b61bcfef2b
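
The idea described in the commit message can be illustrated with a small, self-contained sketch. This is not the nova patch itself: the InterfaceDriverConf class and apply_queue_sizes function are hypothetical names invented for illustration, and the only things taken from the report are the two option names and the fact that direct and physical-direct vnic_types are not virtio.

```python
from dataclasses import dataclass
from typing import Optional

# vnic_type values for SR-IOV VF and PF passthrough ports
VNIC_TYPE_DIRECT = "direct"
VNIC_TYPE_DIRECT_PHYSICAL = "direct-physical"
NON_VIRTIO_VNIC_TYPES = frozenset({VNIC_TYPE_DIRECT, VNIC_TYPE_DIRECT_PHYSICAL})


@dataclass
class InterfaceDriverConf:
    """Hypothetical stand-in for the per-interface driver settings."""
    rx_queue_size: Optional[int] = None
    tx_queue_size: Optional[int] = None


def apply_queue_sizes(conf, vnic_type, rx_queue_size, tx_queue_size):
    """Set virtio queue sizes on conf unless the port is not virtio."""
    if vnic_type in NON_VIRTIO_VNIC_TYPES:
        # Queue sizes are virtio-net settings; leave passthrough ports alone
        return conf
    if rx_queue_size:
        conf.rx_queue_size = rx_queue_size
    if tx_queue_size:
        conf.tx_queue_size = tx_queue_size
    return conf


sriov = apply_queue_sizes(InterfaceDriverConf(), "direct", 1024, 1024)
virtio = apply_queue_sizes(InterfaceDriverConf(), "normal", 1024, 1024)
print(sriov, virtio)
```

Keying the check on vnic_type rather than on the driver name follows the commit message, which notes that the driver name being None is what let the queue sizes through in the first place.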

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/599506

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/599506
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e7ac7a1e03b9d703c42a3eea45594ab5980e4b7f
Submitter: Zuul
Branch: stable/rocky

commit e7ac7a1e03b9d703c42a3eea45594ab5980e4b7f
Author: Moshe Levi <email address hidden>
Date: Thu Aug 23 14:25:04 2018 +0300

    libvirt: skip setting rx/tx queue sizes for non-virtio interfaces

    It seems that if the driver name is None, nova tries to set the
    rx_queue_size and tx_queue_size config (they define the
    virtio-net rx/tx queue sizes). Direct/Physical Direct
    vnic_types are not virtio, so this kind of config is
    invalid and causes booting the guest to fail. To avoid
    issues, we skip such configuration for these vnic_types.

    Closes-Bug: #1789074

    Change-Id: I45532896690ad9505f2b09c98d8d86b61bcfef2b
    (cherry picked from commit 622ebf2fab0a9bf75ee12437bef28f60e083f849)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.1

This issue was fixed in the openstack/nova 18.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
