[SRU] Cannot create 1vcpu instance with multiqueue image, vif_type=tap (calico)

Bug #1939604 reported by Rodrigo Barbieri
This bug affects 1 person
Affects                   Status        Importance  Assigned to       Milestone
OpenStack Compute (nova)  Fix Released  Medium      Rodrigo Barbieri
  Ussuri                  Fix Released  Undecided   Unassigned
  Victoria                Fix Released  Undecided   Unassigned
  Wallaby                 Fix Released  Undecided   Unassigned
Ubuntu Cloud Archive      Fix Released  Undecided   Unassigned
  Ussuri                  Fix Released  Medium      Unassigned
  Victoria                Fix Released  Medium      Unassigned
  Wallaby                 Fix Released  Medium      Unassigned
nova (Ubuntu)             Fix Released  Undecided   Unassigned
  Focal                   Fix Released  Medium      Unassigned
  Hirsute                 Fix Released  Medium      Unassigned

Bug Description

Tested on stable/wallaby

The fix for bug #1893263, which enabled vif_type=tap devices (the calico use case) to support multiqueue in nova, also caused a regression: creating an instance from a multiqueue image with a flavor that has only 1 VCPU now fails with the error below in the logs.

This problem can easily be avoided by not pairing 1-VCPU flavors with multiqueue images (that combination wouldn't make sense anyway) and using non-multiqueue images for those flavors instead. However, this makes for a bad user experience: users shouldn't need to be concerned about flavor+image combinations.

Steps to reproduce are the same as for #1893263, but using a 1-VCPU flavor and multiqueue metadata on the image.

2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [req-99e80890-6c99-4015-91b6-ef99e6be3fa7 ea7dfe225d48428c860321498e184739 8833157a5d244727a74017e5f8729312 - 0373963ccb0042da8306b35775521d60 0373963ccb0042da8306b35775521d60] [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] Failed to build and run instance: libvirt.libvirtError: Unable to create tap device tap73a105b8-82: Invalid argument
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] Traceback (most recent call last):
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2366, in _build_and_run_instance
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] self.driver.spawn(context, instance, image_meta,
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3885, in spawn
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] self._create_guest_with_network(
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6961, in _create_guest_with_network
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] self._cleanup_failed_start(
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] self.force_reraise()
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] raise self.value
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6930, in _create_guest_with_network
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] guest = self._create_guest(
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6863, in _create_guest
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] guest.launch(pause=pause)
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 158, in launch
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] LOG.exception('Error launching a defined domain with XML: %s',
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] self.force_reraise()
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] raise self.value
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 155, in launch
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] return self._domain.createWithFlags(flags)
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 193, in doit
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] result = proxy_call(self._autowrap, f, *args, **kwargs)
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 151, in proxy_call
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] rv = execute(f, *args, **kwargs)
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 132, in execute
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] six.reraise(c, e, tb)
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] raise value
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 86, in tworker
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] rv = meth(*args, **kwargs)
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] File "/usr/lib/python3/dist-packages/libvirt.py", line 1265, in createWithFlags
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b] libvirt.libvirtError: Unable to create tap device tap73a105b8-82: Invalid argument
2021-08-11 17:36:44.317 376565 ERROR nova.compute.manager [instance: 505b68b4-498c-4ea9-85ce-8be0c305ec4b]

=====================================================================================================

SRU template below:

[Impact]

The impact is limited to a very specific use case: calico plugin + 1vcpu flavor + multiqueue-enabled images. A workaround avoids the issue, but the user experience becomes suboptimal.

[Test case]

1. Setting up env
1a. Deploy environment
1b. Install calico plugin as per [0]
1c. Setup SSH

ssh-keygen

1d. Create keypair for testing

openstack keypair create key1 --public-key ~/.ssh/id_rsa.pub

1e. Create test flavors

openstack flavor create --vcpus 1 --ram 1024 --disk 10 --public --id 10 test_flavor1

openstack flavor create --vcpus 2 --ram 1024 --disk 10 --public --id 11 test_flavor2

1f. Download an example image

wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

1g. Create image in glance with multiqueue metadata

openstack image create bionic-mq --file bionic-server-cloudimg-amd64.img --property hw_vif_multiqueue_enabled=True

1h. Create instance with multiqueue + test_flavor2. Make sure instance creation and connectivity both succeed.

openstack server create --network calico --flavor test_flavor2 --image bionic-mq --key-name key1 mq2vcpu

2. Reproducing the bug

2a. Create instance with multiqueue + test_flavor1

openstack server create --network calico --flavor test_flavor1 --image bionic-mq --key-name key1 mq1vcpu

Instance creation will fail

2b. Check logs for error

egrep "libvirt.libvirtError: Unable to create tap device .*: Invalid argument" /var/log/nova/nova-compute.log
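
For checking more than one compute node, the same log check can be scripted. The following is a minimal Python sketch, assuming the log line layout shown in the traceback above; the function name find_tap_failures is hypothetical and not part of nova.

```python
import re

# Same error text that the egrep command in step 2b looks for.
TAP_ERROR = re.compile(
    r"libvirt\.libvirtError: Unable to create tap device (\S+): Invalid argument"
)
# nova-compute tags log lines with "[instance: <uuid>]".
INSTANCE = re.compile(r"\[instance: ([0-9a-f-]{36})\]")


def find_tap_failures(lines):
    """Return sorted, de-duplicated (instance_uuid, tap_device) pairs.

    `lines` is any iterable of log lines, e.g. an open file object.
    Lines without an instance tag are reported under "unknown".
    """
    found = set()
    for line in lines:
        err = TAP_ERROR.search(line)
        if not err:
            continue
        inst = INSTANCE.search(line)
        found.add((inst.group(1) if inst else "unknown", err.group(1)))
    return sorted(found)
```

Passing an open handle on /var/log/nova/nova-compute.log to find_tap_failures lists each affected instance once, even though the error appears on several lines of the same traceback.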

3. Cleanup

3a. Delete instances "mq2vcpu" and "mq1vcpu"

4. Install package that contains the fixed code

5. Repeat step 2a. Instance creation should now succeed.

[Regression Potential]

The code change is just 1 line: it restores the behavior from before the fix for #1893263 when vcpus=1. The only behavior change is therefore for vcpus=1, and it affects only calico users.
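
The decision described above can be sketched as follows. This is an illustration of the rule stated in this report (do not request multiqueue for the tap device when the flavor has 1 VCPU), not the actual nova code; the function name and parameters are hypothetical.

```python
def tap_multiqueue_requested(image_multiqueue_enabled: bool, vcpus: int) -> bool:
    """Decide whether a multiqueue tap device should be requested.

    Assumption for illustration: `image_multiqueue_enabled` reflects the
    image property hw_vif_multiqueue_enabled, and `vcpus` is the flavor's
    VCPU count.
    """
    # With a single VCPU, multiqueue is not requested, which matches the
    # behavior from before the fix for #1893263.
    return image_multiqueue_enabled and vcpus > 1
```

Under this rule, test_flavor1 (1 VCPU) with the bionic-mq image yields False, while test_flavor2 (2 VCPUs) with the same image yields True, so only the 1-VCPU path changes.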

[Other Info]

None

[0] https://docs.projectcalico.org/getting-started/openstack/installation/

Revision history for this message
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

I set it to medium importance because of bad user experience

description: updated
description: updated
Changed in nova:
assignee: nobody → Rodrigo Barbieri (rodrigo-barbieri2010)
importance: Undecided → Medium
summary: - Cannot create instance with multiqueue image, vif_type=tap (calico) and
- 1vcpu flavor
+ Cannot create 1vcpu instance with multiqueue image, vif_type=tap
+ (calico)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/804303

Changed in nova:
status: New → In Progress
tags: added: sts
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/804303
Committed: https://opendev.org/openstack/nova/commit/7fc6fe6fae891eae42b36ccb9d69cd0f6d6db21d
Submitter: "Zuul (22348)"
Branch: master

commit 7fc6fe6fae891eae42b36ccb9d69cd0f6d6db21d
Author: Rodrigo Barbieri <email address hidden>
Date: Wed Aug 11 16:03:58 2021 -0300

    Fix 1vcpu error with multiqueue and vif_type=tap

    Fix for bug #1893263 introduced a regression where
    1 vcpu instances would fail to build when paired with
    multiqueue-enabled images, in the scenario vif_type=tap.

    Solution is to not pass multiqueue parameter when
    instances.get_flavor().vcpus = 1.

    Closes-bug: #1939604
    Change-Id: Iaccf2eeeb6e8bb80c658f51ce9ab4e8eb4093a55

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/805304

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/805304
Committed: https://opendev.org/openstack/nova/commit/aa5b8d12bcacc01e5f9be45cc1eef24ac9efd2fc
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit aa5b8d12bcacc01e5f9be45cc1eef24ac9efd2fc
Author: Rodrigo Barbieri <email address hidden>
Date: Wed Aug 11 16:03:58 2021 -0300

    Fix 1vcpu error with multiqueue and vif_type=tap

    Fix for bug #1893263 introduced a regression where
    1 vcpu instances would fail to build when paired with
    multiqueue-enabled images, in the scenario vif_type=tap.

    Solution is to not pass multiqueue parameter when
    instances.get_flavor().vcpus = 1.

    Conflicts:
        nova/tests/unit/virt/libvirt/test_vif.py

    NOTE: Conflicts are because commit
    9d037f7d199443da0f2c6c1755704e589d52e730
    is not in the tree for this branch.

    Closes-bug: #1939604
    Change-Id: Iaccf2eeeb6e8bb80c658f51ce9ab4e8eb4093a55
    (cherry picked from commit 7fc6fe6fae891eae42b36ccb9d69cd0f6d6db21d)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/806004

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/nova/+/806004
Committed: https://opendev.org/openstack/nova/commit/aaa56240b0311ad47ccccc3b7850ddc5b0a21702
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit aaa56240b0311ad47ccccc3b7850ddc5b0a21702
Author: Rodrigo Barbieri <email address hidden>
Date: Wed Aug 11 16:03:58 2021 -0300

    Fix 1vcpu error with multiqueue and vif_type=tap

    Fix for bug #1893263 introduced a regression where
    1 vcpu instances would fail to build when paired with
    multiqueue-enabled images, in the scenario vif_type=tap.

    Solution is to not pass multiqueue parameter when
    instances.get_flavor().vcpus = 1.

    Closes-bug: #1939604
    Change-Id: Iaccf2eeeb6e8bb80c658f51ce9ab4e8eb4093a55
    (cherry picked from commit 7fc6fe6fae891eae42b36ccb9d69cd0f6d6db21d)
    (cherry picked from commit aa5b8d12bcacc01e5f9be45cc1eef24ac9efd2fc)

tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/806811

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/nova/+/806811
Committed: https://opendev.org/openstack/nova/commit/fa0ad18619bfc1d56afdc7aa61729a1098ef651a
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit fa0ad18619bfc1d56afdc7aa61729a1098ef651a
Author: Rodrigo Barbieri <email address hidden>
Date: Wed Aug 11 16:03:58 2021 -0300

    Fix 1vcpu error with multiqueue and vif_type=tap

    Fix for bug #1893263 introduced a regression where
    1 vcpu instances would fail to build when paired with
    multiqueue-enabled images, in the scenario vif_type=tap.

    Solution is to not pass multiqueue parameter when
    instances.get_flavor().vcpus = 1.

    Closes-bug: #1939604
    Change-Id: Iaccf2eeeb6e8bb80c658f51ce9ab4e8eb4093a55
    (cherry picked from commit 7fc6fe6fae891eae42b36ccb9d69cd0f6d6db21d)
    (cherry picked from commit aa5b8d12bcacc01e5f9be45cc1eef24ac9efd2fc)
    (cherry picked from commit aaa56240b0311ad47ccccc3b7850ddc5b0a21702)

tags: added: in-stable-ussuri
description: updated
summary: - Cannot create 1vcpu instance with multiqueue image, vif_type=tap
+ [SRU] Cannot create 1vcpu instance with multiqueue image, vif_type=tap
(calico)
Revision history for this message
Corey Bryant (corey.bryant) wrote :

This fix is released for impish/xena.

Changed in nova (Ubuntu):
status: New → Fix Released
Changed in cloud-archive:
status: New → Fix Released
Changed in nova (Ubuntu Focal):
status: New → Triaged
Changed in nova (Ubuntu Hirsute):
status: New → Triaged
importance: Undecided → Medium
Changed in nova (Ubuntu Focal):
importance: Undecided → Medium
Revision history for this message
Corey Bryant (corey.bryant) wrote :

This is not yet fixed in the latest nova stable point releases. Rodrigo, would it be okay if we pick this fix up in the next round of upstream stable point releases from nova?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.0.0.0rc1

This issue was fixed in the openstack/nova 24.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 22.3.0

This issue was fixed in the openstack/nova 22.3.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.1.0

This issue was fixed in the openstack/nova 23.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 21.2.3

This issue was fixed in the openstack/nova 21.2.3 release.
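
The release notices above can be condensed into a small lookup. The following Python sketch is hypothetical (the helper has_fix is not part of any OpenStack tool) and applies only to upstream nova version numbers; distribution packages may carry the fix as a backported patch on top of an older upstream version, so it says nothing about Ubuntu package versions.

```python
import re

# First upstream release per series that contains the fix, taken from the
# "Fix included in" notices in this report. Keys are nova major versions.
FIRST_FIXED = {
    21: (21, 2, 3),  # Ussuri
    22: (22, 3, 0),  # Victoria
    23: (23, 1, 0),  # Wallaby
}
# Xena: the fix was already present in 24.0.0.0rc1.
FIXED_FROM_MAJOR = 24


def has_fix(version: str):
    """Return True or False for a covered series, None for an uncovered one."""
    parts = tuple(int(p) for p in re.findall(r"\d+", version)[:3])
    if len(parts) < 3:
        raise ValueError(f"expected at least MAJOR.MINOR.PATCH, got {version!r}")
    major = parts[0]
    if major >= FIXED_FROM_MAJOR:
        return True
    if major in FIRST_FIXED:
        return parts >= FIRST_FIXED[major]
    # Series older than Ussuri are not discussed in this report.
    return None
```

Returning None for older series is deliberate: this report does not say whether those series ever received the change from #1893263 that introduced the regression.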

Revision history for this message
Edward Hope-Morley (hopem) wrote :

This is currently proposed for focal-updates, victoria, and wallaby.

Changed in nova (Ubuntu Focal):
status: Triaged → Fix Released
Changed in nova (Ubuntu Hirsute):
status: Triaged → Fix Released