Virtual Interface creation failed due to duplicate entry

Bug #1821088 reported by Zane Bitter
This bug affects 2 people
Affects                    Status        Importance  Assigned to
OpenStack Compute (nova)   Fix Released  Medium      Stephen Finucane
  Train                    Won't Fix     Undecided   Unassigned
  Ussuri                   Won't Fix     Undecided   Unassigned
  Victoria                 Won't Fix     Undecided   Unassigned
  Wallaby                  Won't Fix     Undecided   Unassigned
  Xena                     Fix Released  Undecided   Unassigned
Ubuntu Cloud Archive       New           Undecided   Unassigned
  Ussuri                   New           Undecided   Unassigned
  Victoria                 New           Undecided   Unassigned
  Wallaby                  New           Undecided   Unassigned
  Xena                     New           Undecided   Unassigned
  Yoga                     New           Undecided   Unassigned

Bug Description

Seen once in a test on stable/rocky:

http://logs.openstack.org/48/638348/1/gate/heat-functional-convg-mysql-lbaasv2-py35/9d70590/logs/screen-n-api.txt.gz?level=ERROR

The traceback appears to be similar to the one reported in bug 1602357 (which raises the possibility that https://bugs.launchpad.net/nova/+bug/1602357/comments/8 is relevant here):

ERROR nova.api.openstack.wsgi [None req-e05ce059-71c4-437d-91e0-e4bc896acca6 demo demo] Unexpected exception in API method: nova.exception_Remote.VirtualInterfaceCreateException_Remote: Virtual Interface creation failed
pymysql.err.IntegrityError: (1062, "Duplicate entry 'fa:16:3e:9d:18:a6/aac0ca83-b3d2-4b28-ab15-de2d3a3e6e16-0' for key 'uniq_virtual_interfaces0address0deleted'")
oslo_db.exception.DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, "Duplicate entry 'fa:16:3e:9d:18:a6/aac0ca83-b3d2-4b28-ab15-de2d3a3e6e16-0' for key 'uniq_virtual_interfaces0address0deleted'") [SQL: 'INSERT INTO virtual_interfaces (created_at, updated_at, deleted_at, deleted, address, network_id, instance_uuid, uuid, tag) VALUES (%(created_at)s, %(updated_at)s, %(deleted_at)s, %(deleted)s, %(address)s, %(network_id)s, %(instance_uuid)s, %(uuid)s, %(tag)s)'] [parameters: {'created_at': datetime.datetime(2019, 3, 20, 16, 11, 27, 753079), 'tag': None, 'uuid': 'aac0ca83-b3d2-4b28-ab15-de2d3a3e6e16', 'deleted_at': None, 'deleted': 0, 'address': 'fa:16:3e:9d:18:a6/aac0ca83-b3d2-4b28-ab15-de2d3a3e6e16', 'network_id': None, 'instance_uuid': '890675f9-3a1e-4a07-8bed-8648cea9fbb9', 'updated_at': None}] (Background on this error at: http://sqlalche.me/e/gkpj)

(This sequence of exceptions occurs 3 times, I assume because retrying is normally sufficient to fix a duplicate entry problem.)

The test was heat_integrationtests.functional.test_cancel_update.CancelUpdateTest.test_cancel_update_server_with_port

Revision history for this message
Matt Riedemann (mriedem) wrote :

As mentioned in the other bug, we likely just need to keep track of the VirtualInterface objects we create and, if something fails, clean them up here:

https://github.com/openstack/nova/blob/92a388a1e34559b2ce69d31fdef996ff029495a6/nova/network/neutronv2/api.py#L847
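The cleanup pattern Matt describes can be sketched roughly as follows. This is an illustrative sketch only, not the actual nova code; the names `allocate_vifs`, `create_vif`, and `vif.destroy()` are assumptions made up for the example.

```python
def allocate_vifs(requested_ports, create_vif):
    """Create one VirtualInterface record per requested port, rolling back
    any records already written if a later creation fails, so a retry does
    not trip the uniq_virtual_interfaces0address0deleted constraint.

    Illustrative sketch only; names do not match the real nova code.
    """
    created = []
    try:
        for port in requested_ports:
            created.append(create_vif(port))
    except Exception:
        # Clean up the VIF records we already created before re-raising.
        for vif in created:
            vif.destroy()
        raise
    return created
```

The key point is that the rollback happens in the same scope that created the records, so it runs even when a later step such as port binding fails.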

tags: added: neutron
Changed in nova:
status: New → Triaged
tags: added: low-hanging-fruit
Changed in nova:
importance: Undecided → Medium
Revision history for this message
Matt Riedemann (mriedem) wrote :

Hmm, this cleanup was added in newton:

https://review.opendev.org/#/c/342416/

So why isn't that handling the issue? Is it because _unbind_ports or _delete_ports fails so we don't get to that vif deletion loop?

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm not seeing the error in logstash so this must be super rare if you're hitting it in Heat's CI testing.

Changed in nova:
importance: Medium → Low
Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

This is pretty easy to reproduce. You need to create a server and specify the same port twice. For example:

  $ openstack port create --network private test-port
  $ openstack server create --flavor m1.small --image fedora-iso --port test-port --port test-port --wait test-server
  Error creating server: test-server
  Error creating server

Changed in nova:
status: Triaged → Confirmed
importance: Low → Medium
assignee: nobody → Stephen Finucane (stephenfinucane)
Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

Here's the request that OSC sends for the above command. Note the duplicate port ID.

REQ: curl -g -i -X POST http://192.168.50.23/compute/v2.1/servers -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: {SHA256}c90e6681b5e681213c816ea7f0747201c4c4acbc950943cfabefc2bf3109b923" -H "X-OpenStack-Nova-API-Version: 2.1" -d '{"server": {"name": "test-server", "imageRef": "5a696736-805e-42af-bb8a-9344cee1b5c3", "flavorRef": "2", "min_count": 1, "max_count": 1, "networks": [{"port": "e3caceae-2c74-4114-9f3f-262a37fc1971"}, {"port": "e3caceae-2c74-4114-9f3f-262a37fc1971"}]}}'

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/827070

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/827120

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/827121

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/827122

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/827123

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/827126

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/827070
Committed: https://opendev.org/openstack/nova/commit/9fe465427310f8215890d26bf169617653605e23
Submitter: "Zuul (22348)"
Branch: master

commit 9fe465427310f8215890d26bf169617653605e23
Author: Stephen Finucane <email address hidden>
Date: Mon Jan 31 12:56:57 2022 +0000

    api: Reject duplicate port IDs in server create

    Specifying a duplicate port ID is currently "allowed" but results in an
    integrity error when nova attempts to create a duplicate
    'VirtualInterface' entry. Start rejecting these requests by checking for
    duplicate IDs and rejecting offending requests. This is arguably an API
    change because there isn't a HTTP 5xx error (server create is an async
    operation), however, users shouldn't have to opt in to non-broken
    behavior and the underlying instance was never actually created
    previously, meaning automation that relied on this "feature" was always
    going to fail in a later step. We're also silently failing to do what
    the user asked (per flow chart at [1]).

    [1] https://docs.openstack.org/nova/latest/contributor/microversions.html#when-do-i-need-a-new-microversion

    Change-Id: Ie90fb83662dd06e7188f042fc6340596f93c5ef9
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1821088

Changed in nova:
status: In Progress → Fix Released
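The validation the commit above describes can be sketched roughly as follows. This is a simplified illustration under assumed names (`validate_no_duplicate_ports`, a list of dicts with a `port` key mirroring the request body's `networks` entries), not the actual nova implementation.

```python
def validate_no_duplicate_ports(requested_networks):
    """Reject a server-create request that names the same port ID more
    than once, as in the duplicate-port request shown earlier in this
    bug. Simplified illustration; not the real nova code.
    """
    seen = set()
    for net in requested_networks:
        port_id = net.get('port')
        if port_id is None:
            # Entries that request a network rather than a port
            # (e.g. {'uuid': ...}) are not affected by this check.
            continue
        if port_id in seen:
            raise ValueError(
                "Port ID %s requested more than once" % port_id)
        seen.add(port_id)
```

Rejecting the request up front turns the eventual asynchronous DBDuplicateEntry failure into an immediate, user-visible 4xx-style error at the API layer.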
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/827120
Committed: https://opendev.org/openstack/nova/commit/d4c92bc2314c6402c4e3464a5aa61a94293eaf91
Submitter: "Zuul (22348)"
Branch: stable/xena

commit d4c92bc2314c6402c4e3464a5aa61a94293eaf91
Author: Stephen Finucane <email address hidden>
Date: Mon Jan 31 12:56:57 2022 +0000

    api: Reject duplicate port IDs in server create

    Specifying a duplicate port ID is currently "allowed" but results in an
    integrity error when nova attempts to create a duplicate
    'VirtualInterface' entry. Start rejecting these requests by checking for
    duplicate IDs and rejecting offending requests. This is arguably an API
    change because there isn't a HTTP 5xx error (server create is an async
    operation), however, users shouldn't have to opt in to non-broken
    behavior and the underlying instance was never actually created
    previously, meaning automation that relied on this "feature" was always
    going to fail in a later step. We're also silently failing to do what
    the user asked (per flow chart at [1]).

    [1] https://docs.openstack.org/nova/latest/contributor/microversions.html#when-do-i-need-a-new-microversion

    Change-Id: Ie90fb83662dd06e7188f042fc6340596f93c5ef9
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1821088
    (cherry picked from commit 9fe465427310f8215890d26bf169617653605e23)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.1.0

This issue was fixed in the openstack/nova 24.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.0.0.0rc1

This issue was fixed in the openstack/nova 25.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/827126
Reason: stable/train branch of nova projects' have been tagged as End of Life. All open patches have to be abandoned in order to be able to delete the branch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Stephen Finucane <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/827123

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Stephen Finucane <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/827122

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/wallaby)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/827121
Reason: stable/wallaby branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/wallaby if you want to further work on this patch.

Revision history for this message
Trent Lloyd (lathiat) wrote :

I found another cause for this issue that hasn't been discussed so far; noting it here for future travellers who find this bug and happen to have a workflow that re-uses a neutron port on another instance.

We had a workflow like this:
- Create a static neutron port that is re-used
- Create an instance using that port
- Delete the instance
- Loop waiting for neutron to show the port's device_id has been cleared to know we can re-use it
- Create another instance using the same port

The problem is that the neutron.ports table entry is updated with device_id='', status='DOWN' before the entry in nova.virtual_interfaces is marked deleted.

So there is a small window of time, a few seconds or more, in which you can create a virtual machine using a seemingly unassigned neutron port, but nova.virtual_interfaces still holds a conflicting entry that hasn't yet been cleared, causing the server creation to fail.

It seems that nova.instances vm_state='deleted' and deleted=instances.id are set after the virtual_interfaces entry is deleted, so as a workaround you can wait for deletion of the previous instance in such a workflow. This is probably still a bug, though, in that the virtual_interfaces entry should probably be cleared before the neutron port is released.

I haven't delved into the code to figure out how possible that is. The same behaviour exists on ussuri and antelope.

For reference, the order of MySQL queries in my focal-ussuri test environment (some of this may possibly be asynchronous and may vary depending on the worker count, load, size of database, etc):

2024-03-15T06:47:07 # start: openstack server create --wait
2024-03-15T06:47:12.2 INSERT INTO nova.instances
2024-03-15T06:47:16.0 UPDATE neutron.ports SET device_id
2024-03-15T06:47:18.1 INSERT INTO nova.virtual_interfaces
2024-03-15T06:47:21.9 UPDATE neutron.ports SET status='ACTIVE'
2024-03-15T06:47:27 # finish: openstack server create --wait

2024-03-15T06:47:45 # openstack server delete
2024-03-15T06:47:45.1 UPDATE nova.instances SET task_state='deleting'
2024-03-15T06:47:46.1 UPDATE neutron.ports SET status='DOWN'
2024-03-15T06:47:47.1 UPDATE neutron.ports SET device_id=''
2024-03-15T06:47:48.3 UPDATE nova.virtual_interfaces SET deleted=id
2024-03-15T06:47:48.4 UPDATE nova.instances SET vm_state='deleted'
2024-03-15T06:47:48.9 UPDATE nova.instances SET deleted=instances.id
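Given the ordering above, the workaround is to poll for full deletion of the previous instance rather than only for the port's device_id being cleared. A minimal sketch of that polling loop follows; the `server_is_deleted` callable is a caller-supplied assumption (e.g. wrapping a GET /servers/{id} that returns 404 once the server is gone).

```python
import time

def wait_for_server_deleted(server_is_deleted, timeout=120, interval=2):
    """Poll until the previous instance is fully deleted before re-using
    its neutron port. Checking only neutron's device_id leaves open the
    window in which nova.virtual_interfaces still holds a conflicting
    row, per the query ordering shown above.

    Returns True once `server_is_deleted()` reports True, or False if
    the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if server_is_deleted():
            return True
        time.sleep(interval)
    return False
```

Only after this returns True is it safe, per the trace above, to create the next instance with the same port.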
