nova allows booting two instances with the same neutron port in parallel

Bug #1493714 reported by Balazs Gibizer
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Invalid
Medium
Balazs Gibizer
neutron
Invalid
Medium
Kevin Benton

Bug Description

It seems that to reproduce the problem we need a multi node deployment with at least two nova-compute service.

To reproduce it do the following:
1) create a neutron port
2) boot two instances in parallel with that port
Sometimes both instances become ACTIVE in nova which is clearly wrong.

vagrant@controller:~/devstack$ neutron net-list
+--------------------------------------+---------+----------------------------------------------------------+
| id | name | subnets |
+--------------------------------------+---------+----------------------------------------------------------+
| fc257a00-d3bf-47e6-b91f-e2cef985c414 | public | a610715c-614f-492c-8810-f51e03c5d383 2001:db8::/64 |
| | | a923871f-90ad-4354-935d-db24861a5890 172.24.4.0/24 |
| 7a057b12-0e69-4c31-859e-098263abeeba | private | 04f3b138-d7c6-48e1-98e3-7f70eb7ab4fe fda4:10b7:acaa::/64 |
| | | ee70023c-f273-471a-8b84-cb25bb64fcf9 10.0.0.0/24 |
+--------------------------------------+---------+----------------------------------------------------------+
vagrant@controller:~/devstack$ neutron port-create 7a057b12-0e69-4c31-859e-098263abeeba
Created a new port:
+-----------------------+-------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+-------------------------------------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:host_id | |
| binding:profile | {} |
| binding:vif_details | {} |
| binding:vif_type | unbound |
| binding:vnic_type | normal |
| device_id | |
| device_owner | |
| fixed_ips | {"subnet_id": "ee70023c-f273-471a-8b84-cb25bb64fcf9", "ip_address": "10.0.0.4"} |
| | {"subnet_id": "04f3b138-d7c6-48e1-98e3-7f70eb7ab4fe", "ip_address": "fda4:10b7:acaa:0:f816:3eff:fe0c:285d"} |
| id | f2da8f78-8ae4-49f0-bca0-5820588d33ea |
| mac_address | fa:16:3e:0c:28:5d |
| name | |
| network_id | 7a057b12-0e69-4c31-859e-098263abeeba |
| port_security_enabled | True |
| security_groups | 73853e74-a6c7-4b71-ba45-5b82b5e1ad81 |
| status | DOWN |
| tenant_id | 16f8c1dbfa2d472da0d1335b8a70aee0 |
+-----------------------+-------------------------------------------------------------------------------------------------------------+
vagrant@controller:~/devstack$ nova boot --image cirros-0.3.4-x86_64-uec --flavor 42 --nic port-id=f2da8f78-8ae4-49f0-bca0-5820588d33ea vm1 & nova boot --image cirros-0.3.4-x86_64-uec --flavor 42 --nic port-id=f2da8f78-8ae4-49f0-bca0-5820588d33ea vm2 &
[1] 18785
[2] 18786

vagrant@controller:~/devstack$ nova list
+--------------------------------------+------+--------+------------+-------------+--------------------------------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+--------+------------+-------------+--------------------------------------------------------+
| 24e67346-18fe-4cfa-8b01-f017c915a111 | vm1 | ACTIVE | - | Running | private=fda4:10b7:acaa:0:f816:3eff:fe0c:285d, 10.0.0.4 |
| d90292a2-5b8f-4566-a621-defdf7aa0246 | vm2 | ACTIVE | - | Running | |
+--------------------------------------+------+--------+------------+-------------+--------------------------------------------------------+

Based on the code we have an atomicity problem on the neutron REST API:
At https://github.com/openstack/nova/blob/1db33ca6c248613cc8a76dcbbf78758001ee02d8/nova/network/neutronv2/api.py#L611 nova will check if the device_id of the neutron port are empty or not. Then nova sets the current instance_uuid to the device_id later at https://github.com/openstack/nova/blob/1db33ca6c248613cc8a76dcbbf78758001ee02d8/nova/network/neutronv2/api.py#L692

So it is possible that two nova-compute processes check the port status before one of them sets the device_id, so as a result both nova-compute will think that the port are free to use and the slower nova-compute will overwrite the device_id of the port.

Tags: neutron
Changed in nova:
assignee: nobody → Balazs Gibizer (balazs-gibizer)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/221803

Changed in nova:
status: New → In Progress
Revision history for this message
John Garbutt (johngarbutt) wrote :

We have a new feature in neutron that will really help us in this battle.

tags: added: neutron
Changed in nova:
importance: Undecided → Medium
Revision history for this message
Kevin Benton (kevinbenton) wrote :
Changed in neutron:
status: New → In Progress
assignee: nobody → Kevin Benton (kevinbenton)
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/474383

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/474383
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9c0a0de556f8edafd153a7d4c1c0548dd4ff3129
Submitter: Jenkins
Branch: master

commit 9c0a0de556f8edafd153a7d4c1c0548dd4ff3129
Author: Kevin Benton <email address hidden>
Date: Wed Jun 14 15:36:55 2017 -0700

    Manually increment revision numbers in revision plugin

    Don't rely on the SQLAlchemy revision_col flag to bump revision
    numbers and instead bump them in a before_flush handler. This
    will allow the follow-up patch to do enforcement on conditional
    updates before the revision number is incremented.

    Partial-Bug: #1493714
    Partially-Implements: blueprint push-notifications
    Change-Id: I5feeec5b8385727eff53dc669363bc41db8ceaba

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/409577
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7f17b4759e41aa0a922ac27c2dabc595e7d2f67c
Submitter: Jenkins
Branch: master

commit 7f17b4759e41aa0a922ac27c2dabc595e7d2f67c
Author: Kevin Benton <email address hidden>
Date: Sun Dec 11 18:24:01 2016 -0800

    API compare-and-swap updates based on revision_number

    Allows posting revision number matching in the If-Match header
    so updates/deletes will only be satisfied if the current revision
    number of the object matches.

    DocImpact: The Neutron API now supports conditional updates to resources
               that contain the standard 'revision_number' attribute by
               setting the revision_number in an HTTP If-Match header.
    APIImpact

    Partial-Bug: #1493714
    Partially-Implements: blueprint push-notifications
    Change-Id: I7d97d6044378eb59cb2c7bdc788dc6c174783299

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/482839

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/482840

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (master)

Reviewed: https://review.openstack.org/482839
Committed: https://git.openstack.org/cgit/openstack/neutron-lib/commit/?id=6a2040fcdb62b44d63757f3fc9ac13a8f7537c82
Submitter: Jenkins
Branch: master

commit 6a2040fcdb62b44d63757f3fc9ac13a8f7537c82
Author: Kevin Benton <email address hidden>
Date: Wed Jul 12 01:09:22 2017 -0700

    Add interface to add a constraint to context

    This adds methods to the Context to add, remove, and get a
    constraint on an object to be checked before a change is
    made to that object.

    Related-Bug: #1493714
    Change-Id: Id01c4dfb740f680e96349dc08b11b5bca6c59a74

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/482840
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9662e2b170a60c489f072410f976e3365e20f897
Submitter: Jenkins
Branch: master

commit 9662e2b170a60c489f072410f976e3365e20f897
Author: Kevin Benton <email address hidden>
Date: Wed Jul 12 01:08:48 2017 -0700

    Use context interface for constraint

    Use the new constraint interface on the context rather than
    setting an ugly attribute.

    Depends-On: I6bc2539a1ddbf7990164abeb8bb951ddcb45c993

    Related-Bug: #1493714
    Change-Id: I9142ca96a40092b2a4c94920c4ded9bbc3a0b35b

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/221803
Reason: This review is > 4 weeks without comment, and is not mergable in it's current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in nova:
status: In Progress → Invalid
Changed in neutron:
status: In Progress → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.