One port can be added as multiple routers' interfaces if commands are executed at the same time

Bug #1535551 reported by Lujin Luo
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Low
Assigned to: Lujin Luo

Bug Description

I have three controller nodes deployed with DevStack, and the Neutron servers on these controllers sit behind Pacemaker and HAProxy to provide active/active HA. A MariaDB Galera cluster is the database backend. I am using the latest code.

If one port is added as an interface of multiple routers, the expected result is that only one API request succeeds and the port is associated with one router. The other API requests should receive an error message like
PortInUseClient: Unable to complete operation on port d2c97788-61d7-489a-8b20-7a6e8e39a217 for network 496de8cf-4284-41d7-ad6b-7dd5f232dc21. Port already has an attached device 1b316d80-f5d8-4477-88df-54b376c4c8cd.

Besides, the routerports table should hold only one record for a given port. However, if we run two commands at the same time to add one port as an interface of two different routers, both commands report success, and the routerports table ends up with two records that associate the port with both routers.
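
The race described above is a check-then-act problem: each request validates that the port is free before either one has written anything. The following is only a minimal sketch of that interleaving, not Neutron's code. The in-memory `ports` and `routerports` structures and the helper names `check_port_free` and `attach` are hypothetical stand-ins for the real tables and logic.

```python
# Hypothetical in-memory stand-ins for the ports and routerports tables.
ports = {"port1": {"device_id": ""}}
routerports = []  # (router_id, port_id) rows; no uniqueness is enforced


def check_port_free(port_id):
    """Validation step: True when the port has no attached device."""
    return ports[port_id]["device_id"] == ""


def attach(router_id, port_id):
    """Write step: record the association and update the port."""
    routerports.append((router_id, port_id))
    ports[port_id]["device_id"] = router_id


# Both requests run their validation before either one writes.
ok_1 = check_port_free("port1")  # request handled on controller1
ok_2 = check_port_free("port1")  # request handled on controller2

if ok_1:
    attach("router-1", "port1")
if ok_2:
    attach("router-2", "port1")

print(routerports)                  # two rows for the same port
print(ports["port1"]["device_id"])  # the last writer wins
```

In this simplified model both requests pass validation, two rows are stored for the same port, and the port itself points at whichever router wrote last.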

How to reproduce

Step 1: Create two routers
$ neutron router-create router-1
$ neutron router-create router-2

Step 2: Create an internal network
$ neutron net-create net1

Step 3: Add a subnet to the network
$ neutron subnet-create --name subnet1 net1 192.166.100.0/24

Step 4: Create a port in the network
$ neutron port-create --name port1 net1

Step 5: Add this port as two routers' interfaces at the same time
On controller1:
$ neutron router-interface-add router-1 port=port1
On controller2:
$ neutron router-interface-add router-2 port=port1

Both commands return success, as shown at http://paste.openstack.org/show/483840/

Step 6: Check port list on both routers
The result is shown at http://paste.openstack.org/show/483843/

As we can see, only one router is actually associated with the port.

Step 7: Check routerports database
http://paste.openstack.org/show/483842/

where '99276755-236a-44b7-bf97-b2234d97028b' is the port_id of the port we created in Step 4.
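
The check in Step 7 amounts to looking for a port_id that appears more than once in routerports. Below is a minimal sketch of such a query, run against an in-memory SQLite table rather than the real MariaDB backend. The two-column schema and the router ID values are assumptions made for illustration only; the port ID is the one from Step 4.

```python
import sqlite3

# Simplified stand-in for the routerports table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routerports (router_id TEXT, port_id TEXT)")

port_id = "99276755-236a-44b7-bf97-b2234d97028b"
# Hypothetical router IDs; the real values are in the paste above.
conn.executemany(
    "INSERT INTO routerports (router_id, port_id) VALUES (?, ?)",
    [("router-1-id", port_id), ("router-2-id", port_id)],
)


def find_duplicate_ports(connection):
    """Return (port_id, count) for every port listed more than once."""
    return connection.execute(
        "SELECT port_id, COUNT(*) FROM routerports "
        "GROUP BY port_id HAVING COUNT(*) > 1"
    ).fetchall()


duplicates = find_duplicate_ports(conn)
print(duplicates)
```

An empty result would mean every port is attached to at most one router.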

To sum up, we have two issues here:
a) Only one API request actually takes effect, but both commands return success.
b) The routerports table is updated twice, and we need to delete the older record.

The related source code is at [1].

[1] https://github.com/openstack/neutron/blob/master/neutron/db/l3_db.py#L535

------------------------Update on 2016/6/15----------------------------
Suppose operator-1 is adding port_1 as router_1's interface while, at the same time, operator-2 is trying to add port_2 as router_2's interface, but operator-2 mistyped "port_2" as "port_1". Without this unique key, both commands return success, and operator-2 would hardly realize that he/she ran the wrong command. What is worse, if router_1 actually got port_1 as its interface and router_2 did not, then running the interface-delete command on router_2 deletes port_1, and router_1 (which actually holds the port_1 interface) loses its interface.

Tags: l3-ipam-dhcp
Lujin Luo (luo-lujin)
Changed in neutron:
assignee: nobody → Lujin Luo (luo-lujin)
tags: added: l3-ipam-dhcp
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/285048

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → Low
Lujin Luo (luo-lujin)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/285048
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6281fddbcb4c471b6b06e24d3faa2990e040f3d2
Submitter: Jenkins
Branch: master

commit 6281fddbcb4c471b6b06e24d3faa2990e040f3d2
Author: Lujin Luo <email address hidden>
Date: Tue Jun 21 14:23:33 2016 +0900

    Add a unique key to port_id in routerports table

    If multiple commands to add router interfaces to different routers
    by the same port are executed concurrently, then all the commands
    would show success.

    However, there are three issues:
    1. Only one router interface is actually added by the port
    2. Multiple router ports records are stored in routerports table
    3. The port table is updated multiple times and eventually the
    last-arrived command would truly take effect

    This patch adds a unique key to port_id in routerport table,
    so that only the first-arrived command will insert router port
    record and all later requests would raise exceptions.

    Besides, port.device_id and port.device_owner in port table
    need to be updated again after routerport record is inserted.
    Otherwise, in race condition the port table will store the router
    information from last-arrived request. However, in routerport table,
    only the first-arrived request's router information is inserted.

    Change-Id: I15be35689ec59ac02ed34abe5862fa4580c8587c
    Closes-Bug: #1535551
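
The effect of the unique key described in the commit message can be sketched as follows. This is an illustration with Python's sqlite3 module, not the actual Neutron migration or model change. The simplified schema and the helper `add_router_port` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed simplified schema: port_id carries a unique key.
conn.execute(
    "CREATE TABLE routerports ("
    "router_id TEXT NOT NULL, "
    "port_id TEXT NOT NULL UNIQUE)"
)


def add_router_port(connection, router_id, port_id):
    """Insert a routerports row; return False if the port is already used."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "INSERT INTO routerports (router_id, port_id) VALUES (?, ?)",
                (router_id, port_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique key on port_id rejected the later request.
        return False


first = add_router_port(conn, "router-1", "port1")
second = add_router_port(conn, "router-2", "port1")
rows = conn.execute("SELECT router_id, port_id FROM routerports").fetchall()
print(first, second, rows)
```

Only the first-arrived insert is stored; the later one fails at the database level regardless of how the application-side checks interleave.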

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/353263

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/353263
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=247128b6c3c03f3c096af9c27a97badbae11e666
Submitter: Jenkins
Branch: master

commit 247128b6c3c03f3c096af9c27a97badbae11e666
Author: Kevin Benton <email address hidden>
Date: Fri Aug 5 14:44:10 2016 -0700

    Fix duplicate routerport handling

    Change 6281fdd introduced a try: except to catch
    a DuplicateError introduced by its new constraint.
    However, the try moved it outside of the context manager
    so it broke the port cleanup logic.

    This patch completely eliminates the catch for the
    duplicate entry since we retry those anyway which
    would let the regular check see the duplication.

    This also adds a test to prevent another regression of
    being moved outside of the context manager.

    Related-Bug: #1535551
    Related-Bug: #1600344

    Change-Id: I5f473fff4f8372852d563c79dac2991089eb0b77
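
The idea in this commit message, that a retried operation lets the regular check see the duplication, can be sketched as below. This is not Neutron's retry machinery. The decorator `retry_on_duplicate`, the `PortInUse` class, the simplified schema, and the `race` hook that simulates a competing request are all hypothetical.

```python
import functools
import sqlite3


class PortInUse(Exception):
    """Hypothetical stand-in for the API-level 'port in use' error."""


def retry_on_duplicate(max_retries=3):
    """Re-run the wrapped function when a unique-key violation occurs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except sqlite3.IntegrityError:
                    if attempt == max_retries - 1:
                        raise
        return wrapper
    return decorator


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routerports (router_id TEXT, port_id TEXT UNIQUE)")
attempts = []


@retry_on_duplicate()
def add_router_interface(router_id, port_id, race=None):
    attempts.append(router_id)
    # Regular check: reject the request if the port is already attached.
    row = conn.execute(
        "SELECT router_id FROM routerports WHERE port_id = ?", (port_id,)
    ).fetchone()
    if row is not None:
        raise PortInUse(f"port {port_id} already attached to {row[0]}")
    if race is not None and len(attempts) == 1:
        race()  # a competing request commits between check and insert
    with conn:
        conn.execute(
            "INSERT INTO routerports VALUES (?, ?)", (router_id, port_id)
        )


def competing_request():
    with conn:
        conn.execute("INSERT INTO routerports VALUES (?, ?)", ("router-1", "port1"))


try:
    add_router_interface("router-2", "port1", race=competing_request)
    result = "success"
except PortInUse as exc:
    result = str(exc)
print(len(attempts), result)
```

The first attempt hits the unique key and is retried. On the second attempt the regular check finds the existing row, so the caller gets the ordinary "in use" error without any dedicated handler for the duplicate entry.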

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.0.0.0b3

This issue was fixed in the openstack/neutron 9.0.0.0b3 development milestone.
