race condition when dhcp agent uses reserved port

Bug #1425402 reported by Wei T
This bug affects 4 people
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Eugene Nikanorov
Milestone: none

Bug Description

When a network is detached from a DHCP agent, the device_id of that agent's port is set to "reserved_dhcp_port". If the network is later attached to two different agents at the same time, both agents list the ports to look for a reserved_dhcp_port, both find the same one, and both update its device_id to dhcp<hostname-hash>-<network_id>. As a result, both agents set up a port with the same profile.

$neutron dhcp-agent-network-remove 47f46f82-5105-4de1-91ab-03faf765c6e3 cec2ca0a-8ced-429b-8a8d-10c0ed1b591d
$neutron dhcp-agent-network-remove d5340feb-1aea-4851-8c9e-d558eb0a37b9 cec2ca0a-8ced-429b-8a8d-10c0ed1b591d

select * from ports where network_id='cec2ca0a-8ced-429b-8a8d-10c0ed1b591d';
+----------------------------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+-------------------------------------------------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------------------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+-------------------------------------------------------------------------------+--------------+
| c21da7e7d2b142049cc3ec8c551fdb80 | 23e1acb6-5bab-45b5-9d32-67f879771fa7 | | cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | fa:16:3e:4a:27:f4 | 1 | ACTIVE | dhcpe82d9294-4fef-5db5-b27e-d3e53cea2856-cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | network:dhcp |
| c21da7e7d2b142049cc3ec8c551fdb80 | 25f50954-8bf9-4dd6-ba89-09b2ae169f86 | | cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | fa:16:3e:b8:8d:5e | 1 | ACTIVE | dhcp74013012-f6f2-511b-882b-fc394cffe407-cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | network:dhcp |
| c21da7e7d2b142049cc3ec8c551fdb80 | 526be157-8c3b-4aff-997b-3f78a59767ad | | cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | fa:16:3e:47:da:eb | 1 | ACTIVE | reserved_dhcp_port | network:dhcp |
| c21da7e7d2b142049cc3ec8c551fdb80 | 825e7437-96a8-4523-95ba-6c3b6302482d | | cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | fa:16:3e:b3:f4:0b | 1 | ACTIVE | reserved_dhcp_port | network:dhcp |
+----------------------------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+-------------------------------------------------------------------------------+--------------+
4 rows in set (0.00 sec)
$neutron dhcp-agent-network-add 47f46f82-5105-4de1-91ab-03faf765c6e3 cec2ca0a-8ced-429b-8a8d-10c0ed1b591d & neutron dhcp-agent-network-add d5340feb-1aea-4851-8c9e-d558eb0a37b9 cec2ca0a-8ced-429b-8a8d-10c0ed1b591d

"ip netns exec qdhcp-cec2ca0a-8ced-429b-8a8d-10c0ed1b591d ip a|grep tap"

[1] 08:40:34 [SUCCESS] net-001
5819: tap526be157-8c: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 192.168.200.22/24 brd 192.168.200.255 scope global tap526be157-8c
    inet 169.254.169.254/16 brd 169.254.255.255 scope global tap526be157-8c
[2] 08:40:34 [SUCCESS] net-002
8683: tap526be157-8c: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 192.168.200.22/24 brd 192.168.200.255 scope global tap526be157-8c
    inet 169.254.169.254/16 brd 169.254.255.255 scope global tap526be157-8c

select * from ports where network_id='cec2ca0a-8ced-429b-8a8d-10c0ed1b591d';
+----------------------------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+-------------------------------------------------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------------------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+-------------------------------------------------------------------------------+--------------+
| c21da7e7d2b142049cc3ec8c551fdb80 | 23e1acb6-5bab-45b5-9d32-67f879771fa7 | | cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | fa:16:3e:4a:27:f4 | 1 | ACTIVE | dhcpe82d9294-4fef-5db5-b27e-d3e53cea2856-cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | network:dhcp |
| c21da7e7d2b142049cc3ec8c551fdb80 | 25f50954-8bf9-4dd6-ba89-09b2ae169f86 | | cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | fa:16:3e:b8:8d:5e | 1 | ACTIVE | dhcp74013012-f6f2-511b-882b-fc394cffe407-cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | network:dhcp |
| c21da7e7d2b142049cc3ec8c551fdb80 | 526be157-8c3b-4aff-997b-3f78a59767ad | | cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | fa:16:3e:47:da:eb | 1 | ACTIVE | dhcp1d6fb30f-279e-5cad-848f-353561944dc3-cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | network:dhcp |
| c21da7e7d2b142049cc3ec8c551fdb80 | 825e7437-96a8-4523-95ba-6c3b6302482d | | cec2ca0a-8ced-429b-8a8d-10c0ed1b591d | fa:16:3e:b3:f4:0b | 1 | ACTIVE | reserved_dhcp_port | network:dhcp |
+----------------------------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+-------------------------------------------------------------------------------+--------------+
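
The interleaving that produces the tables above can be sketched in a few lines. This is only an illustration, not neutron code: the `ports` dictionary stands in for the ports table, the port IDs are shortened, and `find_reserved` and the agent names are hypothetical.

```python
RESERVED = "reserved_dhcp_port"

# Stand-in for the ports table: port id -> device_id (hypothetical, shortened ids).
ports = {"526be157": RESERVED, "825e7437": RESERVED}


def find_reserved(port_table):
    """Return the first port whose device_id is still the reserved marker."""
    for port_id in sorted(port_table):
        if port_table[port_id] == RESERVED:
            return port_id
    return None


# Step 1: both agents list the ports before either one has written anything.
seen_by_a = find_reserved(ports)
seen_by_b = find_reserved(ports)

# Step 2: each agent rewrites device_id without re-checking the port.
ports[seen_by_a] = "dhcp-agent-a"
ports[seen_by_b] = "dhcp-agent-b"

print(seen_by_a, seen_by_b, ports)
```

Both agents pick the same port, the later write wins, and the second reserved port is never used, which matches the final table: one port claimed and one still marked reserved_dhcp_port.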

One possible resolution is to take a row lock when updating the port.
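
A rough sketch of the locking idea, assuming the lookup and the update happen inside one critical section. An in-process `threading.Lock` stands in for a database row lock here, and `PortTable`, `claim_reserved`, and the agent names are hypothetical; none of this is neutron code.

```python
import threading

RESERVED = "reserved_dhcp_port"


class PortTable:
    """Toy port table; the lock plays the role of a database row lock."""

    def __init__(self, ports):
        self._ports = dict(ports)
        self._lock = threading.Lock()

    def claim_reserved(self, new_device_id):
        # Lookup and update happen under the same lock, so no other
        # caller can observe the port as reserved in between.
        with self._lock:
            for port_id in sorted(self._ports):
                if self._ports[port_id] == RESERVED:
                    self._ports[port_id] = new_device_id
                    return port_id
        return None

    def snapshot(self):
        with self._lock:
            return dict(self._ports)


table = PortTable({"526be157": RESERVED, "825e7437": RESERVED})
claimed = {}


def agent(name):
    claimed[name] = table.claim_reserved("dhcp-" + name)


threads = [threading.Thread(target=agent, args=(n,)) for n in ("agent-a", "agent-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(claimed, table.snapshot())
```

Whichever agent runs first, the two agents end up with different ports, because the second one can no longer see the first port as reserved.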

Wei T (nuaafe)
Changed in neutron:
assignee: nobody → Wei T (nuaafe)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/159110

Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Kyle Mestery (<email address hidden>) on branch: master
Review: https://review.openstack.org/159110
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Eugene Nikanorov (enikanorov) wrote : Re: race condition when dhcp agent using reserved port

I think the right approach here is to implement a kind of compare-and-swap mechanism. It will not fully eliminate the race condition, but it will greatly reduce the probability of its occurrence.
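
For illustration, a compare-and-swap can be written as a single conditional UPDATE whose row count says whether the swap happened. This sketch assumes an in-memory SQLite table with a hypothetical two-column schema and a hypothetical helper `claim_reserved_port`; it shows the general technique and is not the patch that was later merged.

```python
import sqlite3

RESERVED = "reserved_dhcp_port"


def make_db():
    """Create a toy ports table with one reserved port (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ports (id TEXT PRIMARY KEY, device_id TEXT)")
    conn.execute("INSERT INTO ports VALUES (?, ?)", ("526be157", RESERVED))
    conn.commit()
    return conn


def claim_reserved_port(conn, port_id, new_device_id):
    """Set device_id only if the port is still reserved; True on success."""
    cur = conn.execute(
        "UPDATE ports SET device_id = ? WHERE id = ? AND device_id = ?",
        (new_device_id, port_id, RESERVED),
    )
    conn.commit()
    # rowcount is 1 if the expected value was found and swapped, else 0.
    return cur.rowcount == 1


conn = make_db()
first = claim_reserved_port(conn, "526be157", "dhcp-agent-a")
second = claim_reserved_port(conn, "526be157", "dhcp-agent-b")
owner = conn.execute(
    "SELECT device_id FROM ports WHERE id = ?", ("526be157",)
).fetchone()[0]
print(first, second, owner)
```

The second caller gets `False` and can go on to look for another reserved port or create a new one.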

Changed in neutron:
assignee: Wei T (nuaafe) → Eugene Nikanorov (enikanorov)
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/236983

Neil Jerram (neil-jerram) wrote : Re: race condition when dhcp agent using reserved port

What is the use case where reserved DHCP ports are used?

Specifically, I'm wondering if it would make sense to change the reserved device_id from 'reserved_dhcp_port' to 'reserved_dhcp_port_<HOST_ID>'. Then the DHCP agent on each host would pick up its own reserved port again, and we'd never see such races.
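
A small sketch of what this proposal could look like, assuming the reserved marker is simply suffixed with the host ID as written above. The helper names and the dictionary standing in for the ports table are hypothetical; the host names are borrowed from the output earlier in the report.

```python
RESERVED_PREFIX = "reserved_dhcp_port_"


def reserved_device_id(host_id):
    """Build the per-host reserved marker, e.g. reserved_dhcp_port_net-001."""
    return RESERVED_PREFIX + host_id


def release_port(ports, port_id, host_id):
    """Mark a port as reserved for the host that was using it."""
    ports[port_id] = reserved_device_id(host_id)


def find_own_reserved(ports, host_id):
    """Return the port reserved for this host, or None if there is none."""
    wanted = reserved_device_id(host_id)
    for port_id in sorted(ports):
        if ports[port_id] == wanted:
            return port_id
    return None


# Stand-in for the ports table: port id -> device_id (shortened ids).
ports = {"526be157": "dhcp-net-001", "825e7437": "dhcp-net-002"}
release_port(ports, "526be157", "net-001")
release_port(ports, "825e7437", "net-002")

port_for_001 = find_own_reserved(ports, "net-001")
port_for_002 = find_own_reserved(ports, "net-002")
port_for_003 = find_own_reserved(ports, "net-003")
print(port_for_001, port_for_002, port_for_003)
```

Each host only ever matches its own marker, so two agents never compete for one port. A host that did not previously serve the network finds nothing to reuse.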

Wei T (nuaafe) wrote : Re: [Bug 1425402] Re: race condition when dhcp agent using reserved port

E.g. 4 network nodes, with each network having 2 DHCP ports for HA. If one node
goes down, another node can pick up the reserved port without changing the IP
address, which helps if DNS is also served on the DHCP port.
Your idea would work, but it needs to reserve more ports than necessary.

On Monday, 16 November 2015, Neil Jerram <email address hidden> wrote:

> What is the use case where reserved DHCP ports are used?
>
> Specifically, I'm wondering if it would make sense to change the
> reserved device_id from 'reserved_dhcp_port' to
> 'reserved_dhcp_port_<HOST_ID>'. Then the DHCP agent on each host would
> pick up its own reserved port again, and we'd never see such races.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/236983
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f76ef76f2516dad794818ce56fb15d16437f7314
Submitter: Jenkins
Branch: master

commit f76ef76f2516dad794818ce56fb15d16437f7314
Author: Eugene Nikanorov <email address hidden>
Date: Mon Oct 19 17:41:32 2015 +0400

    Avoid race condition for reserved DHCP ports

    This patch introduces mechanism similar to compare-and-swap
    for updating reserved DHCP port.

    This addresses a case when two DHCP agents that start nearly at
    the same time are assigned to one network and there is a reserved
    DHCP port in the network. Then each of agents will try to use it
    because agents don't check if reserved port is still available.
    Reserved DHCP port can be acquired by different agent between calls to
    get_active_networks and update_port, so this patch adds a check for
    this case.

    Change-Id: I0277ab537ff9d3a664c03ea291b9ec2b0e784dbb
    Closes-Bug: #1425402

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/250779

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/250779
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b77a416313d9e390e559b22251a6687a0dff28d5
Submitter: Jenkins
Branch: stable/liberty

commit b77a416313d9e390e559b22251a6687a0dff28d5
Author: Eugene Nikanorov <email address hidden>
Date: Mon Oct 19 17:41:32 2015 +0400

    Avoid race condition for reserved DHCP ports

    This patch introduces mechanism similar to compare-and-swap
    for updating reserved DHCP port.

    This addresses a case when two DHCP agents that start nearly at
    the same time are assigned to one network and there is a reserved
    DHCP port in the network. Then each of agents will try to use it
    because agents don't check if reserved port is still available.
    Reserved DHCP port can be acquired by different agent between calls to
    get_active_networks and update_port, so this patch adds a check for
    this case.

    Change-Id: I0277ab537ff9d3a664c03ea291b9ec2b0e784dbb
    Closes-Bug: #1425402
    (cherry picked from commit f76ef76f2516dad794818ce56fb15d16437f7314)

tags: added: in-stable-liberty
Wei T (nuaafe) wrote : Re: race condition when dhcp agent using reserved port

I was thinking that the CAS approach only mitigates the issue (as there's no lock). It depends on the MySQL isolation level and the MySQL HA setup: since get_port and update_port are not an atomic operation here, there is still some chance that the race condition happens. That's why I came up with SELECT FOR UPDATE at the very beginning, based on a single-master MySQL mode and tested in our environment (and I agree it's not a good approach with the masterless Galera setup).
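
The remaining window can be shown with a deterministic interleaving. In this sketch `get_port` and `update_port` are local stand-ins operating on a dictionary, not the neutron calls of the same name, and `still_reserved` and the agent names are hypothetical.

```python
RESERVED = "reserved_dhcp_port"

# Stand-in for the ports table: port id -> device_id (shortened id).
ports = {"526be157": RESERVED}


def get_port(port_id):
    return ports[port_id]


def update_port(port_id, device_id):
    ports[port_id] = device_id


def still_reserved(port_id):
    """The extra check: is the port still unclaimed right now?"""
    return get_port(port_id) == RESERVED


# Both agents run the check before either of them writes.
a_ok = still_reserved("526be157")
b_ok = still_reserved("526be157")

# Each agent then acts on its own, now stale, answer.
if a_ok:
    update_port("526be157", "dhcp-agent-a")
if b_ok:
    update_port("526be157", "dhcp-agent-b")

print(a_ok, b_ok, ports)
```

Both checks pass and the second write overwrites the first. The check narrows the window to the gap between the read and the write, but only an atomic read-and-update or a lock closes it.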

Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0b1

This issue was fixed in the openstack/neutron 8.0.0.0b1 development milestone.

Changed in neutron:
status: Fix Committed → Fix Released
summary: - race condition when dhcp agent using reserved port
+ race condition when dhcp agent uses reserved port
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 7.0.1

This issue was fixed in the openstack/neutron 7.0.1 release.
