Two DHCP ports on same network due to cleanup failure

Bug #1244860 reported by Stephen Ma
This bug affects 1 person
Affects  | Status       | Importance | Assigned to | Milestone
neutron  | Fix Released | Undecided  | Stephen Ma  |
Icehouse | New          | Undecided  | Unassigned  |

Bug Description

On one network, "neutron port-list --network_id <net-id> --device_owner 'network:dhcp'" shows two DHCP ports. The MySQL database confirms this:

mysql> select * from ports where tenant_id='abcd' and device_owner='network:dhcp' and network_id='7d2e3d47-396d-4867-a2b0-0311465a8454';
+----------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+-------------------------------------------------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+-------------------------------------------------------------------------------+--------------+
| abcd | 3d6a7627-6af9-4fb6-9cf6-591c1373d349 | | 7d2e3d47-396d-4867-a2b0-0311465a8454 | fa:16:3e:60:83:3f | 1 | ACTIVE | dhcp4fff1f08-9922-5c44-b6f8-fd9780f48512-7d2e3d47-396d-4867-a2b0-0311465a8454 | network:dhcp |
| abcd | a4c0eb19-407e-4970-90a8-0128259fb048 | | 7d2e3d47-396d-4867-a2b0-0311465a8454 | fa:16:3e:e1:1b:8f | 1 | ACTIVE | dhcpce80c236-6a89-571d-970b-a1d4bb787827-7d2e3d47-396d-4867-a2b0-0311465a8454 | network:dhcp |
+----------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+-------------------------------------------------------------------------------+--------------+
2 rows in set (0.00 sec)

However, "neutron dhcp-agent-list-hosting-net 7d2e3d47-396d-4867-a2b0-0311465a8454" shows only one DHCP server running.
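
The mismatch described above (more DHCP ports than DHCP agents hosting the network) can be checked mechanically. The following is only a minimal sketch, not part of neutron: the function name networks_with_extra_dhcp_ports and the dict-based inputs are hypothetical, and it assumes the port rows and the per-network agent counts have already been fetched (for example from the two commands shown above).

```python
from collections import Counter

DHCP_OWNER = "network:dhcp"


def networks_with_extra_dhcp_ports(ports, hosting_agent_counts):
    """Return {network_id: (dhcp_port_count, agent_count)} for suspicious networks.

    ports: iterable of dicts with "network_id" and "device_owner" keys
        (hypothetical shape, mirroring the columns of the ports table).
    hosting_agent_counts: dict mapping network_id to the number of DHCP
        agents currently hosting that network.
    """
    port_counts = Counter(
        p["network_id"] for p in ports if p["device_owner"] == DHCP_OWNER
    )
    return {
        net: (count, hosting_agent_counts.get(net, 0))
        for net, count in port_counts.items()
        if count > hosting_agent_counts.get(net, 0)
    }


NET_ID = "7d2e3d47-396d-4867-a2b0-0311465a8454"

# The two rows from the query above, reduced to the relevant columns.
ports = [
    {"id": "3d6a7627-6af9-4fb6-9cf6-591c1373d349",
     "network_id": NET_ID, "device_owner": DHCP_OWNER},
    {"id": "a4c0eb19-407e-4970-90a8-0128259fb048",
     "network_id": NET_ID, "device_owner": DHCP_OWNER},
]

# Only one agent is reported as hosting the network.
hosting_agent_counts = {NET_ID: 1}

extra = networks_with_extra_dhcp_ports(ports, hosting_agent_counts)
print(extra)
```

A network that is legitimately hosted by two agents would have two DHCP ports and would not be flagged, which is why the check compares against the agent count rather than a fixed limit of one.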

This problem is observed in an environment with 4 nodes running dhcp-agents. The neutron API server and the DHCP agents are NOT running on the same node.

The error occurred while the DHCP server was being "moved" from DHCP-agentA running on nodeA to DHCP-agentB running on nodeB. The sequence was:

  neutron dhcp-agent-network-remove <agentA> <net-id> (1)
  neutron dhcp-agent-network-add <agentB> <net-id> (2)

Right before or while step 1 was running, nodeA was rebooted, so the DHCP port was never removed. When nodeA came back and the DHCP agent restarted, it did not unplug the DHCP port device. The DHCP agent also failed to make the release_dhcp_port RPC call to the API server to have the port deleted from MySQL.

Stephen Ma (stephen-ma)
Changed in neutron:
assignee: nobody → Stephen Ma (stephen-ma)
Stephen Ma (stephen-ma) wrote :

Correction to the bug description: Nodes running Neutron DHCP-agents ARE NOT RUNNING Neutron API servers.

description: updated
Stephen Ma (stephen-ma)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/56740

Changed in neutron:
status: New → In Progress
Changed in neutron:
assignee: Stephen Ma (stephen-ma) → Mark McClain (markmcclain)
Changed in neutron:
assignee: Mark McClain (markmcclain) → Stephen Ma (stephen-ma)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/56740
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8cf394b896e3644ff51edf6a0d462501fb6e6843
Submitter: Jenkins
Branch: master

commit 8cf394b896e3644ff51edf6a0d462501fb6e6843
Author: Stephen Ma <email address hidden>
Date: Sun Oct 27 08:14:21 2013 -0700

    Delete DHCP port without DHCP server on a net node

    A DHCP-network was deleted from one host using neutron
    dhcp-agent-network-remove and then added to another host
    using neutron dhcp-agent-network-add command. While the
    dhcp-agent-network-remove command was in progress, the
    host crashed. As a result, the removal of the DHCP-network
    was partially done. The network was disassociated from
    the agent in mysql. However, the agent never made the
    release_dhcp_port RPC call to delete the port -- even
    after the agent restarted. The end result is that there
    are two DHCP ports for the same network. One of these
    is found on the host that is no longer hosting the
    dhcp-server.

    This fix make the DHCP agent invoke the release_dhcp_port
    RPC call on a stale network whose dnsmasq process is not
    running (not active). Before this change, the RPC call is
    made on a stale network only when the dnsmasq process is
    running.

    Closes-Bug: #1244860
    Change-Id: Ie0bafdac698810b5455550c306c6a75ddf91d9bb
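
The behavior change described in the commit message above can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual neutron patch: RecordingRpc, cleanup_stale_network, and the fixed flag are hypothetical names introduced here, and only release_dhcp_port is taken from the bug report.

```python
class RecordingRpc:
    """Hypothetical stand-in for the agent's RPC client; it only records calls."""

    def __init__(self):
        self.released = []

    def release_dhcp_port(self, network_id, device_id):
        # In the real agent this would ask the API server to delete the port.
        self.released.append((network_id, device_id))


def cleanup_stale_network(rpc, network_id, device_id, dnsmasq_active, fixed=True):
    """Clean up a network that this agent should no longer be hosting.

    fixed=False models the behavior before the change: the port is released
    only when dnsmasq is still running.
    fixed=True models the behavior after the change: the port is released
    whether or not dnsmasq is running.
    Returns True if the release call was made.
    """
    if dnsmasq_active:
        # A real agent would stop the dnsmasq process here (omitted).
        pass
    if dnsmasq_active or fixed:
        rpc.release_dhcp_port(network_id, device_id)
        return True
    return False


NET_ID = "7d2e3d47-396d-4867-a2b0-0311465a8454"
DEVICE_ID = "dhcp4fff1f08-9922-5c44-b6f8-fd9780f48512-" + NET_ID

# After the reboot, dnsmasq is not running on nodeA.
old_rpc = RecordingRpc()
old_result = cleanup_stale_network(old_rpc, NET_ID, DEVICE_ID,
                                   dnsmasq_active=False, fixed=False)

new_rpc = RecordingRpc()
new_result = cleanup_stale_network(new_rpc, NET_ID, DEVICE_ID,
                                   dnsmasq_active=False, fixed=True)

print(old_result, old_rpc.released)
print(new_result, new_rpc.released)
```

In this model the old path leaves the port behind after a reboot because no dnsmasq process exists to trigger the cleanup, while the new path releases it regardless.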

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → juno-1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/127183

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/icehouse)

Reviewed: https://review.openstack.org/127183
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7f13cbca1c007623313025ecfeb8e7cae9e7e365
Submitter: Jenkins
Branch: stable/icehouse

commit 7f13cbca1c007623313025ecfeb8e7cae9e7e365
Author: Stephen Ma <email address hidden>
Date: Sun Oct 27 08:14:21 2013 -0700

    Delete DHCP port without DHCP server on a net node

    A DHCP-network was deleted from one host using neutron
    dhcp-agent-network-remove and then added to another host
    using neutron dhcp-agent-network-add command. While the
    dhcp-agent-network-remove command was in progress, the
    host crashed. As a result, the removal of the DHCP-network
    was partially done. The network was disassociated from
    the agent in mysql. However, the agent never made the
    release_dhcp_port RPC call to delete the port -- even
    after the agent restarted. The end result is that there
    are two DHCP ports for the same network. One of these
    is found on the host that is no longer hosting the
    dhcp-server.

    This fix make the DHCP agent invoke the release_dhcp_port
    RPC call on a stale network whose dnsmasq process is not
    running (not active). Before this change, the RPC call is
    made on a stale network only when the dnsmasq process is
    running.

    Closes-Bug: #1244860
    Change-Id: Ie0bafdac698810b5455550c306c6a75ddf91d9bb
    (cherry picked from commit 8cf394b896e3644ff51edf6a0d462501fb6e6843)

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-1 → 2014.2