DHCP agent conflicting with dynamic IPv6 addresses

Bug #1627902 reported by Kevin Benton
This bug affects 2 people
Affects    Status          Importance    Assigned to    Milestone
neutron    Fix Released    High          Brian Haley

Bug Description

The error below appears in the gate logs. It completely breaks DHCP for the affected network because the agent tries to add an address that conflicts with one it was already given via RA. The cause is the merge of d86f1b87f01c53c3e0b085086133b311e5bf3ab5, which allowed the agent to be configured with stateless IPv6 addresses so that it could serve metadata correctly.

http://logs.openstack.org/12/343312/5/gate/gate-tempest-dsvm-neutron-full-ubuntu-xenial/c11b933/logs/screen-q-dhcp.txt.gz?level=TRACE#_2016-09-26_21_45_40_604

2016-09-26 21:45:40.604 13605 ERROR neutron.agent.linux.utils [-] Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: File exists

2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 81d252a2-8207-4e8c-a286-07fb3494a3ec.
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent File "/opt/stack/new/neutron/neutron/agent/dhcp/agent.py", line 114, in call_driver
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs)
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent File "/opt/stack/new/neutron/neutron/agent/linux/dhcp.py", line 212, in enable
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent interface_name = self.device_manager.setup(self.network)
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent File "/opt/stack/new/neutron/neutron/agent/linux/dhcp.py", line 1396, in setup
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent namespace=network.namespace)
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent File "/opt/stack/new/neutron/neutron/agent/linux/interface.py", line 129, in init_l3
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent device.addr.add(ip_cidr)
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 577, in add
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent self._as_root([net.version], tuple(args))
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 364, in _as_root
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent use_root_namespace=use_root_namespace)
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 95, in _as_root
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent log_fail_as_error=self.log_fail_as_error)
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 104, in _execute
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent log_fail_as_error=log_fail_as_error)
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent File "/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 138, in execute
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent raise RuntimeError(msg)
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent RuntimeError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: File exists
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent
2016-09-26 21:45:40.604 13605 ERROR neutron.agent.dhcp.agent
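
For readers triaging similar logs, the following is a minimal sketch of how this particular failure could be recognised from the exception text. The helper name is hypothetical and is not part of neutron; it only assumes the message format shown in the traceback above, where `ip` reports the kernel's EEXIST error as "RTNETLINK answers: File exists".

```python
def is_address_exists_error(message):
    """Return True if an `ip` failure message looks like an EEXIST conflict.

    Hypothetical helper for illustration only. It matches the stderr text
    seen in the traceback above, which `ip` prints when the kernel rejects
    a request because the object (here, the address) already exists.
    """
    return "RTNETLINK answers: File exists" in message


# The message is copied from the RuntimeError in the log above.
LOGGED_ERROR = ("Exit code: 2; Stdin: ; Stdout: ; "
                "Stderr: RTNETLINK answers: File exists")
```

A match only says that something already existed; whether it was the RA-derived address still has to be confirmed by inspecting the interface.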

Changed in neutron:
assignee: nobody → Kevin Benton (kevinbenton)
Changed in neutron:
importance: Undecided → High
tags: added: gate-failure
tags: added: newton-rc-potential
Changed in neutron:
status: New → Confirmed
Changed in neutron:
status: Confirmed → In Progress
Armando Migliaccio (armando-migliaccio) wrote :

The offending patch should be reverted

Changed in neutron:
status: In Progress → Invalid
assignee: Kevin Benton (kevinbenton) → nobody
Changed in neutron:
assignee: nobody → Kevin Benton (kevinbenton)
status: Invalid → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Kevin Benton (<email address hidden>) on branch: master
Review: https://review.openstack.org/377142
Reason: This partial revert is preferable: Ide494b6333a4f1e279ab58aa27c0aa719e79545d

tags: added: l3-ipam-dhcp
Kevin Benton (kevinbenton) wrote :
Changed in neutron:
status: In Progress → Fix Released
tags: removed: newton-rc-potential
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/386687

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Kevin Benton (<email address hidden>) on branch: master
Review: https://review.openstack.org/377140
Reason: see Brian's patch (386687)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/385226
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=97f4a3fdbbe1a2b9dd9b09cc042094a4cd5177ca
Submitter: Jenkins
Branch: master

commit 97f4a3fdbbe1a2b9dd9b09cc042094a4cd5177ca
Author: Brian Haley <email address hidden>
Date: Tue Oct 11 09:23:32 2016 -0400

    Move sysctl out of IPDevice class

    Not all callers use a device with sysctl calls, so move it
    out of the IPDevice class to make it more generic.

    This will help facilitate a future change to set accept_ra
    to fix another bug.

    Change-Id: Iaf85981a227234466863f00ee4e2209405a2b083
    Related-bug: #1627902

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/386687
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=904f85e2f933493d386f463af3ff6168dd36b7de
Submitter: Jenkins
Branch: master

commit 904f85e2f933493d386f463af3ff6168dd36b7de
Author: Brian Haley <email address hidden>
Date: Fri Oct 14 11:37:54 2016 -0400

    Disable 'accept_ra' in DHCP agent namespace

    Currently the DHCP agent relies on the acceptance of an
    RA to configure its IPv6 address with SLAAC or DHCPv6-stateless
    network modes. It should explicitly assign addresses to the
    agent based on the data model instead.

    In order to do this we must disable RAs in the namespace so
    that a static assignment doesn't conflict with a previously
    created dynamically-generated address.

    Change-Id: I1b38d131249d59fa486a07024d4b1ec61e693d59
    Related-bug: #1627902
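
As a rough illustration of what "disable RAs in the namespace" means at the command level, the sketch below builds a `sysctl` invocation that sets the kernel's `net.ipv6.conf.<device>.accept_ra` key to 0 inside a network namespace. The function name and its defaults are assumptions made for this example; it is not the code from the commit above, and it only builds the argument list rather than running it.

```python
def accept_ra_disable_cmd(namespace, device="default"):
    """Build a command that turns off RA acceptance inside a namespace.

    Hypothetical helper, not neutron's API. `device` may be an interface
    name, or "default" to affect interfaces created later in the namespace.
    """
    key = "net.ipv6.conf.%s.accept_ra=0" % device
    return ["ip", "netns", "exec", namespace, "sysctl", "-w", key]


# Example using the namespace name quoted later in this report.
EXAMPLE_CMD = accept_ra_disable_cmd(
    "qdhcp-38b38c7e-d336-4955-a140-ef83a4410c2a")
```

Note that the "default" key only applies to interfaces created after it is set, which is one reason an already existing device can still hold an RA-derived address.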

Brian Haley (brian-haley) wrote :

I actually broke this with my update; I'll explain.

Although we do correctly disable accept_ra in the dhcp namespace now, on the grenade upgrade job we are restarting a new dhcp agent on an existing installation. This means that the old dhcp namespace could have auto-configured an IPv6 address already, in which case we fall on our face:

'ip', 'netns', 'exec', 'qdhcp-38b38c7e-d336-4955-a140-ef83a4410c2a', 'ip', '-6', 'addr', 'add', 'fdb9:d744:18da:0:f816:3eff:fe61:c32b/64', 'scope', 'global', 'dev', 'tap62476185-f1'
ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: File exists

http://logs.openstack.org/43/406243/1/check/gate-grenade-dsvm-neutron-dvr-multinode-ubuntu-xenial/6cee38c/logs/new/screen-q-dhcp.txt.gz#_2016-12-02_17_40_30_861

Looking a few lines up we can see:

    Reusing existing device: tap8fb1bc28-fa

That's the clue that we might have a conflict.

In init_l3() we filter the IP(v6) addresses with the "permanent" flag, but that will filter out any previous IPv6 addresses created via SLAAC on receipt of an RA. Just dropping the "permanent" filter seems like a quick fix, but that will leave the SLAAC-based address alone, and it will eventually be removed when its lifetime expires, leading to other problems.

The best thing to do would be to detect the old address (it has a "dynamic" flag) and remove it, allowing us to correctly add the new "permanent" one. I'll work on a fix.
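
The detection step described above can be sketched as follows. This is only an illustration, not the eventual fix: the function name is hypothetical, and the sample text is an assumed example of what `ip -6 addr show dev <device>` typically prints for a SLAAC-configured interface (the address and device name are taken from the failing command above; the flags and lifetimes are made up).

```python
# Assumed sample of `ip -6 addr show dev tap62476185-f1` output.
SAMPLE_OUTPUT = """\
12: tap62476185-f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 state UNKNOWN qlen 1000
    inet6 fdb9:d744:18da:0:f816:3eff:fe61:c32b/64 scope global mngtmpaddr dynamic
       valid_lft 86389sec preferred_lft 14389sec
    inet6 fe80::f816:3eff:fe61:c32b/64 scope link
       valid_lft forever preferred_lft forever
"""


def find_dynamic_ipv6(ip_addr_output):
    """Return CIDRs of global-scope inet6 addresses flagged 'dynamic'.

    Hypothetical helper: a 'dynamic' address was auto-configured from an RA
    and is therefore hidden by a "permanent" filter, yet it still makes a
    later `ip -6 addr add` of the same address fail with "File exists".
    """
    stale = []
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "inet6":
            continue  # skip the device header and the lifetime lines
        cidr, flags = fields[1], fields[2:]
        if flags[:2] == ["scope", "global"] and "dynamic" in flags:
            stale.append(cidr)
    return stale
```

Each address returned would then be deleted and re-added statically, so that it shows up as "permanent" and no longer expires.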

Changed in neutron:
status: Fix Released → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/406428

Changed in neutron:
assignee: Kevin Benton (kevinbenton) → Brian Haley (brian-haley)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/406428
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=21bb77667007b0b140c17d9104204943c4a0f4cc
Submitter: Jenkins
Branch: master

commit 21bb77667007b0b140c17d9104204943c4a0f4cc
Author: Brian Haley <email address hidden>
Date: Fri Dec 2 23:07:29 2016 -0500

    Correctly configure IPv6 addresses on upgrades

    When starting the dhcp-agent after an upgrade, there could
    be stale IPv6 addresses in the namespace that had been
    configured via SLAAC. These need to be removed, and the
    same address added back statically, in order for the
    agent to start up correctly.

    To avoid the race condition where an IPv6 RA could arrive
    while we are making this change, we must move the call
    to disable RAs in the namespace from plug(), since devices
    may already exist that are receiving packets.

    Uncovered by the grenade tests.

    Change-Id: I7e1e5d6c1fa938918aac3fb63888d20ff4088ba7
    Closes-bug: #1627902
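
The ordering this commit message describes (stop accepting RAs first, then replace each stale address) can be sketched as a command plan. This is an assumption-laden illustration rather than the merged code: the function name is hypothetical, the per-device `accept_ra` key is one possible way to cover an already existing device, and the commands are only built, not executed.

```python
def upgrade_fix_plan(namespace, device, stale_cidrs):
    """Return the ordered commands for replacing SLAAC addresses statically.

    Hypothetical sketch. RA acceptance is disabled before any address is
    touched, so a new RA cannot re-create the dynamic address between the
    delete and the add.
    """
    prefix = ["ip", "netns", "exec", namespace]
    plan = [prefix + ["sysctl", "-w",
                      "net.ipv6.conf.%s.accept_ra=0" % device]]
    for cidr in stale_cidrs:
        # Remove the dynamically generated address...
        plan.append(prefix + ["ip", "-6", "addr", "del", cidr,
                              "dev", device])
        # ...and add the same address back as a static (permanent) one.
        plan.append(prefix + ["ip", "-6", "addr", "add", cidr,
                              "scope", "global", "dev", device])
    return plan
```

If the sysctl step came last, an RA arriving mid-sequence could reintroduce the conflict the delete was meant to clear.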

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 10.0.0.0b2

This issue was fixed in the openstack/neutron 10.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/newton)

Related fix proposed to branch: stable/newton
Review: https://review.openstack.org/460920

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/460921

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/newton)

Reviewed: https://review.openstack.org/460920
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=16942945aeb3f341ae7df1a574a32b4c3ee6c055
Submitter: Jenkins
Branch: stable/newton

commit 16942945aeb3f341ae7df1a574a32b4c3ee6c055
Author: Brian Haley <email address hidden>
Date: Fri Oct 14 11:37:54 2016 -0400

    Disable 'accept_ra' in DHCP agent namespace

    Currently the DHCP agent relies on the acceptance of an
    RA to configure its IPv6 address with SLAAC or DHCPv6-stateless
    network modes. It should explicitly assign addresses to the
    agent based on the data model instead.

    In order to do this we must disable RAs in the namespace so
    that a static assignment doesn't conflict with a previously
    created dynamically-generated address.

    Conflicts:
     neutron/agent/linux/interface.py

    Change-Id: I1b38d131249d59fa486a07024d4b1ec61e693d59
    Related-bug: #1627902
    (cherry picked from commit 904f85e2f933493d386f463af3ff6168dd36b7de)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/newton)

Reviewed: https://review.openstack.org/460921
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b413c39f7227f9b1de4f0bed4911bac78f8a2f5a
Submitter: Jenkins
Branch: stable/newton

commit b413c39f7227f9b1de4f0bed4911bac78f8a2f5a
Author: Brian Haley <email address hidden>
Date: Fri Dec 2 23:07:29 2016 -0500

    Correctly configure IPv6 addresses on upgrades

    When starting the dhcp-agent after an upgrade, there could
    be stale IPv6 addresses in the namespace that had been
    configured via SLAAC. These need to be removed, and the
    same address added back statically, in order for the
    agent to start up correctly.

    To avoid the race condition where an IPv6 RA could arrive
    while we are making this change, we must move the call
    to disable RAs in the namespace from plug(), since devices
    may already exist that are receiving packets.

    Uncovered by the grenade tests.

    Change-Id: I7e1e5d6c1fa938918aac3fb63888d20ff4088ba7
    Closes-bug: #1627902
    (cherry picked from commit 21bb77667007b0b140c17d9104204943c4a0f4cc)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.4.1

This issue was fixed in the openstack/neutron 9.4.1 release.
