DHCPv6 failures with "address already allocated in subnet" error

Bug #1655567 reported by IWAMOTO Toshihiro
30
This bug affects 3 people
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Miguel Lavalle

Bug Description

Test failures with the DHCPv6 error from multiple test methods.

Use query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22already%20allocated%20in%20subnet%5C%22

An example:

 ==============================
2017-01-11 05:00:04.479808 | Failed 1 tests - output below:
2017-01-11 05:00:04.479858 | ==============================
2017-01-11 05:00:04.479883 |
2017-01-11 05:00:04.479962 | tempest.api.network.test_dhcp_ipv6.NetworksTestDHCPv6.test_dhcpv6_stateless_eui64[id-e5517e62-6f16-430d-a672-f80875493d4c]
2017-01-11 05:00:04.480056 | --------------------------------------------------------------------------------------------------------------------------
2017-01-11 05:00:04.480082 |
2017-01-11 05:00:04.480113 | Captured traceback:
2017-01-11 05:00:04.480154 | ~~~~~~~~~~~~~~~~~~~
2017-01-11 05:00:04.480204 | Traceback (most recent call last):
2017-01-11 05:00:04.480283 | File "tempest/api/network/test_dhcp_ipv6.py", line 109, in test_dhcpv6_stateless_eui64
2017-01-11 05:00:04.480415 | real_ip, eui_ip = self._get_ips_from_subnet(**kwargs)
2017-01-11 05:00:04.480486 | File "tempest/api/network/test_dhcp_ipv6.py", line 90, in _get_ips_from_subnet
2017-01-11 05:00:04.480551 | subnet = self.create_subnet(self.network, **kwargs)
2017-01-11 05:00:04.480607 | File "tempest/api/network/base.py", line 179, in create_subnet
2017-01-11 05:00:04.481118 | **kwargs)
2017-01-11 05:00:04.481198 | File "tempest/lib/services/network/subnets_client.py", line 27, in create_subnet
2017-01-11 05:00:04.481260 | return self.create_resource(uri, post_data)
2017-01-11 05:00:04.481756 | File "tempest/lib/services/network/base.py", line 60, in create_resource
2017-01-11 05:00:04.481830 | resp, body = self.post(req_uri, req_post_data)
2017-01-11 05:00:04.481887 | File "tempest/lib/common/rest_client.py", line 276, in post
2017-01-11 05:00:04.481946 | return self.request('POST', url, extra_headers, headers, body, chunked)
2017-01-11 05:00:04.482000 | File "tempest/lib/common/rest_client.py", line 664, in request
2017-01-11 05:00:04.482042 | self._error_checker(resp, resp_body)
2017-01-11 05:00:04.482098 | File "tempest/lib/common/rest_client.py", line 776, in _error_checker
2017-01-11 05:00:04.482145 | raise exceptions.Conflict(resp_body, resp=resp)
2017-01-11 05:00:04.482204 | tempest.lib.exceptions.Conflict: An object with that identifier already exists
2017-01-11 05:00:04.482325 | Details: {u'detail': u'', u'message': u'IP address 2003::f816:3eff:fe18:34ee already allocated in subnet 181627ba-b8ff-49c2-b97d-21763a0b572c', u'type': u'IpAddressAlreadyAllocated'}

Revision history for this message
Boden R (boden) wrote :

Based on the query provided in the defect description it appears this started occurring on 01/10/2017 and as stated is affecting multiple gate jobs.

Changed in neutron:
importance: Undecided → High
status: New → Triaged
Boden R (boden)
Changed in neutron:
status: Triaged → Confirmed
Miguel Lavalle (minsel)
Changed in neutron:
assignee: nobody → Miguel Lavalle (minsel)
Revision history for this message
Miguel Lavalle (minsel) wrote :

Using the following query in Kibana:

message:"already allocated in subnet" AND build_status:FAILURE AND project:"openstack/neutron"

I find 42 hits, out of which 36 correspond to failures in unit tests due to two patchsets that are marked as WIP by the authors. Another four are failures in non-voting job gate-tempest-dsvm-neutron-multinode-full-ubuntu-xenial-nv due to another patchset that is also markep WIP by the author.

Given the results of my digging into the causes of https://bugs.launchpad.net/neutron/+bug/1683090, I suspect that this bug might have been fixed or at least mitigated by the merge of https://review.openstack.org/#/c/428774/ in master and https://review.openstack.org/#/c/454832/ in stable/ocata.

Will keep watching this bug over the next few days

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :
Revision history for this message
Miguel Lavalle (minsel) wrote :

Analyzing this failure:

http://logs.openstack.org/18/460918/1/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/07685f1/console.html

We can see it happens when creating a subnet:

http://paste.openstack.org/show/608546/

The network the subnet is being created on has already two subnets, one of which (id dfd3d082-bcb0-4bbf-9952-3b1fd6f77ceb) is pointed to in the failed test exception:

http://paste.openstack.org/show/608547/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/462716

Miguel Lavalle (minsel)
tags: added: l3-ipam-dhcp
Revision history for this message
Miguel Lavalle (minsel) wrote :

Looking at this instance of the bug:

http://logs.openstack.org/20/464020/5/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/a4d35a8/console.html

1) The test case being executed is test_dhcpv6_64_subnets. Code is: https://github.com/openstack/tempest/blob/master/tempest/api/network/test_dhcp_ipv6.py#L217

2) The second iteration is starting. All the subnets created by the first iteration have been deleted. In other words, the project / tenant has no subnets.

3) The failure occurs when create IPv6 subnet in the second iteration. The request req-e628a3d3-3c24-4ced-81b0-ec28993ef6f8 fails as follows:

2017-05-16 17:51:53.594710 | 2017-05-16 17:43:05,251 29583 INFO [tempest.lib.common.rest_client] Request (NetworksTestDHCPv6:test_dhcpv6_64_subnets): 409 POST https://10.37.255.77:9696/v2.0/subnets 1.054s
2017-05-16 17:51:53.594753 | 2017-05-16 17:43:05,251 29583 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'X-Auth-Token': '<omitted>', 'Accept': 'application/json'}
2017-05-16 17:51:53.594805 | Body: {"subnet": {"gateway_ip": "2003::1", "network_id": "da25ac93-a061-4ea8-bbf9-2c4e01112034", "ip_version": 6, "ipv6_ra_mode": "dhcpv6-stateless", "cidr": "2003::/64", "ipv6_address_mode": "dhcpv6-stateless"}}
2017-05-16 17:51:53.594877 | Response - Headers: {u'x-openstack-request-id': 'req-e628a3d3-3c24-4ced-81b0-ec28993ef6f8', u'content-type': 'application/json', u'server': 'Apache/2.4.18 (Ubuntu)', u'connection': 'close', 'content-location': 'https://10.37.255.77:9696/v2.0/subnets', u'date': 'Tue, 16 May 2017 17:43:05 GMT', u'content-length': '185', 'status': '409'}
2017-05-16 17:51:53.594922 | Body: {"NeutronError": {"message": "IP address 2003::f816:3eff:fe20:97dd already allocated in subnet 9fd8d6ef-eb85-468c-8453-6c532c61a433", "type": "IpAddressAlreadyAllocated", "detail": ""}}

4) Tracking req-e628a3d3-3c24-4ced-81b0-ec28993ef6f8 in the Neutron server log: http://paste.openstack.org/show/609968/. There are DB retries in this request

5) The id of the subnet being created is 9fd8d6ef-eb85-468c-8453-6c532c61a433. While the subnet creation is taking place (point 4 above), an Update DHCP port is received for the subnet in parallel (req-269ba7b0-a6ab-4549-aa3e-e8136710bbec).

6) Tracking req-269ba7b0-a6ab-4549-aa3e-e8136710bbec in the Neutron server log: http://paste.openstack.org/show/609969/. This request is successful. Note that the IPv6 address assigned to the DHCP port (2003::f816:3eff:fe20:97dd) is the same that is reported by the IpAddressAlreadyAllocated exception in the subnet creation (see point 3 above).

So it seems that the subnet creation is tripped by the the DHCP port creation on it.

Will continue digging.

Revision history for this message
Miguel Lavalle (minsel) wrote :

When creating a subnet with flag dhcpv6-stateless, both the server (at https://github.com/openstack/neutron/blob/master/neutron/db/ipam_pluggable_backend.py#L449) and the DHCP linux driver (at https://github.com/openstack/neutron/blob/master/neutron/agent/linux/dhcp.py#L1222) attempt to assign an IP address to the DHCP port and that may trigger this bug.

Based on this, the action plan discussed today during L3 sub-team meeting is:

1) Confirm in Kibana, with other occurrences of this bug, the cause of this bug
2
) If confirmed, we are going to refine this query to exclude DHCP ports: https://github.com/openstack/neutron/blob/master/neutron/db/ipam_pluggable_backend.py#L461. With this, only the DHCP linux driver will assign IP address to a DHCP port on a dhcpv6-stateless subnet

Revision history for this message
Miguel Lavalle (minsel) wrote :

Checking another instance of this bug (http://logs.openstack.org/56/468056/3/experimental/gate-tempest-dsvm-neutron-pecan-ubuntu-xenial/2a3dadb/console.html#_2017-05-26_19_15_56_021064), found a very similar situation: the subnet creation request (http://paste.openstack.org/show/610774/) fails when the create dhcp port request for that subnet succeeds (http://paste.openstack.org/show/610773/).

So we can try refining query to exclude DHCP ports: https://github.com/openstack/neutron/blob/master/neutron/db/ipam_pluggable_backend.py#L461. With this, only the DHCP linux driver will assign IP address to a DHCP port on a dhcpv6-stateless subnet.

This still leaves open the question, though, as to why both requests allocate the same address....

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/469669

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Miguel Lavalle (<email address hidden>) on branch: master
Review: https://review.openstack.org/462716
Reason: This patchset already fulfilled its destiny during the Neutron on-boarding session in the Boston Summit 2017

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/469669
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=802cf1194218dc125b7211d9c48df8cfebd4f932
Submitter: Jenkins
Branch: master

commit 802cf1194218dc125b7211d9c48df8cfebd4f932
Author: Miguel Lavalle <email address hidden>
Date: Wed May 31 16:38:57 2017 -0500

    Fix race between create subnet and port requests

    When creating an IPv6 auto-address subnet for a network, ports can be
    created or updated concurrently in the same network, before the subnet
    creation concludes. In such a situation, an IpAddressAlreadyAllocated
    exception may be raised, because the subnet create request attempts to
    allocate auto addresses to the concurrently created or updated ports,
    which have been already allocated by the port requests.

    This patchset adds code to the IPAM to skip the attempt to assign the
    auto address to ports if they already have received them.

    Change-Id: If1eb4046865f43b15ba97c52e2d0b9343dc72c19
    Closes-Bug: #1655567

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.0.0b3

This issue was fixed in the openstack/neutron 11.0.0.0b3 development milestone.

tags: added: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/509126

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/509126
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=68f91293d305706806ea7efe961ff36dda044d70
Submitter: Zuul
Branch: stable/ocata

commit 68f91293d305706806ea7efe961ff36dda044d70
Author: Miguel Lavalle <email address hidden>
Date: Wed May 31 16:38:57 2017 -0500

    Fix race between create subnet and port requests

    When creating an IPv6 auto-address subnet for a network, ports can be
    created or updated concurrently in the same network, before the subnet
    creation concludes. In such a situation, an IpAddressAlreadyAllocated
    exception may be raised, because the subnet create request attempts to
    allocate auto addresses to the concurrently created or updated ports,
    which have been already allocated by the port requests.

    This patchset adds code to the IPAM to skip the attempt to assign the
    auto address to ports if they already have received them.

    Conflicts:
     neutron/db/ipam_pluggable_backend.py

    Change-Id: If1eb4046865f43b15ba97c52e2d0b9343dc72c19
    Closes-Bug: #1655567
    (cherry picked from commit 802cf1194218dc125b7211d9c48df8cfebd4f932)

tags: added: in-stable-ocata
tags: removed: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 10.0.5

This issue was fixed in the openstack/neutron 10.0.5 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.