Neutron allows creating two subnets with the same CIDR in a network through Heat

Bug #1852777 reported by aditya_reddy.nagaram@nuagenetworks.net
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Rodolfo Alonso

Bug Description

If I use Heat to create a network with two subnets whose CIDRs overlap, Neutron does not return an error about the overlap.

An example Heat template is attached. In my environment, Neutron reported the overlap error in only two out of ten attempts; in all other cases the stack was created successfully.

stack@ubuntu:~$ openstack stack list
+--------------------------------------+----------------------+----------------------------------+-----------------+----------------------+--------------+
| ID | Stack Name | Project | Stack Status | Creation Time | Updated Time |
+--------------------------------------+----------------------+----------------------------------+-----------------+----------------------+--------------+
| 26f32175-c5e8-49e2-abde-75bd2e1d3b3a | overlapping-subnets9 | c48d7b879e40472e8e1a070918abf8c5 | CREATE_FAILED | 2019-11-15T17:16:30Z | None |
| 158c6c2f-ac9b-4131-ac9d-54cabfccf64c | overlapping-subnets8 | c48d7b879e40472e8e1a070918abf8c5 | CREATE_COMPLETE | 2019-11-15T17:16:26Z | None |
| cab371f6-6aeb-43af-ab2a-4c1c1452d253 | overlapping-subnets7 | c48d7b879e40472e8e1a070918abf8c5 | CREATE_COMPLETE | 2019-11-15T17:16:22Z | None |
| 480cd3db-395d-4de9-a8e4-27c8d08e6174 | overlapping-subnets6 | c48d7b879e40472e8e1a070918abf8c5 | CREATE_COMPLETE | 2019-11-15T17:16:19Z | None |
| e4409fc6-e3b4-4664-93a0-648b31ae80ee | overlapping-subnets5 | c48d7b879e40472e8e1a070918abf8c5 | CREATE_COMPLETE | 2019-11-15T17:16:16Z | None |
| 45552045-ec57-4fc4-b5b6-f8886da19521 | overlapping-subnets4 | c48d7b879e40472e8e1a070918abf8c5 | CREATE_COMPLETE | 2019-11-15T17:16:11Z | None |
| ec3f2c27-7306-47ee-a501-97d246fc7fa9 | overlapping-subnets3 | c48d7b879e40472e8e1a070918abf8c5 | CREATE_COMPLETE | 2019-11-15T17:16:08Z | None |
| 15050524-4711-490d-b344-d1a5be376ca8 | overlapping-subnets2 | c48d7b879e40472e8e1a070918abf8c5 | CREATE_COMPLETE | 2019-11-15T17:16:04Z | None |
| da6b235a-83c2-44d4-8e73-3243be310bc1 | overlapping-subnets1 | c48d7b879e40472e8e1a070918abf8c5 | CREATE_FAILED | 2019-11-15T17:16:01Z | None |
| c596b822-d57f-4160-b03b-6f02711fc003 | overlapping-subnets | c48d7b879e40472e8e1a070918abf8c5 | CREATE_COMPLETE | 2019-11-15T17:15:58Z | None |
+--------------------------------------+----------------------+----------------------------------+-----------------+----------------------+--------------+

Output from neutron net-list, which confirms this:

stack@ubuntu:~$ neutron net-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+--------------------+----------------------------------+----------------------------------------------------------+
| id | name | tenant_id | subnets |
+--------------------------------------+--------------------+----------------------------------+----------------------------------------------------------+
| 0396cfc9-3f7c-4562-82cf-1273178acafd | overlappingsubnets | c48d7b879e40472e8e1a070918abf8c5 | 10db8489-a18a-438d-b905-fc92e900575f 1.1.1.0/24 |
| | | | 8b032907-cc55-4a65-b4a5-58ca539a7f8b 1.1.1.0/24 |
| 130af1a9-79bd-493f-ae85-a72ebb1aad9d | overlappingsubnets | c48d7b879e40472e8e1a070918abf8c5 | 0f178ef1-52d0-46ac-a42d-82123ffbf9fa 1.1.1.0/24 |
| | | | 233674e8-fef3-4294-9368-7d8f1333630b 1.1.1.0/24 |
| 2938ccd5-aeff-41d0-b675-fccd08acbf77 | overlappingsubnets | c48d7b879e40472e8e1a070918abf8c5 | 2442ddca-b07e-408a-99a2-d22f60527a87 1.1.1.0/24 |
| | | | 79e6d335-5c25-48bd-8847-9ae065b8c92d 1.1.1.0/24 |
| 3af1150b-cf47-4805-b560-72f9770487f5 | overlappingsubnets | c48d7b879e40472e8e1a070918abf8c5 | 727bc7cf-767a-43e3-89fd-6aa5a0025a25 1.1.1.0/24 |
| 46f8d80e-59ef-4d47-a0e9-ad7bb2ced57f | overlappingsubnets | c48d7b879e40472e8e1a070918abf8c5 | 8a41dca0-36ba-4d4c-a159-450635231d85 1.1.1.0/24 |
| | | | b5630214-d115-4bd2-9ab5-eb66e479fc71 1.1.1.0/24 |
| 480a1606-6cd9-4736-a045-8e799df941bd | public | c48d7b879e40472e8e1a070918abf8c5 | 83bb3149-61cd-4eca-814a-6c20f1cda09b 2001:db8::/64 |
| | | | 6697c90f-1b40-47ff-bbea-f8472017727a 172.24.4.0/24 |
| 4e7cdf4d-a974-4cf7-a6c6-9e20a5089517 | overlappingsubnets | c48d7b879e40472e8e1a070918abf8c5 | 65173484-abbf-4830-9b5a-c796bdc842ec 1.1.1.0/24 |
| 53c05f05-0025-422f-941c-fe0ca8325424 | private | 34976b5dfd674612bb21b9816c37d303 | 26140d62-0252-4607-85cb-a622b1fa0a2a fda7:b200:16ae::/64 |
| | | | 235d7d7b-0db0-49cf-92b8-c13ee5f017e8 10.0.0.0/26 |
| 679b54e0-0c04-49a6-9ebd-93f9fe4f0ec9 | overlappingsubnets | c48d7b879e40472e8e1a070918abf8c5 | 0e9cb173-8d07-4c6e-aea1-3f41e25e8438 1.1.1.0/24 |
| | | | 89a37360-1f9e-47cf-b363-ebe0bddd796d 1.1.1.0/24 |
| 7a4ced41-0ff0-4a9f-a2a3-2c2e225fc2c8 | shared | c48d7b879e40472e8e1a070918abf8c5 | 08f80ff4-4778-4a0b-9318-4576e09ee497 192.168.233.0/24 |
| 9812fb31-a4f9-4391-9d6d-c9af981ea62d | overlappingsubnets | c48d7b879e40472e8e1a070918abf8c5 | 8e87054d-2abb-4ee3-87f3-e36c7d8a2a01 1.1.1.0/24 |
| | | | 556ec8c2-607a-49fd-b22d-0853bec210fb 1.1.1.0/24 |
| e7ab5e15-065c-436b-866a-9eb9160a6d7f | overlappingsubnets | c48d7b879e40472e8e1a070918abf8c5 | 3551ceb9-773d-4d19-bedd-bdd52fef4ddf 1.1.1.0/24 |
| | | | 7bf86a4e-70fb-42c3-a641-a56591b3fb40 1.1.1.0/24 |
| e8af1dad-f43e-429d-9c72-aee9a8539abc | overlappingsubnets | c48d7b879e40472e8e1a070918abf8c5 | d1a0d2ea-45d5-44da-b3b7-ee2ef6d7d92c 1.1.1.0/24 |
| | | | 730dde9f-b087-4f5c-a92c-7fde0770be7b 1.1.1.0/24 |
+--------------------------------------+--------------------+----------------------------------+----------------------------------------------------------+

It can be reproduced on Neutron from Stein through master on a standard devstack setup with Heat. In neutron.conf I have api_workers = 2.

Most likely some locking on the network is missing, which is causing this issue.

If any further information is needed on this bug, please let me know. I hope I have included the relevant info.

aditya_reddy.nagaram@nuagenetworks.net (adityarn) wrote :
information type: Public → Public Security
information type: Public Security → Private Security
information type: Private Security → Public
description: updated
description: updated
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Does anyone have an environment set up with Heat to triage and confirm this bug? It seems to be easily reproducible.

Bence Romsics (bence-romsics) wrote :

I was able to reproduce the duplicate subnet:

$ openstack stack create s0 --wait --template bug-1852777.yaml
2019-11-18 12:03:54Z [s0]: CREATE_IN_PROGRESS Stack CREATE started
2019-11-18 12:03:54Z [s0.overlapping_net]: CREATE_IN_PROGRESS state changed
2019-11-18 12:03:55Z [s0.overlapping_net]: CREATE_COMPLETE state changed
2019-11-18 12:03:55Z [s0.subnet4]: CREATE_IN_PROGRESS state changed
2019-11-18 12:03:55Z [s0.subnet4-2]: CREATE_IN_PROGRESS state changed
2019-11-18 12:03:55Z [s0.subnet4]: CREATE_COMPLETE state changed
2019-11-18 12:03:55Z [s0.subnet4-2]: CREATE_COMPLETE state changed
2019-11-18 12:03:55Z [s0]: CREATE_COMPLETE Stack CREATE completed successfully
+---------------------+--------------------------------------------------+
| Field | Value |
+---------------------+--------------------------------------------------+
| id | 2ac750f7-03f5-4090-97e8-d7f24d51db76 |
| stack_name | s0 |
| description | Template to create network with overlapping CIDR |
| creation_time | 2019-11-18T12:03:54Z |
| updated_time | None |
| stack_status | CREATE_COMPLETE |
| stack_status_reason | Stack CREATE completed successfully |
+---------------------+--------------------------------------------------+
rubasov@devstack0:~/src/os/openstack/devstack$ openstack stack resource list s0
+-----------------+--------------------------------------+---------------------+-----------------+----------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time |
+-----------------+--------------------------------------+---------------------+-----------------+----------------------+
| overlapping_net | ee0d190b-b153-4382-91ad-c8a039474a7c | OS::Neutron::Net | CREATE_COMPLETE | 2019-11-18T12:03:54Z |
| subnet4 | f1da4d08-0c01-4053-adde-08cfa9fa60b5 | OS::Neutron::Subnet | CREATE_COMPLETE | 2019-11-18T12:03:54Z |
| subnet4-2 | fe75590a-f44e-42ae-ba12-402545ae424b | OS::Neutron::Subnet | CREATE_COMPLETE | 2019-11-18T12:03:54Z |
+-----------------+--------------------------------------+---------------------+-----------------+----------------------+
rubasov@devstack0:~/src/os/openstack/devstack$ openstack subnet list
+--------------------------------------+---------------------------+--------------------------------------+---------------------+
| ID | Name | Network | Subnet |
+--------------------------------------+---------------------------+--------------------------------------+---------------------+
| 15013415-40e6-4846-9b76-045d1cb43491 | ipv6-private-subnet | 08e64892-fc22-496d-8767-894f80338cf5 | fd66:9bee:6a5d::/64 |
| 19d837a8-ba67-4315-a328-5bd2086182bf | ipv6-public-subnet | 1e913587-0dd5-429a-a050-c548a732cbb5 | 2001:db8::/64 |
| 377c453a-bf38-4ff9-aff0-45acfc30c...

Changed in neutron:
status: New → Confirmed
Bence Romsics (bence-romsics) wrote :

Above I used neutron master @ 361b216a8712.

Changed in neutron:
importance: Undecided → High
tags: added: api
tags: added: l3-ipam-dhcp
LIU Yulong (dragon889) wrote :

I would say that multiple API workers all passed the CIDR check at the same time, so a UNIQUE constraint should be added on ('network_id', 'cidr') in the 'subnets' table.

Slawek Kaplonski (slaweq) wrote :

@LIU: a unique constraint will not solve the problem, as you may still have overlapping CIDRs that are unique in the DB, e.g. 10.0.0.0/16 and 10.0.1.0/24.
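
To illustrate the point above, here is a minimal Python sketch using only the standard `ipaddress` module. The two helper functions are hypothetical and are not Neutron code; they only contrast what a uniqueness test on the CIDR string would accept with what an overlap test accepts.

```python
import ipaddress

existing = [ipaddress.ip_network("10.0.0.0/16")]
candidate = ipaddress.ip_network("10.0.1.0/24")


def passes_unique_constraint(existing_nets, new_net):
    """Hypothetical stand-in for UNIQUE(network_id, cidr): compares CIDR strings."""
    return str(new_net) not in {str(net) for net in existing_nets}


def passes_overlap_check(existing_nets, new_net):
    """True only if the new CIDR shares no address with any existing subnet."""
    return not any(new_net.overlaps(net) for net in existing_nets)


# The strings differ, so a uniqueness test lets the candidate through...
unique_ok = passes_unique_constraint(existing, candidate)
# ...but 10.0.1.0/24 lies inside 10.0.0.0/16, so the overlap test refuses it.
overlap_ok = passes_overlap_check(existing, candidate)
print(unique_ok, overlap_ok)
```

A constraint on the exact CIDR value would therefore only catch the identical-CIDR case from this report, not overlapping CIDRs in general.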

aditya_reddy.nagaram@nuagenetworks.net (adityarn) wrote :

Some more information that might help in solving the issue: this specific scenario works fine on Queens, but from Stein onwards I observe this behavior. Looking into what changed in DB locking between Queens and Stein should give a better understanding of the issue.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

As Slawek commented, a DB constraint won't help with this problem.

It makes sense that two API workers can, at the same time, open a writer context to create a subnet and both pass the same check. The code looks pretty much the same since Queens, apart from the facade changes. The writer context starts here [1][2]; then "_save_subnet" is called [3].

The only change I see here is [4], which can reduce the execution speed when retrieving AddressScope. But I've tested removing it and the result is the same. Furthermore, we must ensure this won't happen regardless of the execution speed.

I'll try locking either the subnet check execution or the subnet creation itself.

Regards.

[1] https://github.com/openstack/neutron/blob/stable/queens/neutron/db/db_base_plugin_v2.py#L819
[2] https://github.com/openstack/neutron/blob/stable/queens/neutron/db/ipam_pluggable_backend.py#L509
[3] https://github.com/openstack/neutron/blob/stable/queens/neutron/db/ipam_backend_mixin.py#L502
[4] https://review.opendev.org/#/c/667511/
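
The check-then-insert race described in this comment can be sketched in plain Python. This is only a simulation under stated assumptions: threads stand in for API workers (which are separate processes in Neutron), a dict stands in for the DB, and `create_subnet`, `run` and `OverlapError` are hypothetical names, not Neutron code. A barrier forces both workers to finish the check before either one writes.

```python
import ipaddress
import threading


class OverlapError(Exception):
    """Raised when a new CIDR overlaps an existing subnet of the network."""


def create_subnet(store, network_id, cidr, pause, lock=None):
    """Check for overlap, then insert; optionally serialized by a lock."""
    new = ipaddress.ip_network(cidr)

    def check_and_insert():
        existing = store[network_id]
        if any(new.overlaps(net) for net in existing):
            raise OverlapError(cidr)
        pause()  # window between the check and the write
        existing.append(new)

    if lock is None:
        check_and_insert()
    else:
        with lock:
            check_and_insert()


def run(workers, use_lock):
    """Return (subnets created, requests rejected) for concurrent requests."""
    store = {"net-1": []}
    barrier = threading.Barrier(workers)
    lock = threading.Lock() if use_lock else None
    # Without a lock, every worker waits here until all have passed the check.
    pause = (lambda: None) if use_lock else barrier.wait
    rejected = []

    def worker():
        try:
            create_subnet(store, "net-1", "1.1.1.0/24", pause, lock)
        except OverlapError as exc:
            rejected.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(store["net-1"]), len(rejected)


print(run(2, use_lock=False))  # (2, 0): both workers pass the check
print(run(2, use_lock=True))   # (1, 1): the second worker sees the first subnet
```

The in-process lock is used here only to show why serializing the check and the write closes the window; it would not protect API workers running in different processes or on different hosts.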

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/695060

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
status: Confirmed → In Progress
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

I've tested with https://review.opendev.org/#/c/695060/ and now the stack fails: http://paste.openstack.org/show/786357/

Of course, in case of multiple subnet creation in the same network, this will slow down the process, but I don't see another solution.

aditya_reddy.nagaram@nuagenetworks.net (adityarn) wrote :

@rodolfo-alonso-hernandez I tried the patchset, but it did not work for me: the stack was still created successfully.

LIU Yulong (dragon889) wrote :

All right, maybe introduce tooz. A local file lock or an in-memory lock does not work across multiple physical hosts.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/695060
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=397eb2a2febd234ba7246f40f950c3ed4202a3d5
Submitter: Zuul
Branch: master

commit 397eb2a2febd234ba7246f40f950c3ed4202a3d5
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Nov 21 09:55:45 2019 +0000

    Serialize subnet creating depending on the network ID

    Add a new DB table "network_subnet_lock". The primary key will be the
    network_id. When a subnet is created, inside the write context during
    the "subnet" object creation, a register in the mentioned table is
    created or updated. This will enforce the serialization of the "subnet"
    registers belonging to the same network, due to the write lock in the
    DB.

    This will solve the problem of attending several "subnet" creation
    requests, described in the related bug. If several subnets with the
    same CIDR are processed in parallel, the implemented logic won't reject
    them because any of them will not contain the information of each other.

    This DB lock will also work in case of distributed servers because the
    lock is not enforced in the server logic but in the DB backend.

    Change-Id: Iecbb096e0b7e080a3e0299ea340f8b03e87ddfd2
    Closes-Bug: #1852777
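
The idea in this commit message can be sketched with Python's standard `sqlite3` module. This is an illustration under assumptions, not the merged Neutron code: the table columns beyond the `network_id` primary key, and the helpers `connect`, `lock_network`, `add_subnet_and_commit` and `demo`, are hypothetical. SQLite also serializes all writers on the database file, whereas a server backend would be expected to block only writers touching the same lock row.

```python
import ipaddress
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE subnets (id INTEGER PRIMARY KEY, network_id TEXT, cidr TEXT);
CREATE TABLE network_subnet_lock (network_id TEXT PRIMARY KEY);
"""


def connect(path):
    # isolation_level=None: explicit BEGIN/COMMIT; timeout=0: fail fast if locked.
    return sqlite3.connect(path, isolation_level=None, timeout=0)


def lock_network(conn, network_id):
    """Open a write transaction and write the per-network lock row first."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "INSERT OR REPLACE INTO network_subnet_lock (network_id) VALUES (?)",
            (network_id,))
    except sqlite3.OperationalError:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def add_subnet_and_commit(conn, network_id, cidr):
    """Run the overlap check and the insert inside the already locked transaction."""
    new = ipaddress.ip_network(cidr)
    rows = conn.execute(
        "SELECT cidr FROM subnets WHERE network_id = ?", (network_id,)).fetchall()
    if any(new.overlaps(ipaddress.ip_network(c)) for (c,) in rows):
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise ValueError(f"{cidr} overlaps an existing subnet")
    conn.execute(
        "INSERT INTO subnets (network_id, cidr) VALUES (?, ?)", (network_id, cidr))
    conn.execute("COMMIT")


def demo():
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        a, b = connect(path), connect(path)  # two "API workers"
        a.executescript(SCHEMA)
        lock_network(a, "net-1")             # worker A takes the lock
        try:
            lock_network(b, "net-1")         # worker B cannot write yet
        except sqlite3.OperationalError:
            events.append("B blocked")
        add_subnet_and_commit(a, "net-1", "1.1.1.0/24")
        events.append("A created")
        lock_network(b, "net-1")             # B retries after A committed
        try:
            add_subnet_and_commit(b, "net-1", "1.1.1.0/24")
        except ValueError:
            events.append("B rejected")
        count = a.execute("SELECT COUNT(*) FROM subnets").fetchone()[0]
        a.close()
        b.close()
    return events, count


print(demo())
```

Because the lock row is written before the overlap check, the second writer cannot run its check until the first transaction has committed, so it sees the first subnet and is rejected.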

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 16.0.0.0b1

This issue was fixed in the openstack/neutron 16.0.0.0b1 development milestone.

tags: added: neutron-proactive-backport-potential
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

https://review.opendev.org/#/c/695060 includes a DB change. This solution can't be backported to stable branches.

If needed, another approach should be implemented.

Regards.

tags: removed: neutron-proactive-backport-potential