Restrict netmask of CIDR to avoid DHCP resync

Bug #1443798 reported by watanabe.isao
264
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Security Advisory
Won't Fix
High
Unassigned
neutron
Fix Released
High
watanabe.isao
Icehouse
Incomplete
Undecided
Unassigned
Juno
Incomplete
Undecided
Unassigned
Kilo
Fix Released
High
Akihiro Motoki

Bug Description

If any tenant creates a subnet with a netmask of 31 or 32 in IPv4,
IP addresses of network will fail to be generated, and that
will cause constant resyncs and neutron-dhcp-agent malfunction.

[Example operation 1]
 - Create subnet from CLI, with CIDR /31 (CIDR /32 has the same result).

$ neutron subnet-create net 192.168.0.0/31 --name sub
Created a new subnet:
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| allocation_pools | |
| cidr | 192.168.0.0/31 |
| dns_nameservers | |
| enable_dhcp | True |
| gateway_ip | 192.168.0.1 |
| host_routes | |
| id | 42a91f59-1c2d-4e33-9033-4691069c5e4b |
| ip_version | 4 |
| ipv6_address_mode | |
| ipv6_ra_mode | |
| name | sub |
| network_id | 65cc6b46-17ec-41a8-9fe4-5bf93fc25d1e |
| subnetpool_id | |
| tenant_id | 4ffb89e718d346b48fdce2ac61537bce |
+-------------------+--------------------------------------+

[Example operation 2]
 - Create subnet from API, with cidr /32 (CIDR /31 has the same result).

$ curl -i -X POST -H "content-type:application/json" -d '{"subnet": { "name": "badsub", "cidr" : "192.168.0.0/32", "ip_version": 4, "network_id": "8
8143cda-5fe7-45b6-9245-b1e8b75d28d8"}}' -H "x-auth-token:$TOKEN" http://192.168.122.130:9696/v2.0/subnets
HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 410
X-Openstack-Request-Id: req-4e7e74c0-0190-4a69-a9eb-93d545e8aeef
Date: Thu, 16 Apr 2015 19:21:20 GMT

{"subnet": {"name": "badsub", "enable_dhcp": true, "network_id": "88143cda-5fe7-45b6-9245-b1e8b75d28d8", "tenant_id": "4ffb89e718d346b48fdce2ac61537bce", "dns_nameservers": [], "gateway_ip": "192.168.0.1", "ipv6_ra_mode": null, "allocation_pools": [], "host_routes": [], "ip_version": 4, "ipv6_address_mode": null, "cidr": "192.168.0.0/32", "id": "d210d5fd-8b3b-4c0e-b5ad-41798bd47d97", "subnetpool_id": null}}

[Example operation 3]
 - Create subnet from API, with empty allocation_pools.

$ curl -i -X POST -H "content-type:application/json" -d '{"subnet": { "name": "badsub", "cidr" : "192.168.0.0/24", "allocation_pools": [], "ip_version": 4, "network_id": "88143cda-5fe7-45b6-9245-b1e8b75d28d8"}}' -H "x-auth-token:$TOKEN" http://192.168.122.130:9696/v2.0/subnets
HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 410
X-Openstack-Request-Id: req-54ce81db-b586-4887-b60b-8776a2ebdb4e
Date: Thu, 16 Apr 2015 19:18:21 GMT

{"subnet": {"name": "badsub", "enable_dhcp": true, "network_id": "88143cda-5fe7-45b6-9245-b1e8b75d28d8", "tenant_id": "4ffb89e718d346b48fdce2ac61537bce", "dns_nameservers": [], "gateway_ip": "192.168.0.1", "ipv6_ra_mode": null, "allocation_pools": [], "host_routes": [], "ip_version": 4, "ipv6_address_mode": null, "cidr": "192.168.0.0/24", "id": "abc2dca4-bf8b-46f5-af1a-0a1049309854", "subnetpool_id": null}}

[Trace log]
2015-04-17 04:23:27.907 16641 DEBUG oslo_messaging._drivers.amqp [-] UNIQUE_ID is e0a6a81a005d4aa0b40130506afa0267. _add_unique_id /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqp.py:258
2015-04-17 04:23:27.979 16641 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 88143cda-5fe7-45b6-9245-b1e8b75d28d8.
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent Traceback (most recent call last):
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/dhcp/agent.py", line 112, in call_driver
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs)
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 201, in enable
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent interface_name = self.device_manager.setup(self.network)
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 928, in setup
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent port = self.setup_dhcp_port(network)
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 909, in setup_dhcp_port
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent dhcp_port = self.plugin.create_dhcp_port({'port': port_dict})
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/dhcp/agent.py", line 433, in create_dhcp_port
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent port=port, host=self.host)
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 156, in call
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent retry=self.retry)
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent timeout=timeout, retry=retry)
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent retry=retry)
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 341, in _send
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent raise result
2015-04-17 04:23:27.979 16641 TRACE neutron.agent.dhcp.agent RemoteError: Remote error: IpAddressGenerationFailure No more IP addresses available on network 88143cda-5fe7-45b6-9245-b1e8b75d28d8.

Changed in neutron:
assignee: nobody → watanabe.isao (watanabe.isao)
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Changed in ossa:
status: New → Incomplete
description: updated
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

I think this is a valid DOS vector, and there no mitigation that deployers can put into place.
The problem has always been latent but I reckon it's been made evident by a recent change in the neutron source code tree that causes the DCHP port to be created as soon as the subnet is created.

I would however kindly seek validation from another coresec before confirming the security threat.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Is the impact of this "neutron-dhcp-agent malfunction" limited to its functionality for the tenant who added this subnet, or does it cause problems for other tenants as well?

Revision history for this message
watanabe.isao (watanabe.isao) wrote :

Hello, @Jeremy Stanley (fungi)

To other tenants as well.
When re-syncing, something like port create and nova boot will fail.
In a large environment, the re-syncing will last up to 3 or 4 minutes. This means if a re-sync occurred, we will almost not be able to boot any more instances for all tenants.

Hello, @Salvatore Orlando (salvatore-orlando)

Because this is a ordinarily user enabled operation(CLI and API as well), which affects whole system, so I put it as a security problem. If I'm not marking the problem in the right way, please tell me.

Hello, ALL

I will add a patch to this bug report ASAP, which will be well tested in master branch.
The fix is mentioned in the title, which is to add a restrict on netmask of 31 add 32 when IPv4, and 63, 64 when IPv6.
I really need this bug fixed in kilo or stable/kilo.1. If there is any problem with this, please tell me.
Thank you very much.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Is this also affecting Juno and Icehouse ?

@watanabe: the proper process is to fix affected stable releases as well and ideally give some time to stakeholders (3 to 5 working days) before making this public.

I guess if the fix is to add a couple of checks to validate dhcp netmask it can land in kilo, depending on how fast we can get a working fix... What do you think Salvatore ?

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

@Jeremy: the effect to other tenants may be in the form of increased delay in server response time; I don't think this can really be classified as Denial Of Service Attack: sure, lots of invalid subnets can be created, but there're a quota involved per tenant.

Having said that, this belongs to the class of validation bugs as raised here:

https://bugs.launchpad.net/neutron/+bug/1271311
https://bugs.launchpad.net/neutron/+bug/1333134
https://bugs.launchpad.net/neutron/+bug/1362651

To the best of my knowledge, none of them got were treated as security vulnerabilities, so I'd recommend to open this up so that we can fix it in time for the Kilo release. The fix should also be fairly trivial and easier to backport.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

@Salvatore: I can confirm this issue occurs for both /31 and /32 subnets. Having said that this is what is shown from the CLI, when using a /32 subnet:

https://github.com/openstack/python-neutronclient/blob/master/neutronclient/neutron/v2_0/subnet.py#L206

This is silly if you ask me, but apparently the user (when using the CLI) is warned that they are about to shoot themselves in the foot.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

WRT to comment #4, I don't believe that the situation is as bad as depicted:

a) DHCP provisioning is not in the control path for booting a VM; so a VM would boot fine, but the effect might be lack of dynamic IP allocation

b) VM's with static IP configuration (e.g. via config-drive) would be unaffected.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

To me these subnets /32, /31, /64 are reasonable subnets and they can work without DHCP, therefore I would oppose a blanket limitation without taking this into account. For what it's worth, it might be worthwhile figuring out how we could avoid the resync loop from the agent side.

Revision history for this message
watanabe.isao (watanabe.isao) wrote :

Hello, @Armando Migliaccio (armando-migliaccio)

Thank you for your help.
Please let me correct my words in #4.
YES, VM would be boot fine, and YES, the effect would be lack of dynamic IP allocation.
The problem only affect new created VMs with dynamic IP allocation, that no IPs will be allocated due to the time out of the allocation.

About your comment in #10, that we should find a way to avoid the resync loop from the agent side. I couldn't agree with it any more. Which it is my first thought, too.
But due to the time limit, I would like to give a first add first.
/64 is already been restricted, so I would like to leave it as it is, if there is not any other request.
/32, /31 is reasonable when enable_dhcp is False, but after the creation, if someone updated the subnet enable_dhcp to True, the resync will happen, too.

So the only way I got is to restrict /32, /31 subnet creation, until we found a way to avoid the resync loop from the agent side.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

It should be easy to restrict only for subnets with dhcp enabled. The check should just occur on update_subnet as well, raising if enable_dhcp is true and the cidr is /32 or /31 and ip_version=4.

The resulting patch will still be very small.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Thanks Armando for digging that up. My only concern is that this report mention a re-sync loop, which would be different from the one-time outage observed in bug/1333134 for example.

If a user repeatedly creates (and deletes to avoid quota limitation) networks, would that make the dhcp server un-available to other tenants for an extended period of time ?

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Hi Tristan,

I guess that the most resembling bug is this one:

https://bugs.launchpad.net/neutron/+bug/1362651

The failure mode of bug 1362651 and this one are the same, as I understand the issue. There's a danger of causing a dhcp server outage; bear in mind that anyone with a pinch of salt would run more than one agent, therefore the failure domain is isolated to the agent that's picked up the poisonous apple.

Revision history for this message
watanabe.isao (watanabe.isao) wrote :

To #14
Hello, @alvatore Orlando (salvatore-orlando)
Thank you for your time and review.
I will change the fix to:
1) Restrict subnet create, if enable_dhcp is true and cidr is /32 or/31.
2) Restrict subnet update, if enable_dhcp is changing to true and there is not enough IPs in allocation_pools.

About 2), I found there is another way to cause the DHCP resync from API. I was thinking to create another security bug. But seems like I can close them all in this report.

description: updated
Revision history for this message
watanabe.isao (watanabe.isao) wrote :

Update operation Example.

Revision history for this message
watanabe.isao (watanabe.isao) wrote :

Confirmed same issue will NOT happen in stable/juno and stable/icehouse

[In stable/icehouse]
Port will not be automatically allocated when subnet create and update.

[In stable/juno]
Operation on API and CLI is tested, that
 - If CIDR is /31 the following exception will return:
   "Invalid input for operation: Gateway is not valid on subnet."
 - If CIDR is /32 API returns:
   "Invalid input for operation: Gateway is not valid on subnet."
   and CLI returns:
   "An IPv4 subnet with a /32 CIDR will have only one usable IP address so the device attached to it will not have any IP connectivity.
Invalid input for operation: Gateway is not valid on subnet."

Revision history for this message
Jeremy Stanley (fungi) wrote :

If this really is only impacting master (not yet released kilo) as a vulnerability, we should switch the bug to public immediately if it can be safely fixed before release.

Revision history for this message
watanabe.isao (watanabe.isao) wrote :

Hello, @Salvatore Orlando (salvatore-orlando), @Armando Migliaccio (armando-migliaccio)
It is my first time to handle security bug. I'm not sure if I can click the [change Information type] button.
However, I think before changing the state, your approval is necessary for me, due to #20.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

The issue may not arise in stable/juno or stable/icehouse during subnet creation, but they should occur on the creation of the first user port on the network. I am a bit puzzled as to why, using the stable releases, the creation of subnets of such types is indeed prevented.

That said, change [1] merged during Kilo, which altered the time when the dhcp port is provisioned. My recommendation is to make the bug public, as we did for the others and fix it with the normal procedure.

[1] https://review.openstack.org/#/c/163672/

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Ok, so I could verify that the issue is non-existent on the stable branches, so please let's open this one up.

Having said that, this check:

https://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_v2.py#L1060

Is still in place for master, so we'd need to figure out why this is not being invoked, and if this is a regression of some sort.

Revision history for this message
watanabe.isao (watanabe.isao) wrote :

To #17
To ALL

Update patch.
Add CLI&API result as well.
Also, the patch pass the UT well, but I would like to skip pasting the result here, because it is too long.

Will hide #12 and #13, which hold old patch.

Revision history for this message
watanabe.isao (watanabe.isao) wrote :
Revision history for this message
watanabe.isao (watanabe.isao) wrote :

Sorry. I didn't know I only can attach one file at one time...

Revision history for this message
watanabe.isao (watanabe.isao) wrote :

Hello, @Armando Migliaccio (armando-migliaccio), @Jeremy Stanley (fungi)

Thank you for the confirmation and approval.
Open this bug up.
And will throw the fix to Gerrit ASAP.

information type: Private Security → Public Security
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/174228

Changed in neutron:
status: New → In Progress
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

I've added the kilo-rc-potential tag, should this be proposed to stable/kilo in order to get that change in the release ?

tags: added: kilo-backport-potential kilo-rc-potential
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Watanabe, I'd like to understand how the regression occurred in Kilo before we can devise the best fix forward: in Juno we are unable to create a subnet for a wrong Gateway IP. Did you determine how the logic changed?

Revision history for this message
Kyle Mestery (mestery) wrote :

This bug has no priority and we're cutting the RC tomorrow. How critical is this to hold the RC for?

Revision history for this message
Jeremy Stanley (fungi) wrote :

The alternative is that, assuming it's confirmed to be an exploitable condition in the kilo release (it sounds like it is?), it gets fixed in master and backported after release... at which point we'll need to issue an advisory because it was a vulnerability in a supported release at that point. The VMT can accommodate this if it's what's needed.

Revision history for this message
watanabe.isao (watanabe.isao) wrote :

Hello, @Armando Migliaccio (armando-migliaccio)
Sir, so far, I'm still looking for the root for this level-down problem.
I agree with @Salvatore Orlando (salvatore-orlando)'s thought in #2, that someone change in the neutron source code tree that causes the DCHP port to be created as soon as the subnet is created.
So I think this problem will not bother Gateway IP so much. The restrict for gateway IP works fine in Juno and kilo.
However, we had many enhancements and changes in kilo. It looks like I 'm looking for a needle in ocean. :-)

Hello, @Kyle Mestery (mestery)
Sir, I just post this bug the day before yesterday. I'm not sure if I lost any step to get a neutron priority.
As the test result I post in #19, I treat this problem as a Neutron Kilo level down issue.
I'm trying my best to update the fix on gerrit.
It would be my great honer if you could have a look at the fix.

Revision history for this message
Yongli He (yongli-he) wrote : Re: [Bug 1443798] Re: Restrict netmask of CIDR to avoid DHCP resync

在 2015年04月16日 14:56, watanabe.isao 写道:
> Hello, @Armando Migliaccio (armando-migliaccio), @Jeremy Stanley (fungi)
>
> Thank you for the confirmation and approval.
> Open this bug up.
> And will throw the fix to Gerrit ASAP.
>
> ** Information type changed from Private Security to Public Security
>
http://item.taobao.com/item.htm?spm=a1z10.5-c.w4002-2703967740.40.Jg94sb&id=41379977247

Revision history for this message
watanabe.isao (watanabe.isao) wrote :

Hello, @Yongli He (yongli-he)

Sorry, I'm not sure your URL is correct. What does it stand for, please?

Kyle Mestery (mestery)
Changed in neutron:
importance: Undecided → High
Changed in neutron:
assignee: watanabe.isao (watanabe.isao) → Kyle Mestery (mestery)
Revision history for this message
Kyle Mestery (mestery) wrote :

Thanks for proposing the fix watanabe! I just rebased it, as it was failing to do a fix which went in yesterday. Hopefully it passes Jenkins and we can get Salvatore to take another peek at it. If we merge it today, we can get it into the rebase and avoid the advisory.

Changed in neutron:
milestone: none → liberty-1
Changed in neutron:
assignee: Kyle Mestery (mestery) → Andrew Boik (drewboik)
Revision history for this message
Andrew Boik (drewboik) wrote :

Hi Watanabe,

I updated your patch to address IPv6 subnets as it seems the problem exists with them as well. Please feel free to reassign the bug back to yourself.

Changed in neutron:
assignee: Andrew Boik (drewboik) → nobody
Changed in neutron:
assignee: nobody → watanabe.isao (watanabe.isao)
Jeremy Stanley (fungi)
description: updated
Thierry Carrez (ttx)
Changed in ossa:
importance: Undecided → High
status: Incomplete → Confirmed
Thierry Carrez (ttx)
no longer affects: neutron/kilo
Revision history for this message
Kyle Mestery (mestery) wrote :

We need this to be released for Kilo, so we need to hold RC2 until the fix merges in master and can be back ported to stable/kilo.

Changed in neutron:
milestone: liberty-1 → kilo-rc2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/176595

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/174228
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0c1f96ad5a6606c1205bd50ea944c3a383892cde
Submitter: Jenkins
Branch: master

commit 0c1f96ad5a6606c1205bd50ea944c3a383892cde
Author: watanabe.isao <email address hidden>
Date: Wed Apr 15 15:48:08 2015 +0900

    Restrict subnet create/update to avoid DHCP resync

    As we know, IPs in subnet CIDR are used for
    1) Broadcast port
    2) Gateway port
    3) DHCP port if enable_dhcp is True, or update to True
    4) Others go into allocation_pools
    Above 1) to 3) are created by default, which means if CIDR doesn't
    have that much of IPs, subnet create/update will cause a DHCP resync.

    This fix is to add some restricts to the issue:
    A) When subnet create, if enable_dhcp is True, /31 and /32
       cidrs are forbidden for IPv4 subnets while /127 and /128 cidrs are
       forbidden for IPv6 subnets.
    B) When subnet update, if enable_dhcp is changing to True and there are no
       more IPs in allocation_pools, the request should be denied.

    Change-Id: I2e4a4d5841b9ad908f02b7d0795cba07596c023d
    Co-authored-by: Andrew Boik <email address hidden>
    Closes-Bug: #1443798

Changed in neutron:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/176627

Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-rc2 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/176627
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0536ec113bc438265ba547bb8a8006aa96e646e3
Submitter: Jenkins
Branch: stable/kilo

commit 0536ec113bc438265ba547bb8a8006aa96e646e3
Author: watanabe.isao <email address hidden>
Date: Wed Apr 15 15:48:08 2015 +0900

    Restrict subnet create/update to avoid DHCP resync

    As we know, IPs in subnet CIDR are used for
    1) Broadcast port
    2) Gateway port
    3) DHCP port if enable_dhcp is True, or update to True
    4) Others go into allocation_pools
    Above 1) to 3) are created by default, which means if CIDR doesn't
    have that much of IPs, subnet create/update will cause a DHCP resync.

    This fix is to add some restricts to the issue:
    A) When subnet create, if enable_dhcp is True, /31 and /32
       cidrs are forbidden for IPv4 subnets while /127 and /128 cidrs are
       forbidden for IPv6 subnets.
    B) When subnet update, if enable_dhcp is changing to True and there are no
       more IPs in allocation_pools, the request should be denied.

    Change-Id: I2e4a4d5841b9ad908f02b7d0795cba07596c023d
    Co-authored-by: Andrew Boik <email address hidden>
    Closes-Bug: #1443798
    (cherry picked from commit 0c1f96ad5a6606c1205bd50ea944c3a383892cde)

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Great work folks! So Kilo won't be affected.

It still unclear from above comments if Juno or Icehouse are affected, especially in the case of first port creation.
Is the stable/juno proposed backport relevant ? ( https://review.openstack.org/176595 )

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/juno)

Change abandoned by Aaron Rosen (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/176595
Reason: already merged.

Thierry Carrez (ttx)
tags: removed: kilo-backport-potential kilo-rc-potential
Revision history for this message
Yushiro FURUKAWA (y-furukawa-2) wrote :

Hi Neutron-team-members.

I'm Yushiro Furukawa co-worker of Watanabe Isao.
I found that it still reproduce following operations.

So, as soon as possible, I will fix the patch, and post on gerrit.

  solution A: Add validation of allocation_pools when create/update subnet
  solution B: When delete-port, non-admin user can not delete DHCP-port
              (DHCP-port means port attribute "device_owner" is "network:dhcp")

[Operations]
A. Specify "[]"(empty list) at "allocation_pools" when create/update-subnet
---------------------------------------------------------------------------
$ $ curl -X POST -d '{"subnet": {"name": "test_subnet", "cidr": "192.168.200.0/24", "ip_version": 4, "network_id": "649c5531-338e-42b5-a2d1-4d49140deb02", "allocation_pools": []}}' -H "x-auth-token:$TOKEN" -H "content-type:application/json" http://127.0.0.1:9696/v2.0/subnets

Then, the dhcp-agent creates own DHCP-port, it is reproduced resync bug.

B. Create port and exhaust allocation_pools
---------------------------------------------------------------
1. Create subnet with 192.168.1.0/24. And, DHCP-port has alteady created.
   gateway_ip: 192.168.1.1
   DHCP-port: 192.168.1.2
   allocation_pools{"start": 192.168.1.2, "end": 192.168.1.254}
   the number of availability ip_addresses is 252.

2. Create non-dhcp port and exhaust ip_addresses in allocation_pools
   In this case, user creates a port 252 times.
   the number of availability ip_addresses is 0.

3. User deletes the DHCP-port(192.168.1.2)
   the number of availability ip_addresses is 1.

4. User creates a non-dhcp port.
   the number of availability ports are 0.
   Then, dhcp-agent tries to create DHCP-port. It is reproduced resync bug.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/177080

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Yushiro FURUKAWA (<email address hidden>) on branch: master
Review: https://review.openstack.org/177080
Reason: Please review https://review.openstack.org/#/c/177121/

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Doesn't scenario of enabling DHCP later for a subnet still apply to Juno and ealier? We could be too quick to make the bug public.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/179286

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Ihar, it is still unclear if this new scenario can cause significant service disruption and whenever it warrants an OSSA. See precedent in bug 1362651.

Until this is clarified, I put the OSSA task back to incomplete. Can Salvatore share his thought on this bug ?
Does the fix needs to be backported ?

Changed in ossa:
status: Confirmed → Incomplete
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (neutron-pecan)

Fix proposed to branch: neutron-pecan
Review: https://review.openstack.org/185072

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

An empty allocation pool [] is a valid value. However it's not compatible with DHCP.
I am not sure if this should return an error to the API caller, on result in a failure which is handled by the agent (thus avoiding the resync).
the fact that the DHCP needs an IP address from a pool is indeed an implementation feature of agent-based DHCP. I know neutron does not have any other kind of support for DHCP at the moment, but I'm not sure we should surface this implementation details at the API layer. (I'm not sure means that I'm thinking about it, not that I think it's wrong).

The fix should be backported, yes. I am ok with backporting it even if it changes API constraints, assuming the stable team is fine with that.

Also, I think, as Tristan said that the OSSA criteria for this bug should be those we established for its first occurrence, and for bug 1362651.

Finally, if the people looking after the DHCP agent would try and find a solution to avoid the perennial resync loop for good, that would be great.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Thanks Salvatore for the detail.

Changed in ossa:
status: Incomplete → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)
Download full text (11.7 KiB)

Reviewed: https://review.openstack.org/179286
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b8a09568b5bde8b71ad00468c27b757c98d9c0da
Submitter: Jenkins
Branch: master

commit 7260e0e3fc2ea479e80e0962624aca7fd38a1f60
Author: Henry Gessau <email address hidden>
Date: Mon Apr 27 09:59:21 2015 -0400

    Run radvd as root

    During the refactoring of external process management radvd lost
    its root privileges.

    Closes-bug: 1448813

    Change-Id: I84883fe81684afafac9b024282a03f447c8f825a
    (cherry picked from commit a5e54338770fc074e01fa88dbf909ee1af1b66b2)

commit d37e566dcadf8a540eb5f84b668847fa192393a1
Author: Kevin Benton <email address hidden>
Date: Fri Apr 24 00:35:31 2015 -0700

    Don't resync on DHCP agent setup failure

    There are various cases where the DHCP agent will try to
    create a DHCP port for a network and there will be a failure.
    This has primarily been caused by a lack of available IP addresses
    in the allocation pool. Trying to fix all availability corner cases
    on the server side will be very difficult due to race conditions between
    multiple ports being created, the dhcp_agents_per_network parameter, etc.

    This patch just stops the resync attempt on the agent side if a failure
    is caused by an IP address generation problem. Future updates to the subnet
    will cause another attempt so if the tenant does fix the issue they will
    get DHCP service.

    Change-Id: I0896730126d6dca13fe9284b4d812cfb081b6218
    Closes-Bug: #1447883
    (cherry picked from commit db9ac7e0110a0c2ef1b65213317ee8b7f1053ddc)

commit 38211ae67cb76ade85b08c028b6e88bfc867afc9
Author: Ihar Hrachyshka <email address hidden>
Date: Mon Apr 20 17:06:38 2015 +0200

    tests: confirm that _output_hosts_file does not log too often

    I3ad7864eeb2f959549ed356a1e34fa18804395cc didn't include any regression unit
    tests to validate that the method won't ever log too often again,
    reintroducing performance drop in later patches. It didn't play well
    with stable backports of the fix, where context was lost when doing the
    backport, that left the bug unfixed in stable/juno even though the patch
    was merged there [1].

    The patch adds an explicit note in the code that suggests not to add new
    log messages inside the loop to avoid regression, and a unit test was
    added to capture it.

    Once the test is merged in master, it will be proposed for stable/juno
    inclusion, with additional changes that would fix the regression again.

    Related-Bug: #1414218
    Change-Id: I5d43021932d6a994638c348eda277dd8337cf041
    (cherry picked from commit 3b74095a935f6d2027e6bf04cc4aa21f8a1b46f2)

commit 53b3e751f3c7b32bed48c14742d3dd3a1178d00d
Author: Maru Newby <email address hidden>
Date: Thu Apr 9 17:00:57 2015 +0000

    Double functional testing timeout to 180s

    The increase in ovs testing is resulting in job failure due to
    timeouts in test_killed_monitor_respawns. Giving the test more
    time to complete should reduce the failure rate.

    Change-Id: I2ba9b1eb388bfbbebbd6b0f3edb6d5a5ae0bfead
    Closes-Bug: #1442272
    (c...

Thierry Carrez (ttx)
Changed in neutron:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/qos)

Fix proposed to branch: feature/qos
Review: https://review.openstack.org/196097

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/qos)
Download full text (93.9 KiB)

Reviewed: https://review.openstack.org/196097
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1cfed745d54a6ce9cb3dd4e6f454666d9e6676c2
Submitter: Jenkins
Branch: feature/qos

commit ba7d673d1ddd5bfa5aa1be5b26a59e9a8cd78a9f
Author: Kevin Benton <email address hidden>
Date: Thu Jun 25 18:31:38 2015 -0700

    Remove duplicated call to setup_coreplugin

    The test case for vlan_transparent was calling setup_coreplugin
    before calling the super setUp method which already calls
    setup_coreplugin. This was causing duplicate core plugin fixtures
    which resulted in patching the dhcp periodic check twice.

    Change-Id: Ide4efad42748e799d8e9c815480c8ffa94b27b38
    Partial-Bug: #1468998

commit e64062efa3b793f7c4ce4ab9e62918af4f1bfcc9
Author: Kevin Benton <email address hidden>
Date: Thu Jun 25 18:29:37 2015 -0700

    Remove double mock of dhcp agent periodic check

    The test case for the periodic check was patching a target
    that the core plugin fixture already patched out. This removes
    that and exposes the mock from the fixture so the test case
    can reference it.

    Change-Id: I3adee6a875c497e070db4198567b52aa16b81ce8
    Partial-Bug: #1468998

commit 25ae0429a713143d42f626dd59ed4514ba25820c
Author: Kevin Benton <email address hidden>
Date: Thu Jun 25 18:24:10 2015 -0700

    Remove double fanout mock

    The test_mech_driver was duplicating a fanout mock already setup
    in the setUp routine.

    Change-Id: I5b88dff13113d55c72241d3d5025791a76672ac2
    Partial-Bug: #1468998

commit 993771556332d9b6bbf7eb3f0300cf9d8a2cb464
Author: Kevin Benton <email address hidden>
Date: Thu Jun 25 17:55:16 2015 -0700

    Remove double callback manager mocks

    setup_test_registry_instance() in the base test case class gives
    each test its own registry by mocking out the get_callback_manager.
    The L3 agent test cases were duplicating this.

    Partial-Bug: #1468998
    Change-Id: I7356daa846524611e9f92365939e8ad15d1e1cd8

commit 0be1efad93734f11cd63fb3b7bd2983442ce1268
Author: Kevin Benton <email address hidden>
Date: Thu Jun 25 16:57:30 2015 -0700

    Remove ensure_dirs double-patch

    test_spawn_radvd called mock.patch on ensure_dirs after the
    setup method already patched it out. This causes issues when
    mock.patch.stopall() is called because the mocks are stored
    as a set and are unwound in a non-deterministic fashion.[1]
    So some of the time they will be undone correctly, but others
    will leave a monkey-patched in mock, causing the ensure_dir
    test to fail.

    1. http://bugs.python.org/issue21239

    Closes-Bug: #1467908
    Change-Id: I321b5fed71dc73bd19b5099311c6f43640726cd4

commit 0a2238e34e72c17ca8a75e36b1f56e41a3ece74e
Author: Sukhdev Kapur <email address hidden>
Date: Thu Jun 25 15:11:28 2015 -0700

    Fix tenant-id in Arista ML2 driver to support HA router

    When HA router is created, the framework creates a network and does
    not specify the tenant-id. This casuse Arista ML2 driver to fail.
    This patch sets the tenant-id when it is not passed explicitly by
    by the network_create() call from the HA r...

tags: added: in-feature-qos
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/196701

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (feature/pecan)

Change abandoned by Kyle Mestery (<email address hidden>) on branch: feature/pecan
Review: https://review.openstack.org/196701
Reason: This is lacking the functional fix [1], so I'll propose a new merge commit which includes that one.

[1] https://review.openstack.org/#/c/196711/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/196920

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/pecan)
Download full text (171.5 KiB)

Reviewed: https://review.openstack.org/196920
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7f759c077f8f860c13db92d2ea6b353ef6b70900
Submitter: Jenkins
Branch: feature/pecan

commit 8123144fadd7c5d5e6e56a76ea860512619a2cf6
Author: Moshe Levi <email address hidden>
Date: Sun Jun 28 14:37:14 2015 +0300

    Fix Consolidate sriov agent and driver code

    This patch add mising __init to mech_sriov/mech_driver/
    and update the setup.cfg to the new agent entrypoint

    Trivial Fix

    Change-Id: I53a527081feb78472f496675bbb3c5121d38a14a

commit 8942fccf02e6e179d47582fdb2792a1ca972da21
Author: Assaf Muller <email address hidden>
Date: Mon Jun 29 11:38:51 2015 -0400

    Remove failing SafeFixture tests

    The fixtures 1.3 release attempted to fix the fixtures resource
    leak issue, but failed to do so completely. Our own SafeFixture
    is still needed: The 1.3 release broke our SafeFixture tests,
    but not the usage of SafeFixture itself. This patch removes
    those failing tests for now to unbreak the gate. Jakub reported
    a bug on fixtures 1.3:
    https://bugs.launchpad.net/python-fixtures/+bug/1469759

    We will continue to use SafeFixture until that bug is fixed
    in fixtures, at which point we will be able to require
    fixtures > 1.3.

    Change-Id: I59457c3bb198ff86d5ad55a1e623d008f0034b8f
    Closes-Bug: #1469734

commit 71dffb0a2c1720cd8233a329d32958a0160dd6f5
Author: Kevin Benton <email address hidden>
Date: Mon Jun 29 08:27:41 2015 +0000

    Revert "Removed test_lib module"

    This reverts commit 9a6536de6e1a7fe9b2552adc142e254426b82b6f.

    We pulled all of the plugins out of the tree, many of which still inherit
    from neutron test classes. This change then stated that we no longer
    support testing other plugins. I think this is a bit premature and should
    have been discussed under the subject
    "Neutron plugins can't use neutron plugin unit tests" or something
    similar.

    Change-Id: I68318589f010b731574ea3bfa8df98492bab31fc

commit b20fd81dbd497e058384a0af065dd0f1fdc4c728
Author: Jakub Libosvar <email address hidden>
Date: Fri Jun 5 14:32:51 2015 +0000

    Refactor NetcatTester class

    Following capabilities were added:
       - used transport protocol is passed as a constant instead of bool
       - src port for testing was added
       - connection can be established explicitly
       - change constructor parameters of NetcatTester

    As a part of removing bool for protocol definition
    get_free_namespace_port() was also modified to match the behavior.

    Change-Id: Id2ec322e7f731c05a3754a65411c9a5d8b258126

commit 83e37980dcd0b2bad6d64dd2cb23bcd2891cafca
Author: jingliuqing <email address hidden>
Date: Sat Jun 27 13:41:54 2015 +0800

    Use REST rather than ReST

    Change-Id: I06c9deaab58c5ec13bfeec39fb8fd4b1fe21f42d

commit 1b60df85ba3ad442c2e4e7e52538e1b9a1bf9378
Author: Kevin Benton <email address hidden>
Date: Thu Jun 25 18:34:38 2015 -0700

    Add a double-mock guard to the base test case

    Use mock to patch mock with a check to prevent multiple active
    patches to the...

tags: added: in-feature-pecan
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-1 → 7.0.0
To post a comment you must log in.
This report contains Public Security information  
Everyone can see this security related information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.