neutron-dhcp-agent fails when small tenant network mtu is set

Bug #1988069 reported by Michael Sherman
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Brian Haley

Bug Description

High level description:
When a user creates a tenant network with a very small MTU (in our case 70), neutron-dhcp-agent stops updating the dnsmasq configuration, causing DHCP issues for all networks.

Pre-conditions:
Neutron is using the openvswitch, baremetal, and networking-generic-switch mechanism drivers.
A physical network named `physnet1` is configured, with MTU=9000

Step-by-step reproduction steps:
As an admin user, run:

# Create "normal" network and subnet
openstack network create --provider-network-type vlan --provider-physical-network physnet1 --mtu 1500 test-net-1500
openstack subnet create --subnet-range 10.100.10.0/24 --dhcp --network test-net-1500 test-subnet-1500

# Create "small MTU" network and subnet
openstack network create --provider-network-type vlan --provider-physical-network physnet1 --mtu 70 test-net-70
openstack subnet create --subnet-range 10.100.11.0/24 --dhcp --network test-net-70 test-subnet-70

# Attempt to launch an instance on the "normal" network
openstack server create --image Ubuntu --flavor Baremetal --network test-net-1500 test-server  # "test-server" is a placeholder instance name

Expected output:
We expected neutron-dhcp-agent to update the dnsmasq configuration, which would then serve DHCP requests from the instances.

Actual output:
The OpenStack commands complete successfully, but the instance never receives a response to its DHCP requests. The neutron-dhcp-agent logs show:
https://paste.opendev.org/show/b4r0XCu5KpguM72bnh0u/

Version:
  * OpenStack version: stable/xena, commit bc1dd6939d197d15799aaf252049f76442866c21
  * Linux distro/kernel: Ubuntu 20.04
  * Containers built with Kolla and deployed via Kolla-Ansible

Environment:
Single node deployment, all services (core, networking, database, etc.) on one node.
All compute-nodes are baremetal via Ironic.

Perceived severity:
High, as non-admin users can trigger a DHCP outage affecting all users.

Tags: l3-ipam-dhcp
Revision history for this message
Michael Sherman (msherman-uchicago) wrote :

A related issue is that it's not currently possible to set min/max allowed MTU for tenant networks, see https://bugs.launchpad.net/neutron/+bug/1859362

Otherwise we could work around this by setting the minimum tenant network MTU above this threshold.

Revision history for this message
Brian Haley (brian-haley) wrote :

So I could not reproduce the failure from your paste (invalid parameter when trying to configure an IP address) when I tried this on the master branch.

DHCP would not work on the instance because the MTU was too small to transmit the request:

Starting network: udhcpc: started, v1.29.3
udhcpc: sending discover
udhcpc: sendto: Message too long
udhcpc: sending discover
udhcpc: sendto: Message too long
udhcpc: sending discover
udhcpc: sendto: Message too long

I'd also assume the response would never have fit in 70 bytes either. This makes me think we need to enforce some minimum MTU, since the network/subnet is useless otherwise; beyond that, the resolution is to use a larger MTU (or the default).
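
For illustration only (not from the bug report), the back-of-the-envelope arithmetic below shows why a DHCPDISCOVER cannot fit in a 70-byte MTU: the BOOTP fixed header alone is 236 bytes before the UDP/IP headers and the DHCP options are added. Sizes are taken from RFC 791, RFC 768 and RFC 2131.

# Rough minimum size of an IPv4 DHCPDISCOVER packet, in Python.
# Real clients usually pad further; RFC 2131 assumes a 300-octet minimum
# BOOTP message, so actual packets are even larger.
IPV4_HEADER = 20          # IPv4 header without options
UDP_HEADER = 8
BOOTP_FIXED_FIELDS = 236  # op, htype, ..., chaddr, sname, file
DHCP_MAGIC_COOKIE = 4
MINIMAL_OPTIONS = 3 + 1   # DHCP message-type option + end option

total = (IPV4_HEADER + UDP_HEADER + BOOTP_FIXED_FIELDS +
         DHCP_MAGIC_COOKIE + MINIMAL_OPTIONS)
print(total)              # 272 bytes, roughly four times a 70-byte MTU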

I will set this as Medium just because I could not trigger the DHCP outage described above; if you can supply any more information on reproducing it, I will revisit.

Changed in neutron:
importance: Undecided → Medium
status: New → Incomplete
Revision history for this message
Michael Sherman (msherman-uchicago) wrote :

Thank you!

DHCP not working for a network where the tenant has set a small MTU is totally expected. In terms of usability, though, I would (as a user) expect neutron to refuse to enable DHCP on a subnet if the MTU is too low, rather than it being "enabled" but not working. The network is still "useful" for L2 traffic, or for L3 with static IPs.

To replicate the issue affecting other networks, I was able to produce the error when running kernel `5.4.0-120-generic #136-Ubuntu`, but not with kernel `5.4.0-122-generic #138-Ubuntu`, both with the Xena commit mentioned above. I have not tested yet with master.

Revision history for this message
Brian Haley (brian-haley) wrote :

Yes, the network is still usable since it could be configured some other way (config drive, manually). You can't run IPv6 on it, and it's right near the 68-byte IPv4 minimum.

The other bug you linked about min/max network MTU would be useful here, although there are already cases where you can overflow the default DHCP response size with things like static route options. In the end, the IPv6 minimum of 1280 seems like a good place to start.

As a reference I tested this on Ubuntu 20.04 with the 5.15.0-46 kernel. Perhaps there was a kernel bug in some versions that triggered the invalid argument? If that is the case then I'm not sure there's a bug here and we can continue discussion in the linked MTU bug.

Revision history for this message
Michael Sherman (msherman-uchicago) wrote :

While I haven't been able to rule out a kernel bug yet, I do feel that there's a more general bug in error handling for neutron-dhcp-agent, in that a failure to configure DHCP in one netns should not impact others.

I have seen this general behavior of a netns-specific failure breaking DHCP globally in two cases:
1. This issue, where a kernel version + MTU triggers the failure.
2. https://bugs.launchpad.net/neutron/+bug/1953165, where the IPv6 address conflicts from that bug caused DHCP failures even in namespaces without conflicts.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

We just hit this issue in a Wallaby deployment (manual package deployment, Ubuntu 20.04, kernel 5.4, linux bridge). Neutron-dhcp-agent fails with the following error:

neutron.privileged.agent.linux.ip_lib.InvalidArgument: Invalid parameter/value used on interface X, namespace qdhcp-Y

whenever the network MTU is < 1280 bytes. Comparing interface X in namespace Y to properly functioning interfaces in other namespaces shows that the IPv6 configuration is missing from interface X.

DHCP agents remain broken globally until the network is removed or adjusted to MTU >= 1280.

Changed in neutron:
status: Expired → Opinion
status: Opinion → Confirmed
Revision history for this message
Brian Haley (brian-haley) wrote :

Unfortunately we cannot prevent a user from mis-configuring their network, such as setting the MTU below 1280 and adding an IPv6 subnet. The best we can do is add a note/warning to the admin guide with this information and hope the user reads it.

Moving to Low for this reason.

Changed in neutron:
importance: Medium → Low
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/874036

Changed in neutron:
status: Confirmed → In Progress
Changed in neutron:
assignee: nobody → Brian Haley (brian-haley)
Revision history for this message
Michael Sherman (msherman-uchicago) wrote :

Isn't it still a large impact if a single user can, either in error or maliciously, break DHCP for all subnets sharing a DHCP agent?

Last time this came up for us, I was able to work around it as only some kernel versions (see up-thread) are affected, but it would be great if the "blast radius" of a broken DHCP process could be limited to the misconfigured network/namespace.

Revision history for this message
Brian Haley (brian-haley) wrote :

As I see it there are two issues here.

1) A broken kernel. Neutron can't do anything about that; it would be up to the admin to install one with a fix and reboot. There have been issues with bad kernels before, and all we can do is highlight that they're broken so others don't use them.

2) A mis-configured network. This is user error, and since it's something (presumably) owned by a single project, any "blast radius" only involves a single user. Something like a provider network would be owned by an admin, and a normal user would not have rights to add a subnet to it and cause an issue. The best we can do sometimes is warn not to do something, but we can't change the API to not allow a small MTU.

Unless I'm missing something?

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote (last edit ):

I'd like to clarify several things:

1) I cannot confirm the "broken kernel" suggestion. I tried several 5.4 kernels from Ubuntu 20.04, including the versions stated earlier as well as older and newer versions, and they all behaved exactly the same. Thus far it seems to me that an interface with MTU < 1280 cannot get an IPv6 configuration because IPv6 requires MTU >= 1280 by design. I'm not sure about this, but it may be expected kernel behavior. I will try kernel 5.15 today and see whether anything changes.

2) I am not a Python developer, or a Neutron developer in particular, but I managed to find the relevant part of the ml2 plugin code and prevent it from creating networks with MTU < 1280, similarly to how it prevents creating networks with MTU > MAX_MTU. It was a rather ugly hack with a hardcoded minimum value, though, so I'm wondering whether there's a better way.

3) The "blast radius" is not limited to a single user/tenant or a single project, but is system-wide. A non-admin user, for example a member of a tenant, is able to cause a denial of service by creating an internal DHCP-enabled network with MTU<1280. In our deployment with 3 infra nodes and redundant DHCP agents, within a few seconds of creating such network neutron-dhcp-agent instances fail on all infra nodes, enter an error-loop and are unable to process configuration changes. Everything that relies on DHCP configuration changes, including attachments of new ports in/to other DHCP-enabled networks of all other tenants, including the service networks of a service tenant such as for example LBaaS management, stops working in this scenario until the user's network is removed or adjusted to have MTU>1280. This seems like a rather high-priority issue to me.

Revision history for this message
Michael Sherman (msherman-uchicago) wrote :

Hi Brian,

Thanks for breaking it down into issues 1 and 2, I think I can respond a bit more clearly.
I agree that:
For 1, if the issue is only a broken kernel, that has a clear workaround and seems to be a non-issue, as mentioned.
For 2, in the case where the system is not impacted by (1), the blast radius is indeed restricted to a single network, and is again a non-issue since it's self-inflicted.

However, my issue lies in a perceived lack of "robustness" or error handling in the dhcp-agent, as I seem to have observed the following cases where some kind of error propagated outside the boundaries of a single network:
1. My original issue in this thread: setting an MTU below 280, with IPv4 enabled, on kernel `5.4.0-120-generic #136-Ubuntu` caused an error loop preventing the DHCP agents from updating.
2. Zakhar's issue, with an MTU below 1280 + IPv6 DHCP, with the same symptoms.
3. Issue https://bugs.launchpad.net/neutron/+bug/1953165, which, although it has a different cause, shows the same failure mode of the DHCP agent no longer processing updates.

My systems have thus far only been running a single networking node + dhcp agent, but Zakhar reports the issue propagating across multiple infrastructure nodes?

To me it seems that if more robust error handling could wrap the interaction between neutron-dhcp-agent and this category of system error, it would reduce the severity of the above-mentioned failure cases. This is admittedly a naive proposition, and maybe it's impractical!

Thanks again for your attention on this.

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

Actually, my issue is not caused by IPv6 DHCP; the offending network is IPv4-only. The DHCP agents start dnsmasq with IPv4 options only, but when the MTU is lower than 1280, whatever the agents do before starting dnsmasq for that network fails, and the agents enter an error loop.

The reason why all DHCP agents are affected in our deployment is that we run the agents in HA mode (dhcp_agents_per_network=3) on 3 infra nodes, i.e. all agents run the same configuration and fail the same way.

Revision history for this message
Brian Haley (brian-haley) wrote :

Hi,

Thanks for the comments.

So I guess I always thought it was either a kernel issue or an issue spawning dnsmasq for an individual network, which would only impact a single network. After I saw Zakhar's comment this morning I tried this on a local setup and saw the agent go into a loop, which is worse than I expected.

I will take a closer look at the code and see how we can mitigate this. Unfortunately, just requiring a 1280+ network MTU would break our API agreement with users. It could be that the agent can detect and ignore this case, but then we're in that place where the user doesn't exactly know they mis-configured things.

Changed in neutron:
importance: Low → High
Revision history for this message
Brian Haley (brian-haley) wrote :

Zakhar - can you provide the stack trace for that failure? And output of 'openstack network show...' and 'openstack subnet show...' for the involved pieces? Without an IPv6 subnet my dhcp-agent starts right up.

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote (last edit ):

Brian,

Many thanks for your response. Here's the information you requested:

neutron-dhcp-agent stack trace:

2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for aee895a8-acda-420c-9c58-6a936c9a4102.: neutron.privileged.agent.linux.ip_lib.InvalidArgument: Invalid parameter/value used on interface ns-beed7d37-b4, namespace qdhcp-aee895a8-acda-420c-9c58-6a936c9a4102.
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3/dist-packages/neutron/agent/dhcp/agent.py", line 227, in call_driver
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent rv = getattr(driver, action)(**action_kwargs)
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/dhcp.py", line 266, in enable
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent common_utils.wait_until_true(self._enable, timeout=300)
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3/dist-packages/neutron/common/utils.py", line 708, in wait_until_true
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent while not predicate():
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/dhcp.py", line 278, in _enable
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent interface_name = self.device_manager.setup(self.network)
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/dhcp.py", line 1770, in setup
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent self.driver.init_l3(interface_name, ip_cidrs,
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/interface.py", line 153, in init_l3
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent device.addr.add(ip_cidr)
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 536, in add
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent add_ip_address(cidr, self.name, self._parent.namespace, scope,
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 821, in add_ip_address
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent privileged.add_ip_address(
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent return self.channel.remote_call(name, args, kwargs)
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3/dist-packages/oslo_privsep/daemon.py", line 224, in remote_call
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent raise exc_type(*result[2])
2023-02-16 19:11:47.896 49507 ERROR neutron.agent.dhcp.agent neutron.privileged.agent.linux.ip_l...

Revision history for this message
Brian Haley (brian-haley) wrote :

The disable_ipv6 sysctl is interesting, but that is all caught, so luckily it doesn't trigger the agent to loop; I don't think we need to deal with it.

One of the only reasons I can think of that a v4-only network would fail for you is that you have either force_metadata or enable_isolated_metadata set to True in your config, since that would try to add the IPv6 metadata address and fail in this way. Unfortunately it doesn't log what it was trying to add when it fails, so I can't tell. If you set the MTU to 1280, what addresses does it show on the interface?

I'll post a hack that seems to work for me, but it will need more work and discussion with others to see if it's a good plan. I still don't think tweaking the API call(s) to return an HttpConflict code is a good idea, though.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/874167

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

Brian,

The DHCP agent is configured as follows:

# grep -Ev "^#|^$" /etc/neutron/dhcp_agent.ini
[DEFAULT]
interface_driver = linuxbridge
dhcp_driver = neutron.agent.linux.dhcp.Dnsmasq
enable_isolated_metadata = true
[agent]
availability_zone = openstack-network
[ovs]

That is, enable_isolated_metadata is enabled (the recommended setting: https://docs.openstack.org/neutron/wallaby/install/controller-install-option1-ubuntu.html) and force_metadata is disabled (the default setting). If we disable isolated metadata, what is going to break?

A low-MTU network has just an IPv4 address and no IPv6 link-local addresses; IPv6 is disabled:

# ip netns exec qdhcp-c9b96063-3f8f-41b2-9783-ac73d37894b0 ip a li
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ns-bbd05a6d-0b@if675: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1279 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:3d:ee:c4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.10.10.4/24 brd 10.10.10.255 scope global ns-bbd05a6d-0b
       valid_lft forever preferred_lft forever

# ls /proc/sys/net/ipv6/conf/vxlan-3947/
ls: cannot access '/proc/sys/net/ipv6/conf/vxlan-3947/': No such file or directory

"Healthy" networks have IPv6 link-local addresses and IPv6 is enabled:

# ip netns exec qdhcp-8abc13db-565b-4640-9507-819d6ef520ef ip a li
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ns-fc876933-90@if90: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:36:d5:40 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.10.10.2/24 brd 10.10.10.255 scope global ns-fc876933-90
       valid_lft forever preferred_lft forever
    inet 169.254.169.254/32 brd 169.254.169.254 scope global ns-fc876933-90
       valid_lft forever preferred_lft forever
    inet6 fe80::a9fe:a9fe/64 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe36:d540/64 scope link
       valid_lft forever preferred_lft forever

# ls /proc/sys/net/ipv6/conf/vxlan-4952/
accept_dad accept_ra_rtr_pref drop_unsolicited_na max_desync_factor router_probe_interval temp_prefered_lft
accept_ra accept_redirects enhanced_dad mc_forwarding router_solicitation_delay temp_valid_lft
accept_ra_defrtr accept_source_route force_mld_version mldv1_unsolicited_report_interval router_solicitation_interval use_oif_addrs_only
accept_ra_from_local addr_gen_mode force_tllao mldv2_unsolicited_report_interval router_solicitation_max_interval u...


Revision history for this message
Brian Haley (brian-haley) wrote :

Thanks for the info; it confirms the trigger for you is "enable_isolated_metadata = true", since that adds the IPv6 metadata address (fe80::a9fe:a9fe) to be configured, which causes the failure and loop. My hack would fix that as well, so let me have a discussion with the other maintainers about it.
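
To make the trigger concrete, here is a hypothetical standalone reproduction (not from this thread) using pyroute2, the library neutron's ip_lib uses under privsep. It creates a dummy interface with an MTU of 1279 and tries to add the IPv6 metadata address; on affected kernels the add is expected to fail with EINVAL, the same "Invalid parameter/value" the agent logs. It requires root, and the interface name is arbitrary.

# Hypothetical reproduction of the kernel-level failure (run as root).
from pyroute2 import IPRoute, NetlinkError

ipr = IPRoute()
ipr.link('add', ifname='mtu-test0', kind='dummy')
idx = ipr.link_lookup(ifname='mtu-test0')[0]
ipr.link('set', index=idx, mtu=1279, state='up')
try:
    # The kernel disables IPv6 on devices with MTU < 1280, so adding an
    # IPv6 address here should fail with errno 22 (EINVAL).
    ipr.addr('add', index=idx, address='fe80::a9fe:a9fe', prefixlen=64)
except NetlinkError as exc:
    print('address add failed as expected:', exc)
finally:
    ipr.link('del', index=idx)
    ipr.close()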

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

Many thanks for looking into this, Brian. May I ask what the chances are for this hack/fix to make it into Wallaby?

Revision history for this message
Brian Haley (brian-haley) wrote :

Since it looks like a Day 1 bug, we will backport a fix.

Just adding a note here that I had a discussion in our IRC channel and we decided to do the following:

1) Change the code that adds IP(v6) addresses to look at the MTU and fail more gracefully, for both IPv4 and IPv6.

2) Detect the MTU is invalid in API calls, for example when adding an IPv6 subnet to a network, or changing the MTU on a network, and return an error (409 or 400?).

3) Document to "not do this" :)

I've already started 1 and 3, will see how far I get doing updates today.
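
As a sketch of what point 1 could look like (illustrative only; the helper name, logging, and constants are hypothetical, and this is not the code in the linked reviews): check the device MTU against the per-family minimum before configuring an address, and skip it with a warning instead of raising.

# Illustrative sketch of "fail more gracefully" from point 1 above;
# not the actual neutron change.
import logging

LOG = logging.getLogger(__name__)

MIN_MTU_BY_VERSION = {4: 68, 6: 1280}

def address_fits_device(cidr, device_name, device_mtu):
    """Return True if the kernel can be expected to accept this address."""
    ip_version = 6 if ':' in cidr.partition('/')[0] else 4
    min_mtu = MIN_MTU_BY_VERSION[ip_version]
    if device_mtu < min_mtu:
        LOG.warning('Skipping %s on %s: MTU %d is below the IPv%d minimum '
                    'of %d', cidr, device_name, device_mtu, ip_version,
                    min_mtu)
        return False
    return True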

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

Thanks again, I very much appreciate your effort!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/875809

tags: added: antelope-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Brian Haley <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/874036
Reason: Superceded by https://review.opendev.org/c/openstack/neutron/+/875809

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Brian Haley <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/874167
Reason: Superceded by https://review.opendev.org/c/openstack/neutron/+/875809

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

Hi again! What's going on with this issue?

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Zakhar:

You can check in gerrit the status of the patches. The main patch https://review.opendev.org/c/openstack/neutron/+/875809 has been merged.

Regards.

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

Thanks!

Revision history for this message
Gregory Orange (gregoryo2017) wrote :

I am seeing a discrepancy between the problem stated in the Bug Description and the patch that was merged. I will heavily simplify below, referring only to IPv4.

Problem stated: e.g. Network created with 70 MTU causes DHCP to fail on other networks with higher MTU.

Patch merged: MTU is required to be at least 68, else an error.

One can hopefully see why the patch as I've described it would not fix the problem as I've stated it. I assume that I am missing something. Would someone be willing to explain this to me?

Thank you,
Greg.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

The patch fixing this issue makes a change to the API (adds a limitation), so we can't backport it.

That was discussed during the Neutron team meeting today [1].

[1]https://meetings.opendev.org/meetings/networking/2023/networking.2023-04-25-14.01.log.html#l-226

tags: removed: antelope-backport-potential
Revision history for this message
Brian Haley (brian-haley) wrote :

Greg - if you look at my first patch, https://review.opendev.org/c/openstack/neutron/+/874167, you will see it was pretty invasive. I felt it was better to prevent an invalid MTU in the API call(s) instead.

I tested this with a network MTU down to 68 for IPv4 and 1280 for IPv6 and didn't see any problems, so 70 should work fine. If you have a chance to try the change, please let me know if there is still an issue and I will address it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/875809
Committed: https://opendev.org/openstack/neutron/commit/88ce859b568248a0ee2f47a5d91c1708b774d20e
Submitter: "Zuul (22348)"
Branch: master

commit 88ce859b568248a0ee2f47a5d91c1708b774d20e
Author: Brian Haley <email address hidden>
Date: Wed Mar 1 00:52:38 2023 -0500

    Change API to validate network MTU minimums

    A network's MTU is now only valid if it is at least the minimum
    value allowed based on the IP version of the associated subnets:
    68 for IPv4 and 1280 for IPv6.

    This minimum is now enforced in the following ways:

    1) When a subnet is associated with a network, validate
       the MTU is large enough for the IP version. Not only
       would the subnet be unusable if it was allowed, but the
       Linux kernel can fail adding addresses and configuring
       network settings like the MTU.

    2) When a network MTU is changed, validate the MTU is large
       enough for any currently associated subnets. Allowing a
       smaller MTU would render any existing subnets unusable.

    Closes-bug: #1988069
    Change-Id: Ia4017a8737f9a7c63945df546c8a7243b2673ceb
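
A rough paraphrase of the two enforcement points in the commit message above, as a sketch only (the function and exception names are made up; this is not the merged neutron code):

# Sketch of the validation described above; illustrative, not the real code.
MIN_MTU = {4: 68, 6: 1280}

class MtuTooSmall(Exception):
    pass

def validate_subnet_add(network_mtu, subnet_ip_version):
    # Case 1: associating a subnet with a network.
    required = MIN_MTU[subnet_ip_version]
    if network_mtu < required:
        raise MtuTooSmall('IPv%d subnet requires MTU >= %d, network MTU is %d'
                          % (subnet_ip_version, required, network_mtu))

def validate_mtu_update(new_mtu, existing_subnet_ip_versions):
    # Case 2: changing the MTU of a network that already has subnets.
    for version in existing_subnet_ip_versions:
        validate_subnet_add(new_mtu, version)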

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.0.0.0b2

This issue was fixed in the openstack/neutron 23.0.0.0b2 development milestone.

Revision history for this message
Andy Gomez (agomerz) wrote :

I have just run into this issue running Wallaby, though the MTU on the network was set to 128.

This prevented any new ports on other networks sharing this DHCP agent from being put into ACTIVE status. The ports would be stuck in BUILD status until the MTU of the offending network was increased.

openstack network show 2148e8d4-5c5f-4c4e-aa52-9373d7aaa5cc
+---------------------------+--------------------------------------+
| Field | Value |
+---------------------------+--------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | use1-prod0-os-1a, use1-prod0-os-1b |
| created_at | 2023-08-28T11:28:30Z |
| description | |
| dns_domain | |
| id | 2148e8d4-5c5f-4c4e-aa52-9373d7aaa5cc |
| ipv4_address_scope | None |
| ipv6_address_scope | None |
| is_default | None |
| is_vlan_transparent | None |
| mtu | 128 |
| name | cracow-rly-test |
| port_security_enabled | True |
| project_id | af686ff819454f5daef0c4d4262c3280 |
| provider:network_type | vxlan |
| provider:physical_network | None |
| provider:segmentation_id | 4209 |
| qos_policy_id | None |
| revision_number | 2 |
| router:external | Internal |
| segments | None |
| shared | False |
| status | ACTIVE |
| subnets | e1df7e75-a81c-439f-acb0-47504509b7be |
| tags | |
| tenant_id | af686ff819454f5daef0c4d4262c3280 |
| updated_at | 2023-08-28T11:28:31Z |
+---------------------------+--------------------------------------+

openstack port list --network 58dc3b69-2c46-4f6b-ae03-a7de7aeb709b
+--------------------------------------+----------+-------------------+----------------------------------------------------------------------------+--------+
| ID | Name | MAC Address | Fixed IP Addresses | Status |
+--------------------------------------+----------+-------------------+----------------------------------------------------------------------------+--------+
| 20f950e2-7ce3-402e-a209-27f9b25dd7f3 | | fa:16:3e:fc:76:74 | ip_address='10.32.0...

