Comment 13 for bug 1988069

Michael Sherman (msherman-uchicago) wrote :

Hi Brian,

Thanks for breaking it down into issues 1 and 2; I think I can now respond a bit more clearly.
I agree that:
For 1, if the issue is only a broken kernel, that has a clear workaround and seems to be a non-issue, as mentioned.
For 2, in the case where the system is not impacted by (1), the blast radius is indeed restricted to a single network, and is again a non-issue since it's self-inflicted.

However, my issue lies in a perceived lack of "robustness" or error handling in the dhcp-agent, as I seem to have observed the following cases where some kind of error propagated outside the boundaries of a single network:
1. My original issue in this thread: setting an MTU below 280 with IPv4 enabled, on kernel `5.4.0-120-generic #136-Ubuntu`, caused an error loop that prevented DHCP agents from updating.
2. Zakhar's issue: an MTU below 1280 with IPv6 DHCP, with the same symptoms.
3. Issue https://bugs.launchpad.net/neutron/+bug/1953165, which, although it has a different cause, shows the same failure mode: the DHCP agent no longer processes updates.

My systems have so far only run a single networking node with one DHCP agent, but if I understand correctly, Zakhar reports the issue propagating across multiple infrastructure nodes?

To me, it seems that if more robust error handling could wrap the interaction between neutron-dhcp-agent and this category of system error, it would reduce the severity of the failure cases above. This is admittedly a naive proposition, and it may well be impractical!
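
To make the idea a bit more concrete, here is a rough sketch of the kind of per-network isolation I have in mind. It is not taken from the neutron-dhcp-agent code: `sync_networks`, `configure_network`, `NetworkSyncError` and the network IDs are all hypothetical names made up for illustration, and the real agent is certainly structured differently.

```python
import logging

LOG = logging.getLogger(__name__)


class NetworkSyncError(Exception):
    """Hypothetical stand-in for a per-network failure (e.g. a rejected MTU)."""


def sync_networks(network_ids, configure_network):
    """Configure each network independently and return the IDs that failed.

    `configure_network` is a hypothetical callable, not a real Neutron API.
    """
    failed = []
    for net_id in network_ids:
        try:
            configure_network(net_id)
        except Exception:
            # Log and skip the bad network instead of aborting the whole pass,
            # so one misconfigured network cannot block updates for the rest.
            LOG.exception("Failed to configure network %s; skipping", net_id)
            failed.append(net_id)
    return failed


def demo():
    """Run the loop with one deliberately broken (made-up) network."""
    configured = []

    def configure(net_id):
        if net_id == "net-bad-mtu":
            raise NetworkSyncError("MTU too small for this address family")
        configured.append(net_id)

    failed = sync_networks(["net-a", "net-bad-mtu", "net-b"], configure)
    return configured, failed
```

In this sketch the failing network is only recorded, so the caller could retry it later or flag it to the operator, while the remaining networks keep receiving updates.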

Thanks again for your attention on this.