neutron

Bug #1939432
Comment #0

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote on 2021-08-10:

When a new network and its first subnet are created, the DHCP agent is updated. The agent scheduler increases the "load" field [1] of the DHCP agent's DB register; this field is used when scheduling new networks to agents.

If multiple networks (each with its first subnet) are created concurrently, the agent "load" will be modified concurrently. The DB guarantees that only one transaction can increase the agent "load" parameter at a time; the other transactions will fail and be retried. E.g.: https://paste.opendev.org/show/807984/

NOTE: I say "a network and its first subnet" because that is what triggers the spawn of a new dnsmasq process. This is the event that increases the "load" value by 1. Any other new subnet added to this network will modify the dnsmasq config but won't increase the "load" value.
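
A minimal sketch of the accounting described in the note, assuming a simplified in-memory model. The names "Agent", "Network" and "add_subnet" are hypothetical and are not Neutron classes or functions; they only illustrate that the first subnet of a network bumps "load" and later subnets do not.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical stand-in for the DHCP agent DB register.
    load: int = 0


@dataclass
class Network:
    # Hypothetical stand-in for a network and its subnets.
    subnets: list = field(default_factory=list)


def add_subnet(agent: Agent, network: Network, cidr: str) -> None:
    """Add a subnet; only the first one on a network increases the load."""
    first_subnet = not network.subnets
    network.subnets.append(cidr)
    if first_subnet:
        # The first subnet is what spawns a new dnsmasq process.
        agent.load += 1


agent = Agent()
net = Network()
add_subnet(agent, net, "10.0.0.0/24")  # first subnet: load goes up by 1
add_subnet(agent, net, "10.0.1.0/24")  # second subnet: config change only
print(agent.load)
```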

As the comment in the "BaseResourceFilter.bind" method [2] says: "the resource being bound might or might not be of the same type which is accounted for the load. It isn't a problem because "+ 1" here does not meant to predict precisely what the load of the agent will be. The value will be corrected by the agent on the next report interval." In other words, when the DHCP agent reports its status, it accurately updates the number of resources (networks) that it is handling.

This bug proposes to catch the DB errors in the "BaseResourceFilter.bind" method [2] to avoid the DB retry action. The retry is unnecessary because the DHCP agent, as commented, will update the "load" value itself. By avoiding this retry, we avoid unnecessary Neutron server and DB operations and command delays (for example, when creating a subnet).
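
To illustrate the proposal, here is a rough sketch that models only the "load" counter, not the real binding logic. Everything in it is an assumption made for illustration: "TransientDBError", "FakeAgentStore", "bind_with_retry", "bind_best_effort" and "report_state" are hypothetical names, not the Neutron or oslo.db API, and the actual exception type is the one shown in the paste above.

```python
class TransientDBError(Exception):
    """Hypothetical stand-in for the DB error raised on a concurrent update."""


class FakeAgentStore:
    """In-memory stand-in for the agent register; fails the first N updates."""

    def __init__(self, load=0, failures=0):
        self.load = load
        self._failures = failures
        self.attempts = 0

    def increment_load(self):
        self.attempts += 1
        if self._failures > 0:
            self._failures -= 1
            raise TransientDBError("concurrent update of 'load'")
        self.load += 1


def bind_with_retry(store, max_retries=5):
    """Simplified current behaviour: retry until the increment succeeds."""
    for _ in range(max_retries):
        try:
            store.increment_load()
            return True
        except TransientDBError:
            continue
    raise TransientDBError("retries exhausted")


def bind_best_effort(store):
    """Proposed behaviour: catch the error and skip the retry."""
    try:
        store.increment_load()
        return True
    except TransientDBError:
        # The value is only a scheduling hint; the agent report fixes it.
        return False


def report_state(store, actual_networks):
    """Agent status report: overwrite 'load' with the real network count."""
    store.load = actual_networks


retrying = FakeAgentStore(load=3, failures=2)
bind_with_retry(retrying)       # three DB attempts to add 1

best_effort = FakeAgentStore(load=3, failures=2)
bind_best_effort(best_effort)   # one DB attempt, increment is dropped
report_state(best_effort, 4)    # next report sets the accurate value
```

In this model both paths end with the same "load" once the agent has reported, but the best-effort path touches the DB once instead of three times.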

[1] https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/db/models/agent.py#L55
[2] https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/scheduler/base_resource_filter.py#L35-L39