Comment 7 for bug 1535554

Bence Romsics (bence-romsics) wrote:

Our users reported problems related to this bug, so I worked on it and managed to reproduce it.

The basic trick is to inject a sufficiently long sleep at the right place in the code to make the race condition worse and easier to trigger. Here's how:

* You need a multi-host devstack: at least one all-in-one host (below: devstack0) and one compute host (below: devstack0a). Make sure both hosts run a fully functional dhcp-agent.
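
For reference, an illustrative local.conf fragment for devstack0a (the rest of the usual multi-node settings are omitted; 'q-dhcp' is devstack's service name for the dhcp-agent, and the devstack0 address is a placeholder):

[[local|localrc]]
SERVICE_HOST=<address of devstack0>
ENABLED_SERVICES=n-cpu,q-agt,q-dhcp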

* Use ChanceScheduler for dhcp-agent scheduling. The problem is reproducible with other schedulers too, but it's much harder to trigger. With ChanceScheduler it's relatively easy to get multiple concurrent schedulings to produce different agents, while nearly deterministic schedulers like WeightScheduler almost always pick the same agent (the sketch after the config snippet below illustrates the difference).

neutron.conf
[agent]
network_scheduler_driver = neutron.scheduler.dhcp_agent_scheduler.ChanceScheduler
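
For context, an illustrative sketch of the two selection styles (my code, not Neutron's; WeightScheduler is approximated here as "least loaded first"):

import random

def chance_select(hostable_agents, num_agents):
    # random picks: two concurrent schedulings easily return different agents
    return random.sample(hostable_agents, num_agents)

def weight_like_select(hostable_agents, num_agents):
    # deterministic least-loaded-first: two concurrent schedulings reading
    # the same DB state return the same agents, so the race stays hidden
    return sorted(hostable_agents, key=lambda a: a['load'])[:num_agents]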

* Make sure neutron is able to process at least two API requests concurrently, for example by setting 'api_workers' to at least 2. The default value of 'api_workers' is okay on any multiprocessor system.
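
For example (a minimal fragment; to my knowledge 'api_workers' lives in the [DEFAULT] section):

neutron.conf
[DEFAULT]
api_workers = 2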

* Use the old neutron client for manual agent scheduling operations. I found multiple silent bugs in osc's agent scheduling operations, which I will report in separate bug reports in the coming days.

neutron dhcp-agent-list-hosting-net NETWORK
neutron dhcp-agent-network-add AGENT NETWORK
neutron dhcp-agent-network-remove AGENT NETWORK

* The relevant neutron configuration in my devstack was this:

network_scheduler_driver = neutron.scheduler.dhcp_agent_scheduler.ChanceScheduler
network_auto_schedule = True
allow_automatic_dhcp_failover = True
dhcp_agents_per_network = 1
enable_services_on_agents_with_admin_state_down = False

* Versions:

devstack 1f6bea17
neutron 10c5f451ce (plus two additional lines of code described below)

* Agent re-scheduling can be triggered by 'subnet create' calls or by agents going to status=DOWN. Since the former is simpler to do, we'll use that to trigger re-schedules here.

* Add a sleep() to the line before the call to bind() here:

https://opendev.org/openstack/neutron/src/commit/a309dee7c58ef43d5985fb2e84b839134bca6b9c/neutron/scheduler/base_scheduler.py#L52

Like this:

 50         chosen_agents = self.select(plugin, context, hostable_agents,
 51                                     hosted_agents, num_agents)
 52         import time
 53         time.sleep(30)
 54         # bind the resource to the agents
 55         self.resource_filter.bind(context, chosen_agents, resource['id'])

* Pick or create a network with a subnet; here we'll use the 'private' network pre-created by devstack.

* De-schedule all agents from the network. For example:

# check which agents host the network's dhcp:
neutron dhcp-agent-list-hosting-net private
# de-schedule them
neutron dhcp-agent-network-remove "$( openstack network agent list --agent-type dhcp --host devstack0 -f value -c ID )" private
neutron dhcp-agent-network-remove "$( openstack network agent list --agent-type dhcp --host devstack0a -f value -c ID )" private

Now this network has 0 dhcp servers, which is below the targeted 1 dhcp server ('dhcp_agents_per_network = 1').

* Trigger two re-schedules concurrently - that is, in quick succession compared to the 30s sleep we injected:

for i in 1 2 ; do openstack subnet create trigger-reschedule-subnet-$i --network private --subnet-pool shared-default-subnetpool-v4 >/dev/null & done ; while true ; do date --rfc-3339=s ; neutron dhcp-agent-list-hosting-net private ; jobs ; echo ; sleep 5 ; done

The while loop above monitors the time, completion of the subnet create calls and the agents hosting the private network's dhcp.

We expect the API calls to complete in 30-something seconds: slightly more than 30s each because of the sleep(30) we injected, but less than 2*30s in total because the two calls should be processed concurrently.

* Repeat the previous step multiple times, since it will not always reproduce the over-scheduling. Here we must consider how the configured agent scheduler selects agents. With two agents and two concurrent schedulings using the ChanceScheduler, we have a 50% chance that the two schedulings select the same agent and another 50% that they select different agents. We will only see the over-scheduling effect when they select different agents.
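
A quick sanity check of that 50% figure (illustrative Python, not Neutron code):

import random

# two concurrent ChanceScheduler-style picks over two agents: count how
# often they differ, i.e. how often over-scheduling is possible at all
agents = ['devstack0', 'devstack0a']
trials = 100000
differed = sum(random.choice(agents) != random.choice(agents)
               for _ in range(trials))
print('the two picks differed in %.1f%% of trials' % (100.0 * differed / trials))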

Occasionally you may want to clean up the trigger subnets:

openstack subnet list | awk '/ trigger-reschedule-subnet-/ { print $2 }' | xargs -r openstack subnet delete

########

We can use this reproduction in multiple ways:

1) To at least manually test the current proposed fix: https://review.opendev.org/288271

2) Or maybe to propose a better one. I don't understand why change #288271 does not make the original agent scheduling transactional, but instead chooses to do a later cleanup of over-scheduled agents.

Clearly we cannot move the scheduling inside the subnet create transaction (considering that re-schedule trigger), since the scheduling is triggered by the AFTER_CREATE event, which by definition fires after the commit.

But we could either:

a) Put the scheduling into its own transaction, if we don't allow the schedulers to call into remote systems (a toy model of the race and of this fix follows the link below):

https://opendev.org/openstack/neutron/src/commit/a309dee7c58ef43d5985fb2e84b839134bca6b9c/neutron/scheduler/base_scheduler.py#L45-L53
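
Self-contained toy model (illustrative Python, not Neutron code; a threading.Lock stands in for a DB transaction):

import random
import threading
import time

AGENTS = ['dhcp-agent-on-devstack0', 'dhcp-agent-on-devstack0a']
DHCP_AGENTS_PER_NETWORK = 1

bindings = set()        # agents currently hosting the network's dhcp
txn = threading.Lock()  # stand-in for one DB transaction

def schedule_racy():
    # mimics the current code path: check and select outside any
    # transaction, bind only later
    if len(bindings) >= DHCP_AGENTS_PER_NETWORK:
        return
    chosen = random.choice(AGENTS)
    time.sleep(0.1)  # the injected sleep(30), scaled down
    bindings.add(chosen)

def schedule_transactional():
    # option a): re-check and bind inside one "transaction"
    with txn:
        if len(bindings) >= DHCP_AGENTS_PER_NETWORK:
            return
        bindings.add(random.choice(AGENTS))

def run_twice(target):
    bindings.clear()
    threads = [threading.Thread(target=target) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bindings

print('racy:          %s' % run_twice(schedule_racy))           # often 2 agents
print('transactional: %s' % run_twice(schedule_transactional))  # never more than 1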

b) Or, even if we allow the schedulers to call into remote systems, we could (in a transaction) take the chosen agents (the return value of select(), produced outside of any transaction) together with the currently scheduled agents, and arbitrarily discard freshly chosen agents until we meet exactly the number prescribed in the configuration. A sketch of that trimming step follows.
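
Roughly like this (illustrative Python; the function and argument names are mine, not Neutron's):

def trim_chosen(chosen_agents, hosted_agents, agents_per_network):
    # run inside one transaction: discard freshly chosen agents so that
    # already-bound plus newly-bound never exceeds the configured number
    still_needed = max(0, agents_per_network - len(hosted_agents))
    new_agents = [a for a in chosen_agents if a not in hosted_agents]
    return new_agents[:still_needed]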

Wouldn't one of these be simpler and more correct than https://review.opendev.org/288271?

What do you think?