Our users reported problems related to this bug, so I worked on it and managed to reproduce it.
The basic trick is to inject a sufficiently long sleep at the right place in the code to make the race condition worse and easier to trigger. Here's how:
* You need a multi-host devstack: at least one all-in-one host (below: devstack0) and one compute host (below: devstack0a). Make sure both hosts run a fully functional dhcp-agent.
* Use ChanceScheduler for dhcp-agent scheduling. The problem is reproducible with other schedulers too, but it's much harder to trigger: with ChanceScheduler it's relatively easy to get multiple concurrent schedulings to produce different agents, while with almost deterministic schedulers like WeightScheduler that happens far less often.
neutron.conf:
[agent]
network_scheduler_driver = neutron.scheduler.dhcp_agent_scheduler.ChanceScheduler
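To see why a chance-based scheduler makes the race easy to hit, here is a toy sketch of the two selection behaviours discussed above. These are hypothetical stand-ins, not neutron's actual scheduler classes: two concurrent schedulings that see identical state always agree under a deterministic, load-based pick, but diverge about half the time under a uniformly random pick.

```python
import random

def chance_select(agents, num_agents=1):
    # ChanceScheduler-style: uniformly random choice among hostable agents
    return random.sample(agents, num_agents)

def weight_select(agents, load, num_agents=1):
    # WeightScheduler-style: always picks the least-loaded agents, so two
    # schedulings that read the same load pick the same agents
    return sorted(agents, key=lambda a: load[a])[:num_agents]

agents = ["devstack0", "devstack0a"]
load = {"devstack0": 3, "devstack0a": 1}

# two concurrent schedulings seeing identical state:
print(weight_select(agents, load), weight_select(agents, load))  # always equal
print(chance_select(agents), chance_select(agents))  # differ ~50% of the time
```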
* Make sure neutron is able to process at least two API requests concurrently, for example by setting 'api_workers' to at least 2. The default value of 'api_workers' is okay on any multiprocessor system.
* Use the old neutron client for manual agent scheduling operations. I found multiple silent bugs in osc's agent scheduling operations, which I will report in other bug reports in the coming days.
neutron dhcp-agent-list-hosting-net NETWORK
neutron dhcp-agent-network-add AGENT NETWORK
neutron dhcp-agent-network-remove AGENT NETWORK
* The relevant neutron configuration in my devstack was this:

network_scheduler_driver = neutron.scheduler.dhcp_agent_scheduler.ChanceScheduler
network_auto_schedule = True
allow_automatic_dhcp_failover = True
dhcp_agents_per_network = 1
enable_services_on_agents_with_admin_state_down = False
* Versions:
devstack 1f6bea17
neutron 10c5f451ce (plus two additional lines of code described below)
* Agent re-scheduling can be triggered by 'subnet create' calls or by agents going to status=DOWN. Since the former is simpler to do, we'll use it to trigger re-schedules here.
* Add a sleep() to the line before the call to bind() here:

https://opendev.org/openstack/neutron/src/commit/a309dee7c58ef43d5985fb2e84b839134bca6b9c/neutron/scheduler/base_scheduler.py#L52

Like this:

50 chosen_agents = self.select(plugin, context, hostable_agents,
51                             hosted_agents, num_agents)
52 import time
53 time.sleep(30)
54 # bind the resource to the agents
55 self.resource_filter.bind(context, chosen_agents, resource['id'])
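What the sleep buys us can be seen in a short toy model (plain Python threads with hypothetical names, not neutron code): both schedulings read the hosted-agent count before either one binds, so both decide the network still needs an agent.

```python
import threading
import time

# Toy model of the race the injected sleep(30) widens: scheduling is a
# select() outside any transaction, followed by a separate bind().

DHCP_AGENTS_PER_NETWORK = 1
bound_agents = set()      # agents hosting the network's dhcp
lock = threading.Lock()   # protects only the write, like the bind() insert

def schedule(agent):
    # both concurrent schedulings read "0 agents bound" here
    num_to_add = DHCP_AGENTS_PER_NETWORK - len(bound_agents)
    chosen = [agent] if num_to_add > 0 else []
    time.sleep(0.1)       # stands in for the injected sleep(30) before bind()
    with lock:
        bound_agents.update(chosen)  # each scheduling binds its own choice

t1 = threading.Thread(target=schedule, args=("devstack0",))
t2 = threading.Thread(target=schedule, args=("devstack0a",))
t1.start(); t2.start()
t1.join(); t2.join()

print(len(bound_agents))  # 2 agents bound despite a target of 1
```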
* Pick or create a network with a subnet; here we'll use the 'private' network pre-created by devstack.
* De-schedule all agents from the network. For example:

# check which agents host the network's dhcp:
neutron dhcp-agent-list-hosting-net private
# de-schedule them
neutron dhcp-agent-network-remove "$( openstack network agent list --agent-type dhcp --host devstack0 -f value -c ID )" private
neutron dhcp-agent-network-remove "$( openstack network agent list --agent-type dhcp --host devstack0a -f value -c ID )" private
Now this network has 0 dhcp servers, which is below the targeted 1 dhcp server.
* Trigger two re-schedules concurrently - that is in quick succession compared to the 30s sleep we injected:
for i in 1 2 ; do openstack subnet create trigger-reschedule-subnet-$i --network private --subnet-pool shared-default-subnetpool-v4 >/dev/null & done ; while true ; do date --rfc-3339=s ; neutron dhcp-agent-list-hosting-net private ; jobs ; echo ; sleep 5 ; done
The while loop above monitors the time, completion of the subnet create calls and the agents hosting the private network's dhcp.
We expect each API call to complete in 30-something seconds: slightly more than 30s because of the sleep(30) we injected, but the two calls together in less than 2*30s because they should be processed concurrently.
* Repeat the previous step multiple times, since it will not always reproduce the over-scheduling. Here we must consider how the configured agent scheduler selects agents. With two agents and two concurrent schedulings using the ChanceScheduler, there is a 50% chance that the two schedulings select the same agent and a 50% chance that they select different agents. We will only see the over-scheduling effect when they selected different agents.
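The 50% estimate above is easy to sanity-check with a quick simulation (toy code, not neutron's scheduler): two independent uniform picks from two agents land on different agents about half the time.

```python
import random

agents = ["devstack0", "devstack0a"]
trials = 100_000
# count the trials where two independent random picks disagree
different = sum(random.choice(agents) != random.choice(agents)
                for _ in range(trials))
print(different / trials)  # ~0.5
```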
Occasionally you may want to clean up the trigger subnets:
openstack subnet list | awk '/trigger-reschedule-subnet-/ { print $2 }' | xargs -r openstack subnet delete
########
We can use this reproduction in multiple ways:
1) To at least manually test the current proposed fix: https://review.opendev.org/288271
2) Or maybe to propose a better one. I don't understand why change #288271 does not make the original agent scheduling transactional but instead chooses to do a later cleanup of over-scheduled agents.
Clearly we cannot move the scheduling inside the subnet create transaction (considering that re-schedule trigger), since the scheduling is triggered by the AFTER_CREATE event, which is by definition after the commit.
But we could either:
a) Put the scheduling into its own transaction if we don't allow the schedulers to call into remote systems:
https://opendev.org/openstack/neutron/src/commit/a309dee7c58ef43d5985fb2e84b839134bca6b9c/neutron/scheduler/base_scheduler.py#L45-L53
b) Or, even if we allow the schedulers to call into remote systems, we could (in a transaction) take the chosen agents (the return value of select(), produced outside of any transaction) together with the currently scheduled agents, and throw away as many of the freshly chosen agents as needed to meet exactly the number prescribed in the configuration.
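Option b) can be sketched in a few lines (toy model with hypothetical names, not neutron code; a lock stands in for a DB transaction on the bindings): bind re-reads the currently scheduled agents inside the transaction and trims the surplus from the freshly chosen ones.

```python
import threading

DHCP_AGENTS_PER_NETWORK = 1
bound_agents = set()
lock = threading.Lock()   # stands in for a DB transaction on the bindings

def bind_trimmed(chosen_agents):
    # chosen_agents is the select() result, computed outside any
    # transaction and therefore possibly stale
    with lock:
        missing = DHCP_AGENTS_PER_NETWORK - len(bound_agents)
        for agent in list(chosen_agents)[:max(missing, 0)]:
            bound_agents.add(agent)   # bind only as many as still needed

# two concurrent schedulings whose select() picked different agents:
bind_trimmed(["devstack0"])    # sees 0 bound, binds 1
bind_trimmed(["devstack0a"])   # sees 1 bound, binds 0

print(bound_agents)  # only 1 agent bound: the configured count is met
```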
Wouldn't one of these be simpler and more correct than https://review.opendev.org/288271?
What do you think?