Allocation write contention on resource provider generation is a significant issue when launching many VMs onto one or a very small number of compute nodes, as can happen in a clustered environment such as the vmwareapi virt driver.
The retry handling at https://github.com/openstack/nova/blob/d687e7d29b37b3cdc9e1bc429dec3a01be298f80/nova/scheduler/client/report.py#L103-L123 is insufficient when something like 1200 VMs are being created, because there is always another VM being created concurrently for the same compute node.
Server-side retries will help because they involve less latency, but they won't fully fix the problem, as the architecture assumes there will be at least some horizontal diversity in resource providers.
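
To illustrate the failure mode, here is a minimal, self-contained sketch of generation-based optimistic concurrency with a bounded retry loop. It is not the nova or placement code: FakeProvider, GenerationConflict, claim_with_retries and the competing_writes parameter are hypothetical names made up for this example, and the "rival" writer is simulated deterministically rather than with real concurrency. The point is only that when a competing write to the same provider lands before every one of our attempts, a fixed number of retries is exhausted without ever succeeding.

```python
class GenerationConflict(Exception):
    """Raised when a write carries a stale provider generation."""


class FakeProvider:
    """In-memory stand-in for a resource provider (hypothetical, not nova code)."""

    def __init__(self):
        self.generation = 0
        self.allocations = []

    def write_allocation(self, consumer, expected_generation):
        # Compare-and-swap: accept the write only if the caller saw the
        # latest generation; otherwise report a conflict.
        if expected_generation != self.generation:
            raise GenerationConflict(consumer)
        self.allocations.append(consumer)
        self.generation += 1


def claim_with_retries(provider, consumer, max_attempts, competing_writes):
    """Try to allocate against one provider with a bounded number of attempts.

    competing_writes is how many rival writes get in between our read and
    our write, one per attempt, until they run out.
    Returns the attempt number that succeeded, or None if all conflicted.
    """
    for attempt in range(1, max_attempts + 1):
        seen = provider.generation  # read the current generation
        if competing_writes > 0:
            # A rival request for the same provider wins the race.
            provider.write_allocation(f"rival-{attempt}", seen)
            competing_writes -= 1
        try:
            provider.write_allocation(consumer, seen)
            return attempt
        except GenerationConflict:
            continue  # stale generation: re-read and try again
    return None


# Two rivals, three attempts: the third attempt gets through.
light = FakeProvider()
light_result = claim_with_retries(light, "vm-a", max_attempts=3, competing_writes=2)

# A rival before every attempt: the retry budget runs out.
busy = FakeProvider()
busy_result = claim_with_retries(busy, "vm-b", max_attempts=3, competing_writes=3)
```

In this sketch, spreading the same requests over several providers would give each provider its own generation counter, which is the horizontal diversity the architecture assumes; with a single provider every request contends on one counter.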