default api loop count / intervals can't cope with 40 machine clusters
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ironic | Expired | Undecided | Unassigned |
Bug Description
We have ironic configured with 5 workers (as currently recommended), and the ironic driver configured with only the following set:
[ironic]
admin_username = ironic
admin_password = ....
admin_url = http://
admin_tenant_name = service
Deleting a cluster of 40 machines consistently throws 5-10 of them into error state from the api deleting loop.
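The defaults are 5 retries, 2 seconds apart, so a caller gives up after roughly 10 seconds. With 5 workers, if each request took 1 second, we'd spend 8 seconds (40 / 5) just working through the delete requests for the cluster, which leaves almost no headroom before the retries run out.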
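To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simple model (requests spread evenly across workers, every request taking the same time); the function names are made up for illustration and are not part of ironic or the driver.

```python
import math


def retry_window_seconds(max_retries, retry_interval):
    """Rough upper bound on how long a caller keeps retrying before giving up."""
    return max_retries * retry_interval


def drain_time_seconds(machines, workers, seconds_per_request):
    """Time for the workers to get through one request per machine,
    assuming the requests are spread evenly across the workers."""
    return math.ceil(machines / workers) * seconds_per_request


# Numbers from this report: 40 machines, 5 workers, 5 retries 2 seconds apart.
window = retry_window_seconds(max_retries=5, retry_interval=2)

for per_request in (1, 2, 3):  # hypothetical per-request latencies in seconds
    drain = drain_time_seconds(machines=40, workers=5, seconds_per_request=per_request)
    verdict = "fits" if drain <= window else "exceeds"
    print(f"{per_request}s per request: {drain}s of work {verdict} the {window}s retry window")
```

Under this model, anything slower than about a second per request already pushes the last machines past the retry window, which would be consistent with a handful of nodes ending up in error state.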
I suspect this falls in the 'async apis are needed mmmkay' category, but it would be good to handle larger setups more tolerantly by default.
For now I suggest changing the defaults to 60 retries. In future, maybe we could move the unassociation into Ironic: start with a power off (and wait for that), then unplug, and so on, and let Ironic do the later work on its own.
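As an illustration of why the retry count matters, here is a minimal sketch of a bounded retry loop. It is not the driver's actual code: `call_with_retries`, `RetriesExhausted`, `make_slow_operation` and `no_sleep` are hypothetical names, and the "operation" is a fake that only succeeds after a fixed number of attempts.

```python
import time


class RetriesExhausted(Exception):
    """Raised when the operation never reports success."""


def call_with_retries(operation, max_retries=60, retry_interval=2.0, sleep=time.sleep):
    """Call operation() until it returns True; return the attempt number used."""
    for attempt in range(1, max_retries + 1):
        if operation():
            return attempt
        if attempt < max_retries:
            sleep(retry_interval)  # wait before the next attempt
    raise RetriesExhausted(f"gave up after {max_retries} attempts")


def make_slow_operation(ready_on_attempt):
    """Fake operation that only succeeds from the given attempt onwards."""
    calls = {"count": 0}

    def operation():
        calls["count"] += 1
        return calls["count"] >= ready_on_attempt

    return operation


def no_sleep(seconds):
    """Stand-in for time.sleep so the example runs instantly."""


# A node that needs 8 attempts errors out with 5 retries but passes with 60.
try:
    call_with_retries(make_slow_operation(8), max_retries=5, sleep=no_sleep)
    outcome_with_5 = "ok"
except RetriesExhausted:
    outcome_with_5 = "error"

attempts_with_60 = call_with_retries(make_slow_operation(8), max_retries=60, sleep=no_sleep)
print(outcome_with_5, attempts_with_60)
```

The larger limit costs nothing when requests finish quickly, since the loop returns on the first success; it only changes how long a slow node is tolerated before being marked as failed.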
This bug is separate to the failing-
Changed in ironic: | |
assignee: | Robert Collins (lifeless) → Chris Jones (cmsj) |
Changed in ironic: | |
assignee: | Chris Jones (cmsj) → Robert Collins (lifeless) |
Changed in ironic: | |
importance: | Undecided → High |
Changed in ironic: | |
assignee: | Robert Collins (lifeless) → Ruby Loo (rloo) |
Changed in ironic: | |
assignee: | Ruby Loo (rloo) → Robert Collins (lifeless) |
Changed in ironic: | |
milestone: | none → juno-3 |
Changed in ironic: | |
milestone: | next → none |
Changed in ironic: | |
assignee: | nobody → Galyna Zholtkevych (gzholtkevych) |
Fix proposed to branch: master
Review: https://review.openstack.org/93731