scheduler_max_attempts tunable hardcoded to 30

Bug #1673600 reported by Justin Kilpatrick
This bug affects 1 person
Affects: tripleo
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

https://github.com/openstack/instack-undercloud/blob/master/undercloud.conf.sample#L193

Exactly as the comment says, if you attempt to build an Overcloud of more than 30 nodes but forget to tune this value, your deploys will sit for a long time trying to schedule and then fail.

The undercloud.conf generator does in fact set this value for you:

http://ucw-bnemec.rhcloud.com/

But it's not noted as important anywhere else, which makes it easy to miss
and costly to fix if you do.

Once you actually manage to figure out what you missed, you have to redeploy your undercloud and try again, or take some other steps to work around the bug. A common practice in large deployments is to pin instances to specific bare metal nodes for ease of debugging hardware issues, and that practice hides this issue.

I'm happy to contribute some lines to the docs about this, but is there any way we can set this value to the size of the cloud you're attempting to deploy, so that the user doesn't have yet another mandatory configuration option?
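
As a rough illustration of that idea, here is a minimal sketch of a pre-deploy check that compares the configured value against the planned node count. It is not part of instack-undercloud or TripleO. The function name `check_scheduler_max_attempts` and the `planned_nodes` parameter are hypothetical, and the sketch assumes the option lives in the `[DEFAULT]` section of undercloud.conf and falls back to 30 when unset.

```python
import configparser

# Assumption: 30 is the value used when the option is left unset,
# as stated in the bug title.
DEFAULT_MAX_ATTEMPTS = 30


def check_scheduler_max_attempts(conf_text, planned_nodes):
    """Compare scheduler_max_attempts with the planned node count.

    Returns a tuple (configured, recommended, ok) where `ok` is True
    when the configured value is at least `planned_nodes`.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    # Assumption: the option is read from the [DEFAULT] section.
    configured = parser.getint(
        "DEFAULT", "scheduler_max_attempts", fallback=DEFAULT_MAX_ATTEMPTS
    )

    # Never recommend lowering an already sufficient value.
    recommended = max(configured, planned_nodes)
    return configured, recommended, configured >= planned_nodes
```

A deploy wrapper could call this with the contents of undercloud.conf and the number of nodes in the plan, then warn (or refuse to continue) when `ok` is False instead of letting scheduling time out.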

description: updated
Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
milestone: none → pike-1
Changed in tripleo:
milestone: pike-1 → pike-2
Changed in tripleo:
milestone: pike-2 → pike-3
Changed in tripleo:
milestone: pike-3 → pike-rc1
Changed in tripleo:
milestone: pike-rc1 → queens-1
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Emilien Macchi (emilienm) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (FUTURE, PIKE, QUEENS, ROCKY, STEIN).
  Valid example: CONFIRMED FOR: FUTURE

Changed in tripleo:
importance: Medium → Undecided
status: Triaged → Expired