Comment 10 for bug 1623606

James Slagle (james-slagle) wrote :

The problem seems to be in updating the existing plan during the deployment from the CLI. This error has only occurred in multinode jobs (not ovb). The first occurrences started with this patch:
https://review.openstack.org/#/c/368760/

The error does not happen 100% of the time, but it can be seen on some earlier CI results of that patch.

We've tried several workarounds to address the problem. The first was reducing the number of mistral workers (since ovb has fewer vCPUs than multinode and does not see this issue):
https://review.openstack.org/370847
That failed the same way.

We also tried deleting the existing plan before starting the deployment:
https://review.openstack.org/#/c/370857/
That also failed with the messaging timeout, but it exposed the issue that the default plan may not have finished being created before we start the overcloud deployment. When we start the deployment and then try to update the plan, we could be tripping over ourselves and causing this error.

Dougal has a patch that waits until the default plan has been created, which may be the true fix:
https://review.openstack.org/#/c/369247/
But that was not being tested appropriately due to:
https://bugs.launchpad.net/tripleo/+bug/1623891
where patches were not being tested with delorean because the delorean db was unintentionally deleted.
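
To illustrate the "wait until the default plan is created" idea, here is a minimal polling sketch. It is not the code from Dougal's patch. The `plan_is_ready` callable is a hypothetical stand-in for whatever check confirms the plan exists, and the timeout and interval values are assumptions.

```python
import time


def wait_for_plan(plan_is_ready, timeout=60.0, interval=2.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until plan_is_ready() returns True or the timeout expires.

    plan_is_ready is a hypothetical callable supplied by the caller; it
    should return True once the default plan has finished being created.
    Returns True if the plan became ready, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if plan_is_ready():
            return True
        if clock() >= deadline:
            return False
        # Back off before checking again so we do not hammer the service.
        sleep(interval)
```

The deployment would only go on to update the plan once this returns True, which would avoid the race described above if that race is in fact the cause.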

As of now, we are attempting to land this revert:
https://review.openstack.org/#/c/370434/
However, given the other CI issue with patches not being tested correctly, we can't land that revert until we temporarily make the multinode job non-voting:
https://review.openstack.org/370922

Once that project-config patch lands, we plan to land these 3 patches:
https://review.openstack.org/#/c/370434/ (fixes this bug)
https://review.openstack.org/#/c/370250/ (separate issue needed to bring ovb back)
https://review.openstack.org/#/c/369792/ (fixes bug with CI not testing patches)

We will then re-enable the multinode job as voting.