Comment 2 for bug 1721093

Paul Belanger (pabelanger) wrote :

This is actually the result of zuul.o.o losing its connection to ZooKeeper (which runs on nodepool.o.o). When that happens, nodepool-launcher sees the locks on the ZooKeeper requests being removed.

nodepool-launcher then deletes all the nodes in the gate, out from under the running jobs, and Ansible can no longer SSH into those nodes (because they are gone).

The fix is to move to the ZooKeeper cluster (zk01 / zk02 / zk03) and update zuul.o.o and nodepool-launcher to use it.
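
For illustration only, here is a rough sketch of what pointing a client at the three-node cluster might look like, and why three nodes help. The short hostnames are the ones named above; the port (2181, ZooKeeper's default client port) and the helper names are assumptions on my part, not taken from the actual zuul or nodepool configuration.

```python
# Hosts are the ones named in this comment. The port is an assumption:
# 2181 is ZooKeeper's default client port, the bug does not state it.
ZK_HOSTS = ["zk01", "zk02", "zk03"]
ZK_PORT = 2181


def connection_string(hosts, port=ZK_PORT):
    """Hypothetical helper: build the comma-separated host:port list
    that ZooKeeper clients accept as a connection string."""
    return ",".join(f"{host}:{port}" for host in hosts)


def tolerated_failures(ensemble_size):
    """ZooKeeper needs a strict majority of servers up, so an ensemble
    of n servers keeps working with up to (n - 1) // 2 of them down."""
    return (ensemble_size - 1) // 2


if __name__ == "__main__":
    print(connection_string(ZK_HOSTS))
    print(tolerated_failures(len(ZK_HOSTS)))
```

With a single ZooKeeper server, tolerated_failures(1) is 0, so any outage of that one server drops the client sessions. With three servers, one can be down and the others still form a majority.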