Hi,
When migrating neutron routers between hosts/neutron-gateways, jenkins-slave dies as follows:
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: INFO: Setting up agent: jenkins-slave-xenial-12
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: WARNING: No Working Directory. Using the legacy JAR Cache location: /var/lib/jenkins/.jenkins/cache/jars
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: INFO: Locating server among [https://jenkins.ols.canonical.com/online-services/, http://jenkins-be.internal:8080/online-services/]
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: Oct 13, 2019 11:30:26 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: Oct 13, 2019 11:30:26 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: Oct 13, 2019 11:30:26 PM org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer onRecv
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: INFO: [JNLP4-connect connection to 10.25.200.124/10.25.200.124:48484] Local headers refused by remote: jenkins-slave-xenial-12 is already connected to this master. Rejecting this connection.
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: jenkins-slave-xenial-12 is already connected to this master. Rejecting this connection.
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: jenkins-slave-xenial-12 is already connected to this master. Rejecting this connection.
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:378)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.onRecvClosed(ConnectionHeadersFilterLayer.java:433)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:172)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1500(BIONetworkLayer.java:48)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:247)
We should have the service script retry.
In bionic deploys, there's already a systemd unit file.
It has:
Restart=on-failure
Which I think means a near-instant restart (systemd's default RestartSec is 100ms). What I've seen is that just after the above, jenkins-slave tries to start a bunch of times, fails each time, and reaches the "max-restart" state, at which point systemd stops retrying.
I think we just need to add a pause between restarts (RestartSec=) in the systemd unit file.
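
For illustration, a minimal sketch of what that could look like as a drop-in. The unit name (jenkins-slave.service), the drop-in path and the 30 second delay are all assumptions, not taken from the actual deployment:

```ini
# Hypothetical path: /etc/systemd/system/jenkins-slave.service.d/restart-delay.conf
# Assumes the unit is called jenkins-slave.service; adjust to the real name.
[Service]
# Already present in the bionic unit file; repeated here for context.
Restart=on-failure
# Wait between restart attempts instead of the 100ms default.
# 30 is an illustrative value, long enough for the master to drop the
# stale connection is the hope, but untested.
RestartSec=30
```

After adding the file, "systemctl daemon-reload" is needed for it to take effect. With restarts spaced this far apart, systemd's default start rate limit (5 starts within 10 seconds) should no longer be hit.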