Jenkins CI Agent Charm

Make jenkins-slave more resilient, ship out systemd service to retry

Bug #1847939 reported by Haw Loeung on 2019-10-14

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Jenkins CI Agent Charm	Fix Released	Low	Haw Loeung

Bug Description

Hi,

When migrating neutron routers between hosts/neutron-gateways, jenkins-slave dies as follows:

| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: INFO: Setting up agent: jenkins-slave-xenial-12
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: WARNING: No Working Directory. Using the legacy JAR Cache location: /var/lib/jenkins/.jenkins/cache/jars
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: INFO: Locating server among [https://jenkins.ols.canonical.com/online-services/, http://jenkins-be.internal:8080/online-services/]
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: Oct 13, 2019 11:30:26 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: Oct 13, 2019 11:30:26 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: Oct 13, 2019 11:30:26 PM org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer onRecv
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: INFO: [JNLP4-connect connection to 10.25.200.124/10.25.200.124:48484] Local headers refused by remote: jenkins-slave-xenial-12 is already connected to this master. Rejecting this connection.
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: jenkins-slave-xenial-12 is already connected to this master. Rejecting this connection.
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: jenkins-slave-xenial-12 is already connected to this master. Rejecting this connection.
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:378)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.onRecvClosed(ConnectionHeadersFilterLayer.java:433)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:172)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1500(BIONetworkLayer.java:48)
| Oct 13 23:30:26 juju-manual-jenkaas-4 bash[31920]: #011at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:247)

We should have the service script retry.

Related branches

~hloeung/jenkins-agent-charm:master

Merged into jenkins-agent-charm:master at revision 4965b020fb1d6614db32130f366190d19769903f

Paul Collins: Approve (lgtm) on 2019-10-15

Canonical IS Reviewers: Pending requested 2019-10-15

Haw Loeung (hloeung) on 2019-10-14

Changed in jenkins-slave-charm:
assignee:	nobody → Haw Loeung (hloeung)
description:	updated

Haw Loeung (hloeung) on 2019-10-14

Changed in jenkins-slave-charm:
status:	New → In Progress

Junien F (axino) wrote on 2019-10-14:

In bionic deploys, there's already a systemd unit file.
It has:
Restart=on-failure

Which I think means instant restart. What I've seen is that just after the above, jenkins-slave tries to start a bunch of times, fails each time, and reaches the "max-restart" state.

I think we just need to add a pause between restarts in the systemd unit file.
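
For illustration, a minimal sketch of what such a pause could look like as a systemd drop-in. This is not the change that was merged: the unit name jenkins-slave.service, the drop-in path, and the 30-second values are all assumptions. With stock systemd defaults, a unit is restarted after 100ms and is given up on after 5 starts within 10 seconds, which would match the rapid failures described above.

```ini
# Hypothetical drop-in: /etc/systemd/system/jenkins-slave.service.d/restart.conf
# (unit name and path are assumptions, not taken from the charm)

[Unit]
# Disable start rate limiting so the unit is never left in a failed state
# for restarting too often. On older systemd (e.g. xenial) this setting is
# spelled StartLimitInterval= and lives in [Service] instead.
StartLimitIntervalSec=0

[Service]
# Already present in the bionic unit file, repeated here for context.
Restart=on-failure
# Wait between restart attempts; 30 seconds is an illustrative value meant
# to give the master time to drop the stale agent connection.
RestartSec=30
```

After adding a drop-in like this, `systemctl daemon-reload` is needed before the new settings take effect.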

Haw Loeung (hloeung) on 2019-10-15

Changed in jenkins-slave-charm:
importance:	Undecided → Low

Haw Loeung (hloeung) on 2019-10-15

Changed in jenkins-slave-charm:
status:	In Progress → Fix Committed

Haw Loeung (hloeung) on 2019-10-15

Changed in jenkins-slave-charm:
status:	Fix Committed → Fix Released
