Jenkins errors killing many CI jobs

Bug #1543810 reported by Ben Nemec
This bug affects 2 people
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Critical     Unassigned

Bug Description

Seeing this a lot today. Almost every job is failing right now. Maybe connectivity issues between our CI cloud and infra?

2016-02-09 21:22:26.013 | FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
2016-02-09 21:22:26.016 | hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
2016-02-09 21:22:26.016 | at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
2016-02-09 21:22:26.016 | at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
2016-02-09 21:22:26.016 | at hudson.remoting.Request.call(Request.java:174)
2016-02-09 21:22:26.016 | at hudson.remoting.Channel.call(Channel.java:742)
2016-02-09 21:22:26.017 | at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
2016-02-09 21:22:26.017 | at com.sun.proxy.$Proxy42.join(Unknown Source)
2016-02-09 21:22:26.017 | at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956)
2016-02-09 21:22:26.017 | at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:137)
2016-02-09 21:22:26.017 | at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:97)
2016-02-09 21:22:26.017 | at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
2016-02-09 21:22:26.017 | at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
2016-02-09 21:22:26.017 | at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756)
2016-02-09 21:22:26.017 | at hudson.model.Build$BuildExecution.build(Build.java:198)
2016-02-09 21:22:26.017 | at hudson.model.Build$BuildExecution.doRun(Build.java:159)
2016-02-09 21:22:26.017 | at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
2016-02-09 21:22:26.018 | at hudson.model.Run.execute(Run.java:1706)
2016-02-09 21:22:26.018 | at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
2016-02-09 21:22:26.018 | at hudson.model.ResourceController.execute(ResourceController.java:88)
2016-02-09 21:22:26.018 | at hudson.model.Executor.run(Executor.java:232)
2016-02-09 21:22:26.018 | Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
2016-02-09 21:22:26.018 | at hudson.remoting.Request.abort(Request.java:299)
2016-02-09 21:22:26.018 | at hudson.remoting.Channel.terminate(Channel.java:805)
2016-02-09 21:22:26.018 | at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
2016-02-09 21:22:26.018 | Caused by: java.io.IOException: Unexpected termination of the channel
2016-02-09 21:22:26.018 | at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
2016-02-09 21:22:26.018 | Caused by: java.io.EOFException
2016-02-09 21:22:26.019 | at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
2016-02-09 21:22:26.019 | at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
2016-02-09 21:22:26.019 | at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
2016-02-09 21:22:26.019 | at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
2016-02-09 21:22:26.019 | at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
2016-02-09 21:22:26.019 | at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
2016-02-09 21:22:26.019 | at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
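
When many jobs fail at once, it can help to confirm they all end in the same root cause. The following is a minimal sketch, assuming console logs use the "timestamp | message" layout shown in the trace above; the function names and the shortened SAMPLE text are illustrative and not part of any Jenkins or tripleo-ci tooling. It strips the timestamp prefix and collects the "Caused by:" lines, the last of which is the innermost exception.

```python
import re

# Matches the "YYYY-MM-DD HH:MM:SS.mmm | " prefix seen on each console log line.
TIMESTAMP_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \| ")

CAUSED_BY = "Caused by: "


def cause_chain(log_text):
    """Return the text of each 'Caused by:' line, outermost cause first."""
    causes = []
    for line in log_text.splitlines():
        message = TIMESTAMP_PREFIX.sub("", line).strip()
        if message.startswith(CAUSED_BY):
            causes.append(message[len(CAUSED_BY):])
    return causes


def root_cause(log_text):
    """Return the last 'Caused by:' entry, or None if the log has none."""
    chain = cause_chain(log_text)
    return chain[-1] if chain else None


# Shortened excerpt of the trace in this report (stack frames omitted).
SAMPLE = """\
2016-02-09 21:22:26.013 | FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
2016-02-09 21:22:26.018 | Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
2016-02-09 21:22:26.018 | Caused by: java.io.IOException: Unexpected termination of the channel
2016-02-09 21:22:26.018 | Caused by: java.io.EOFException
"""

if __name__ == "__main__":
    print(root_cause(SAMPLE))
```

For the trace in this report the innermost cause is java.io.EOFException: the master hit end-of-stream while reading from the slave, which is what a dropped connection looks like from the master's side.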

Tags: alert
Derek Higgins (derekh) wrote :

Been digging into this a bit tonight. By the looks of it, our compute nodes have been losing their IP addresses (and with them the connection to the Jenkins slave). I'll continue looking into why tomorrow.
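
A quick way to spot nodes that have dropped off the network is to probe a TCP port on each one. This is only a rough sketch, not the check used here: it assumes the nodes normally accept connections on port 22, and the host list passed in is hypothetical. It reports which nodes are unreachable but says nothing about why an address was lost.

```python
import socket


def is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_nodes(hosts, port=22, timeout=3.0):
    """Return the hosts that do not accept a TCP connection on the given port."""
    return [host for host in hosts if not is_reachable(host, port, timeout)]
```

Calling unreachable_nodes() with the list of compute node addresses (which would have to come from the cloud's own inventory) returns the ones worth inspecting first.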

Marios Andreou (marios-b) wrote :

I see some things are passing now. For example, https://review.openstack.org/#/c/271940/5 has a recent run from this morning:

 Build succeeded (check-tripleo pipeline).

    gate-tripleo-ci-f22-ceph SUCCESS in 1h 37m 12s
    gate-tripleo-ci-f22-ha SUCCESS in 1h 40m 41s
    gate-tripleo-ci-f22-nonha SUCCESS in 1h 42m 31s

Derek Higgins (derekh) wrote :

I rebuilt most of the compute nodes last night. My current theory is that they fell over when we fixed the controller on Monday and opened the floodgates (lots of jobs were queued). Going to close this and reopen if it starts again in the next day or so.

Changed in tripleo:
status: Triaged → Fix Released