CI: rdocloud node randomly going offline during jobs

Bug #1729586 reported by Gabriele Cerami on 2017-11-02
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: Gabriele Cerami

Bug Description

The console log at

https://review.rdoproject.org/jenkins/job/periodic-tripleo-centos-7-master-containers-build/254/console

shows:

06:15:20 Slave went offline during the build
06:15:20 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
06:15:20 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
06:15:20 Caused by: java.io.EOFException
06:15:20 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
06:15:20 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
06:15:20 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
06:15:20 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
06:15:20 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
06:15:20 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
06:15:20 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
06:15:20
06:15:20 Build step 'Execute shell' marked build as failure
06:15:20 FATAL: remote file operation failed: /home/jenkins/secretFiles/66eec330-e861-4cec-9b22-d9f1e39aaa0c at hudson.remoting.Channel@6b4b7bb0:upstream-centos-7-rdo-cloud-tripleo-30473: hudson.remoting.ChannelClosedException: channel is already closed
06:15:20 java.io.IOException: remote file operation failed: /home/jenkins/secretFiles/66eec330-e861-4cec-9b22-d9f1e39aaa0c at hudson.remoting.Channel@6b4b7bb0:upstream-centos-7-rdo-cloud-tripleo-30473: hudson.remoting.ChannelClosedException: channel is already closed
06:15:20 at hudson.FilePath.act(FilePath.java:986)
06:15:20 at hudson.FilePath.act(FilePath.java:968)
06:15:20 at hudson.FilePath.deleteRecursive(FilePath.java:1170)
06:15:20 at org.jenkinsci.plugins.credentialsbinding.impl.FileBinding$UnbinderImpl.unbind(FileBinding.java:75)
06:15:20 at org.jenkinsci.plugins.credentialsbinding.impl.SecretBuildWrapper$1.tearDown(SecretBuildWrapper.java:68)
06:15:20 at hudson.model.Build$BuildExecution.doRun(Build.java:173)
06:15:20 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
06:15:20 at hudson.model.Run.execute(Run.java:1738)
06:15:20 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
06:15:20 at hudson.model.ResourceController.execute(ResourceController.java:98)
06:15:20 at hudson.model.Executor.run(Executor.java:410)
06:15:20 Caused by: hudson.remoting.ChannelClosedException: channel is already closed
06:15:20 at hudson.remoting.Channel.send(Channel.java:578)
06:15:20 at hudson.remoting.Request.call(Request.java:130)
06:15:20 at hudson.remoting.Channel.call(Channel.java:780)
06:15:20 at hudson.FilePath.act(FilePath.java:979)
06:15:20 ... 10 more
06:15:20 Caused by: java.io.IOException: Unexpected termination of the channel
06:15:20 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
06:15:20 Caused by: java.io.EOFException
06:15:20 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
06:15:20 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
06:15:20 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
06:15:20 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
06:15:20 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
06:15:20 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
06:15:20 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
06:15:20 [PostBuildScript] - Execution post build scripts.
06:15:20 FATAL: Unable to produce a script file
06:15:20 java.io.IOException: Failed to create a temp file on /home/jenkins/workspace/periodic-tripleo-centos-7-master-containers-build
06:15:20 at hudson.FilePath.createTextTempFile(FilePath.java:1382)
06:15:20 at hudson.tasks.CommandInterpreter.createScriptFile(CommandInterpreter.java:142)
06:15:20 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:80)
06:15:20 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:64)
06:15:20 at org.jenkinsci.plugins.postbuildscript.PostBuildScript.processBuildSteps(PostBuildScript.java:204)
06:15:20 at org.jenkinsci.plugins.postbuildscript.PostBuildScript.processScripts(PostBuildScript.java:143)
06:15:20 at org.jenkinsci.plugins.postbuildscript.PostBuildScript._perform(PostBuildScript.java:105)
06:15:20 at org.jenkinsci.plugins.postbuildscript.PostBuildScript.perform(PostBuildScript.java:85)
06:15:20 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
06:15:20 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
06:15:20 at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:723)
06:15:20 at hudson.model.Build$BuildExecution.post2(Build.java:185)
06:15:20 at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:668)
06:15:20 at hudson.model.Run.execute(Run.java:1763)
06:15:20 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
06:15:20 at hudson.model.ResourceController.execute(ResourceController.java:98)
06:15:20 at hudson.model.Executor.run(Executor.java:410)
06:15:20 Caused by: java.io.IOException: remote file operation failed: /home/jenkins/workspace/periodic-tripleo-centos-7-master-containers-build at hudson.remoting.Channel@6b4b7bb0:upstream-centos-7-rdo-cloud-tripleo-30473: hudson.remoting.ChannelClosedException: channel is already closed
06:15:20 at hudson.FilePath.act(FilePath.java:986)
06:15:20 at hudson.FilePath.act(FilePath.java:968)
06:15:20 at hudson.FilePath.createTextTempFile(FilePath.java:1356)
06:15:20 ... 16 more
06:15:20 Caused by: hudson.remoting.ChannelClosedException: channel is already closed
06:15:20 at hudson.remoting.Channel.send(Channel.java:578)
06:15:20 at hudson.remoting.Request.call(Request.java:130)
06:15:20 at hudson.remoting.Channel.call(Channel.java:780)
06:15:20 at hudson.FilePath.act(FilePath.java:979)
06:15:20 ... 18 more
06:15:20 Caused by: java.io.IOException: Unexpected termination of the channel
06:15:20 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
06:15:20 Caused by: java.io.EOFException
06:15:20 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
06:15:20 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
06:15:20 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
06:15:20 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
06:15:20 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
06:15:20 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
06:15:20 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
06:15:20 Build step 'Execute a set of scripts' marked build as failure
06:15:20 [PostBuildScript] - Execution post build scripts.
06:15:20 FATAL: Unable to produce a script file
06:15:20 java.io.IOException: Failed to create a temp file on /home/jenkins/workspace/periodic-tripleo-centos-7-master-containers-build
06:15:20 at hudson.FilePath.createTextTempFile(FilePath.java:1382)
06:15:20 at hudson.tasks.CommandInterpreter.createScriptFile(CommandInterpreter.java:142)
06:15:20 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:80)
06:15:20 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:64)
06:15:20 at org.jenkinsci.plugins.postbuildscript.PostBuildScript.processBuildSteps(PostBuildScript.java:204)
06:15:20 at org.jenkinsci.plugins.postbuildscript.PostBuildScript.processScripts(PostBuildScript.java:143)
06:15:20 at org.jenkinsci.plugins.postbuildscript.PostBuildScript._perform(PostBuildScript.java:105)
06:15:20 at org.jenkinsci.plugins.postbuildscript.PostBuildScript.perform(PostBuildScript.java:85)
06:15:20 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
06:15:20 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
06:15:20 at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:723)
06:15:20 at hudson.model.Build$BuildExecution.post2(Build.java:185)
06:15:20 at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:668)
06:15:20 at hudson.model.Run.execute(Run.java:1763)
06:15:20 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
06:15:20 at hudson.model.ResourceController.execute(ResourceController.java:98)
06:15:20 at hudson.model.Executor.run(Executor.java:410)
06:15:20 Caused by: java.io.IOException: remote file operation failed: /home/jenkins/workspace/periodic-tripleo-centos-7-master-containers-build at hudson.remoting.Channel@6b4b7bb0:upstream-centos-7-rdo-cloud-tripleo-30473: hudson.remoting.ChannelClosedException: channel is already closed
06:15:20 at hudson.FilePath.act(FilePath.java:986)
06:15:20 at hudson.FilePath.act(FilePath.java:968)
06:15:20 at hudson.FilePath.createTextTempFile(FilePath.java:1356)
06:15:20 ... 16 more
06:15:20 Caused by: hudson.remoting.ChannelClosedException: channel is already closed
06:15:20 at hudson.remoting.Channel.send(Channel.java:578)
06:15:20 at hudson.remoting.Request.call(Request.java:130)
06:15:20 at hudson.remoting.Channel.call(Channel.java:780)
06:15:20 at hudson.FilePath.act(FilePath.java:979)
06:15:20 ... 18 more
06:15:20 Caused by: java.io.IOException: Unexpected termination of the channel
06:15:20 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
06:15:20 Caused by: java.io.EOFException
06:15:20 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
06:15:20 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
06:15:20 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
06:15:20 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
06:15:20 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
06:15:20 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
06:15:20 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
06:15:20 Build step 'Execute a set of scripts' marked build as failure

On a single job, we've hit this error twice in the last day.

Changed in tripleo:
status: Confirmed → Triaged
Alan Pevec (apevec) wrote :

From #rdo
http://eavesdrop.openstack.org/irclogs/%23rdo/latest.log.html#t2017-11-02T15:53:45

* it might be an overloaded Jenkins master
* the nodepool node is the standard size (8 GB RAM, 8 CPUs) with the default 8 GB of swap,
  so we might need to trim down the job (it includes an undercloud (UC) install, which shouldn't be needed for a containers build)

Alfredo Moralejo (amoralej) wrote :

Note that container build jobs are not the only ones failing with this issue. We just got multiple job failures with it, including multinode, OVB and containers-build jobs. Since the failure happened simultaneously in different jobs, I'd say we can rule out the slaves as the root cause; the problem is more likely in the Jenkins master or an actual networking problem.
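The reasoning above (simultaneous failures on several different nodes point away from any single node) can be checked mechanically once failure events have been pulled out of the console logs. A minimal sketch, assuming each "Slave went offline during the build" occurrence has already been extracted as a timestamp plus node name; the `FailureEvent` type, the `correlated_failures` helper and the 60-second window are hypothetical illustrations, not part of any RDO or Jenkins tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FailureEvent:
    """One 'Slave went offline during the build' occurrence (hypothetical record)."""
    when: datetime
    node: str


def correlated_failures(events, window=timedelta(seconds=60)):
    """Group events that fall within `window` of the first event in their group.

    Only groups involving more than one distinct node are returned: those are
    the failures that suggest a shared component (Jenkins master or network)
    rather than a single bad node.
    """
    groups = []
    for event in sorted(events, key=lambda e: e.when):
        # Compare against the first event of the current group, not the last one.
        if groups and event.when - groups[-1][0].when <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return [g for g in groups if len({e.node for e in g}) > 1]
```

Anchoring each window on its first event keeps the sketch simple; a sliding window would instead merge long chains of closely spaced failures into one group.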

wes hayutin (weshayutin) wrote :

Removing the alert tag; the promotion blocker stays in place. This is being worked on by rdo-infra.

tags: removed: alert
Alan Pevec (apevec) wrote :

Quick-fix workaround: max-servers was decreased in https://review.rdoproject.org/r/10591 to lower the load on RDO Jenkins.

Longer term: start migrating the TripleO periodic jobs running in review.rdoproject.org to Zuul v3 after the upgrade to Software Factory (SF) 2.7.

Changed in tripleo:
milestone: queens-2 → queens-3
Alan Pevec (apevec) wrote :

Let's close this: the workaround helped, and the tripleo-ci Zuul v3 migration is tracked elsewhere.

Changed in tripleo:
status: Triaged → Fix Released