Could not initialize class jenkins.model.Jenkins$MasterComputer

Bug #1260654 reported by Thierry Carrez
This bug affects 6 people
Affects                        Status        Importance  Assigned to     Milestone
OpenStack Core Infrastructure  Fix Released  Critical    Jeremy Stanley
OpenStack-Gate                 Fix Released  Critical    Jeremy Stanley

Bug Description

Test job fails on a Jenkins error:
http://logs.openstack.org/32/61532/1/gate/gate-heat-python27/5d7c9dc/

Building remotely on precise14 in workspace /home/jenkins/workspace/gate-heat-python27
2013-12-13 06:48:41.337 | FATAL: command execution failed
2013-12-13 06:48:41.338 | java.io.IOException: Remote call on precise14 failed
2013-12-13 06:48:41.338 | at hudson.remoting.Channel.call(Channel.java:722)
2013-12-13 06:48:41.339 | at hudson.Launcher$RemoteLauncher.launch(Launcher.java:862)
2013-12-13 06:48:41.339 | at hudson.Launcher$ProcStarter.start(Launcher.java:353)
2013-12-13 06:48:41.339 | at hudson.Launcher$ProcStarter.join(Launcher.java:360)
2013-12-13 06:48:41.340 | at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
2013-12-13 06:48:41.340 | at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
2013-12-13 06:48:41.340 | at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
2013-12-13 06:48:41.341 | at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
2013-12-13 06:48:41.343 | at hudson.model.Build$BuildExecution.build(Build.java:199)
2013-12-13 06:48:41.343 | at hudson.model.Build$BuildExecution.doRun(Build.java:160)
2013-12-13 06:48:41.343 | at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
2013-12-13 06:48:41.344 | at hudson.model.Run.execute(Run.java:1593)
2013-12-13 06:48:41.344 | at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
2013-12-13 06:48:41.344 | at hudson.model.ResourceController.execute(ResourceController.java:88)
2013-12-13 06:48:41.344 | at hudson.model.Executor.run(Executor.java:247)
2013-12-13 06:48:41.345 | Caused by: java.lang.NoClassDefFoundError: Could not initialize class jenkins.model.Jenkins$MasterComputer
2013-12-13 06:48:41.345 | at hudson.Launcher$LocalLauncher.<init>(Launcher.java:755)
2013-12-13 06:48:41.345 | at hudson.Launcher$RemoteLaunchCallable.call(Launcher.java:991)
2013-12-13 06:48:41.345 | at hudson.Launcher$RemoteLaunchCallable.call(Launcher.java:965)
2013-12-13 06:48:41.345 | at hudson.remoting.UserRequest.perform(UserRequest.java:118)
2013-12-13 06:48:41.346 | at hudson.remoting.UserRequest.perform(UserRequest.java:48)
2013-12-13 06:48:41.346 | at hudson.remoting.Request$2.run(Request.java:326)
2013-12-13 06:48:41.346 | at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
2013-12-13 06:48:41.346 | at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
2013-12-13 06:48:41.346 | at java.util.concurrent.FutureTask.run(FutureTask.java:166)
2013-12-13 06:48:41.347 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2013-12-13 06:48:41.347 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2013-12-13 06:48:41.347 | at java.lang.Thread.run(Thread.java:724)

Alan Pevec (apevec) wrote :

We have more bad slaves: now precise20, with a different NoClassDefFoundError:
 Could not initialize class org.apache.tools.ant.Location

https://jenkins02.openstack.org/job/gate-nova-python27/13176/console

Changed in openstack-ci:
status: New → Confirmed
Davanum Srinivas (DIMS) (dims-v) wrote :

logstash query: "Remote call on" AND "failed" shows a lot of failures in the last 12 hours.

precise14 - 854 hits
precise20 - 30 hits
devstack-precise-hpcloud-az2-850860 - 8 hits
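
A per-node tally like the one above can also be approximated offline from saved console logs. This is a minimal sketch, not the logstash query itself: it assumes the failure line has the same shape as the IOException message in the bug description, and the function name and the sample lines (other than the first) are made up for illustration.

```python
import re
from collections import Counter

# Matches the IOException message shown in the bug description,
# e.g. "java.io.IOException: Remote call on precise14 failed".
REMOTE_CALL_FAILED = re.compile(r"Remote call on (\S+) failed")


def count_failures_by_node(lines):
    """Return a Counter mapping node name to the number of matching lines."""
    hits = Counter()
    for line in lines:
        match = REMOTE_CALL_FAILED.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Illustrative input: the first line is from this bug's console log,
# the remaining lines are hypothetical.
sample = [
    "2013-12-13 06:48:41.338 | java.io.IOException: Remote call on precise14 failed",
    "2013-12-13 06:48:41.337 | FATAL: command execution failed",
    "java.io.IOException: Remote call on precise20 failed",
    "java.io.IOException: Remote call on precise14 failed",
]

for node, count in count_failures_by_node(sample).most_common():
    print(f"{node} - {count} hits")
```

A node that dominates the tally, as precise14 does in the numbers above, is a likely candidate for being taken out of service.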

Jeremy Stanley (fungi) wrote :

devstack-precise-hpcloud-az2-850860 would have been deleted right away after this impacted it... chances are it was reused after the jenkins01 crash and restart.

As for precise14 and precise20, they were exhibiting Jenkins slave agent communication issues. Most likely something happened to agent communication between those two and jenkins02 after we performed a planned restart of it to prevent the JVM out-of-memory condition which had caused jenkins01 to shoot itself in the head.

I took both affected slaves out of service in jenkins02, rebooted them for good measure, then disconnected and relaunched the slave agent, making sure it succeeded on both. Then I watched a job run to completion successfully on each, so they should be okay at this point.

Near-term preventive measures are already underway: we are migrating our current long-term-slave jobs to single-use bare (non-devstack) slaves managed by nodepool. We have already moved some infra jobs to them as dogfood, so hopefully this issue with long-term slaves going into rapid-fire job failure will soon be behind us.

Changed in openstack-ci:
status: Confirmed → Fix Released
importance: Undecided → Critical
assignee: nobody → Jeremy Stanley (fungi)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to elastic-recheck (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/97993

OpenStack Infra (hudson-openstack) wrote : Related fix merged to elastic-recheck (master)

Reviewed: https://review.openstack.org/97993
Committed: https://git.openstack.org/cgit/openstack-infra/elastic-recheck/commit/?id=dd20b10729e1a80566c8cd83cf81b632e11995de
Submitter: Jenkins
Branch: master

commit dd20b10729e1a80566c8cd83cf81b632e11995de
Author: Joe Gordon <email address hidden>
Date: Wed Jun 4 17:17:20 2014 -0700

    Suppress the graph for bug 1260654

    Bug 1260654 shouldn't actually fail changes because
    zuul recognizes that class of failure as an instance where it should
    retrigger jobs

    Change-Id: I19a5e19058197ef24fc9cd2495b95420763338e1
    Related-Bug: #1260654
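
The idea in the commit message, treating this class of failure as an infrastructure problem rather than a defect in the change under test, can be illustrated with a small check on console output. This is only a sketch under stated assumptions and not how zuul or elastic-recheck implement it: the marker list and the function name are hypothetical, and the two class names are simply the ones reported in this bug.

```python
# Hypothetical markers for slave-side infrastructure errors. The two
# messages are those reported in this bug; the list itself is an assumption.
INFRA_MARKERS = (
    "Could not initialize class jenkins.model.Jenkins$MasterComputer",
    "Could not initialize class org.apache.tools.ant.Location",
)


def is_infra_failure(console_text):
    """Return True if the console output contains a known infra marker."""
    return any(marker in console_text for marker in INFRA_MARKERS)


# Example: a fragment of the console log from the bug description.
log_fragment = (
    "Caused by: java.lang.NoClassDefFoundError: "
    "Could not initialize class jenkins.model.Jenkins$MasterComputer"
)
if is_infra_failure(log_fragment):
    print("infrastructure failure: job is a candidate for retriggering")
```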

Jeremy Stanley (fungi)
Changed in openstack-gate:
status: New → Fix Released
importance: Undecided → Critical
assignee: nobody → Jeremy Stanley (fungi)