test_builder.py / test_abort is failing on buildbot, but not locally

Bug #655559 reported by Julian Edwards
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Julian Edwards

Bug Description

Error in test lp.buildmaster.tests.test_builder.TestSlave.test_abort
Traceback (most recent call last):
_StringException: Text attachment: traceback
------------
Traceback (most recent call last):
  File "/srv/buildbot/slaves/launchpad/production-devel/build/orig_sourcecode/eggs/testtools-0.9.6-py2.5.egg/testtools/runtest.py", line 144, in _run_user
    return fn(*args)
  File "/srv/buildbot/slaves/launchpad/production-devel/build/orig_sourcecode/eggs/testtools-0.9.6-py2.5.egg/testtools/testcase.py", line 465, in _run_test_method
    testMethod()
  File "/srv/buildbot/slaves/launchpad/production-devel/build/lib/lp/buildmaster/tests/test_builder.py", line 544, in test_abort
    result = slave.abort()
  File "/srv/buildbot/slaves/launchpad/production-devel/build/lib/lp/buildmaster/model/builder.py", line 150, in abort
    return self._server.abort()
  File "/usr/lib/python2.5/xmlrpclib.py", line 1147, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.5/xmlrpclib.py", line 1437, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.5/xmlrpclib.py", line 1201, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib/python2.5/xmlrpclib.py", line 1340, in _parse_response
    return u.close()
  File "/usr/lib/python2.5/xmlrpclib.py", line 787, in close
    raise Fault(**self._stack[0])
Fault: <Fault 8002: 'error'>
------------
Text attachment: xmlrpc-log-file
------------
2010-10-06 03:08:54+0530 [-] Log opened.
2010-10-06 03:08:54+0530 [-] twistd 10.1.0 (/usr/bin/python 2.5.2) starting up.
2010-10-06 03:08:54+0530 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2010-10-06 03:08:54+0530 [-] twisted.web.server.Site starting on 8221
2010-10-06 03:08:54+0530 [-] Starting factory <twisted.web.server.Site instance at 0x1522ab8>
2010-10-06 03:08:54+0530 [-] daemon ready!
2010-10-06 03:08:54+0530 [HTTPChannel,0,127.0.0.1] Build log: RUN: /bin/echo ['echo', 'Forking build subprocess...']
2010-10-06 03:08:54+0530 [HTTPChannel,0,127.0.0.1] 127.0.0.1 - - [05/Oct/2010:21:38:54 +0000] "POST /rpc/rpc HTTP/1.0" 200 282 "-" "xmlrpclib.py/1.0.1 (by www.pythonware.com)"
2010-10-06 03:08:54+0530 [-] Build log: Forking build subprocess...
2010-10-06 03:08:54+0530 [HTTPChannel,1,127.0.0.1] Unhandled Error
 Traceback (most recent call last):
   File "/srv/buildbot/slaves/launchpad/production-devel/build/orig_sourcecode/eggs/Twisted-10.1.0-py2.5-linux-x86_64.egg/twisted/web/server.py", line 125, in process
     self.render(resrc)
   File "/srv/buildbot/slaves/launchpad/production-devel/build/orig_sourcecode/eggs/Twisted-10.1.0-py2.5-linux-x86_64.egg/twisted/web/server.py", line 132, in render
     body = resrc.render(self)
   File "/srv/buildbot/slaves/launchpad/production-devel/build/orig_sourcecode/eggs/Twisted-10.1.0-py2.5-linux-x86_64.egg/twisted/web/resource.py", line 210, in render
     return m(request)
   File "/srv/buildbot/slaves/launchpad/production-devel/build/orig_sourcecode/eggs/Twisted-10.1.0-py2.5-linux-x86_64.egg/twisted/web/xmlrpc.py", line 148, in render_POST
     d = defer.maybeDeferred(function, *args)
 --- <exception caught here> ---
   File "/srv/buildbot/slaves/launchpad/production-devel/build/orig_sourcecode/eggs/Twisted-10.1.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 125, in maybeDeferred
     result = f(*args, **kw)
   File "/srv/buildbot/slaves/launchpad/production-devel/build/lib/canonical/buildd/slave.py", line 640, in xmlrpc_abort
     self.slave.abort()
   File "/srv/buildbot/slaves/launchpad/production-devel/build/lib/canonical/buildd/slave.py", line 362, in abort
     self.manager.abort()
   File "/srv/buildbot/slaves/launchpad/production-devel/build/lib/canonical/buildd/slave.py", line 197, in abort
     self._subprocess.transport.signalProcess('TERM')
   File "/srv/buildbot/slaves/launchpad/production-devel/build/orig_sourcecode/eggs/Twisted-10.1.0-py2.5-linux-x86_64.egg/twisted/internet/process.py", line 336, in signalProcess
     raise ProcessExitedAlready()
 twisted.internet.error.ProcessExitedAlready:

2010-10-06 03:08:54+0530 [HTTPChannel,1,127.0.0.1] 127.0.0.1 - - [05/Oct/2010:21:38:54 +0000] "POST /rpc/rpc HTTP/1.0" 200 263 "-" "xmlrpclib.py/1.0.1 (by www.pythonware.com)"
2010-10-06 03:08:54+0530 [-] Iterating with success flag 0 against stage INIT
2010-10-06 03:08:54+0530 [-] Build log: RUN: /var/tmp/buildd/slavebin/unpack-chroot ['unpack-chroot', 'lp.buildmaster.tests.test_builder.TestSlave.test_abort-1', '/var/tmp/buildd/filecache/fake-chroot']
2010-10-06 03:08:54+0530 [-] Build log: Upon execvpe /var/tmp/buildd/slavebin/unpack-chroot ['unpack-chroot', 'lp.buildmaster.tests.test_builder.TestSlave.test_abort-1', '/var/tmp/buildd/filecache/fake-chroot'] in environment id 23013584
2010-10-06 03:08:54+0530 [-] :Traceback (most recent call last):
2010-10-06 03:08:54+0530 [-] File "/srv/buildbot/slaves/launchpad/production-devel/build/orig_sourcecode/eggs/Twisted-10.1.0-py2.5-linux-x86_64.egg/twisted/internet/process.py", line 411, in _fork
2010-10-06 03:08:54+0530 [-] executable, args, environment)
2010-10-06 03:08:54+0530 [-] File "/srv/buildbot/slaves/launchpad/production-devel/build/orig_sourcecode/eggs/Twisted-10.1.0-py2.5-linux-x86_64.egg/twisted/internet/process.py", line 457, in _execChild
2010-10-06 03:08:54+0530 [-] os.execvpe(executable, args, environment)
2010-10-06 03:08:54+0530 [-] File "/usr/lib/python2.5/os.py", line 362, in execvpe
2010-10-06 03:08:54+0530 [-] _execvpe(file, args, env)
2010-10-06 03:08:54+0530 [-] File "/usr/lib/python2.5/os.py", line 377, in _execvpe
2010-10-06 03:08:54+0530 [-] func(file, *argrest)
2010-10-06 03:08:54+0530 [-] OSError: [Errno 2] No such file or directory
2010-10-06 03:08:54+0530 [-] Iterating with success flag 1 against stage UNPACK
2010-10-06 03:08:54+0530 [-] Build log: RUN: /var/tmp/buildd/slavebin/remove-build ['remove-build', 'lp.buildmaster.tests.test_builder.TestSlave.test_abort-1']
2010-10-06 03:08:54+0530 [-] Received SIGTERM, shutting down.
2010-10-06 03:08:54+0530 [-] Build log: Upon execvpe /var/tmp/buildd/slavebin/remove-build ['remove-build', 'lp.buildmaster.tests.test_builder.TestSlave.test_abort-1'] in environment id 26217728
2010-10-06 03:08:54+0530 [-] :Traceback (most recent call last):
------------

Changed in soyuz:
status: New → Triaged
importance: Undecided → High
tags: added: tech-debt
Revision history for this message
Jonathan Lange (jml) wrote :

From #launchpad-dev,

<bigjools> jml: can you look at the prod_lp buildbot output, the test_abort that we added is failing. I'm going to disable it as I don't think there's a quick fix but you might see differently.
<jml> bigjools: at a first glance it looks like a race: a bug in the slave maybe
<bigjools> jml: we're not waiting for the build to start before trying to kill it
<wgrant> The issue is more that you're waiting too long, so the test config crashes.
<jml> in either case, bug in the slave
<jml> it's a server, it should handle people asking it to do things
<bigjools> yes
<bigjools> as I discussed with wgrant above, we need to catch that ProcessAlreadyExited and also deal with trying to abort something that's not started
<jml> yeah, that sounds about right
<wgrant> Or just fix the test config to sleep a lot.
<bigjools> no :)
<wgrant> The problem is that it gets too far. Not that it doesn't get far enough.
<wgrant> Adding sleeps won't delay the test suite.
<jml> wgrant: even so, the slave should catch the ProcessAlreadyExited exception
<wgrant> jml: Indeed. It happens a bit locally.
<wgrant> But blah twisted blah confusing.
<jml> not at all.
<jml> try: kill_process(); except ProcessAlreadyExited: pass # yay less work for me
<bigjools> I wonder if in this case it should not be raising exceptions here, the vast majority of use cases don't care if it's already exited.
<bigjools> a return value would be sufficient if you need to know
<jml> bigjools: I don't know the use case here, but lots of similar code just swallows the exception
<jml> bigjools: it's not so much "kill the process" as "ensure that it's dead"
<bigjools> jml: yes - that points to a flaw in the API IMO
<jml> bigjools: file a ticket on Twisted if you'd like
<bigjools> mebbe
<wgrant> jml: OK then, "but blah twisted blah confusing blah slave blah untested."

Revision history for this message
Launchpad QA Bot (lpqabot) wrote : Bug fixed by a commit

Fixed in stable r11681 (http://bazaar.launchpad.net/~launchpad-pqm/launchpad/stable/revision/11681) by a commit, but not testable.

Changed in soyuz:
assignee: nobody → Julian Edwards (julian-edwards)
milestone: none → 10.10
tags: added: qa-untestable
Changed in soyuz:
status: Triaged → Fix Committed
Curtis Hovey (sinzui)
Changed in soyuz:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.