Comment 1 for bug 369109

Julian Edwards (julian-edwards) wrote :

Today we had this in the log:

2010-04-15 23:53:23+0100 [-] Starting scanning cycle.
2010-04-15 23:56:46+0100 [-] Disabling builder: http://gourd.buildd:8221/ -- timed out
2010-04-15 23:56:46+0100 [-] Traceback (most recent call last):
2010-04-15 23:56:46+0100 [-] File "/srv/launchpad.net/codelines/soyuz-production-rev-9191/lib/lp/buildmaster/model/builder.py", line 205, in updateBuilderStatus
2010-04-15 23:56:46+0100 [-] builder.checkSlaveAlive()
2010-04-15 23:56:46+0100 [-] File "/srv/launchpad.net/codelines/soyuz-production-rev-9191/lib/lp/buildmaster/model/builder.py", line 320, in checkSlaveAlive
2010-04-15 23:56:46+0100 [-] if self.slave.echo("Test")[0] != "Test":
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/xmlrpclib.py", line 1147, in __call__
2010-04-15 23:56:46+0100 [-] return self.__send(self.__name, args)
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/xmlrpclib.py", line 1437, in __request
2010-04-15 23:56:46+0100 [-] verbose=self.__verbose
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/xmlrpclib.py", line 1185, in request
2010-04-15 23:56:46+0100 [-] errcode, errmsg, headers = h.getreply()
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/httplib.py", line 1199, in getreply
2010-04-15 23:56:46+0100 [-] response = self._conn.getresponse()
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/httplib.py", line 928, in getresponse
2010-04-15 23:56:46+0100 [-] response.begin()
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/httplib.py", line 385, in begin
2010-04-15 23:56:46+0100 [-] version, status, reason = self._read_status()
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/httplib.py", line 343, in _read_status
2010-04-15 23:56:46+0100 [-] line = self.fp.readline()
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/socket.py", line 331, in readline
2010-04-15 23:56:46+0100 [-] data = recv(1)
2010-04-15 23:56:46+0100 [-] timeout: timed out

It seems as though there are two competing timeout mechanisms:
1. the code in lib/lp/buildmaster/manager.py (QueryWithTimeoutProtocol)
2. lib/lp/buildmaster/model/builder.py (TimeoutTransport)

Different actions seem to cause timeouts in each of these. This is crap.

It also seems as though updateBuilderStatus() should catch the timeout exception shown above. When it doesn't, it produces the traceback above and leaves the builder disabled but with the build still on it.
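
A rough sketch of what catching the timeout might look like. This is not the real buildmaster code: FakeSlave, FakeBuilder, failBuilder(), current_build and update_builder_status() are hypothetical stand-ins made up for illustration, and clearing the build on timeout is only an assumption about the cleanup we would want.

```python
import socket


class FakeSlave:
    """Hypothetical stand-in for the XML-RPC slave proxy; always times out."""

    def echo(self, *args):
        raise socket.timeout("timed out")


class FakeBuilder:
    """Hypothetical stand-in for the builder model, not the real class."""

    def __init__(self, slave):
        self.slave = slave
        self.builderok = True
        self.failnotes = None
        self.current_build = "some-build"  # placeholder for the assigned build

    def checkSlaveAlive(self):
        # Same call as the one shown in the traceback.
        if self.slave.echo("Test")[0] != "Test":
            raise ValueError("Failed to echo OK")

    def failBuilder(self, reason):
        # Hypothetical helper: mark the builder as disabled and record why.
        self.builderok = False
        self.failnotes = reason


def update_builder_status(builder, log):
    """Check the slave, treating a socket timeout as an expected failure."""
    try:
        builder.checkSlaveAlive()
    except socket.timeout as e:
        # Log one line instead of letting the traceback escape.
        log.append("Disabling builder -- %s" % e)
        builder.failBuilder(str(e))
        # Assumption: the build should not stay attached to a dead builder.
        builder.current_build = None
        return False
    return True
```

With something like this the timeout is handled in one place, so the builder is not left disabled with a build still attached.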