Today we had this in the log:
2010-04-15 23:53:23+0100 [-] Starting scanning cycle.
2010-04-15 23:56:46+0100 [-] Disabling builder: http://gourd.buildd:8221/ -- timed out
2010-04-15 23:56:46+0100 [-] Traceback (most recent call last):
2010-04-15 23:56:46+0100 [-] File "/srv/launchpad.net/codelines/soyuz-production-rev-9191/lib/lp/buildmaster/model/builder.py", line 205, in updateBuilderStatus
2010-04-15 23:56:46+0100 [-] builder.checkSlaveAlive()
2010-04-15 23:56:46+0100 [-] File "/srv/launchpad.net/codelines/soyuz-production-rev-9191/lib/lp/buildmaster/model/builder.py", line 320, in checkSlaveAlive
2010-04-15 23:56:46+0100 [-] if self.slave.echo("Test")[0] != "Test":
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/xmlrpclib.py", line 1147, in __call__
2010-04-15 23:56:46+0100 [-] return self.__send(self.__name, args)
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/xmlrpclib.py", line 1437, in __request
2010-04-15 23:56:46+0100 [-] verbose=self.__verbose
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/xmlrpclib.py", line 1185, in request
2010-04-15 23:56:46+0100 [-] errcode, errmsg, headers = h.getreply()
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/httplib.py", line 1199, in getreply
2010-04-15 23:56:46+0100 [-] response = self._conn.getresponse()
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/httplib.py", line 928, in getresponse
2010-04-15 23:56:46+0100 [-] response.begin()
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/httplib.py", line 385, in begin
2010-04-15 23:56:46+0100 [-] version, status, reason = self._read_status()
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/httplib.py", line 343, in _read_status
2010-04-15 23:56:46+0100 [-] line = self.fp.readline()
2010-04-15 23:56:46+0100 [-] File "/usr/lib/python2.5/socket.py", line 331, in readline
2010-04-15 23:56:46+0100 [-] data = recv(1)
2010-04-15 23:56:46+0100 [-] timeout: timed out
It seems as though there are two competing ways of timing things out:
1. the code in lib/lp/buildmaster/manager.py (QueryWithTimeoutProtocol)
2. lib/lp/buildmaster/model/builder.py (TimeoutTransport)
Different actions seem to cause timeouts in each of these. This is crap.
It also seems as though updateBuilderStatus() should catch the above timeout exception. When it doesn't, it produces the traceback above and leaves the builder disabled but with the build still on it.
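To illustrate the suggestion above, here is a minimal sketch of catching the socket timeout around checkSlaveAlive(). It is not the actual Launchpad code: FakeSlave, the simplified Builder class, failBuilder(), resetCurrentBuild() and the log list are all hypothetical stand-ins, and freeing the build on timeout is an assumed policy rather than something confirmed here.

```python
import socket


class FakeSlave:
    """Hypothetical stand-in for the XML-RPC slave proxy."""

    def __init__(self, fail=False):
        self.fail = fail

    def echo(self, *args):
        # Simulate the slave not answering before the socket timeout.
        if self.fail:
            raise socket.timeout("timed out")
        return list(args)


class Builder:
    """Hypothetical, heavily simplified builder record."""

    def __init__(self, url, slave):
        self.url = url
        self.slave = slave
        self.builderok = True
        self.failnotes = None
        self.current_build = "some-build"

    def checkSlaveAlive(self):
        # Mirrors the check shown in the traceback.
        if self.slave.echo("Test")[0] != "Test":
            raise RuntimeError("Failed to echo OK")

    def failBuilder(self, reason):
        # Assumed helper: mark the builder as disabled.
        self.builderok = False
        self.failnotes = reason

    def resetCurrentBuild(self):
        # Assumed policy: free the build so it can be dispatched elsewhere.
        self.current_build = None


def updateBuilderStatus(builder, log):
    """Check the slave and handle a timeout instead of letting it escape."""
    try:
        builder.checkSlaveAlive()
    except socket.timeout as e:
        log.append("Disabling builder: %s -- %s" % (builder.url, e))
        builder.failBuilder(str(e))
        builder.resetCurrentBuild()
```

With the exception handled in one place, the builder is disabled and the build is released in the same step, so the two can no longer get out of sync.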