geard, zuul, and jenkins do not handle function registration cleanly

Bug #1270319 reported by Clark Boylan on 2014-01-17
Affects: OpenStack Core Infrastructure
Status: Fix Released
Importance: High
Assigned to: James E. Blair

Bug Description

Today we moved zuul from an old 8GB host to a new 30GB host so that the zuul scratch git space could be hosted on a tmpfs. After the move, the jenkins masters' gearman plugin was not registering all of their functions with gearman. It looked like something caused geard to crash, producing many stack traces (included below) in gearman-server.log. Our best guess is that whatever caused those stack traces also broke job registration for the remaining jobs.

We worked around this by stopping all of the jenkins masters and then bringing them back online one at a time, so that each could register its functions in its own time slice. This seems to have worked around the problem well enough.

From gearman-server.log:
2014-01-17 18:32:56,058 ERROR gear.BaseClientServer: Exception in poll loop:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 697, in _doPollLoop
    self._pollLoop()
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 734, in _pollLoop
    self.handleAdminRequest(p)
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 2270, in handleAdminRequest
    self.handleStatus(request)
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 2304, in handleStatus
    functions = self._getFunctionStats()
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 2295, in _getFunctionStats
    functions[job.name][0] += 1
KeyError: 'stop:jenkins03.openstack.org'
2014-01-17 18:34:20,450 ERROR gear.BaseClientServer: Exception in poll loop:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 697, in _doPollLoop
    self._pollLoop()
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 734, in _pollLoop
    self.handleAdminRequest(p)
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 2270, in handleAdminRequest
    self.handleStatus(request)
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 2304, in handleStatus
    functions = self._getFunctionStats()
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 2295, in _getFunctionStats
    functions[job.name][0] += 1
KeyError: 'stop:jenkins03.openstack.org'
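
The traceback above points at a per-function counter that is indexed by a job name with no entry in the table. The following is a minimal sketch of that failure mode and one defensive alternative; it is not the actual gear code or the actual fix. The `Job` tuple, both function names, and the `[total, running, workers]` layout are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a queued job; only the function name matters here.
Job = namedtuple("Job", ["name"])


def function_stats_strict(registered, queued_jobs):
    """Count queued jobs per function, assuming every job's function is registered."""
    # Assumed layout per function: [total, running, workers].
    functions = {name: [0, 0, 0] for name in registered}
    for job in queued_jobs:
        # Raises KeyError when a job is queued for a function that no
        # worker has registered (or that has since been unregistered).
        functions[job.name][0] += 1
    return functions


def function_stats_tolerant(registered, queued_jobs):
    """Same count, but create an entry on demand instead of raising."""
    functions = {name: [0, 0, 0] for name in registered}
    for job in queued_jobs:
        functions.setdefault(job.name, [0, 0, 0])[0] += 1
    return functions
```

With `registered = ["build:foo"]` and a queued `Job("stop:jenkins03.openstack.org")`, the strict version raises the same kind of `KeyError` seen in the log, while the tolerant version reports the function with zero workers.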

From zuul debug.log:
2014-01-17 18:34:26,057 ERROR zuul.Gearman: Exception while checking functions
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/zuul/launcher/gearman.py", line 204, in isJobRegistered
    connection.sendAdminRequest(req)
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 313, in sendAdminRequest
    raise TimeoutError()
TimeoutError
2014-01-17 18:34:26,058 DEBUG zuul.Gearman: Function set_description:jenkins04.openstack.org is not registered
2014-01-17 18:35:56,058 ERROR zuul.Gearman: Exception while checking functions
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/zuul/launcher/gearman.py", line 204, in isJobRegistered
    connection.sendAdminRequest(req)
  File "/usr/local/lib/python2.7/dist-packages/gear/__init__.py", line 313, in sendAdminRequest
    raise TimeoutError()
TimeoutError
2014-01-17 18:35:56,058 DEBUG zuul.Gearman: Function set_description:jenkins03.openstack.org is not registered
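
The zuul log above shows the registration check depending on a gearman admin `status` request, so when that request times out every function looks unregistered. As a rough sketch of what such a check does with a successful response, assuming the standard gearman `status` text format (one `name<TAB>total<TAB>running<TAB>workers` line per function, terminated by a line containing only `.`), the parsing could look like this. The function name `registered_functions` is hypothetical and this is not zuul's implementation.

```python
def registered_functions(status_response):
    """Return the set of function names listed in a gearman 'status' response."""
    names = set()
    for line in status_response.split("\n"):
        parts = line.split()
        # Skip blank lines and the terminating "." line.
        if not parts or parts[0] == ".":
            continue
        names.add(parts[0])
    return names
```

A caller would then test `name in registered_functions(response)`; if the admin request fails, there is no response to parse and the function is reported as not registered, which matches the DEBUG lines above.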

Clark Boylan (cboylan) on 2014-02-04
Changed in openstack-ci:
assignee: nobody → James E. Blair (corvus)
Clark Boylan (cboylan) wrote :

This was fixed in openstack-infra/gear change Id6d4569bed6cd51dc4f1698184c54a0cb343fb0d, which handles an exception in the status command. The change is included in the latest gear release, 0.5.3.

Changed in openstack-ci:
status: Triaged → Fix Released
milestone: none → icehouse