Event streamer eventually gets in error state consuming 100% of a cpu
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ciwatch | Fix Released | High | Mikhail S Medvedev |
Bug Description
After running for some time (between 2 and 10 hours), ci-watch-
(events.py) gets into a state where it continuously logs the following
error in quick succession:
...
2015-11-16 14:04:17,406 [ERROR] Failed json.loads on event:
2015-11-16 14:04:17,406 [ERROR] No JSON object could be decoded
Traceback (most recent call last):
File "/opt/ciwatch/
event = json.loads(event)
File "/usr/lib/
return _default_
File "/usr/lib/
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/
raise ValueError("No JSON object could be decoded")
...
That means that one of the GerritEventStre
Because there is no sleep before fetching the next event, this results
in a very busy loop. Why next() returns an empty event is unclear, but
it could be because the paramiko SSH connection was disconnected.
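A minimal sketch of one way to avoid the busy loop, assuming an empty read means the connection is gone. This is not the ciwatch code or the fix that was released; `consume`, `connect` and `handle` are hypothetical names, and `connect` is assumed to return an iterator of raw event lines. The idea is to stop reading on an empty event and to sleep, with a capped exponential backoff, before reconnecting.

```python
import json
import time


def consume(connect, handle, sleep=time.sleep, base_delay=1.0,
            max_delay=60.0, max_reconnects=None):
    """Read JSON event lines, reconnecting with backoff when the stream ends.

    connect: hypothetical callable returning an iterator of raw event lines.
    handle:  hypothetical callable invoked with each decoded event.
    Returns the number of reconnects once max_reconnects is reached.
    """
    delay = base_delay
    reconnects = 0
    while True:
        for raw in connect():
            if not raw or not raw.strip():
                # Assumption: an empty read means the connection dropped.
                break
            try:
                event = json.loads(raw)
            except ValueError:
                # Non-empty but malformed payload: skip it and keep reading.
                continue
            handle(event)
            delay = base_delay  # a good event resets the backoff
        if max_reconnects is not None and reconnects >= max_reconnects:
            return reconnects
        reconnects += 1
        sleep(delay)  # never spin: always wait before reconnecting
        delay = min(delay * 2, max_delay)
```

With `max_reconnects=None` the loop runs forever, which is what a long-lived listener would want; the limit exists only so the sketch can be exercised in a test.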
We might want to borrow, or at least look at, the Gerrit listener implementation
from Zuul, which is relatively stable:
https:/
Changed in ciwatch:
importance: Undecided → High
I was able to reproduce this easily by forcefully killing the Gerrit SSH connection with tcpkill.
Steps:
Start listening to stream events:
root@vagrant:~# ci-watch-stream-events
2015-11-17 00:33:21,603 [DEBUG] Connecting to review.openstack.org:29418 as msmedved using /opt/ciwatch/vagrant_gerrit_rsa
In a separate terminal:
root@vagrant:~# lsof -i |grep review
ci-watch- 14982 root 3u IPv4 158372 0t0 TCP 10.0.2.15:43932->review.openstack.org:29418 (ESTABLISHED)
Note the local port (43932 here) and run:
root@vagrant:~# tcpkill -i eth0 port 43932
After the connection is killed, ci-watch-stream-events starts dumping large numbers of error log messages and is unable to recover.
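The symptom can also be illustrated without a network. The sketch below assumes the dead stream behaves like a file at end-of-file, where `readline()` returns an empty string on every call; `count_failed_decodes` is a hypothetical helper and the sample event is made up. It shows that once the stream has ended, every iteration fails in `json.loads` immediately, which is what turns the loop into a tight spin.

```python
import io
import json


def count_failed_decodes(stream, iterations):
    """Call readline() repeatedly and count reads that fail to decode as JSON."""
    failures = 0
    for _ in range(iterations):
        line = stream.readline()  # returns '' on every call once at EOF
        try:
            json.loads(line)
        except ValueError:
            # json.loads('') raises a ValueError subclass on every empty read.
            failures += 1
    return failures


# One sample event followed by end-of-stream, standing in for a killed connection.
dead_stream = io.StringIO('{"type": "comment-added"}\n')
failures = count_failed_decodes(dead_stream, 1000)
print(failures)
```

Only the first read decodes; all later reads fail without blocking, so nothing slows the loop down unless the code sleeps or reconnects.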