Gnocchi is flooding logs with errors

Bug #1640225 reported by Sagi (Sergey) Shnaidman
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
Gnocchi   Fix Released   Medium       Unassigned
tripleo   Fix Released   High         Unassigned

Bug Description

From /var/log/gnocchi/metricd.log

2016-11-08 12:24:43.923 31477 ERROR gnocchi.cli [-] Unexpected error during measures processing
2016-11-08 12:24:43.923 31477 ERROR gnocchi.cli Traceback (most recent call last):
2016-11-08 12:24:43.923 31477 ERROR gnocchi.cli File "/usr/lib/python2.7/site-packages/gnocchi/cli.py", line 275, in _run_job
2016-11-08 12:24:43.923 31477 ERROR gnocchi.cli metrics = self.queue.get(block=True, timeout=10)
2016-11-08 12:24:43.923 31477 ERROR gnocchi.cli File "<string>", line 2, in get
2016-11-08 12:24:43.923 31477 ERROR gnocchi.cli File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
2016-11-08 12:24:43.923 31477 ERROR gnocchi.cli conn.send((self._id, methodname, args, kwds))
2016-11-08 12:24:43.923 31477 ERROR gnocchi.cli IOError: [Errno 32] Broken pipe
2016-11-08 12:24:43.923 31477 ERROR gnocchi.cli
2016-11-08 12:24:43.924 31477 ERROR gnocchi.cli [-] Unexpected error during measures processing
2016-11-08 12:24:43.924 31477 ERROR gnocchi.cli Traceback (most recent call last):
2016-11-08 12:24:43.924 31477 ERROR gnocchi.cli File "/usr/lib/python2.7/site-packages/gnocchi/cli.py", line 275, in _run_job
2016-11-08 12:24:43.924 31477 ERROR gnocchi.cli metrics = self.queue.get(block=True, timeout=10)
2016-11-08 12:24:43.924 31477 ERROR gnocchi.cli File "<string>", line 2, in get
2016-11-08 12:24:43.924 31477 ERROR gnocchi.cli File "/usr/lib64/python2.7/multiprocessing/managers.py", line 758, in _callmethod
2016-11-08 12:24:43.924 31477 ERROR gnocchi.cli conn.send((self._id, methodname, args, kwds))
2016-11-08 12:24:43.924 31477 ERROR gnocchi.cli IOError: [Errno 32] Broken pipe

Logs fill up too fast and consume all the disk space.
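
For illustration, the loop below is a minimal, hypothetical sketch (not the actual Gnocchi code) of the failure mode shown in the traceback: the metricd workers read from a multiprocessing manager queue, and once the manager process is gone, every get() fails immediately with a broken pipe instead of blocking for the timeout, so the same error is logged in a tight loop.

import multiprocessing

manager = multiprocessing.Manager()
queue = manager.Queue()

manager.shutdown()  # simulate the manager/scheduler process dying

# Rough analogue of the metricd _run_job loop from the traceback above.
while True:
    try:
        metrics = queue.get(block=True, timeout=10)
    except Exception as exc:  # IOError: [Errno 32] Broken pipe on Python 2
        print("Unexpected error during measures processing: %s" % exc)
        break  # the real daemon keeps looping, which is what floods the log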

no longer affects: tripleo
Revision history for this message
Julien Danjou (jdanjou) wrote :

Is there anything else failing earlier? This looks like a problem with the scheduler filling the queue for the processors, so I'd expect them to have failed first or something similar.

Changed in gnocchi:
status: New → Incomplete
importance: Undecided → Medium
Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

@Julien, there was a problem with Redis, which might not have been accepting connections or able to find its master. Does that help?

Revision history for this message
Julien Danjou (jdanjou) wrote :

This should be fixed by https://review.openstack.org/#/c/394387/ then.

Changed in gnocchi:
status: Incomplete → Triaged
Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

The bug was resolved.

Changed in gnocchi:
status: Triaged → Fix Released
Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

It seems the bug still appears; see for example this TripleO CI job:

http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-nonha/56232ef/logs/overcloud-controller-0/var/log/gnocchi/metricd.txt.gz#_2017-02-20_08_05_10_039

It's about 70 connections per second when Redis is down, and every connection attempt dumps a traceback to the logs, making them really huge.

Changed in gnocchi:
status: Fix Released → Confirmed
Changed in tripleo:
importance: Undecided → High
Revision history for this message
Julien Danjou (jdanjou) wrote :

There's no bug. There are 204 connection retries in 45 minutes in your log (there are several *processes* trying to connect, with a backoff mechanism of up to 60 seconds), which means about 4.5 connection attempts per minute.

I don't see how that's a flood of anything. Nothing is huge here.
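
For reference, the snippet below is a rough sketch of the kind of capped-backoff retry loop being described; it is not the actual Gnocchi/tooz code, and connect_with_backoff is a hypothetical name. Each worker process doubles its wait between connection attempts up to a 60-second cap.

import time

def connect_with_backoff(connect, max_wait=60):
    """Retry a connection callable, doubling the wait up to max_wait seconds."""
    wait = 1
    while True:
        try:
            return connect()
        except Exception as exc:  # e.g. a Redis connection error
            print("connection failed (%s), retrying in %ds" % (exc, wait))
            time.sleep(wait)
            wait = min(wait * 2, max_wait)  # cap the backoff at 60 seconds

With the wait capped at 60 seconds, a handful of worker processes produce only a few retries per minute once the backoff has ramped up, consistent with the roughly 4.5 attempts per minute counted above.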

Changed in gnocchi:
status: Confirmed → Fix Released
Changed in tripleo:
status: New → Triaged
Revision history for this message
Alex Schultz (alex-schultz) wrote :

Given that this has been fixed in Gnocchi, I'm updating the status in tripleo. Please reopen it if it's still a problem.

Changed in tripleo:
status: Triaged → Fix Released