The gnocchi wsgi app experiences timeout errors when using influxdb
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Gnocchi | Invalid | Medium | Ilya Tyaptin |
Bug Description
In a devstack on a 32 GB, 8-core machine, with 4 gnocchi mod_wsgi processes, gnocchi using influxdb as its backend, ceilometer dispatching to gnocchi with two collectors, a poll period of 10 seconds and 10 nova instances, this error shows up regularly in the log:
Timeout when reading response headers from daemon process 'gnocchi': /var/www/
Then after some time (as a result of the blocking):
(11)Resource temporarily unavailable: [client 192.168.2.3:44679] mod_wsgi (pid=5715): Unable to connect to WSGI daemon process 'gnocchi' on '/var/run/
After the collector was killed, the influxdb log showed that the gnocchi processes were still feeding data to influxdb more than 5 minutes after collector shutdown.
Either some kind of tuning is required here to avoid this blocking, or influxdb is simply slow, or we need to use metricd with influxdb to put a stronger asynchronous gap in place.
The last POST to gnocchi was a full ten minutes before the last POST from gnocchi to influxdb.
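A long tail like this is what a steady backlog would produce. Under the simplifying assumption that writes arrive and are absorbed at constant rates, the catch-up time after the collector stops is the accumulated backlog divided by the backend's write rate. `drain_time` is a hypothetical helper, and the numbers in the usage line are illustrative, not measurements from this setup.

```python
def drain_time(arrival_rate, service_rate, duration):
    """Seconds needed to clear a backlog after the producer stops.

    Assumes constant rates (writes/second): while the producer runs for
    `duration` seconds, the backlog grows by (arrival_rate - service_rate)
    per second; once it stops, the consumer clears it at `service_rate`.
    """
    if service_rate <= 0:
        raise ValueError("service_rate must be positive")
    backlog = max(0.0, (arrival_rate - service_rate) * duration)
    return backlog / service_rate


print(drain_time(12, 10, 3000))  # → 600.0
```

Here a backend absorbing 10 writes/s while receiving 12 writes/s for 50 minutes needs another 10 minutes to catch up. A modest rate mismatch is enough to explain a long tail.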
Changed in gnocchi:
assignee: nobody → Ilya Tyaptin (ityaptin)
Changed in gnocchi:
status: New → Triaged
importance: Undecided → Medium
More investigation is required to determine whether the problem is that influxdb isn't ingesting data fast enough, or that gnocchi is somehow getting hung up while trying to write and is blocking.
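One way to separate the two cases is to time each write issued from the gnocchi side. Consistently slow individual writes point at influxdb (or the path to it), while fast writes alongside piling-up requests point back at gnocchi. This sketch assumes the write path can be wrapped as a plain callable. `TimedWriter` is a hypothetical helper and is not part of gnocchi or the influxdb client.

```python
import statistics
import time


class TimedWriter:
    """Wrap a write callable and record how long each call takes."""

    def __init__(self, write):
        self._write = write
        self.durations = []  # wall-clock seconds per call

    def __call__(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return self._write(*args, **kwargs)
        finally:
            # Record the duration even if the write raised.
            self.durations.append(time.perf_counter() - start)

    def summary(self):
        """Return call count plus median and max latency in seconds."""
        if not self.durations:
            return {"count": 0, "median": 0.0, "max": 0.0}
        return {
            "count": len(self.durations),
            "median": statistics.median(self.durations),
            "max": max(self.durations),
        }
```

The measured time covers everything inside the wrapped call. If that call performs the HTTP request, any time spent waiting for a pooled connection is included too.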
One possibility is that requests (used by the influxdb client) pools connections, but the pool is much too small for the request rate we are throwing at it.
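If the pool-size theory holds, one thing to try is a larger pool. Below is a minimal sketch using requests' `HTTPAdapter`, whose `pool_maxsize` defaults to 10 connections per host. Whether the influxdb client lets you pass in or reach its session is an assumption not checked here, and `make_session` is a hypothetical helper.

```python
import requests
from requests.adapters import HTTPAdapter


def make_session(pool_maxsize=50, pool_block=False):
    """Build a requests Session with a larger per-host connection pool."""
    session = requests.Session()
    adapter = HTTPAdapter(
        pool_connections=10,        # number of per-host pools to cache
        pool_maxsize=pool_maxsize,  # connections kept in each host's pool
        pool_block=pool_block,      # True: wait for a free connection
    )
    # Route both schemes through the tuned adapter.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

With the default `pool_block=False`, urllib3 opens an extra connection when the pool is exhausted and discards it afterwards, logging a "Connection pool is full, discarding connection" warning. An undersized pool therefore shows up as connection churn rather than outright waiting. With `pool_block=True`, callers instead wait for a free connection, which would itself look like blocking.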
I have to be away for the afternoon, so I can't look now; if someone else can, cool.