Ceilometer times out sending data to Gnocchi

Bug #1805340 reported by Christian Zunker
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Ceilometer
New
Undecided
Unassigned

Bug Description

After upgrading form Pike to Queens, we see a lot of these error messages in ceilometer-agent-notification.log:
2018-11-27 06:33:42.975 146962 ERROR ceilometer.publisher.gnocchi [-] Request to http://internal.api.url:8041/v1/batch/resources/metrics/measures timed out: ConnectionTimeout: Request to http://internal.api.url:8041/v1/batch/resources/metrics/measures timed out

Ceilometer version is 10.0.2.dev2
Gnocchi client version is 7.0.1
Gnocchi version is 4.2.0

After adding some debug logging we could verify it only happens when ceilomemter tries to send 0 measures to Gnocchi. Every request sending at least 1 measure workes.

We could get around the Problem by setting batch_timeout in ceilometer.conf to None.

We were also able to reproduce the behavior with the openstack cli:
echo "{}" | openstack --timing --debug metric measures batch-resources-metrics --create-metrics -
...
REQ: curl -g -i -X POST "http://internal.api.url:8041/v1/batch/resources/metrics/measures?create_metrics=True" -H "User-Agent: osc-lib/1.9.0 keystoneauth1/3.4.0 python-requests/2.18.4 CPython/2.7.12" -H "Content-Type: application/json" -H "Accept: application/json, */*" -H "X-Auth-Token: {SHA1}***" -d '{}'
Starting new HTTP connection (1): internal.api.url
http://internal.api.url:8041 "POST /v1/batch/resources/metrics/measures?create_metrics=True HTTP/1.1" 202 0
RESP: [202] Date: Tue, 27 Nov 2018 05:33:15 GMT Server: Apache Content-Length: 0 Content-Type: application/json
clean_up CliResourcesMetricsMeasuresBatch:

run(Namespace(columns=[], fit_width=False, format_config=None, formatter='table', max_width=0, noindent=False, print_empty=False, quote_mode='nonnumeric', sort_columns=[]))
+---------------------------------------------------------------------------------------------------------+-----------+
| URL | Seconds |
+---------------------------------------------------------------------------------------------------------+-----------+
| GET https://internal.api.url:5000/v3 | 0.026579 |
| POST https://internal.api.url:5000/v3/auth/tokens | 0.410096 |
| POST https://internal.api.url:5000/v3/auth/tokens | 0.396178 |
| POST http://internal.api.url:8041/v1/batch/resources/metrics/measures?create_metrics=True | 31.054647 |
| Total | 31.8875 |
+---------------------------------------------------------------------------------------------------------+-----------+
END return value: 0

The requests takes longer as the default request_timeout in ceilometer of 6.05s.

The request return much faster sending some data:
REQ: curl -g -i -X POST "http://internal.api.url:8041/v1/batch/resources/metrics/measures?create_metrics=True" -H "User-Agent: osc-lib/1.9.0 keystoneauth1/3.4.0 python-requests/2.18.4 CPython/2.7.12" -H "Content-Type: application/json" -H "Accept: application/json, */*" -H "X-Auth-Token: {SHA1}904608e0fe75460452a748c3ecd9d8d113ccad52" -d '{"8dbaa111-c9ae-4dad-9f2d-388418462207":{"memory.usage":[{"timestamp":"2018-11-27T06:23:25.713279","value":3731}]},"791c6968-95a7-4cf9-b3f0-0664f4f9ed79":{"volume.size":[{"timestamp":"2018-11-27T06:23:25.654068","value":9}]},"eddaaf67-b5a0-4851-8024-0319ddac72bf":{"memory.usage":[{"timestamp":"2018-11-27T06:23:25.713279","value":3254}]},"4ad7eb50-4a34-4e65-9645-9ea51064bafb":{"memory.usage":[{"timestamp":"2018-11-27T06:23:25.713279","value":945}]},"670ab511-4351-4171-920f-c27184048368":{"memory.resident":[{"timestamp":"2018-11-27T06:23:25.778222","value":2259}]},"ef00c155-8f03-4e68-9137-7eaf731bd216":{"volume.size":[{"timestamp":"2018-11-27T06:23:25.654068","value":20}]},"940305f6-4637-4553-b8fc-12f773154333":{"memory.usage":[{"timestamp":"2018-11-27T06:23:25.616891","value":11067}]},"2b502727-eb0f-4489-ab3b-de9a69861c1b":{"image.size":[{"timestamp":"2018-11-27T06:23:25.743257","value":2361393152}]}}'
Starting new HTTP connection (1): internal.api.url
http://internal.api.url:8041 "POST /v1/batch/resources/metrics/measures?create_metrics=True HTTP/1.1" 202 0
RESP: [202] Date: Tue, 27 Nov 2018 06:26:25 GMT Server: Apache Content-Length: 0 Content-Type: application/json
clean_up CliResourcesMetricsMeasuresBatch:

run(Namespace(columns=[], fit_width=False, format_config=None, formatter='table', max_width=0, noindent=False, print_empty=False, quote_mode='nonnumeric', sort_columns=[]))
+---------------------------------------------------------------------------------------------------------+----------+
| URL | Seconds |
+---------------------------------------------------------------------------------------------------------+----------+
| GET https://internal.api.url:5000/v3 | 0.019071 |
| POST https://internal.api.url:5000/v3/auth/tokens | 0.410266 |
| POST https://internal.api.url:5000/v3/auth/tokens | 0.451102 |
| POST http://internal.api.url:8041/v1/batch/resources/metrics/measures?create_metrics=True | 1.131044 |
| Total | 2.011483 |
+---------------------------------------------------------------------------------------------------------+----------+
END return value: 0

Revision history for this message
Christian Zunker (christian-zunker) wrote :

An update to the above description:
I was quite sure batch_timeout = None worked in my test, but I couldn't reproduce it.
Finally I had to set request_timeout = 45.0, but this option is marked as deprecated.

I also tried setting batch_timeout = 1800, but it does not seem to work. ceilometer.publisher.gnocchi is called more often than the configured timeout.

Revision history for this message
Pierre Riteau (priteau) wrote :

@christian-zunker:

[dispatcher_gnocchi]/request_timeout was deprecated because the dispatcher itself was. It has been removed in Rocky: https://docs.openstack.org/releasenotes/ceilometer/rocky.html#upgrade-notes

The Queens release notes (https://docs.openstack.org/releasenotes/ceilometer/queens.html#upgrade-notes) state:

The Gnocchi dispatcher has been removed and replaced by a native Gnocchi publisher. The configuration options from the [dispatcher_gnocchi] has been removed and should be passed via the URL in pipeline.yaml. The service authentication override can be done by adding specific credentials to a [gnocchi] section instead.

Indeed, it looks like you can pass a timeout parameter to the gnocchi publisher URL: https://opendev.org/openstack/ceilometer/src/branch/stable/queens/ceilometer/publisher/gnocchi.py#L167

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.