http-request timeout can cause services to become out of sync
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
kolla-ansible | Fix Released | Medium | Doug Szumski | |
Bug Description
In services which use the Apache HTTP server to handle HTTP requests,
there is a TimeOut directive [1] which defaults to 60 seconds. A
similar timeout also exists in HAProxy, and is set to 60 seconds. APIs
which come under heavy load, such as Cinder, can sometimes take longer
than the shorter of these two timeouts. This results in an HTTP 504
Gateway Timeout or a similar error. However, the backend may still
complete the request without error.
For example, if Nova calls the Cinder API to detach a volume, and
this operation takes longer than the shorter of the two timeouts, Nova
will emit a stack trace with a 504 Gateway Timeout. Some time later,
the request to detach the volume will succeed. The Nova and Cinder DBs
then become out of sync with each other, and in the worst case DB
surgery is required.
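The following is a minimal, hypothetical simulation of the race described above. It is not Nova, Cinder, Apache or HAProxy code. A thread pool stands in for the overloaded backend, and `Future.result(timeout=...)` stands in for the proxy giving up after the shorter of the two timeouts. The names `effective_timeout` and `simulate_detach`, the in-memory "databases" and the scaled-down timeout values are all illustrative assumptions.

```python
import concurrent.futures
import time

# Illustrative timeouts in seconds, scaled down from the 60 s defaults
# (assumed values, chosen only so the simulation runs quickly).
APACHE_TIMEOUT = 0.10
HAPROXY_TIMEOUT = 0.10


def effective_timeout(apache_timeout: float, haproxy_timeout: float) -> float:
    """The caller sees a gateway error once the *shorter* timeout expires."""
    return min(apache_timeout, haproxy_timeout)


def simulate_detach(backend_duration: float) -> tuple[dict, dict]:
    """Hypothetical model of Nova asking Cinder to detach a volume."""
    nova_db = {"vol-1": "attached"}    # the caller's (Nova's) view
    cinder_db = {"vol-1": "attached"}  # the backend's (Cinder's) view

    def cinder_detach() -> None:
        time.sleep(backend_duration)     # slow operation under heavy load
        cinder_db["vol-1"] = "detached"  # the backend still finishes the work

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(cinder_detach)
    try:
        future.result(timeout=effective_timeout(APACHE_TIMEOUT, HAPROXY_TIMEOUT))
        nova_db["vol-1"] = "detached"    # success seen in time: both agree
    except concurrent.futures.TimeoutError:
        # The caller treats this like an HTTP 504 and leaves its state alone.
        pass
    pool.shutdown(wait=True)  # the backend request is not cancelled
    return nova_db, cinder_db


if __name__ == "__main__":
    print(simulate_detach(backend_duration=0.3))  # views disagree
    print(simulate_detach(backend_duration=0.0))  # views agree
```

When the simulated backend is slower than the effective timeout, the caller records the volume as still attached while the backend records it as detached. This is the out-of-sync state described in the bug. Making the backend faster, or raising the timeouts, removes the divergence in this model. It does not fix the underlying lack of reconciliation.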
Although, strictly speaking, this category of bug should be fixed in the
OpenStack services themselves, it is not realistic to expect this to
happen in the short term. Therefore it makes sense to reduce the
likelihood of triggering such bugs in Kolla Ansible.
An example of a related bug is here:
description: updated
Changed in kolla-ansible:
status: In Progress → Fix Released
I'd say half-fixed by https://review.opendev.org/c/openstack/kolla-ansible/+/778507