Glance-api prematurely terminates client connections

Bug #1592140 reported by Eugene Nikanorov
Affects              Status    Importance   Assigned to   Milestone
Mirantis OpenStack   Invalid   Medium       MOS Glance
5.1.x                Invalid   Medium       MOS Glance

Bug Description

Sometimes, when glance-api workers consume a lot of memory, client connections may be terminated prematurely.
When glance-api is accessed via haproxy, this results in 502 errors being returned to the client.

The only noticeable condition is that the glance-api processes had been running for about half a year and memory consumption had reached ~10 GB per worker.

It may be that eventlet receives some sort of system error while trying to accept an incoming connection.
Failed requests never reach the glance code itself, so they are never logged.
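
One way to check this theory is to probe a glance-api worker directly, bypassing haproxy, and see whether the connection is dropped before any HTTP response is written. A minimal sketch, not part of glance; the host and port are assumptions about the deployment:

import socket

GLANCE_HOST = "127.0.0.1"   # assumption: glance-api reachable locally
GLANCE_PORT = 9292          # default glance-api port

request = (
    "GET /versions HTTP/1.1\r\n"
    "Host: " + GLANCE_HOST + "\r\n"
    "Connection: close\r\n\r\n"
).encode()

try:
    sock = socket.create_connection((GLANCE_HOST, GLANCE_PORT), timeout=5)
    sock.sendall(request)
    data = sock.recv(4096)
    if not data:
        print("connection closed before any response was written")
    else:
        print("got response:", data.split(b"\r\n", 1)[0].decode())
except OSError as exc:
    # Covers reset, refused and timeout: the failure happened at the socket
    # level, before any application-level handling or logging.
    print("request failed at the socket level:", exc)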

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

The only version where this has been observed is 5.1.

Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Revision history for this message
Kairat Kushaev (kkushaev) wrote :

There is a workaround for this bug, and it also seems very hard to reproduce.
Because of that I will mark it as Medium; hopefully we will get some time to dig into this soon.

Changed in mos:
importance: Undecided → Medium
assignee: nobody → MOS Glance (mos-glance)
status: New → Confirmed
Revision history for this message
Dina Belova (dbelova) wrote :

Targeting to 5.1.1-updates. This is a medium bug, so it is not going to be fixed, but the Glance team would like to check it anyway.

Changed in mos:
milestone: none → 5.0-updates
milestone: 5.0-updates → 5.1.1-updates
Revision history for this message
Mike Fedosin (mfedosin) wrote :

Hello! This bug was fixed during the Liberty cycle and backported to Kilo. The idea is to introduce a client socket timeout, because without it connections can stay open forever: https://review.openstack.org/#/c/119132/
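
For reference, the mechanism behind that fix is the socket timeout support in eventlet's WSGI server, which closes idle client connections instead of letting them live forever. A minimal standalone sketch, not glance's actual server setup; the port and timeout value are arbitrary examples:

import eventlet
from eventlet import wsgi

def app(environ, start_response):
    # Trivial WSGI app standing in for the real glance-api application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]

listener = eventlet.listen(("127.0.0.1", 8080))
# socket_timeout bounds how long an idle client connection may stay open;
# without it a stalled client can hold the connection indefinitely.
wsgi.server(listener, app, socket_timeout=900)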

description: updated
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

I think the mentioned fix is unrelated to the problem I'm describing.
The faulty glance worker did not have any hanging connections, in any state.
What is known for sure is that the glance workers consumed ~10 GB of memory each, which led to memory exhaustion and could lead to a system error at the eventlet level.

Such errors, as well as the incoming requests themselves, were never logged in the glance logs, so this is only a theory.

So I'd say the first thing we need to reproduce, or find a corresponding bug about, is the memory leak.
10 GB is too much for a stateless service.
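
To make the leak measurable before trying to reproduce it, one option is to record per-worker RSS over time. A rough sketch using psutil; the process-name match is an assumption about how the workers appear in the process table:

import psutil

for proc in psutil.process_iter(["pid", "cmdline", "memory_info"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "glance-api" in cmdline:
        # Resident set size per worker, in GiB.
        rss_gb = proc.info["memory_info"].rss / (1024 ** 3)
        print("pid=%d rss=%.2f GiB" % (proc.info["pid"], rss_gb))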

Revision history for this message
Roman Rufanov (rrufanov) wrote :

Could it affect versions later than 5.1? If yes, please nominate.

tags: added: customer-found support
Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Invalid because the root cause was not Glance.
According to the explanation from Eugene Nikanorov:

it was all qemu... There are too many OSDs in their cloud. When large files are read or written, qemu opens connections to pretty much every OSD, exhausting file descriptors, because the default limit is low.
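
One way to confirm that root cause on a compute node is to compare each qemu process's open file descriptor count against its nofile limit. A hedged sketch using psutil on Linux; the process-name match is an assumption:

import psutil

for proc in psutil.process_iter(["pid", "name"]):
    name = proc.info["name"] or ""
    if "qemu" in name:
        try:
            nfds = proc.num_fds()                            # open descriptors
            soft, hard = proc.rlimit(psutil.RLIMIT_NOFILE)   # per-process limit
            print("pid=%d fds=%d soft_limit=%d" % (proc.info["pid"], nfds, soft))
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            pass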

Changed in mos:
status: Confirmed → Invalid