glance hangs when rbd pool in read only / full
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Glance | New | Medium | Unassigned |
Bug Description
When the Ceph pool backing Glance is full (and goes read only), Glance IO calls never return, and the worker handling the API call is effectively a zombie.
If enough such IO requests arrive, for example 4 when you have 4 workers, Glance can no longer respond to any request at all. You have to restart Glance to get responses again.
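The missing safeguard described above can be sketched in plain Python. This is not Glance code, just an illustration of the idea: run the potentially blocking store call under a hard deadline so a full/read-only RBD pool turns into an error the API can report, instead of a permanently hung worker. All names here are illustrative.

```python
# Sketch only: wrap a blocking store call with a deadline so it fails
# fast instead of hanging the worker forever. Assumes nothing about
# Glance internals; uses only the standard library.
from concurrent.futures import ThreadPoolExecutor, TimeoutError

_pool = ThreadPoolExecutor(max_workers=4)

def call_with_deadline(fn, *args, timeout=30, **kwargs):
    """Run fn in a worker thread; raise TimeoutError if it blocks too long."""
    future = _pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except TimeoutError:
        # Best effort: a thread stuck in a librados call cannot actually
        # be interrupted from here, but the API worker is freed to respond.
        future.cancel()
        raise
```

The worker thread stuck on the RBD call still leaks until the IO eventually returns, which is why a real fix would also need a client-side op timeout in librados itself.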
ceph status:
cluster:
id: ce9a32e4-
health: HEALTH_ERR
3 full osd(s)
3 pool(s) full
1 pool(s) have no replicas configured
services:
mon: 1 daemons, quorum bm0.lxd (age 2h)
mgr: bm0.lxd(active, since 2h)
osd: 3 osds: 3 up (since 2h), 3 in (since 2h)
data:
pools: 3 pools, 161 pgs
objects: 6.92k objects, 47 GiB
usage: 143 GiB used, 6.8 GiB / 150 GiB avail
pgs: 161 active+clean
ceph osd dump | grep ratio:
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
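With full_ratio at 0.95, OSDs are marked full and the cluster blocks writes once 95% of their capacity is used. A recovery sequence along these lines (the `ceph osd set-full-ratio` command exists in recent Ceph releases; 0.96 is just an illustrative value) temporarily allows writes so space can be reclaimed:

```shell
# Temporarily raise the full threshold so delete/write ops can proceed.
ceph osd set-full-ratio 0.96
# Reclaim space (e.g. delete unused images), then restore the default:
ceph osd set-full-ratio 0.95
```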
Here's a response through the Apache 2 HTTP proxy in front of Glance:
openstack image delete 0a582014-
Failed to delete image with name or ID '0a582014-
Failed to delete 1 of 1 images.
The last log for these requests at debug level is:
DEBUG glance_
2fb5db324433ba5
kend_map /usr/lib/
To fix this, I raised the full_ratio to allow writes again and deleted images. But Glance should have a mechanism to detect this condition, or at least a timeout.
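A client-side mitigation, assuming the standard librados client options `rados_osd_op_timeout` and `rados_mon_op_timeout` (in seconds; 0 means wait forever), would be to set op timeouts for the Glance client in ceph.conf so stuck operations raise an error instead of blocking a worker indefinitely. A sketch, with illustrative values and an assumed `client.glance` cephx user:

```ini
# ceph.conf on the glance-api host; the section name must match the
# cephx user Glance connects as (client.glance here is an assumption).
[client.glance]
rados_osd_op_timeout = 30
rados_mon_op_timeout = 30
```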
Versions:
glance 27.0.0
ceph 18.2.0 (5dd24139a1eada
description: updated
summary: glance hangs when rbd pool in read only → glance hangs when rbd pool in read only / full
Changed in glance:
importance: Undecided → Medium