When the Ceph pool backing Glance fills up (the cluster goes read-only), Glance's IO calls never return, and the worker handling the API call is effectively a zombie.
If enough such requests are made, for example 4 when you have 4 workers, Glance can no longer respond to any request at all; you need to restart Glance to get responses again.
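For context, the hang can be reproduced outside Glance with a few lines of python-rados against the full pool. This is a minimal sketch under assumptions: the pool name ('glance') and the ceph.conf path are placeholders for illustration.

import rados

# Connect using the cluster's ceph.conf (path assumed for illustration).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Open the pool backing Glance ('glance' is an assumed name).
ioctx = cluster.open_ioctx('glance')

# With the OSDs flagged full, this write blocks indefinitely: librados has
# no OSD op timeout by default, which is what pins the Glance worker.
ioctx.write_full('probe-object', b'x' * 4096)

ioctx.close()
cluster.shutdown()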
ceph status:

  cluster:
    id:     ce9a32e4-9768-457a-b811-225b710aeb58
    health: HEALTH_ERR
            3 full osd(s)
            3 pool(s) full
            1 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum bm0.lxd (age 2h)
    mgr: bm0.lxd(active, since 2h)
    osd: 3 osds: 3 up (since 2h), 3 in (since 2h)

  data:
    pools:   3 pools, 161 pgs
    objects: 6.92k objects, 47 GiB
    usage:   143 GiB used, 6.8 GiB / 150 GiB avail
    pgs:     161 active+clean

ceph osd dump | grep ratio:

full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
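Those numbers are consistent: 143 GiB used out of 150 GiB is about 95.3% utilization, which crosses the 0.95 full_ratio, so the OSDs are flagged full and writes to the pool are blocked.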
Here's the response from the Apache 2 HTTP proxy in front of Glance:
openstack image delete 0a582014-832a-4f2a-9944-4111812fe6b2
Failed to delete image with name or ID '0a582014-832a-4f2a-9944-4111812fe6b2': HttpException: 502: Server Error for url: http://10.206.54.243:80/openstack-glance/v2/images/0a582014-832a-4f2a-9944-4111812fe6b2, The proxy server could not handle the request. Reason: Error reading from remote server: 502 Proxy Error: Proxy Error: Apache/2.4.52 (Ubuntu) Server at 10.206.54.243 Port 9292: The proxy server received an invalid response from an upstream server.
Failed to delete 1 of 1 images.
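(The 502 is presumably Apache's proxy timeout firing: the Glance worker is still blocked in librados, so no response ever comes back from the upstream.)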
The last log for these requests at debug level is:
DEBUG glance_store.location [None req-4cdf1de9-fbe2-49a8-92d4-db0902773af2 e7cc50bfcb1246479c5b9397048377fe d0c1adff192b40e9989460336bab7c8c - - e152fb5db324433ba53d8ead347c6802 e152fb5db324433ba53d8ead347c6802] Registering scheme rbd with {'ceph': {'store': <glance_store._drivers.rbd.Store object at 0x7ff3c5d1b820>, 'location_class': <class 'glance_store._drivers.rbd.StoreLocation'>, 'store_entry': 'rbd'}} register_scheme_backend_map /usr/lib/python3/dist-packages/glance_store/location.py:132
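Nothing is logged after this point; the request enters the rbd store driver and the worker never returns.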
To recover, I raised the full_ratio so the cluster would accept writes again, then deleted images. But Glance should have a mechanism to detect this situation, or at least a timeout, instead of leaving its workers hung indefinitely.
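(Raising the ratio can be done with e.g. ceph osd set-full-ratio 0.97, remembering to restore 0.95 afterwards.)

On the detection side, one possible shape for such a mechanism is to pass librados timeout options when opening the cluster connection, so blocked ops fail instead of hanging forever. This is a hedged sketch with python-rados: the option names are standard Ceph client settings, the 30-second values are illustrative assumptions, and as far as I can tell glance_store's rbd driver currently only exposes rados_connect_timeout, which bounds the initial connection rather than in-flight ops.

import rados

# Sketch: bound librados operations so a full or unresponsive cluster returns
# an error to Glance instead of blocking the worker forever. Values are
# illustrative, not tested recommendations.
cluster = rados.Rados(
    conffile='/etc/ceph/ceph.conf',
    conf={
        'rados_osd_op_timeout': '30',  # fail OSD reads/writes after 30s
        'rados_mon_op_timeout': '30',  # fail monitor ops after 30s
        'client_mount_timeout': '30',  # bound the initial connect
    },
)
cluster.connect()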