Activity log for bug #2059768

Date Who What changed Old value New value Message
2024-03-29 10:25:42 Guillaume Boutry bug added bug
2024-03-29 10:31:13 Guillaume Boutry description

Old value:

When the ceph pool backing glance is full (goes into read only), Glance IO calls never respond, and the worker taking care of the API call is basically a zombie.

If enough IO requests are made, for example 4 when you have 4 workers, glance will not be able to respond to any kind of requests. You need to restart glance to have responses again.

ceph status:
  cluster:
    id: ce9a32e4-9768-457a-b811-225b710aeb58
    health: HEALTH_ERR
            3 full osd(s)
            3 pool(s) full
            1 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum bm0.lxd (age 2h)
    mgr: bm0.lxd(active, since 2h)
    osd: 3 osds: 3 up (since 2h), 3 in (since 2h)

  data:
    pools: 3 pools, 161 pgs
    objects: 6.92k objects, 47 GiB
    usage: 143 GiB used, 6.8 GiB / 150 GiB avail
    pgs: 161 active+clean

ceph osd dump | grep ratio:
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

Here's a response from the apache 2 http proxying for glance:

openstack image delete 0a582014-832a-4f2a-9944-4111812fe6b2
Failed to delete image with name or ID '0a582014-832a-4f2a-9944-4111812fe6b2': HttpException: 502: Server Error for url: http://10.206.54.243:80/openstack-glance/v2/images/0a582014-832a-4f2a-9944-4111812fe6b2, The proxy server could not handle the requestReason: Error reading from remote server: 502 Proxy Error: Proxy Error: Apache/2.4.52 (Ubuntu) Server at 10.206.54.243 Port 9292: The proxy server received an invalid: response from an upstream server.
Failed to delete 1 of 1 images.

The last log for these requests at debug level is:

DEBUG glance_store.location [None req-4cdf1de9-fbe2-49a8-92d4-db0902773af2 e7cc50bfcb1246479c5b9397048377fe d0c1adff192b40e9989460336bab7c8c - - e152fb5db324433ba53d8ead347c6802 e152fb5db324433ba53d8ead347c6802] Registering scheme rbd with {'ceph': {'store': <glance_store._drivers.rbd.Store object at 0x7ff3c5d1b820>, 'location_class': <class 'glance_store._drivers.rbd.StoreLocation'>, 'store_entry': 'rbd'}} register_scheme_backend_map /usr/lib/python3/dist-packages/glance_store/location.py:132

To fix this, I adjusted the full_ratio to allow writing again, and deleted images. But glance should have a mechanism to detect this / a timeout.

New value:

When the ceph pool backing glance is full (goes into read only), Glance IO calls never respond, and the worker taking care of the API call is basically a zombie.

If enough IO requests are made, for example 4 when you have 4 workers, glance will not be able to respond to any kind of requests. You need to restart glance to have responses again.

ceph status:
  cluster:
    id: ce9a32e4-9768-457a-b811-225b710aeb58
    health: HEALTH_ERR
            3 full osd(s)
            3 pool(s) full
            1 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum bm0.lxd (age 2h)
    mgr: bm0.lxd(active, since 2h)
    osd: 3 osds: 3 up (since 2h), 3 in (since 2h)

  data:
    pools: 3 pools, 161 pgs
    objects: 6.92k objects, 47 GiB
    usage: 143 GiB used, 6.8 GiB / 150 GiB avail
    pgs: 161 active+clean

ceph osd dump | grep ratio:
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

Here's a response from the apache 2 http proxying for glance:

openstack image delete 0a582014-832a-4f2a-9944-4111812fe6b2
Failed to delete image with name or ID '0a582014-832a-4f2a-9944-4111812fe6b2': HttpException: 502: Server Error for url: http://10.206.54.243:80/openstack-glance/v2/images/0a582014-832a-4f2a-9944-4111812fe6b2, The proxy server could not handle the requestReason: Error reading from remote server: 502 Proxy Error: Proxy Error: Apache/2.4.52 (Ubuntu) Server at 10.206.54.243 Port 9292: The proxy server received an invalid: response from an upstream server.
Failed to delete 1 of 1 images.

The last log for these requests at debug level is:

DEBUG glance_store.location [None req-4cdf1de9-fbe2-49a8-92d4-db0902773af2 e7cc50bfcb1246479c5b9397048377fe d0c1adff192b40e9989460336bab7c8c - - e152fb5db324433ba53d8ead347c6802 e152fb5db324433ba53d8ead347c6802] Registering scheme rbd with {'ceph': {'store': <glance_store._drivers.rbd.Store object at 0x7ff3c5d1b820>, 'location_class': <class 'glance_store._drivers.rbd.StoreLocation'>, 'store_entry': 'rbd'}} register_scheme_backend_map /usr/lib/python3/dist-packages/glance_store/location.py:132

To fix this, I adjusted the full_ratio to allow writing again, and deleted images. But glance should have a mechanism to detect this / a timeout.

Versions:
glance 27.0.0
ceph 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
2024-03-29 16:34:22 Guillaume Boutry summary Old value: glance hangs when rbd pool in read only New value: glance hangs when rbd pool in read only / full
2024-04-04 05:51:29 Abhishek Kekane glance: importance Undecided Medium
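
The description asks for glance to have "a mechanism to detect this / a timeout". As an illustration only, the following is a minimal Python sketch of bounding a blocking storage call with a deadline so that the caller gets an error instead of waiting forever. It is not glance or glance_store code: `call_with_timeout`, `StoreTimeout` and `fake_blocked_delete` are hypothetical names, and the blocked Ceph IO is simulated with a `threading.Event`. Note the limitation: the background thread stays stuck until the underlying call returns, so this only frees the caller and does not cancel the IO itself.

```python
import concurrent.futures
import threading


class StoreTimeout(Exception):
    """Hypothetical error: a storage call missed its deadline."""


def call_with_timeout(func, timeout_s, *args, **kwargs):
    """Run func in a helper thread and stop waiting after timeout_s seconds.

    The helper thread is NOT killed on timeout; it keeps running until
    func returns. Only the caller is unblocked.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as exc:
        raise StoreTimeout(f"storage call exceeded {timeout_s}s") from exc
    finally:
        # Do not wait for a possibly stuck thread when cleaning up.
        executor.shutdown(wait=False)


# Simulated stand-in for an IO call against a full (read-only) pool:
# it blocks until `release` is set, much like the hung worker in the report.
release = threading.Event()


def fake_blocked_delete(image_id):
    release.wait()
    return image_id


try:
    call_with_timeout(
        fake_blocked_delete, 0.2, "0a582014-832a-4f2a-9944-4111812fe6b2"
    )
    outcome = "completed"
except StoreTimeout:
    outcome = "timed out"
finally:
    # Unblock the simulated call so the helper thread can exit.
    release.set()

print(outcome)
```

With a deadline like this, an API worker could return an error to the client rather than leaving the proxy to answer with a 502, although a real fix would also need to reclaim or cap the stuck threads.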