glance saves images to wrong backing storage

Bug #1861005 reported by Marian Gasparovic
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Glance Charm
Triaged
Medium
Unassigned

Bug Description

glance charm 397
ceph-mon charm 405

After deploying openstack, rally could not create VM complaining image is missing attributes. Turns out glance stored images to local filesystem instead of ceph. And it stored it only on one of three units.

I destroyed and created the juju model three times with different results, using the same bundle, so it seems like a timing issue

+++++++++++++++
attempt 1
+++++++++++++++

$ openstack image list
+--------------------------------------+---------------------------------------------------------------+--------+
| ID | Name | Status |
+--------------------------------------+---------------------------------------------------------------+--------+
| 772ffbcb-837e-4050-916e-5add0898b108 | auto-sync/ubuntu-bionic-18.04-amd64-server-20200107-disk1.img | active |
| 4244197f-f874-4e18-a238-976a74ada6d2 | auto-sync/ubuntu-trusty-14.04-amd64-server-20191107-disk1.img | active |
| 1ac58138-0505-4856-8d42-ea270f75e404 | auto-sync/ubuntu-xenial-16.04-amd64-server-20200108-disk1.img | active |
+--------------------------------------+---------------------------------------------------------------+--------+

$ juju run --application glance -- ls -l /var/lib/glance/images
- Stdout: |
    total 0
  UnitId: glance/0
- Stdout: |
    total 885768
    -rw-r----- 1 glance glance 297140224 Jan 25 19:15 1ac58138-0505-4856-8d42-ea270f75e404
    -rw-r----- 1 glance glance 264897024 Jan 25 19:15 4244197f-f874-4e18-a238-976a74ada6d2
    -rw-r----- 1 glance glance 344981504 Jan 25 19:16 772ffbcb-837e-4050-916e-5add0898b108
  UnitId: glance/1
- Stdout: |
    total 0
  UnitId: glance/2

glance/0
glance-api.log:2020-01-25 20:07:44.284 270548 INFO eventlet.wsgi.server [req-ef32b72d-c83b-41df-9539-e58ff4f5ef27 ed3e0dc21c514698b9a2c19cad4e22b9 f842dacbfc2c46c8a1fe696be6e49f6f - a18f88a49f094ab49a7c0c6005bb1bcd a18f88a49f094ab49a7c0c6005bb1bcd] 192.168.33.79 - - [25/Jan/2020 20:07:44] "GET /v2/images/772ffbcb-837e-4050-916e-5add0898b108 HTTP/1.1" 200 2065 0.208421
glance-api.log:2020-01-25 20:07:44.310 270548 WARNING glance.location [req-960ec0aa-4913-4a8b-903d-ffb3ac751a5f ed3e0dc21c514698b9a2c19cad4e22b9 f842dacbfc2c46c8a1fe696be6e49f6f - a18f88a49f094ab49a7c0c6005bb1bcd a18f88a49f094ab49a7c0c6005bb1bcd] Get image 772ffbcb-837e-4050-916e-5add0898b108 data failed: Image /var/lib/glance/images/772ffbcb-837e-4050-916e-5add0898b108 not found.: glance_store.exceptions.NotFound: Image /var/lib/glance/images/772ffbcb-837e-4050-916e-5add0898b108 not found
glance-api.log:2020-01-25 20:07:44.311 270548 ERROR glance.location [req-960ec0aa-4913-4a8b-903d-ffb3ac751a5f ed3e0dc21c514698b9a2c19cad4e22b9 f842dacbfc2c46c8a1fe696be6e49f6f - a18f88a49f094ab49a7c0c6005bb1bcd a18f88a49f094ab49a7c0c6005bb1bcd] Glance tried all active locations/stores to get data for image 772ffbcb-837e-4050-916e-5add0898b108 but all have failed.
glance-api.log:2020-01-25 20:07:44.312 270548 INFO eventlet.wsgi.server [req-960ec0aa-4913-4a8b-903d-ffb3ac751a5f ed3e0dc21c514698b9a2c19cad4e22b9 f842dacbfc2c46c8a1fe696be6e49f6f - a18f88a49f094ab49a7c0c6005bb1bcd a18f88a49f094ab49a7c0c6005bb1bcd] 192.168.33.79 - - [25/Jan/2020 20:07:44] "GET /v2/images/772ffbcb-837e-4050-916e-5add0898b108/file HTTP/1.1" 204 173 0.023415
glance-api.log:2020-01-25 20:07:44.364 270548 INFO eventlet.wsgi.server [req-61a6980a-01a7-4f4d-9b82-9a8ef0777c84 ed3e0dc21c514698b9a2c19cad4e22b9 f842dacbfc2c46c8a1fe696be6e49f6f - a18f88a49f094ab49a7c0c6005bb1bcd a18f88a49f094ab49a7c0c6005bb1bcd] 192.168.33.79 - - [25/Jan/2020 20:07:44] "GET /v2/images/772ffbcb-837e-4050-916e-5add0898b108 HTTP/1.1" 200 2065 0.025742
glance-api.log:2020-01-25 20:07:44.384 270548 WARNING glance.location [req-ed6c9ef3-7a90-410a-b4ff-95cb99cfcb76 ed3e0dc21c514698b9a2c19cad4e22b9 f842dacbfc2c46c8a1fe696be6e49f6f - a18f88a49f094ab49a7c0c6005bb1bcd a18f88a49f094ab49a7c0c6005bb1bcd] Get image 772ffbcb-837e-4050-916e-5add0898b108 data failed: Image /var/lib/glance/images/772ffbcb-837e-4050-916e-5add0898b108 not found.: glance_store.exceptions.NotFound: Image /var/lib/glance/images/772ffbcb-837e-4050-916e-5add0898b108 not found
glance-api.log:2020-01-25 20:07:44.385 270548 ERROR glance.location [req-ed6c9ef3-7a90-410a-b4ff-95cb99cfcb76 ed3e0dc21c514698b9a2c19cad4e22b9 f842dacbfc2c46c8a1fe696be6e49f6f - a18f88a49f094ab49a7c0c6005bb1bcd a18f88a49f094ab49a7c0c6005bb1bcd] Glance tried all active locations/stores to get data for image 772ffbcb-837e-4050-916e-5add0898b108 but all have failed.

glance pool in ceph existed but it was empty

+++++++++++++++
attempt 2
+++++++++++++++

$ openstack image list
+--------------------------------------+---------------------------------------------------------------+--------+
| ID | Name | Status |
+--------------------------------------+---------------------------------------------------------------+--------+
| 89a0727b-a40b-42ba-829e-19983f824b18 | auto-sync/ubuntu-bionic-18.04-amd64-server-20200107-disk1.img | active |
| 6e71f2f9-5380-4ae7-a68b-c5ad055be441 | auto-sync/ubuntu-trusty-14.04-amd64-server-20191107-disk1.img | active |
| 50c1fb83-7211-4e95-94cf-4211534ee78a | auto-sync/ubuntu-trusty-14.04-amd64-server-20191107-disk1.img | active |
| 1e491bf7-8788-445e-848f-a06c1b8f6738 | auto-sync/ubuntu-xenial-16.04-amd64-server-20200108-disk1.img | active |
+--------------------------------------+---------------------------------------------------------------+--------+

$ juju run --application glance -- ls -l /var/lib/glance/images
- Stdout: |
    total 0
  UnitId: glance/0
- Stdout: |
    total 290180
    -rw-r----- 1 glance glance 297140224 Jan 26 10:32 1e491bf7-8788-445e-848f-a06c1b8f6738
  UnitId: glance/1
- Stdout: |
    total 0
  UnitId: glance/2

root@juju-9e53c1-21-lxd-0:~# rbd -p glance ls --long
NAME SIZE PARENT FMT PROT LOCK
50c1fb83-7211-4e95-94cf-4211534ee78a 253 MiB 2
50c1fb83-7211-4e95-94cf-4211534ee78a@snap 253 MiB 2 yes
6e71f2f9-5380-4ae7-a68b-c5ad055be441 253 MiB 2
6e71f2f9-5380-4ae7-a68b-c5ad055be441@snap 253 MiB 2 yes
89a0727b-a40b-42ba-829e-19983f824b18 329 MiB 2
89a0727b-a40b-42ba-829e-19983f824b18@snap 329 MiB 2 yes

this worked as expected

+++++++++++++++
attempt 3
+++++++++++++++

root@juju-ce54cd-21-lxd-0:~# rbd -p glance ls --long
root@juju-ce54cd-21-lxd-0:~#

$ openstack image list
+--------------------------------------+---------------------------------------------------------------+--------+
| ID | Name | Status |
+--------------------------------------+---------------------------------------------------------------+--------+
| 8a8abab1-dd87-4d64-a4c2-404d82d85afd | auto-sync/ubuntu-bionic-18.04-amd64-server-20200107-disk1.img | active |
| 91337830-f658-40ed-8805-d7edd54ac7a3 | auto-sync/ubuntu-trusty-14.04-amd64-server-20191107-disk1.img | active |
| cf07bf2b-6764-4725-9211-aad06e70d94c | auto-sync/ubuntu-xenial-16.04-amd64-server-20200108-disk1.img | active |
+--------------------------------------+---------------------------------------------------------------+--------+

$ juju run --application glance -- ls -l /var/lib/glance/images
- Stdout: |
    total 0
  UnitId: glance/0
- Stdout: |
    total 0
  UnitId: glance/1
- Stdout: |
    total 885764
    -rw-r----- 1 glance glance 344981504 Jan 26 13:20 8a8abab1-dd87-4d64-a4c2-404d82d85afd
    -rw-r----- 1 glance glance 264897024 Jan 26 13:20 91337830-f658-40ed-8805-d7edd54ac7a3
    -rw-r----- 1 glance glance 297140224 Jan 26 13:20 cf07bf2b-6764-4725-9211-aad06e70d94c
  UnitId: glance/2

$ juju status --relations |grep ceph|grep glance
ceph-mon:client glance:ceph ceph-client regular

grep glance /var/log/juju/unit-ceph-mon-0.log
2020-01-26 11:58:04 INFO juju.worker.uniter.relation relations.go:553 joining relation "glance:ceph ceph-mon:client"
2020-01-26 11:58:04 INFO juju.worker.uniter.relation relations.go:589 joined relation "glance:ceph ceph-mon:client"
2020-01-26 13:15:13 DEBUG osd-relation-changed Error ENOENT: failed to find client.glance in keyring
2020-01-26 13:15:13 DEBUG juju-log osd:49: Creating new key for glance
2020-01-26 13:15:14 DEBUG juju-log osd:49: Calling check_output: ['sudo', '-u', 'ceph', 'ceph', '--name', 'mon.', '--keyring', '/var/lib/ceph/mon/ceph-juju-ce54cd-21-lxd-0/keyring', 'auth', 'get-or-create', 'client.glance', 'mon', 'allow r; allow command "osd blacklist"', 'osd', 'allow rwx']
2020-01-26 13:15:16 DEBUG osd-relation-changed exported keyring for client.glance
2020-01-26 13:15:18 DEBUG osd-relation-changed exported keyring for client.glance
2020-01-26 13:16:08 DEBUG client-relation-changed exported keyring for client.glance
2020-01-26 13:16:20 DEBUG client-relation-changed exported keyring for client.glance
2020-01-26 13:16:43 DEBUG client-relation-changed exported keyring for client.glance

$ grep ceph-mon /var/log/juju/unit-glance-0.log
2020-01-26 11:54:18 INFO juju.worker.uniter.relation relations.go:553 joining relation "glance:ceph ceph-mon:client"
2020-01-26 11:54:18 INFO juju.worker.uniter.relation relations.go:589 joined relation "glance:ceph ceph-mon:client"

glance-api.log
2020-01-26 13:20:01.018 239049 INFO eventlet.wsgi.server [req-c805e819-37a5-444b-930d-e64e50c606b5 905400bc0509434fa692e66eccfad7ef d3dde46344b04020b979442371bd3854 - 2941bbb93a12415981767401cdfaf739 2941bbb93a12415981767401cdfaf739] 10.244.8.14 - - [26/Jan/2020 13:20:01] "PUT /v2/images/cf07bf2b-6764-4725-9211-aad06e70d94c/file HTTP/1.1" 204 213 2.685471
2020-01-26 13:20:13.392 239049 INFO eventlet.wsgi.server [req-0a32f13c-3cde-4426-af09-d3d6b68bec7f 905400bc0509434fa692e66eccfad7ef d3dde46344b04020b979442371bd3854 - 2941bbb93a12415981767401cdfaf739 2941bbb93a12415981767401cdfaf739] 10.244.8.14 - - [26/Jan/2020 13:20:13] "PUT /v2/images/91337830-f658-40ed-8805-d7edd54ac7a3/file HTTP/1.1" 204 213 2.346065
2020-01-26 13:20:29.129 239049 INFO eventlet.wsgi.server [req-d2593364-3a54-426d-ad14-2928ca9efed5 905400bc0509434fa692e66eccfad7ef d3dde46344b04020b979442371bd3854 - 2941bbb93a12415981767401cdfaf739 2941bbb93a12415981767401cdfaf739] 10.244.8.14 - - [26/Jan/2020 13:20:29] "PUT /v2/images/8a8abab1-dd87-4d64-a4c2-404d82d85afd/file HTTP/1.1" 204 213 3.155675

Tags: cdo-qa
Revision history for this message
Marian Gasparovic (marosg) wrote :

images are published by glance-simplestream-sync 48

Revision history for this message
Marian Gasparovic (marosg) wrote :

Logs and configs from one of previous runs where we had this issue

https://oil-jenkins.canonical.com/artifacts/1fba182c-cc13-42b8-b117-3006a159d456/index.html

Revision history for this message
Marian Gasparovic (marosg) wrote :

Here is another similar failure, this shows output after rally was ran (and failed on glance tests)

$ openstack image list
+--------------------------------------+-----------------------------------------------------------------+--------+
| ID | Name | Status |
+--------------------------------------+-----------------------------------------------------------------+--------+
| e723a65a-5b33-4c8e-b139-911adc43e158 | auto-sync/ubuntu-bionic-18.04-amd64-server-20200129.1-disk1.img | active |
| 7604ca48-d618-441c-90f5-4a43909ec88a | auto-sync/ubuntu-trusty-14.04-amd64-server-20191107-disk1.img | active |
| 909aa400-993d-4f48-b8d6-e3d1bc83afbc | auto-sync/ubuntu-xenial-16.04-amd64-server-20200129-disk1.img | active |
| 42ead9a9-b10d-493d-8c27-95a18e78c258 | s_rally_e405e253_1x5q4ynN | saving |
| 7f1870ab-77c1-4929-8451-fd97c1d6fc63 | s_rally_e405e253_Cvi3NFxm | saving |
| 30ed6b51-ad13-4b4d-bcca-83d7bb30ff22 | s_rally_e405e253_XK8JqZ5w | saving |
| dc0022d1-60b2-420b-9001-213437cfabb3 | s_rally_e405e253_eykKEvtl | saving |
| 50fdcf31-bc3b-4456-94ee-9a6080e1e780 | s_rally_e405e253_hLaAQc9z | saving |
| 077d4ab3-c8b1-4a3c-b534-42a337f39bb8 | s_rally_e405e253_lpIUCcwx | saving |
| 5b7f2da0-e478-43ba-8c3a-9d06c4a8bc84 | s_rally_e405e253_qsuyxuzO | queued |
| c83af4ac-8ede-4656-a834-c60c6aa70cd8 | s_rally_e405e253_vMUNs9LB | queued |
+--------------------------------------+-----------------------------------------------------------------+--------+
ubuntu@production-cpe-87e28009-c50d-4b13-9da3-ceb4eb6b9c08:~$ juju run --application glance -- ls -l /var/lib/glance/images
- Stdout: |
    total 0
  UnitId: glance/0
- Stdout: |
    total 885448
    -rw-r----- 1 glance glance 264897024 Feb 3 15:15 7604ca48-d618-441c-90f5-4a43909ec88a
    -rw-r----- 1 glance glance 297009152 Feb 3 15:14 909aa400-993d-4f48-b8d6-e3d1bc83afbc
    -rw-r----- 1 glance glance 344784896 Feb 3 15:15 e723a65a-5b33-4c8e-b139-911adc43e158
  UnitId: glance/1
- Stdout: |
    total 0
  UnitId: glance/2

# rbd -p glance ls details --long
2020-02-03 17:46:41.828 7f94c6ffd700 -1 librbd::image::OpenRequest: failed to retrieve initial metadata: (2) No such file or directory
2020-02-03 17:46:41.828 7f94c67fc700 -1 librbd::io::AioCompletion: 0x558f6547c6e0 fail: (2) No such file or directory
NAME SIZE PARENT FMT PROT LOCK
077d4ab3-c8b1-4a3c-b534-42a337f39bb8 0 B 2
30ed6b51-ad13-4b4d-bcca-83d7bb30ff22 0 B 2
42ead9a9-b10d-493d-8c27-95a18e78c258 48 MiB 2 excl
50fdcf31-bc3b-4456-94ee-9a6080e1e780 40 MiB 2 excl
dc0022d1-60b2-420b-9001-213437cfabb3 40 MiB 2 excl

Revision history for this message
Marian Gasparovic (marosg) wrote :

glance did not fail 100% of time

+---------------------------------------------------------------------------------------------------------------------------------+
| Response Times (sec) |
+------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| Action | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
+------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| glance_v2.create_image | 11.879 | 96.417 | 99.476 | 100.133 | 100.79 | 85.533 | 20.0% | 10 |
| -> glance_v2.get_image | 0.041 | 0.045 | 0.052 | 0.054 | 0.055 | 0.046 | 100.0% | 8 |
| -> glance_v2.upload_data | 9.608 | 91.447 | 94.545 | 94.766 | 94.987 | 81.894 | 20.0% | 10 |
| -> glance_v2.get_image (x2) | 0.074 | 0.074 | 0.075 | 0.075 | 0.075 | 0.074 | 100.0% | 2 |
| glance_v2.delete_image | 0.982 | 0.999 | 1.012 | 1.013 | 1.015 | 0.999 | 100.0% | 2 |
| total | 12.894 | 96.417 | 99.476 | 100.133 | 100.79 | 85.732 | 20.0% | 10 |
| -> duration | 12.894 | 96.417 | 99.476 | 100.133 | 100.79 | 85.732 | 20.0% | 10 |
| -> idle_duration | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 20.0% | 10 |
+------------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+

glanceclient.exc.CommunicationError: Error finding address for http://10.244.8.85:9292/v2/images/dc0022d1-60b2-420b-9001-213437cfabb3/file: Unable to establish connection to http://10.244.8.85:9292/v2/images/dc0022d1-60b2-420b-9001
-213437cfabb3/file: [Errno 32] Broken pipe

Revision history for this message
Marian Gasparovic (marosg) wrote :

replicated in a lab

$ juju ssh glance/0 sudo ls -l /var/lib/glance/images
total 548936
-rw-r----- 1 glance glance 297205760 Feb 27 11:46 99e3c2c7-56b4-4abd-ae74-1ffc003aa9da
-rw-r----- 1 glance glance 264897024 Feb 27 11:46 bd55dc8c-8277-4289-9a6c-01ff8fc923c5
Connection to 192.168.210.94 closed.

$ juju ssh ceph-mon/0 sudo rbd -p glance ls details
8e0f14b5-280a-42c8-9384-185a430bcfe6
e2d2a206-8a0a-4011-9489-694333f67bc2
Connection to 10.0.3.71 closed.

$ openstack image list
+--------------------------------------+-----------------------------------------------------------------+--------+
| ID | Name | Status |
+--------------------------------------+-----------------------------------------------------------------+--------+
| e2d2a206-8a0a-4011-9489-694333f67bc2 | auto-sync/ubuntu-bionic-18.04-amd64-server-20200218-disk1.img | active |
| 8e0f14b5-280a-42c8-9384-185a430bcfe6 | auto-sync/ubuntu-trusty-14.04-amd64-server-20191107-disk1.img | active |
| bd55dc8c-8277-4289-9a6c-01ff8fc923c5 | auto-sync/ubuntu-trusty-14.04-amd64-server-20191107-disk1.img | active |
| 99e3c2c7-56b4-4abd-ae74-1ffc003aa9da | auto-sync/ubuntu-xenial-16.04-amd64-server-20200218.1-disk1.img | active |
+--------------------------------------+-----------------------------------------------------------------+--------+

icey was able to replicate this

https://pastebin.ubuntu.com/p/F8N6GM9vMy/

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

This absolutely seems to be a race condition where images are uploaded to Glance before the Ceph relation is complete. It might be worth ensuring that Glance refuses service once the ceph units are related, regardless of relation completeness, until the ceph relation is complete

Revision history for this message
Andrew McLeod (admcleod) wrote :

Could you test this and let us know the result: configure g-s-s-s with run: false (https://github.com/openstack/charm-glance-simplestreams-sync/blob/master/config.yaml#L12) and then set it to true after the deployment is stable

Changed in charm-glance:
status: New → Incomplete
importance: Undecided → Medium
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Yes, we're using the workaround of changing the config after things are stable.

It's a bit tricky; we have to wait for things to be stable via juju wait, then issue a config change, then wait for that hook to complete, then wait 60 seconds for the sync to start, then juju wait again. That's all extra code we have to carry in FCE to workaround this bug.

Changed in charm-glance:
status: Incomplete → New
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

In essence, this is a bug where charm-glance asserts that it's ready to serve User requests when it has a pending (or not yet actually started) relation to ceph. Given the presence of goal-state in Juju, it should be possible to update charm-glance to not enable the local filesystem at all when it is going to have a ceph relation.

Changed in charm-glance:
status: New → Triaged
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.