nova-live-migration fails 100% with "mysql: command not found" on subnode

Bug #1860021 reported by Eric Fried on 2020-01-16
This bug affects 1 person
Affects: OpenStack Compute (nova) | Importance: Critical | Assigned to: Unassigned
Affects: devstack | Importance: Undecided | Assigned to: Radosław Piliszek

Bug Description

Since [1] merged, nova-live-migration fails on the subnode. devstack-subnodes-early.txt.gz shows failures like:

 + ./stack.sh:main:1158 : is_glance_enabled
 + lib/glance:is_glance_enabled:90 : [[ , =~ ,glance ]]
 + lib/glance:is_glance_enabled:91 : [[ ,c-bak,c-vol,dstat,g-api,n-cpu,peakmem_tracker,placement-client,q-agt =~ ,g- ]]
 + lib/glance:is_glance_enabled:91 : return 0
 + ./stack.sh:main:1159 : echo_summary 'Configuring Glance'
 + ./stack.sh:echo_summary:452 : [[ -t 3 ]]
 + ./stack.sh:echo_summary:458 : echo -e Configuring Glance
 + ./stack.sh:main:1160 : init_glance
 + lib/glance:init_glance:276 : rm -rf /opt/stack/data/glance/images
 + lib/glance:init_glance:277 : mkdir -p /opt/stack/data/glance/images
 + lib/glance:init_glance:280 : recreate_database glance
 + lib/database:recreate_database:110 : local db=glance
 + lib/database:recreate_database:111 : recreate_database_mysql glance
 + lib/databases/mysql:recreate_database_mysql:63 : local db=glance
 + lib/databases/mysql:recreate_database_mysql:64 : mysql -uroot -psecretmysql -h127.0.0.1 -e 'DROP DATABASE IF EXISTS glance;'
/opt/stack/new/devstack/lib/databases/mysql: line 64: mysql: command not found
 + lib/databases/mysql:recreate_database_mysql:1 : exit_trap

[1] https://review.opendev.org/#/c/702707/
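
The trace above shows why the subnode reaches `recreate_database`: its service list contains `g-api`, so the Glance check succeeds and `init_glance` runs even though no mysql client is installed there. A minimal bash sketch of that check follows. The function name `is_glance_enabled_sketch` is hypothetical and only mirrors the two `[[ ... =~ ... ]]` tests visible in the trace; it is not the devstack implementation.

```shell
# Simplified stand-in for the is_glance_enabled check seen in the trace.
# Arguments: $1 = comma-separated disabled services, $2 = enabled services.
is_glance_enabled_sketch() {
    local disabled=$1 enabled=$2
    # An explicit "glance" entry in the disabled list wins.
    [[ ,${disabled} =~ ,glance ]] && return 1
    # Any enabled service whose name starts with "g-" counts as Glance.
    [[ ,${enabled} =~ ,g- ]] && return 0
    return 1
}

# Service list copied from the subnode trace.
subnode_services="c-bak,c-vol,dstat,g-api,n-cpu,peakmem_tracker,placement-client,q-agt"

if is_glance_enabled_sketch "" "$subnode_services"; then
    echo "glance enabled: init_glance would run on this node"
fi

# The database step then needs the mysql client, which the subnode lacks.
if ! command -v mysql >/dev/null 2>&1; then
    echo "mysql client missing: recreate_database would fail here"
fi
```

Because the check only looks for a `g-` prefix, enabling `g-api` alone is enough to trigger the database step.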

Eric Fried (efried) on 2020-01-16
Changed in nova:
importance: Undecided → Critical
Radosław Piliszek (yoctozepto) wrote :

Analysis: the subnode should not initialize Glance (or perhaps should not run Glance at all if it has no shared storage).

Fix proposed to branch: master
Review: https://review.opendev.org/702960

Changed in devstack:
assignee: nobody → Radosław Piliszek (yoctozepto)
status: New → In Progress

Fix proposed to branch: master
Review: https://review.opendev.org/703131

Changed in devstack:
assignee: Radosław Piliszek (yoctozepto) → Stephen Finucane (stephenfinucane)

Fix proposed to branch: master
Review: https://review.opendev.org/703137

Reviewed: https://review.opendev.org/703137
Committed: https://git.openstack.org/cgit/openstack/devstack/commit/?id=98f3bbe509c2de9efaf4f3fc1b5dbc42d7a67987
Submitter: Zuul
Branch: master

commit 98f3bbe509c2de9efaf4f3fc1b5dbc42d7a67987
Author: Stephen Finucane <email address hidden>
Date: Fri Jan 17 17:41:22 2020 +0000

    Revert "Stop enabling g-reg by default"

    This reverts commit d7dfcdb4674daae8a294848b1de6fa87c5d7d4eb. A
    subsequent change that depends on this,
    d8dec362baa2bf7f6ffe1c47352fdbe032eaf20a, has knock on effects for
    devstack-gate and needs to be reverted. Revert this first.

    Change-Id: Ic5402f57052648e10eacf3c3de67d2cdd2d42f63
    Signed-off-by: Stephen Finucane <email address hidden>
    Partial-bug: #1860021

Matt Riedemann (mriedem) on 2020-01-18
tags: added: gate-failure
Radosław Piliszek (yoctozepto) wrote :

The original fix is going in.

Changed in devstack:
assignee: Stephen Finucane (stephenfinucane) → Radosław Piliszek (yoctozepto)

Reviewed: https://review.opendev.org/702960
Committed: https://git.openstack.org/cgit/openstack/devstack/commit/?id=ec3543a02883c3d9b288128e0a6cb941315e72cc
Submitter: Zuul
Branch: master

commit ec3543a02883c3d9b288128e0a6cb941315e72cc
Author: Radosław Piliszek <email address hidden>
Date: Thu Jan 16 19:58:37 2020 +0100

    Init Glance database only on the node with the database backend

    Since [1] Glance init depends on either g-api or g-reg being
    enabled.
    This broke multinode g-api deployments with singlenode database
    backend.
    This commit aligns Glance with other services w.r.t when to
    apply database init.

    [1] d8dec362baa2bf7f6ffe1c47352fdbe032eaf20a

    Change-Id: Idc07764d6ba3a828f19691f56c73cbe9179c2673
    Closes-bug: #1860021
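
The idea in this commit, running database init only where a database backend is enabled, can be sketched in bash as below. This is an illustration under stated assumptions, not the merged change: `service_enabled_sketch` and `init_glance_sketch` are hypothetical stand-ins, and the `ENABLED_SERVICES` and `DATABASE_BACKENDS` values are made up for the example.

```shell
# Hypothetical example values: a subnode with g-api but no database backend.
ENABLED_SERVICES="g-api,n-cpu,q-agt"
DATABASE_BACKENDS="mysql postgresql"

# Return 0 if any of the named services appears in ENABLED_SERVICES.
service_enabled_sketch() {
    local svc
    for svc in "$@"; do
        [[ ,${ENABLED_SERVICES}, == *,"${svc}",* ]] && return 0
    done
    return 1
}

init_glance_sketch() {
    echo "preparing image directories"
    # DATABASE_BACKENDS is left unquoted on purpose so it splits into words.
    if service_enabled_sketch $DATABASE_BACKENDS; then
        echo "recreating glance database"
    else
        echo "skipping database init: no database backend on this node"
    fi
}

init_glance_sketch
```

With this guard, a node that runs `g-api` without mysql still prepares its local directories but never calls the mysql client.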

Changed in devstack:
status: In Progress → Fix Released
Ghanshyam Mann (ghanshyammann) wrote :

It seems the job is still failing.

Error: nova.exception.ImageUnacceptable: Image 9c9cd2a3-4615-4470-b90b-3125fa6d8a7f is unacceptable: Image is not raw format

Log:
Jan 19 00:33:42.995905 ubuntu-bionic-inap-mtl01-0013962963 nova-compute[29402]: DEBUG nova.virt.libvirt.storage.rbd_utils [None req-1329ef64-efb0-4aad-ad29-c258467f4a74 demo admin] rbd image 93477076-e164-49ad-b34a-d501ddaa58c7_disk does not exist {{(pid=29402) __init__ /opt/stack/new/nova/nova/virt/libvirt/storage/rbd_utils.py:78}}
Jan 19 00:33:43.018468 ubuntu-bionic-inap-mtl01-0013962963 nova-compute[29402]: DEBUG nova.virt.libvirt.storage.rbd_utils [None req-1329ef64-efb0-4aad-ad29-c258467f4a74 demo admin] rbd image 93477076-e164-49ad-b34a-d501ddaa58c7_disk does not exist {{(pid=29402) __init__ /opt/stack/new/nova/nova/virt/libvirt/storage/rbd_utils.py:78}}
Jan 19 00:33:43.022006 ubuntu-bionic-inap-mtl01-0013962963 nova-compute[29402]: DEBUG oslo_concurrency.lockutils [None req-1329ef64-efb0-4aad-ad29-c258467f4a74 demo admin] Lock "33ff3631ee628ec1b14823c5483cabc8a8011abe" acquired by "nova.virt.libvirt.imagebackend.Image.cache.<locals>.fetch_func_sync" :: waited 0.000s {{(pid=29402) inner /usr/local/lib/python3.6/dist-packages/oslo_concurrency/lockutils.py:358}}
Jan 19 00:33:43.069594 ubuntu-bionic-inap-mtl01-0013962963 nova-compute[29402]: DEBUG nova.virt.libvirt.imagebackend [None req-1329ef64-efb0-4aad-ad29-c258467f4a74 demo admin] Image locations are: [{'url': 'swift+config://ref1/glance/9c9cd2a3-4615-4470-b90b-3125fa6d8a7f', 'metadata': {}}] {{(pid=29402) clone /opt/stack/new/nova/nova/virt/libvirt/imagebackend.py:966}}
Jan 19 00:33:43.130274 ubuntu-bionic-inap-mtl01-0013962963 nova-compute[29402]: DEBUG oslo_concurrency.lockutils [None req-1329ef64-efb0-4aad-ad29-c258467f4a74 demo admin] Lock "33ff3631ee628ec1b14823c5483cabc8a8011abe" released by "nova.virt.libvirt.imagebackend.Image.cache.<locals>.fetch_func_sync" :: held 0.108s {{(pid=29402) inner /usr/local/lib/python3.6/dist-packages/oslo_concurrency/lockutils.py:370}}
Jan 19 00:33:43.140567 ubuntu-bionic-inap-mtl01-0013962963 nova-compute[29402]: ERROR nova.compute.manager [None req-1329ef64-efb0-4aad-ad29-c258467f4a74 demo admin] [instance: 93477076-e164-49ad-b34a-d501ddaa58c7] Instance failed to spawn: glanceclient.exc.HTTPBadGateway: HTTP 502 Proxy Error: Proxy Error: The proxy server received an invalid: response from an upstream server.: The proxy server could not handle the requestReason: Error reading from remote server: Apache/2.4.29 (Ubuntu) Server at 198.72.124.61 Port 80
Jan 19 00:33:43.140567 ubuntu-bionic-inap-mtl01-0013962963 nova-compute[29402]: ERROR nova.compute.manager [instance: 93477076-e164-49ad-b34a-d501ddaa58c7] Traceback (most recent call last):
Jan 19 00:33:43.140567 ubuntu-bionic-inap-mtl01-0013962963 nova-compute[29402]: ERROR nova.compute.manager [instance: 93477076-e164-49ad-b34a-d501ddaa58c7] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 3855, in clone_fallback_to_fetch
Jan 19 00:33:43.140567 ubuntu-bionic-inap-mtl01-0013962963 nova-compute[29402]: ERROR nova.compute.manager [instance: 93477076-e164-49ad-b34a-d501ddaa58c7] backend.clone(context, disk_images['image_id'...

Ghanshyam Mann (ghanshyammann) wrote :

Attaching subnode n-cpu log.

Ghanshyam Mann (ghanshyammann) wrote :

Adding this job to the devstack gate until we migrate it to Zuul v3: https://review.opendev.org/#/c/703271/

Ghanshyam Mann (ghanshyammann) wrote :

The Glance image error is consistent, as seen in https://review.opendev.org/#/c/697669/.

So the devstack fix (https://review.opendev.org/#/c/702960/) did not actually fix the issue.

Reviewed: https://review.opendev.org/703131
Committed: https://git.openstack.org/cgit/openstack/devstack/commit/?id=48d1f028c43dd26aab852715e451e1ec08421a2f
Submitter: Zuul
Branch: master

commit 48d1f028c43dd26aab852715e451e1ec08421a2f
Author: Stephen Finucane <email address hidden>
Date: Fri Jan 17 17:23:11 2020 +0000

    Revert "Run Glance initialization when Glance is enabled, not just registry"

    This reverts commit d8dec362baa2bf7f6ffe1c47352fdbe032eaf20a. This has
    knock on effects for devstack-gate, which configures g-api on subnodes
    but not mysql, resulting in failures. A longer term fix would be to
    either a) stop configuring g-api on subnodes if we can determine it's
    not necessary or b) only configure the database if on the main node.
    However, both options are subject to debate so for now just unclog the
    gate.

    Change-Id: I58baa3b6c63c648836ae8152c2d6d7ceff11a388
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-bug: #1860021

I believe the current issue:
nova.exception.ImageUnacceptable: Image 9c9cd2a3-4615-4470-b90b-3125fa6d8a7f is unacceptable: Image is not raw format

is different from the one reported, which *was* fixed. We already verified that. Now we are running in circles, and there are many issues plaguing that job.

Both the primary and the secondary node ran:
+ functions:upload_image:380 : openstack --os-cloud=devstack-admin --os-region-name=RegionOne image create cirros-0.4.0-x86_64-disk --property hw_rng_model=virtio --public --container-format=bare --disk-format qcow2

So the image is qcow2, and the "not raw format" error looks like a false alarm.

Glance fails with:
Jan 18 21:36:00.234880 ubuntu-bionic-rax-iad-0013962439 <email address hidden>[7736]: WARNING glance.location [None req-a2e09ece-737c-4174-a42f-c5185ac6142a demo admin] Get image eeba1e7b-55d7-4092-8a50-842ec5d47ccb data failed: Unknown scheme 'swift+config' found in URI.: glance_store.exceptions.UnknownScheme: Unknown scheme 'swift+config' found in URI
Jan 18 21:36:00.235138 ubuntu-bionic-rax-iad-0013962439 <email address hidden>[7736]: ERROR glance.location [None req-a2e09ece-737c-4174-a42f-c5185ac6142a demo admin] Glance tried all active locations to get data for image eeba1e7b-55d7-4092-8a50-842ec5d47ccb but all have failed.
Jan 18 21:36:00.240304 ubuntu-bionic-rax-iad-0013962439 <email address hidden>[7736]: CRITICAL glance [None req-a2e09ece-737c-4174-a42f-c5185ac6142a demo admin] Unhandled error: TypeError: 'ImageProxy' object is not callable

which looks like a bug in Glance and/or its Swift backend.

All in all, the behavior that changed is the double image upload. I tried to find out why that would break Glance with Swift but have no idea so far.

Marking as invalid for nova since the change needed was in DevStack, not nova.

Changed in nova:
status: New → Invalid

Change abandoned by Eric Fried (<email address hidden>) on branch: master
Review: https://review.opendev.org/702961
Reason: It looks like we may still be working on this, but the dep is merged and the reverts-of-reverts are not yet proposed (I think?). Can add a new sniffer patch or resurrect this one as needed.

Reviewed: https://review.opendev.org/703271
Committed: https://git.openstack.org/cgit/openstack/devstack/commit/?id=2e45f2c267c9ababdbdfc4c505b329398391c5f9
Submitter: Zuul
Branch: master

commit 2e45f2c267c9ababdbdfc4c505b329398391c5f9
Author: Ghanshyam <email address hidden>
Date: Sat Jan 18 19:59:29 2020 -0600

    Adding nova-live-migration job in devstack gate

    nova-live-migration is a legacy job and relies on
    devstack-gate + devstack settings, so any change in devstack can
    break it. Example bug: 1860021

    We can remove this job once it is migrated to zuulv3 native.

    Change-Id: Ie34d4dc1ab30ced8161796fe32628db07de86cc9
    Related-bug: #1860021

Reviewed: https://review.opendev.org/703288
Committed: https://git.openstack.org/cgit/openstack/devstack/commit/?id=09e860fc2c306774076c1814ba3ab7c44404066d
Submitter: Zuul
Branch: master

commit 09e860fc2c306774076c1814ba3ab7c44404066d
Author: Radosław Piliszek <email address hidden>
Date: Sun Jan 19 12:41:14 2020 +0100

    Run Glance initialization when Glance is enabled, not just registry (v2)

    Per [1] Glance registry should not be required to run since Queens.

    v2 improves on v1 [2] (now reverted [3]) by applying minor comments
    from reviews so far and ensuring nova-live-migration job does not see
    a change in behavior and hence does not break [4].
    [5] tried to fix the issue but it did only partially, regarding
    the database but not the image upload [6].
    This patch ensures double cirros image upload does not happen as well.

    [1] https://specs.openstack.org/openstack/glance-specs/specs/queens/approved/glance/deprecate-registry.html
    [2] https://review.opendev.org/702707
    [3] https://review.opendev.org/703131
    [4] https://bugs.launchpad.net/devstack/+bug/1860021
    [5] https://review.opendev.org/702960
    [6] https://bugs.launchpad.net/devstack/+bug/1860021/comments/16

    Change-Id: I61538acd6bd4c7b3da26c4084225b220d7d1aa2c
    Closes-bug: #1859847
    Related-bug: #1860021
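
The commit message says the patch prevents the double cirros upload but does not show how. One generic way to get that property is to make the upload idempotent, as in the bash sketch below. Everything here is hypothetical: the in-memory `UPLOADED_IMAGES` array stands in for Glance, and `upload_image_once_sketch` is not a devstack function. The merged change may use a different mechanism, such as restricting which node performs the upload.

```shell
# Hypothetical in-memory image registry standing in for Glance.
UPLOADED_IMAGES=()

# Return 0 if an image with the given name is already registered.
image_exists_sketch() {
    local name=$1 existing
    for existing in "${UPLOADED_IMAGES[@]}"; do
        [[ $existing == "$name" ]] && return 0
    done
    return 1
}

# Register the image only if it is not present yet.
upload_image_once_sketch() {
    local name=$1
    if image_exists_sketch "$name"; then
        echo "skip: $name already uploaded"
        return 0
    fi
    UPLOADED_IMAGES+=("$name")
    echo "uploaded: $name"
}

upload_image_once_sketch cirros-0.4.0-x86_64-disk   # first attempt, e.g. primary node
upload_image_once_sketch cirros-0.4.0-x86_64-disk   # second attempt becomes a no-op
```

Against a real deployment the existence check would have to query Glance rather than a local array; that part is deliberately left out here.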

Reviewed: https://review.opendev.org/703247
Committed: https://git.openstack.org/cgit/openstack/devstack/commit/?id=89cb80d2120a7247dcc8b1f6a073cf9c9e488806
Submitter: Zuul
Branch: master

commit 89cb80d2120a7247dcc8b1f6a073cf9c9e488806
Author: Radosław Piliszek <email address hidden>
Date: Sat Jan 18 15:41:17 2020 +0000

    Revert "Revert "Stop enabling g-reg by default""

    This reverts commit 98f3bbe509c2de9efaf4f3fc1b5dbc42d7a67987.

    This is no longer necessary as proper fix [1]
    is now applied.

    [1] https://review.opendev.org/703288

    Change-Id: Ibc40f79b1daf30246ed24790e9b305caea497cb2
    Related-bug: #1859847
    Related-bug: #1860021
