legacy-grenade-dsvm-neutron-multinode-live-migration failing with "is not on shared storage: Shared storage live-migration requires either shared storage or boot-from-volume with no local disks." since Jan 21

Bug #1813216 reported by Matt Riedemann
This bug affects 2 people
Affects                     Status         Importance  Assigned to      Milestone
OpenStack Compute (nova)    Fix Released   Medium      Matt Riedemann
  Queens                    Fix Committed  Medium      Matt Riedemann
  Rocky                     Fix Committed  Medium      Matt Riedemann

Bug Description

Seen here:

http://logs.openstack.org/04/632904/2/check/legacy-grenade-dsvm-neutron-multinode-live-migration/8f886af/logs/subnode-2/screen-n-cpu.txt.gz?level=TRACE#_Jan_24_20_30_16_310783

Jan 24 20:30:16.310783 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server [None req-86fed6ff-bca5-4365-a11e-9fa2bf88635c tempest-LiveMigrationTest-1327523144 tempest-LiveMigrationTest-1327523144] Exception during message handling: InvalidSharedStorage: ubuntu-xenial-ovh-bhs1-0002107027 is not on shared storage: Shared storage live-migration requires either shared storage or boot-from-volume with no local disks.
Jan 24 20:30:16.311129 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jan 24 20:30:16.311370 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
Jan 24 20:30:16.311626 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Jan 24 20:30:16.311888 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
Jan 24 20:30:16.312172 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Jan 24 20:30:16.312414 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
Jan 24 20:30:16.312731 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Jan 24 20:30:16.313186 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/opt/stack/old/nova/nova/exception_wrapper.py", line 79, in wrapped
Jan 24 20:30:16.313500 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server function_name, call_dict, binary, tb)
Jan 24 20:30:16.313752 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jan 24 20:30:16.314065 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server self.force_reraise()
Jan 24 20:30:16.314300 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jan 24 20:30:16.314573 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Jan 24 20:30:16.314791 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/opt/stack/old/nova/nova/exception_wrapper.py", line 69, in wrapped
Jan 24 20:30:16.315031 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
Jan 24 20:30:16.315243 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/opt/stack/old/nova/nova/compute/utils.py", line 1141, in decorated_function
Jan 24 20:30:16.315481 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Jan 24 20:30:16.315684 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/opt/stack/old/nova/nova/compute/manager.py", line 216, in decorated_function
Jan 24 20:30:16.315898 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
Jan 24 20:30:16.316131 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jan 24 20:30:16.316433 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server self.force_reraise()
Jan 24 20:30:16.316706 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jan 24 20:30:16.316913 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Jan 24 20:30:16.317109 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/opt/stack/old/nova/nova/compute/manager.py", line 204, in decorated_function
Jan 24 20:30:16.317323 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Jan 24 20:30:16.317515 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/opt/stack/old/nova/nova/compute/manager.py", line 6115, in check_can_live_migrate_source
Jan 24 20:30:16.317749 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server block_device_info)
Jan 24 20:30:16.318009 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 6753, in check_can_live_migrate_source
Jan 24 20:30:16.318235 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server raise exception.InvalidSharedStorage(reason=reason, path=source)
Jan 24 20:30:16.318459 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server InvalidSharedStorage: ubuntu-xenial-ovh-bhs1-0002107027 is not on shared storage: Shared storage live-migration requires either shared storage or boot-from-volume with no local disks.
Jan 24 20:30:16.318797 ubuntu-xenial-ovh-bhs1-0002107027 nova-compute[2857]: ERROR oslo_messaging.rpc.server

Looks like this started around Jan 21:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22is%20not%20on%20shared%20storage%3A%20Shared%20storage%20live-migration%20requires%20either%20shared%20storage%20or%20boot-from-volume%20with%20no%20local%20disks.%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

I don't see anything in the nova commit history around that time.

I wonder if https://review.openstack.org/#/c/631811/ could be related, but I'm not sure how it would be; if that were the case we should see weird DB errors in the logs (and all of the tests should fail).

I am also seeing MessagingTimeouts in the nova-compute logs when trying to hit conductor, which is weird.

Matt Riedemann (mriedem)
Changed in nova:
status: New → Confirmed
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
Matt Riedemann (mriedem) wrote :

One thing I noticed is that the script that sets up the grenade + live migration environment is trying to use nova.conf rather than nova-cpu.conf here:

https://github.com/openstack/nova/blob/5283b464b5c4976224a8d7ea3898dfe3e6bec786/nova/tests/live_migration/hooks/run_tests.sh#L57
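
For context, the logic at that line is a branch-based conditional that decides which config file the rest of the hook (including the ceph setup) should target. The following is only a rough paraphrase of that conditional, sketched from the description in this bug; the variable name TARGET_CONF is illustrative and this is not a verbatim copy of run_tests.sh:

  # Rough paraphrase of the branch check in run_tests.sh (illustrative only;
  # the TARGET_CONF name is an assumption, not the script's actual code).
  if [[ -n "${GRENADE_OLD_BRANCH:-}" ]]; then
      # Grenade (upgrade) runs were expected to keep everything in nova.conf.
      TARGET_CONF=/etc/nova/nova.conf
  else
      # Non-grenade devstack runs use the split compute config.
      TARGET_CONF=/etc/nova/nova-cpu.conf
  fi
  # The ceph hook later writes its [libvirt] rbd settings into $TARGET_CONF,
  # so selecting the wrong file leaves the running nova-compute without any
  # rbd configuration.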

But that's not happening, and I'm seeing nova-compute running with nova-cpu.conf:

http://logs.openstack.org/04/632904/2/check/legacy-grenade-dsvm-neutron-multinode-live-migration/8f886af/logs/screen-n-cpu.txt.gz#_Jan_24_19_59_15_757250

Jan 24 19:59:15.757250 ubuntu-xenial-ovh-bhs1-0002106526 nova-compute[31553]: DEBUG oslo_service.service [None req-b65724f9-3239-41b5-88be-f7f3c932e931 None None] command line args: ['--config-file', '/etc/nova/nova-cpu.conf'] {{(pid=31553) log_opt_values /usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:3008}}

And I'm not seeing that $GRENADE_OLD_BRANCH check get run in the console logs, so we probably need to add some debug logging to see where that variable comes from (a minimal example of such logging is sketched after the excerpt below), but I do see the variable set here:

http://logs.openstack.org/04/632904/2/check/legacy-grenade-dsvm-neutron-multinode-live-migration/8f886af/logs/devstack-gate-setup-host.txt.gz

  GRENADE_NEW_BRANCH: master
  GRENADE_OLD_BRANCH: stable/rocky
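
A minimal sketch of the kind of debug logging mentioned above, assuming it were dropped into the hook script right before the branch check (the echo line is not part of the current script):

  # Hypothetical debug line to confirm what GRENADE_OLD_BRANCH is set to
  # (if anything) at the point the conditional runs.
  echo "DEBUG: GRENADE_OLD_BRANCH='${GRENADE_OLD_BRANCH:-}' GRENADE_NEW_BRANCH='${GRENADE_NEW_BRANCH:-}'"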

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/634962

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

I also noticed the job is failing on stable branches, which implies the regression could be due to a change in a branchless project like tempest, openstack-zuul-jobs, or devstack-gate, but I don't see anything immediately obvious there. It could also be that we backported something in devstack that messed up the job.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looking at the job in pike, nova-compute would run with nova.conf rather than nova-cpu.conf:

http://logs.openstack.org/11/627011/2/check/legacy-grenade-dsvm-neutron-multinode-live-migration/6e4c69c/logs/new/screen-n-cpu.txt.gz#_2019-01-07_14_12_37_218

2019-01-07 14:12:37.218 19245 DEBUG oslo_service.service [req-8874018b-41f7-4799-b7cf-a57d8a5f0367 - -] command line args: ['--config-file', '/etc/nova/nova.conf'] log_opt_values /usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2863

There actually isn't a nova-cpu.conf in that job:

http://logs.openstack.org/11/627011/2/check/legacy-grenade-dsvm-neutron-multinode-live-migration/6e4c69c/logs/etc/nova/

Revision history for this message
Matt Riedemann (mriedem) wrote :

Here is a stable/queens job run from Jan 2, and nova.conf is being used by the nova-compute service there as well:

http://logs.openstack.org/10/627010/2/check/legacy-grenade-dsvm-neutron-multinode-live-migration/281c4a4/logs/

And in that job the only thing in nova-cpu.conf is this:

[key_manager]
fixed_key = c7e7e2d754d569f54c685e24c5bb73d46eef4aa8806e7457bbf9ebba53279a23a5c73942

So it definitely seems like https://review.openstack.org/#/c/631811/ or https://review.openstack.org/#/c/609755/ messed something up here.

Also, when I diff the nova-cpu.conf and nova.conf from a failing job, this is in nova.conf:

[libvirt]
live_migration_uri = qemu+ssh://stack@%s/system
cpu_mode = none
virt_type = qemu
rbd_user = cinder
rbd_secret_uuid = ff43d118-c258-490f-9142-a4b9a7663842
inject_key = false
inject_partition = -2
disk_cachemodes = network=writeback
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf

But it is not in nova-cpu.conf, and that's the main issue: we're running with ceph, but nova-compute is using a config file that is not configured for ceph.
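
A quick way to confirm that mismatch on a compute node (a diagnostic sketch, assuming the /etc/nova layout shown in the logs above) is to check which --config-file the running nova-compute was started with and which file the rbd options actually landed in:

  # Show the full command line (and thus the --config-file) of nova-compute.
  pgrep -af nova-compute

  # See which file the ceph rbd settings were written to; in the failing runs
  # they are present in nova.conf but absent from nova-cpu.conf, the file the
  # service is actually using.
  grep -E '^(images_type|images_rbd_pool|rbd_user|rbd_secret_uuid)' \
      /etc/nova/nova.conf /etc/nova/nova-cpu.conf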

Changed in nova:
status: In Progress → Triaged
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/634962
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f2db43d57ae980d8cdf08de4f5e6290d7edcc95c
Submitter: Zuul
Branch: master

commit f2db43d57ae980d8cdf08de4f5e6290d7edcc95c
Author: Matt Riedemann <email address hidden>
Date: Tue Feb 5 10:40:33 2019 -0500

    Fix legacy-grenade-dsvm-neutron-multinode-live-migration

    Since change I81301eeecc7669a169deeb1e2c5d298a595aab94 in
    devstack, nova-cpu.conf is a copy of nova.conf minus
    database access. Grenade jobs also run nova-compute with
    nova-cpu.conf anyway so we can just drop the conditional
    which was otherwise messing up the config file that the
    ceph script would write rbd configuration which is why
    live block migration tests with ceph were failing.

    While in here, the zuul job configuration is updated so
    that changes to nova/tests/live_migration/ can be
    self-testing.

    Change-Id: I902e459093af9b82f9033d58cffcb2a628f5ec39
    Closes-Bug: #1813216

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/640186

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/640197

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/640186
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c4a2a32f07cbe139b5889654bbc82d3ddeaeeb94
Submitter: Zuul
Branch: stable/rocky

commit c4a2a32f07cbe139b5889654bbc82d3ddeaeeb94
Author: Matt Riedemann <email address hidden>
Date: Tue Feb 5 10:40:33 2019 -0500

    Fix legacy-grenade-dsvm-neutron-multinode-live-migration

    Since change I81301eeecc7669a169deeb1e2c5d298a595aab94 in
    devstack, nova-cpu.conf is a copy of nova.conf minus
    database access. Grenade jobs also run nova-compute with
    nova-cpu.conf anyway so we can just drop the conditional
    which was otherwise messing up the config file that the
    ceph script would write rbd configuration which is why
    live block migration tests with ceph were failing.

    NOTE(mriedem): The original change to .zuul.yaml is not
    backported since it is not needed since change
    Ibce77d3442e21bbd5f5ce379c203542f1f31ce9e, which regressed
    the irrelevant-files for the job, was stein-only.

    Conflicts:
          .zuul.yaml

    NOTE(mriedem): The conflict is due to not having change
    Ibce77d3442e21bbd5f5ce379c203542f1f31ce9e nor change
    I93e938277454a1fc203b3d930ec1bc1eceac0a1e in stable/rocky.

    Change-Id: I902e459093af9b82f9033d58cffcb2a628f5ec39
    Closes-Bug: #1813216
    (cherry picked from commit f2db43d57ae980d8cdf08de4f5e6290d7edcc95c)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.0

This issue was fixed in the openstack/nova 18.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/653811

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/pike)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/pike
Review: https://review.opendev.org/653811
Reason: https://review.opendev.org/#/c/640207/ is working.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/640197
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5d5e7be8cbb6d37b20c7efcac7850562d3281904
Submitter: Zuul
Branch: stable/queens

commit 5d5e7be8cbb6d37b20c7efcac7850562d3281904
Author: Matt Riedemann <email address hidden>
Date: Tue Feb 5 10:40:33 2019 -0500

    Fix legacy-grenade-dsvm-neutron-multinode-live-migration

    Since change I81301eeecc7669a169deeb1e2c5d298a595aab94 in
    devstack, nova-cpu.conf is a copy of nova.conf minus
    database access. Grenade jobs also run nova-compute with
    nova-cpu.conf anyway so we can just drop the conditional
    which was otherwise messing up the config file that the
    ceph script would write rbd configuration which is why
    live block migration tests with ceph were failing.

    NOTE(mriedem): This is not a pure backport since this change
    in queens just disables running the job with ceph because of
    a mess of changes to devstack queens related to change
    https://review.openstack.org/625131/ and change
    https://review.openstack.org/632100/. As a result, the pike
    node runs with nova.conf (because of grenade singleconductor
    mode makes it so) but the queens node runs with nova-cpu.conf
    and then the _ceph_configure_nova function does not configure
    the nodes for rbd auth properly which makes rbd-backed live
    migration in the job fail. Rather than try to sort that all out
    with a one-off change in queens (which is pretty old at this
    point anyway), this change just skips the ceph portion of the
    job. An alternative is just not running the job in queens, but
    we can easily still get block migrate live migration coverage
    so we might as well keep that running.

    Change-Id: I902e459093af9b82f9033d58cffcb2a628f5ec39
    Closes-Bug: #1813216
    (cherry picked from commit f2db43d57ae980d8cdf08de4f5e6290d7edcc95c)
    (cherry picked from commit c4a2a32f07cbe139b5889654bbc82d3ddeaeeb94)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.11

This issue was fixed in the openstack/nova 17.0.11 release.
