OpenStack Compute (nova)

gate-grenade-dsvm-neutron-multinode-live-migration-nv fails in pike: "Failed to restart <email address hidden>: Unit <email address hidden> not found."

Bug #1691769 reported by Matt Riedemann on 2017-05-18

This bug affects 1 person

Affects                    Status    Importance  Assigned to  Milestone
OpenStack Compute (nova)   Triaged   Medium      Unassigned

Bug Description

Seen here:

http://logs.openstack.org/42/465042/5/check/gate-grenade-dsvm-neutron-multinode-live-migration-nv/6aae0c3/console.html#_2017-05-18_00_39_25_007223

This regressed with https://review.openstack.org/#/c/461803/, which changed the live migration job scripts to use systemd for everything. We forgot/missed that we have a grenade job using those scripts, where the old side (ocata) isn't using systemd, so when the new side tries to restart glance-api and expects it to be running under systemd, it fails.



Matt Riedemann (mriedem) wrote on 2017-05-18:

Here is the state of our jobs that run live migration today:

1. gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial

This is the multinode non-upgrade same-level computes live-migration job. This does not run with live_migrate_back_and_forth enabled. It tests local disk block migration and ceph-backed local disk shared storage migration (no volumes).

2. gate-tempest-dsvm-neutron-multinode-full-ubuntu-xenial-nv

Same as #1 except no ceph.

3. gate-grenade-dsvm-neutron-multinode-ubuntu-xenial

This does not test live migration since it only runs smoke tests and the live migration tests aren't listed as smoke tests.

4. gate-grenade-dsvm-neutron-multinode-live-migration-nv

Multinode mixed-level compute job which tests local disk live block migration and ceph-backed shared storage live migration, but not volume-backed live migration. It also runs with live_migrate_back_and_forth=True, which means it live migrates between mixed-level computes, so pike->ocata->pike (well, it would after https://review.openstack.org/#/c/466033/ anyway).

So right now we have no live migration coverage for ceph, and we have no live migration coverage for mixed-level computes since gate-grenade-dsvm-neutron-multinode-live-migration-nv isn't working.

Our options are:

a) Make the job work with both systemd and screen, by either re-writing it to re-use functions available in devstack, or baking in our own logic for how to handle restarts depending on how the job is configured (I don't have a good sense for which is harder to do).

b) Do nothing - basically disable the job from running on master (pike) and then re-enable it for Queens, when n-1 would be pike and would use systemd by default. This would mean we'd have no ceph-backed live migration or mixed-compute live migration coverage for all of Pike (including once it's stable/pike).
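
A rough sketch of the "bake in our own logic" half of option (a), written in Python rather than the job's own scripts and not taken from nova or devstack: probe whether systemd knows the unit, and fall back to the screen path when it doesn't. The function names, the injectable `run` parameter, and the "systemd"/"screen" labels are hypothetical; `systemctl cat` is used only because it exits non-zero for a unit that systemd cannot find.

```python
import subprocess


def unit_known_to_systemd(unit, run=subprocess.run):
    """Return True if `systemctl cat UNIT` succeeds, i.e. systemd has the unit."""
    try:
        result = run(
            ["systemctl", "cat", unit],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # systemctl is not installed at all, so nothing runs under systemd.
        return False
    return result.returncode == 0


def choose_restart_method(unit, run=subprocess.run):
    """Pick "systemd" or "screen" (hypothetical labels) for restarting a service."""
    return "systemd" if unit_known_to_systemd(unit, run=run) else "screen"
```

A caller would branch on the result: restart through systemctl when the unit exists (the pike side), and use whatever screen-based restart the old side needs otherwise. The `run` parameter exists only so the check can be exercised without a real systemd.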


Matt Riedemann (mriedem) wrote on 2017-05-18:

It might be possible to do something like this:

https://github.com/openstack-dev/devstack/commit/c006bbdeb26df2c60f43d222bdf918f9e24d551f

Or re-use those common functions.


OpenStack Infra (hudson-openstack) wrote on 2017-05-18: Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/466097


OpenStack Infra (hudson-openstack) wrote on 2017-05-19: Related fix merged to nova (master)

Reviewed: https://review.openstack.org/466097
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=994caa464efdbed3d684354d699cd79e41d7eb0f
Submitter: Jenkins
Branch: master

commit 994caa464efdbed3d684354d699cd79e41d7eb0f
Author: Matt Riedemann <email address hidden>
Date: Thu May 18 16:20:56 2017 -0400

    Skip ceph in grenade live migration job due to restart failures

    Change I914430d68f64d29932c9409d055b15e4cb384ec4 made the
    live migration scripts assume everything is running under systemd,
    which is fine for the non-grenade job since devstack on master (pike)
    defaults to run everything under systemd.

    We missed, however, that the grenade live migration job is starting
    from Ocata where screen is used by default, so when we get to the ceph
    part of this job in the grenade setup, trying to restart glance-api
    via systemctl fails since it's running under screen, not systemd.

    For now we'll just skip the ceph live migration setup in the grenade
    run until either the bug is fixed or until Queens is our master branch,
    at which point the old side for grenade is Pike and running under
    systemd.

    Change-Id: Ia0ec32dc7cfe744b21b926949c4ab046f9417bc7
    Related-Bug: #1691769
