compute logs tell me live migration finished successfully when it actually failed

Bug #1685340 reported by Matt Riedemann
This bug affects 1 person

Affects                    Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)   Fix Released   High        Matt Riedemann
nova (Newton)              New            Undecided   Shane Peters
nova (Ocata)               Fix Committed  Medium      Shane Peters

Bug Description

This tells me post live migration at destination failed:

http://logs.openstack.org/43/458843/1/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/697a501/logs/subnode-2/screen-n-cpu.txt.gz#_2017-04-21_13_54_10_281

2017-04-21 13:54:10.281 10362 ERROR nova.compute.manager [req-7ecbf938-9e55-4e4c-b7da-63eef0f8d4a9 tempest-LiveBlockMigrationTestJSON-208732686 tempest-LiveBlockMigrationTestJSON-208732686] [instance: 9bf9f268-5242-4b1d-8fe6-ee348b2b8d3e] Post live migration at destination ubuntu-xenial-2-node-osic-cloud1-s3500-8527282 failed

Later on, the logs tell me it was successful:

http://logs.openstack.org/43/458843/1/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/697a501/logs/subnode-2/screen-n-cpu.txt.gz#_2017-04-21_13_54_11_080

2017-04-21 13:54:11.080 10362 INFO nova.compute.manager [req-7ecbf938-9e55-4e4c-b7da-63eef0f8d4a9 tempest-LiveBlockMigrationTestJSON-208732686 tempest-LiveBlockMigrationTestJSON-208732686] [instance: 9bf9f268-5242-4b1d-8fe6-ee348b2b8d3e] Migrating instance to ubuntu-xenial-2-node-osic-cloud1-s3500-8527282 finished successfully.

That happens because we deliberately don't stop on the failure, since source-side cleanup still has to run, but we never check whether the destination step failed before emitting the success message.
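A minimal, self-contained sketch of the flag pattern the fix uses. The function and parameter names here are hypothetical stand-ins, not nova's actual _post_live_migration signature; call_dest stands in for the RPC call to the destination host.

```python
import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("sketch")


def post_live_migration(instance, dest, call_dest):
    """Sketch: remember a destination-side failure so the final log is honest.

    call_dest is a hypothetical callable standing in for the RPC to the
    destination host's post-live-migration step.
    """
    migrate_success = True
    try:
        call_dest(instance, dest)
    except Exception:
        # Log the error but keep going: source-side cleanup must still run.
        LOG.exception("Post live migration at destination %s failed", dest)
        migrate_success = False

    # ... source-side cleanup would happen here ...

    # Only claim success if the destination step actually succeeded.
    if migrate_success:
        LOG.info("Migrating instance to %s finished successfully.", dest)
    return migrate_success
```

Before the fix, the success message was logged unconditionally at the end of the method; the flag simply gates it.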

Matt Riedemann (mriedem)
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
Matt Riedemann (mriedem) wrote :

The actual failure in this case happened on the source host:

http://logs.openstack.org/43/458843/1/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/697a501/logs/subnode-2/libvirt/qemu/instance-00000001.txt.gz

qemu-system-x86_64: /build/qemu-5OJ39u/qemu-2.8+dfsg/block/io.c:1514: bdrv_co_pwritev: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.

And shows up like this on the dest host:

http://logs.openstack.org/43/458843/1/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/697a501/logs/libvirt/qemu/instance-00000001.txt.gz

/build/qemu-5OJ39u/qemu-2.8+dfsg/nbd/server.c:nbd_receive_request():L710: read failed

This ML post is related:

http://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg01086.html

Revision history for this message
Matt Riedemann (mriedem) wrote :

(3:02:48 PM) kashyap: mriedem: It means (according to the QEMU commit that introduced it):
(3:03:12 PM) kashyap: Something tried to write to the image file on the source, _while_ it is being migrated

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/458958

Changed in nova:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/458958
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=746e48efa32fd599817197ffd7ad434a35f96165
Submitter: Jenkins
Branch: master

commit 746e48efa32fd599817197ffd7ad434a35f96165
Author: Matt Riedemann <email address hidden>
Date: Thu Apr 27 14:44:52 2017 -0400

    Do not log live migration success when it actually failed

    During post live migration, if post live migration on destination
    fails, then we log a stacktrace but continue to perform cleanup
    on the source side. However, at the end of the _post_live_migration
    method it was logging that things were successful on the destination
    host, which they weren't, which is really confusing when you're trying
    to debug the failure and seeing this conflict in the logs.

    This patch simply sets a flag if we failed post live migration at
    the destination host so we don't log the success message later on
    the source host, plus tests to show the flag is set and checked.

    Change-Id: I16e70912a13c963031397e66a8553b2c199d50bd
    Closes-Bug: #1685340

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b2

This issue was fixed in the openstack/nova 16.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/480744

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/480744
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b296f1e2e6590262a6702fb49d175f761cb901e2
Submitter: Jenkins
Branch: stable/ocata

commit b296f1e2e6590262a6702fb49d175f761cb901e2
Author: Matt Riedemann <email address hidden>
Date: Thu Apr 27 14:44:52 2017 -0400

    Do not log live migration success when it actually failed

    During post live migration, if post live migration on destination
    fails, then we log a stacktrace but continue to perform cleanup
    on the source side. However, at the end of the _post_live_migration
    method it was logging that things were successful on the destination
    host, which they weren't, which is really confusing when you're trying
    to debug the failure and seeing this conflict in the logs.

    This patch simply sets a flag if we failed post live migration at
    the destination host so we don't log the success message later on
    the source host, plus tests to show the flag is set and checked.

    Change-Id: I16e70912a13c963031397e66a8553b2c199d50bd
    Closes-Bug: #1685340
    (cherry picked from commit 746e48efa32fd599817197ffd7ad434a35f96165)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.8

This issue was fixed in the openstack/nova 15.0.8 release.
