Activity log for bug #1640676

Date Who What changed Old value New value Message
2016-11-10 04:33:03 Hua Zhang bug added bug
2016-11-10 07:43:13 Dominique Poulain bug added subscriber Dominique Poulain
2016-11-10 07:56:14 Hua Zhang summary Nova live-migration corrupts some instances libvirt live-migration corrupts some instances
2016-11-11 08:45:26 Christian Ehrhardt bug added subscriber Ubuntu Server Team
2016-11-11 08:51:03 Christian Ehrhardt bug added subscriber ChristianEhrhardt
2016-11-15 13:16:53 Hua Zhang summary libvirt live-migration corrupts some instances libvirt 1.2.12 live-migration corrupts some instances
2016-11-15 13:18:11 Hua Zhang attachment added trusty_libvirt_migration_image_corruption.debdiff https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1640676/+attachment/4777701/+files/trusty_libvirt_migration_image_corruption.debdiff
2016-11-15 13:18:36 Hua Zhang description
Old value:
We can replicate the corruption pretty much at will. The sequence of events to trigger it is:
Create an instance using a cloud image
Start a job running with the following command: "dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=1000"
Live migrate the instance using a command like: "nova live-migration --block-migrate <server-id> <target-hypervisor>"
Once the migration has finished, stop the dd job on the instance
do a "Hard reboot" of the instance
When the instance boots, file system corruption will be observed and it won't boot correctly
New value:
[Impact]
While memory load is high, libvirt 1.2.12 (kilo) live-migration corrupts some instances
[Test Case]
We can replicate the corruption pretty much at will. The sequence of events to trigger it is:
Create an instance using a cloud image
Start a job running with the following command: "dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=1000"
Live migrate the instance using a command like: "nova live-migration --block-migrate <server-id> <target-hypervisor>"
Once the migration has finished, stop the dd job on the instance
do a "Hard reboot" of the instance (eg: for openstack, nova reboot --hard $INSTANCE)
When the instance boots, file system corruption will be observed and it won't boot correctly
[Regression Potential]
[Other Info]
Both libvirt 1.2.16 (kilo) and libvirt 1.2.13 have already fixed this problem.
Backported from upstream patches, before the commit 80c5f10e libvirt just polls the events we are interested which can lead to drive mirror can not be cancelled, then the destination is not in a consistent state. in this case it is not safe to continue with the migration. so the commit 80c5f10e introduces listening queue events instead of polling to fix the problem.
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=80c5f10e865cda0302519492f197cb020bd14a07
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=76c61cdca20c106960af033e5d0f5da70177af0f
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c37943a0687a8fdb08e6eda8ae4b9f4f43f4f2ed
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c88b323bf5d5a070c074fda7adc11085f14415ce
BTW, we have completed 20 to 30 live migrations with I/O running and have had no problems, and also tested that other functions continue to work as expected.
2016-11-15 13:18:45 Hua Zhang libvirt (Ubuntu): assignee Hua Zhang (zhhuabj)
2016-11-15 13:22:46 Hua Zhang description
Old value:
[Impact]
While memory load is high, libvirt 1.2.12 (kilo) live-migration corrupts some instances
[Test Case]
We can replicate the corruption pretty much at will. The sequence of events to trigger it is:
Create an instance using a cloud image
Start a job running with the following command: "dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=1000"
Live migrate the instance using a command like: "nova live-migration --block-migrate <server-id> <target-hypervisor>"
Once the migration has finished, stop the dd job on the instance
do a "Hard reboot" of the instance (eg: for openstack, nova reboot --hard $INSTANCE)
When the instance boots, file system corruption will be observed and it won't boot correctly
[Regression Potential]
[Other Info]
Both libvirt 1.2.16 (kilo) and libvirt 1.2.13 have already fixed this problem.
Backported from upstream patches, before the commit 80c5f10e libvirt just polls the events we are interested which can lead to drive mirror can not be cancelled, then the destination is not in a consistent state. in this case it is not safe to continue with the migration. so the commit 80c5f10e introduces listening queue events instead of polling to fix the problem.
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=80c5f10e865cda0302519492f197cb020bd14a07
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=76c61cdca20c106960af033e5d0f5da70177af0f
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c37943a0687a8fdb08e6eda8ae4b9f4f43f4f2ed
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c88b323bf5d5a070c074fda7adc11085f14415ce
BTW, we have completed 20 to 30 live migrations with I/O running and have had no problems, and also tested that other functions continue to work as expected.
New value:
[Impact]
While memory load is high, libvirt 1.2.12 (kilo) live-migration corrupts some instances
[Test Case]
We can replicate the corruption pretty much at will. The sequence of events to trigger it is:
Create an instance using a cloud image
Start a job running with the following command: "dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=1000"
Live migrate the instance using a command like: "nova live-migration --block-migrate <server-id> <target-hypervisor>"
Once the migration has finished, stop the dd job on the instance
do a "Hard reboot" of the instance (eg: for openstack, nova reboot --hard $INSTANCE)
When the instance boots, file system corruption will be observed and it won't boot correctly
[Regression Potential]
[Other Info]
Both libvirt 1.2.16 (kilo) and libvirt 1.2.13 have already fixed this problem. So this problem only happens on trusty.
Backported from upstream patches, before the commit 80c5f10e libvirt just polls the events we are interested which can lead to drive mirror can not be cancelled, then the destination is not in a consistent state. in this case it is not safe to continue with the migration. so the commit 80c5f10e introduces listening queue events instead of polling to fix the problem.
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=80c5f10e865cda0302519492f197cb020bd14a07
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=76c61cdca20c106960af033e5d0f5da70177af0f
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c37943a0687a8fdb08e6eda8ae4b9f4f43f4f2ed
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c88b323bf5d5a070c074fda7adc11085f14415ce
BTW, we have completed 20 to 30 live migrations with I/O running and have had no problems, and also tested that other functions continue to work as expected.
2016-11-15 13:34:26 Louis Bouchard nominated for series Ubuntu Trusty
2016-11-15 13:34:26 Louis Bouchard bug task added libvirt (Ubuntu Trusty)
2016-11-15 13:56:47 Christian Ehrhardt libvirt (Ubuntu Trusty): status New Triaged
2016-11-15 13:56:50 Christian Ehrhardt libvirt (Ubuntu): status New Fix Released
2016-11-15 13:56:53 Christian Ehrhardt libvirt (Ubuntu Trusty): importance Undecided High
2016-11-15 15:01:16 Hua Zhang bug task added cloud-archive
2016-11-15 15:05:33 Edward Hope-Morley nominated for series cloud-archive/kilo
2016-11-15 15:07:34 Hua Zhang attachment removed trusty_libvirt_migration_image_corruption.debdiff https://bugs.launchpad.net/cloud-archive/+bug/1640676/+attachment/4777701/+files/trusty_libvirt_migration_image_corruption.debdiff
2016-11-15 15:09:11 Hua Zhang attachment added trusty_libvirt_migration_image_corruption.debdiff https://bugs.launchpad.net/cloud-archive/+bug/1640676/+attachment/4777731/+files/trusty_libvirt_migration_image_corruption.debdiff
2016-11-15 15:20:19 Hua Zhang description
Old value:
[Impact]
While memory load is high, libvirt 1.2.12 (kilo) live-migration corrupts some instances
[Test Case]
We can replicate the corruption pretty much at will. The sequence of events to trigger it is:
Create an instance using a cloud image
Start a job running with the following command: "dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=1000"
Live migrate the instance using a command like: "nova live-migration --block-migrate <server-id> <target-hypervisor>"
Once the migration has finished, stop the dd job on the instance
do a "Hard reboot" of the instance (eg: for openstack, nova reboot --hard $INSTANCE)
When the instance boots, file system corruption will be observed and it won't boot correctly
[Regression Potential]
[Other Info]
Both libvirt 1.2.16 (kilo) and libvirt 1.2.13 have already fixed this problem. So this problem only happens on trusty.
Backported from upstream patches, before the commit 80c5f10e libvirt just polls the events we are interested which can lead to drive mirror can not be cancelled, then the destination is not in a consistent state. in this case it is not safe to continue with the migration. so the commit 80c5f10e introduces listening queue events instead of polling to fix the problem.
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=80c5f10e865cda0302519492f197cb020bd14a07
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=76c61cdca20c106960af033e5d0f5da70177af0f
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c37943a0687a8fdb08e6eda8ae4b9f4f43f4f2ed
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c88b323bf5d5a070c074fda7adc11085f14415ce
BTW, we have completed 20 to 30 live migrations with I/O running and have had no problems, and also tested that other functions continue to work as expected.
New value:
[Impact]
While memory load is high, libvirt 1.2.12 (kilo) live-migration corrupts some instances
[Test Case]
We can replicate the corruption pretty much at will. The sequence of events to trigger it is:
Create an instance using a cloud image
Start a job running with the following command: "dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=1000"
Live migrate the instance using a command like: "nova live-migration --block-migrate <server-id> <target-hypervisor>"
Once the migration has finished, stop the dd job on the instance
do a "Hard reboot" of the instance (eg: for openstack, nova reboot --hard $INSTANCE)
When the instance boots, file system corruption will be observed and it won't boot correctly
[Regression Potential]
[Other Info]
Both libvirt 1.2.16 (liberty) and libvirt 1.2.13 have already fixed this problem. So this problem only happens on kilo.
Backported from upstream patches, before the commit 80c5f10e libvirt just polls the events we are interested which can lead to drive mirror can not be cancelled, then the destination is not in a consistent state. in this case it is not safe to continue with the migration. so the commit 80c5f10e introduces listening queue events instead of polling to fix the problem.
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=80c5f10e865cda0302519492f197cb020bd14a07
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=76c61cdca20c106960af033e5d0f5da70177af0f
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c37943a0687a8fdb08e6eda8ae4b9f4f43f4f2ed
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c88b323bf5d5a070c074fda7adc11085f14415ce
BTW, we have completed 20 to 30 live migrations with I/O running and have had no problems, and also tested that other functions continue to work as expected.
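Editorial note: the [Test Case] steps in the description above can be written down as an ordered command list. The following is only a dry-run sketch in Python, not part of the bug report. The three commands are taken verbatim from the description; the function names `repro_commands` and `render` and the "guest"/"client" labels are hypothetical, introduced here for illustration. Nothing is executed; the script only prints what would be run. The manual steps (creating the instance, stopping the dd job after the migration, checking the file system after the reboot) are left as comments.

```python
import shlex


def repro_commands(server_id, target_hypervisor):
    """Return the test-case commands in order as (where, argv) pairs.

    "guest" means inside the instance, "client" means wherever the nova
    CLI is configured. Both labels are assumptions of this sketch.
    """
    return [
        # Manual step, not shown: create an instance from a cloud image.
        # Step 2: generate sustained disk writes inside the guest.
        ("guest", ["dd", "if=/dev/urandom", "of=/var/tmp/mjb.1",
                   "bs=4M", "count=1000"]),
        # Step 3: block live migration while the dd job is still running.
        ("client", ["nova", "live-migration", "--block-migrate",
                    server_id, target_hypervisor]),
        # Manual step, not shown: stop the dd job once migration finishes.
        # Step 5: hard reboot; the corruption shows up on the next boot.
        ("client", ["nova", "reboot", "--hard", server_id]),
    ]


def render(steps):
    """Format each step as a printable, shell-quoted line."""
    return [f"[{where}] {shlex.join(argv)}" for where, argv in steps]


if __name__ == "__main__":
    # Placeholder values as in the description; replace with real ones.
    for line in render(repro_commands("<server-id>", "<target-hypervisor>")):
        print(line)
```

Keeping the commands as argument lists rather than one shell string means the same list could later be handed to `subprocess.run` unchanged, if running the steps automatically is wanted.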
2016-11-15 15:30:29 Corey Bryant bug task added cloud-archive/kilo
2016-11-15 15:30:46 Corey Bryant cloud-archive/kilo: status New Triaged
2016-11-15 15:30:48 Corey Bryant cloud-archive/kilo: importance Undecided High
2016-11-15 15:34:12 Hua Zhang tags sts-sru
2016-11-17 09:30:10 Hua Zhang summary libvirt 1.2.12 live-migration corrupts some instances [SRU] libvirt 1.2.12 live-migration corrupts some instances
2016-11-17 09:30:28 Hua Zhang libvirt (Ubuntu): assignee Hua Zhang (zhhuabj)
2016-11-17 09:31:12 Hua Zhang cloud-archive/kilo: assignee Hua Zhang (zhhuabj)
2016-11-17 09:34:28 Hua Zhang bug added subscriber Ubuntu Sponsors Team
2016-11-17 15:06:33 Christian Ehrhardt libvirt (Ubuntu Trusty): status Triaged Incomplete
2016-11-21 04:11:22 Mathew Hodson libvirt (Ubuntu): importance Undecided High
2016-11-23 15:16:56 Corey Bryant cloud-archive: status New Invalid
2016-11-23 16:40:43 Ryan Beisner cloud-archive/kilo: status Triaged Fix Committed
2016-11-23 16:40:45 Ryan Beisner tags sts-sru sts-sru verification-kilo-needed
2016-11-28 02:16:26 Hua Zhang tags sts-sru verification-kilo-needed sts-sru verification-kilo-done
2016-12-07 16:22:17 James Page cloud-archive/kilo: status Fix Committed Fix Released
2017-01-11 16:01:17 Sebastien Bacher removed subscriber Ubuntu Sponsors Team
2017-01-18 09:22:57 Christian Ehrhardt bug task deleted libvirt (Ubuntu Trusty)
2017-03-22 15:41:26 Louis Bouchard tags sts-sru verification-kilo-done sts-sru-done verification-kilo-done