Activity log for bug #1821755

Date Who What changed Old value New value Message
2019-03-26 13:54:23 Boxiang Zhu bug added bug
2019-03-26 13:54:54 Boxiang Zhu description

Description
===========
If we live migrate two instances simultaneously, the instances will break the instance group policy.

Steps to reproduce
==================
OpenStack env with three compute nodes (node1, node2 and node3). Then we create two VMs (vm1, vm2) with the anti-affinity policy. Finally, we live migrate the two VMs simultaneously.

Before live-migration, the VMs are located as follows:
node1 -> vm1
node2 -> vm2
node3

* nova live-migration vm1
* nova live-migration vm2

Expected result
===============
Fail to live migrate vm1 and vm2.

Actual result
=============
node1
node2
node3 -> vm1,vm2

Environment
===========
master branch of openstack

As described above, the live migration does not check for in-progress live migrations and just selects the host via the scheduler filter, so both VMs are migrated to the same host.
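The race in this report can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Nova's scheduler code: `pick_destination` is a hypothetical stand-in for an anti-affinity filter that only looks at where group members currently run, and both migration requests are evaluated against the same placement snapshot because neither has completed yet.

```python
from collections import Counter


def pick_destination(placement, hosts, group, source):
    """Hypothetical filter: first host (other than the source) with no group member on it."""
    occupied = {placement[vm] for vm in group}
    for host in hosts:
        if host != source and host not in occupied:
            return host
    return None


hosts = ["node1", "node2", "node3"]
placement = {"vm1": "node1", "vm2": "node2"}
group = ["vm1", "vm2"]

# Both requests are scheduled before either migration finishes, so each one
# sees the same snapshot and is unaware of the other in-progress migration.
snapshot = dict(placement)
destinations = {
    vm: pick_destination(snapshot, hosts, group, snapshot[vm]) for vm in group
}

# Apply both migrations.
for vm, dest in destinations.items():
    placement[vm] = dest

# Hosts that now carry more than one member of the anti-affinity group.
violations = [host for host, count in Counter(placement.values()).items() if count > 1]
print(placement, violations)
```

In this toy model both VMs end up on node3, which matches the "Actual result" in the report. A check on the destination host at migration time, rather than only at scheduling time, is what would catch the second arrival.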
2019-03-27 13:26:07 Matt Riedemann tags live-migration scheduler
2019-03-27 13:35:51 Matt Riedemann tags live-migration scheduler live-migration scheduler starlingx
2019-03-27 13:35:55 Matt Riedemann nova: status New Triaged
2019-03-27 13:35:58 Matt Riedemann nova: importance Undecided Medium
2019-03-27 13:38:28 Matt Riedemann bug added subscriber Chris Friesen
2019-03-27 13:44:11 Matt Riedemann marked as duplicate 1526642
2019-03-28 02:17:14 Boxiang Zhu removed duplicate marker 1526642
2019-04-12 05:24:33 OpenStack Infra nova: status Triaged In Progress
2019-04-12 05:24:33 OpenStack Infra nova: assignee Boxiang Zhu (bxzhu-5355)
2019-04-25 12:15:19 Hu Zhou bug added subscriber Hu Zhou
2021-06-02 17:57:27 OpenStack Infra nova: status In Progress Fix Released
2021-06-10 22:00:42 melanie witt nominated for series nova/victoria
2021-06-10 22:00:42 melanie witt bug task added nova/victoria
2021-06-10 22:00:42 melanie witt nominated for series nova/wallaby
2021-06-10 22:00:42 melanie witt bug task added nova/wallaby
2021-06-15 18:55:58 OpenStack Infra nova/wallaby: status New Fix Committed
2021-06-16 17:30:37 OpenStack Infra nova/victoria: status New Fix Committed
2021-06-29 15:55:27 OpenStack Infra tags live-migration scheduler starlingx in-stable-ussuri live-migration scheduler starlingx
2021-07-07 19:20:57 melanie witt nominated for series nova/ussuri
2021-07-07 19:20:57 melanie witt bug task added nova/ussuri
2021-07-07 19:20:57 melanie witt nominated for series nova/train
2021-07-07 19:20:57 melanie witt bug task added nova/train
2021-07-07 19:21:32 melanie witt nova/ussuri: status New Fix Committed
2021-07-08 13:55:53 OpenStack Infra nova/train: status New Fix Committed
2021-07-15 15:25:12 OpenStack Infra tags in-stable-ussuri live-migration scheduler starlingx in-stable-stein in-stable-ussuri live-migration scheduler starlingx
2021-07-16 09:42:14 Elod Illes nova/ussuri: status Fix Committed Fix Released
2021-07-16 09:45:55 Elod Illes nova/victoria: status Fix Committed Fix Released
2021-07-16 09:48:00 Elod Illes nova/wallaby: status Fix Committed Fix Released
2021-07-22 20:42:47 Rodrigo Barbieri description

--------------------------------
NOTE: SRU template at the bottom
--------------------------------

Description
===========
If we live migrate two instances simultaneously, the instances will break the instance group policy.

Steps to reproduce
==================
OpenStack env with three compute nodes (node1, node2 and node3). Then we create two VMs (vm1, vm2) with the anti-affinity policy. Finally, we live migrate the two VMs simultaneously.

Before live-migration, the VMs are located as follows:
node1 -> vm1
node2 -> vm2
node3

* nova live-migration vm1
* nova live-migration vm2

Expected result
===============
Fail to live migrate vm1 and vm2.

Actual result
=============
node1
node2
node3 -> vm1,vm2

Environment
===========
master branch of openstack

As described above, the live migration does not check for in-progress live migrations and just selects the host via the scheduler filter, so both VMs are migrated to the same host.

----------------------------------------------------

===============
SRU Description
===============

[Impact]
When performing multiple live migrations, cold migrations or resizes simultaneously, the affinity or anti-affinity policy is violated, allowing the migrated VM to land on a host that conflicts with the policy.

[Test case]

1. Setting up the env
1a. Deploy env with 5 compute nodes
1b. Confirm that all nodes have the same CPU architecture (so live-migration works between them) either by running lscpu or "openstack hypervisor show <node>" on each of the nodes
1c. Create anti-affinity policy
openstack server group create anti-aff --policy anti-affinity
1d. Create flavor
openstack flavor create --vcpu 1 --ram 1024 --disk 0 --id 100 test-flavor
1e. Create volumes
openstack volume create --image cirros --size 1 vol1
openstack volume create --source vol1 --size 1 vol2 && openstack volume create --source vol1 --size 1 vol3

2. Prepare to reproduce the bug
2a. Get group ID
GROUP_ID=$(openstack server group show anti-aff -c id -f value)
2b. Create VMs
openstack server create --network private --volume vol1 --flavor 100 --hint group=$GROUP_ID ins1 && openstack server create --network private --volume vol2 --flavor 100 --hint group=$GROUP_ID ins2 && openstack server create --network private --volume vol3 --flavor 100 --hint group=$GROUP_ID ins3
2c. Confirm each one is on a different host by running "openstack server list --long" and take note of the hosts

3. Reproducing the bug (Live migration)
3a. Perform the steps in (2) if not already done.
3b. openstack server migrate ins1 --live-migration & openstack server migrate ins2 --live-migration & openstack server migrate ins3 --live-migration
3c. watch "openstack server list --long" until it is finished
3d. Confirm that at least one VM is on the same host as another VM. Otherwise, repeat step 3a.

4. Reproducing the bug (Cold Migration)
4a. Perform the steps in (2) if not already done
4b. openstack server migrate ins1 & openstack server migrate ins2 & openstack server migrate ins3
4c. watch "openstack server list --long" until all statuses are "VERIFY_RESIZE"
4d. Confirm that at least one VM is on the same host as another VM. Otherwise, repeat step 4a.
4e. Confirm all the resizes by running "openstack server resize confirm <vm>"

5. Install the package that contains the fixed code on all compute nodes

6. Confirm fix (Live migration)
6a. Perform steps 3a - 3c
6b. Confirm there are no VMs on the same host and no VMs with ERROR status.
6c. Confirm there are VMs that have ACTIVE status and did not move hosts. Otherwise, repeat step 6a.
6d. Run "openstack server event list <vm-id>", then "openstack server event show <vm-id> <req-id>" for the live-migration event. Confirm the "message" field is "error" and the traceback is part of the "compute_check_can_live_migrate_destination" or "compute_pre_live_migration" events with result=Error and the traceback ends in the _do_validation function. Repeat this step to capture both events.

7. Confirm fix (Cold migration)
7a. Perform steps 4a - 4c, taking note of the timestamp (by running $(date)) before running the migration command
7b. Confirm there are no VMs on the same host and no VMs with ERROR status. There should be VMs with "VERIFY_RESIZE" and "ACTIVE" statuses. If there are no ACTIVE instances, confirm the resizes and repeat step 7a.
7c. For the ones that are ACTIVE, check the logs for error messages. There should be a message with an error about "anti-affinity":
egrep -rnIi "3e926491-d0dc-4611-8e87-75604c67f308.*Anti-affinity instance group policy was violated" /var/log/nova
/var/log/nova/nova-compute.log:40797:2021-07-22 19:19:54.075 1692 ERROR oslo_messaging.rpc.server nova.exception.RescheduledException: Build of instance 3e926491-d0dc-4611-8e87-75604c67f308 was re-scheduled: Anti-affinity instance group policy was violated.
7d. Confirm that the log timestamp is a few seconds after the migration command was issued.
7e. Run "openstack server event list <vm-id>", then "openstack server event show <vm-id> <req-id>" for the migration event. Confirm the "message" field is "error" and the "events" field includes a "No Valid Host" final message, with the "compute_prep_resize" event with result=Error and the traceback ending in the _do_validation function.

[Regression Potential]
Part of the new code path has been tested in upstream CI in happy migration paths. Concurrency has not been tested in the CI to trigger the error in a negative test. The exception handling code is executed only when the exception is raised (in case of policy violation), so this code path is being tested manually as part of the upstream patch work and SRU.

[Other Info]
None
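Step 7d of the test case compares the timestamp of the nova-compute log line with the time the migration command was issued. A minimal sketch of that comparison follows. The helper name `logged_shortly_after`, the 60-second window and the command time used in the example are assumptions for illustration; only the log timestamp format is taken from the sample log line in step 7c.

```python
from datetime import datetime, timedelta

# Timestamp layout used at the start of the nova-compute log message in step 7c.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def logged_shortly_after(log_ts, command_time, window=timedelta(seconds=60)):
    """Return True if the log timestamp falls within `window` after the command time.

    The window length is an assumption; the test case only says "a few seconds".
    """
    logged = datetime.strptime(log_ts, LOG_TS_FORMAT)
    return timedelta(0) <= logged - command_time <= window


# Hypothetical time noted before running the migration command in step 7a.
command_time = datetime(2021, 7, 22, 19, 19, 50)
print(logged_shortly_after("2021-07-22 19:19:54.075", command_time))
```

The comparison assumes both timestamps are in the same time zone, which is worth checking when the command is run from a different machine than the compute node.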
2021-07-22 20:51:14 Rodrigo Barbieri summary live migration break the anti-affinity policy of server group simultaneously [SRU] live migration break the anti-affinity policy of server group simultaneously
2021-07-22 21:07:33 Rodrigo Barbieri description

--------------------------------
NOTE: SRU template at the bottom
--------------------------------

Description
===========
If we live migrate two instances simultaneously, the instances will break the instance group policy.

Steps to reproduce
==================
OpenStack env with three compute nodes (node1, node2 and node3). Then we create two VMs (vm1, vm2) with the anti-affinity policy. Finally, we live migrate the two VMs simultaneously.

Before live-migration, the VMs are located as follows:
node1 -> vm1
node2 -> vm2
node3

* nova live-migration vm1
* nova live-migration vm2

Expected result
===============
Fail to live migrate vm1 and vm2.

Actual result
=============
node1
node2
node3 -> vm1,vm2

Environment
===========
master branch of openstack

As described above, the live migration does not check for in-progress live migrations and just selects the host via the scheduler filter, so both VMs are migrated to the same host.

----------------------------------------------------

===============
SRU Description
===============

[Impact]
When performing multiple live migrations, cold migrations or resizes simultaneously, the affinity or anti-affinity policy is violated, allowing the migrated VM to land on a host that conflicts with the policy.

[Test case]

1. Setting up the env
1a. Deploy env with 5 compute nodes
1b. Confirm that all nodes have the same CPU architecture (so live-migration works between them) either by running lscpu or "openstack hypervisor show <node>" on each of the nodes
1c. Create anti-affinity policy
openstack server group create anti-aff --policy anti-affinity
1d. Create flavor
openstack flavor create --vcpu 1 --ram 1024 --disk 0 --id 100 test-flavor
1e. Create volumes
openstack volume create --image cirros --size 1 vol1
openstack volume create --source vol1 --size 1 vol2 && openstack volume create --source vol1 --size 1 vol3

2. Prepare to reproduce the bug
2a. Get group ID
GROUP_ID=$(openstack server group show anti-aff -c id -f value)
2b. Create VMs
openstack server create --network private --volume vol1 --flavor 100 --hint group=$GROUP_ID ins1 && openstack server create --network private --volume vol2 --flavor 100 --hint group=$GROUP_ID ins2 && openstack server create --network private --volume vol3 --flavor 100 --hint group=$GROUP_ID ins3
2c. Confirm each one is on a different host by running "openstack server list --long" and take note of the hosts

3. Reproducing the bug (Live migration)
3a. Perform the steps in (2) if not already done.
3b. openstack server migrate ins1 --live-migration & openstack server migrate ins2 --live-migration & openstack server migrate ins3 --live-migration
3c. watch "openstack server list --long" until all migrations are finished
3d. Confirm that at least one VM is on the same host as another VM. Otherwise, repeat steps 3a - 3c.

4. Reproducing the bug (Cold Migration)
4a. Perform the steps in (2) if not already done
4b. openstack server migrate ins1 & openstack server migrate ins2 & openstack server migrate ins3
4c. watch "openstack server list --long" until all statuses are "VERIFY_RESIZE"
4d. Confirm that at least one VM is on the same host as another VM. Otherwise, repeat steps 4a - 4c.
4e. Confirm all the resizes by running "openstack server resize confirm <vm>"

5a. Install the package that contains the fixed code on all compute nodes
5b. Clean up all the VMs

6. Confirm fix (Live migration)
6a. Perform steps 3a - 3c
6b. Confirm there are no VMs on the same host and no VMs with ERROR status.
6c. Confirm there are VMs that have ACTIVE status and did not move hosts. Otherwise, repeat step 6a.
6d. Run "openstack server event list <vm-id>", then "openstack server event show <vm-id> <req-id>" for the live-migration event of the VMs assessed in step 6c. Confirm the "message" field is "error" and the traceback is part of the "compute_check_can_live_migrate_destination" or "compute_pre_live_migration" events with result=Error and the traceback ends in the _do_validation function. Repeat this step to capture both events.
6e. Check the logs for messages related to the VMs assessed in step (6c), where:
- For compute_check_can_live_migrate_destination:
egrep -rnIi "MigrationPreCheckError: Migration pre-check error: Failed to validate instance group policy due to.*e9ec173a-4491-4541-9bd4-951692e48c8f.*Anti-affinity instance group policy was violated" /var/log/nova
- For compute_pre_live_migration:
grep -rnIi "RescheduledException_Remote: Build of instance c55889d9-6cbe-409a-b118-7b4a8d808972 was re-scheduled: Anti-affinity instance group policy was violated." /var/log/nova

7. Confirm fix (Cold migration)
7a. Perform steps 4a - 4c, taking note of the timestamp (by running $(date)) before running the migration command
7b. Confirm there are no VMs on the same host and no VMs with ERROR status. There should be VMs with "VERIFY_RESIZE" and "ACTIVE" statuses. If there are no ACTIVE instances, confirm the resizes and repeat step 7a.
7c. For the ones that are ACTIVE, check the logs for error messages. There should be a message with an error about "anti-affinity":
egrep -rnIi "3e926491-d0dc-4611-8e87-75604c67f308.*Anti-affinity instance group policy was violated" /var/log/nova
/var/log/nova/nova-compute.log:40797:2021-07-22 19:19:54.075 1692 ERROR oslo_messaging.rpc.server nova.exception.RescheduledException: Build of instance 3e926491-d0dc-4611-8e87-75604c67f308 was re-scheduled: Anti-affinity instance group policy was violated.
7d. Confirm that the log timestamp is a few seconds after the migration command was issued.
7e. Run "openstack server event list <vm-id>", then "openstack server event show <vm-id> <req-id>" for the migration event. Confirm the "message" field is "error" and the "events" field includes a "No Valid Host" final message, with the "compute_prep_resize" event with result=Error and the traceback ending in the _do_validation function.

[Regression Potential]
Part of the new code path has been tested in upstream CI in happy migration paths. Concurrency has not been tested in the CI to trigger the error in a negative test. The exception handling code is executed only when the exception is raised (in case of policy violation), so this code path is being tested manually as part of the upstream patch work and SRU.

[Other Info]
None
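The placement checks in steps 3d, 4d, 6b and 7b of the test case (are any two group members on the same host?) can be expressed as a small helper. This is a sketch, not part of the SRU procedure: the function name `find_anti_affinity_violations` and the host names are hypothetical, and the server-to-host mapping is assumed to have been copied by hand from the Host column of "openstack server list --long".

```python
from collections import defaultdict


def find_anti_affinity_violations(server_hosts):
    """Map each host that carries more than one group member to those members."""
    by_host = defaultdict(list)
    for server, host in server_hosts.items():
        by_host[host].append(server)
    return {host: sorted(servers) for host, servers in by_host.items() if len(servers) > 1}


# Hypothetical placement after the concurrent migrations in step 3b (bug reproduced).
before_fix = {"ins1": "node4", "ins2": "node4", "ins3": "node5"}
# Hypothetical placement after the fix: one migration was rejected and ins2 stayed put.
after_fix = {"ins1": "node4", "ins2": "node2", "ins3": "node5"}

print(find_anti_affinity_violations(before_fix))
print(find_anti_affinity_violations(after_fix))
```

An empty result corresponds to the condition expected in steps 6b and 7b; a non-empty result corresponds to the reproduction condition in steps 3d and 4d.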
2021-07-26 15:54:13 Rodrigo Barbieri merge proposal linked https://code.launchpad.net/~rodrigo-barbieri2010/ubuntu/+source/nova/+git/nova/+merge/406169
2021-09-23 12:20:23 Rodrigo Barbieri merge proposal linked https://code.launchpad.net/~rodrigo-barbieri2010/ubuntu/+source/nova/+git/nova/+merge/409020
2021-09-23 12:21:05 Rodrigo Barbieri bug task added cloud-archive
2021-09-23 12:21:22 Rodrigo Barbieri nominated for series cloud-archive/stein
2021-09-23 12:21:22 Rodrigo Barbieri bug task added cloud-archive/stein
2021-09-23 12:37:35 Chris MacNaughton cloud-archive: status New Fix Released
2021-09-23 16:59:32 Rodrigo Barbieri merge proposal unlinked https://code.launchpad.net/~rodrigo-barbieri2010/ubuntu/+source/nova/+git/nova/+merge/406169
2021-09-28 12:30:41 Rodrigo Barbieri merge proposal linked https://code.launchpad.net/~rodrigo-barbieri2010/ubuntu/+source/nova/+git/nova/+merge/409244
2021-09-29 13:05:12 Chris MacNaughton nominated for series cloud-archive/train
2021-09-29 13:05:12 Chris MacNaughton bug task added cloud-archive/train
2021-10-11 13:45:45 Rodrigo Barbieri tags in-stable-stein in-stable-ussuri live-migration scheduler starlingx in-stable-stein in-stable-ussuri live-migration scheduler starlingx sts-sru-needed
2021-11-08 14:27:35 Chris MacNaughton tags in-stable-stein in-stable-ussuri live-migration scheduler starlingx sts-sru-needed in-stable-stein in-stable-ussuri live-migration scheduler starlingx sts-sru-needed verification-stein-needed
2021-11-08 14:28:13 Chris MacNaughton tags in-stable-stein in-stable-ussuri live-migration scheduler starlingx sts-sru-needed verification-stein-needed in-stable-stein in-stable-ussuri live-migration scheduler starlingx sts-sru-needed verification-stein-needed verification-train-needed
2021-11-15 13:40:17 Chris MacNaughton cloud-archive/train: status New Fix Committed
2021-11-15 13:40:19 Chris MacNaughton cloud-archive/stein: status New Fix Committed
2021-11-22 20:39:10 Rodrigo Barbieri attachment added validation_1821755_train.txt https://bugs.launchpad.net/nova/+bug/1821755/+attachment/5542683/+files/validation_1821755_train.txt
2021-11-22 20:39:33 Rodrigo Barbieri attachment added validation_1821755_stein.txt https://bugs.launchpad.net/nova/+bug/1821755/+attachment/5542684/+files/validation_1821755_stein.txt
2021-11-22 20:44:16 Rodrigo Barbieri tags in-stable-stein in-stable-ussuri live-migration scheduler starlingx sts-sru-needed verification-stein-needed verification-train-needed in-stable-stein in-stable-ussuri live-migration scheduler starlingx sts-sru-needed verification-stein-done verification-train-done
2021-11-23 15:30:18 Corey Bryant cloud-archive/stein: status Fix Committed Fix Released
2021-11-23 15:30:22 Corey Bryant cloud-archive/train: status Fix Committed Fix Released
2021-11-24 20:50:21 Rodrigo Barbieri tags in-stable-stein in-stable-ussuri live-migration scheduler starlingx sts-sru-needed verification-stein-done verification-train-done in-stable-stein in-stable-ussuri live-migration scheduler starlingx sts sts-sru-needed verification-stein-done verification-train-done
2023-06-19 08:23:45 Christian Rohmann bug added subscriber Christian Rohmann