Brief Description
-----------------
Ping failures from NatBox to instance in live migration stress tests
Severity
--------
Major
Steps to Reproduce
------------------
The test case performs the following operations:
a. create a bootable volume:
--bootable --image=fd57e28e-6ea4-4380-ba0d-32d0f787c152 --size=2 vol-tenant1-volume-stress-1-1
b. boot a single instance from the volume, using a flavor with hw:mem_page_size=large:
--flavor=b8f20ebe-4c60-4d9d-81e7-f30332e01049 --boot-volume=753449a4-494b-42f5-aa19-768f8fc0d60e --key-name=keypair-tenant1 --poll --nic net-id=dcfc9285-916f-41b5-b5d4-cc9dcdc1bd99 --nic net-id=30fed34d-e58d-47b7-902f-7566ccc264a9 tenant1-volume-stress-1
c. live-migrate the instance many times.
The instance (698586e7-0ff0-4f52-a9f5-76ff1693539c) live migrates between compute-0, compute-1 and compute-2 in this test.
The last live migration for this instance, at ~[2019-08-22 19:30:20,631], was from compute-2 to compute-1.
(Note: compute-0 was rebooted around this time.)
d. After the live migration to compute-1, ping failures were observed (i.e. the VM could not be pinged from the NatBox).
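The migration stress loop in steps c-d can be sketched as follows. This is a minimal sketch, not the test framework's actual code: `live_migrate` and `check_ping` are hypothetical stand-ins for the framework's real helpers, which are not shown in this report.

```python
def migrate_stress(vm, hosts, live_migrate, check_ping, iterations=1000):
    """Live-migrate vm across the given hosts in rotation, verifying
    NatBox connectivity after every move.

    Returns a list of (iteration, destination) pairs at which the
    post-migration ping check failed.
    """
    failures = []
    for i in range(iterations):
        dest = hosts[i % len(hosts)]   # rotate through the compute hosts
        live_migrate(vm, dest)         # hypothetical framework helper
        if not check_ping(vm):         # e.g. ping from NatBox to the VM
            failures.append((i, dest))
    return failures
```

In the failure described above, the equivalent of `check_ping` started failing after the migration from compute-2 to compute-1.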
Expected Behavior
------------------
Did not expect ping failures from the NatBox to the VM.
(Also did not expect the reboot of compute-0 to cause CPU usage on compute-2 to reach the critical level.)
Actual Behavior
----------------
Ping failures
Reproducibility
---------------
Reproducible
System Configuration
--------------------
Lab: IP_20-27
Load: 2019-08-20_20-59-00
testcases/stress/nova/test_migrate.py::test_migrate_stress[volume-1000]
Last Pass
---------
Timestamp/Logs
--------------
-----------------[2019-08-22 19:33:41]-----------------
==================Ping from NatBox to vm 698586e7-0ff0-4f52-a9f5-76ff1693539c failed for 200 seconds.===============
-----------------[2019-08-22 19:33:44]-----------------
#### Console log for vm 698586e7-0ff0-4f52-a9f5-76ff1693539c ####
[ 4248.518530] chronyd[520]: Can't synchronise: no selectable sources
-----------------[2019-08-22 19:33:59]-----------------
#### Ping router interfaces succeeded ####
{'192.168.104.1': 0, '192.168.104.65': 0, '192.168.104.33': 0}
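The 200-second ping check reported above can be sketched as a timeout-bounded retry. This is a sketch only; `ping_once` stands in for the framework's actual NatBox ping helper, and the clock/sleep parameters are injectable purely so the logic can be exercised without waiting.

```python
import time

def ping_with_timeout(ping_once, timeout=200, interval=5,
                      clock=time.monotonic, sleep=time.sleep):
    """Retry ping_once() until it succeeds or timeout seconds elapse.

    Returns True on the first successful ping, False if the deadline
    passes without one (the "failed for 200 seconds" case above).
    """
    deadline = clock() + timeout
    while True:
        if ping_once():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```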
nova-compute.log on compute-2
{"log":"2019-08-22 19:30:18.743 861834 INFO nova.compute.manager [-] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] Migrating instance to compute-1 finished successfully.\n","stream":"stdout","time":"2019-08-22T19:30:18.743651362Z"}
{"log":"2019-08-22 19:30:18.743 861834 INFO nova.compute.manager [-] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] Migrating instance to compute-1 finished successfully.\n","stream":"stdout","time":"2019-08-22T19:30:18.743877833Z"}
nova-compute.log on compute-1
{"log":"2019-08-22 19:30:09.452 852129 INFO nova.compute.manager [req-3675697d-dfd7-497a-8526-c3037f843654 - - - - -] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] During the sync_power process the instance has moved from host compute-2 to host compute-1\n","stream":"stdout","time":"2019-08-22T19:30:09.453965979Z"}
{"log":"2019-08-22 19:30:09.452 852129 INFO nova.compute.manager [req-3675697d-dfd7-497a-8526-c3037f843654 - - - - -] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] During the sync_power process the instance has moved from host compute-2 to host compute-1\n","stream":"stdout","time":"2019-08-22T19:30:09.453994652Z"}
{"log":"2019-08-22 19:30:16.162 852129 INFO nova.compute.manager [req-85e0e1b9-d6e4-482f-a401-24256e2113fc 59cf4be13765408298e8805246364b99 5e79f82bd4244daca343a86d2dbe8408 - default default] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] Post operation of migration started\n","stream":"stdout","time":"2019-08-22T19:30:16.163123222Z"}
{"log":"2019-08-22 19:30:16.162 852129 INFO nova.compute.manager [req-85e0e1b9-d6e4-482f-a401-24256e2113fc 59cf4be13765408298e8805246364b99 5e79f82bd4244daca343a86d2dbe8408 - default default] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] Post operation of migration started\n","stream":"stdout","time":"2019-08-22T19:30:16.163768603Z"}
{"log":"2019-08-22 19:30:20.732 852129 WARNING nova.compute.manager [req-ef961e8d-8e94-498e-851e-5aff2a24085f e4e954886a0a46b6a688f091633d08d1 305a436d092647d8815325402f17d3a8 - 4a3ccc82d5b046c4b9d903daa39752db 4a3ccc82d5b046c4b9d903daa39752db] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] Received unexpected event network-vif-plugged-17154804-f1c6-45d9-9773-835fdd85baf8 for instance with vm_state active and task_state None.\n","stream":"stdout","time":"2019-08-22T19:30:20.73334443Z"}
{"log":"2019-08-22 19:30:20.732 852129 WARNING nova.compute.manager [req-ef961e8d-8e94-498e-851e-5aff2a24085f e4e954886a0a46b6a688f091633d08d1 305a436d092647d8815325402f17d3a8 - 4a3ccc82d5b046c4b9d903daa39752db 4a3ccc82d5b046c4b9d903daa39752db] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] Received unexpected event network-vif-plugged-17154804-f1c6-45d9-9773-835fdd85baf8 for instance with vm_state active and task_state None.\n","stream":"stdout","time":"2019-08-22T19:30:20.733373191Z"}
{"log":"2019-08-22 19:30:22.809 852129 WARNING nova.compute.manager [req-2eefca8b-6913-4a6b-8b82-42378126fb94 e4e954886a0a46b6a688f091633d08d1 305a436d092647d8815325402f17d3a8 - 4a3ccc82d5b046c4b9d903daa39752db 4a3ccc82d5b046c4b9d903daa39752db] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] Received unexpected event network-vif-plugged-f2cc70c4-2c6c-403e-9c6f-ad288c3d7128 for instance with vm_state active and task_state None.\n","stream":"stdout","time":"2019-08-22T19:30:22.81447433Z"}
{"log":"2019-08-22 19:30:22.809 852129 WARNING nova.compute.manager [req-2eefca8b-6913-4a6b-8b82-42378126fb94 e4e954886a0a46b6a688f091633d08d1 305a436d092647d8815325402f17d3a8 - 4a3ccc82d5b046c4b9d903daa39752db 4a3ccc82d5b046c4b9d903daa39752db] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] Received unexpected event network-vif-plugged-f2cc70c4-2c6c-403e-9c6f-ad288c3d7128 for instance with vm_state active and task_state None.\n","stream":"stdout","time":"2019-08-22T19:30:22.814500403Z"}
{"log":"2019-08-22 19:33:43.812 852129 INFO nova.compute.manager [req-43d7e446-9cce-45c3-ab61-91a29a06aea2 59cf4be13765408298e8805246364b99 5e79f82bd4244daca343a86d2dbe8408 - default default] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] Get console output\n","stream":"stdout","time":"2019-08-22T19:33:43.844570027Z"}
{"log":"2019-08-22 19:33:43.812 852129 INFO nova.compute.manager [req-43d7e446-9cce-45c3-ab61-91a29a06aea2 59cf4be13765408298e8805246364b99 5e79f82bd4244daca343a86d2dbe8408 - default default] [instance: 698586e7-0ff0-4f52-a9f5-76ff1693539c] Get console output\n","stream":"stdout","time":"2019-08-22T19:33:43.844602Z"}
{"log":"2019-08-22 19:37:25.632 852129 WARNING nova.virt.libvirt.imagecache [req-2322e957-1704-437c-bab2-45b9ac21426b - - - - -] Unknown base file: /var/lib/nova/instances/_base/10bae7f8102b3e457aba60d89da898ea5b019fd6\n","stream":"stdout","time":"2019-08-22T19:37:25.647764906Z"}
{"log":"2019-08-22 19:37:25.632 852129 WARNING nova.virt.libvirt.imagecache [req-2322e957-1704-437c-bab2-45b9ac21426b - - - - -] Unknown base file: /var/lib/nova/instances/_base/10bae7f8102b3e457aba60d89da898ea5b019fd6\n","stream":"stdout","time":"2019-08-22T19:37:25.647800983Z"}
{"log":"2019-08-22 19:37:25.632 852129 INFO nova.virt.libvirt.imagecache [req-2322e957-1704-437c-bab2-45b9ac21426b - - - - -] Removable base files: /var/lib/nova/instances/_base/10bae7f8102b3e457aba60d89da898ea5b019fd6\n","stream":"stdout","time":"2019-08-22T19:37:25.647818282Z"}
{"log":"2019-08-22 19:37:25.632 852129 INFO nova.virt.libvirt.imagecache [req-2322e957-1704-437c-bab2-45b9ac21426b - - - - -] Removable base files: /var/lib/nova/instances/_base/10bae7f8102b3e457aba60d89da898ea5b019fd6\n","stream":"stdout","time":"2019-08-22T19:37:25.647825746Z"}
{"log":"2019-08-22 19:37:25.634 852129 INFO nova.virt.libvirt.imagecache [req-2322e957-1704-437c-bab2-45b9ac21426b - - - - -] Removing base or swap file: /var/lib/nova/instances/_base/10bae7f8102b3e457aba60d89da898ea5b019fd6\n","stream":"stdout","time":"2019-08-22T19:37:25.647843155Z"}
{"log":"2019-08-22 19:37:25.634 852129 INFO nova.virt.libvirt.imagecache [req-2322e957-1704-437c-bab2-45b9ac21426b - - - - -] Removing base or swap file: /var/lib/nova/instances/_base/10bae7f8102b3e457aba60d89da898ea5b019fd6\n","stream":"stdout","time":"2019-08-22T19:37:25.647849756Z"}
2019-08-22T19:46:00 clear 100.101 Platform CPU threshold exceeded ; threshold 95.00%, actual 99.32% host=compute-2 critical
2019-08-22T19:44:00 set 100.101 Platform CPU threshold exceeded ; threshold 95.00%, actual 99.32% host=compute-2 critical
2019-08-22T19:41:17 clear 100.101 Platform CPU threshold exceeded ; threshold 95.00%, actual 99.93% host=compute-0 critical
2019-08-22T19:40:17 log 275.001 Host compute-0 hypervisor is now unlocked-enabled host=compute-0.hypervisor=5743f8ea-7e02-4c9d-8ddd-d66bfa4ab79a critical
2019-08-22T19:39:40 clear 200.006 compute-0 is degraded due to the failure of its 'pci-irq-affinity-agent' process. Auto recovery of this major process is in progress. host=compute-0.process=pci-irq-affinity-agent major
2019-08-22T19:39:17 set 100.101 Platform CPU threshold exceeded ; threshold 95.00%, actual 99.93% host=compute-0 critical
2019-08-22T19:38:04 log 275.001 Host compute-0 hypervisor is now unlocked-disabled host=compute-0.hypervisor=5743f8ea-7e02-4c9d-8ddd-d66bfa4ab79a critical
2019-08-22T19:37:50 clear 200.004 compute-0 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful. host=compute-0 critical
2019-08-22T19:37:39 log 200.022 compute-0 is now 'enabled' host=compute-0.state=enabled not-applicable
2019-08-22T19:37:37 set 200.006 compute-0 is degraded due to the failure of its 'pci-irq-affinity-agent' process. Auto recovery of this major process is in progress. host=compute-0.process=pci-irq-affinity-agent major
2019-08-22T19:37:28 clear 200.009 compute-0 experienced a persistent critical 'Cluster-host Network' communication failure. host=compute-0.network=Cluster-host critical
2019-08-22T19:37:28 clear 200.005 compute-0 experienced a persistent critical 'Management Network' communication failure. host=compute-0.network=Management critical
2019-08-22T19:30:17 clear 700.008 Instance tenant1-volume-stress-1 owned by tenant1 is live migrating from host compute-2 tenant=71e4950f-b29c-40d8-ac13-742b4990dfad.instance=698586e7-0ff0-4f52-a9f5-76ff1693539c warning
2019-08-22T19:30:17 log 700.156 Live-Migrate complete for instance tenant1-volume-stress-1 now enabled on host compute-1 tenant=71e4950f-b29c-40d8-ac13-742b4990dfad.instance=698586e7-0ff0-4f52-a9f5-76ff1693539c critical
2019-08-22T19:30:16 log 275.001 Host compute-0 hypervisor is now locked-disabled host=compute-0.hypervisor=5743f8ea-7e02-4c9d-8ddd-d66bfa4ab79a critical
2019-08-22T19:30:16 set 200.004 compute-0 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful. host=compute-0 critical
2019-08-22T19:30:16 log 200.022 compute-0 is 'disabled-failed' to the system host=compute-0.status=failed not-applicable
2019-08-22T19:30:16 log 200.022 compute-0 is now 'offline' host=compute-0.status=offline not-applicable
Ironpass 20-27
Ping from NatBox to VM unsuccessful from [2019-08-22 19:28:19,881]
Test Activity
-------------
Regression