Reproduced the issue once again on Ironpass 20-27 with the same reproduction scenario (i.e., live migrating an instance while rebooting compute-0 during the test) - see info below and logs (attaching now). I will update the LP with this info and the new logs.

StarlingX BUILD_ID="20190912T013000Z"

Instance instance-0000000b (tenant1-volume-stress-1): ping from the NATbox to the VM failed. I am able to log in to the instance using 'sudo virsh console instance-0000000b' and the instance appears to have an IP; however, from the VM I am unable to ping the tenant1-router internal interface (192.168.104.1).

compute-1:~$ sudo virsh list
Password:
 Id   Name                State
-----------------------------------
 6    instance-0000000b   running
 7    instance-0000000e   running

The tenant1-mgmt-net subnets are as follows:
• tenant1-mgmt0-subnet0 192.168.104.0/27
• tenant1-mgmt0-subnet1 192.168.104.32/27
• tenant1-mgmt0-subnet2 192.168.104.64/27

Router interface ports on tenant1-mgmt-net:
(29080474-5be2)  192.168.104.1   fa:16:3e:f6:83:91  network:router_interface  Active  UP
(49ce62a2-0270)  192.168.104.65  fa:16:3e:cc:9c:d1  network:router_interface  Active  UP
(4a3a2700-bd7e)  192.168.104.33  fa:16:3e:d9:5f:22  network:router_interface  Active  UP

Details of the unreachable gateway port (192.168.104.1):
ID                     29080474-5be2-4885-aba3-d278d7a5c23c
Network Name           tenant1-mgmt-net
Network ID             113023fe-ca52-4a01-9db9-5534324f9fcc
Project ID             c824ca71b6fc441bbe6df26773223155
MAC Address            fa:16:3e:f6:83:91
Status                 Active
Admin State            UP
Port Security Enabled  False
DNS Name               None
DNS Assignment         None
Fixed IPs              IP Address 192.168.104.1, Subnet ID 25d506d6-dc76-426f-a37d-38d5022319ef
Attached Device        Device Owner network:router_interface, Device ID 558ebc16-10f5-488f-9f78-bf4bf11f9733
Security Groups

Console login on the failing instance:

tenant1-volume-stress-1 login: root
[ 7158.596103] login[11064]: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
[ 7158.602495] systemd-logind[529]: New session 4 of user root.
[ 7158.604156] login[11064]: DIALUP AT ttyS0 BY root
[ 7158.605429] systemd[1]: Started Session 4 of user root.
[ 7158.607371] login[11064]: ROOT LOGIN ON ttyS0
[ 7158.609313] systemd[1]: Starting Session 4 of user root.

tenant1-volume-stress-1:~# ifconfig
[ 7167.675942] -bash[11086]: HISTORY: PID=11086 UID=0 ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.104.7  netmask 255.255.255.224  broadcast 192.168.104.31
        inet6 fe80::f816:3eff:fe93:728c  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:93:72:8c  txqueuelen 1000  (Ethernet)
        RX packets 1068  bytes 102974 (100.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 700  bytes 50286 (49.1 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.3.246  netmask 255.255.255.0  broadcast 172.16.3.255
        inet6 fe80::f816:3eff:fe44:8ea9  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:44:8e:a9  txqueuelen 1000  (Ethernet)
        RX packets 301  bytes 22348 (21.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 151  bytes 10290 (10.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 130  bytes 13192 (12.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 130  bytes 13192 (12.8 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

tenant1-volume-stress-1:~# ping 192.168.104.1
[10465.281139] -bash[11133]: HISTORY: PID=11133 UID=0 ping 192.168.104.1
PING 192.168.104.1 (192.168.104.1) 56(84) bytes of data.

--- 192.168.104.1 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 7999ms
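A useful next check (a sketch only, not yet run in this reproduction): confirm whether ARP for the gateway resolves from the failing VM, and watch for the ARP/ICMP arriving inside the router namespace on compute-3. The qrouter namespace and qr- interface names below are assumed from the standard Neutron naming (router ID 558ebc16-..., gateway port 29080474-5be2...):

# From the failing VM console (assumes arping is present in the guest image):
tenant1-volume-stress-1:~# ip neigh show 192.168.104.1
tenant1-volume-stress-1:~# arping -c 3 -I eth0 192.168.104.1

# On compute-3, inside the router namespace:
compute-3:~$ sudo ip netns list | grep 558ebc16
compute-3:~$ sudo ip netns exec qrouter-558ebc16-10f5-488f-9f78-bf4bf11f9733 \
    tcpdump -ni qr-29080474-5b 'icmp or arp'

If the requests never show up on qr-29080474-5b, the drop is in the data path between compute-1 and compute-3 rather than in the router itself.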
Note: Another instance, ten1test2 (instance-0000000e), launched on the same host after the failure above, can ping the IP of tenant1-volume-stress-1 without issue, and pinging all the way to the NATbox is also not an issue for this instance.

ten1test2:~# ifconfig
[ 3672.645559] -bash[1714]: HISTORY: PID=1714 UID=0 ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.104.30  netmask 255.255.255.224  broadcast 192.168.104.31
        inet6 fe80::f816:3eff:fef3:79bc  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:f3:79:bc  txqueuelen 1000  (Ethernet)
        RX packets 945  bytes 76980 (75.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 863  bytes 82660 (80.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.3.131  netmask 255.255.255.0  broadcast 172.16.3.255
        inet6 fe80::f816:3eff:fe2a:493  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:2a:04:93  txqueuelen 1000  (Ethernet)
        RX packets 10  bytes 1292 (1.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12  bytes 1388 (1.3 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 8  bytes 612 (612.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 612 (612.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

ten1test2:~# ping 192.168.104.1
[ 3979.065866] -bash[1714]: HISTORY: PID=1714 UID=0 ping 192.168.104.1
PING 192.168.104.1 (192.168.104.1) 56(84) bytes of data.
64 bytes from 192.168.104.1: icmp_seq=1 ttl=64 time=0.793 ms
64 bytes from 192.168.104.1: icmp_seq=2 ttl=64 time=0.409 ms
64 bytes from 192.168.104.1: icmp_seq=3 ttl=64 time=0.192 ms
64 bytes from 192.168.104.1: icmp_seq=4 ttl=64 time=0.274 ms

--- 192.168.104.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 0.192/0.417/0.793/0.230 ms
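Since ten1test2 sits on the same host and the same subnet (192.168.104.30/27 vs 192.168.104.7/27) and works, this looks port-specific rather than host-wide. A quick comparison of the two Neutron ports would confirm their bindings and status (a sketch; this output has not been captured yet):

[sysadmin@controller-0 ~(keystone_admin)]$ openstack port list --server tenant1-volume-stress-1
[sysadmin@controller-0 ~(keystone_admin)]$ openstack port list --server ten1test2
# then, for each port ID returned:
[sysadmin@controller-0 ~(keystone_admin)]$ openstack port show <port-id> -c status -c binding_host_id -c binding_vif_type

A stale binding_host_id still pointing at compute-3 for the migrated instance's port would be consistent with the symptom.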
The instance tenant1-volume-stress-1 last migrated from compute-3 to compute-1. Relevant events/alarms:

2019-09-16T18:54:19  clear  100.101  Platform CPU threshold exceeded; threshold 95.00%, actual 100.00%  host=compute-0  critical
2019-09-16T18:52:47  log    275.001  Host compute-0 hypervisor is now unlocked-enabled  host=compute-0.hypervisor=99c226e3-11d4-4c68-a5d8-c959617e89e8  critical
2019-09-16T18:52:30  clear  200.006  compute-0 is degraded due to the failure of its 'pci-irq-affinity-agent' process. Auto recovery of this major process is in progress.  host=compute-0.process=pci-irq-affinity-agent  major
2019-09-16T18:52:19  set    100.101  Platform CPU threshold exceeded; threshold 95.00%, actual 100.00%  host=compute-0  critical
2019-09-16T18:51:40  log    275.001  Host compute-0 hypervisor is now unlocked-disabled  host=compute-0.hypervisor=99c226e3-11d4-4c68-a5d8-c959617e89e8  critical
2019-09-16T18:51:00  clear  200.004  compute-0 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful.  host=compute-0  critical
2019-09-16T18:50:49  log    200.022  compute-0 is now 'enabled'  host=compute-0.state=enabled  not-applicable
2019-09-16T18:50:47  set    200.006  compute-0 is degraded due to the failure of its 'pci-irq-affinity-agent' process. Auto recovery of this major process is in progress.  host=compute-0.process=pci-irq-affinity-agent  major
2019-09-16T18:50:38  clear  200.009  compute-0 experienced a persistent critical 'Cluster-host Network' communication failure.  host=compute-0.network=Cluster-host  critical
2019-09-16T18:50:38  clear  200.005  compute-0 experienced a persistent critical 'Management Network' communication failure.  host=compute-0.network=Management  critical
2019-09-16T18:46:18  clear  100.101  Platform CPU threshold exceeded; threshold 95.00%, actual 99.49%  host=compute-2  critical
2019-09-16T18:45:58  clear  700.008  Instance tenant1-volume-stress-1 owned by tenant1 is live migrating from host compute-3  tenant=c824ca71-b6fc-441b-be6d-f26773223155.instance=f9b18cc7-4002-499b-83a3-fd82b77cb68a  warning
2019-09-16T18:45:58  log    700.156  Live-Migrate complete for instance tenant1-volume-stress-1 now enabled on host compute-1  tenant=c824ca71-b6fc-441b-be6d-f26773223155.instance=f9b18cc7-4002-499b-83a3-fd82b77cb68a  critical
2019-09-16T18:45:33  log    700.152  Live-Migrate inprogress for instance tenant1-volume-stress-1 from host compute-3  tenant=c824ca71-b6fc-441b-be6d-f26773223155.instance=f9b18cc7-4002-499b-83a3-fd82b77cb68a  critical
2019-09-16T18:45:33  set    700.008  Instance tenant1-volume-stress-1 owned by tenant1 is live migrating from host compute-3  tenant=c824ca71-b6fc-441b-be6d-f26773223155.instance=f9b18cc7-4002-499b-83a3-fd82b77cb68a  warning

The router is on compute-3 (after the failure):

[sysadmin@controller-0 ~(keystone_admin)]$ neutron l3-agent-list-hosting-router 558ebc16-10f5-488f-9f78-bf4bf11f9733
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+-----------+----------------+-------+----------+
| id                                   | host      | admin_state_up | alive | ha_state |
+--------------------------------------+-----------+----------------+-------+----------+
| e625752d-14b3-459a-ac6d-f9dd9aeedcdd | compute-3 | True           | :-)   |          |
+--------------------------------------+-----------+----------------+-------+----------+

compute-1:~$ sudo ovs-vsctl show
Password:
9289f374-6f7c-4fd6-bd15-a9de5fa4b593
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "tape8d90651-c9"
            tag: 4
            Interface "tape8d90651-c9"
                type: internal
        Port "tapb4fa67c6-1c"
            tag: 7
            Interface "tapb4fa67c6-1c"
                type: internal
        Port "qg-b7ef5e47-17"
            tag: 2
            Interface "qg-b7ef5e47-17"
                type: internal
        Port "tapa38d1594-2d"
            tag: 5
            Interface "tapa38d1594-2d"
        Port "tap8fb372d3-6a"
            tag: 5
            Interface "tap8fb372d3-6a"
        Port "int-br-phy0"
            Interface "int-br-phy0"
                type: patch
                options: {peer="phy-br-phy0"}
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "tap98ab5184-3f"
            tag: 3
            Interface "tap98ab5184-3f"
                type: internal
        Port "qr-831217e4-f0"
            tag: 3
            Interface "qr-831217e4-f0"
                type: internal
        Port "tap64b21573-76"
            tag: 1
            Interface "tap64b21573-76"
                type: internal
        Port "tap6f67cc35-53"
            tag: 5
            Interface "tap6f67cc35-53"
                type: internal
        Port "qr-037f9921-d0"
            tag: 3
            Interface "qr-037f9921-d0"
                type: internal
        Port "tap70447bb4-83"
            tag: 6
            Interface "tap70447bb4-83"
                type: internal
        Port "tap4272218d-d4"
            tag: 13
            Interface "tap4272218d-d4"
        Port "qr-a758d463-cb"
            tag: 3
            Interface "qr-a758d463-cb"
                type: internal
        Port "tap7bab5c4a-d2"
            tag: 13
            Interface "tap7bab5c4a-d2"
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port br-tun
            Interface br-tun
                type: internal
    Bridge "br-phy0"
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "br-phy0"
            Interface "br-phy0"
                type: internal
        Port "ens2f0"
            Interface "ens2f0"
        Port "phy-br-phy0"
            Interface "phy-br-phy0"
                type: patch
                options: {peer="int-br-phy0"}
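To check whether compute-1's br-int is actually learning and forwarding the failing instance's MAC (fa:16:3e:93:72:8c from the ifconfig above), the OVS FDB and flow tables could be dumped (a sketch; add -O OpenFlow13 if the bridge rejects the default OpenFlow version):

compute-1:~$ sudo ovs-appctl fdb/show br-int | grep -i 'fa:16:3e:93:72:8c'
compute-1:~$ sudo ovs-ofctl dump-flows br-int | grep -i 'fa:16:3e:93:72:8c'

An empty result from both would suggest the agent never reprogrammed the flows for this port after the live migration.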
"ens2f0" Port "phy-br-phy0" Interface "phy-br-phy0" type: patch options: {peer="int-br-phy0"} compute-3:~$ sudo ovs-vsctl show Password: 6f198630-5687-4a6c-a00a-e063ead2e5f7 Manager "ptcp:6640:127.0.0.1" is_connected: true Bridge "br-phy0" Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure Port "br-phy0" Interface "br-phy0" type: internal Port "ens2f0" Interface "ens2f0" Port "phy-br-phy0" Interface "phy-br-phy0" type: patch options: {peer="int-br-phy0"} Bridge br-int Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure Port "qr-4a3a2700-bd" tag: 4 Interface "qr-4a3a2700-bd" type: internal Port "tapa254e929-c3" tag: 6 Interface "tapa254e929-c3" type: internal Port "tapa49cab58-58" tag: 2 Interface "tapa49cab58-58" type: internal Port "tap07e5581c-82" tag: 11 Interface "tap07e5581c-82" type: internal Port patch-tun Interface patch-tun type: patch options: {peer=patch-int} Port "tap65db7c9d-45" tag: 1 Interface "tap65db7c9d-45" type: internal Port br-int Interface br-int type: internal Port "tap7f3ea8f1-e8" tag: 10 Interface "tap7f3ea8f1-e8" type: internal Port "qr-29080474-5b" tag: 4 Interface "qr-29080474-5b" type: internal Port "qg-5eb00be5-9f" tag: 3 Interface "qg-5eb00be5-9f" type: internal Port "tapb0bdd2e7-29" tag: 5 Interface "tapb0bdd2e7-29" type: internal Port "qr-49ce62a2-02" tag: 4 Interface "qr-49ce62a2-02" type: internal Port "int-br-phy0" Interface "int-br-phy0" type: patch options: {peer="phy-br-phy0"} Bridge br-tun Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure Port patch-int Interface patch-int type: patch options: {peer=patch-tun} Port br-tun Interface br-tun type: internal