Activity log for bug #1519926

Date Who What changed Old value New value Message
2015-11-25 19:23:42 Stephen Ma bug added bug
2015-11-26 03:08:02 Hong Hui Xiao neutron: assignee Hong Hui Xiao (xiaohhui)
2015-11-26 14:44:50 Rossella Sblendido neutron: status New Confirmed
2015-11-26 14:44:54 Rossella Sblendido neutron: importance Undecided Medium
2015-11-26 16:57:17 Carl Baldwin neutron: importance Medium High
2015-11-27 10:29:18 Miguel Angel Ajo tags l3-ipam-dhcp
2015-12-03 18:23:58 Carl Baldwin neutron: importance High Medium
2015-12-08 06:53:08 OpenStack Infra neutron: status Confirmed In Progress
2015-12-08 20:49:47 Stephen Ma description

Old value:

    L3-agent restart causes VM connectivity loss

    To test whether the L3-agent on a network node can recover after it was stopped and then restarted, I ran this test on a devstack setup using the latest neutron code on the master branch. The L3-agent is running in legacy mode.

    1. Create a network, subnetwork.
    2. Create a router, tie the router to the subnetwork and the external network.
    3. Create a VM using the network and assign a floating IP to the VM. The VM can be pinged and ssh'ed using the floating IP.
    4. On the controller node, kill the L3 agent.
    5. Delete the qrouter namespace of the router created in (2) on the controller node.
    6. Start up the L3-agent again.
    7. Now the VM can no longer be ssh'ed using the FIP.

    Connectivity to the VM is lost because the L3-agent failed to reconstruct all the interfaces in the qrouter namespace. For example:

    Before running steps 4-6, the qrouter namespace on the controller node looks like this (router-id=e86b277a-5f49-4fcb-8d85-241594db418e, VM's FIP=10.127.10.5):

    stack@Ubuntu-38:~/DEVSTACK/demo$ sudo ip netns exec qrouter-e86b277a-5f49-4fcb-8d85-241594db418e ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    33: qr-50b99abf-a4: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
        link/ether fa:16:3e:17:3e:b0 brd ff:ff:ff:ff:ff:ff
        inet 10.1.2.1/24 brd 10.1.2.255 scope global qr-50b99abf-a4
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fe17:3eb0/64 scope link
           valid_lft forever preferred_lft forever
    34: qg-3d1a888a-33: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
        link/ether fa:16:3e:60:9a:43 brd ff:ff:ff:ff:ff:ff
        inet 10.127.10.4/24 brd 10.127.10.255 scope global qg-3d1a888a-33
           valid_lft forever preferred_lft forever
        inet 10.127.10.5/32 brd 10.127.10.5 scope global qg-3d1a888a-33
           valid_lft forever preferred_lft forever
        inet6 2001:db8::3/64 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fe60:9a43/64 scope link
           valid_lft forever preferred_lft forever

    After deleting the qrouter-e86b277a-5f49-4fcb-8d85-241594db418e namespace and then restarting the L3-agent on the controller node, the L3-agent did recreate the namespace; however, not all the interfaces and IP addresses were created:

    stack@Ubuntu-38:~/DEVSTACK/demo$ sudo ip netns exec qrouter-e86b277a-5f49-4fcb-8d85-241594db418e ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever

    So the VM can't be ssh'ed because not all of the required plumbing is re-created.

    When the L3 agent is running in dvr-snat mode on the controller and dvr on the compute node, if I do steps 4-6 on the compute node, the VM can no longer be ssh'ed either. The qrouter namespace doesn't have all the needed interfaces either.

    Also, if I run the same test using neutron based on stable/liberty or stable/kilo, the VM can still be ssh'ed after running steps 4-6 and restarting the L3-agent.

New value:

    L3-agent restart causes VM connectivity loss

    To test whether the L3-agent on a network node can recover after it was stopped and then restarted, I ran this test on a devstack setup using the latest neutron code on the master branch. The L3-agent is running in legacy mode.

    1. Create a network, subnetwork.
    2. Create a router, tie the router to the subnetwork and the external network.
    3. Create a VM using the network and assign a floating IP to the VM. The VM can be pinged and ssh'ed using the floating IP.
    4. On the controller node, kill the L3 agent.
    5. Delete the qrouter namespace of the router created in (2) on the controller node.
    6. Start up the L3-agent again.
    7. Now the VM can no longer be ssh'ed using the FIP.

    Connectivity to the VM is lost because the L3-agent failed to reconstruct all the interfaces in the qrouter namespace. For example:

    Before running steps 4-6, the qrouter namespace on the controller node looks like this (router-id=e86b277a-5f49-4fcb-8d85-241594db418e, VM's FIP=10.127.10.5):

    stack@Ubuntu-38:~/DEVSTACK/demo$ sudo ip netns exec qrouter-e86b277a-5f49-4fcb-8d85-241594db418e ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    33: qr-50b99abf-a4: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
        link/ether fa:16:3e:17:3e:b0 brd ff:ff:ff:ff:ff:ff
        inet 10.1.2.1/24 brd 10.1.2.255 scope global qr-50b99abf-a4
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fe17:3eb0/64 scope link
           valid_lft forever preferred_lft forever
    34: qg-3d1a888a-33: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
        link/ether fa:16:3e:60:9a:43 brd ff:ff:ff:ff:ff:ff
        inet 10.127.10.4/24 brd 10.127.10.255 scope global qg-3d1a888a-33
           valid_lft forever preferred_lft forever
        inet 10.127.10.5/32 brd 10.127.10.5 scope global qg-3d1a888a-33
           valid_lft forever preferred_lft forever
        inet6 2001:db8::3/64 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fe60:9a43/64 scope link
           valid_lft forever preferred_lft forever

    After deleting the qrouter-e86b277a-5f49-4fcb-8d85-241594db418e namespace and then restarting the L3-agent on the controller node, the L3-agent did recreate the namespace; however, not all the interfaces and IP addresses were created:

    stack@Ubuntu-38:~/DEVSTACK/demo$ sudo ip netns exec qrouter-e86b277a-5f49-4fcb-8d85-241594db418e ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever

    So the VM can't be ssh'ed because not all of the required plumbing is re-created.

    When the L3 agent is running in dvr-snat mode on the controller and dvr on the compute node, if I do steps 4-6 on the compute node, the VM can no longer be ssh'ed either. The qrouter namespace doesn't have all the needed interfaces either.
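
The before/after comparison in the description can be checked mechanically. The following is a minimal sketch, not part of the bug report or of neutron: it takes the text printed by `ip a` inside the qrouter namespace and reports which expected port prefixes have no interface. The helper names (`interface_names`, `missing_port_prefixes`) are hypothetical, and the assumption that a healthy legacy router namespace has one `qr-` and one `qg-` interface is taken only from the output shown in the report. The sample strings are abbreviated copies of that output.

```python
import re

# Matches the first line of each interface stanza in `ip a` output,
# e.g. "33: qr-50b99abf-a4: <BROADCAST,...>". A "@peer" suffix is dropped.
IFACE_LINE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:", re.MULTILINE)


def interface_names(ip_addr_output):
    """Return the interface names found in `ip a` output, in order."""
    return IFACE_LINE.findall(ip_addr_output)


def missing_port_prefixes(ip_addr_output, required=("qr-", "qg-")):
    """Return the required prefixes that no interface name starts with.

    The default prefixes are an assumption based on the report's output.
    """
    names = interface_names(ip_addr_output)
    return [p for p in required if not any(n.startswith(p) for n in names)]


# Abbreviated output from before steps 4-6 (namespace fully plumbed).
BEFORE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
33: qr-50b99abf-a4: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:17:3e:b0 brd ff:ff:ff:ff:ff:ff
    inet 10.1.2.1/24 brd 10.1.2.255 scope global qr-50b99abf-a4
34: qg-3d1a888a-33: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:60:9a:43 brd ff:ff:ff:ff:ff:ff
    inet 10.127.10.4/24 brd 10.127.10.255 scope global qg-3d1a888a-33
"""

# Abbreviated output from after the L3-agent restart (only loopback exists).
AFTER = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
"""

if __name__ == "__main__":
    print("before:", missing_port_prefixes(BEFORE))
    print("after: ", missing_port_prefixes(AFTER))
```

The sketch deliberately works on captured text rather than invoking `ip netns exec` itself, so it can be run without root privileges against output saved from the controller or compute node.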
2016-01-07 04:50:59 OpenStack Infra neutron: status In Progress Fix Released
2016-05-12 02:11:04 OpenStack Infra tags l3-ipam-dhcp in-stable-liberty l3-ipam-dhcp