L3-agent restart causes VM connectivity loss
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | Fix Released | Medium | Hong Hui Xiao | | |
Bug Description
L3-agent restart causes VM connectivity loss
I tested whether the L3-agent on a network node can recover after it is stopped and then restarted. I ran this test on a devstack setup using the latest neutron code on the master branch. The L3-agent is running in legacy mode.
1. Create a network, subnetwork.
2. Create a router, tie the router to the subnetwork and the external network.
3. Create a VM using the network and assign a floating IP to the VM. The VM can be pinged and ssh'ed using the floating IP.
4. On the controller node, kill the L3 agent.
5. Delete the qrouter namespace of the router created in (2) on the controller node.
6. Start up the L3-agent again.
7. Now the VM can no longer be ssh'ed using the FIP.
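The reachability check in steps 3 and 7 can be scripted so that the result before and after the restart is compared the same way each time. The following is a minimal sketch, not part of the original report: the helper name `ssh_port_reachable` is hypothetical, and it only tests whether the SSH port accepts a TCP connection, which is weaker than a full SSH login.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does not authenticate, it only
    checks that something accepts connections on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network and timeouts.
        return False
```

Calling `ssh_port_reachable("<floating IP>")` once after step 3 and again after step 6 would be expected to return `True` and then `False` when the bug reproduces.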
Connectivity to the VM is lost because the L3-agent failed to reconstruct all the interfaces in the qrouter namespace. For example:
Before running steps 4-6, the qrouter namespace on the controller node looks like (router-
stack@Ubuntu-
1: lo: <LOOPBACK,
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
33: qr-50b99abf-a4: <BROADCAST,
link/ether fa:16:3e:17:3e:b0 brd ff:ff:ff:ff:ff:ff
inet 10.1.2.1/24 brd 10.1.2.255 scope global qr-50b99abf-a4
valid_lft forever preferred_lft forever
inet6 fe80::f816:
valid_lft forever preferred_lft forever
34: qg-3d1a888a-33: <BROADCAST,
link/ether fa:16:3e:60:9a:43 brd ff:ff:ff:ff:ff:ff
inet 10.127.10.4/24 brd 10.127.10.255 scope global qg-3d1a888a-33
valid_lft forever preferred_lft forever
inet 10.127.10.5/32 brd 10.127.10.5 scope global qg-3d1a888a-33
valid_lft forever preferred_lft forever
inet6 2001:db8::3/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::f816:
valid_lft forever preferred_lft forever
After deleting the qrouter-
stack@Ubuntu-
1: lo: <LOOPBACK,
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
So the VM can't be reached over ssh because the required plumbing (the qr- and qg- interfaces) is not re-created; only the loopback interface is present.
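The before/after comparison can also be done mechanically. Below is a rough sketch, assuming the two `ip a` listings have been captured as text. The function names `interface_names` and `missing_router_ports` are made up for this illustration and are not part of neutron.

```python
import re

# Interface header lines look like "33: qr-50b99abf-a4: <BROADCAST,..."
IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)")


def interface_names(ip_addr_output):
    """Collect interface names from the header lines of `ip a` output."""
    names = []
    for line in ip_addr_output.splitlines():
        match = IFACE_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_router_ports(before, after, prefixes=("qr-", "qg-")):
    """Return router ports present in `before` but absent from `after`."""
    after_names = set(interface_names(after))
    return [
        name
        for name in interface_names(before)
        if name.startswith(prefixes) and name not in after_names
    ]
```

Fed the two listings from this report, `missing_router_ports` would return the `qr-50b99abf-a4` and `qg-3d1a888a-33` ports, since only `lo` survives the restart.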
The same happens when the L3 agent is running in dvr-snat mode on the controller and in dvr mode on the compute node: if I do steps 4-6 on the compute node, the VM can no longer be reached over ssh, and the qrouter namespace is again missing the needed interfaces.
Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium
Changed in neutron:
importance: Medium → High
tags: added: l3-ipam-dhcp
Hi, I tried this on master and I am able to ping and ssh if I:
- bring down the l3-agent
- delete the qrouter namespace
- restart the l3-agent
I was able to ping and ssh after the l3-agent restart, but the weird thing I am noticing is the state of the recovered qrouter namespace.
After recovery, `ip a` in the qrouter namespace shows http://paste.openstack.org/show/480037/
Before I stopped the l3-agent it was http://paste.openstack.org/show/480036/
The only issue I see here is that it does not completely update the new router namespace.
l3-agent logs:
http://paste.openstack.org/show/480038/