Retired GRE and VXLAN tunnels persist in the neutron DB
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | High | Romil Gupta |
Bug Description
The setup is multi-node, with per-tenant routers and GRE or VXLAN tunneling; both the OVS plugin and ML2 are affected.
SYMPTOM:
VMs are reachable from the external network for about 1-2 minutes, after which point the connection times out and cannot be re-established unless traffic is generated from the VM console. VMs with DHCP interface settings periodically and temporarily come back online after requesting new leases.
When I attempt to ping from the external network, I can trace the traffic all the way to the tap interface on the compute node, where the VM responds to the ARP request sent by the tenant router (which is on a separate network node). However, this ARP reply never makes it back to the tenant router. It seems to die at the GRE terminus on bridge br-tun.
CAUSE:
* I have three NICs on my network node. VM traffic goes out the first NIC on 192.168.239.99/24 to the other compute nodes, while management traffic goes out the second NIC on 192.168.241.99. The third NIC is external and has no IP.
* I have four GRE endpoints on the VM network, one at the network node (192.168.239.99) and three on compute nodes (192.168.
* I have a fifth GRE endpoint with id 1 to 192.168.241.99, the network node's management interface, on each of the compute nodes. This was the first tunnel created when I deployed the network node, because that is how I had set remote_ip in the OVS plugin ini. I corrected the setting later, but the 192.168.241.99 endpoint persists:
mysql> select * from ovs_tunnel_
+-----------------+----+
| ip_address      | id |
+-----------------+----+
| 192.168.239.110 |  3 |
| 192.168.239.114 |  4 |
| 192.168.239.115 |  5 |
| 192.168.239.99  |  2 |
| 192.168.241.99  |  1 | <======== HERE
+-----------------+----+
5 rows in set (0.00 sec)
* Thus, after plugin restarts or reboots, this endpoint is re-created every time.
* The effect is that traffic from the VM has two possible flows from which to make a routing/switching decision. I was unable to determine how this decision is made, but obviously this is not a working configuration. Traffic that originates from the VM always seems to use the correct flow initially, but traffic that originates from the network node is never returned via the right flow unless the connection has been active within the previous 1-2 minutes. In both cases, successful connections time out after 1-2 minutes of inactivity.
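A toy model of the failure mode described above: br-tun learns a return path per remote MAC, and learned flows expire after an idle timeout, so once a connection idles out, return traffic may be flooded down the stale tunnel as well as the live one. The class, port names, and the timeout value below are illustrative assumptions, not the actual OVS flow tables.

```python
IDLE_TIMEOUT = 90  # seconds; illustrative, standing in for the learn-flow idle timeout

class ToyTunnelBridge:
    def __init__(self, tunnels):
        self.tunnels = tunnels          # all registered tunnel ports
        self.learned = {}               # mac -> (tunnel, last_seen)

    def pkt_from_vm(self, mac, tunnel, now):
        # Outbound traffic teaches the bridge the correct return tunnel.
        self.learned[mac] = (tunnel, now)

    def pkt_to_vm(self, mac, now):
        entry = self.learned.get(mac)
        if entry and now - entry[1] <= IDLE_TIMEOUT:
            return entry[0]             # unicast down the learned tunnel
        # No fresh learned flow: flood every registered tunnel, including
        # the stale 192.168.241.99 endpoint that should not exist.
        return self.tunnels

br = ToyTunnelBridge(["gre-239.99", "gre-241.99"])
br.pkt_from_vm("fa:16:3e:00:00:01", "gre-239.99", now=0)
print(br.pkt_to_vm("fa:16:3e:00:00:01", now=60))   # within timeout: correct tunnel
print(br.pkt_to_vm("fa:16:3e:00:00:01", now=200))  # idled out: floods both tunnels
```

This matches the observed symptom: connections the VM initiates work while active, then die after 1-2 minutes of inactivity once the learned flow ages out.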
SOLUTION:
mysql> delete from ovs_tunnel_
Query OK, 1 row affected (0.00 sec)

mysql> select * from ovs_tunnel_
+-----------------+----+
| ip_address      | id |
+-----------------+----+
| 192.168.239.110 |  3 |
| 192.168.239.114 |  4 |
| 192.168.239.115 |  5 |
| 192.168.239.99  |  2 |
+-----------------+----+
4 rows in set (0.00 sec)
* After doing that, I simply restarted the quantum OVS agents on the network and compute nodes. The old GRE tunnel is not re-created, and thereafter VM traffic to and from the external network proceeds without incident.
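The manual DELETE above can be scripted. The sketch below models the (truncated) tunnel endpoints table with sqlite3 purely for illustration; against a real deployment you would point a MySQL client at the quantum/neutron database instead. Column names are taken from the query output in the report; the table name and the list of valid endpoint IPs are assumptions specific to this deployment.

```python
import sqlite3

# The tunnel endpoint IPs that should exist, per the report.
VALID_ENDPOINTS = {"192.168.239.99", "192.168.239.110",
                   "192.168.239.114", "192.168.239.115"}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE endpoints (ip_address TEXT, id INTEGER)")
db.executemany("INSERT INTO endpoints VALUES (?, ?)",
               [("192.168.239.110", 3), ("192.168.239.114", 4),
                ("192.168.239.115", 5), ("192.168.239.99", 2),
                ("192.168.241.99", 1)])

# Delete every endpoint whose IP is not a known tunnel address.
placeholders = ",".join("?" * len(VALID_ENDPOINTS))
db.execute("DELETE FROM endpoints WHERE ip_address NOT IN (%s)" % placeholders,
           tuple(VALID_ENDPOINTS))

remaining = [row[0] for row in db.execute(
    "SELECT ip_address FROM endpoints ORDER BY id")]
print(remaining)  # the stale 192.168.241.99 row is gone
```

As in the manual fix, the agents would still need a restart afterwards so the stale tunnel port is not re-created from the old row.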
* I wonder whether these tables should be cleaned up as well:
mysql> select * from ovs_network_
+------------+--------------+------------------+-----------------+
| network_id | network_type | physical_network | segmentation_id |
+------------+--------------+------------------+-----------------+
| 4e8aacca-
| af224f3f-
+------------+--------------+------------------+-----------------+
2 rows in set (0.00 sec)
mysql> select * from ovs_tunnel_
+-----------+-----------+
| tunnel_id | allocated |
+-----------+-----------+
|         1 |         1 |
|         2 |         1 |
+-----------+-----------+
2 rows in set (0.00 sec)
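One way to answer the question above is to cross-check the allocations table against the network bindings: a tunnel ID marked allocated but backing no network's segmentation_id is a cleanup candidate. This is a sketch under assumptions — it is modeled with sqlite3 for illustration, the binding rows are hypothetical (the real segmentation_ids are truncated in the report), and only the column names follow the query output above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bindings (network_id TEXT, segmentation_id INTEGER)")
db.execute("CREATE TABLE allocations (tunnel_id INTEGER, allocated INTEGER)")

# Hypothetical data: one real network bound to tunnel id 2; tunnel id 1
# (the stale management-interface tunnel) is allocated but unused.
db.execute("INSERT INTO bindings VALUES (?, ?)", ("4e8aacca-", 2))
db.executemany("INSERT INTO allocations VALUES (?, ?)", [(1, 1), (2, 1)])

stale = [row[0] for row in db.execute(
    """SELECT tunnel_id FROM allocations
       WHERE allocated = 1
         AND tunnel_id NOT IN (SELECT segmentation_id FROM bindings)""")]
print(stale)  # tunnel IDs allocated but backing no network
```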
summary: |
- Retired GRE tunnels spersists in quantum db
+ Retired GRE tunnels persists in quantum db
Changed in quantum:
assignee: nobody → Jiajun Liu (ljjjustin)
tags: added: ovs
Changed in neutron:
assignee: Jiajun Liu (ljjjustin) → nobody
Changed in neutron:
assignee: Kyle Mestery (mestery) → Pengfei Zhang (eaglezpf)
Changed in neutron:
importance: Undecided → Medium
Changed in neutron:
milestone: none → juno-2
Changed in neutron:
milestone: juno-2 → juno-3
Changed in neutron:
assignee: Pengfei Zhang (eaglezpf) → Romil Gupta (romilg)
Changed in neutron:
milestone: juno-3 → juno-rc1
Changed in neutron:
milestone: juno-rc1 → kilo-1
importance: Medium → High
milestone: kilo-1 → juno-rc1
Changed in neutron:
milestone: juno-rc1 → none
Changed in neutron:
milestone: none → kilo-1
tags: added: ml2 removed: ovs
tags: added: ovs
Changed in neutron:
milestone: kilo-1 → kilo-2
Changed in neutron:
milestone: kilo-2 → kilo-3
Changed in neutron:
status: Fix Committed → Fix Released
Changed in neutron:
milestone: kilo-3 → 2015.1.0
In my case it is like this. At the moment I have:
openvswitch-switch | 1.4.0-1ubuntu1.5 | http://gb.archive.ubuntu.com/ubuntu/ precise-updates/universe amd64 Packages gb.archive.ubuntu.com/ubuntu/ precise/universe amd64 Packages gb.archive.ubuntu.com/ubuntu/ precise/universe Sources gb.archive.ubuntu.com/ubuntu/ precise-updates/universe Sources
openvswitch-switch | 1.4.0-1ubuntu1 | http://
openvswitch | 1.4.0-1ubuntu1 | http://
openvswitch | 1.4.0-1ubuntu1.5 | http://
I have 2 compute nodes, 1 network node, and 1 controller.
The VMs can ping each other, I can SSH in from outside, and from inside a VM I can ping google and bbc, but I can't run apt-get update from, for example, a VM with an Ubuntu cloud image, and I am not able to surf the web from a VM instance with Ubuntu desktop.
So I think it is a problem with DNS.
grep dns /var/log/syslog
dhcp[19298]: DHCPACK(tap689b75b7-f5) 50.50.1.3 fa:16:3e:32:fa:5a 50-50-1-3
dhcp[19298]: DHCPREQUEST(tap689b75b7-f5) 50.50.1.4 fa:16:3e:cb:f7:cf
dhcp[19298]: DHCPACK(tap689b75b7-f5) 50.50.1.4 fa:16:3e:cb:f7:cf 50-50-1-4
dhcp[19298]: DHCPREQUEST(tap689b75b7-f5) 50.50.1.3 fa:16:3e:32:fa:5a
dhcp[19298]: DHCPACK(tap689b75b7-f5) 50.50.1.3 fa:16:3e:32:fa:5a 50-50-1-3
[the same DHCPREQUEST/DHCPACK pairs for 50.50.1.3 and 50.50.1.4 repeat]
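The lease churn in that log can be quantified with a quick parse: clients that keep re-requesting the same lease are consistent with the connectivity flapping described in this bug, rather than a DNS problem. This is a hypothetical helper, not part of any OpenStack tooling; the sample lines mirror the report's log.

```python
import re
from collections import Counter

LOG = """\
dhcp[19298]: DHCPREQUEST(tap689b75b7-f5) 50.50.1.3 fa:16:3e:32:fa:5a
dhcp[19298]: DHCPACK(tap689b75b7-f5) 50.50.1.3 fa:16:3e:32:fa:5a 50-50-1-3
dhcp[19298]: DHCPREQUEST(tap689b75b7-f5) 50.50.1.4 fa:16:3e:cb:f7:cf
dhcp[19298]: DHCPREQUEST(tap689b75b7-f5) 50.50.1.3 fa:16:3e:32:fa:5a
"""

# Count DHCPREQUEST lines per client IP; frequent re-requests suggest
# lease churn from flapping connectivity rather than a DNS failure.
requests = Counter(
    m.group(1)
    for m in re.finditer(r"DHCPREQUEST\(\S+\)\s+(\d+\.\d+\.\d+\.\d+)", LOG))
print(requests.most_common())
```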
loki is my network node. From the network node:
Sep 24 14:32:03 loki dnsmasq-
[...]
Sep 24 14:37:29 loki dnsmasq-dhcp[192...