restart of openvswitch-switch causes instance network down when l2population enabled
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Invalid | Medium | James Page |
Icehouse | Fix Released | Undecided | Unassigned |
Juno | Won't Fix | Medium | Unassigned |
Kilo | Fix Released | Medium | James Page |
neutron | Fix Released | Undecided | James Page |
Kilo | New | Undecided | Unassigned |
neutron (Ubuntu) | Fix Released | High | Unassigned |
Trusty | Fix Released | High | James Page |
Wily | Fix Released | High | James Page |
Xenial | Fix Released | High | Unassigned |
Bug Description
[Impact]
Restarts of openvswitch (typically on upgrade) result in loss of tunnel connectivity when the l2population driver is in use. This results in loss of access to all instances on the affected compute hosts.
[Test Case]
Deploy cloud with ml2/ovs/
boot instances
restart ovs; instance connectivity will be lost until the neutron-
[Regression Potential]
Minimal - in multiple stable branches upstream.
[Original Bug Report]
On 2015-05-28, our Landscape auto-upgraded packages on two of our
OpenStack clouds. On both clouds, but only on some compute nodes, the
upgrade of openvswitch-switch and corresponding downtime of
ovs-vswitchd appears to have triggered some sort of race condition
within neutron-
any new instances come up with non-functional network but pre-existing
instances appear unaffected. Restarting n-p-ovs-agent on the affected
compute nodes is sufficient to work around the problem.
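The workaround described above can be scripted roughly as follows. This is a minimal sketch, assuming the agent runs as the `neutron-plugin-openvswitch-agent` service (the usual name on Ubuntu 14.04, abbreviated in this report as n-p-ovs-agent). The names `restart_ovs_agent`, `AGENT_SERVICE`, and `DRY_RUN` are hypothetical and introduced here for illustration only.

```shell
#!/bin/sh
# Sketch of the workaround: restart the Neutron OVS agent on an affected
# compute node.
# Assumption: the service name below matches your release; override
# AGENT_SERVICE if it differs.
AGENT_SERVICE="${AGENT_SERVICE:-neutron-plugin-openvswitch-agent}"

restart_ovs_agent() {
    # With DRY_RUN=1, print the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "service $AGENT_SERVICE restart"
    else
        service "$AGENT_SERVICE" restart
    fi
}

# Usage (run as root on the affected compute node):
#   restart_ovs_agent
```

Setting `DRY_RUN=1` first makes it possible to confirm which service would be restarted before touching a production node.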
The packages Landscape upgraded (from /var/log/
Start-Date: 2015-05-28 14:23:07
Upgrade: nova-compute-
End-Date: 2015-05-28 14:24:47
From /var/log/
2015-05-28 14:24:18.336 47866 ERROR neutron.
Looking at a stuck instance, all the right tunnels, bridges, and
whatnot appear to be there:
root@vector:~# ip l l | grep c-3b
460002: qbr7ed8b59c-3b: <BROADCAST,
460003: qvo7ed8b59c-3b: <BROADCAST,
460004: qvb7ed8b59c-3b: <BROADCAST,
460005: tap7ed8b59c-3b: <BROADCAST,
root@vector:~# ovs-vsctl list-ports br-int | grep c-3b
qvo7ed8b59c-3b
root@vector:~#
But I can't ping the unit from within the qrouter-${id} namespace on
the neutron gateway. If I tcpdump the {q,t}*c-3b interfaces, I don't
see any traffic.
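The manual device check above can be wrapped in a small helper when several ports need inspecting. This is only a sketch: `missing_port_devices` is a hypothetical function name, and it assumes the four per-port device prefixes seen in the transcript (qbr, qvo, qvb, tap). Note that in this report all four devices were present on the stuck instance, so a clean result here does not rule out the bug.

```shell
#!/bin/sh
# Print each per-port device prefix (qbr, qvo, qvb, tap) that is absent
# from `ip link` output supplied on stdin.
# $1 is the trailing fragment of the port ID, e.g. "c-3b".
missing_port_devices() {
    suffix="$1"
    links="$(cat)"
    for prefix in qbr qvo qvb tap; do
        # Device names appear as " <prefix><id-fragment>:" (or with an
        # "@peer" suffix for veth pairs on newer iproute2).
        if ! printf '%s\n' "$links" | grep -q " ${prefix}[0-9a-f-]*${suffix}[:@]"; then
            echo "$prefix"
        fi
    done
}

# Usage: empty output means all four devices exist.
#   ip link list | missing_port_devices c-3b
```

Reading from stdin keeps the helper independent of the host it runs on, so saved `ip link` output from an affected node can be checked later.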
Changed in neutron (Ubuntu): | |
status: | Confirmed → Triaged |
importance: | Undecided → High |
tags: | added: sts |
tags: | added: l2-pop |
Changed in neutron (Ubuntu Xenial): | |
status: | Triaged → Fix Released |
Changed in neutron (Ubuntu Wily): | |
importance: | Undecided → High |
status: | New → In Progress |
assignee: | nobody → James Page (james-page) |
description: | updated |
Changed in neutron (Ubuntu Trusty): | |
status: | New → In Progress |
assignee: | nobody → James Page (james-page) |
importance: | Undecided → High |
Changed in cloud-archive: | |
status: | In Progress → Invalid |
tags: | removed: kilo-backport-potential liberty-backport-potential |
I should have said: both clouds are Ubuntu 14.04 running OpenStack Icehouse. I've put all the relevant logs I could think of or find up at:
https://chinstrap.canonical.com/~james/nx/vector-logs.tar.xz
(It's only accessible by Canonical people, sorry.)