Change to linuxbridge agent defaults affects upgrades

Bug #1563448 reported by James Denton on 2016-03-29
This bug affects 1 person

Bug Description

Upgrading from r11.1.3 to 12.0.8 using the documentation found at:

Changes to the LinuxBridge agent defaults negatively impact upgrades if the config options are not called out in the user_variables file.

In this case, the following was in place prior to the upgrade:

l2_population = True

After the upgrade, the following was in place:

l2_population = False

As a result, any new virtual machine on a vxlan network was inaccessible after the upgrade. The jinja template appears to make that setting configurable (Yay!) but the default is set to False here without regard to the original setting (boo!):

Updating the /etc/openstack_deploy/user_variables.yml file, rerunning the os-neutron-install playbook, and restarting the LinuxBridge agent appears to have resolved the issue.
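A quick way to catch this kind of silent default change is to compare the rendered agent config from before and after the upgrade. The following is only a minimal sketch: it assumes the option lives in a `[vxlan]` section of the rendered INI file, and the helper names `l2pop_setting` and `l2pop_changed` are hypothetical, not part of openstack-ansible or Neutron.

```python
import configparser


def l2pop_setting(ini_text, section="vxlan", option="l2_population"):
    """Return l2_population as a bool, or None if the option is not set.

    The section name is an assumption about where the option is rendered.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_option(section, option):
        return None
    return parser.getboolean(section, option)


def l2pop_changed(before_text, after_text):
    """True if the effective l2_population value differs between configs."""
    return l2pop_setting(before_text) != l2pop_setting(after_text)


# Illustrative config snippets matching the values quoted in this report.
before = "[vxlan]\nl2_population = True\n"
after = "[vxlan]\nl2_population = False\n"

if l2pop_changed(before, after):
    print("l2_population changed during the upgrade; review user_variables.yml")
```

Running a check like this against saved copies of the config would flag the flip from True to False before any new instances are booted.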

Changed in openstack-ansible:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → admin0 (shashi-eu)
milestone: none → 13.0.1
milestone: 13.0.1 → 12.0.10

Please see the entire history of review comments and changes in - enabling l2pop is specifically advised against by the Neutron team.

Change abandoned by Sashi Dahal (<email address hidden>) on branch: liberty
Reason: l2_pop is not set to be enabled by default .. abandoning this approach

Nolan Brubaker (nolan-brubaker) wrote :

As Jesse said, this is advised against, but that doesn't help the problem where VMs were inaccessible. It seems like neutron doesn't upgrade gracefully when changing the default.

James Denton (james-denton) wrote :

So it looks like restarting the LinuxBridge agent isn't enough to initiate the change. The vxlan interfaces persist across a restart of the service, so they must be deleted first; a restart of the agent will then rebuild them:

Before and after restarting the agent:

293: vxlan-48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master brq41984be8-05 state UNKNOWN mode DEFAULT group default
    link/ether 22:2f:74:e5:51:c3 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 48 dev br-vxlan port 32768 61000 proxy ageing 300

After deleting/restarting:

955: vxlan-48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master brq41984be8-05 state UNKNOWN mode DEFAULT group default
    link/ether 52:d9:60:02:f0:fa brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 48 group dev br-vxlan port 32768 61000 ageing 300

This will, of course, cause downtime for connected instances.
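To find which interfaces still need to be deleted, the detail output above can be scanned for the `proxy` flag. This is a rough sketch based only on the two outputs shown in this comment: it assumes `proxy` on the `vxlan id` line means the interface was built with l2pop, and `group` means it was rebuilt for multicast. The function names are hypothetical.

```python
import re


def vxlan_mode(detail_line):
    """Classify one 'vxlan id ...' detail line (heuristic, see assumptions)."""
    tokens = detail_line.split()
    if "proxy" in tokens:
        return "l2pop"      # ARP proxy still set, as before deletion
    if "group" in tokens:
        return "multicast"  # multicast group configured, as after deletion
    return "unknown"


def stale_interfaces(ip_output):
    """Return names of vxlan interfaces that still carry the proxy flag."""
    stale = []
    name = None
    for line in ip_output.splitlines():
        # Header lines look like "293: vxlan-48: <BROADCAST,...> ..."
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            name = header.group(1)
        elif name and line.strip().startswith("vxlan id"):
            if vxlan_mode(line) == "l2pop":
                stale.append(name)
    return stale


# Detail lines copied from the outputs in this comment (header shortened).
BEFORE = """293: vxlan-48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450
    link/ether 22:2f:74:e5:51:c3 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 48 dev br-vxlan port 32768 61000 proxy ageing 300
"""
AFTER = """955: vxlan-48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450
    link/ether 52:d9:60:02:f0:fa brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 48 group dev br-vxlan port 32768 61000 ageing 300
"""

print(stale_interfaces(BEFORE))
print(stale_interfaces(AFTER))
```

The sketch only reports candidates; deleting the interfaces and restarting the agent remains a manual step because of the downtime involved.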

Nolan Brubaker (nolan-brubaker) wrote :

James - that's still with re-enabling L2pop, correct?

Would we be able to continue using vxlan networks without l2pop?

James Denton (james-denton) wrote :

Hi Nolan,

After disabling l2pop through the upgrade, it should be possible to use VXLAN with multicast. In practice, however, it appears that the vxlan interfaces must be torn down and regenerated via agent restart to a) clear the existing FDB table and add the multicast flooding entry and b) set up the VTEP with multicast and no ARP proxy. As it stands now, the interfaces remain configured as if l2pop were still enabled, and new VMs are inaccessible.

Nolan Brubaker (nolan-brubaker) wrote :

Ah, ok - so could this be addressed by documentation? I'm not sure there's an automated way to address this if we don't want to re-enable l2pop.

Ideally we should have an entry in the documentation for Liberty that describes the difference between using the mechanism that was the default in Kilo and the default in Liberty, and why the default was switched. There should be a release note pointing to this documentation. Perhaps the doc entry should be in the upgrade guide? I would think that a deployer may choose to stick with l2pop at their own risk, and the documentation should describe how that can be done.

Changed in openstack-ansible:
milestone: 12.0.10 → 12.1.0
status: Triaged → Invalid
milestone: 12.1.0 → none
importance: Medium → Undecided
assignee: admin0 (shashi-eu) → nobody

Matthew Thode (prometheanfire) wrote :

Not sure if this addresses the issue and is fixed via a SHA update, but here, have a link.

Milestone will be re-added if/when a patch is submitted.
