Change to linuxbridge agent defaults affects upgrades

Bug #1563448 reported by James Denton
This bug affects 1 person
Affects            Status   Importance  Assigned to  Milestone
OpenStack-Ansible  Invalid  Undecided   Unassigned
Liberty            Triaged  Medium      Unassigned

Bug Description

Upgrading from r11.1.3 to 12.0.8 using the documentation found at:

http://docs.openstack.org/developer/openstack-ansible/liberty/upgrade-guide/manual-upgrade.html

Changes to the LinuxBridge agent defaults negatively impact upgrades if the config options are not called out in the user_variables file.

In this case, the following was in place prior to the upgrade:

[vxlan]
l2_population = True

After the upgrade, the following was in place:

[vxlan]
l2_population = False

This caused any new virtual machine on a vxlan network to be inaccessible after the upgrade. The jinja template appears to make that setting configurable (Yay!), but the default is set to False here without regard to the original setting (boo!):

https://github.com/openstack/openstack-ansible/blob/12.0.8/playbooks/roles/os_neutron/defaults/main.yml#L260

Updating the /etc/openstack_deploy/user_variables.yml file, rerunning the os-neutron-install playbook, and restarting the LinuxBridge agent appears to have resolved the issue.
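
For anyone wanting to catch this before and after an upgrade, here is a minimal sketch that reads the [vxlan] l2_population value out of the agent config text. It only uses Python's standard configparser; the function name l2_population_enabled is made up for illustration, and the two sample strings are the before/after snippets from this report, not a full agent config file.

```python
import configparser


def l2_population_enabled(ini_text):
    """Return the [vxlan] l2_population value as a bool, or None if it is not set.

    Hypothetical helper: takes the config file contents as a string so it can be
    run against a saved copy from before the upgrade and the file after it.
    """
    # Interpolation is disabled so '%' characters elsewhere in a real config
    # file are not treated as configparser substitutions.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_option("vxlan", "l2_population"):
        return None
    return parser.getboolean("vxlan", "l2_population")


# The two states described in this report.
before_upgrade = "[vxlan]\nl2_population = True\n"
after_upgrade = "[vxlan]\nl2_population = False\n"

if l2_population_enabled(before_upgrade) != l2_population_enabled(after_upgrade):
    print("l2_population changed across the upgrade")
```

If the two values differ and no override is present in user_variables.yml, the upgrade has silently changed the setting.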

Changed in openstack-ansible:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → admin0 (shashi-eu)
milestone: none → 13.0.1
milestone: 13.0.1 → 12.0.10
Jesse Pretorius (jesse-pretorius) wrote :

Please see the entire history of review comments and changes in https://review.openstack.org/252100 - enabling l2pop is specifically advised against by the Neutron team.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on openstack-ansible (liberty)

Change abandoned by Sashi Dahal (<email address hidden>) on branch: liberty
Review: https://review.openstack.org/301759
Reason: l2_pop is not set to be enabled by default .. abandoning this approach

Nolan Brubaker (nolan-brubaker) wrote :

As Jesse said, this is advised against, but that doesn't help the problem where VMs were inaccessible. It seems like neutron doesn't upgrade gracefully when changing the default.

James Denton (james-denton) wrote :

So it looks like restarting the LinuxBridge agent isn't enough to initiate the change. The vxlan interfaces persist across a restart of the service, so they must be deleted first; a restart of the agent will then rebuild them:

Before and after restarting the agent:

293: vxlan-48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master brq41984be8-05 state UNKNOWN mode DEFAULT group default
    link/ether 22:2f:74:e5:51:c3 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 48 dev br-vxlan port 32768 61000 proxy ageing 300

After deleting/restarting:

955: vxlan-48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master brq41984be8-05 state UNKNOWN mode DEFAULT group default
    link/ether 52:d9:60:02:f0:fa brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 48 group 239.1.1.1 dev br-vxlan port 32768 61000 ageing 300

This will, of course, cause downtime for connected instances.
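
To spot interfaces still in the old state, the vxlan detail line from the output above can be classified. This is a rough sketch based only on the two outputs in this comment: it assumes a "group" token means the interface was built for multicast and a "proxy" token without "group" means it was built while l2pop was enabled. The function name vxlan_mode is hypothetical, and other iproute2 versions may print the line differently.

```python
def vxlan_mode(detail_line):
    """Classify one 'vxlan id ...' detail line from the link output.

    Returns "multicast", "l2pop", "unknown", or None if this is not a vxlan line.
    Heuristic only: derived from the two sample outputs in this bug report.
    """
    tokens = detail_line.split()
    if "vxlan" not in tokens:
        return None
    if "group" in tokens:
        # A multicast group is configured on the interface.
        return "multicast"
    if "proxy" in tokens:
        # ARP proxy set and no multicast group: looks like an l2pop-era interface.
        return "l2pop"
    return "unknown"


# Detail lines copied from the outputs above.
stale = "    vxlan id 48 dev br-vxlan port 32768 61000 proxy ageing 300"
rebuilt = "    vxlan id 48 group 239.1.1.1 dev br-vxlan port 32768 61000 ageing 300"

for line in (stale, rebuilt):
    print(vxlan_mode(line))
```

Any interface that still classifies as "l2pop" after the config change has not been rebuilt yet.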

Nolan Brubaker (nolan-brubaker) wrote :

James - that's still with re-enabling L2pop, correct?

Would we be able to continue using vxlan networks without l2pop?

James Denton (james-denton) wrote :

Hi Nolan,

After disabling l2pop through the upgrade, it should be possible to use VXLAN with multicast. In practice, however, it appears that the vxlan interfaces must be torn down and regenerated via an agent restart to a) clear the existing FDB table and add the multicast flooding entry and b) set up the VTEP with multicast and no ARP proxy. As it stands now, the interfaces remain configured as if l2pop were still enabled, and new VMs are inaccessible.

Nolan Brubaker (nolan-brubaker) wrote :

Ah, ok - so could this be addressed by documentation? I'm not sure there's an automated way to address this if we don't want to re-enable l2pop.

Jesse Pretorius (jesse-pretorius) wrote :

Ideally we should have an entry in the documentation for Liberty that describes the difference between the mechanism that was the default in Kilo and the default in Liberty, and why the default was switched. There should be a release note pointing to this documentation. Perhaps the doc entry should be in the upgrade guide? I would think that a deployer may choose to stick with l2pop at their own risk, and the documentation should describe how that can be done.

Changed in openstack-ansible:
milestone: 12.0.10 → 12.1.0
status: Triaged → Invalid
milestone: 12.1.0 → none
importance: Medium → Undecided
assignee: admin0 (shashi-eu) → nobody
Matthew Thode (prometheanfire) wrote :

Not sure if this addresses the issue and is fixed via a SHA update, but here, have a link.

https://bugs.launchpad.net/neutron/+bug/1445089

Jesse Pretorius (jesse-pretorius) wrote :

Milestone will be re-added if/when a patch is submitted.
