l3ha with L2pop disabled breaks neutron

Bug #1521793 reported by Kevin Carter
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack-Ansible | Fix Released | High | Kevin Carter |
Liberty | Fix Released | High | Kevin Carter |
Trunk | Fix Released | High | Kevin Carter |

Bug Description

When using L3HA, the system will fail to build a VM under most circumstances if L2 population is disabled. To resolve this issue, the variable `neutron_l2_population` should be set to "true" by default. The original plan was to use L3HA by default; however, due to current differences in the neutron Linux bridge agent, that appears to be impossible and will require additional upstream work within neutron. In the near term we should re-enable l2pop by default and effectively disable the built-in L3HA.

This issue was reported in the channel by @Ville Vuorinen (IRC: kysse),
see http://eavesdrop.openstack.org/irclogs/%23openstack-ansible/%23openstack-ansible.2015-12-01.log.html from 18:47 onwards.

Tags: linuxbridge
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (master)

Fix proposed to branch: master
Review: https://review.openstack.org/252100

Changed in openstack-ansible:
status: Triaged → In Progress
Changed in openstack-ansible:
milestone: 12.0.2 → mitaka-1
summary: - Master/Liberty w/ L2pop disabled breaks neutron
+ l3ha with L2pop disabled breaks neutron
description: updated
Changed in neutron:
assignee: nobody → venkata anil (anil-venkata)
Changed in neutron:
assignee: venkata anil (anil-venkata) → nobody
Revision history for this message
Assaf Muller (amuller) wrote :

I read through the IRC logs, and there's no info to go on there. There's a log with a trace from nova compute, but I don't have the trace from the Neutron server log. Turning on l2pop will not solve any port binding issues; I can guarantee that. There's something else going on in that setup, and turning on l2pop somehow masked the issue.

Can you re-run the tests with L3 HA and LB enabled and l2pop turned *off*, and upload logs when you run into issues?

Revision history for this message
Sean M. Collins (scollins) wrote :

Here's a patch that they were using to test:

https://review.openstack.org/252574

We may be able to get logs from those failed runs.

Revision history for this message
Kevin Carter (kevin-carter) wrote :

Here are some of the logs from the failure w/in nova:
http://paste.openstack.org/show/480906/

Here's the trace from the LXB agent:
http://sprunge.us/QcVZ

I can generate or test whatever's needed; just let me know what you need.

Revision history for this message
Assaf Muller (amuller) wrote :

From the LB agent log, it looks like you're hitting:
https://bugs.launchpad.net/neutron/+bug/1470584

Revision history for this message
Kevin Carter (kevin-carter) wrote :

I put this together to test in the gate with l2pop disabled and L3HA enabled: https://review.openstack.org/#/c/253606/

Revision history for this message
Assaf Muller (amuller) wrote :

OK, looking at Kevin's job I noticed that the LB agent is logging the same output as in comment 4. After looking at the code it's clear it's crashing without logging anything. I reported a Neutron bug here: https://bugs.launchpad.net/neutron/+bug/1522966.

Meanwhile, the issue is that with the LB agent, you need to configure either l2pop or VXLAN.vxlan_group. If both are off, the LB agent won't start.

Revision history for this message
Matt Kassawara (ionosphere80) wrote :

Not a neutron bug. Turns out, openstack-ansible configures the "vxlan_group" option with an empty value. Disabling L2 population causes the Linux bridge agent to use multicast. Without a proper value for "vxlan_group", the Linux bridge agent throws a warning and does not operate. This warning should probably become an error that terminates the agent.
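
The rule described in the two comments above (the Linux bridge agent needs either L2 population or a non-empty "vxlan_group") can be sketched as a small check. This is only an illustration, not the agent's actual code: the `[vxlan]` section layout is inferred from the "VXLAN.vxlan_group" wording above, and the function name `vxlan_mode` and the sample snippet are hypothetical.

```python
import configparser

# Hypothetical config excerpt mirroring the failing setup: l2pop off and an
# empty multicast group. Section and option names are assumptions based on
# the "VXLAN.vxlan_group" wording in the comments above.
SAMPLE = """
[vxlan]
vxlan_group =
l2_population = False
"""


def vxlan_mode(ini_text):
    """Classify a config snippet as 'l2pop', 'multicast', or 'broken'."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    l2pop = parser.getboolean("vxlan", "l2_population", fallback=False)
    group = parser.get("vxlan", "vxlan_group", fallback="").strip()
    if l2pop:
        # L2 population is enabled, so no multicast group is required.
        return "l2pop"
    if group:
        # Without l2pop the agent relies on the multicast group.
        return "multicast"
    # Neither mechanism is configured: the combination reported in this bug.
    return "broken"


print(vxlan_mode(SAMPLE))
```

Running the sketch against a rendered agent config before deployment would flag the empty-group case without having to read the agent log.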

tags: added: linuxbridge
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (master)

Reviewed: https://review.openstack.org/252100
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=00609ea56e09f5dd0488062b64a03420d296e614
Submitter: Jenkins
Branch: master

commit 00609ea56e09f5dd0488062b64a03420d296e614
Author: Kevin Carter <email address hidden>
Date: Tue Dec 1 17:35:06 2015 -0600

    Fix neutron issue w/ l2pop

    This change resolves an issue where neutron is not able to bind to
    a given port because l2 population is disabled and no vxlan multicast
    group has been defined. To resolve this the `neutron_l2_population`
    variable is being defined and set to "False" in the os_neutron defaults
    and the vxlan multicast group will now contain a default value instead
    of an empty string.

    The change also removes the neutron_l2_population checks in the tasks
    and templates because the variable is now being defined.

    Change-Id: Ic2973626d88781bfc67a4275afcf9feffeb63f36
    Closes-Bug: #1521793
    Co-Authored-by: Ville Vuorinen <email address hidden>
    Signed-off-by: Kevin Carter <email address hidden>
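
Since the commit above replaces an empty multicast group with a default value, a quick sanity check on whatever value ends up configured may be useful. The sketch below is an assumption-laden illustration: the helper name `is_usable_vxlan_group` is hypothetical, it only handles a single address (not a range), and `239.1.1.1` is just an example multicast address, not necessarily the default chosen by the role.

```python
import ipaddress


def is_usable_vxlan_group(value):
    """Return True if value is a single multicast IP address."""
    try:
        return ipaddress.ip_address(value.strip()).is_multicast
    except ValueError:
        # Empty strings and malformed addresses end up here.
        return False


# Example values: empty (the old behaviour), a unicast address, and an
# illustrative multicast address.
for candidate in ("", "10.0.0.1", "239.1.1.1"):
    print(repr(candidate), is_usable_vxlan_group(candidate))
```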

Changed in openstack-ansible:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (liberty)

Fix proposed to branch: liberty
Review: https://review.openstack.org/255624

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (liberty)

Reviewed: https://review.openstack.org/255624
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=742eaa2667746af6a6e5dface57dc10960af148d
Submitter: Jenkins
Branch: liberty

commit 742eaa2667746af6a6e5dface57dc10960af148d
Author: Kevin Carter <email address hidden>
Date: Tue Dec 1 17:35:06 2015 -0600

    Fix neutron issue w/ l2pop

    This change resolves an issue where neutron is not able to bind to
    a given port because l2 population is disabled and no vxlan multicast
    group has been defined. To resolve this the `neutron_l2_population`
    variable is being defined and set to "False" in the os_neutron defaults
    and the vxlan multicast group will now contain a default value instead
    of an empty string.

    The change also removes the neutron_l2_population checks in the tasks
    and templates because the variable is now being defined.

    Change-Id: Ic2973626d88781bfc67a4275afcf9feffeb63f36
    Closes-Bug: #1521793
    Co-Authored-by: Ville Vuorinen <email address hidden>
    Signed-off-by: Kevin Carter <email address hidden>
    (cherry picked from commit 00609ea56e09f5dd0488062b64a03420d296e614)

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 12.0.8

This issue was fixed in the openstack/openstack-ansible 12.0.8 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 12.0.9

This issue was fixed in the openstack/openstack-ansible 12.0.9 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 13.0.0

This issue was fixed in the openstack/openstack-ansible 13.0.0 release.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

This issue was fixed in the openstack/openstack-ansible 13.0.0 release.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 12.0.11

This issue was fixed in the openstack/openstack-ansible 12.0.11 release.

no longer affects: neutron
Changed in neutron:
status: New → Confirmed
importance: Undecided → Wishlist
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/336128

Changed in neutron:
assignee: nobody → Sean M. Collins (scollins)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/336132

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_neutron (master)

Reviewed: https://review.openstack.org/336132
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-os_neutron/commit/?id=18a9491ba5e0a89529c865390574e217bc21c76e
Submitter: Jenkins
Branch: master

commit 18a9491ba5e0a89529c865390574e217bc21c76e
Author: Sean M. Collins <email address hidden>
Date: Thu Jun 30 13:32:52 2016 -0400

    Clarify the default for neutron_vxlan_group

    It used to be an empty string, which triggered some serious bugs,
    and was fixed to have a default.

    Change-Id: Iea03142cf03e56428184dea36bad7673c9980e9c
    Related-Bug: #1521793

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/336128
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
status: In Progress → Incomplete
assignee: Sean M. Collins (scollins) → nobody
no longer affects: neutron