L2pop raises exception when deleting an unbound port

Bug #1533013 reported by Assaf Muller on 2016-01-12
This bug affects 3 people
Affects                        Importance  Assigned to
Networking ML2 Generic Switch  Undecided   Unassigned
neutron                        Low         Assaf Muller

Bug Description

Some brilliant individual introduced a regression during a refactor (https://review.openstack.org/#/c/263471/) that causes an exception to be raised when an unbound port is deleted. For example:

neutron port-create --name=port some_network
neutron port-delete port
Deleted port: port

In the neutron-server log we can see:
http://paste.openstack.org/show/483517/

Apart from the scary TRACE, there are no real implications. The l2pop mech driver should have returned early and done nothing in this case; instead, it spams the log with bogus information.
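The early return described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not neutron's actual code: the port is a plain dict, `get_agent_fdb`, `delete_port_postcommit` and the `notify` callback are hypothetical simplified names, and a port is assumed to be unbound when no agent host is passed in.

```python
from typing import Callable, Optional


def get_agent_fdb(port: dict, agent_host: Optional[str]) -> Optional[dict]:
    # Hypothetical, simplified FDB builder (not neutron's _get_agent_fdb).
    if not agent_host:
        # Unbound port: no agent holds forwarding entries for it,
        # so return early instead of touching agent state.
        return None
    entries = [(port["mac_address"], ip["ip_address"])
               for ip in port["fixed_ips"]]
    return {port["network_id"]: {agent_host: entries}}


def delete_port_postcommit(port: dict, agent_host: Optional[str],
                           notify: Callable[[str, dict], None]) -> None:
    fdb_entries = get_agent_fdb(port, agent_host)
    # Only notify agents when there is actually something to remove.
    if fdb_entries:
        notify("remove_fdb_entries", fdb_entries)
```

With this guard, deleting an unbound port (`agent_host=None`) sends nothing and logs nothing, while deleting a bound port produces exactly one notification.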

Similarly, when the IP address of an unbound port is updated, a 'chg_ip' RPC message is fanned out even though there is no need to do so.
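A rough sketch of suppressing the needless 'chg_ip' fanout: compute the IP delta only for a bound port, and only build a message when the addresses actually changed. The function names and the message layout here are hypothetical and only loosely modeled on l2pop; in particular, keying the inner dict by agent host is an assumption made for illustration.

```python
from typing import Optional


def fixed_ip_set(port: dict) -> set:
    # Collect the port's IP addresses from its fixed_ips list.
    return {ip["ip_address"] for ip in port.get("fixed_ips", [])}


def ip_change_message(orig_port: dict, new_port: dict,
                      agent_host: Optional[str]) -> Optional[dict]:
    # Unbound port: no agent has entries for it, so no 'chg_ip' fanout.
    if not agent_host:
        return None
    old_ips = fixed_ip_set(orig_port)
    new_ips = fixed_ip_set(new_port)
    # Nothing changed: nothing to send either.
    if old_ips == new_ips:
        return None
    return {
        "chg_ip": {
            new_port["network_id"]: {
                agent_host: {
                    "before": sorted(old_ips - new_ips),  # removed IPs
                    "after": sorted(new_ips - old_ips),   # added IPs
                }
            }
        }
    }
```

The caller would fan out an RPC message only when this returns something other than `None`.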

Assaf Muller (amuller) on 2016-01-12
description: updated

Reviewed: https://review.openstack.org/266114
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2540c84c7624892cd64514a5864731433f3889bb
Submitter: Jenkins
Branch: master

commit 2540c84c7624892cd64514a5864731433f3889bb
Author: Assaf Muller <email address hidden>
Date: Mon Jan 11 21:58:30 2016 -0500

    Fix regression with unbound ports and l2pop

    When l2pop is enabled and an unbound port is deleted l2pop mech
    driver raises an exception as a result of patch:
    https://review.openstack.org/#/c/263471/

    As a result of the same patch, when an unbound port's IP
    address is changed l2pop sends a fanout RPC message needlessly.

    Change-Id: Ia81c03dcdf7aef9528c9c2b9527399251fa6aad7
    Closes-Bug: #1533013

Changed in neutron:
status: In Progress → Fix Released

This issue was fixed in the openstack/neutron 8.0.0.0b2 development milestone.

We are hitting this issue in Newton stable (neutron 9.1.2).

Kevin Benton (kevinbenton) wrote :

Can you please provide a traceback?

Here is the code for update_device_up in Newton, and it has the early return: https://github.com/openstack/neutron/blob/9.1.1/neutron/plugins/ml2/drivers/l2pop/mech_driver.py#L258

Kevin Benton (kevinbenton) wrote :

Sorry, I was looking at the wrong branch. This is still susceptible to tracebacks.

Kevin Benton (kevinbenton) wrote :

I think what you are hitting is slightly different. If an 'agent_host' value is being passed in and makes it through this conditional, the port cannot be unbound:

https://github.com/openstack/neutron/blob/9.1.1/neutron/plugins/ml2/drivers/l2pop/mech_driver.py#L302

Reviewed: https://review.openstack.org/445253
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c7fb24b3cb9cda1cc78e834a0153d219995ce97f
Submitter: Jenkins
Branch: master

commit c7fb24b3cb9cda1cc78e834a0153d219995ce97f
Author: Kevin Benton <email address hidden>
Date: Mon Mar 13 15:06:22 2017 -0700

    Check for None in _get_agent_fdb for agent

    get_agent_by_host can return None in the l2pop
    driver so we need to check for that case before
    we blindly try to decode configuration values on
    the result.

    There are a couple of cases that can lead to this.
    * The deployment can be misconfigured and is missing
      either a tunneling_ip option for the agent on a
      host or is missing an L2 agent with that host_id
      entirely.
    * Multiple mech drivers are in use and a port is being
      deleted from an agentless host.

    Related-Bug: #1533013
    Closes-Bug: #1672564
    Change-Id: I1e79f600172edad1e31e8231a0a6a2c55f46804c
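The None check from this commit can be illustrated with a small, self-contained sketch. Everything here is a hypothetical stand-in, not neutron's code: agents are kept in a dict keyed by host, `Agent` is a minimal record whose `configurations` field is a JSON string, and falling back to `tunnel_types` when `l2pop_network_types` is absent is an assumption made for the example. Without the `agent is None` guard, `supported_network_types` would fail with the same `AttributeError: 'NoneType' object has no attribute 'configurations'` seen in the traceback.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Agent:
    """Hypothetical stand-in for an L2 agent record."""
    host: str
    configurations: str  # JSON-encoded agent configuration


def get_agent_by_host(agents: Dict[str, Agent], host: str) -> Optional[Agent]:
    # Returns None when no L2 agent is registered for this host.
    return agents.get(host)


def supported_network_types(agent: Agent) -> List[str]:
    # Dereferencing agent.configurations is what fails when agent is None.
    config = json.loads(agent.configurations)
    return config.get("l2pop_network_types") or config.get("tunnel_types", [])


def get_agent_fdb(agents: Dict[str, Agent], host: str,
                  network_type: str, mac: str) -> Optional[dict]:
    agent = get_agent_by_host(agents, host)
    if agent is None:
        # Misconfigured deployment, or a port on an agentless host
        # handled by another mechanism driver: nothing to do.
        return None
    if network_type not in supported_network_types(agent):
        return None
    return {host: [mac]}
```

A port on a host with no registered L2 agent (for example, a bare-metal switch port managed by a different mech driver) now yields `None` instead of an exception.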

冯龙飞 (longfei.feng) wrote :

2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers [req-f0bcecc8-caff-44fa-9866-c41bda521983 945856473cb34ca787b2cb20fb6a4f66 d311f5cb5d0c403c97e5654360d14ada - - -] Mechanism driver 'l2population' failed in delete_port_postcommit
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py", line 433, in _call_on_drivers
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context)
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 74, in delete_port_postcommit
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers port, agent_host)
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 312, in _get_agent_fdb
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers if not self._validate_segment(segment, port['id'], agent):
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 190, in _validate_segment
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers network_types = l2pop_db.get_agent_l2pop_network_types(agent)
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/l2pop/db.py", line 52, in get_agent_l2pop_network_types
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers configuration = jsonutils.loads(agent.configurations)
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers AttributeError: 'NoneType' object has no attribute 'configurations'
2017-03-27 11:31:23.510 58098 ERROR neutron.plugins.ml2.managers
2017-03-27 11:31:23.511 58098 DEBUG networking_generic_switch.generic_switch_mech [req-f0bcecc8-caff-44fa-9866-c41bda521983 945856473cb34ca787b2cb20fb6a4f66 d311f5cb5d0c403c97e5654360d14ada - - -] Deleting port GigabitEthernet0/0/8 on sw-hostname from vlan: 2005 delete_port_postcommit /usr/lib/python2.7/site-packages/networking_generic_switch/generic_switch_mech.py:361

Esha Seth (eshaseth) wrote :

I am facing the same issue in Ocata. I tried this fix (https://review.openstack.org/445253) and it solved the issue. <email address hidden> or Assaf Muller, could you cherry-pick and backport this to Ocata as well?

Reviewed: https://review.openstack.org/460867
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f15031f406b61f12e809e15d1b83bb24cdafd494
Submitter: Jenkins
Branch: stable/ocata

commit f15031f406b61f12e809e15d1b83bb24cdafd494
Author: Kevin Benton <email address hidden>
Date: Mon Mar 13 15:06:22 2017 -0700

    Check for None in _get_agent_fdb for agent

    get_agent_by_host can return None in the l2pop
    driver so we need to check for that case before
    we blindly try to decode configuration values on
    the result.

    There are a couple of cases that can lead to this.
    * The deployment can be misconfigured and is missing
      either a tunneling_ip option for the agent on a
      host or is missing an L2 agent with that host_id
      entirely.
    * Multiple mech drivers are in use and a port is being
      deleted from an agentless host.

    Related-Bug: #1533013
    Closes-Bug: #1672564
    Change-Id: I1e79f600172edad1e31e8231a0a6a2c55f46804c
    (cherry picked from commit c7fb24b3cb9cda1cc78e834a0153d219995ce97f)

tags: added: in-stable-ocata