L3 plugin exits / dies when external bridge isn't up

Bug #1052522 reported by Endre Karlson
This bug affects 1 person
Affects            Status         Importance   Assigned to     Milestone
neutron            Fix Released   Critical     dan wendlandt
quantum (Ubuntu)   Fix Released   Undecided    Unassigned

Bug Description

After a reboot, the L3 agent dies if it starts before OVS and the MQ:

2012-09-18 16:26:02 DEBUG [quantumclient.client] RESP BODY:[Errno 111] ECONNREFUSED

2012-09-18 16:26:02 DEBUG [quantumclient.v2_0.client] Error message: [Errno 111] ECONNREFUSED
2012-09-18 16:26:02 ERROR [quantum.agent.l3_agent] Error running l3_nat daemon_loop
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 175, in daemon_loop
    self.do_single_loop()
  File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 228, in do_single_loop
    self.process_router(ri)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 254, in process_router
    device_owner=l3_db.DEVICE_OWNER_ROUTER_INTF)['ports']
  File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 102, in with_params
    ret = self.function(instance, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 208, in list_ports
    return self.get(self.ports_path, params=_params)
  File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 525, in get
    headers=headers, params=params)
  File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 510, in retry_request
    headers=headers, params=params)
  File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 455, in do_request
    self._handle_fault_response(status_code, replybody)
  File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 436, in _handle_fault_response
    exception_handler_v20(status_code, des_error_body)
  File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 82, in exception_handler_v20
    message=message)
QuantumClientException: [Errno 111] ECONNREFUSED
2012-09-18 16:26:58 DEBUG [quantum.agent.linux.utils] Running command: ip -o link show br-ex
2012-09-18 16:26:58 DEBUG [quantum.agent.linux.utils]
Command: ['ip', '-o', 'link', 'show', 'br-ex']
Exit code: 1
Stdout: ''
Stderr: 'Device "br-ex" does not exist.\n'
2012-09-18 16:30:33 DEBUG [quantum.agent.linux.utils] Running command: ip -o link show br-ex
2012-09-18 16:30:33 DEBUG [quantum.agent.linux.utils]
Command: ['ip', '-o', 'link', 'show', 'br-ex']
Exit code: 1
Stdout: ''
Stderr: 'Device "br-ex" does not exist.\n'

Emilien Macchi (emilienm) wrote :

I think it's because of the Ubuntu packaging:

OVS starts after the L3 agent, so the L3 agent fails to start because it needs br-ex to exist.

OVS should start before all quantum services.

Adam Gandelman (gandelman-a) wrote :

  I've been unable to reproduce this using a simple test:

# apt-get install openvswitch-switch quantum-l3-agent
# service quantum-l3-agent status
quantum-l3-agent stop/waiting # <- this is expected, currently.
# ovs-vsctl add-br br-int
# ovs-vsctl add-br br-ex
# ip addr
3: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 0a:ce:78:5e:20:44 brd ff:ff:ff:ff:ff:ff
5: br-ex: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether c2:b8:49:11:9b:4f brd ff:ff:ff:ff:ff:ff
# service quantum-l3-agent restart
# service quantum-l3-agent status
quantum-l3-agent start/running, process 9138
# reboot
# service quantum-l3-agent status
quantum-l3-agent start/running, process 822

Gary Kotton (garyk) wrote :

From the trace it looks like there are a number of problems:
1. keystone authentication has failed - this means that the keystone credentials are not correctly set
2. the br-ex is not configured. This can be set as follows: sudo ovs-vsctl add-br br-ex
Thanks
Gary

dan wendlandt (danwent)
Changed in quantum:
status: New → Confirmed
importance: Undecided → Medium
dan wendlandt (danwent)
Changed in quantum:
importance: Medium → Critical
dan wendlandt (danwent) wrote :

Yeah, the traceback is unrelated. The bug is valid though, as even if the bridge is configured, if the l3-agent starts before OVS, the l3-agent will exit permanently, which is obviously bad.

I do think there's a decent argument to be made that the lack of an external bridge should not kill the entire service, so much as just prevent it from configuring gateways. I was trying to be very loud and upfront about the error, which makes sense if the process is started manually, but not if it's a service that starts automatically, and especially not if it could start before OVS. There's also the point that if the l3-agent were only implementing routers with no gateway service, technically you wouldn't need an external gateway at all.

That said, in several reboots, on Ubuntu at least, I saw this issue 1 out of 4 times, which is often enough that we need to treat it as a serious issue.

The question is whether we should fix this in the code, or put the requirement on packagers. I lean toward fixing it in the code: check for the bridge inside the router loop and, if it is missing, print an error message and skip router configuration.
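
The approach described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the merged fix: the function names `device_exists`, `do_single_loop` and `daemon_loop`, their parameters, and the polling interval are all hypothetical, and it is written for Python 3 rather than the Python 2.7 shown in the traceback. The only detail taken from the report is the `ip -o link show <bridge>` command, whose non-zero exit code signals a missing device.

```python
import logging
import subprocess
import time

LOG = logging.getLogger(__name__)


def device_exists(name):
    """Return True if `ip -o link show <name>` exits with status 0."""
    try:
        result = subprocess.run(
            ["ip", "-o", "link", "show", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The `ip` binary could not be executed; treat the device as absent.
        return False
    return result.returncode == 0


def do_single_loop(routers, bridge, configure_router, exists=device_exists):
    """Run one pass over the routers; skip the pass if the bridge is missing.

    Returns the number of routers configured in this pass.
    """
    if not exists(bridge):
        # Log and skip instead of raising, so the daemon keeps running.
        LOG.error("External bridge %s does not exist; "
                  "skipping router configuration", bridge)
        return 0
    for router in routers:
        configure_router(router)
    return len(routers)


def daemon_loop(get_routers, bridge, configure_router, interval=3.0,
                max_passes=None, exists=device_exists, sleep=time.sleep):
    """Repeat do_single_loop forever, or max_passes times if given."""
    passes = 0
    while max_passes is None or passes < max_passes:
        try:
            do_single_loop(get_routers(), bridge, configure_router, exists)
        except Exception:
            # A failure in one pass (e.g. API unreachable) must not end the loop.
            LOG.exception("Error in router loop")
        passes += 1
        sleep(interval)
    return passes
```

With the check inside the loop rather than in the constructor, an agent that starts before OVS would log an error on each pass and begin configuring routers once br-ex appears, instead of exiting permanently.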

OpenStack Infra (hudson-openstack) wrote : Fix proposed to quantum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/13443

Changed in quantum:
assignee: nobody → dan wendlandt (danwent)
status: Confirmed → In Progress
dan wendlandt (danwent)
Changed in quantum:
milestone: none → folsom-rc2
OpenStack Infra (hudson-openstack) wrote : Fix merged to quantum (master)

Reviewed: https://review.openstack.org/13443
Committed: http://github.com/openstack/quantum/commit/da1cf7c27014182033b86a156e829d096459d9f0
Submitter: Jenkins
Branch: master

commit da1cf7c27014182033b86a156e829d096459d9f0
Author: Dan Wendlandt <email address hidden>
Date: Thu Sep 20 23:52:52 2012 -0700

    l3-agent: move check if ext-net bridge exists within daemon loop

    bug 1052522

    the l3 agent checked if the external network bridge exists in its
    constructor, raising an uncaught exception if it did not. this does not
    make much sense when running the l3-agent as a deamon, especially since
    it can be the case that the l3-agent starts before open vswitch.

    Change-Id: Ie1717b2c02c9f0bc0caf34a6fdb0dc3a930123c0

Changed in quantum:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to quantum (milestone-proposed)

Fix proposed to branch: milestone-proposed
Review: https://review.openstack.org/13457

OpenStack Infra (hudson-openstack) wrote : Fix merged to quantum (milestone-proposed)

Reviewed: https://review.openstack.org/13457
Committed: http://github.com/openstack/quantum/commit/1aa2ac5d791cb1d0738f7a61ea6fd50721bec8a8
Submitter: Jenkins
Branch: milestone-proposed

commit 1aa2ac5d791cb1d0738f7a61ea6fd50721bec8a8
Author: Dan Wendlandt <email address hidden>
Date: Thu Sep 20 23:52:52 2012 -0700

    l3-agent: move check if ext-net bridge exists within daemon loop

    bug 1052522

    the l3 agent checked if the external network bridge exists in its
    constructor, raising an uncaught exception if it did not. this does not
    make much sense when running the l3-agent as a deamon, especially since
    it can be the case that the l3-agent starts before open vswitch.

    Change-Id: Ie1717b2c02c9f0bc0caf34a6fdb0dc3a930123c0

Changed in quantum:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in quantum:
milestone: folsom-rc2 → 2012.2
Chuck Short (zulcss) wrote :

This is fixed in Quantal/Precise.

Changed in quantum (Ubuntu):
status: New → Fix Released