l3_agent not disabling namespace use

Bug #1060559 reported by Koaps on 2012-10-03
Affects            Importance  Assigned to
neutron            High        Gary Kotton
  Folsom           High        Gary Kotton
quantum (Ubuntu)   Undecided   Unassigned
  Precise          Undecided   Unassigned
  Quantal          Undecided   Unassigned

Bug Description

Seems related to: https://bugs.launchpad.net/quantum/+bug/1042104

The iproute package in CentOS 6.3 doesn't support netns.

From the docs I have:

In quantum.conf

allow_overlapping_ips=False

In both dhcp_agent.ini and l3_agent.ini

use_namespaces=False

The l3_agent seems to be still attempting to use netns.

2012-10-02 18:02:33 DEBUG [quantum.agent.linux.utils] Running command: sudo ip netns list
2012-10-02 18:02:33 DEBUG [quantum.agent.linux.utils]
Command: ['sudo', 'ip', 'netns', 'list']
Exit code: 255
Stdout: ''
Stderr: 'Object "netns" is unknown, try "ip help".\n'
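For anyone reproducing this, the failure above can be probed ahead of time. The following is only a minimal sketch: the helper name netns_supported and its injectable run argument are hypothetical and not part of quantum. It assumes that a non-zero exit status from "ip netns list" (255 in the log above) means the installed iproute lacks namespace support.

```python
import subprocess


def netns_supported(run=None):
    """Return True if `ip netns list` exits with status 0.

    `run` is a hypothetical hook taking the command list and returning
    its exit code; it exists only so the check can be tested without
    touching the host.
    """
    if run is None:
        def run(cmd):
            proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                    stderr=subprocess.PIPE)
            proc.communicate()
            return proc.returncode
    try:
        return run(['ip', 'netns', 'list']) == 0
    except OSError:
        # The `ip` binary itself could not be executed.
        return False
```

Calling netns_supported() with no arguments runs the real command without sudo, whereas the agent in the log runs it through sudo, so results may differ on locked-down hosts.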

Browsing the code:

/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/l3_agent.py

I seem to load the L3NATAgent class, which in its __init__ runs self._destroy_all_router_namespaces(), which in turn runs root_ip.get_namespaces; that is where "sudo ip netns list" is being called.

Nowhere in that chain do I see anything checking whether use_namespaces is set to True.
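To illustrate the kind of check that appears to be missing, here is a minimal sketch. It is not the quantum code and not necessarily how the committed fix works; Conf, L3AgentSketch and failing_get_namespaces are hypothetical stand-ins, and the 'qrouter-' prefix is an assumption about how router namespaces are named.

```python
class Conf(object):
    """Hypothetical stand-in for the agent's parsed configuration."""

    def __init__(self, use_namespaces):
        self.use_namespaces = use_namespaces


class L3AgentSketch(object):
    """Illustrative only; this is not the real L3NATAgent."""

    def __init__(self, conf, get_namespaces):
        self.conf = conf
        self.get_namespaces = get_namespaces
        self.destroyed = []
        # Only touch namespaces when the operator has enabled them.
        if self.conf.use_namespaces:
            self._destroy_all_router_namespaces()

    def _destroy_all_router_namespaces(self):
        # Record (rather than delete) namespaces with the assumed prefix.
        for ns in self.get_namespaces():
            if ns.startswith('qrouter-'):
                self.destroyed.append(ns)


def failing_get_namespaces():
    """Mimic an iproute build without netns support."""
    raise RuntimeError('Object "netns" is unknown, try "ip help".')
```

With Conf(use_namespaces=False), constructing L3AgentSketch never calls get_namespaces, so the RuntimeError from the traceback would not be reached during startup.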

dan wendlandt (danwent) wrote :

how are you invoking quantum-l3-agent? you're not running from packages, right?

Koaps (koaps) wrote :

I have a startup script that I use for all the OpenStack services; it's a basic Red Hat/CentOS init script for chkconfig.

The start command is:

    daemon --user quantum --pidfile $pidfile "$exec --config-dir $config_dir --config-file $config_file --log-file $logfile &>/dev/null & echo \$! > $pidfile"

Where:

suffix=dhcp
prog=openstack-quantum-${suffix}-agent
exec="/usr/bin/quantum-${suffix}-agent"
config_dir="/etc/openstack/quantum"
config_file="/etc/openstack/quantum/${suffix}_agent.ini"
pidfile="/var/run/openstack/quantum-${suffix}-agent.pid"
logfile="/var/log/openstack/quantum-${suffix}-agent.log"

I end up with a python command running like:

/usr/bin/python /usr/bin/quantum-dhcp-agent --config-dir /etc/openstack/quantum --config-file /etc/openstack/quantum/dhcp_agent.ini --log-file /var/log/openstack/quantum-dhcp-agent.log

The only odd thing I see is it runs dnsmasq twice, not sure why:

dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap77a55569-aa --except-interface=lo --domain=openstacklocal --pid-file=/var/lib/openstack/quantum/data/dhcp/8a56b7a4-06ae-404f-acee-9715b8823f7f/pid --dhcp-hostsfile=/var/lib/openstack/quantum/data/dhcp/8a56b7a4-06ae-404f-acee-9715b8823f7f/host --dhcp-optsfile=/var/lib/openstack/quantum/data/dhcp/8a56b7a4-06ae-404f-acee-9715b8823f7f/opts --dhcp-script=/usr/bin/quantum-dhcp-agent-dnsmasq-lease-update --leasefile-ro --dhcp-range=set:tag0,10.0.0.0,static,120s

The only difference I see between the two processes is one is run as root and the other one is run as nobody.

Gary Kotton (garyk) wrote :

Hi,
Can you please add the trace so that we can understand where the call took place?
Thanks
Gary

Koaps (koaps) wrote :

Hi Gary,

Here you go:

sudo -u quantum sudo /usr/bin/quantum-l3-agent --config-dir /etc/openstack/quantum --config-file /etc/openstack/quantum/l3_agent.ini --log-file /var/log/openstack/quantum-l3-agent.log -v -d

Traceback (most recent call last):
  File "/usr/bin/quantum-l3-agent", line 9, in <module>
    load_entry_point('quantum==2013.1', 'console_scripts', 'quantum-l3-agent')()
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/l3_agent.py", line 530, in main
    mgr = L3NATAgent(conf)
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/l3_agent.py", line 129, in __init__
    self._destroy_all_router_namespaces()
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/l3_agent.py", line 136, in _destroy_all_router_namespaces
    for ns in root_ip.get_namespaces(self.conf.root_helper):
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/linux/ip_lib.py", line 124, in get_namespaces
    output = cls._execute('', 'netns', ('list',), root_helper=root_helper)
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/linux/ip_lib.py", line 56, in _execute
    root_helper=root_helper)
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/linux/utils.py", line 60, in execute
    raise RuntimeError(m)
RuntimeError:
Command: ['sudo', 'ip', 'netns', 'list']
Exit code: 255
Stdout: ''
Stderr: 'Object "netns" is unknown, try "ip help".\n'

Gary Kotton (garyk) wrote :

Thanks. I'll take care of a fix soon.

Changed in quantum:
status: New → Confirmed
assignee: nobody → Gary Kotton (garyk)
milestone: none → grizzly-1
importance: Undecided → High

Fix proposed to branch: master
Review: https://review.openstack.org/14079

Changed in quantum:
status: Confirmed → In Progress
Gary Kotton (garyk) on 2012-10-05
tags: added: folsom-backport-potential

Koaps (koaps) wrote :

Hi Gary,

I was able to get l3_agent running by commenting out the self._destroy_all_router_namespaces(), of course your way is the proper way :)

I'm still having an issue trying to get NAT working.

I tried to follow the test case and workflow:

https://fedoraproject.org/wiki/QA:Testcase_Quantum_V2
http://docs.openstack.org/trunk/openstack-network/admin/content/l3_workflow.html

But I can't get NAT to work and the VM can't leave its private network, though it can ping the public interface of the controller/gateway node, so I'm pretty sure the GRE tunnel is working right.

I can file a new bug on this if that is a better way, but any help would be great because I'm stuck and this is the last mile for my stack.

Thanks

Reviewed: https://review.openstack.org/14079
Committed: http://github.com/openstack/quantum/commit/8eb7ca51c0bfda334eda8a25d599aa1d9cd21c22
Submitter: Jenkins
Branch: master

commit 8eb7ca51c0bfda334eda8a25d599aa1d9cd21c22
Author: Gary Kotton <email address hidden>
Date: Fri Oct 5 06:07:13 2012 -0400

    Treat invalid namespace call

    Fixes bug 1060559

    Change-Id: I29250100416b87f55781fb7e97339f6d3761513f

Changed in quantum:
status: In Progress → Fix Committed

On 10/05/2012 08:11 PM, Koaps wrote:
> Hi Gary,
>
> I was able to get l3_agent running by commenting out the
> self._destroy_all_router_namespaces(), of course your way is the proper
> way :)
>
> I'm still having an issue trying to get NAT working.
>
> I tried to follow the test case and workflow:
>
> https://fedoraproject.org/wiki/QA:Testcase_Quantum_V2
> http://docs.openstack.org/trunk/openstack-network/admin/content/l3_workflow.html
>
> But I can't get NAT to work and the VM can't leave its private network,
> though it can ping the public interface of the controller/gateway node,
> so I'm pretty sure the GRE tunnel is working right.
>
> I can file a new bug on this if that is a better way, but any help would
> be great because I'm stuck and this is the last mile for my stack.
>
> Thanks
>
Hi,
Can you please provide some additional information about your
setup? From the bug I assume that you are working with namespaces
disabled, and I recall that you are using Open vSwitch. When I wrote the
above test cases, it was done with namespaces enabled.
I have a few questions:
1. Can you please print out the ifconfig?
2. Can you please send the output of ovs-vsctl show?
3. When you assign a floating IP are you able to ping the floating IP
(after it has been assigned to a VM)?
Thanks
Gary

Koaps (koaps) wrote :

Hi Gary,

Here's the network info:

br-ex Link encap:Ethernet
          inet addr:10.2.1.201 Bcast:10.2.1.207 Mask:255.255.255.248
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

br-int Link encap:Ethernet
          inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

br-omg Link encap:Ethernet
          inet addr:10.0.1.1 Bcast:10.0.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

eth0 Link encap:Ethernet
          inet addr:10.2.1.175 Bcast:10.2.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

eth1 Link encap:Ethernet
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

eth2 Link encap:Ethernet
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

eth3 Link encap:Ethernet
          UP BROADCAST PROMISC MULTICAST MTU:1500 Metric:1

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:16436 Metric:1

    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-omg
        Port br-omg
            Interface br-omg
                type: internal
        Port "eth2"
            Interface "eth2"
    Bridge br-int
        Port "eth1"
            Interface "eth1"
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "1.7.1"

No I can't ping the floating ip assigned to the VM.
The VM can ping the public and private IPs assigned to the controller node.

I don't really see anything in IPtables doing forwarding.

iptables -L -n -v

Chain INPUT (policy ACCEPT 10M packets, 2399M bytes)
 pkts bytes target prot opt in out source destination
4183K 977M nova-api-INPUT all -- * * 0.0.0.0/0 0.0.0.0/0

Chain FORWARD (policy ACCEPT 13 packets, 1092 bytes)
 pkts bytes target prot opt in out source destination
   11 924 nova-filter-top all -- * * 0.0.0.0/0 0.0.0.0/0
   11 924 nova-api-FORWARD all -- * * 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT 10M packets, 2420M bytes)
 pkts bytes target prot opt in out source destination
8482K 2035M nova-filter-top all -- * * 0.0.0.0/0 0.0.0.0/0
4094K 990M nova-api-OUTPUT all -- * * 0.0.0.0/0 0.0.0.0/0

Chain nova-api-FORWARD (1 references)
 pkts bytes target prot opt in out source destination

Chain nova-api-INPUT (1 references)
 pkts bytes target prot op...


Koaps (koaps) wrote :

I just noticed that br-ex was missing its port.

It should be:

    Bridge br-ex
        Port "eth3"
            Interface "eth3"
        Port br-ex
            Interface br-ex
                type: internal

Still didn't change anything, but I did fix that.

Gary Kotton (garyk) wrote :

On 10/07/2012 12:08 PM, Koaps wrote:
> I just noticed that br-ex was missing its port.
>
> It should be:
>
> Bridge br-ex
> Port "eth3"
> Interface "eth3"
> Port br-ex
> Interface br-ex
> type: internal
>
> Still didn't change anything, but I did fix that.
>
Hi,
I am looking into this at the moment. Please note that when namespaces
are disabled then you need to pass the router_id to the layer 3 agent.
Can you please ensure that this is correctly configured in the
l3_agent.ini file.
Thanks
Gary
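As a rough illustration of the requirement above, the sketch below checks an l3_agent.ini for the combination Gary describes. It is a hypothetical helper, not part of quantum. It assumes that both options live in the [DEFAULT] section and that use_namespaces defaults to True when absent; the all-zero UUID in the usage note is a placeholder, not a real router ID.

```python
import configparser


def check_l3_agent_config(text):
    """Return a list of problems found in the given l3_agent.ini text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    defaults = parser['DEFAULT']
    problems = []
    # Assumption: namespaces are enabled unless explicitly turned off.
    use_namespaces = defaults.getboolean('use_namespaces', fallback=True)
    router_id = defaults.get('router_id', '').strip()
    if not use_namespaces and not router_id:
        problems.append('use_namespaces is False but router_id is not set')
    return problems


SAMPLE_MISSING_ROUTER_ID = """
[DEFAULT]
use_namespaces = False
router_id =
"""
```

check_l3_agent_config(SAMPLE_MISSING_ROUTER_ID) reports one problem; setting router_id to the UUID of the router (for example the placeholder 00000000-0000-0000-0000-000000000000) clears it.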

Koaps (koaps) wrote :

Hi Gary,

That was probably a key thing, so now I'm in a slightly different situation, but I think I need to sort out the configs to get it right.

Once I added the router id to the ini and restarted the l3_agent, it created the qg- and qr- interfaces and I can now ping the floating IP from the controller node.

Unfortunately it also changed my routing table, adding a new gateway (10.2.1.201) which knocked my controller off the public network.

Luckily I can still access it via the internal bridges from the compute node; I also have IPMI as a worst case.

qg-acda11d9-dd Link encap:Ethernet
          inet addr:10.2.1.202 Bcast:10.2.1.207 Mask:255.255.255.248
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

qr-792fef06-66 Link encap:Ethernet
          inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

I have IPs on br-ex and br-int; should I remove those?

br-int Link encap:Ethernet
          inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0

br-ex Link encap:Ethernet
          inet addr:10.2.1.201 Bcast:10.250.1.207 Mask:255.255.255.248

I also have a public interface, eth0, which is how I normally connect to the server remotely.

eth0 Link encap:Ethernet
          inet addr:10.2.1.175 Bcast:10.250.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Ideally I would like:
eth0 to be the default gateway interface for the system.
br-ex with eth3 port used just for 10.2.1.200/29 (instance VM NAT traffic)
br-int with eth1 port used for 10.0.0.0/24 (instance VM traffic)
br-omg with eth2 port used for 10.0.1.0/24 (openstack management traffic)
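One thing that may be relevant to the routing change: going by the addresses and masks in the ifconfig output above (which may have been anonymized), the br-ex subnet sits inside the eth0 subnet. The arithmetic can be checked with Python's ipaddress module; the variable names are only for illustration.

```python
import ipaddress

# Addresses from the ifconfig output above; netmasks are written as
# prefix lengths (255.255.255.0 is /24, 255.255.255.248 is /29).
eth0 = ipaddress.ip_interface('10.2.1.175/24')
br_ex = ipaddress.ip_interface('10.2.1.201/29')
system_gateway = ipaddress.ip_address('10.2.1.2')

print(eth0.network)                          # network that eth0 sits on
print(br_ex.network)                         # network that br-ex sits on
print(br_ex.network.broadcast_address)       # last address of the /29
print(eth0.network.overlaps(br_ex.network))  # do the two networks overlap?
print(system_gateway in br_ex.network)       # is the gateway in the /29?
```

If the addresses are as shown, the two interfaces are on overlapping networks and the system gateway 10.2.1.2 is reachable only through the eth0 network, not the /29.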

I can delete the added gateway and get traffic flowing through the server again:

route del -net 0.0.0.0 gw 10.2.1.201

I can ping the floating ip and gateway ( 10.2.1.201 ) from the VM, but still not able to get past that, can't ping the next hop gateway ( 10.2.1.2 ).

The system default gateway is 10.2.1.2

Reviewed: https://review.openstack.org/14743
Committed: http://github.com/openstack/quantum/commit/b4f9b1f5f629e4be21ca51c17cfd72cab4aefe39
Submitter: Jenkins
Branch: stable/folsom

commit b4f9b1f5f629e4be21ca51c17cfd72cab4aefe39
Author: Gary Kotton <email address hidden>
Date: Fri Oct 5 06:07:13 2012 -0400

    Treat invalid namespace call

    Fixes bug 1060559

    Change-Id: I29250100416b87f55781fb7e97339f6d3761513f

Gary Kotton (garyk) on 2012-10-29
tags: removed: folsom-backport-potential
Chuck Short (zulcss) on 2012-11-06
Changed in quantum (Ubuntu):
status: New → Fix Released
Changed in quantum (Ubuntu Precise):
status: New → Confirmed
Thierry Carrez (ttx) on 2012-11-21
Changed in quantum:
status: Fix Committed → Fix Released
Changed in quantum (Ubuntu Quantal):
status: New → Confirmed

Hello Koaps, or anyone else affected,

Accepted quantum into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/quantum/2012.2.1-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in quantum (Ubuntu Quantal):
status: Confirmed → Fix Committed
tags: added: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package quantum - 2012.2.1-0ubuntu1

---------------
quantum (2012.2.1-0ubuntu1) quantal-proposed; urgency=low

  * Resynchronize with stable/folsom (1e774867) (LP: #1085255):
    - [aeabb42] There are routing problems when the dnsmasq port does not come
      first in the routing table (LP: #1083238)
    - [04aab72] Quantum linux bridge not optimized with libvirt (LP: #1078210)
    - [ca7fc10] getting quotas from database has severe performance implications
      (LP: #1075369)
    - [66605e8] failed to update an external network into non external network
      (LP: #1083387)
    - [c60051a] Quantum test suite leaks memory like a sieve (LP: #1065276)
    - [3179dfc] clear_db() does incomplete db teardown (LP: #1080988)
    - [c1e19d7] Unauthorized command: cat /proc/None/cmdline (LP: #1077651)
    - [af9e076] At times a instance will not receive an IP address from the DHCP
      agent (LP: #1081664)
    - [e0d1a7d] allow multiple floating-ip on single port if they use different
      fixed ips and/or external nets (LP: #1057844)
    - [8471d79] Delete port fails to gateway ip (LP: #1079980)
    - [aca8b4a] fixed_ip allocation which is not included within
      allocation_pools makes error when delete port or re-create port
      (LP: #1077292)
    - [eacc9d3] Mapping same bridge to different phyiscal networks succeed
      (LP: #1067669)
    - [51b4c82] python-quantum: not region aware (LP: #1080793)
    - [6f0a486] delete floatingip should be in one transaction to delete port
      (LP: #1080516)
    - [db6cda7] Remove qpid configuration variables no longer supported
    - [a112840] Allow NVP plugin to use per-tenant quota extension
    - [82b1a55] Quantum service does not restart after reboot (LP: #1073999)
    - [c01a839] There are some cases that L3 API with an invalid parameter
      returns 500. (LP: #1064765)
    - [26b383f] external network can be plugged also as internal network for one
      router (LP: #1053633)
    - [49f649c] There is a lot of cases that API with an invalid parameter
      returns 500. (LP: #1062046)
    - [4546a18] When create subnet, you con set up the value as cidr (the value
      isn't cidr form). (LP: #1067959)
    - [9ba453a] killfilter should handle updated/deleted executables
      (LP: #1073768)
    - [7c8a55c] a port which is not able to delete is made when floatingip
      create fails. (LP: #1064748)
    - [c9b84cf] Linux bridge port update causes exception (LP: #1072713)
    - [cb57932] I can't add interface to router, if there is another port in
      non-shared network of other tenant (LP: #1057558)
    - [574e278] Ryu plugin does not support Security Groups (LP: #1059393)
    - [607f486] tap device added to integration bridge without tag
      (LP: #1064070)
    - [21a0fdf] L3 agent external network flag (LP: #1056720)
    - [5cbaff4] router create with external_gateway_info fails with 500 always.
      (LP: #1064235)
    - [63b81f6] l3 db operations failed in multiple transactions (LP: #1070335)
    - [bff17fb] Ensure that the SqlSoup import is still supported.
    - [e091a29] l3_nat_agent was renamed to l3_agent
    - [9030969] remove default value of 'local_ip' of 10...


Changed in quantum (Ubuntu Quantal):
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2013-04-04
Changed in quantum:
milestone: grizzly-1 → 2013.1