unhandled trace if no namespaces in metering agent

Bug #1286209 reported by George Shuklin on 2014-02-28
This bug affects 4 people
Affects            Status    Importance  Assigned to  Milestone
neutron            Invalid   Medium      Unassigned
neutron (Ubuntu)   Invalid   Medium      Unassigned

Bug Description

If a network node has no active routers on its l3-agent, the metering-agent logs a traceback:

2014-02-28 17:04:51.286 1121 DEBUG neutron.services.metering.agents.metering_agent [-] Get router traffic counters _get_traffic_counters /usr/lib/python2.7/dist-packages/neutron/services/metering/agents/metering_agent.py:214
2014-02-28 17:04:51.286 1121 DEBUG neutron.openstack.common.lockutils [-] Got semaphore "metering-agent" for method "_invoke_driver"... inner /usr/lib/python2.7/dist-packages/neutron/openstack/common/lockutils.py:191
2014-02-28 17:04:51.286 1121 DEBUG neutron.common.log [-] neutron.services.metering.drivers.iptables.iptables_driver.IptablesMeteringDriver method get_traffic_counters called with arguments (<neutron.context.ContextBase object at 0x2504510>, [{u'status': u'ACTIVE', u'name': u'r', u'gw_port_id': u'86be6088-d967-45a8-bf69-8af76d956a3e', u'admin_state_up': True, u'tenant_id': u'1483a06525a5485e8a7dd64abaa66619', u'_metering_labels': [{u'rules': [{u'remote_ip_prefix': u'0.0.0.0/0', u'direction': u'ingress', u'metering_label_id': u'19de35e4-ea99-4d84-9fbf-7b0c7a390540', u'id': u'3991421b-50ce-46ea-b264-74bb47d09e65', u'excluded': False}, {u'remote_ip_prefix': u'0.0.0.0/0', u'direction': u'egress', u'metering_label_id': u'19de35e4-ea99-4d84-9fbf-7b0c7a390540', u'id': u'706e55db-e2f7-4eb9-940a-67144a075a2c', u'excluded': False}], u'id': u'19de35e4-ea99-4d84-9fbf-7b0c7a390540'}], u'id': u'5ccfe6b8-9c3b-44c4-9580-da0d74ccdcf8'}]) {} wrapper /usr/lib/python2.7/dist-packages/neutron/common/log.py:33
2014-02-28 17:04:51.286 1121 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'ip', 'netns', 'exec', 'qrouter-5ccfe6b8-9c3b-44c4-9580-da0d74ccdcf8', 'iptables', '-t', 'filter', '-L', 'neutron-meter-l-19de35e4-ea9', '-n', '-v', '-x', '-Z'] execute /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:43
2014-02-28 17:04:51.291 1121 DEBUG neutron.agent.linux.utils [-]
Command: ['sudo', 'ip', 'netns', 'exec', 'qrouter-5ccfe6b8-9c3b-44c4-9580-da0d74ccdcf8', 'iptables', '-t', 'filter', '-L', 'neutron-meter-l-19de35e4-ea9', '-n', '-v', '-x', '-Z']
Exit code: 1
Stdout: ''
Stderr: 'Cannot open network namespace: No such file or directory\n' execute /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:60
2014-02-28 17:04:51.291 1121 ERROR neutron.openstack.common.loopingcall [-] in fixed duration looping call
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall Traceback (most recent call last):
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/openstack/common/loopingcall.py", line 78, in _inner
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall self.f(*self.args, **self.kw)
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/services/metering/agents/metering_agent.py", line 163, in _metering_loop
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall self._add_metering_infos()
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/services/metering/agents/metering_agent.py", line 155, in _add_metering_infos
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall accs = self._get_traffic_counters(self.context, self.routers.values())
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/services/metering/agents/metering_agent.py", line 215, in _get_traffic_counters
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall return self._invoke_driver(context, routers, 'get_traffic_counters')
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/openstack/common/lockutils.py", line 247, in inner
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall retval = f(*args, **kwargs)
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/neutron/services/metering/agents/metering_agent.py", line 180, in _invoke_driver
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall {'driver': cfg.CONF.metering_driver,
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/oslo/config/cfg.py", line 1648, in __getattr__
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall raise NoSuchOptError(name)
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall NoSuchOptError: no such option: metering_driver
2014-02-28 17:04:51.291 1121 TRACE neutron.openstack.common.loopingcall

Having no routers is a perfectly fine state for the l3-agent, and it should not cause errors.
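
Note on the traceback: the final exception is NoSuchOptError for metering_driver, raised while the agent was formatting a log message in _invoke_driver, not the namespace failure itself. The sketch below reproduces that pattern in isolation. It is a minimal illustration, not the neutron code: FakeConf, NoSuchOptError, failing_driver_call and both invoke_driver_* helpers are hypothetical stand-ins, and the assumption that the option is registered under the name "driver" is mine.

```python
import logging

LOG = logging.getLogger(__name__)


class NoSuchOptError(AttributeError):
    """Hypothetical stand-in for the config library's NoSuchOptError."""


class FakeConf(object):
    """Hypothetical config object that only knows the options given to it."""

    def __init__(self, **opts):
        self._opts = opts

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        opts = self.__dict__.get('_opts', {})
        if name in opts:
            return opts[name]
        raise NoSuchOptError("no such option: %s" % name)


def failing_driver_call():
    # Simulates the driver failing because the router namespace is absent.
    raise RuntimeError("Cannot open network namespace")


def invoke_driver_buggy(conf, func):
    try:
        return func()
    except RuntimeError:
        # Reading an unregistered option raises inside the handler, so the
        # caller sees NoSuchOptError instead of a logged runtime error.
        LOG.error("Driver %(driver)s runtime error",
                  {'driver': conf.metering_driver})


def invoke_driver_safe(conf, func):
    try:
        return func()
    except RuntimeError:
        # Assumption: the option is registered as "driver".
        LOG.error("Driver %(driver)s runtime error",
                  {'driver': conf.driver})
        return None
```

With conf = FakeConf(driver='iptables'), invoke_driver_safe logs the error and returns None, while invoke_driver_buggy lets NoSuchOptError escape into the looping call, which matches the shape of the trace above.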

George Shuklin (george-shuklin) wrote :

 neutron-plugin-metering-agent 1:2013.2.1-0ubuntu1~cloud0

affects: neutron → neutron (Ubuntu)
George Shuklin (george-shuklin) wrote :

Any agent solves the problem (including dhcp-agent). A host without agents causes traces.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in neutron (Ubuntu):
status: New → Confirmed
liuweicai (liuuweicai) on 2014-05-12
Changed in neutron (Ubuntu):
assignee: nobody → liuweicai (liuuweicai)
Changed in neutron:
assignee: nobody → liuweicai (liuuweicai)
liuweicai (liuuweicai) wrote :

Hi, George. I need more information about this issue:

1. "Host without agents cause traces."
    Does this mean that the network node only runs a metering_agent?
    Or can you show me which services you have deployed on the network node?

2. Did you do anything that directly caused this error?

George Shuklin (george-shuklin) wrote :

Hello.

1. I was slightly wrong: not 'agent' but router (and the same goes for dhcp-agent). That means that if the l3-agent has no routers on it, the trace happens. neutron l3-agent-router-remove lets you remove routers from a host.

2. Nope, I think this is the default behavior of the metering agent, because there are no namespaces at all (before the l3-agent creates them for a router).

liuweicai (liuuweicai) wrote :

I cannot reproduce this.

George Shuklin (george-shuklin) wrote :

Can you check whether there were no namespaces on the machine at that time (ip netns list)? I think the bug happens when the l3/dhcp agents remove their own namespaces from the machine.
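
The check suggested here can be scripted. The sketch below parses the text printed by ip netns list into a set of namespace names; parse_netns_list is a hypothetical helper name, and the sample text in the usage note is made up for illustration. Running the command itself is left to the caller.

```python
def parse_netns_list(output):
    """Return the set of namespace names found in `ip netns list` output.

    Each non-empty line starts with the namespace name; newer iproute2
    versions may append extra text such as "(id: 0)" after it, so only the
    first whitespace-separated token is kept.
    """
    names = set()
    for line in output.splitlines():
        line = line.strip()
        if line:
            names.add(line.split()[0])
    return names
```

For example, parse_netns_list("qrouter-aaa\nqdhcp-bbb (id: 1)\n") yields the names qrouter-aaa and qdhcp-bbb, and an empty result means the node has no namespaces at all, which is the condition being asked about.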

tags: added: metering
Changed in neutron:
status: New → Incomplete
liuweicai (liuuweicai) on 2014-05-28
Changed in neutron (Ubuntu):
assignee: liuweicai (liuuweicai) → nobody
Changed in neutron:
assignee: liuweicai (liuuweicai) → nobody
George Shuklin (george-shuklin) wrote :

I still see that bug and can definitely reproduce it.

If a router is placed on another network node, the metering agent fails, because it expects to see the network namespace for that router on its own network node.

There is no filtering of 'our' routers versus 'not our' routers.

It works on a single-node installation but fails on a multi-node one.

George Shuklin (george-shuklin) wrote :

I was able to fix this by ignoring the exception in the __exit__() function. Still, a proper solution should filter out routers from other network nodes.
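
A rough sketch of the filtering idea, assuming the agent can obtain the set of namespace names present on its own node. split_routers_by_namespace is a hypothetical helper and not part of neutron; the qrouter-<router id> naming is taken from the command shown in the log above.

```python
NS_PREFIX = 'qrouter-'  # namespace naming seen in the log: qrouter-<router id>


def split_routers_by_namespace(routers, local_namespaces):
    """Split router dicts into (local, remote) by namespace presence.

    routers: iterable of dicts with at least an 'id' key.
    local_namespaces: set of namespace names present on this node.
    """
    local, remote = [], []
    for router in routers:
        if NS_PREFIX + router['id'] in local_namespaces:
            local.append(router)
        else:
            # Hosted elsewhere (or not yet created): skip the counters.
            remote.append(router)
    return local, remote
```

Only the routers in the first list would then be passed on for traffic counters, so a router scheduled on another network node never triggers an ip netns exec against a namespace that does not exist locally.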

Changed in neutron:
status: Incomplete → New
Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
Ilya Shakhat (shakhat) on 2014-08-01
Changed in neutron:
assignee: nobody → Ilya Shakhat (shakhat)
James Page (james-page) on 2014-08-01
Changed in neutron (Ubuntu):
importance: Undecided → Critical
importance: Critical → Medium
status: Confirmed → Triaged
Bellantuono Daniel (kelfen) wrote :

I have a similar problem: the metering service prints this error even when you delete a router.
I think the namespace is deleted but the service does not notice.

Liping Mao (limao) wrote :

I get the same problem. Here are the steps to reproduce:
I have two network nodes in my environment. Both have the L3 agent and the metering agent installed.
Then I create some metering rules and two routers, and the routers are scheduled on different network nodes. Then this error happens. The error occurs because the metering agent monitors all router namespaces, including the namespaces on other network nodes.

Thanks.

Ilya Shakhat (shakhat) on 2014-09-01
Changed in neutron:
status: Confirmed → In Progress

Change abandoned by Salvatore Orlando (<email address hidden>) on branch: master
Review: https://review.openstack.org/118321
Reason: This patch has been inactive long enough that I think it's safe to abandon.
The author can resurrect it if needed.

Matt Riedemann (mriedem) on 2015-04-20
Changed in neutron:
assignee: Ilya Shakhat (shakhat) → nobody
status: In Progress → Triaged
Li Ma (nick-ma-z) on 2015-08-07
Changed in neutron:
assignee: nobody → Li Ma (nick-ma-z)
Li Ma (nick-ma-z) wrote :

I followed these steps mentioned above, but still cannot reproduce this bug.

Changed in neutron:
assignee: Li Ma (nick-ma-z) → nobody

Marking it as invalid, since we are not able to reproduce it.

Changed in neutron:
status: Triaged → Invalid
Chuck Short (zulcss) on 2016-10-05
Changed in neutron (Ubuntu):
status: Triaged → Invalid