NeutronServerTrytoFindL3agentOnComputeNodewhenWeUseLinuxBridge

Bug #1737917 reported by Aaasidncza
This bug affects 13 people
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Slawek Kaplonski
Milestone: (none)

Bug Description

Hi all, cool developers of OpenStack. I followed the Pike install docs and now have OpenStack running successfully. I chose Networking Option 2: Self-service networks and deployed on CentOS 7 x64 with everything up to date. In the Neutron configuration section, under "Verify operation", we see:
The output should indicate four agents on the controller node and one agent on each compute node.

$ openstack network agent list

+--------------------------------------+--------------------+------------+-------------------+-------+-------+---------------------------+
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+--------------------------------------+--------------------+------------+-------------------+-------+-------+---------------------------+
| f49a4b81-afd6-4b3d-b923-66c8f0517099 | Metadata agent | controller | None | True | UP | neutron-metadata-agent |
| 27eee952-a748-467b-bf71-941e89846a92 | Linux bridge agent | controller | None | True | UP | neutron-linuxbridge-agent |
| 08905043-5010-4b87-bba5-aedb1956e27a | Linux bridge agent | compute1 | None | True | UP | neutron-linuxbridge-agent |
| 830344ff-dc36-4956-84f4-067af667a0dc | L3 agent | controller | nova | True | UP | neutron-l3-agent |
| dd3644c9-1a3a-435a-9282-eb306b4b0391 | DHCP agent | controller | nova | True | UP | neutron-dhcp-agent |
+--------------------------------------+--------------------+------------+-------------------+-------+-------+---------------------------+

I also ran this command on the controller node after the OpenStack setup and configuration was finished, and got this output:
[root@controller neutron]# openstack network agent list
+--------------------------------------+--------------------+------------+-------------------+-------+-------+---------------------------+
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+--------------------------------------+--------------------+------------+-------------------+-------+-------+---------------------------+
| 010608cc-01cf-4143-97d7-df617aaf2ac1 | Linux bridge agent | compute1 | None | :-) | UP | neutron-linuxbridge-agent |
| 09cd7c61-b874-44a2-afd1-691bedbb8a97 | Metadata agent | controller | None | :-) | UP | neutron-metadata-agent |
| 865bcc2f-183f-4c3a-8b05-b6724beca5f0 | L3 agent | controller | nova | :-) | UP | neutron-l3-agent |
| 960a7d2c-5345-4245-8494-3adf885a61ff | DHCP agent | controller | nova | :-) | UP | neutron-dhcp-agent |
| c2bb1518-817a-4d6e-a159-7ac01858f874 | Linux bridge agent | controller | None | :-) | UP | neutron-linuxbridge-agent |
+--------------------------------------+--------------------+------------+-------------------+-------+-------+---------------------------+

This seems to be running normally, just like in the official docs. Here is the ip netns output on the controller:
[root@controller neutron]# ip netns exec qrouter-89466cea-7d58-4a45-92da-e636c0958358 ping -c 5 www.bing.com
PING cn-0001.cn-msedge.net (202.89.233.100) 56(84) bytes of data.
64 bytes from 202.89.233.100 (202.89.233.100): icmp_seq=1 ttl=115 time=31.6 ms
64 bytes from 202.89.233.100 (202.89.233.100): icmp_seq=2 ttl=115 time=31.5 ms
64 bytes from 202.89.233.100 (202.89.233.100): icmp_seq=3 ttl=115 time=31.4 ms
64 bytes from 202.89.233.100 (202.89.233.100): icmp_seq=4 ttl=115 time=32.4 ms
64 bytes from 202.89.233.100 (202.89.233.100): icmp_seq=5 ttl=115 time=31.8 ms

--- cn-0001.cn-msedge.net ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 31.490/31.810/32.405/0.397 ms
[root@controller neutron]# ip netns
qrouter-89466cea-7d58-4a45-92da-e636c0958358 (id: 2)
qdhcp-fca97929-aed3-45dc-9f19-f7ead767fbc3 (id: 0)
qdhcp-c08c44ed-d71e-4acb-b1a5-3ebc4715a01d (id: 1)

Every virtual machine on OpenStack runs perfectly (self-service or provider network, with access to the Internet). However, one day I logged into the compute node and found that linuxbridge.log seemed too big (over 300 MB). I used grep to filter out most of the INFO messages and found many ERROR entries like this (it is the only type of error, and it recurs at a fixed interval):

2017-12-13 17:23:16.030 1334 INFO neutron.agent.securitygroups_rpc [req-7fbf230a-bf45-4817-a7e8-447005e1700a - - - - -] Security group member updated [u'626f761f-7451-42cf-afbf-724da51190c0']
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent [req-7fbf230a-bf45-4817-a7e8-447005e1700a - - - - -] Error occurred while removing port tapa97e89f4-2d: RemoteError: Remote error: AgentNotFoundByTypeHost Agent with agent_type=L3 agent and host=compute1 could not be found
[u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming\n res = self.dispatcher.dispatch(message)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch\n result = func(ctxt, **new_args)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 234, in update_device_down\n n_const.PORT_STATUS_DOWN, host)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 331, in notify_l2pop_port_wiring\n l2pop_driver.obj.update_port_down(port_context)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in update_port_down\n admin_context, agent_host, [port[\'device_id\']]):\n', u' File "/usr/lib/python2.7/site-packages/neutron/db/l3_agentschedulers_db.py", line 303, in list_router_ids_on_host\n context, constants.AGENT_TYPE_L3, host)\n', u' File "/usr/lib/python2.7/site-packages/neutron/db/agents_db.py", line 291, in _get_agent_by_type_and_host\n host=host)\n', u'AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute1 could not be found\n'].
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent Traceback (most recent call last):
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 336, in treat_devices_removed
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent cfg.CONF.host)
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 139, in update_device_down
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent agent_id=agent_id, host=host)
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 162, in call
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent return self._original_context.call(ctxt, method, **kwargs)
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent retry=self.retry)
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 123, in _send
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent timeout=timeout, retry=retry)
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 578, in send
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent retry=retry)
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 569, in _send
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent raise result
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent RemoteError: Remote error: AgentNotFoundByTypeHost Agent with agent_type=L3 agent and host=compute1 could not be found
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming\n res = self.dispatcher.dispatch(message)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch\n result = func(ctxt, **new_args)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 234, in update_device_down\n n_const.PORT_STATUS_DOWN, host)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 331, in notify_l2pop_port_wiring\n l2pop_driver.obj.update_port_down(port_context)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in update_port_down\n admin_context, agent_host, [port[\'device_id\']]):\n', u' File "/usr/lib/python2.7/site-packages/neutron/db/l3_agentschedulers_db.py", line 303, in list_router_ids_on_host\n context, constants.AGENT_TYPE_L3, host)\n', u' File "/usr/lib/python2.7/site-packages/neutron/db/agents_db.py", line 291, in _get_agent_by_type_and_host\n host=host)\n', u'AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute1 could not be found\n'].
2017-12-13 17:23:16.172 1334 ERROR neutron.plugins.ml2.drivers.agent._common_agent
2017-12-13 17:23:16.174 1334 INFO neutron.plugins.ml2.drivers.agent._common_agent [req-7fbf230a-bf45-4817-a7e8-447005e1700a - - - - -] Attachment tap8d780fbe-84 removed
2017-12-13 17:23:16.218 1334 INFO neutron.plugins.ml2.drivers.agent._common_agent [req-7fbf230a-bf45-4817-a7e8-447005e1700a - - - - -] Port tap8d780fbe-84 updated.
2017-12-13 17:23:16.218 1334 INFO neutron.plugins.ml2.drivers.agent._common_agent [req-7fbf230a-bf45-4817-a7e8-447005e1700a - - - - -] Attachment tap3ca9698e-61 removed

Here we see

"AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute1 could not be found\n"
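
To gauge how often this message recurs and which hosts it names, a short script can tally it. This is only a sketch: it assumes each log record sits on a single line (not hard-wrapped by a terminal), and the function name `count_missing_agents` is made up for illustration.

```python
import re
from collections import Counter

# Matches the message text quoted above, with or without the colon after
# the exception name. Only the message is assumed, not the log layout.
AGENT_NOT_FOUND = re.compile(
    r"AgentNotFoundByTypeHost:? Agent with agent_type=(?P<agent_type>.+?) "
    r"and host=(?P<host>\S+) could not be found"
)


def count_missing_agents(lines):
    """Count log lines per (agent_type, host) named in the error message.

    A line is counted once even if it repeats the message, as the
    RemoteError records do (once in the summary, once in the traceback).
    """
    counts = Counter()
    for line in lines:
        match = AGENT_NOT_FOUND.search(line)
        if match:
            counts[(match.group("agent_type"), match.group("host"))] += 1
    return counts
```

Feeding it the lines of the agent log (for example from `open(path)`, where the path is whatever the deployment uses) returns a `Counter` keyed by agent type and host.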

So I went back to the compute node section of the Neutron configuration guide to check whether I had missed setting up an L3 agent on the compute node, but the guide never mentions installing an L3 agent there.

I googled these keywords and found only a few posts about this problem. Most were about a DHCP agent that could not be found, or about old releases of OpenStack. Is this a bug in Neutron? What should I do to avoid it: install an L3 agent on the compute node, or just ignore the error, since all VMs run as usual?

-----------------------------------
Release: 11.0.3.dev21 on 2017-12-11 21:40
SHA: 1caba2d2e0f5c9752a858f28957d7dbb33640180
Source: https://git.openstack.org/cgit/openstack/neutron/tree/doc/source/install/verify-option2.rst
URL: https://docs.openstack.org/neutron/pike/install/verify-option2.html

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

To me this looks more like an error in the l2 population mechanism driver, because the error message is, IMHO, returned to the Linuxbridge agent from the server in response to RPC calls.
Can You also check what You have in the neutron-server logs for the same time?
And do You have l2pop enabled on Your cluster?
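
One way to answer the l2pop question on the server side is to look at the `mechanism_drivers` option in the `[ml2]` section of the ML2 plugin configuration. The sketch below assumes the configuration text has already been read (commonly from /etc/neutron/plugins/ml2/ml2_conf.ini, though the path depends on the deployment); the helper name `l2pop_enabled` is hypothetical.

```python
import configparser


def l2pop_enabled(ml2_conf_text):
    """Return True if 'l2population' is listed in [ml2] mechanism_drivers."""
    # strict=False tolerates repeated options; interpolation is disabled so
    # that '%' characters in values are not treated specially.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ml2_conf_text)
    drivers = parser.get("ml2", "mechanism_drivers", fallback="")
    return "l2population" in [name.strip() for name in drivers.split(",")]


# Illustrative values only, not taken from the reporter's system.
sample = """
[ml2]
type_drivers = flat,vlan,vxlan
mechanism_drivers = linuxbridge,l2population
"""
print(l2pop_enabled(sample))
```

This checks only the server-side driver list; the agent-side configuration is a separate setting and is not inspected here.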

tags: added: linuxbridge
removed: doc
tags: added: l2-pop
Revision history for this message
Aaasidncza (aascigsz2) wrote :

Hi Slawek Kaplonski (slaweq), thanks for the help. I'm not sure whether this is my fault or a bug in Neutron. Here is server.log on the controller node:
[HEAD]
[root@controller neutron]# head -n 50 server.log
2017-12-13 03:30:04.153 1862 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'1b6e1b2c-aea4-41b1-9a98-9bee24ee8a0e', u'name': u'network-vif-plugged', u'server_uuid': u'08044f19-3184-4f01-8c6f-5a2f5f03f215', u'code': 200}
2017-12-13 03:30:04.153 1862 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'f28b362c-0f15-41be-b08c-ce8f161fc8d4', u'name': u'network-vif-plugged', u'server_uuid': u'97eff4de-19a2-4558-8562-0a1471bb73c8', u'code': 200}
2017-12-13 03:30:04.153 1862 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'd1b3dab4-fdf6-4cac-ac78-c1b33686655a', u'name': u'network-vif-plugged', u'server_uuid': u'77722c79-e584-4200-b4c4-230bdecd9a1a', u'code': 200}
2017-12-13 03:30:09.689 1862 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'31f89863-e597-4f96-bb06-c092d9126775', u'name': u'network-vif-plugged', u'server_uuid': u'ea492dec-12eb-4282-8196-97182cd61e08', u'code': 200}
2017-12-13 03:30:09.689 1862 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'430af11e-a111-418d-b46e-1bdec38d62a7', u'name': u'network-vif-plugged', u'server_uuid': u'87480b8a-ce25-45ef-8bed-785ac321172d', u'code': 200}
2017-12-13 03:30:09.689 1862 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'fdb0d573-1ddc-4228-86c7-d1b2e0ee9f14', u'name': u'network-vif-plugged', u'server_uuid': u'b406c12e-95c9-4fa9-ae0b-14e30089b02f', u'code': 200}
2017-12-13 03:30:10.241 1858 INFO neutron.wsgi [req-b5734451-603f-450b-a33b-4c54efb7448d fc93769465504021a04b29bee9b93e8b c957b73284424c1bbe7aee5a8eefa075 - default default] 10.88.1.10 "GET /v2.0/ports?tenant_id=2b17a96ea1054d8e8f67f808e0fd7fe5&device_id=ea492dec-12eb-4282-8196-97182cd61e08 HTTP/1.1" status: 200 len: 1074 time: 0.0387630
2017-12-13 03:30:10.321 1858 INFO neutron.wsgi [req-762bc966-a6e6-4e01-8b76-b79a86b930e8 fc93769465504021a04b29bee9b93e8b c957b73284424c1bbe7aee5a8eefa075 - default default] 10.88.1.10 "GET /v2.0/networks?id=c08c44ed-d71e-4acb-b1a5-3ebc4715a01d HTTP/1.1" status: 200 len: 891 time: 0.0778539
2017-12-13 03:30:10.339 1858 INFO neutron.wsgi [req-34ea4f7c-9f91-4648-ad82-c57e26243d20 fc93769465504021a04b29bee9b93e8b c957b73284424c1bbe7aee5a8eefa075 - default default] 10.88.1.10 "GET /v2.0/floatingips?fixed_ip_address=10.99.1.49&port_id=31f89863-e597-4f96-bb06-c092d9126775 HTTP/1.1" status: 200 len: 217 time: 0.0128708
2017-12-13 03:30:10.411 1862 ERROR oslo_messaging.rpc.server [req-7fbf230a-bf45-4817-a7e8-447005e1700a - - - - -] Exception during message handling: AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute1 could not be found
2017-12-13 03:30:10.411 1862 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2017-12-13 03:30:10.411 1862 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/...

Aaasidncza (aascigsz2)
summary: - Networking Option 2: Self-service networks in neutron
+ NeutronServerTrytoFindL3agentOnComputeNodewhenWeUseLinuxBridge
Revision history for this message
Avtar singh (avtarsingh12015) wrote :

I am facing the same issue after updating from CentOS 7.3 to CentOS 7.4. Here are the linuxbridge logs from the compute node:

=================================
==> neutron/linuxbridge-agent.log <==
2017-12-19 06:15:52.481 39537 INFO neutron.agent.securitygroups_rpc [req-f8952149-d0bd-4a2a-862a-3bd6447bf853 - - - - -] Security group member updated [u'c0580b27-a250-4b63-94f2-81cf32f75518']
 2017-12-19 06:15:52.633 39537 ERROR neutron.plugins.ml2.drivers.agent._common_agent [req-f8952149-d0bd-4a2a-862a-3bd6447bf853 - - - - -] Error occurred while removing port tap7f94b3a4-1c: RemoteError: Remote error: AgentNotFoundByTypeHost Agent with agent_type=L3 agent and host=compute1 could not be found
[u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming\n res = self.dispatcher.dispatch(message)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch\n result = func(ctxt, **new_args)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 234, in update_device_down\n n_const.PORT_STATUS_DOWN, host)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 331, in notify_l2pop_port_wiring\n l2pop_driver.obj.update_port_down(port_context)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in update_port_down\n admin_context, agent_host, [port[\'device_id\']]):\n', u' File "/usr/lib/python2.7/site-packages/neutron/db/l3_agentschedulers_db.py", line 303, in list_router_ids_on_host\n context, constants.AGENT_TYPE_L3, host)\n', u' File "/usr/lib/python2.7/site-packages/neutron/db/agents_db.py", line 291, in _get_agent_by_type_and_host\n host=host)\n', u'AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute1 could not be found\n'].
2017-12-19 06:15:52.633 39537 ERROR neutron.plugins.ml2.drivers.agent._common_agent Traceback (most recent call last):
2017-12-19 06:15:52.633 39537 ERROR neutron.plugins.ml2.drivers.agent._common_agent File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 336, in treat_devices_removed
2017-12-19 06:15:52.633 39537 ERROR neutron.plugins.ml2.drivers.agent._common_agent cfg.CONF.host)
2017-12-19 06:15:52.633 39537 ERROR neutron.plugins.ml2.drivers.agent._common_agent File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 139, in update_device_down
2017-12-19 06:15:52.633 39537 ERROR neutron.plugins.ml2.drivers.agent._common_agent agent_id=agent_id, host=host)
2017-12-19 06:15:52.633 39537 ERROR neutron.plugins.ml2.drivers.agent._common_agent File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 162, in call
2017-12-19 06:15:52.633 39537 ERROR neutron.plugins.ml2.drivers.agent._common_agent return self._original_context.call(ctxt, method, **kwargs)
2017-12-19 06:15:52.633 39537 ERROR neutron.plugins.ml2.drivers.a...

Revision history for this message
Georgy (suquant) wrote :

Restarting the following services may make the problem disappear:

neutron-linuxbridge-cleanup.service
neutron-linuxbridge-agent.service
nova-compute.service

Revision history for this message
Aalaesar (aalaesar) wrote :

Hi there,
I am hitting the very same issue each time OpenStack stops an instance, whether for migration or for shutdown. This is on Ubuntu 16.04, following Pike's installation guide for self-service networks.

The error appears on a compute host each time the Linux bridge agent needs to delete a port.
 - It happens only once if the instance is destroyed.
 - It happens in a loop if the instance is stopped or migrated.
   The loop can be stopped by restarting neutron-linuxbridge-cleanup.service or neutron-linuxbridge-agent.service, but the error comes back as soon as another instance is stopped or migrated.

I also noticed (but have not tested) some network performance drops on instances on the same compute host during the error loops.

No errors are logged on the neutron controller side...

I'm searching for configuration references to see if it's a config issue, but no luck yet.

Any hints?
Regards,
Aal

Revision history for this message
Chris Apsey (bitskrieg) wrote :

All,

Also seeing this on Pike (UCA/Ubuntu 16.04) w/ linuxbridge and all neutron agents on dedicated network nodes. Errors about missing l3agents on compute nodes appear in the logs on both the compute nodes and the dedicated network nodes, although it is much more severe on the compute nodes (if I ignore it for too long, vif plugging starts to fail for new instances unless I restart linuxbridge-agent). I have never installed l3agent on compute nodes, and can find no references to it anywhere within the database, so I'm not sure why neutron thinks there should be l3agents on compute nodes.
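
The absence of an L3 agent on compute hosts can be confirmed from the agent listing itself. The sketch below assumes the listing is available as a list of dictionaries keyed by the column headings shown in the table in the bug description (for instance from `openstack network agent list -f json`; that the JSON keys match the headings is an assumption). The function name `hosts_without_l3_agent` is hypothetical.

```python
def hosts_without_l3_agent(agents):
    """Return sorted host names that run some agent but no 'L3 agent'."""
    all_hosts = {agent["Host"] for agent in agents}
    l3_hosts = {
        agent["Host"] for agent in agents if agent["Agent Type"] == "L3 agent"
    }
    return sorted(all_hosts - l3_hosts)


# Rows reduced from the agent list in the bug description.
agents = [
    {"Agent Type": "Metadata agent", "Host": "controller"},
    {"Agent Type": "Linux bridge agent", "Host": "controller"},
    {"Agent Type": "Linux bridge agent", "Host": "compute1"},
    {"Agent Type": "L3 agent", "Host": "controller"},
    {"Agent Type": "DHCP agent", "Host": "controller"},
]
print(hosts_without_l3_agent(agents))
```

For the layout from the install guide this names compute1, which is the expected state: the guide places the L3 agent only on the controller.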

Is this a new best practice/requirement that we missed?

Revision history for this message
Robert Putt (robert-putt) wrote :

Hi,

Also experiencing this with a fresh Pike deployment in my OpenStack playground :-(. It is unfortunate, as it means that if someone is new to Pike and follows the installation guide online, their OpenStack deployment is fundamentally broken. It would be great if we could get some more information about this. I am not going to bother posting my logs as they are just like the ones from the previous posts.

Best Regards,

Rob

Revision history for this message
Chris Apsey (bitskrieg) wrote :

Just want to add that this is still happening on Queens with UCA packages on 16.04.

Revision history for this message
Lucas Costa Beyeler (lucas.beyeler) wrote :

New information: because of this bug I'm losing my instances when I try to shut them down and start them again. And like Chris said, the bug still happens in Queens.

Revision history for this message
zhaobo (zhaobo6) wrote :
Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium
Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
Revision history for this message
Kevin Tibi (ktibi) wrote :

Same bug on fresh pike install. And I use OVS.

Revision history for this message
Georgii (georgii.sytov) wrote :

OS: Centos
Fresh install Pike.
Same bug.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Thx guys for confirming that this issue happens for many of You. I started checking it and I hope I will push a fix in the next few days (it will probably be next week because of Easter) :)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/557908

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
Kevin Tibi (ktibi) wrote :

Hi Slawek,

I use kolla with the Pike version and three controllers. I patched the first one (ctrl01) with your patch, but I still get the error log on every controller.

"Hostname": "ctrl01","Payload": "Failed to update device cb5f787c-8cd9-4995-a423-80dd6ea5838c down: AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute02 could not be found"

"Hostname": "ctrl02","Payload": "Failed to update device cb5f787c-8cd9-4995-a423-80dd6ea5838c down: AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute02 could not be found"

"Hostname": "ctrl03","Payload": "Failed to update device cb5f787c-8cd9-4995-a423-80dd6ea5838c down: AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute02 could not be found"

Patch applied on ctrl01 in "/usr/lib/python2.7/site-packages/neutron/db/l3_agentschedulers_db.py":

from neutron_lib.exceptions import agent as agent_exc
......
    def list_router_ids_on_host(self, context, host, router_ids=None):
        try:
            agent = self._get_agent_by_type_and_host(
                context, constants.AGENT_TYPE_L3, host)
        except agent_exc.AgentNotFoundByTypeHost:
            LOG.debug("L3 Agent not found on host %s", host)
            return []
        if not agentschedulers_db.services_available(agent.admin_state_up):
            return []
        return self._get_router_ids_for_agent(context, agent, router_ids)

Thanks for your time ;)

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Hi Kevin,

Can You send me the whole log from neutron-server and the L3 agent where You still have this issue?

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

@Kevin: are You sure that Your patch is applied properly? For comparison, I looked at fullstack logs from a random patch without my change, and there are plenty of such error messages there, see e.g.: http://logs.openstack.org/77/558677/1/check/neutron-fullstack/ebea841/logs/dsvm-fullstack-logs/TestSecurityGroupsSameNetwork.test_securitygroup_linuxbridge-iptables_/neutron-server--2018-04-04--09-22-02-354194.txt.gz?level=ERROR
In the fullstack logs from the same test with my proposed patch there are no such errors: http://logs.openstack.org/08/557908/1/check/neutron-fullstack/8e35c0b/logs/dsvm-fullstack-logs/TestSecurityGroupsSameNetwork.test_securitygroup_linuxbridge-iptables_/neutron-server--2018-03-30--09-19-31-057586.txt.gz - there are only the debug messages which my patch adds in such a case.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/557908
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7b0f6330d6f877f3d2093a64c2bca4c14334574c
Submitter: Zuul
Branch: master

commit 7b0f6330d6f877f3d2093a64c2bca4c14334574c
Author: Sławek Kapłoński <email address hidden>
Date: Fri Mar 30 10:24:59 2018 +0200

    Handle AgentNotFoundByTypeHost exception properly

    During listing router_ids on host it is possible that on some hosts
    there are no L3 agents.
    In such case AgentNotFoundByTypeHost exception is raised in
    neutron.db.agents_db module in _get_agent_by_type_and_host() method.
    Now this exception is properly handled during listing routers on host.

    Change-Id: Ia5ff1b57ef63c98b4ada4f2d46c45336e413be3d
    Closes-Bug: #1737917

Changed in neutron:
status: In Progress → Fix Released
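
The behaviour change described in the commit message can be illustrated outside Neutron with a toy model. Everything below is a stand-in written for this illustration (the exception class, `FakeL3SchedulerDb`, and the `_unpatched` method name are hypothetical); it is not the Neutron code, only a sketch of the same pattern: treat "no L3 agent on this host" as "no routers on this host" instead of letting the exception propagate to the RPC caller.

```python
class AgentNotFoundByTypeHost(Exception):
    """Stand-in for the Neutron exception of the same name."""


class FakeL3SchedulerDb:
    """Toy model mapping hosts that run an L3 agent to their router IDs."""

    def __init__(self, routers_by_l3_host):
        self._routers_by_l3_host = dict(routers_by_l3_host)

    def _get_agent_by_type_and_host(self, agent_type, host):
        # Mirrors the failure in the traceback: no L3 agent for the host.
        if agent_type != "L3 agent" or host not in self._routers_by_l3_host:
            raise AgentNotFoundByTypeHost(
                "Agent with agent_type=%s and host=%s could not be found"
                % (agent_type, host))
        return host

    def list_router_ids_on_host_unpatched(self, host):
        # Old behaviour: the lookup error escapes to the caller.
        agent = self._get_agent_by_type_and_host("L3 agent", host)
        return list(self._routers_by_l3_host[agent])

    def list_router_ids_on_host(self, host):
        # Patched behaviour: a host without an L3 agent has no routers.
        try:
            agent = self._get_agent_by_type_and_host("L3 agent", host)
        except AgentNotFoundByTypeHost:
            return []
        return list(self._routers_by_l3_host[agent])
```

With a model that has an L3 agent only on the controller, calling `list_router_ids_on_host("compute1")` returns an empty list, while the unpatched variant raises, which corresponds to the error the Linux bridge agent received.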
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/559932

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/559932
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7a67987a646e64255984fa13cd7433351099c46d
Submitter: Zuul
Branch: stable/queens

commit 7a67987a646e64255984fa13cd7433351099c46d
Author: Sławek Kapłoński <email address hidden>
Date: Fri Mar 30 10:24:59 2018 +0200

    Handle AgentNotFoundByTypeHost exception properly

    During listing router_ids on host it is possible that on some hosts
    there are no L3 agents.
    In such case AgentNotFoundByTypeHost exception is raised in
    neutron.db.agents_db module in _get_agent_by_type_and_host() method.
    Now this exception is properly handled during listing routers on host.

    Change-Id: Ia5ff1b57ef63c98b4ada4f2d46c45336e413be3d
    Closes-Bug: #1737917
    (cherry picked from commit 7b0f6330d6f877f3d2093a64c2bca4c14334574c)

tags: added: in-stable-queens
Revision history for this message
Andreas (fattony666) wrote :

Hi,

Same issue on Queens, Ubuntu 16.04.4 LTS.
Is there a workaround, or a fix that has already been released?

We have "neutron-l3-agent/xenial-updates,xenial-updates,now 2:12.0.0-0ubuntu1.4~cloud0 all [installed]" installed, so far there is no update available for neutron-l3-agent package.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Fix for stable/queens should be already merged: https://review.openstack.org/#/c/559932/
Do You still have this issue with this patch applied?

Revision history for this message
Andreas (fattony666) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.0.0b1

This issue was fixed in the openstack/neutron 13.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/564755

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.2

This issue was fixed in the openstack/neutron 12.0.2 release.

Revision history for this message
Crazik (crazik) wrote :

Fingers crossed and waiting for package for Pike.

Revision history for this message
Crazik (crazik) wrote :

@slaweq Looks like you made a typo for Pike commit:

You have:

  `from neutron_lib.exceptions import agent as agent_exc`

it should be (analogy to Queens' patch)

  `from neutron.extensions import agent as agent_exc`

Revision history for this message
Crazik (crazik) wrote :

Sorry for the mess, I was looking at other commit.
Patch is ok, issue fixed in Pike release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/564755
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cb0afda244f87ef1e84b6b4b7ed6e759962d95fa
Submitter: Zuul
Branch: stable/pike

commit cb0afda244f87ef1e84b6b4b7ed6e759962d95fa
Author: Sławek Kapłoński <email address hidden>
Date: Fri Mar 30 10:24:59 2018 +0200

    Handle AgentNotFoundByTypeHost exception properly

    During listing router_ids on host it is possible that on some hosts
    there are no L3 agents.
    In such case AgentNotFoundByTypeHost exception is raised in
    neutron.db.agents_db module in _get_agent_by_type_and_host() method.
    Now this exception is properly handled during listing routers on host.

    Change-Id: Ia5ff1b57ef63c98b4ada4f2d46c45336e413be3d
    Closes-Bug: #1737917
    (cherry picked from commit 7b0f6330d6f877f3d2093a64c2bca4c14334574c)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.6

This issue was fixed in the openstack/neutron 11.0.6 release.
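
The release notices in this thread (11.0.6, 12.0.2 and the 13.0.0.0b1 milestone) can be turned into a quick check for an upstream neutron version string. This is a sketch under stated assumptions: it handles upstream-style versions only, not distribution package versions (which may carry an epoch or backported patches), and the helper name `includes_fix` is hypothetical.

```python
import re

# First version containing the fix in each release series, as reported
# in this thread (13.0.0.0b1 is treated as the start of the 13 series).
FIRST_FIXED = {11: (11, 0, 6), 12: (12, 0, 2), 13: (13, 0, 0)}


def includes_fix(version):
    """Return True/False when known, or None when it cannot be decided."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version: %r" % version)
    parsed = tuple(int(part) for part in match.groups())
    series = parsed[0]
    if series > max(FIRST_FIXED):
        return True  # later series descend from the fixed master branch
    first_fixed = FIRST_FIXED.get(series)
    if first_fixed is None:
        return None  # series older than those discussed in this thread
    if parsed == first_fixed and ".dev" in version:
        return None  # a dev build ahead of the fixed release may predate it
    return parsed >= first_fixed
```

For example, the documentation build named in the bug description, 11.0.3.dev21, is reported as not containing the fix, while 11.0.6 and 12.0.2 are.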
