NeutronServerTrytoFindL3agentOnComputeNodewhenWeUseLinuxBridge
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Medium | Slawek Kaplonski |
Bug Description
Hi all, I followed the Pike install guide and now have OpenStack running successfully. I chose self-service networks, deployed on CentOS 7 x64 with everything up to date. In the Neutron configuration section, under "Verify operation" for Networking Option 2: Self-service networks, the guide says:
The output should indicate four agents on the controller node and one agent on each compute node.
$ openstack network agent list
+------
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+------
| f49a4b81-
| 27eee952-
| 08905043-
| 830344ff-
| dd3644c9-
+------
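The expectation quoted above (four agents on the controller node, one on each compute node) can be checked mechanically. The following is a minimal sketch, assuming the agent list is exported as JSON (for example with the CLI's `-f json` output formatter) and that each record carries a `Host` key matching the column header shown above. The function names are hypothetical, not part of any OpenStack tool.

```python
import json
from collections import Counter


def agents_per_host(agent_list_json):
    """Count agents per host from a JSON-formatted agent list.

    Assumes each record has a "Host" key, matching the table header.
    """
    agents = json.loads(agent_list_json)
    return Counter(agent["Host"] for agent in agents)


def matches_install_guide(counts, controller, computes):
    """True if the controller has four agents and each compute host has one."""
    return counts.get(controller, 0) == 4 and all(
        counts.get(host, 0) == 1 for host in computes
    )
```

Calling `matches_install_guide(agents_per_host(text), "controller", ["compute1"])` would then confirm the layout for the host names used in this report.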
After finishing the OpenStack setup and configuration, I ran this command on the controller node as well and got this output:
[root@controller neutron]# openstack network agent list
+------
-------
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+------
-------
| 010608cc-
| :-) | UP | neutron-
| 09cd7c61-
| :-) | UP | neutron-
| 865bcc2f-
| :-) | UP | neutron-l3-agent |
| 960a7d2c-
| :-) | UP | neutron-dhcp-agent |
| c2bb1518-
| :-) | UP | neutron-
+------
-------
This looks normal and matches the official docs. Here is ip netns on the controller:
[root@controller neutron]# ip netns exec qrouter-
958358 ping -c 5 www.bing.com
PING cn-0001.
64 bytes from 202.89.233.100 (202.89.233.100): icmp_seq=1 ttl=115 time=31.6 ms
64 bytes from 202.89.233.100 (202.89.233.100): icmp_seq=2 ttl=115 time=31.5 ms
64 bytes from 202.89.233.100 (202.89.233.100): icmp_seq=3 ttl=115 time=31.4 ms
64 bytes from 202.89.233.100 (202.89.233.100): icmp_seq=4 ttl=115 time=32.4 ms
64 bytes from 202.89.233.100 (202.89.233.100): icmp_seq=5 ttl=115 time=31.8 ms
--- cn-0001.
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 31.490/
[root@controller neutron]# ip netns
qrouter-
qdhcp-fca97929-
qdhcp-c08c44ed-
Every virtual machine on OpenStack runs perfectly (on both self-service and provider networks, with Internet access). But one day I logged in to the compute node and found that linuxbridge.log had grown very large (over 300 MB). I used grep to filter out most of the INFO messages and found a large number of ERROR entries like the one below. This is the only type of error, and it recurs at a fixed interval:
2017-12-13 17:23:16.030 1334 INFO neutron.
-bf45-4817-
f-7451-
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt [req-7fbf230a-
oving port tapa97e89f4-2d: RemoteError: Remote error: AgentNotFoundBy
ent with agent_type=L3 agent and host=compute1 could not be found
[u'Traceback (most recent call last):\n', u' File "/usr/lib/
ages/oslo_
lf.dispatcher.
slo_messaging/
patch(endpoint, method, ctxt, args)\n', u' File "/usr/lib/
es/oslo_
nc(ctxt, **new_args)\n', u' File "/usr/lib/
ins/ml2/rpc.py", line 234, in update_
host)\n', u' File "/usr/lib/
, line 331, in notify_
ort_context)\n', u' File "/usr/lib/
drivers/
agent_host, [port[\
ges/neutron/
context, constants.
e-packages/
host=host)\n', u'AgentNotFound
host=compute1 could not be found\n'].
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt Traceback (most recent call last):
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt File "/usr/lib/
ommon_agent.py", line 336, in treat_devices_
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt cfg.CONF.host)
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt File "/usr/lib/
update_device_down
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt agent_id=agent_id, host=host)
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt File "/usr/lib/
call
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt return self._original_
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt File "/usr/lib/
169, in call
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt retry=self.retry)
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt File "/usr/lib/
23, in _send
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt timeout=timeout, retry=retry)
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt File "/usr/lib/
y", line 578, in send
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt retry=retry)
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt File "/usr/lib/
y", line 569, in _send
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt raise result
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt RemoteError: Remote error: AgentNotFoundBy
gent and host=compute1 could not be found
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt [u'Traceback (most recent call last):\n', u' File "/usr/lib/
ackages/
self.dispatche
s/oslo_
dispatch(endpoint, method, ctxt, args)\n', u' File "/usr/lib/
kages/oslo_
func(ctxt, **new_args)\n', u' File "/usr/lib/
lugins/ml2/rpc.py", line 234, in update_
N, host)\n', u' File "/usr/lib/
py", line 331, in notify_
n(port_context)\n', u' File "/usr/lib/
l2/drivers/
xt, agent_host, [port[\
ckages/
t\n context, constants.
site-packages/
\n host=host)\n', u'AgentNotFound
nd host=compute1 could not be found\n'].
2017-12-13 17:23:16.172 1334 ERROR neutron.
nt
2017-12-13 17:23:16.174 1334 INFO neutron.
t [req-7fbf230a-
removed
2017-12-13 17:23:16.218 1334 INFO neutron.
t [req-7fbf230a-
ed.
2017-12-13 17:23:16.218 1334 INFO neutron.
t [req-7fbf230a-
removed
Here we see
"AgentNotFoundB
nd host=compute1 could not be found\n"
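To see whether the error always names the same agent type and host, the log can be scanned for the "agent_type=... and host=... could not be found" text visible in the excerpt above. This is a minimal sketch: the regular expression is derived only from that visible message fragment, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern taken from the message fragment in the log excerpt:
# "agent_type=L3 agent and host=compute1 could not be found"
MISSING_AGENT_RE = re.compile(
    r"agent_type=(?P<agent_type>.+?) and host=(?P<host>\S+) could not be found"
)


def missing_agents(lines):
    """Count (agent_type, host) pairs reported as not found in log lines."""
    counts = Counter()
    for line in lines:
        for match in MISSING_AGENT_RE.finditer(line):
            counts[(match.group("agent_type"), match.group("host"))] += 1
    return counts
```

Passing an open file object for linuxbridge.log as `lines` works, since the function only iterates over lines. A single key in the result would suggest that every error concerns the same agent type and host.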
So I went back to the compute node section of the Neutron configuration guide to check whether I had missed setting up an L3 agent on the compute node, but the guide never mentions installing an L3 agent there.
I searched for these keywords and found only a few posts about this problem. Most were about a DHCP agent not being found or about older OpenStack releases. Is this a bug in Neutron? What should I do to avoid it: install an L3 agent on the compute node, or just ignore the error, since all VMs run as usual?
-------
Release: 11.0.3.dev21 on 2017-12-11 21:40
SHA: 1caba2d2e0f5c97
Source: https:/
URL: https:/
summary:
- Networking Option 2: Self-service networks in neutron
+ NeutronServerTrytoFindL3agentOnComputeNodewhenWeUseLinuxBridge
Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
To me this looks more like an error in the L2 population mechanism driver, because the error message is, IMHO, returned to the Linux bridge agent from the server in response to RPC calls.
Can you also check what is in the neutron-server logs for the same time?
And do you have l2pop enabled on your cluster?
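One way to answer the l2pop question is to inspect the two settings that the install guide's self-service option configures: `mechanism_drivers` in the `[ml2]` section of ml2_conf.ini and `l2_population` in the `[vxlan]` section of linuxbridge_agent.ini. The sketch below assumes those option names and sections, and it reads the files with Python's `configparser`, which only approximates how oslo.config parses them. The function names are hypothetical.

```python
import configparser


def _parse(ini_text):
    # strict=False tolerates repeated options; interpolation is disabled
    # so that "%" characters in values are not interpreted.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return parser


def l2pop_in_mechanism_drivers(ml2_ini_text):
    """True if l2population is listed in [ml2] mechanism_drivers."""
    drivers = _parse(ml2_ini_text).get("ml2", "mechanism_drivers", fallback="")
    return "l2population" in [d.strip() for d in drivers.split(",")]


def l2pop_in_agent(agent_ini_text):
    """True if [vxlan] l2_population is set to a true value."""
    return _parse(agent_ini_text).getboolean(
        "vxlan", "l2_population", fallback=False
    )
```

Each function takes the file contents as a string, so the server-side file on the controller and the agent-side file on the compute node can be checked separately.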