Eventlet fails when starting network agents
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Incomplete | Undecided | Unassigned |
Bug Description
I have a two-node OpenStack setup, where one node (w3) runs all controller services (keystone, glance, placement, nova, neutron, horizon, cinder) as well as nova-compute and cinder-volume, and the second (w6) runs nova-compute and the linuxbridge agent.
All network agents on w3 are dead
[root@w3 ~]# openstack network agent list
+------
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+------
| 330269b7-
| 83d16241-
| a52ab60f-
| abd75644-
| c05c65bc-
+------
, and I cannot start them anymore. I tried restarting the agent alone, restarting all OpenStack daemons on w3, and even rebooting the whole node, but nothing helps: I always get the same issue and the same trace, as shown below.
I could not find any useful info in the logs, but systemd does report an issue with eventlet/greenlet:
[root@w3 ~]# journalctl -fu neutron-
-- Logs begin at Wed 2022-03-16 04:32:31 EDT. --
Mar 16 09:14:09 w3.int.lunarc sudo[37085]: neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=
Mar 16 09:14:12 w3.int.lunarc sudo[37107]: neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=
Mar 16 09:14:15 w3.int.lunarc neutron-
Mar 16 09:14:15 w3.int.lunarc neutron-
Mar 16 09:14:15 w3.int.lunarc neutron-
Mar 16 09:14:15 w3.int.lunarc neutron-
Mar 16 09:14:15 w3.int.lunarc neutron-
Mar 16 09:14:15 w3.int.lunarc neutron-
Mar 16 09:14:15 w3.int.lunarc neutron-
Mar 16 09:14:15 w3.int.lunarc neutron-
I am running OpenStack Xena on a freshly installed CentOS Stream 8. Here are other details:
[root@w3 ~]# uname -a
Linux w3.int.lunarc 4.18.0-
Any clue on how I can find out what makes this happen, or just how I can get past this crippling greenlet/eventlet error, and get these agents to run again?
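Because the trace points at eventlet/greenlet, one low-risk first step is to check which versions of those two packages the agent's Python interpreter actually resolves. The following is a minimal sketch, not an official Neutron tool: the helper names `installed_version` and `report_versions` are hypothetical, and it assumes the script is run with the same interpreter the agent uses (for example the system `python3`).

```python
import sys


def installed_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Interpreters older than Python 3.8 lack importlib.metadata;
        # fall back to setuptools' pkg_resources (assumed to be installed).
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def report_versions(names=("eventlet", "greenlet")):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    print("python:", sys.version.split()[0], "at", sys.executable)
    for name, ver in report_versions().items():
        print("{}: {}".format(name, ver or "NOT INSTALLED"))
```

Comparing the reported versions against what the distribution's Xena packages expect may show whether a stray pip-installed copy is shadowing the packaged one.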
Based on the information at hand, I don't think this is a bug. The upstream Neutron team tests the linuxbridge agent with every patch that is submitted for the stable/xena branch: https://review.opendev.org/q/project:openstack%252Fneutron+branch:stable%252Fxena+status:open . One example is https://review.opendev.org/c/openstack/neutron/+/833857. Note that we have a job named neutron-tempest-plugin-scenario-linuxbridge-xena. Here's the successful execution of that job for the aforementioned patch https://zuul.opendev.org/t/openstack/build/1ee2736adf46480ea9d48f125b5cd229 and the corresponding linuxbridge agent log https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1ee/833857/1/check/neutron-tempest-plugin-scenario-linuxbridge-xena/1ee2736/controller/logs/screen-q-agt.txt . As you can see, that agent started running today with no problems: "Logs begin at Wed 2022-03-16 10:05:56 UTC".
As for thoughts on what may be causing your problem, I lean towards an installation / dependencies issue:
1) Is this a fresh install, or did it run successfully before and start failing recently? If the latter, what changed?
2) Your logging seems a little odd. Note that the third line in the upstream log is: Mar 16 10:19:43.410735 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 neutron-linuxbridge-agent[71074]: INFO neutron.common.config [-] Logging enabled! In your case, logging doesn't seem to be enabled at all. The second thing the agent does is set up logging: https://github.com/openstack/neutron/blob/2f4661c87681567bb08d7733c723c2b0c31ed6c8/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py#L1017
3) Next, the agent sets up privsep (to be able to execute commands with privileges): https://github.com/openstack/neutron/blob/2f4661c87681567bb08d7733c723c2b0c31ed6c8/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py#L1018. It seems this is where your agent fails, because we see this line in your log twice:
Mar 16 09:14:09 w3.int.lunarc sudo[37085]: neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap /etc/neutron/rootwrap.conf privsep-helper --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/linuxbridge_agent.ini --config-dir /etc/neutron/conf.d/neutron-linuxbridge-agent --privsep_context neutron.privileged.default --privsep_sock_path /tmp/tmp93tzwqg3/privsep.sock
and then the traceback. Comparing with the upstream agent, we see that the privsep daemon starts running successfully:
Mar 16 10:19:43.421615 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 sudo[71765]: stack : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf privsep-helper --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --privsep_context neutron.privileged.default --privsep_sock_path /tmp/tmppedn6sui/privsep.sock
Mar 16 10:19:43.421948 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0028853094 sudo[71765]: pam_unix(sudo:session): session opened for user root by (uid=0)
Mar 16 10:19:43.898488 nested-virt-...
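To compare the failing and the working privsep-helper invocations more systematically, it can help to pull the command out of each sudo journal line and list the configuration files it was given. This is only a sketch under stated assumptions: `extract_sudo_command` and `config_files` are hypothetical helper names, and the parsing assumes the `COMMAND=` field is the last field on the line, as in the sudo entries quoted in this report.

```python
import shlex


def extract_sudo_command(journal_line):
    """Return the argv list that follows 'COMMAND=' in a sudo log line, or None."""
    _, marker, command = journal_line.partition("COMMAND=")
    if not marker:
        return None
    return shlex.split(command.strip())


def config_files(argv):
    """Collect, in order, the values passed via --config-file."""
    return [
        argv[i + 1]
        for i, arg in enumerate(argv[:-1])
        if arg == "--config-file"
    ]


if __name__ == "__main__":
    # Sample line taken from the failing agent's journal in this report.
    line = (
        "Mar 16 09:14:09 w3.int.lunarc sudo[37085]: neutron : TTY=unknown ; "
        "PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap "
        "/etc/neutron/rootwrap.conf privsep-helper "
        "--config-file /usr/share/neutron/neutron-dist.conf "
        "--config-file /etc/neutron/neutron.conf "
        "--config-file /etc/neutron/plugins/ml2/linuxbridge_agent.ini "
        "--config-dir /etc/neutron/conf.d/neutron-linuxbridge-agent "
        "--privsep_context neutron.privileged.default "
        "--privsep_sock_path /tmp/tmp93tzwqg3/privsep.sock"
    )
    argv = extract_sudo_command(line)
    for path in config_files(argv):
        print(path)
```

Running the same helpers on the upstream line makes the difference in `--config-file` arguments between the two setups easy to see side by side.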