[RFE]Neutron API server: unexpected behavior with multiple long live clients

Bug #1800599 reported by Le, Huifeng
Affects: neutron | Status: Fix Released | Importance: Medium | Assigned to: Unassigned | Milestone: (none)

Bug Description

High level description:
The current OpenStack API server uses the eventlet.wsgi.server implementation. By default, eventlet.wsgi.server calls accept() before it knows whether a greenthread is available in the pool to service that socket. If all socket connections are short-lived this is not an issue, because a greenthread will eventually become available and the request will be serviced (hopefully before the client times out waiting).

In some real-world scenarios, however, such as the deployment stage of a large system, many compute nodes hold long-lived connections from nova-compute to the neutron API. This leads to the unexpected behavior described below:

1. Single neutron server case:
If the neutron server has all of its greenthreads tied up on open sockets when one more connection request arrives, the server still calls accept() but never hands the connection to a worker thread, so the client times out only after a long wait (e.g. CONF.client_socket_timeout).

Expected behavior: return a quick TCP connect timeout if no processing thread is available.

2. Multiple neutron server case (e.g. cfg.CONF.api_workers>1 or cpu_count>1):
In this case several neutron server child processes wait for client requests (e.g. by calling accept() on the same socket). If the Linux kernel hands a client connection to one child process's accept() while all of that process's greenthreads are tied up on open sockets, the client times out after a long wait. At that moment other neutron child processes may still have greenthreads available, but they get no opportunity to process the request because it has already been accepted by the first child process.

Expected behavior: the request is processed if any neutron server process has an available greenthread; otherwise a quick TCP connect timeout is returned.

Version: latest devstack

Potential solution: implement a custom pool for wsgi.server that blocks the spawn_n call (e.g. via sem.acquire()) so that accept() is not called until a worker greenthread is available.
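
The idea of reserving a worker before calling accept() can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain OS threads and the standard library rather than eventlet greenthreads, and the names BoundedPool, wait_for_slot and serve are hypothetical, not part of eventlet or neutron. While a process is blocked waiting for a slot it does not call accept(), so pending connections stay in the kernel's listen queue, where another process sharing the same listening socket can still pick them up.

```python
import socket
import threading


class BoundedPool:
    """Hypothetical pool: a slot is reserved *before* accept() is called."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def wait_for_slot(self, timeout=None):
        # Block until a worker slot is free; return False if the wait times out.
        return self._slots.acquire(timeout=timeout)

    def spawn_n(self, func, *args):
        # The caller must already hold a slot obtained from wait_for_slot().
        def run():
            try:
                func(*args)
            finally:
                # Free the slot once the handler is done with its connection.
                self._slots.release()

        threading.Thread(target=run, daemon=True).start()


def serve(listener: socket.socket, pool: BoundedPool, handler):
    """Accept loop that never accepts a connection it cannot service."""
    while True:
        pool.wait_for_slot()              # blocks while every worker is busy
        conn, _addr = listener.accept()   # only reached with a free worker
        pool.spawn_n(handler, conn)
```

With a pool of size N, a process running serve() holds at most N accepted connections at a time; an eventlet-based version would presumably use a green semaphore in the same role.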

Tags: api
zhaobo (zhaobo6) wrote :

This may be a new RFE for Neutron. To be honest, I'm not familiar with what you describe; we can discuss this in the weekly driver meeting.

summary: - Neutron API server: unexpected behavior with multiple long live clients
+ [RFE]Neutron API server: unexpected behavior with multiple long live
+ clients
tags: added: rfe
Miguel Lavalle (minsel)
Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
Miguel Lavalle (minsel) wrote :

Per the conversation we had during the latest weekly meeting (http://eavesdrop.openstack.org/meetings/networking/2018/networking.2018-11-06-14.00.log.html#l-135), Neutron server can be run with Apache WSGI integration. You mentioned during the meeting that you could verify such a deployment. This is the documentation that we have: https://docs.openstack.org/neutron/latest/admin/config-wsgi.html. Are you going to try it?

Miguel Lavalle (minsel) wrote :

Since Apache WSGI integration is available, let's treat this as a normal bug.

Miguel Lavalle (minsel) wrote :

You may also find this useful: https://review.openstack.org/#/c/580049/

tags: added: api
removed: rfe
Le, Huifeng (hle2) wrote :

Thanks very much for the comments; I will run the test with that configuration.

Le, Huifeng (hle2) wrote :

Miguel,
Thanks very much for the instructions on enabling the new uwsgi-based web server for neutron. Following the steps, I verified that the new uwsgi-based web server solves this issue.

I am using uwsgi (Neutron API behind uwsgi) for the test, and the original wiki steps seem to be missing one configuration option, which enables an HTTP socket listening on port 9696:
[uwsgi]
...
http-socket = :9696

Could you please help check whether this is the correct configuration?

Is there any roadmap or plan to switch Neutron to the new uwsgi-based web server?

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Bug closed due to lack of activity, please feel free to reopen if needed.

Changed in neutron:
status: Confirmed → Won't Fix
status: Won't Fix → Fix Released
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

My bad, this bug is solved.
