HTTP Keep-alive connections prevent keystone from terminating

Bug #1408612 reported by Mark Goddard
This bug affects 1 person
Affects: OpenStack Identity (keystone)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Seen on RDO Juno, running on CentOS 7.

Steps to reproduce:

- Set admin_workers=1 and public_workers=1 in /etc/keystone/keystone.conf
- Start the keystone service: `systemctl start openstack-keystone`
- Start a 'persistent' TCP connection to keystone: `telnet localhost 5000 &`
- Stop the service: `systemctl stop openstack-keystone`

The final systemctl invocation hangs, as the process fails to terminate. Eventually systemd times out and forcefully kills the process.

Output of `systemctl status openstack-keystone`:

Jan 08 05:07:38 mgoddard systemd[1]: openstack-keystone.service stopping timed out. Killing.
Jan 08 05:07:38 mgoddard systemd[1]: openstack-keystone.service: main process exited, code=killed, status=9/KILL
Jan 08 05:07:38 mgoddard systemd[1]: Stopped OpenStack Identity Service.
Jan 08 05:07:38 mgoddard systemd[1]: Unit openstack-keystone.service entered failed state.

The use of telnet here is just to demonstrate the problem. The same effect can be seen when OpenStack services maintain persistent connections to keystone.
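For illustration, a minimal client that behaves like those services, holding an idle keep-alive connection open against the public port, might look like the sketch below (assuming Python 3's http.client; on Python 2 the equivalent module is httplib; the request path and sleep duration are arbitrary):

    import http.client
    import time

    # Open an HTTP/1.1 connection to keystone's public port; HTTP/1.1
    # defaults to keep-alive, so the socket stays open after the response.
    conn = http.client.HTTPConnection('localhost', 5000)
    conn.request('GET', '/')
    resp = conn.getresponse()
    resp.read()

    # Sit idle between requests -- the state described in the root cause
    # analysis below. `systemctl stop openstack-keystone` will hang until
    # this process closes the connection or exits.
    time.sleep(600)
    conn.close()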

With multiple worker processes, the issue is not observed. The suspicion (and this is speculation) is that systemd kills the parent process and then also kills the child processes holding the persistent connections, so the hang never appears.

When this issue was first observed, multiple workers were in use and the service was managed by init scripts in /etc/init.d/ rather than systemd. In that case the result was worse: `service openstack-keystone stop` would exit successfully but fail to terminate any child processes with persistent HTTP connections open, and subsequent attempts to start the keystone service would fail because of the lingering stale process.

During the investigation, some root cause analysis was performed; the findings are presented below.

- When a keystone process receives SIGTERM, it ends up waiting for all greenthreads in the greenpool to finish at https://github.com/eventlet/eventlet/blob/8d2474197de4827a7bca9c33e71a82573b6fc721/eventlet/wsgi.py#L267.
- Between HTTP requests, persistent connections end up waiting at https://github.com/eventlet/eventlet/blob/8d2474197de4827a7bca9c33e71a82573b6fc721/eventlet/wsgi.py#L267 for the next request. The greenthread will not terminate until the connection is closed.

The process will therefore not terminate until all connections have closed. It seems sensible to me to finish servicing individual requests for a graceful shutdown, but there needs to be a mechanism to close persistent connections between requests.
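To make the root cause concrete outside of keystone, a bare eventlet.wsgi server shows the same behaviour; the sketch below is illustrative (the app, port, and pool size are arbitrary), but custom_pool and keepalive are real wsgi.server arguments:

    import eventlet
    from eventlet import wsgi

    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'hello\n']

    pool = eventlet.GreenPool(100)
    sock = eventlet.listen(('127.0.0.1', 5000))

    # wsgi.server() spawns one greenthread per accepted connection from
    # `pool`. With keepalive=True (the default) that greenthread lives on
    # between requests, waiting for the next one. When the accept loop is
    # interrupted, the server falls through to pool.waitall(), which blocks
    # until every connection greenthread has finished -- i.e. until every
    # persistent connection has been closed.
    wsgi.server(sock, app, custom_pool=pool, keepalive=True)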

This issue could (should?) be solved in eventlet.wsgi by a mechanism to trigger disconnection of persistent connections between requests when the server is stopped.
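In the meantime, the bluntest workaround at the eventlet level is to disable keep-alive entirely, so no greenthread ever lingers between requests. I am not sure whether keystone exposes this as a configuration option in Juno, so the sketch below only shows the raw eventlet call with an arbitrary toy application:

    import eventlet
    from eventlet import wsgi

    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'hello\n']

    # With keepalive=False the handler greenthread exits as soon as the
    # current request completes, so the server's pool.waitall() on shutdown
    # returns once in-flight requests drain, rather than waiting on idle
    # connections.
    wsgi.server(eventlet.listen(('127.0.0.1', 5000)), app, keepalive=False)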

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

This is related to a known bug in greenlet/eventlet. The general solution is either to disable keepalives or to deploy under Apache. I will mark this as a duplicate of the larger eventlet bug, but in short there is relatively little that can be done. The answer is: do not deploy keystone under eventlet.
