keystone shuts down its own listening socket with 'too many open files'

Bug #1179709 reported by Robert Collins
This bug affects 5 people
Affects                          Status  Importance  Assigned to  Milestone
OpenStack Dashboard (Horizon)    New     Undecided   Unassigned
OpenStack Identity (keystone)    New     High        Unassigned
python-keystoneclient            New     Undecided   Unassigned

Bug Description

We're running with mysql - I can grab a rendered config file for you if needed.

Here is what I see in the log:
Traceback (most recent call last):
  File "/opt/stack/venvs/keystone/local/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 97, in wait
    readers.get(fileno, noop).cb(fileno)
  File "/opt/stack/venvs/keystone/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 194, in main
    result = function(*args, **kwargs)
  File "/opt/stack/venvs/keystone/local/lib/python2.7/site-packages/keystone/common/wsgi.py", line 135, in _run
    log=WritableLogger(log))
  File "/opt/stack/venvs/keystone/local/lib/python2.7/site-packages/eventlet/wsgi.py", line 663, in server
    client_socket = sock.accept()
  File "/opt/stack/venvs/keystone/local/lib/python2.7/site-packages/eventlet/greenio.py", line 166, in accept
    res = socket_accept(fd)
  File "/opt/stack/venvs/keystone/local/lib/python2.7/site-packages/eventlet/greenio.py", line 56, in socket_accept
    return descriptor.accept()
  File "/usr/lib/python2.7/socket.py", line 202, in accept
    sock, addr = self._sock.accept()
error: [Errno 24] Too many open files
Removing descriptor: 5

keystone-all is still running, netstat -anp shows:
tcp 0 0 0.0.0.0:35357 0.0.0.0:* LISTEN 25530/python
unix 3 [ ] STREAM CONNECTED 44834812 25530/python

restarting it gets me:
root@ubuntu:~# netstat -anp | grep 25267
tcp 0 0 0.0.0.0:35357 0.0.0.0:* LISTEN 25267/python
tcp 0 0 0.0.0.0:5000 0.0.0.0:* LISTEN 25267/python
unix 3 [ ] STREAM CONNECTED 44848565 25267/python

Which is rather more useful.

Revision history for this message
Robert Collins (lifeless) wrote :

Oh, and this happened on an idle cloud sitting there with no requests being sent to it...

Dolph Mathews (dolph)
Changed in keystone:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Matthieu Huin (mhu-s) wrote :

Hello,

What is the output of the following commands:

cat /proc/sys/fs/file-max
ulimit -Sn
ulimit -Hn

Your system might be undersized in terms of file descriptors that can be opened.
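
For reference, the same limits can also be read from inside the keystone process itself with Python's standard resource module; a quick sketch, not part of the original report:

import resource

# Per-process soft and hard limits on open file descriptors, matching
# what `ulimit -Sn` / `ulimit -Hn` report for that process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft=%d hard=%d" % (soft, hard))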

Revision history for this message
Robert Collins (lifeless) wrote :

Well, possibly, but even so - closing *other* FDs should happen first, no?
cat /proc/sys/fs/file-max
8196860
root@foo:~# ulimit -Sn
1024
root@foo:~# ulimit -Hn
4096

Seems like oodles of fd's for a one-node bare metal cloud with approximately no load.

Dolph Mathews (dolph)
tags: added: grizzly-backport-potential
Revision history for this message
Matthieu Huin (mhu-s) wrote :

Robert, you might want to manually increase the file-max parameter; see for example http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/

Revision history for this message
David Stanek (dstanek) wrote :

Can you use lsof to get a list of the open file handles? That will help us to understand if there is a file descriptor leak.
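
A rough way to watch for a leak over time, assuming a Linux host and the keystone-all PID seen in the netstat output above, is to count the entries under /proc/<pid>/fd; a sketch only, lsof still gives the most detail:

import os
import time

PID = 25530  # keystone-all PID from the netstat output above; adjust for your host

# Print the number of open file descriptors once a minute; a count that grows
# steadily on an idle cloud points at a descriptor leak rather than at an
# undersized ulimit.
while True:
    count = len(os.listdir('/proc/%d/fd' % PID))
    print('%s %d' % (time.strftime('%H:%M:%S'), count))
    time.sleep(60)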

Revision history for this message
David Stanek (dstanek) wrote :

In chat on 10/3 Robert mentioned that he hasn't hit this issue in months.

Revision history for this message
Adam Young (ayoung) wrote :

Leaving in the Incomplete state until it reappears or we decide to close it.

Changed in keystone:
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Keystone because there has been no activity for 60 days.]

Changed in keystone:
status: Incomplete → Expired
Revision history for this message
John Griffith (john-griffith) wrote :

Seems we hit this today on a Havana build. Nothing really going on with the system... went to dinner, a user came back and couldn't log in. I logged on to the controller and noticed tracebacks in keystone; restarting the services got us back in business, but I'm not sure what's actually causing this.

Changed in keystone:
status: Expired → New
Revision history for this message
Florent Flament (florentflament) wrote :

It seems that there may be some socket leakage with Apache / Horizon. After some investigation playing with lsof, I found out that apache2 keeps establishing new connections towards Keystone and other services without closing them.

On my devstack, there aren't any spawned VMs, nor any other resources in use. lsof tells me that there are 25 TCP connections established from apache2 to Keystone and 109 connections from apache2 to nova-api.

$ sudo lsof -a -c apache2 -i :5000 | wc -l
25
$ sudo lsof -a -c apache2 -i :8774 | wc -l
109

After launching a VM and destroying it through the Horizon dashboard, lsof reports 29 TCP connections established between apache2 and Keystone and 128 between apache2 and nova-api.

$ sudo lsof -a -c apache2 -i :5000 | wc -l
29
$ sudo lsof -a -c apache2 -i :8774 | wc -l
128

It looks like Horizon opens sockets that it doesn't close. At some point, there may be too many sockets open on the Horizon server and/or Keystone and/or any other OpenStack service.

I didn't observe the issue by using the CLI.

I guess that further investigation would be required into how Horizon manages open connections.

Revision history for this message
mouadino (mouadino) wrote :

Basically what happens is the following:

For each request sent by keystoneclient to the server, the client creates a new requests.Session (python-requests). Because python-requests Sessions use a connection pool under the hood, together with HTTP 1.1 keep-alive, these connections don't get closed after each request; instead they are kept open for re-use (which is great). The problem is that they will never be re-used, because, as I said, keystoneclient creates a requests.Session (a connection pool) for each request. So you end up with a lot of open sockets that neither the server will close (because the keystone server has no keep-alive timeout, which is bad) nor the client (at least until the Python GC reclaims them).

So at some point you reach the server's open-file limit and it just hangs there until restarted.

I did propose a fix for keystoneclient that no one liked, but for people who use apache/nginx as a reverse proxy in front of the keystone server, I was thinking they should be able to configure the keep-alive timeout so that these idle connections get closed. That's a hack, though; the real fix IMHO is to fix keystoneclient, not horizon nor nova ... .
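
A minimal sketch of the pattern described above (hypothetical code, not the actual keystoneclient source): building a fresh requests.Session per call strands that session's pooled keep-alive socket, while re-using a single Session keeps the descriptor count bounded.

import requests

KEYSTONE_URL = 'http://localhost:35357/v2.0'  # example endpoint, adjust as needed

def leaky_get(path):
    # A new Session (and thus a new connection pool) per request: the
    # keep-alive socket it opens is never re-used and only goes away once
    # the garbage collector reclaims the pool.
    session = requests.Session()
    return session.get(KEYSTONE_URL + path)

# Re-using one Session lets the pool recycle its keep-alive sockets, so the
# number of open descriptors on both client and server stays bounded.
shared = requests.Session()

def pooled_get(path):
    return shared.get(KEYSTONE_URL + path)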

HTH,

Revision history for this message
Julie Pichon (jpichon) wrote :

Bug 1247056, which handled a similar issue with the nova client, may also be of interest.

Revision history for this message
Florent Flament (florentflament) wrote :

mouadino, julie, nice catch.

Looks like this bug is a duplicate of bug 1232089 ( https://bugs.launchpad.net/python-keystoneclient/+bug/1282089 ).

Revision history for this message
Florent Flament (florentflament) wrote :

Sorry, wanted to say bug 1282089.
