Too many connections to nova-api (and not cleaning up)?

Bug #1247056 reported by Marios Andreou on 2013-11-01
This bug affects 10 people

Affects / Importance / Assigned to:
- OpenStack Dashboard (Horizon): High, Matthias Runge
- OpenStack Dashboard (Horizon), Havana series: High, Matthias Runge
- python-novaclient: High, Tihomir Trifonov

Bug Description

We hit this bug while doing a tripleo/tuskar provision against 7 baremetal machines. Basically, after everything had been up and running for a while, while using the Horizon UI to view the active instances (the 7 baremetal machines that were provisioned as nova compute nodes), Horizon threw an error complaining about too many open files.

[stack@ucl-control-live ~]$ sudo lsof -i :8774 | wc -l
2073

Restarting openstack-nova-api closed them all (the connections moved into FIN_WAIT2 / CLOSE_WAIT).

I was able to recreate this on a more 'standard' setup with devstack. To recreate:

1. Run devstack
2. Monitor connections to nova-api in a terminal: while true; do sudo lsof -i :8774; date; sleep 2; done
   At this point the output for me held steady at 10.

3. Log into Horizon and Launch an instance, or two.
4. In Horizon, alternate between "Project-->Overview" and "Project-->Instances"
5. Watch the output from lsof. In a short time I got this up to 150+. Leaving everything idle (doing nothing more anywhere), the connections hang around (all in ESTABLISHED state). In fact, they hang around even after I log out of Horizon.
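The numbers above are consistent with each API call building a brand-new HTTP session, each with its own keep-alive connection pool. A toy model of that pattern (pure illustration, not novaclient code; FakePool, leaky_calls and pooled_calls are invented names):

```python
class FakePool:
    """Stands in for an HTTP connection pool with keep-alive."""

    def __init__(self):
        self.connections = 0

    def request(self):
        # A pool opens a connection on first use and then keeps it
        # alive (ESTABLISHED) for later reuse.
        if self.connections == 0:
            self.connections = 1


def leaky_calls(n):
    # The pattern behind this bug: a fresh session (and pool) per call.
    # Every pool parks one connection that is never reused.
    pools = [FakePool() for _ in range(n)]
    for p in pools:
        p.request()
    return sum(p.connections for p in pools)


def pooled_calls(n):
    # Reusing one session/pool: the same connection serves every call.
    pool = FakePool()
    for _ in range(n):
        pool.request()
    return pool.connections
```

With one pool per call, 150 calls park 150 open connections, matching the lsof counts above; with a shared pool, one connection serves them all.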

Is this expected behaviour?

thanks, marios

David Lyle (david-lyle) on 2013-12-29
Changed in horizon:
status: New → Confirmed
importance: Undecided → Medium
Matthias Runge (mrunge) wrote :

I can reproduce this on a Havana and also on an Icehouse install.

During the tests, I got ~1700 ESTABLISHED connections,

and if it errors out, it's on the page /project/overview.

Currently I'm a bit concerned that cloud providers may have already hit this too.

Changed in horizon:
importance: Medium → High
milestone: none → icehouse-3
Changed in horizon:
assignee: nobody → Tihomir Trifonov (ttrifonov)
Tihomir Trifonov (ttrifonov) wrote :

It seems Jenkins sometimes doesn't auto-update these issues. Here is the proposed patch:

https://review.openstack.org/#/c/75196

Changed in horizon:
status: Confirmed → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/76164

Changed in horizon:
assignee: Tihomir Trifonov (ttrifonov) → Matthias Runge (mrunge)
Matthias Runge (mrunge) on 2014-02-25
Changed in python-novaclient:
status: New → In Progress
Changed in python-novaclient:
assignee: nobody → Tihomir Trifonov (ttrifonov)

Reviewed: https://review.openstack.org/76164
Committed: https://git.openstack.org/cgit/openstack/horizon/commit/?id=ddc479272f5402ff778c45892acc3ac7613b7c11
Submitter: Jenkins
Branch: master

commit ddc479272f5402ff778c45892acc3ac7613b7c11
Author: Matthias Runge <email address hidden>
Date: Tue Feb 25 12:25:13 2014 +0100

    Reduce number of novaclient calls

    Currently, each client creates a new session for
    each call. This fix makes novaclient re-use
    sessions in the most obvious cases.

    Partial-Bug: #1247056

    Change-Id: Ie99ecb66304cf40e4f5fdd31fab5162ed11b863e
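The idea behind this Horizon-side fix can be sketched as caching the constructed client for the lifetime of the Django request, so every API helper invoked while rendering one page reuses a single client (and thus a single session). This is a hedged sketch, not Horizon's actual code; novaclient_cached, make_client and the _nova_client attribute are illustrative names:

```python
def novaclient_cached(request, make_client):
    """Return a nova client cached on the request object (sketch).

    make_client is whatever factory builds a real client; the point is
    only that it runs once per request instead of once per API call.
    """
    client = getattr(request, "_nova_client", None)
    if client is None:
        client = make_client(request)   # expensive: new session/pool
        request._nova_client = client   # reuse for this request's lifetime
    return client
```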

Matthias Runge (mrunge) on 2014-02-27
tags: added: havana-backport-potential

Reviewed: https://review.openstack.org/75196
Committed: https://git.openstack.org/cgit/openstack/python-novaclient/commit/?id=36db3b95f556d5f57a2bf49303b24a0b25b4b7e8
Submitter: Jenkins
Branch: master

commit 36db3b95f556d5f57a2bf49303b24a0b25b4b7e8
Author: Tihomir Trifonov <email address hidden>
Date: Thu Feb 20 23:11:34 2014 +0200

    Fix in novaclient, to avoid excessive conns

    The current client creates a new .Session() on each request,
    but since Horizon is a stateless app, each Session creates a
    new HttpAdapter, which itself has its own connection pool.
    Each connection there is used (almost) once and is then kept
    in the pool (with Keep-Alive) for a certain amount of time,
    waiting for the inactivity timeout. The problem is that the
    connection cannot be reused by subsequent Django calls - they
    create a new connection pool with new connections, and so on.
    This keeps lots of open connections on the server.

    Now the client will store an HTTPAdapter for each URL in
    a singleton object, and will reuse its connections between
    Django calls, while still taking advantage of Sessions during
    a single page load (although we do not fully use this).

    Note: the default pool behavior is non-blocking, which means
    that if the max_pool_size is reached, a new connection will
    still be opened, and when released - will be discarded.
    It could be useful to add max_pool_size param into settings,
    for performance fine-tuning. The default max_pool_size is 10.

    Since python-novaclient is also used from non-Django projects,
    I'd expect feedback from more people on the impact this change
    could have over other projects.

    Patch Set 3: Removed explicit connection closing, leaving
    connections open in the pool.

    Change-Id: Icc9dc2fa2863d0e0e26a86c8180f2e0fbcd1fcff
    Closes-Bug: #1247056
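A minimal sketch of the singleton-adapter idea described in the commit message, assuming invented names (_ADAPTERS, get_adapter and PooledAdapter stand in for the real requests.adapters.HTTPAdapter machinery):

```python
import threading

# Process-wide cache: one adapter (and therefore one connection pool)
# per endpoint URL, shared across all sessions and Django requests.
_ADAPTERS = {}
_LOCK = threading.Lock()


class PooledAdapter:
    """Stand-in for an HTTP adapter that owns a keep-alive pool."""

    def __init__(self, url):
        self.url = url


def get_adapter(url):
    # Return the shared adapter for this URL, creating it exactly once.
    adapter = _ADAPTERS.get(url)
    if adapter is None:
        with _LOCK:
            # setdefault keeps the first adapter if another thread won.
            adapter = _ADAPTERS.setdefault(url, PooledAdapter(url))
    return adapter
```

Because every new session would be handed the same adapter for a given URL, its pooled connections survive across otherwise stateless calls.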

Changed in python-novaclient:
status: In Progress → Fix Committed
melanie witt (melwitt) wrote :

novaclient 2.16.0 released on 2/26/2014

Changed in python-novaclient:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2014-03-05
Changed in horizon:
milestone: icehouse-3 → icehouse-rc1
Matthias Runge (mrunge) wrote :

Hmm, this was released before the Icehouse-3 release (merged to master on Feb 25th)

Changed in horizon:
status: In Progress → Fix Released
milestone: icehouse-rc1 → ongoing
milestone: ongoing → none
milestone: none → icehouse-rc1
Thierry Carrez (ttx) on 2014-03-17
Changed in horizon:
milestone: icehouse-rc1 → icehouse-3
Julie Pichon (jpichon) wrote :

For reference: a similar issue is open against the keystone client + Horizon, see bug 1282089.

Julie Pichon (jpichon) wrote :

FYI: There is an open review to revert the nova client changes or at least have them not be the default, at https://review.openstack.org/#/c/83041/

Alan Pevec (apevec) on 2014-03-27
Changed in python-novaclient:
importance: Undecided → High
tags: removed: havana-backport-potential

Reviewed: https://review.openstack.org/76795
Committed: https://git.openstack.org/cgit/openstack/horizon/commit/?id=cf181c1928f263c95ee5e46b168838e95c587881
Submitter: Jenkins
Branch: stable/havana

commit cf181c1928f263c95ee5e46b168838e95c587881
Author: Matthias Runge <email address hidden>
Date: Tue Feb 25 12:25:13 2014 +0100

    Reduce number of novaclient calls

    Currently, each client creates a new session for
    each call. This fix makes novaclient re-use
    sessions in the most obvious cases.

    Conflicts:
     openstack_dashboard/api/nova.py

    Partial-Bug: #1247056

    Change-Id: Ie99ecb66304cf40e4f5fdd31fab5162ed11b863e
    (cherry picked from commit ddc479272f5402ff778c45892acc3ac7613b7c11)

tags: added: in-stable-havana

That patch does not help!

How to reproduce:

1. Run watch 'netstat -np|grep apa|wc -l' on the dashboard host (in a separate terminal)
2. Go to the dashboard
3. Create ten instances and wait until they are up
4. Destroy them

Repeat steps 3-4 as necessary.

Expected result: the number of connections stays stable.
Actual result: on every iteration of steps 3-4, the number of connections rises (+100 active connections).

Reviewed: https://review.openstack.org/83041
Committed: https://git.openstack.org/cgit/openstack/python-novaclient/commit/?id=98934d7bf1464afe0f7fe98efd2a591d95ac9c41
Submitter: Jenkins
Branch: master

commit 98934d7bf1464afe0f7fe98efd2a591d95ac9c41
Author: Boris Pavlovic <email address hidden>
Date: Wed Mar 26 15:22:03 2014 +0400

    Fix session handling in novaclient

    Prior to this patch, novaclient was handling sessions in an inconsistent
    manner.

    Every time we created a client instance, it would use a global
    connection pool, which made it difficult to use in a process that is
    meant to be forked.

    Obviously sessions like the ones provided by the requests library that
    will automatically cause connections to be kept alive should not be
    implicit. This patch moves the novaclient back to the age of a single
    session-less request call by default, but also adds two more
    resource-reuse friendly options that a user needs to be explicit about.

    The first one is that both v1_1 and v3 clients can now be used as
    context managers, where the session will be kept open (and thus the
    connection kept alive) for the duration of the with block. This is
    far better suited to a web-worker use case, as the session can be
    made request-long.

    The second one is the per-instance session. This is very similar to
    what we had up until now, except it is not a global object, so forking
    is possible as long as each child instantiates its own client. The
    session, once created, will be kept open for the duration of the
    client object's lifetime.

    Please note: client instances are not thread safe. As the forking
    example above shows, if you wish to use threading/multiprocessing,
    you *must not* share client instances.

    DocImpact

    Related-bug: #1247056
    Closes-Bug: #1297796
    Co-authored-by: Nikola Dipanov <email address hidden>
    Change-Id: Id59e48f61bb3f3c6223302355c849e1e99673410
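The two explicit reuse modes the commit describes can be sketched roughly like this (Client here is purely illustrative, not the real novaclient class):

```python
class Client:
    """Sketch of a client with opt-in session reuse."""

    def __init__(self):
        # Default behaviour: no persistent session, i.e. a plain
        # session-less request per call.
        self._session = None

    def __enter__(self):
        # Context-manager mode: a keep-alive session lives for the
        # duration of the with block (request-long in a web worker).
        self._session = object()
        return self

    def __exit__(self, *exc):
        # Close the session when the block ends.
        self._session = None
        return False

    def open_session(self):
        # Per-instance mode: the session lives as long as this client
        # object, and is not shared globally, so forking stays safe.
        self._session = object()
```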

Thierry Carrez (ttx) on 2014-04-17
Changed in horizon:
milestone: icehouse-3 → 2014.1
Alan Pevec (apevec) on 2014-04-22
tags: removed: in-stable-havana
Dražen Lučanin (kermit666) wrote :

I seem to be experiencing this bug in Icehouse (installed in Ubuntu 14.04, python-novaclient version 1:2.17.0-0ubuntu1).

2014-06-05 11:48:09.648 28175 TRACE nova.api.openstack OperationalError: (OperationalError) (1040, 'Too many connections') None None
2014-06-05 11:48:09.648 28175 TRACE nova.api.openstack
2014-06-05 11:48:09.661 28175 INFO nova.api.openstack [req-b8aa11ed-f4d4-4db6-8d61-377f2955c178 e271c5f1ce7c4276877bd4a20b881d20 0c34bddcc123420297b283e8bee47684] http://controller:8774/v2/0c34bddcc123420297b283e8bee47684/servers returned with HTTP 500
2014-06-05 11:48:09.662 28175 INFO nova.osapi_compute.wsgi.server [req-b8aa11ed-f4d4-4db6-8d61-377f2955c178 e271c5f1ce7c4276877bd4a20b881d20 0c34bddcc123420297b283e8bee47684] 127.0.0.1 "GET /v2/0c34bddcc123420297b283e8bee47684/servers HTTP/1.1" status: 500 len: 335 time: 0.8759379

Any recommendations on how to resolve this?

Julie Pichon (jpichon) wrote :

This one appears to be coming from MySQL and is likely a different issue; I'd suggest checking that your MySQL configuration is working.

Dražen Lučanin (kermit666) wrote :

OK, I may have found a possible solution: the max_connections MySQL setting [1]. For now I've set it to 1000 and it seems to work.

It could be that the default of 151 was exhausted by all the OpenStack services, but it still looks more like a connection leak to me. If nothing else, it's a documentation bug: I followed all the Icehouse installation docs and there was no mention of setting this for MySQL.

[1]: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_max_connections
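If you hit this MySQL limit, the setting lives in the server configuration (the file path varies by distro; the value below is the one reported to work here):

```ini
[mysqld]
max_connections = 1000
```

Restart the MySQL service after changing it, and verify with `SHOW VARIABLES LIKE 'max_connections';`.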

Nguyen Van Duc (vanduc95) wrote :

I changed the configuration in /etc/my.cnf.d/openstack.cnf (CentOS) or /etc/mysql/conf.d/mysqld_openstack.cnf (Ubuntu) and it worked:

max_connections = 4096 --> max_connections = 10000

Hope this helps!
