Comment 7 for bug 1324914

Roman Podoliaka (rpodolyaka) wrote:

tl;dr

1 + compute_nodes_count*5 workers will not fly for large deployments (hundreds of compute nodes). We should pick a smaller value.

Long version:

As you may know, OpenStack projects use eventlet green threads to handle concurrent requests. In theory, this allows an IO-bound application (e.g. API services like Nova API, which mostly do RPC/DB calls) to scale to processing thousands of requests concurrently in a single OS thread, simply by monkey patching all socket operations so that a green thread context switch happens whenever a socket operation would block.
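
To make the model concrete, here is a minimal, self-contained sketch (illustrative only, not Nova code): after monkey patching, a blocking call like time.sleep() yields to the eventlet hub, so one OS thread can service many green threads concurrently.

    import eventlet
    eventlet.monkey_patch()  # replace socket, time, etc. with green versions

    import time

    def fake_request(n):
        # after monkey_patch(), time.sleep() yields to the eventlet hub
        # instead of blocking the OS thread
        time.sleep(1)
        return n

    pool = eventlet.GreenPool(size=1000)
    # 1000 concurrent "requests" finish in roughly 1 second of wall-clock
    # time, all inside a single OS thread
    print(len(list(pool.imap(fake_request, range(1000)))))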

In practice, not all socket operations can be monkey patched. The notable exceptions are Python modules that call into C libraries. In our deployments we use the MySQL-python DB API driver, which delegates all MySQL connectivity to the libmysqlclient C library. eventlet can only monkey patch operations on Python socket objects, so calls into MySQL-python/libmysqlclient block the whole process (i.e. if some DB query in Nova takes 2 s to complete, all other green threads are blocked for those 2 s, and no other API requests can be processed).
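
Here is a minimal sketch of that failure mode (hypothetical connection parameters; it needs MySQL-python and a reachable MySQL server): while the query sleeps inside libmysqlclient, the heartbeat green thread stops running too.

    import eventlet
    eventlet.monkey_patch()  # patches Python-level sockets only

    import MySQLdb  # MySQL-python: all IO happens inside libmysqlclient (C)

    def heartbeat():
        while True:
            print('still responsive')  # stops printing during the query
            eventlet.sleep(0.5)

    def slow_query():
        # hypothetical connection parameters
        conn = MySQLdb.connect(host='127.0.0.1', user='nova',
                               passwd='secret', db='nova')
        cur = conn.cursor()
        # the socket IO for this call happens in C, below the monkey
        # patch, so every green thread in the process stalls for 2 s
        cur.execute('SELECT SLEEP(2)')

    eventlet.spawn(heartbeat)
    eventlet.spawn(slow_query).wait()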

eventlet has been providing a workaround for this, which allows one to execute blocking calls in native OS threads rather than green threads (the database.use_tpool option in Nova; probably in other projects too). The problem with this approach is that you need a custom eventlet build for the feature to work (https://bitbucket.org/eventlet/eventlet/pull-request/29/ hasn't been merged to the eventlet master branch yet).
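
For reference, this is roughly what that mechanism boils down to (a sketch reusing the cursor from the example above; with database.use_tpool Nova wraps the DB API calls for you, operators don't call tpool directly):

    from eventlet import tpool

    # hand the blocking call to one of eventlet's native worker threads;
    # only the calling green thread waits, the rest keep being scheduled
    result = tpool.execute(cur.execute, 'SELECT SLEEP(2)')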

Nova, Neutron and probably other services also provide another workaround for this problem: by means of the api_workers/osapi_compute_workers/etc. options, you can tell the nova-api/neutron-server/etc. process to fork right after start. So even if one of the forks is blocked, there are a few others that can still process new requests.
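
For illustration, the knobs look roughly like this (option names as used by Nova and Neutron; the worker counts are made-up examples):

    # /etc/nova/nova.conf
    [DEFAULT]
    osapi_compute_workers = 8

    # /etc/neutron/neutron.conf
    [DEFAULT]
    api_workers = 8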

So the problem with the numbers you suggest here is that they are way too large: e.g. for a 200 compute node deployment, you'd end up running 1 + 200*5 = 1001 nova-api/neutron-server forks on the controller nodes, which is just a waste of memory and CPU resources. I'd suggest keeping the number of forks small (2-3 * the number of CPU cores).

But we still have to solve the problem of MySQL-python blocking the whole process. We have a few options here:

0. Do nothing. As long as we use a *sane* number of forks, the API processes should work fine (though, obviously, eventlet won't be as efficient as we want it to be).
1. Build a custom eventlet package and set database.use_tpool to True in the config files (Rackspace claim they do that, but I doubt anyone else has tried it).
2. Use a pure-Python DB API driver (e.g. pymysql), which eventlet *can* monkey patch, but this can result in a performance drop for API services (see the config sketch after this list).
3. Use PostgreSQL + psycopg2, but this is not a short term solution :-)
4. Check whether it's possible to make MySQL-python cope well with eventlet (this will probably require a custom MySQL-python build). I have a few ideas on how to do this and am going to spend some time on research; I'll post the results here.
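
For what it's worth, option 2 would mostly come down to the SQLAlchemy connection URL; a hypothetical sketch (DSN and credentials made up):

    # /etc/nova/nova.conf -- switching the SQLAlchemy driver prefix from
    # the default MySQL-python to the pure-Python PyMySQL driver
    [database]
    connection = mysql+pymysql://nova:secret@127.0.0.1/nova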

So whichever way we choose, I believe we should not increase the number of forks that much, but rather keep it small.