Scheduler is not working when OpenERP is running with gunicorn

Bug #944273 reported by James Jesudason
This bug affects 4 people
Affects: Odoo Server (MOVED TO GITHUB)
Status: Fix Released
Importance: Wishlist
Assigned to: OpenERP's Framework R&D
Milestone: (none)

Bug Description

The OpenERP scheduler is not working correctly in version 6.1 when OpenERP is running using gunicorn. From my initial investigation, I think the problem is that each gunicorn worker is running in a separate thread, so the memory space is different. OpenERP is designed to be stateless, so this should be fine. However, in the case of the scheduler, tasks are being pushed and popped from the heap... which will be different for each process.
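The effect of per-process memory can be seen in a minimal sketch. This is not OpenERP code: the names wakeups, push_in_child and demo are made up for illustration, and the "fork" start method assumes a Unix-like system (which Gunicorn requires anyway). A heap filled in a child process stays empty in the parent.

```python
import heapq
import multiprocessing

# Module-level heap, standing in for a scheduler's in-memory wake-up heap.
wakeups = []


def push_in_child(conn):
    """Runs in the child process: push one entry and report the heap size."""
    heapq.heappush(wakeups, (0, "dbname"))
    conn.send(len(wakeups))
    conn.close()


def demo():
    """Return (heap size seen by the child, heap size seen by the parent)."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-like OS
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=push_in_child, args=(child_conn,))
    proc.start()
    child_size = parent_conn.recv()
    proc.join()
    # The parent's copy of the heap was never touched by the child.
    return child_size, len(wakeups)
```

Calling demo() returns the two sizes, so the difference between what the child and the parent see can be checked directly.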

I think that a more robust solution would be to add the tasks to the database, and the cron Master Thread process would read from the database (instead of the heap).

This is straightforward to verify: create some scheduled actions that are due to be run (there are some in the system by default) and start OpenERP using gunicorn. You should see that the scheduled actions aren't run. Then compare what happens when OpenERP is started in a single-threaded manner, e.g. ./openerp-server -d dbname. This time you'll see that the scheduled actions are run.

Vo Minh Thu (thu) wrote :

Hi,

You are guessing too much :)

The main cron thread is simply not running at all when using Gunicorn. Gunicorn workers should only be used to serve requests (so resource consumption is directly related to the request and appropriate actions can be taken if limits are reached without affecting cron jobs).

That being said, we should look into providing a way to start the main cron thread in the main Gunicorn process.

Changed in openobject-server:
assignee: nobody → OpenERP Framework Experts (openerp-expert-framework)
importance: Undecided → Wishlist
status: New → Confirmed
assignee: OpenERP Framework Experts (openerp-expert-framework) → OpenERP's Framework R&D (openerp-dev-framework)
James Jesudason (jamesj) wrote :

Yes, after more investigation I did find that the cron thread is not being started. I added a call to 'openerp.cron.start_master_thread()' in openerp.wsgi.core.when_ready and the thread does get started. However, the scheduled actions are not being processed.

Having looked a bit more at cron.py I've seen that there are a number of global variables being used - is that going to work in a multi-threaded environment? Also, I can't see any confirmation that heapq is thread-safe - the alternative is to use Queue (http://docs.python.org/library/queue.html), which is thread-safe.

In the meantime, we are currently piloting version 6.1 and the scheduler plays an important role in our implementation. Do you have a workaround that will enable us to get the scheduler working when OpenERP is running with gunicorn?

Thanks

James

James Jesudason (jamesj) wrote :

The problem appears to be that the current code does not work in a multi-process and multi-threaded environment. There are a couple of solutions:

1. Simple Solution
Just have the thread running and checking every 60 seconds, with no wake-ups. This is demonstrated in this branch: lp:~jamesj/openobject-server/simple-cron-fix.

2. More Complex Solution
To keep the wake-up functionality (more or less the same), it is possible to use the Python multiprocessing module (http://docs.python.org/library/multiprocessing.html). It supports both local and remote concurrency and allows data to be passed between processes. This involves replacing threads from the 'threading' module with the 'Process' class from multiprocessing, and replacing 'heapq' with 'Queue' (the multiprocessing version). A merge proposal from branch lp:~jamesj/openobject-server/complex-cron-fix has been provided.
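
The first option (a thread that just checks at a fixed interval, with no wake-ups) could look roughly like the sketch below. It is not the code from the linked branch: start_polling_thread and run_due_jobs are hypothetical names, and it is written for current Python 3 rather than the Python 2 that OpenERP 6.1 runs on.

```python
import threading


def start_polling_thread(run_due_jobs, interval=60.0):
    """Call run_due_jobs() every `interval` seconds in a daemon thread.

    run_due_jobs is a hypothetical callable that would look up and run
    whatever scheduled actions are due. Returns (thread, stop_event).
    """
    stop = threading.Event()

    def loop():
        # Event.wait() returns False on timeout and True once stop is set,
        # so the loop sleeps between checks and exits when asked to stop.
        while not stop.wait(interval):
            run_due_jobs()

    thread = threading.Thread(target=loop, name="cron-poller", daemon=True)
    thread.start()
    return thread, stop
```

Setting the returned event stops the loop at the next wake-up. The trade-off is that a job can start up to one interval late, since nothing wakes the thread early.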

Vo Minh Thu (thu) wrote :

When using Gunicorn, multiple processes are used, but not multiple threads.

Without Gunicorn, multiple threads are used (provided Werkzeug is available). In that case, the cron jobs management is thread-safe (the use of the heapq is protected by locks).
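
As an illustration of what "heapq protected by locks" means in general, here is a minimal sketch. It is not the code from cron.py; the class LockedHeap and the helper fill_from_threads are invented for this example.

```python
import heapq
import threading


class LockedHeap:
    """A heapq-backed priority queue whose operations hold a lock."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def push(self, item):
        with self._lock:
            heapq.heappush(self._heap, item)

    def pop(self):
        """Return the smallest item, or None if the heap is empty."""
        with self._lock:
            return heapq.heappop(self._heap) if self._heap else None


def fill_from_threads(heap, n_threads=4, per_thread=100):
    """Push distinct integers into `heap` from several threads at once."""
    def work(offset):
        for i in range(per_thread):
            heap.push(offset + i)

    threads = [threading.Thread(target=work, args=(t * per_thread,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A lock like this only coordinates threads inside one process. It does nothing for separate Gunicorn worker processes, which each have their own heap and their own lock.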

As you have found, openerp.cron.start_master_thread() must be called to run the master thread. It could be run in the master (gunicorn) process if needed. But this is not enough: you have to instruct the master cron thread to check for jobs on a given database. To do so, schedule_cron_jobs() must be called on a particular registry (also called pool).

So a possible workaround when using OpenERP for a known database is to modify the Gunicorn on_starting hook. This can be done for instance by adding the two above calls in the on_starting() function provided in openerp/wsgi/core.py.
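
A rough sketch of such a hook follows. To keep it self-contained, OpenERP is not imported: the two calls are passed in as parameters. make_on_starting and get_registry are hypothetical names (how the registry for a database is obtained is an assumption left to the caller); on_starting(server) is the signature of Gunicorn's server hook.

```python
def make_on_starting(start_master_thread, get_registry, db_name):
    """Build a Gunicorn on_starting hook that enables cron for one database.

    start_master_thread: a callable such as openerp.cron.start_master_thread
    get_registry: hypothetical callable returning the registry (pool)
                  for a database name
    db_name: the known database whose cron jobs should be processed
    """
    def on_starting(server):
        # Called by Gunicorn in the master process.
        start_master_thread()
        # Tell the master cron thread to check for jobs on this database.
        get_registry(db_name).schedule_cron_jobs()

    return on_starting
```

In a Gunicorn configuration file, the result would be assigned to the name on_starting. Whether this is sufficient in practice is not established here; it only shows the order of the two calls.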

Another workaround is to run a non-gunicorn OpenERP server process with the database/registry loaded in order to run the cron jobs but not necessarily serve client requests (those being served by the gunicorn-enabled instance).

For multiple databases, probably the best idea is to use the second possibility above (a second, cron-only, OpenERP instance) with some code (which does not exist yet) to load the necessary databases.

HTH,
Thu

James Jesudason (jamesj) wrote :

I've provided a merge proposal which works with both single and multiple processes. I think that this is a good long-term solution that keeps the same functionality as we have now. Could you take a look at that and merge it if you agree?

The first workaround that you have suggested still has problems due to the GIL. The merge proposal that I've provided gets around this.

Thanks

James

Vo Minh Thu (thu) wrote :

I have commented on the merge prop.

Leonardo Santagada (santagada) wrote :

At least for our deployments it would be preferable to have a process that does just cron jobs for a certain set of databases. This is the only way to guarantee that scheduled jobs will run, even if no one has connected to that database through gunicorn or elsewhere (for example after a system restart).

Vo Minh Thu (thu) wrote :

It seems we forgot to close this bug.

A fix was provided in 6.1 at revision 4184, revision-id: <email address hidden> back in May.

It provides a separate program called 'openerp-cron-worker' that simply processes cron jobs. This means that when running OpenERP behind Gunicorn, you can run one or more 'openerp-cron-worker' to have one or more cron-handling processes. This works on a separate machine too. This is not very nice from an ops perspective as you need to manage an additional executable, but should prove quite flexible and effective.

This fix has not yet been forward-ported to trunk. We might try to provide something a bit different (although the principle should remain the same).

Changed in openobject-server:
status: Confirmed → Fix Released