Same task executed several times at the same time when cron is running in multi-thread mode

Bug #1274997 reported by Laurent Mignon (Acsone)
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
Odoo Server (MOVED TO GITHUB) | Confirmed | Undecided | Unassigned |
OpenERP Community Backports (Server) | Fix Released | Undecided | Laurent Mignon (Acsone) |
Server Environment And Tools | New | Undecided | Unassigned |

Bug Description

When the cron is configured to run in multi-thread mode (max_cron_threads > 1), the same task can sometimes be executed several times at the same time.

Changed in ocb-server:
assignee: nobody → Laurent Mignon (Acsone) (lmi)
Antony Lesuisse (OpenERP) (al-openerp) wrote :

Are you sure?

SELECT * FROM ir_cron WHERE id=%s FOR UPDATE NOWAIT

raises an exception if another worker or thread already holds a lock on the job.

Antony Lesuisse (OpenERP) (al-openerp) wrote :

Your patch makes no sense to me. Can you please explain precisely a scenario where things could go wrong?

Laurent Mignon (Acsone) (lmi) wrote :

The problem occurs when the cron runs with more than one thread.
Imagine you have 3 pending tasks to be executed at the next run of the cron, and the cron is executed by 2 threads (max_cron_threads = 2):
1) _acquire_job is launched at the same time on thread A and thread B.
2) Each thread retrieves its list of jobs to execute with a non-locking query: "SELECT * FROM ir_cron WHERE numbercall != 0 AND active AND nextcall <= (now() at time zone 'UTC') ORDER BY priority"
3) At this point, the list of jobs to execute is the same for thread A and thread B:
     Job1 (execution time 0.2s), Job2 (execution time 0.04s), Job3 (execution time 0.03s)
4) Thread A acquires a lock on Job1 -> Thread B skips Job1 (locked) and executes Job2.
5) Thread B acquires and executes Job3 (no job remains in the list of jobs for thread B).
6) Job1 is now completed and Thread A can execute the remaining jobs in its list (Job2 and Job3).
Result: in the same second, Job2 and Job3 are executed 2 times.
This occurs because the query used to acquire the lock doesn't check whether the job has already been executed in the interval between the query that gets the list of jobs and the query that grabs an exclusive lock on a job.
This can be avoided by checking, in the query used to grab the lock, that the job is still active, that its number of remaining calls is not zero, and that its next execution time is not later than now.
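
The scenario above can be replayed deterministically with a small in-memory simulation. This is only a sketch, not the actual Odoo code: the `Job` class, `due_jobs`, `try_run` and `simulate` are hypothetical names, the lock is a plain boolean standing in for the `FOR UPDATE NOWAIT` row lock, and `LOCK_QUERY_WITH_RECHECK` is an assumed combination of the two queries quoted in this thread, shown only to illustrate the proposed check.

```python
from dataclasses import dataclass

# Assumed shape of a lock query that re-checks the job's state.
# Built from the two queries quoted in this thread; not taken from a patch.
LOCK_QUERY_WITH_RECHECK = """
    SELECT * FROM ir_cron
    WHERE numbercall != 0 AND active
      AND nextcall <= (now() at time zone 'UTC')
      AND id = %s
    FOR UPDATE NOWAIT
"""


@dataclass
class Job:
    name: str
    nextcall: int = 0      # job is due when nextcall <= now
    interval: int = 3600   # hypothetical: run once per hour
    runs: int = 0
    locked: bool = False   # stands in for the database row lock


def due_jobs(jobs, now):
    """Non-locking snapshot of due jobs, like the initial SELECT."""
    return [name for name, job in jobs.items() if job.nextcall <= now]


def execute(job, now):
    """Run the job and schedule its next call."""
    job.runs += 1
    job.nextcall = now + job.interval


def try_run(jobs, name, now, recheck):
    """Try to lock and run one job from a (possibly stale) snapshot."""
    job = jobs[name]
    if job.locked:
        return False  # NOWAIT: another thread holds the lock, skip
    if recheck and job.nextcall > now:
        return False  # already executed since the snapshot was taken
    execute(job, now)
    return True


def simulate(recheck):
    """Replay steps 1-6 with a fixed interleaving of threads A and B."""
    now = 0
    jobs = {n: Job(n) for n in ("job1", "job2", "job3")}
    list_a = due_jobs(jobs, now)   # steps 2-3: both threads see
    list_b = due_jobs(jobs, now)   # the same three jobs
    jobs["job1"].locked = True     # step 4: A locks job1 (slow job)
    for name in list_b:            # steps 4-5: B skips job1, runs job2, job3
        try_run(jobs, name, now, recheck)
    execute(jobs["job1"], now)     # step 6: A finishes job1 ...
    jobs["job1"].locked = False
    for name in list_a[1:]:        # ... then walks the rest of its stale list
        try_run(jobs, name, now, recheck)
    return {name: job.runs for name, job in jobs.items()}


print(simulate(recheck=False))  # {'job1': 1, 'job2': 2, 'job3': 2}
print(simulate(recheck=True))   # {'job1': 1, 'job2': 1, 'job3': 1}
```

Without the re-check, job2 and job3 each run twice. With it, every job runs once. Against a real database, a lock query with the extra conditions would return no row for a job that has already been executed, so the caller would have to treat an empty result as "skip this job".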

Regards,

lmi

Stéphane Bidoul (Acsone) (sbi) wrote :

Hello Antony,

We have run into this problem in production and I believe Laurent's scenario is spot on.

Strictly speaking, jobs do not run *at the same time* (the lock prevents that indeed), but it is very real that jobs scheduled to run every hour end up running more than once in the same minute or second.

-sbi

Changed in ocb-server:
status: New → Confirmed
Changed in openobject-server:
status: New → Confirmed
Stefan Rijnhart (Opener) (stefan-opener) wrote :

Also affects server-env-tools I think, as cron_run_manually contains a copy of the affected code.

Changed in ocb-server:
status: Confirmed → Fix Released