Comment 8 for bug 677257

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote : Re: [Bug 677257] Re: Scheduler won't reschedule a task if it takes too long

On 12/19/2011 10:25 AM, Ghislain Nebra (INCB) wrote:
> I would also add that in _poolJos function, sql requests are using
> "now()" of Postgre whereas in the if statement, the "DateTime.now()"
> of Python is used. Only one "now" origin should be used : the Python
> one.
>
> Here is my suggestion :
> cr.execute("select * from ir_cron where numbercall<>0 and active
> and nextcall<=now() order by priority")
> should be
> cr.execute("select * from ir_cron where numbercall<>0 and active
> and nextcall<='" + now.strftime('%Y-%m-%d %H:%M:%S')
> + "' order by priority")
>
> This is very important if your PostgreSQL database is not on the same
> computer than the OpenERP server because a small difference in clock
> could lead to non-working cron jobs.

I agree that it would be more consistent to use the same reference time
everywhere in the code. Practically however the time offset should be
minimal because as of 6.1 we have switched the server to use UTC
everywhere, and we're storing only UTC timestamps in the database.
So the only difference here would be a clock sync difference between the
database machine and the OpenERP machine, as you say.

As of 6.1 the cron mechanism is multi-threaded and supports
load-balancing on multiple OpenERP servers talking to the same database.
In such environments it will be important to have all machines properly
time-sync'ed, e.g. with NTP, because there's a limit to what magic the
system can do to avoid sync issues.

In a load-balancing configuration there is always a way for things to go
wrong if all the servers are not properly sync'ed. Imagine a 1 hour
offset between the machines (while all of them think they're all at
UTC!): you'll get different results depending on which machine runs the
job vs. which was used to configure its execution time.

So if you can come up with a nice patch to improve this logic, feel free
to propose a merge for trunk (see the doc[1]), but keep in mind that
there's a limit to what we can do in case of bad time sync.
In time-critical deployment environments, setting up a proper clock sync
across all machines seems like a pre-requisite.

Note about the patch:
- you should pass the time value as a query parameter instead of
mangling the query string (bad practice!)
- you need to study the latest trunk code and make a patch against it,
such changes won't be accepted in a stable branch (this is a minor
improvement that can be replaced by proper config and clock sync)

> Moreover in this function, the "while nextcall < now and numbercall:"
> should be "while nextcall <= now and numbercall:" because if the cron
> job is planned at 1:00, the timer will wake up at 1:00 ... and you want
> your cron job to be executed !

Actually this will never be a problem because even if the job execution
was started exactly on the second where it was scheduled to run, the
value of datetime.now() includes precision up to the microsecond. It is
compared with the nextcall value from the database, which is rounded to
the second, so you'll get such a comparison:
 2011-12-19 10:33:07 < 2011-12-19 10:33:07.290877

Granted, changing the comparison to `<=` would not hurt (it's only here
to stop repeating calls when nextcall is in the future), but it won't
cause any issue here. If you make a merge proposal it should be accepted
if you change that operator too, if you like.

Thanks!

[1] http://bit.ly/openerp-contrib-mp