Bug #677257 “Scheduler won't reschedule a task if it takes too l...” : Bugs : Odoo Server (MOVED TO GITHUB)

Don Kirkby (donkirkby) wrote on 2010-11-18:

#1

We're using OpenERP version 5.0.15 on Ubuntu 10.04.

Don Kirkby (donkirkby) wrote on 2010-11-18:

#2

I created a merge proposal that stops checking that the next time for a job is in the future. I asked Christophe to review it, since he modified some of the ir_cron code back in September.

Olivier Dony (Odoo) (odo-openerp) on 2010-11-23

Changed in openobject-server:
assignee:	nobody → OpenERP's Framework R&D (openerp-dev-framework)
importance:	Undecided → Low
status:	New → Confirmed

Olivier Dony (Odoo) (odo-openerp) wrote on 2010-12-23:

#3

Hello Don,

Your patch has been merged in trunk in revision 3147 <email address hidden>.
I am re-assigning the backport in 5.0 branch to the maintenance team.

Thanks a lot!

Changed in openobject-server:
milestone:	none → 6.0-rc2
status:	Confirmed → Fix Released
tags:	added: maintenance

Jay Vora (Serpent Consulting Services) (jayvora) wrote on 2010-12-31:

#4

Thank you Don.
Your branch has been merged into stable by revision 2168 <email address hidden>.

Claire (cjin) wrote on 2011-07-05:

#5

Just wondering whether OpenERP can reschedule if there is any change or updated sequence for manufacturing orders?

Don Kirkby (donkirkby) wrote on 2011-07-05:

#6

Hi Claire,
The last time I looked at the scheduler code, tasks could only be triggered by the clock, not by events. Some entities like production orders, sales orders, and invoices can trigger code from events using the workflow, but products and bills of materials don't have workflow.

This isn't really a good place for asking questions like that. You'd be better off asking them at http://www.openerp.com/forum/ or http://stackoverflow.com/

Ghislain Nebra (INCB) (ghislain-nebra-3) wrote on 2011-12-19:

#7

I would also add that in the _poolJobs function, the SQL queries use PostgreSQL's "now()", whereas the if statement uses Python's "DateTime.now()".
Only one source of "now" should be used: the Python one.

Here is my suggestion :
cr.execute("select * from ir_cron where numbercall<>0 and active and nextcall<=now() order by priority")
should be
cr.execute("select * from ir_cron where numbercall<>0 and active and nextcall<='" + now.strftime('%Y-%m-%d %H:%M:%S') + "' order by priority")

This is very important if your PostgreSQL database is not on the same computer as the OpenERP server, because a small difference between the two clocks could lead to cron jobs not running.

Moreover, in this function, "while nextcall < now and numbercall:" should be "while nextcall <= now and numbercall:" because if the cron job is planned for 1:00, the timer will wake up at 1:00, and you want your cron job to be executed!
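
To make the effect of the operator concrete, here is a minimal standalone sketch of such a catch-up loop. It is not the actual ir_cron code: the function name catch_up, its parameters, the fixed timedelta interval, and the convention that a negative numbercall means "unlimited" are all assumptions made for the illustration.

```python
from datetime import datetime, timedelta


def catch_up(nextcall, now, interval, numbercall, inclusive=True):
    """Advance nextcall while it is due and calls remain.

    Hypothetical helper: a negative numbercall is treated as "unlimited"
    (assumption), and `inclusive` switches between `<=` and `<`.
    Returns the new nextcall, the remaining numbercall, and the run count.
    """
    runs = 0
    while (nextcall <= now if inclusive else nextcall < now) and numbercall:
        if numbercall > 0:
            numbercall -= 1  # only limited jobs consume a call
        runs += 1
        nextcall += interval
    return nextcall, numbercall, runs


# A job planned for exactly 1:00, checked at exactly 1:00.
now = datetime(2011, 12, 19, 1, 0, 0)
planned = datetime(2011, 12, 19, 1, 0, 0)
nextcall, remaining, runs = catch_up(planned, now, timedelta(hours=1), -1)
```

With inclusive=False (the strict `<` comparison), the same call performs no run when both timestamps are exactly equal.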

Olivier Dony (Odoo) (odo-openerp) wrote on 2011-12-19: Re: [Bug 677257] Re: Scheduler won't reschedule a task if it takes too long

#8

On 12/19/2011 10:25 AM, Ghislain Nebra (INCB) wrote:
> I would also add that in _poolJobs function, sql requests are using
> "now()" of Postgre whereas in the if statement, the "DateTime.now()"
> of Python is used. Only one "now" origin should be used : the Python
> one.
>
> Here is my suggestion :
> cr.execute("select * from ir_cron where numbercall<>0 and active
>             and nextcall<=now() order by priority")
> should be
> cr.execute("select * from ir_cron where numbercall<>0 and active
>             and nextcall<='" + now.strftime('%Y-%m-%d %H:%M:%S')
>             + "' order by priority")
>
> This is very important if your PostgreSQL database is not on the same
> computer than the OpenERP server because a small difference in clock
> could lead to non-working cron jobs.

I agree that it would be more consistent to use the same reference time
everywhere in the code. Practically however the time offset should be
minimal because as of 6.1 we have switched the server to use UTC
everywhere, and we're storing only UTC timestamps in the database.
So the only difference here would be a clock sync difference between the
database machine and the OpenERP machine, as you say.

As of 6.1 the cron mechanism is multi-threaded and supports
load-balancing on multiple OpenERP servers talking to the same database.
In such environments it will be important to have all machines properly
time-sync'ed, e.g. with NTP, because there's a limit to what magic the
system can do to avoid sync issues.

In a load-balancing configuration there is always a way for things to go
wrong if all the servers are not properly sync'ed. Imagine a 1 hour
offset between the machines (while all of them think they're all at
UTC!): you'll get different results depending on which machine runs the
job vs. which was used to configure its execution time.
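
The 1 hour offset scenario can be illustrated with a small sketch. The helper is_due and the two simulated server clocks are hypothetical names used only for this example; they are not part of OpenERP.

```python
from datetime import datetime, timedelta


def is_due(nextcall, local_now):
    """Return True if a job scheduled at nextcall should run by local_now."""
    return nextcall <= local_now


# Both servers believe they are on UTC, but server B's clock is 1 hour ahead.
true_utc = datetime(2011, 12, 19, 10, 0, 0)
server_a_now = true_utc
server_b_now = true_utc + timedelta(hours=1)

# A job configured to run 30 minutes from the true current time.
nextcall = true_utc + timedelta(minutes=30)

due_on_a = is_due(nextcall, server_a_now)
due_on_b = is_due(nextcall, server_b_now)
```

Here server A considers the job not yet due, while server B would run it half an hour early.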

So if you can come up with a nice patch to improve this logic, feel free
to propose a merge for trunk (see the doc[1]), but keep in mind that
there's a limit to what we can do in case of bad time sync.
In time-critical deployment environments, setting up a proper clock sync
across all machines seems like a pre-requisite.

Note about the patch:
- you should pass the time value as a query parameter instead of
mangling the query string (bad practice!)
- you need to study the latest trunk code and make a patch against it,
such changes won't be accepted in a stable branch (this is a minor
improvement that can be replaced by proper config and clock sync)
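
As an illustration of the first point, the suggested query could pass the timestamp as a bound parameter. This is only a sketch under stated assumptions: it uses the standard library's sqlite3 module as a stand-in for PostgreSQL so that it runs on its own, the table layout and sample rows are made up, and the placeholder is `?` here whereas psycopg2 uses `%s`.

```python
import sqlite3
from datetime import datetime

# In-memory stand-in for the real database (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
cr = conn.cursor()
cr.execute(
    "create table ir_cron (id integer primary key, name text, "
    "numbercall integer, active integer, nextcall text, priority integer)"
)
cr.executemany(
    "insert into ir_cron (name, numbercall, active, nextcall, priority) "
    "values (?, ?, ?, ?, ?)",
    [
        ("due job", -1, 1, "2011-12-19 10:00:00", 5),
        ("future job", -1, 1, "2011-12-19 12:00:00", 5),
        ("exhausted job", 0, 1, "2011-12-19 09:00:00", 5),
    ],
)

# The Python-side "now" is bound as a parameter, not concatenated into the SQL.
now = datetime(2011, 12, 19, 10, 33, 7)
cr.execute(
    "select name from ir_cron "
    "where numbercall<>0 and active and nextcall<=? order by priority",
    (now.strftime("%Y-%m-%d %H:%M:%S"),),
)
due = [row[0] for row in cr.fetchall()]
```

Binding the value lets the driver handle quoting, which avoids both malformed queries and injection risks from string concatenation.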

> Moreover in this function, the "while nextcall < now and numbercall:"
> should be "while nextcall <= now and numbercall:" because if the cron
> job is planned at 1:00, the timer will wake up at 1:00 ... and you want
> your cron job to be executed !

Actually this will never be a problem because even if the job execution
was started exactly on the second where it was scheduled to run, the
value of datetime.now() includes precision up to the microsecond. It is
compared with the nextcall value from the database, which is rounded to
the second, so you'll get such a comparison:
 2011-12-19 10:33:07 < 2011-12-19 10:33:07.290877

Granted, changing the comparison to `<=` would not hurt (it's only here
to stop repeating calls when nextcall is in the future), but it won't
cause any issue here. If you make a merge proposal it should be accepted
if you change that operator too, if you like.
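
The comparison above can be reproduced with the standard datetime module. This is a small sketch; the variable names are chosen for the example only.

```python
from datetime import datetime

# nextcall as read from the database, rounded to the second.
nextcall = datetime.strptime("2011-12-19 10:33:07", "%Y-%m-%d %H:%M:%S")

# A "now" taken during that same second carries microseconds.
now = datetime(2011, 12, 19, 10, 33, 7, 290877)
strict_with_microseconds = nextcall < now

# Only a "now" with zero microseconds makes the two operators differ.
now_exact = now.replace(microsecond=0)
strict_exact = nextcall < now_exact
inclusive_exact = nextcall <= now_exact
```

The strict comparison only fails to trigger in the edge case where now has no sub-second component, which is why `<=` is harmless but rarely needed.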

Thanks!

[1] http://bit.ly/openerp-contrib-mp

Affects                        Status        Importance  Assigned to                             Milestone
Odoo Server (MOVED TO GITHUB)  Fix Released  Low         OpenERP's Framework R&D                 6.0-rc2
5.0                            Fix Released  Low         Jay Vora (Serpent Consulting Services)  5.0.16