intempestive write concurrency popup

Bug #314696 reported by Raphaël Valyi - http://www.akretion.com
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Odoo Server (MOVED TO GITHUB)
Fix Released
High
Christophe Simonis (OpenERP)
Odoo Web Client
Fix Released
High
Amit Mendapara

Bug Description

recently on trunk (around RC3), we had a new kind of errors that has been recognized by other partners too:

sometimes, when saveing a record, at least with the GTK client, we have a "concurrent access" warning.

The popup seems to come fro the write method in orm.py and especially from that code snippet:
   if delta and self._log_access:
            for i in range(0, len(ids), cr.IN_MAX):
                sub_ids = ids[i:i+cr.IN_MAX]
                cr.execute("select (now() - min(write_date)) <= '%s'::interval " \
                        "from %s where id in (%s)" %
                        (delta, self._table, ",".join(map(str, sub_ids))))
                res = cr.fetchone()
                if res and res[0]:
                    for field in vals:
                        if field in self._columns and self._columns[field]._classic_write:
                            raise except_orm(_('ConcurrencyException'),
                                    _('This record was modified in the meanwhile'))

This is very hard to reproduce the bug, it really seems inconsistent.

We think this is due to now() - min(write_date)) <= '%s'::interval comparing very close numbers. I think that network latency or server temporary overhead might explain the bug.

Here in French the explanation I gave in the #openerp-fr irc channel on freenode:

13:11 rvalyi imaginez la situation suivante:
13:12 rvalyi instant t: le client lance un premier appel write et lance son propre chrono qui donnera le delta de temps d'édition au prochain appel
13:13 rvalyi or, à cause de la latence réseau ou d'éventuelle lenteur, le serveur tarde à écrire le recor et donc son write_date est sensiblement plus tard que l'instant t
13:14 rvalyi quelques secondes après, le client lance un second write à l'instant t + delta
13:15 rvalyi pour le client il s'est écoulé delta et c'est le param qu'il balance au serveur pour tester la concurrence (param rpc read_delta)
13:15 rvalyi or, le serveur est cette fois véloce et la latence réseau est faible
13:16 rvalyi du coup, le serveur, dit, dans ce code http://pastie.org/354575, que delta est supérieur à l'intervale now - write-date
13:16 rvalyi et une fausse concurrence est détectée, CQFD!!!!!!
13:16 rvalyi any thought?

I don't see any trivial fix that would also ensure true concurrent accessed are detected, but I hope this report helps.

Raphaël Valyi

Revision history for this message
Raphaël Valyi - http://www.akretion.com (rvalyi) wrote :

OK, so after discussing this on irc, with Cedric Krier and Christophe Chauvet (Syleam), it appear that it's just on old problem popping up again:
http://openerp.com/forum/topic4579.html?highlight=delta

I guess this annoyance just reached a new level, because both for Christophe and me, the popup ARE QUITE frequent. Like w few times an hour. The more loaded the server is, the more popups there are. I think it's because the Python mutithreading is such a joke (giant lock implementation), that answering several users (or scheduled jobs) just slow down the server, making it more likely the client read_delta is not consistent with the real delta time seen by the server.

Fabien even took position to defend the current position (see previous link), but may be he had no idea back in 2007 how often it does occur now.

Tryton fixed the bug by creating an other hole, meaning they favor not detecting real true concurrent writes instead of having false positives, see their commit on the code I mentioned:
http://hg.tryton.org/hgwebdir.cgi/trytond/rev/af52e79df675

Currently, the situation for ou customer is that popus occur so much often that: they are likely to say to their users: well if you see a popup, then just ignore it.
Meaning that the real result of such a choice is exactly the same as the concurrency hole in Tryton: users WILL MISS true positive concurrent writes. But instead of Tryton, in OpenERP you also fed up the usrs with the popups...

Instead, Cristophe and I think that we could dramatically improve the current test, at least by testing:
if both the concurrence is due to the SAME user and if the the concurrence condition is detected because OF A TINY TIME AMOUNT (like a half second), then we could IGNORE the concurrency warning. Indeed, in the worst case it will be the same users using several tabs in its client, but if he manage to save two different tabs within one second without remembering, he will deserve his popup at least.

Any thought?

Revision history for this message
Raphaël Valyi - http://www.akretion.com (rvalyi) wrote :

BTW, on the IRC, cdk mention that he think the system they did here http://hg.tryton.org/hgwebdir.cgi/trytond/rev/af52e79df675 is perfect:
- no methode signature change
- no more false positive
- doesn't miss true positive.

I didn't check myself if this is possible without pessimistic locking, but might be worth to double check what they do.

Revision history for this message
Amit Mendapara (cristatus) wrote :

The web client doesn't use time_delta that's why you don't see this error with web client. I just checked the related code in GTK client and found possible reason of the error though not sure, I can reproduce the same error with following code.

=======================

read_time = time.time()
data = proxy.read([1])[0]

del data['id']

ctx = context.copy()

ctx['read_delta'] = time.time() - read_time
res = proxy.write([1], data, ctx)

# now right the same record again

ctx['read_delta'] = time.time() - read_time
res = proxy.write([1], data, ctx)

=======================

You will see the concurrency error. The problem here is that we are not updating read_time after first write. The same happens with GTK client. The code of `record.py` at line no `136` should be...

=======================

        if reload:
            self.reload()
        else:
            self.read_time = time.time()

=======================

Not sure, but the above changes might fix the problem...

Revision history for this message
Christophe Simonis (OpenERP) (kangol) wrote :

I'm working on a patch, based on tryton idea.

BTW, why the web client do not use this possibility ?

Changed in openobject-server:
assignee: nobody → kangol
importance: Undecided → High
status: New → In Progress
Revision history for this message
Gerry Reno (greno-verizon) wrote :

As Albert, Pedro and I plus others have commented on this topic many many times over the past three years I've posted an update to the original thread in the forum:
http://www.openerp.com/forum/post28468.html#28468

I wish everyone would quit implementing bandaids for this problem and implement the proper 'design pattern', defacto standard method of dealing with concurrency and that is to use the optimistic LAST_UPDATE method.
OpenERP will never win this concurrency battle using anything based on read-delta approach. That was proven twenty years ago and is why the read-delta approaches were all totally abandoned.

Regards,
Gerry

Revision history for this message
Christophe Simonis (OpenERP) (kangol) wrote :

fixed by:
 * server: 1590 <email address hidden>
 * client: 887 <email address hidden>
 * web-client: to do

Changed in openobject-server:
status: In Progress → Fix Released
Changed in openobject-client-web:
status: New → Confirmed
Revision history for this message
Fabien (Open ERP) (fp-tinyerp) wrote : Re: [Bug 314696] Re: intempestive write concurrency popup

Wouhouw, well done !

Christophe (OpenERP) wrote:
> fixed by:
> * server: 1590 <email address hidden>
> * client: 887 <email address hidden>
> * web-client: to do
>
>
> ** Changed in: openobject-server
> Status: In Progress => Fix Released
>
> ** Changed in: openobject-client-web
> Status: New => Confirmed
>

--
Fabien Pinckaers
CEO Tiny - OpenERP Editor
Chaussée de Namur 40
B-1367 Grand-Rosière
Belgium
Phone: +32.81.81.37.00
Fax: +32.81.73.35.01
Web: http://openerp.com

Great Achievements Start With Tiny Investments
   -- Marty, 2005

Changed in openobject-client-web:
importance: Undecided → Medium
Changed in openobject-client-web:
assignee: nobody → ame-tiny-axelor
importance: Medium → High
status: Confirmed → In Progress
Revision history for this message
Amit Mendapara (cristatus) wrote :

Concurrency check has been implemented for FORM and TREE views. Work is in progress for Calendar and Gantt views...

Revision history for this message
Amit Mendapara (cristatus) wrote :

Concurrency check has just been implemented for CALENDAR and GANTT views too...

Changed in openobject-client-web:
status: In Progress → Fix Released
Revision history for this message
Raphaël Valyi - http://www.akretion.com (rvalyi) wrote :
Download full text (3.3 KiB)

Hey that's all great news, one more great setp forward for large
organization deployment, just like your nice caching system. With v4.2 wit
were warning our customers about large deployements of TinyERP, now I should
say we arguably re-considered our position and welcome it for large
deployments too.

On Fri, Jan 23, 2009 at 12:43 PM, ame (Tiny/Axelor) <email address hidden> wrote:

> Concurrency check has just been implemented for CALENDAR and GANTT views
> too...
>
> ** Changed in: openobject-client-web
> Status: In Progress => Fix Released
>
> --
> intempestive write concurrency popup
> https://bugs.launchpad.net/bugs/314696
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in OpenObject Web Client: Fix Released
> Status in OpenObject Server: Fix Released
>
> Bug description:
> recently on trunk (around RC3), we had a new kind of errors that has been
> recognized by other partners too:
>
> sometimes, when saveing a record, at least with the GTK client, we have a
> "concurrent access" warning.
>
> The popup seems to come fro the write method in orm.py and especially from
> that code snippet:
> if delta and self._log_access:
> for i in range(0, len(ids), cr.IN_MAX):
> sub_ids = ids[i:i+cr.IN_MAX]
> cr.execute("select (now() - min(write_date)) <=
> '%s'::interval " \
> "from %s where id in (%s)" %
> (delta, self._table, ",".join(map(str, sub_ids))))
> res = cr.fetchone()
> if res and res[0]:
> for field in vals:
> if field in self._columns and
> self._columns[field]._classic_write:
> raise except_orm(_('ConcurrencyException'),
> _('This record was modified in the
> meanwhile'))
>
>
> This is very hard to reproduce the bug, it really seems inconsistent.
>
> We think this is due to now() - min(write_date)) <= '%s'::interval
> comparing very close numbers. I think that network latency or server
> temporary overhead might explain the bug.
>
> Here in French the explanation I gave in the #openerp-fr irc channel on
> freenode:
>
> 13:11 rvalyi imaginez la situation suivante:
> 13:12 rvalyi instant t: le client lance un premier appel write et lance
> son propre chrono qui donnera le delta de temps d'édition au prochain appel
> 13:13 rvalyi or, à cause de la latence réseau ou d'éventuelle lenteur,
> le serveur tarde à écrire le recor et donc son write_date est sensiblement
> plus tard que l'instant t
> 13:14 rvalyi quelques secondes après, le client lance un second write à
> l'instant t + delta
> 13:15 rvalyi pour le client il s'est écoulé delta et c'est le param
> qu'il balance au serveur pour tester la concurrence (param rpc read_delta)
> 13:15 rvalyi or, le serveur est cette fois véloce et la latence réseau
> est faible
> 13:16 rvalyi du coup, le serveur, dit, dans ce code
> http://pastie.org/354575, que delta est supérieur à l'intervale now -
> write-date
> 13:16 rvalyi et une fausse concurrence est détectée, CQFD!!!!!!
> 13:16 rvalyi any thought?
...

Read more...

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.