race condition in check_assign() function in stock.picking

Bug #507389 reported by Albert Cervera i Areny - http://www.NaN-tic.com
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Odoo Addons (MOVED TO GITHUB)
Fix Released
High
Olivier Dony (Odoo)

Bug Description

check_assign() function in stock.picking does not ensure no other concurrent calls to the function are being executed and thus may assign more stock than really available. This simply means that OpenERP does not not properly control real stock.

This is a critical bug and has been discussed in community-leaders and expert-framework mailing lists. Mail follows:

Although OpenERP uses serializable isolation level for all transactions this
doesn't solve the problem of assigning two or more stock moves when only one
should be possible.

Let me explain:

Say you have one remaining unit of product X at location Stock. There are also
two outgoing pickings that require this product. Two users may "Check
Availability" at the same time and both stock moves would be marked as
assigned although only one should.

This is true because check_assign() function in stock.picking doesn't try to
avoid that. What the function does is:

- Search how much stock we have using _product_reserve() from stock.location
- If value is above zero, assign it

So if two transactions do the same at the same time both will assign the move.

This may seem unprobable but I don't think it is given that:
- Checking available stock requires a sequential scan on stock_move table
which may be really large.
- It's not necessary that both reads happen at the same time, it's only
necessary that both transactions happen at the same time. In some cases ther
might be large pickings which would take a long time to process. In others the
process may be part of another automated process that does many other things
and transaction time may be very long.

If my worries are true, I'd go for solving the issue in check_assign()
function. My proposal would be to create a new table in postgres just to
handle the lock (ie 'stock_move_lock').

The first thing check_assign() function should do is "LOCK stock_move_lock"
which would be release at the end of the transaction (this is simply how it
works in Postgres, there's not UNLOCK mechanism).

The proposal of using a new table for locking is base on a two facts:
- We don't want to lock 'stock_move' because other processes who only want a
value to be shown to the user shouldn't be locked. There's nothing critical in
there. Those who reallly want to assign the stock must use check_assign()
anyway.
- We don't want a semaphore because this would not allow running several
openobject servers against the same database.

Related branches

Revision history for this message
Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote :

Attached patch should fix the issue while allowing reads on stock.move.

Revision history for this message
Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote :

Ups, where I said "stock.picking" I obviously meant "stock.move"!

Revision history for this message
Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote :

The attached file is an updated patch which adds the lock to the force_assign() function too. The lock is required to any process that updates (or inserts) a stock.move in 'assigned' stated.

Changed in openobject-addons:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Xavier (Open ERP) (xmo-deactivatedaccount) wrote :

workshop conclusion: should use SELECT FOR UPDATE NOWAIT (nowait => avoid deadlocks, throw error and display message)

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

We'll make this the suggested approach for all similar cases in the OpenERP technical guidelines that will be online very soon.

Revision history for this message
Samantha (samantha-z-mathews) wrote :

any news?

Changed in openobject-addons:
status: Confirmed → In Progress
assignee: nobody → Olivier Dony (OpenERP) (odo)
milestone: none → 5.0.13
Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

Samantha wrote:
> any news?

Hello,

I've implemented a solution based on what we discussed in our workshop in the community days (using row-wise write-locking in stock.move to guard the product reservation code.)
See the 2 related branches we the implementation, to be merged in 5.0 and 6.0(trunk) soon, after validation.

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

err, that was supposed to be

  "See the 2 related branches *with* the implementation, to be merged in 5.0 and 6.0(trunk) soon, after validation."

in previous comment (when will LP introduce comments edit for absent-minded guys like me? ;-))

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

Fix for 5.0 landed in revision: 2838 <email address hidden>

The fix is ready for 6.0 and will be merged soon, revision-id: <email address hidden>

Thanks a lot to everyone involved in the discussion and resolution!

Changed in openobject-addons:
status: In Progress → Fix Released
Revision history for this message
Christophe Hanon (www.adins.be) (chanon) wrote :

I understand that there may be a matter of urgency, so the fix.

But on the long run this is highly questionable !

Your approach is usually described as pessimistic concurrency management, - you protect using locks assuming everybody want to change the same value (which in the case of a stock level makes some sense).

I must congratulate the original reporter:, I never though this was possible in Open ERP and that put some light on the still poor level of the entire framework (I am not laughing here).

In fact this problem may be an issue in a lot of places in OpenERP. Obviouly the consequence will be often less dramatic on business point of view than an incorrect stock level (meaning the software cannot be trusted). It wil be a nice work to check and see where the problem can occur...

These situation happens because of the programming model / ORM :you read, keep some values in memory then simply update without further check.

What we should do is issue an SQL update with a where clause protecting incorrect write operation.

Lest assume you read fields x with a value 1 and modified=10h43m55s

you should issue something like update set x = 1 modified = :now where x = 0 and modified = 10h43m55s

If less than one record is updated you should rollback.

Your solution may sound fine, but I think in the long term it will create problems and will re-occur permanently thus creating larger problem for example if you have two tables concerned in such a process deadlock may occur.

The only good solution is to fix this in the ORM for once !

Revision history for this message
Christophe Hanon (www.adins.be) (chanon) wrote :

I did not checked myself but if postgres does snaphost isolation as indicated in wikipedia - there should be no issue and then no lock is needed...

http://en.wikipedia.org/wiki/Isolation_%28database_systems%29

http://en.wikipedia.org/wiki/Snapshot_isolation

But sill probably my suggestion would be correct - for example this would avoid consistency issue when a user interaction spans multiple transactions - aka a wizard

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

Christophe,

Short answer:
Your point about optimistic/pessimistic locking is quite valid, but it does not really apply to OpenERP's ORM at the moment, so you don't need to be worried. It is also not directly applicable to the current bug. A look at the code of the ORM and of this bugfix should make it all clear to you.

Long answer:
Basically, OpenERP's ORM provides you with low-level CRUD primitives, but it does not allow you to directly modify an "Object" representing a database record, and then save your changes (as other ORM frameworks you may have used do). Therefore none of the API methods of the ORM would be in a position to do optimistic locking.
Instead, the write/update and unlink methods of the ORM let you pass an optional context parameter with the "last update date" of the record(s) you want to update. In this case a basic optimistic locking check will be performed before performing the operation.
So in summary, the ORM lets the caller be in charge of the locking strategy, *if* and when it matters.

For example the GTK/Web clients do use this feature to present visual feedback to users in case of concurrent updates, allowing them to compare and pick the changes they want to keep.
BTW, depending on the situations, the "last writer wins" is often the behavior business users desire.

As for this particular bug report, the issue is that different transactions will only *read*, not *update* the same stock.move rows. They base their decision to update on a possible common set of stock.move rows they read. So the optimistic locking strategy you describe would not work here. The strategy we chose is to prevent two transactions from trying to take that decision at the same time. And with the kind of solution we implemented, we don't lock other readers, and we don't allow deadlocks either, as we use NOWAIT locking. I think you will agree that this is a sensible solution for this particular case.

I hope this clarifies things.

(PS: I'm not sure this is in the technical doc, and if it's not we'll add it in a section about transactions.)

Revision history for this message
Christophe Hanon (www.adins.be) (chanon) wrote :

Olivier, thanks for these excellent explanations - my mistake !

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

Incidentally, the excellent link you mentioned at http://en.wikipedia.org/wiki/Snapshot_isolation does illustrate the shortcomings of snapshot serialization using an example that is quite similar to the issue at hand: a /write skew/ anomaly in stock reservation could allow the real stock level to become negative when it really wasn't meant to be.
And one of the workarounds for /write skew/ mentioned in the article is Promotion, which is what SELECT FOR UPDATE really does :-)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.