Cannot login while MRP scheduler is running

Bug #713216 reported by Kyle Waid
This bug affects 3 people
Affects: Odoo Server (MOVED TO GITHUB)
Status: Fix Released
Importance: Medium
Assigned to: OpenERP's Framework R&D

Bug Description

When using the Manufacturing scheduler in a medium-sized environment, the scheduler locks down the server so that not a single user can connect. The login window shows up, but when you type your password and click Login, it goes to a white screen and nothing happens, because the manufacturing scheduler is preventing the server from letting anyone log in. If you were already logged in, you can keep using the program. Quite frustrating. Sure, the manufacturing scheduler is supposed to run at night, but this is still a bug.


Revision history for this message
Raphaël Valyi - http://www.akretion.com (rvalyi) wrote : Re: [Bug 713216] [NEW] MRP Locks down server

Kyle,

this is probably because you have too many procurements in exception. This
can happen if you imported many moves from your e-commerce site but never got
their procurements fixed. You could either replenish your stock and run the
MRP from time to time to make sure they disappear, or possibly delete the
procurements manually if they concern historical data you don't care
about.

On Fri, Feb 4, 2011 at 3:01 PM, Kyle Waid - http://www.midwestsupplies.com <<email address hidden>> wrote the original report above.

Changed in openobject-addons:
status: New → Triaged
Kyle Waid (midwest) wrote : Re: MRP Locks down server

Yes, you are correct: having many procurement exceptions causes this, and I know it. However, it is quite annoying that you can't access the server at all during this process. If you were already logged in, you can use the program fine, but if you close the program and re-open it while the scheduler is running, it will not respond. Not only does it not respond, it does not indicate that the process is running or return an error message. I knew what was happening, but anyone else in the company would not understand why they cannot access the server, BECAUSE it displays no messages or errors of any kind. There is no indication that the scheduler is running, nor any way to see this process running.

You should either allow connections to the server (i.e. fix the problem) or display an error message letting the user know that access is not available.

Olivier Dony (Odoo) (odo-openerp) wrote :

@Kyle: You are right, this is an issue and there is no reason to prevent new connections to the server while the scheduler is running. I would expect some performance impact, but nothing else. Do you have a recipe to reproduce this easily starting from a new database? Also, you did not mention whether you are using the Web or GTK client, or whether that makes a difference.

@Raphael: If you have more data on how to track down the original cause of this issue, or how to reproduce it consistently, please do not hesitate to share it.
Are you saying that while the MRP scheduler is running, calls to login() via XML-RPC have to wait? If this does not affect existing sessions, it suggests that it's not a general lock, but rather something specific to the login() call?

Thank you both!

affects: openobject-addons → openobject-server
Changed in openobject-server:
assignee: nobody → OpenERP's Framework R&D (openerp-dev-framework)
status: Triaged → Incomplete
Olivier Dony (Odoo) (odo-openerp) wrote :

Some additional info could be useful too: does this affect only users trying to log in with the same account that owns the scheduler job (by default admin)?
One thing that could delay login() is a transaction in progress that has altered the user's record: login() needs to update the user's "last login date", and that row would be locked by the other transaction.

However it's still not clear if this is the issue, as the scheduler should not need to update the user, as far as I can see... maybe a third-party is involved here somehow?

Olivier Dony (Odoo) (odo-openerp) wrote :

Kyle, one way to analyze this further would be if you could run the following query directly in a psql client at the time the problem occurs, and give us the output. This will display all running queries and the locks they hold:

-- Diagnose locks
select
       pg_stat_activity.datname,
       pg_class.relname,
       pg_locks.transactionid,
       pg_locks.virtualxid,
       pg_locks.virtualtransaction,
       pg_locks.mode,
       pg_locks.granted,
       pg_stat_activity.usename,
       pg_stat_activity.current_query,
       pg_stat_activity.query_start,
       pg_stat_activity.procpid
  from pg_stat_activity,
       pg_locks left outer join pg_class
         on (pg_locks.relation = pg_class.oid)
  where pg_locks.pid=pg_stat_activity.procpid
       and procpid != pg_backend_pid()
  order by procpid, query_start;

You can attach the output as a text file or put it on http://pastebin.com/ or similar, in order to avoid issues with comment wrapping.
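The query above uses the pg_stat_activity column names of PostgreSQL releases before 9.2 (procpid, current_query); from 9.2 on these columns are called pid and query, so the query fails on newer servers. A minimal Python sketch, assuming you want one helper that emits the right variant for the server being diagnosed (the function name lock_diagnosis_query is hypothetical), rewrites the same query with explicit joins, which select the same rows as the original comma join:

```python
def lock_diagnosis_query(server_version_num: int) -> str:
    """Return the lock-diagnosis query for the given PostgreSQL version.

    server_version_num is the integer form of the version, e.g. 90004 for
    9.0.4 or 90200 for 9.2.0 (as shown by SHOW server_version_num).
    """
    if server_version_num >= 90200:
        # 9.2 renamed procpid -> pid and current_query -> query
        pid_col, query_col = "pid", "query"
    else:
        pid_col, query_col = "procpid", "current_query"
    return f"""
select a.datname,
       c.relname,
       l.transactionid,
       l.virtualxid,
       l.virtualtransaction,
       l.mode,
       l.granted,
       a.usename,
       a.{query_col},
       a.query_start,
       a.{pid_col}
  from pg_stat_activity a
  join pg_locks l on l.pid = a.{pid_col}
  left outer join pg_class c on l.relation = c.oid
 where a.{pid_col} != pg_backend_pid()
 order by a.{pid_col}, a.query_start;
"""


print(lock_diagnosis_query(90600))
```

Only the column names change between versions; pg_locks.pid and the rest of the query are the same in both variants.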

Thanks!

summary: - MRP Locks down server
+ Cannot login while MRP scheduler is running
Raphaël Valyi - http://www.akretion.com (rvalyi) wrote : Re: [Bug 713216] Re: MRP Locks down server

Olivier,

Sorry, I know of no simple way to reproduce the issues I have been observing
in production (basically lots of confirmed orders with not enough stock and
no order point rule); this typically happens when e-commerce connectors go
live, if you don't deal with it.
Also, I can tell you that it is not only the MRP queries that lock down
the server. For instance, during the slow picking and impossible invoice
cancellation bugs I reported, it was also impossible to connect to the server.
It looks like as long as some long Python loop is also talking to
Postgres, it takes priority over all other connections. Maybe it has
something to do with the Python GIL (Global Interpreter Lock), not sure.

On Fri, Feb 18, 2011 at 3:13 PM, Olivier Dony (OpenERP) <<email address hidden>> wrote the lock-analysis request above.

Kyle Waid (midwest) wrote :

Great interest in this bug; that's good. Right now I am working to catch our company up on data and go live, so I can't do too much, since I can work around the issue. But I will certainly provide a better diagnosis of this issue soon. I'm looking into the way the tables lock and how it affects operations. Raphael is right: it isn't just the scheduler that causes the issue, there are other factors at play. I'll report back soon.

xrg (xrg) wrote :

On Friday 18 February 2011, Raphaël wrote the comment above.

So, a few points to consider:
 - are you totally unable to login() during these problems? What is the load
   of the system at that time? (we want to see if it's PG locks or the Python GIL)
 - does the system recover by itself, or does it need some intervention (e.g.
   a server restart)?
 - do the ecommerce connectors, by any means, touch the res.users table?
 - please try to run the query Odo gave you while the lockups are happening.
 - can you count the threads of the openerp-server process?
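For the last question: on Linux, each thread of a process appears as a subdirectory of /proc/<pid>/task, so a short helper can count them. This is a Linux-only sketch; count_threads is a hypothetical name, and in practice you would pass it the PID of the openerp-server process rather than the current process used in the example call.

```python
import os


def count_threads(pid: int) -> int:
    """Return how many threads process `pid` currently has (Linux only)."""
    # /proc/<pid>/task contains one subdirectory per thread of the process
    return len(os.listdir(f"/proc/{pid}/task"))


print(count_threads(os.getpid()))
```

Sampling this number a few times during a lockup shows whether the server is still spawning request threads that then sit waiting, or has stopped accepting work altogether.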

Raphaël Valyi - http://www.akretion.com (rvalyi) wrote :

> So, a few points to consider:
> - are you totally unable to login() during these problems?

Sometimes not at all.

> What is the load of the system at that time? (we want to see if it's PG
> locks or python GIL)

This typically looks like 60% Python, 40% Postgres, possibly split across two
or more processes.

> - does the system recover by itself, or needs some intervention (restart of
> server eg.)

In some situations I've seen with the MRP, yes: after the MRP cron ends, you
can connect. Notice that I already had similar issues on v5, so it doesn't
seem to be a regression.

> - do the ecommerce connectors, by any means, touch the res.users table?

I doubt this is related at all. The reason I cited e-commerce is that you'll
easily import many confirmed orders and thus the MRP will be heavier, but it's
just the regular MRP or other processes that create the lock.

> - please, try to run the query Odo gave you, while the lockups are
> happening.
> - can you count the threads of the openerp-server process?

I'm currently not experiencing such issues with MRP (because I did the cleanup
with OOOR on our installs), but I may do it later for the other bugs I
reported. Kyle, I'll let you post the data for your case.


Kyle Waid (midwest) wrote :

Just to add a little life into this bug,

User login is not possible in any scenario where "lock=True"

Kyle Waid (midwest) wrote :

Also, I might add that the lock prevents queries from executing. What exactly happens is quite obvious. When lock=True, it prevents ANY query from being executed against the table, even a plain SELECT. So when the user logs into the system, it must be attempting to execute a query against a locked table, and the client waits for data that won't come because the table is locked.

Kyle Waid (midwest) wrote :

Hello,

As per your request, here is the result of the query. I killed the server, restarted it, logged in, activated the scheduler, logged out, and tried to log back in: the login was locked.

Kyle Waid (midwest) wrote :

After several minutes I ran the query again; here is the result.

Olivier Dony (Odoo) (odo-openerp) wrote :

Hi Kyle,

Thanks for sending the output of the lock analysis query. For some reason the "relname" column is empty, so we don't know the name of the tables that are locked, but that does not matter very much.
The rest of the analysis is pretty clear, the login() call tries to update the last login "date" of the user trying to login, while the scheduler transaction is apparently holding a lock to that same user row.

This is most likely because the tables on which the scheduler transaction is working (stock_move, procurement_order, etc.) have foreign key references to their creator/updater user in the `create_uid` and `write_uid` columns.
For an INSERT or UPDATE, PostgreSQL not only locks the row being modified, but also takes a shared row lock (SELECT ... FOR SHARE) on each referenced row (FKs), which shows up as a RowShareLock on the referenced table. This prevents the referenced rows from changing or disappearing before the referring transaction completes.
Therefore, while the scheduler is running, it is likely to be holding shared locks on the res_users rows corresponding to the create_uid and write_uid values of all the records it is working on. The user most likely to be locked out is the Admin user (UID 1), because the scheduler runs as admin by default.
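The mechanism can be made concrete with a toy model. The Python sketch below is not PostgreSQL code: it is a hypothetical in-memory stand-in (the class RowLockTable and its methods are invented for illustration) that only tracks which transactions hold a shared lock on each res_users row because of FK checks on create_uid/write_uid, and whether an UPDATE of that row, such as the last-login update done by login(), would have to wait.

```python
from dataclasses import dataclass, field


@dataclass
class RowLockTable:
    """Toy model of shared row locks on res_users taken by FK checks."""

    # uid -> set of transactions holding a shared lock on that res_users row
    shared: dict = field(default_factory=dict)

    def fk_check(self, txn, uid):
        """txn inserted/updated a row whose create_uid or write_uid is uid."""
        self.shared.setdefault(uid, set()).add(txn)

    def must_wait_to_update(self, txn, uid):
        """An UPDATE of res_users row uid waits while others hold a shared lock."""
        return bool(self.shared.get(uid, set()) - {txn})

    def commit(self, txn):
        """Committing (or rolling back) releases all of txn's row locks."""
        for holders in self.shared.values():
            holders.discard(txn)


locks = RowLockTable()
# The scheduler runs as admin (uid 1) and, in one long transaction, touches
# records referencing admin and some other user (uid 7).
for uid in (1, 1, 7):
    locks.fk_check("scheduler", uid)

print(locks.must_wait_to_update("login", 1))   # True: admin's login blocks
print(locks.must_wait_to_update("login", 42))  # False: uid 42 is referenced nowhere
locks.commit("scheduler")
print(locks.must_wait_to_update("login", 1))   # False: lock released at commit
```

The model reflects the PostgreSQL releases discussed in this thread, where foreign-key checks take a plain shared row lock. Since PostgreSQL 9.3, foreign-key checks take the weaker FOR KEY SHARE lock, which no longer conflicts with updates that leave the key columns untouched, such as setting a login date.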

Postgres does not keep row-level locks in memory, so they won't appear directly in the lock analysis, but you can see this happening because the scheduler transaction [1] has been granted a number of RowShareLocks (on my system I can see one of them is for res_users), and the login transaction is waiting for a ShareLock on the "transaction_id" of the other transaction. This is how a transaction waiting for another to release a row-level lock will appear [2].

 transid | virtrid | relname   | locktype      | mode          | grtd | cur_query
---------+---------+-----------+---------------+---------------+------+-----------
         | 7/10    | res_users | relation      | RowShareLock  | t    | <IDLE>...
 1077956 | 7/10    |           | transactionid | ExclusiveLock | t    | <IDLE>...
 1077957 | 8/38    |           | transactionid | ExclusiveLock | t    | update re
 1077956 | 8/38    |           | transactionid | ShareLock     | f    | update re
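Reading this output boils down to a simple match: an ungranted ShareLock on a transactionid names the transaction being waited for, and the granted ExclusiveLock on that same transactionid identifies who holds it (every transaction holds such a lock on its own ID). A minimal Python sketch of that matching, assuming the rows have already been fetched into the hypothetical LockRow tuples below (values copied from the output above):

```python
from typing import NamedTuple


class LockRow(NamedTuple):
    transid: str   # pg_locks.transactionid ("" when not applicable)
    virtrid: str   # pg_locks.virtualtransaction
    relname: str
    locktype: str
    mode: str
    granted: bool


def find_blockers(rows):
    """Map each waiting virtual transaction to the one holding its lock."""
    # Each transaction holds a granted ExclusiveLock on its own transactionid
    holders = {
        r.transid: r.virtrid
        for r in rows
        if r.locktype == "transactionid" and r.granted and r.mode == "ExclusiveLock"
    }
    # A waiter shows up as an ungranted lock on someone else's transactionid
    return {
        r.virtrid: holders.get(r.transid)
        for r in rows
        if r.locktype == "transactionid" and not r.granted
    }


rows = [
    LockRow("", "7/10", "res_users", "relation", "RowShareLock", True),
    LockRow("1077956", "7/10", "", "transactionid", "ExclusiveLock", True),
    LockRow("1077957", "8/38", "", "transactionid", "ExclusiveLock", True),
    LockRow("1077956", "8/38", "", "transactionid", "ShareLock", False),
]
print(find_blockers(rows))  # {'8/38': '7/10'}
```

Here the login transaction (8/38) is waiting on the scheduler transaction (7/10). On PostgreSQL 9.6 and later, the built-in pg_blocking_pids(pid) function returns the blocking process IDs directly.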

You could confirm this in several ways:
- logging in as a new user who has not created or updated any document should work normally, even when the scheduler is running;
- creating a new user 'MRP Scheduler' with full access to all the relevant tables and configuring the scheduler job to run as that user may greatly reduce the chance of blocking other users (but because there are many existing rows, this may not take effect for a while);
- disabling the code that updates the "last login" date in res_users, or making it tolerant to locked rows, should eliminate that issue altogether.

The patch applied in revision 4049[3] should solve the issue by implementing the last option: skipping the update of the login date if the user's record is currently locked. The fix is for trunk/6.1, but you could use the same code to replace the bulk of the login() method in 6.0.
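The idea of skipping the login-date update when the row is locked can be sketched in Python as follows. This is not the code from revision 4049; it is a minimal illustration assuming a DB-API style cursor cr that is already inside a transaction (psycopg2's default), the 6.0-era res_users.date column for the last login, and a hypothetical helper name touch_login_date. FOR UPDATE NOWAIT makes PostgreSQL fail immediately with SQLSTATE 55P03 (lock_not_available) instead of waiting, and the savepoint keeps that error from aborting the surrounding login transaction. LockNotAvailable is a stand-in class; with psycopg2 the error surfaces as psycopg2.OperationalError (psycopg2.errors.LockNotAvailable in psycopg2 2.8 and later).

```python
import logging

_logger = logging.getLogger(__name__)


class LockNotAvailable(Exception):
    """Stand-in for the driver's lock_not_available error (SQLSTATE 55P03)."""


def touch_login_date(cr, user_id, lock_error=LockNotAvailable):
    """Record the login date without ever waiting on a locked res_users row.

    Returns True if the date was updated, False if the row was locked by
    another transaction (e.g. a long-running scheduler) and the update was
    skipped so that the login itself can proceed.
    """
    # A savepoint lets us undo the failed statement without aborting
    # the rest of the transaction.
    cr.execute("SAVEPOINT login_date")
    try:
        # NOWAIT: raise immediately instead of blocking on the row lock
        cr.execute("SELECT id FROM res_users WHERE id = %s FOR UPDATE NOWAIT",
                   (user_id,))
        cr.execute("UPDATE res_users SET date = now() AT TIME ZONE 'UTC' "
                   "WHERE id = %s", (user_id,))
        cr.execute("RELEASE SAVEPOINT login_date")
        return True
    except lock_error:
        cr.execute("ROLLBACK TO SAVEPOINT login_date")
        _logger.info("res_users row %s is locked; skipping login date update",
                     user_id)
        return False


class FakeCursor:
    """Minimal fake cursor used only to exercise the control flow above."""

    def __init__(self, locked):
        self.locked = locked
        self.statements = []

    def execute(self, query, params=None):
        self.statements.append(query)
        if self.locked and "NOWAIT" in query:
            raise LockNotAvailable()


busy = FakeCursor(locked=True)
print(touch_login_date(busy, 1))  # False: row locked, update skipped
idle = FakeCursor(locked=False)
print(touch_login_date(idle, 1))  # True: login date updated
```

The trade-off is deliberate: losing one "last login" timestamp is far cheaper than leaving a user at a blank screen until the scheduler commits.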

There aren't many other situations where we alter the user record, but those may still be prevented after the patch, so the use of a separate "owner user" for long-running bac...


Changed in openobject-server:
importance: Undecided → Medium
milestone: none → 6.1
status: Incomplete → Fix Released