[Trunk] Windows-specific random deadlock in upgrade_module

Bug #885682 reported by Thibaut DIRLIK (Logica)
This bug affects 1 person
Affects: Odoo Server (MOVED TO GITHUB)
Status: Invalid
Importance: Low
Assigned to: OpenERP's Framework R&D

Bug Description

Hi.

I'm using OpenERP Trunk (updated at the beginning of the week) and I'm running into a deadlock involving threads. The bug is not easy to reproduce because it is somewhat random. The best way I have found to trigger it is to install and uninstall the same module several times.

I looked into the code with pdb and found that the blocking line is this one [1]:

openerp/modules/registry.py:
    with cls.registries_lock:

I enabled debug output on the lock (by passing verbose=True to the RLock constructor). Here is a "normal" output:

netrpc-client-127.0.0.1:51862: <_RLock owner='netrpc-client-127.0.0.1:51862' count=1>.acquire(1): initial success
netrpc-client-127.0.0.1:51862: <_RLock owner='netrpc-client-127.0.0.1:51862' count=2>.acquire(1): recursive success
netrpc-client-127.0.0.1:51862: <_RLock owner='netrpc-client-127.0.0.1:51862' count=1>.release(): non-final release
netrpc-client-127.0.0.1:51862: <_RLock owner=None count=0>.release(): final release
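
The nested acquire/release sequence in the trace above can be sketched as follows. This is only an illustration, not the OpenERP code: `TracedRLock`, `get_registry` and `upgrade_module` are hypothetical names, and the wrapper merely approximates what `verbose=True` printed on Python 2 (recent Python 3 releases do not accept that argument).

```python
import threading


class TracedRLock(object):
    """Hypothetical RLock wrapper that records every acquire and release."""

    def __init__(self):
        self._lock = threading.RLock()
        self._count = 0   # recursion depth; only modified while the lock is held
        self.events = []  # (operation, outcome) tuples, in order

    def acquire(self, blocking=True):
        acquired = self._lock.acquire(blocking)
        if acquired:
            self._count += 1
            outcome = "initial success" if self._count == 1 else "recursive success"
            self._record("acquire", outcome)
        return acquired

    def release(self):
        # Assumes the caller owns the lock; record while it is still held.
        self._count -= 1
        outcome = "final release" if self._count == 0 else "non-final release"
        self._record("release", outcome)
        self._lock.release()

    def _record(self, operation, outcome):
        self.events.append((operation, outcome))
        print("%s: count=%d .%s(): %s" % (
            threading.current_thread().name, self._count, operation, outcome))

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.release()
        return False


registries_lock = TracedRLock()


def get_registry():
    # Inner "with": the same thread re-enters the lock (recursive acquire).
    with registries_lock:
        return "registry"


def upgrade_module():
    # Outer "with": first acquire by this thread (initial acquire).
    with registries_lock:
        return get_registry()


result = upgrade_module()
```

In a healthy run the recorded outcomes follow the same order as the trace: initial success, recursive success, non-final release, final release.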

But when the deadlock occurs, I only get this output before the server stops responding:

netrpc-client-127.0.0.1:51872: <_RLock owner='netrpc-client-127.0.0.1:51872' count=1>.acquire(1): initial success
netrpc-client-127.0.0.1:51872: <_RLock owner='netrpc-client-127.0.0.1:51872' count=2>.acquire(1): recursive success

I don't understand why it doesn't work, because we can see that the lock was released correctly just before (the output above is from the same server instance).
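
One way to see where each thread is stuck when the server hangs is to dump the stack of every live thread. The sketch below is a generic debugging aid, not part of OpenERP: `format_thread_stacks` is a hypothetical helper built on the standard `sys._current_frames()` and `traceback.format_stack()` calls, and how it would be triggered inside the server (a signal, a timer thread, a debug RPC) is left open.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the current stack of every live thread."""
    # Map thread identifiers to names so the dump is readable.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for ident, frame in sys._current_frames().items():
        header = "Thread %s (id %s):" % (names.get(ident, "unknown"), ident)
        stack = "".join(traceback.format_stack(frame))
        chunks.append(header + "\n" + stack)
    return "\n".join(chunks)
```

Calling it from a thread that is still running (for example a watchdog thread) and printing the result shows which function each blocked thread is waiting in.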

Notes:

- I couldn't trigger the bug on Linux, where everything seems OK. But running the server on Windows makes it happen almost every time. I don't really understand why, but I'm not an expert.

- I was using the GTK Client

- Using Python 2.6, with all libs installed manually using pip and .exe distributions.

Thanks for working on this !

[1] http://bazaar.launchpad.net/~openerp/openobject-server/trunk/view/head:/openerp/modules/registry.py#L149

description: updated
Ferdinand (office-chricar) wrote :
Thibaut DIRLIK (Logica) (thibaut-dirlik) wrote :

I also encountered your bug (again only on Windows, Python 2.6), but I'm not sure the two are related.

Thibaut DIRLIK (Logica) (thibaut-dirlik) wrote :

In my case, there is absolutely no error; it just stays blocked on the acquire().

Vo Minh Thu (thu) wrote :

Thibaut,

I haven't had a chance to debug it on Windows, but I wasn't happy with the locks we added, and I have a few changes I want to make in trunk, in this branch: https://code.launchpad.net/~openerp-dev/openobject-server/trunk-registry-lock-vmt/+merge/83602

I don't see how it could solve your problem, but maybe it does. Can you give it a try?

Thibaut DIRLIK (Logica) (thibaut-dirlik) wrote :

I just merged your proposal and got the same result. I installed 3 modules without problems, then one more, and I got:

netrpc-client-127.0.0.1:65230: <_RLock owner='netrpc-client-127.0.0.1:65230' count=1>.acquire(1): initial success
netrpc-client-127.0.0.1:65230: <_RLock owner='netrpc-client-127.0.0.1:65230' count=2>.acquire(1): recursive success

in the console (still with lock debugging on). Then the client froze and OpenERP stayed blocked.

It looks like a Windows-specific problem. Could you try it on Windows 7?

Thanks for your work anyway,

summary: - [Trunk] Re-Entrant lock deadlock in upgrade_module
+ [Trunk] Windows-specific random deadlock in upgrade_module
Olivier Dony (Odoo) (odo-openerp) wrote :

Hi Thibaut,

Let's confirm this bug, as you provided sufficient information for us to investigate further and see if we can find anything, even if we can't reproduce it yet. The importance should be Low unless we find a way to reproduce it more easily.
Also, it happens only during module installation, not during normal production use of the system.

It's sad to see that Python is not as cross-platform as you'd think. We should perhaps advise integrators to always deploy the server on Linux. There's usually no problem with doing that even in a Windows-based company, since clients can connect from Windows.

Changed in openobject-server:
assignee: nobody → OpenERP's Framework R&D (openerp-dev-framework)
importance: Undecided → Low
status: New → Confirmed
Thibaut DIRLIK (Logica) (thibaut-dirlik) wrote :

Just to say that this bug doesn't occur anymore (using 6.1).

Changed in openobject-server:
status: Confirmed → Invalid