Locks in workers when using multiprocessing workers

Bug #1267089 reported by Guewen Baconnier @ Camptocamp
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenERP Connector
New
Undecided
Unassigned

Bug Description

Sometimes we can observe stalling jobs when using the multiprocessing workers.

From what I could observe, the alive check can't update the "date_alive" field, due to a database lock. Thus, the workers are considered as dead after 5 minutes and should be deleted, but the delete query is blocked by the "alive check" still locked.

As a result, the jobs stay assigned to them and can't be assigned to new workers.

Using this query on my database:

  select bl.pid as blocked_pid, a.usename as blocked_user,
          kl.pid as blocking_pid, ka.usename as blocking_user, a.current_query as blocked_statement
   from pg_catalog.pg_locks bl
        join pg_catalog.pg_stat_activity a
        on bl.pid = a.procpid
        join pg_catalog.pg_locks kl
             join pg_catalog.pg_stat_activity ka
             on kl.pid = ka.procpid
        on bl.transactionid = kl.transactionid and bl.pid != kl.pid
   where not bl.granted;

I obtained:

   blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement
  -------------+-------------------+--------------+-------------------+-------------------------------------------------------------------------------
         23234 | openerp_prod_snip | 23291 | openerp_prod_snip | update queue_worker set "date_alive"='2014-01-08 10:02:27' where id IN (8873)
         23249 | openerp_prod_snip | 23347 | openerp_prod_snip | delete from queue_worker where id IN (8873, 8874)

And if I inspect the blocked and blocking pids:

   SELECT datname,usename,procpid,client_addr,waiting,query_start,current_query FROM pg_stat_activity WHERE procpid in (23234, 23291);

   datname | usename | procpid | client_addr | waiting | query_start | current_query
   openerp_prod_snip | openerp_prod_snip | 23234 | 127.0.0.1 | t | 2014-01-08 11:02:27.741848+01 | update queue_worker set "date_alive"='2014-01-08 10:02:27' where id IN (8873)
   openerp_prod_snip | openerp_prod_snip | 23291 | 127.0.0.1 | f | 2014-01-08 11:02:00.313542+01 | <IDLE> in transaction

   SELECT datname,usename,procpid,client_addr,waiting,query_start,current_query FROM pg_stat_activity WHERE procpid in (23249, 23347);

   datname | usename | procpid | client_addr | waiting | query_start | current_query
   openerp_prod_snip | openerp_prod_snip | 23249 | 127.0.0.1 | t | 2014-01-08 11:07:09.585061+01 | delete from queue_worker where id IN (8873, 8874)
   openerp_prod_snip | openerp_prod_snip | 23347 | 127.0.0.1 | t | 2014-01-08 11:07:08.095879+01 | delete from queue_worker where id IN (8873, 8874)

The pid 23291 is the transaction locking all the other, but is <IDLE> in transaction.
Details of the locks of the pid 23291

                   relname | locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted
  ------------------------------------------+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+------------------+---------
   res_users_pkey | relation | 221638 | 221799 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_users_login_key | relation | 221638 | 221801 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_users_login_uniq | relation | 221638 | 221803 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_groups_pkey | relation | 221638 | 221811 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_groups_users_rel_uid_gid_key | relation | 221638 | 221816 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_groups_users_rel | relation | 221638 | 221813 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_groups_users_rel_uid_idx | relation | 221638 | 221828 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_groups_users_rel_gid_idx | relation | 221638 | 221829 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   wkf_instance_pkey | relation | 221638 | 221900 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   ir_module_module_pkey | relation | 221638 | 222001 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   name_uniq | relation | 221638 | 222018 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_partner_pkey | relation | 221638 | 222050 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   queue_job_pkey | relation | 221638 | 225820 | | | | | | | | 16/3926746 | 23291 | RowExclusiveLock | t
   queue_job_pkey | relation | 221638 | 225820 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   queue_job_uuid_index | relation | 221638 | 225825 | | | | | | | | 16/3926746 | 23291 | RowExclusiveLock | t
   queue_job_uuid_index | relation | 221638 | 225825 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   queue_worker_pkey | relation | 221638 | 225832 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   queue_worker_uuid_index | relation | 221638 | 225837 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   queue_worker | relation | 221638 | 225828 | | | | | | | | 16/3926746 | 23291 | RowShareLock | t
   queue_worker | relation | 221638 | 225828 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   queue_job | relation | 221638 | 225816 | | | | | | | | 16/3926746 | 23291 | RowExclusiveLock | t
   queue_job | relation | 221638 | 225816 | | | | | | | | 16/3926746 | 23291 | RowShareLock | t
   wkf_instance_wkf_id_index | relation | 221638 | 222671 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   wkf_instance_res_type_res_id_state_index | relation | 221638 | 222672 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   wkf_instance_res_id_wkf_id_index | relation | 221638 | 222673 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   ir_module_module_name_index | relation | 221638 | 222823 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   ir_module_module_state_index | relation | 221638 | 222824 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   ir_module_module_category_id_index | relation | 221638 | 222825 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   ir_module_module_name_uniq | relation | 221638 | 222826 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_partner_company_id_index | relation | 221638 | 223121 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_partner_date_index | relation | 221638 | 223122 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_partner_name_index | relation | 221638 | 223123 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_partner_ref_index | relation | 221638 | 223124 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_groups_category_id_index | relation | 221638 | 223404 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_groups_name_uniq | relation | 221638 | 223405 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_users_login_date_index | relation | 221638 | 223417 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   wkf_instance | relation | 221638 | 221895 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   ir_module_module | relation | 221638 | 221989 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_groups | relation | 221638 | 221807 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_users | relation | 221638 | 221793 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   mail_alias_pkey | relation | 221638 | 225157 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   mail_alias_alias_unique | relation | 221638 | 225172 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   mail_alias | relation | 221638 | 225153 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
   res_partner | relation | 221638 | 222046 | | | | | | | | 16/3926746 | 23291 | AccessShareLock | t
  (44 lignes)

I still have to find why this transaction stays in a stalling state.

Better display for the long lines of the tables: http://hastebin.com/qaguweveju.md

description: updated
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.