Database deadlock while starting regiond

Bug #1458895 reported by Mike Pontillo
This bug affects 3 people
Affects   Status         Importance   Assigned to     Milestone
MAAS      Fix Released   High         Gavin Panella
1.9       Fix Released   Critical     Gavin Panella

Bug Description

This is error #3 from bug #1457788.

2015-05-22 08:39:24 [maasserver.start_up] ERROR: Database error during start-up
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/maasserver/start_up.py", line 84, in start_up
    yield deferToThread(inner_start_up)
  File "/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 191, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/orm.py", line 404, in call_within_transaction
    return func_outside_txn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/orm.py", line 300, in retrier
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/db/transaction.py", line 339, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/__init__.py", line 229, in call_with_lock
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/start_up.py", line 189, in inner_start_up
    register_all_triggers()
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/orm.py", line 399, in call_within_transaction
    return func_within_txn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/db/transaction.py", line 339, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/triggers.py", line 390, in register_all_triggers
    "nodegroupinterface_create_notify", "insert")
  File "/usr/lib/python2.7/dist-packages/maasserver/triggers.py", line 303, in register_trigger
    cursor.execute(trigger_sql)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "/usr/lib/python2.7/dist-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/util.py", line 51, in execute
    return self.cursor.execute(sql)
OperationalError: deadlock detected
DETAIL: Process 25691 waits for AccessExclusiveLock on relation 16924 of database 16385; blocked by process 25706.
Process 25706 waits for AccessShareLock on relation 16759 of database 16385; blocked by process 25691.
HINT: See server log for query details.

2015-05-22 08:39:24 [maas.websocket.listener] Listening for notificaton from database.
2015-05-22 08:39:25 [maasserver.start_up] ERROR: Database error during start-up
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/maasserver/start_up.py", line 84, in start_up
    yield deferToThread(inner_start_up)
OperationalError: deadlock detected
LINE 1: ...maasserver_nodegroup"."default_disable_ipv4" FROM "maasserve...
                                                             ^
DETAIL: Process 25761 waits for AccessShareLock on relation 16759 of database 16385; blocked by process 25730.
Process 25730 waits for AccessExclusiveLock on relation 16924 of database 16385; blocked by process 25761.

This one is odd: there appears to be a deadlock when accessing the database, yet it happens within start_up, which is supposed to take an exclusive lock before running. This needs to be investigated.
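The DETAIL lines in the first traceback describe a two-process wait cycle. As a rough sketch (not part of MAAS; the function names `parse_waits` and `find_cycle` are made up for illustration), the text can be parsed into a wait-for graph and checked for a cycle:

```python
import re

# One "Process N waits for MODE on relation R ...; blocked by process M" clause.
WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on relation (\d+) of database \d+; "
    r"blocked by process (\d+)"
)


def parse_waits(detail):
    """Return {waiter_pid: (lock_mode, relation_oid, blocker_pid)}."""
    return {
        int(waiter): (mode, int(relation), int(blocker))
        for waiter, mode, relation, blocker in WAIT_RE.findall(detail)
    }


def find_cycle(waits):
    """Follow blocker edges from each waiter; return a list of pids forming a cycle, or None."""
    for start in waits:
        path = [start]
        current = waits[start][2]
        while current in waits and current not in path:
            path.append(current)
            current = waits[current][2]
        if current == start:
            return path
    return None


# DETAIL text copied from the first traceback in this report.
detail = (
    "Process 25691 waits for AccessExclusiveLock on relation 16924 of "
    "database 16385; blocked by process 25706. "
    "Process 25706 waits for AccessShareLock on relation 16759 of "
    "database 16385; blocked by process 25691."
)
waits = parse_waits(detail)
cycle = find_cycle(waits)
```

Here each process holds a lock the other one wants, on a different relation, which is why an exclusive lock taken around start_up alone would not prevent the cycle if the other process does not take the same lock.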

Raphaël Badin (rvb) wrote :

Raising this to Critical: this is a crash, and we shouldn't tolerate crashes even if they seem inconsequential.

tags: added: stacktrace start-up
Changed in maas:
importance: High → Critical
Ricardo Bánffy (rbanffy) wrote :

We managed to reproduce the issue. From the PostgreSQL log:

2015-05-27 16:37:25 EDT ERROR: deadlock detected [87/718]
2015-05-27 16:37:25 EDT DETAIL: Process 22120 waits for AccessExclusiveLock on relation 16759 of database 16385; blocked by process 22204.
        Process 22204 waits for AccessShareLock on relation 16585 of database 16385; blocked by process 22120.
        Process 22120: DROP TRIGGER IF EXISTS maasserver_nodegroup_nodegroup_create_notify ON maasserver_nodegroup;
        CREATE TRIGGER maasserver_nodegroup_nodegroup_create_notify
        AFTER INSERT ON maasserver_nodegroup
        FOR EACH ROW

        EXECUTE PROCEDURE nodegroup_create_notify();

        Process 22204:
                    SELECT DISTINCT ON (node.hostname)
                        node.hostname, lease.ip
                    FROM maasserver_macaddress AS mac
                    JOIN maasserver_node AS node ON node.id = mac.node_id
                    JOIN maasserver_dhcplease AS lease ON lease.mac = mac.mac_address
                    WHERE lease.nodegroup_id = 1
                    AND (node.status = 9 OR node.status = 6)
                    ORDER BY node.hostname, mac.id

2015-05-27 16:37:25 EDT HINT: See server log for query details.
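PostgreSQL resolves a deadlock by aborting one of the transactions involved, so the aborted unit of work can in principle be run again. The sketch below is a generic illustration of that idea, not the fix that landed for this bug and not the `retrier` seen in the traceback: `FakeOperationalError`, `retry_on_deadlock`, and `register_triggers` are hypothetical names, and the only real-world detail assumed is that the driver error exposes PostgreSQL's SQLSTATE (`40P01` for deadlock_detected) as a `pgcode` attribute.

```python
import time

DEADLOCK_SQLSTATE = "40P01"  # PostgreSQL SQLSTATE for deadlock_detected


class FakeOperationalError(Exception):
    """Hypothetical stand-in for a driver error that carries a SQLSTATE."""

    def __init__(self, message, pgcode):
        super().__init__(message)
        self.pgcode = pgcode


def retry_on_deadlock(func, attempts=3, delay=0.0, sleep=time.sleep):
    """Call func(); if it fails with a deadlock error, wait and try again.

    func must start a fresh transaction on every call, because the
    transaction that hit the deadlock has already been aborted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:
            is_deadlock = getattr(error, "pgcode", None) == DEADLOCK_SQLSTATE
            if not is_deadlock or attempt == attempts:
                raise  # not a deadlock, or out of attempts
            sleep(delay * attempt)  # simple linear back-off


calls = []


def register_triggers():
    """Hypothetical unit of work that deadlocks twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise FakeOperationalError("deadlock detected", DEADLOCK_SQLSTATE)
    return "registered"


result = retry_on_deadlock(register_triggers)
```

Retrying only hides the contention; it does not remove the conflicting lock order between the trigger DDL and the concurrent SELECT shown in the log.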

Changed in maas:
assignee: Mike Pontillo (mpontillo) → nobody
Changed in maas:
milestone: 1.8.0 → 1.8.1
Changed in maas:
milestone: 1.8.1 → 1.8.2
Gavin Panella (allenap)
Changed in maas:
assignee: nobody → Gavin Panella (allenap)
status: Triaged → In Progress
JuanJo Ciarlante (jjo) wrote :

FYI, I hit this issue with 1.8.0+bzr4001-0ubuntu2~trusty1, to the point of
not being able to bring the MAAS services back up until I applied the workaround below.

Workaround: avoid running multiple regiond workers:

stop maas-regiond
sed -i 's/seq 4/seq 1/' /etc/init/maas-regiond.conf
start maas-regiond

tags: added: canonical-bootstack
Changed in maas:
milestone: 1.8.2 → 1.9.0
Gavin Panella (allenap)
Changed in maas:
status: In Progress → Fix Committed
tz (csherwood-n) wrote :

The workaround for 1.8.0 by JuanJo works if you have upstart, but if you're running systemd then MAAS installs four separate services, one per worker, e.g.:

/lib/systemd/system/maas-regiond.service.wants/maas-regiond-worker@1.service
/lib/systemd/system/maas-regiond.service.wants/maas-regiond-worker@2.service

which defeats the workaround.
