Database deadlock while starting regiond

Bug #1458895 reported by Mike Pontillo on 2015-05-26
This bug affects 3 people
Affects   Status   Importance   Assigned to     Milestone
MAAS               High         Gavin Panella
1.9                Critical     Gavin Panella
Trunk              High         Gavin Panella

Bug Description

This is error #3 from bug #1457788.

2015-05-22 08:39:24 [maasserver.start_up] ERROR: Database error during start-up
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/maasserver/start_up.py", line 84, in start_up
    yield deferToThread(inner_start_up)
  File "/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 191, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/orm.py", line 404, in call_within_transaction
    return func_outside_txn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/orm.py", line 300, in retrier
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/db/transaction.py", line 339, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/__init__.py", line 229, in call_with_lock
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/start_up.py", line 189, in inner_start_up
    register_all_triggers()
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/orm.py", line 399, in call_within_transaction
    return func_within_txn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/db/transaction.py", line 339, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/triggers.py", line 390, in register_all_triggers
    "nodegroupinterface_create_notify", "insert")
  File "/usr/lib/python2.7/dist-packages/maasserver/triggers.py", line 303, in register_trigger
    cursor.execute(trigger_sql)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "/usr/lib/python2.7/dist-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/util.py", line 51, in execute
    return self.cursor.execute(sql)
OperationalError: deadlock detected
DETAIL: Process 25691 waits for AccessExclusiveLock on relation 16924 of database 16385; blocked by process 25706.
Process 25706 waits for AccessShareLock on relation 16759 of database 16385; blocked by process 25691.
HINT: See server log for query details.

2015-05-22 08:39:24 [maas.websocket.listener] Listening for notificaton from database.
2015-05-22 08:39:25 [maasserver.start_up] ERROR: Database error during start-up
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/maasserver/start_up.py", line 84, in start_up
    yield deferToThread(inner_start_up)
OperationalError: deadlock detected
LINE 1: ...maasserver_nodegroup"."default_disable_ipv4" FROM "maasserve...
                                                             ^
DETAIL: Process 25761 waits for AccessShareLock on relation 16759 of database 16385; blocked by process 25730.
Process 25730 waits for AccessExclusiveLock on relation 16924 of database 16385; blocked by process 25761.

This one is weird: there seems to be a deadlock when accessing the DB, but it happens within start_up, which is supposed to grab an exclusive lock before running. This needs to be investigated.
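To make the cycle in the DETAIL lines explicit, here is a small sketch that parses them into a wait-for graph and follows the "blocked by" links. The helper names (`parse_wait_edges`, `find_cycle`) are made up for illustration and are not part of MAAS or PostgreSQL; the input text is copied from the first traceback above.

```python
import re

# Matches the DETAIL lines PostgreSQL emits when it reports a deadlock.
WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on relation (\d+) of database \d+; "
    r"blocked by process (\d+)"
)


def parse_wait_edges(detail):
    """Return {waiting_pid: (lock_mode, relation_oid, blocking_pid)}."""
    edges = {}
    for waiter, mode, relation, blocker in WAIT_RE.findall(detail):
        edges[int(waiter)] = (mode, int(relation), int(blocker))
    return edges


def find_cycle(edges, start):
    """Follow blocked-by links from start; return the cycle of pids, or None."""
    path = []
    pid = start
    while pid in edges:
        if pid in path:
            # We came back to a pid already on the path: that is the cycle.
            return path[path.index(pid):]
        path.append(pid)
        pid = edges[pid][2]
    return None


# DETAIL text copied from the first error in this report.
DETAIL = (
    "DETAIL: Process 25691 waits for AccessExclusiveLock on relation 16924 "
    "of database 16385; blocked by process 25706.\n"
    "Process 25706 waits for AccessShareLock on relation 16759 "
    "of database 16385; blocked by process 25691.\n"
)

edges = parse_wait_edges(DETAIL)
cycle = find_cycle(edges, 25691)
print(cycle)
```

Each process waits on a relation the other has already locked, so neither can proceed. A lock taken only by start_up would not prevent this if the other process never takes that same lock.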

Raphaël Badin (rvb) wrote :

Raising this to Critical since it is a crash, and we shouldn't tolerate crashes even if they seem inconsequential.

tags: added: stacktrace start-up
Changed in maas:
importance: High → Critical
Ricardo Bánffy (rbanffy) wrote :

We managed to reproduce the issue. From the PostgreSQL log:

2015-05-27 16:37:25 EDT ERROR: deadlock detected [87/718]
2015-05-27 16:37:25 EDT DETAIL: Process 22120 waits for AccessExclusiveLock on relation 16759 of database 16385; blocked by process 22204.
        Process 22204 waits for AccessShareLock on relation 16585 of database 16385; blocked by process 22120.
        Process 22120: DROP TRIGGER IF EXISTS maasserver_nodegroup_nodegroup_create_notify ON maasserver_nodegroup;
        CREATE TRIGGER maasserver_nodegroup_nodegroup_create_notify
        AFTER INSERT ON maasserver_nodegroup
        FOR EACH ROW

        EXECUTE PROCEDURE nodegroup_create_notify();

        Process 22204:
                    SELECT DISTINCT ON (node.hostname)
                        node.hostname, lease.ip
                    FROM maasserver_macaddress AS mac
                    JOIN maasserver_node AS node ON node.id = mac.node_id
                    JOIN maasserver_dhcplease AS lease ON lease.mac = mac.mac_address
                    WHERE lease.nodegroup_id = 1
                    AND (node.status = 9 OR node.status = 6)
                    ORDER BY node.hostname, mac.id

2015-05-27 16:37:25 EDT HINT: See server log for query details.
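When PostgreSQL detects a deadlock it aborts one of the transactions, so one common mitigation is to re-run the aborted transaction. The traceback in the description shows a `retrier` wrapper in maasserver/utils/orm.py; the following is not that code, only a minimal sketch of the general idea. `DeadlockDetected`, `retry_on_deadlock` and `register_triggers_demo` are hypothetical names, and nothing here talks to a real database.

```python
import functools
import time


class DeadlockDetected(Exception):
    """Hypothetical stand-in for the database driver's deadlock error."""


def retry_on_deadlock(attempts=3, delay=0.0):
    """Retry the wrapped callable when it raises DeadlockDetected.

    In real code the wrapped callable would have to run the whole
    transaction again, since the aborted one has been rolled back.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockDetected:
                    if attempt == attempts:
                        raise  # out of attempts, let the error propagate
                    time.sleep(delay * attempt)  # simple linear back-off
        return wrapper
    return decorator


calls = []


@retry_on_deadlock(attempts=3)
def register_triggers_demo():
    """Fails twice with a simulated deadlock, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise DeadlockDetected("deadlock detected")
    return "registered"


result = register_triggers_demo()
print(result, len(calls))
```

Retrying only hides the symptom. The two statements above still take their locks in conflicting order, so the underlying ordering problem would remain.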

Changed in maas:
assignee: Mike Pontillo (mpontillo) → nobody
Changed in maas:
milestone: 1.8.0 → 1.8.1
Changed in maas:
milestone: 1.8.1 → 1.8.2
Gavin Panella (allenap) on 2015-07-30
Changed in maas:
assignee: nobody → Gavin Panella (allenap)
status: Triaged → In Progress
JuanJo Ciarlante (jjo) wrote :

FYI, I hit this issue with 1.8.0+bzr4001-0ubuntu2~trusty1, to the point of
not being able to bring the MAAS services back up until I applied the workaround below.

Workaround: avoid running multiple regiond workers:

stop maas-regiond
sed -i 's/seq 4/seq 1/' /etc/init/maas-regiond.conf
start maas-regiond

tags: added: canonical-bootstack
Changed in maas:
milestone: 1.8.2 → 1.9.0
Gavin Panella (allenap) on 2015-08-25
Changed in maas:
status: In Progress → Fix Committed
tz (csherwood-n) wrote :

The workaround for 1.8.0 by JuanJo works if you have upstart, but if you're running systemd, MAAS installs four separate services, one per worker, i.e.:

/lib/systemd/system/maas-regiond.service.wants/maas-regiond-worker@1.service
/lib/systemd/system/maas-regiond.service.wants/maas-regiond-worker@2.service

which reverts the workaround.
