DDL stuck in "checking permissions" and cluster gets locked

Bug #1710585 reported by Grigorijs
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

A DDL operation on one of the cluster's nodes (using TOI as the online schema upgrade method, wsrep_OSU_method=TOI) hung in the "checking permissions" state. At the same time, the whole cluster was locked (because of TOI).

The problem seems to be occasional and random (these operations normally execute successfully), so it really looks like a bug (perhaps a wrong mutex in the permission-checking code).

Killing the stuck thread has no effect; the only way to get the cluster working again is to hard-reset the problematic node.

Software version:
ii percona-xtradb-cluster-57 5.7.18-29.20-1.jessie amd64 Percona XtraDB Cluster with Galera

Krunal Bauskar (krunal-bauskar) wrote :

Can you help us with a reproducible test case?

Grigorijs (grigk) wrote : Re: [Bug 1710585] Re: DDL stuck in "checking permissions" and cluster gets locked

The bug does not seem to be easily reproducible.

Here is the "show processlist" output for the stuck thread:

| 15721463 | artis | 10.30.50.2:46853 | media_bank | Query | 2018 | checking permissions | RENAME TABLE media_bank.api_device_shop TO media_bank.api_device_shop_,
             media_bank.api | 0 | 11105 |

The full SQL for this statement is:

RENAME TABLE media_bank.api_device_shop TO media_bank.api_device_shop_,
media_bank.api_device_shop_T TO media_bank.api_device_shop,
media_bank.api_device_shop_ TO media_bank.api_device_shop_T

As far as I can see, this statement usually runs without problems, but as the line above shows, it was stuck in the "checking permissions" state for more than 30 minutes (2018 seconds).
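The State and Time columns in the processlist rows above are enough to spot this condition automatically. A minimal Python sketch, assuming rows shaped like the result of `SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO FROM information_schema.PROCESSLIST` (TIME is the number of seconds the thread has spent in its current state); `ProcRow`, `find_stuck` and the 30-minute threshold are hypothetical choices for illustration, and the two sample rows are the ones reported in this bug with the Info text abbreviated.

```python
from typing import NamedTuple, Optional


class ProcRow(NamedTuple):
    """One processlist row (hypothetical container, mirrors the PROCESSLIST columns)."""
    id: int
    user: str
    host: str
    db: Optional[str]
    command: str
    time: int              # seconds spent in the current state
    state: Optional[str]
    info: Optional[str]


def find_stuck(rows, state="checking permissions", threshold_s=30 * 60):
    """Return queries that have been in `state` for longer than threshold_s seconds."""
    return [r for r in rows
            if r.command == "Query" and r.state == state and r.time > threshold_s]


# Rows reported in this bug (Info abbreviated).
rows = [
    ProcRow(15721463, "artis", "10.30.50.2:46853", "media_bank", "Query", 2018,
            "checking permissions",
            "RENAME TABLE media_bank.api_device_shop TO media_bank.api_device_shop_, ..."),
    ProcRow(1163524, "artis", "10.30.50.2:44918", "media_bank", "Query", 2522,
            "checking permissions",
            "RENAME TABLE media_bank.api_device_shop TO media_bank.api_device_shop_, ..."),
]

for r in find_stuck(rows):
    print(f"thread {r.id}: {r.state!r} for {r.time / 60:.1f} min")
```

A check like this only detects the symptom; as described above, killing the flagged thread did not help, so its value is in triggering diagnostics (such as a stack trace) while the hang is still in progress.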

On Mon, Aug 14, 2017 at 1:03 PM, Krunal Bauskar <email address hidden>
wrote:

> Can you help us with some reproducible use-case.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1710585
>
> Title:
> DDL stuck in "checking permissions" and cluster gets locked
>
> Status in Percona XtraDB Cluster:
> New
>
> Bug description:
> Got situation that DDL operation on one of the cluster's node (with
> TOI OSU) hang in "checking permissions" state. With that also all
> cluster was locked (due to TOI).
>
> Problem seems to be occasional and random (as this operations normally
> execute successfully) - so it looks really as some bug (maybe some
> wrong mutex in checking permissions operations).
>
> Kill operation on stuck thread gives nothing, the only way to get
> cluster back to operation is to hard reset problematic node.
>
>
> Software version:
> ii percona-xtradb-cluster-57 5.7.18-29.20-1.jessie amd64
> Percona XtraDB Cluster with Galera
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/percona-xtradb-cluster/+bug/
> 1710585/+subscriptions
>

Krunal Bauskar (krunal-bauskar) wrote :

The next time it gets stuck, can you send us pt-pmp output? pt-pmp collects stack traces showing where the threads are stuck. Any other way of generating a stack trace is fine too.
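pt-pmp (from Percona Toolkit) gathers the backtraces of all threads with gdb and collapses identical stacks into one line with a count, so many threads blocked at the same spot show up as a single high-count entry. A minimal Python sketch of that aggregation step, assuming the input is the text printed by gdb's `thread apply all bt` (for example from `gdb -batch -p <mysqld pid> -ex "thread apply all bt"`); `aggregate_stacks` is a hypothetical helper, the sample trace is synthetic (not taken from this bug's attachments), and real pt-pmp output formatting may differ.

```python
import re
from collections import Counter

# Frame lines look like "#1  0x0000... in func (args) at file:line";
# the innermost frame may omit the "0x... in " part.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?([^\s(]+)")


def aggregate_stacks(gdb_output: str) -> Counter:
    """Collapse each thread's backtrace to 'f0,f1,...' and count identical stacks."""
    stacks = Counter()
    current = []
    for line in gdb_output.splitlines():
        line = line.strip()
        if line.startswith("Thread "):
            # A new thread header: flush the previous thread's stack.
            if current:
                stacks[",".join(current)] += 1
            current = []
        else:
            m = FRAME_RE.match(line)
            if m:
                current.append(m.group(1))
    if current:
        stacks[",".join(current)] += 1
    return stacks


# Synthetic example input; function names are placeholders.
SAMPLE = """\
Thread 3 (Thread 0x7f0000000003 (LWP 103)):
#0  0x00007f00000000a1 in pthread_cond_wait () from /lib/libpthread.so.0
#1  0x0000000000400a10 in example_wait_for_lock (t=0x1) at example.cc:10
#2  0x0000000000400b20 in example_run_ddl () at example.cc:20
Thread 2 (Thread 0x7f0000000002 (LWP 102)):
#0  0x00007f00000000a1 in pthread_cond_wait () from /lib/libpthread.so.0
#1  0x0000000000400a10 in example_wait_for_lock (t=0x2) at example.cc:10
#2  0x0000000000400b20 in example_run_ddl () at example.cc:20
Thread 1 (Thread 0x7f0000000001 (LWP 101)):
#0  0x00007f00000000c3 in poll () from /lib/libc.so.6
#1  0x0000000000400c30 in example_main_loop () at example.cc:30
"""

for stack, count in aggregate_stacks(SAMPLE).most_common():
    print(count, stack)
```

Collapsing stacks this way is what makes a trace from a busy mysqld readable: the thread holding or waiting on the contended lock stands out from hundreds of idle connection threads.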

Grigorijs (grigk) wrote :

We hit this problem again. Attaching the pt-pmp trace from the node with the hung DDL command.

Grigorijs (grigk) wrote :

pt-pmp output from another node when the issue happened.

Grigorijs (grigk) wrote :

pt-pmp output from another node.

Grigorijs (grigk) wrote :

Once again a DDL statement hung in the "checking permissions" state.

Processlist output for it:
1163524 artis 10.30.50.2:44918 media_bank Query 2522 checking permissions RENAME TABLE media_bank.api_device_shop TO media_bank.api_device_shop_,^M\n media_bank.api_device_shop_T TO media_bank.api_device_shop,^M\n media_bank.api_device_shop_ TO media_bank.api_device_shop_T 0 10686

This is a 3-node cluster, and this happened on node dbdmz5 (see the pt-pmp.dbdmz4 debug output).

Krunal Bauskar (krunal-bauskar) wrote :

Can you disable your wsrep_sync_wait setting (set it to 0) and retry?

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1999
