FTWRL and wsrep_causal_reads with 'show status'

Bug #1271177 reported by Raghavendra D Prabhu
This bug affects 2 people
Affects                                   Status         Importance   Assigned to   Milestone

MySQL patches by Codership (status tracked in 5.6)
  5.5                                     New            Undecided    Unassigned
  5.6                                     Fix Released   Undecided    Yan Zhang

Percona XtraDB Cluster, moved to https://jira.percona.com/projects/PXC (status tracked in 5.6)
  5.5                                     Fix Released   High         Unassigned
  5.6                                     Fix Released   High         Unassigned

Bug Description

When FTWRL (FLUSH TABLES WITH READ LOCK) is used on a node with wsrep_causal_reads=1
and there is DML on other nodes, subsequent SELECT / SHOW STATUS statements on this node can hang and then fail.

Cloned from https://bugs.launchpad.net/codership-mysql/+bug/1126316

Example: https://bugs.launchpad.net/codership-mysql/+bug/1126316/comments/1

Tags: i36377

Raghavendra D Prabhu (raghavendra-prabhu) wrote : Re: FTWRL and wsrep_causal_reads

On 5.6 this is not repeatable for 'SHOW' commands, only for SELECT.

On 5.5 it is repeatable for 'SHOW' as well.

Regarding blocking on select and failing later with lock_wait_timeout, it needs to be seen if this is intended when the provider on that node has been paused. The behavior of apply and commit monitors with a paused provider needs to be checked.

Also note that this returns ER_LOCK_WAIT_TIMEOUT, which can be confusing: changing innodb_lock_wait_timeout has no effect on this behavior. It is the causal_read_timeout (default: 30s) that needs to be changed.

summary: - FTWRTL and wsrep_causal_reads
+ FTWRL and wsrep_causal_reads
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

It may be desirable to disable causal_reads (i.e. avoid wsrep_causal_wait) when the provider has been paused.

description: updated
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Based on discussion, we came to the conclusion that having SELECT hang
with causal_reads under FTWRL is the expected behavior. Any
tuning required will be for causal_read_timeout (which is
dynamic but per-session).

So, marking 5.6 as fixed, but this still needs to be fixed in 5.5,
since 'show status' hangs there as well.

summary: - FTWRL and wsrep_causal_reads
+ FTWRL and wsrep_causal_reads with 'show status'
Przemek (pmalkowski)
tags: added: i36377
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

The difference between 5.5 and 5.6 is that the causal constraint is
applied inconsistently to

  SQLCOM_SHOW_STATUS_FUNC
  SQLCOM_SHOW_STATUS
  SQLCOM_SHOW_STATUS_PROC

In 5.6 it is not applied to these commands, but in 5.5 it is.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

a)
As for

"
This is important as it breaks backups made with Percona Xtrabackup, as it does some SHOW STATUS queries after it sets FTWRL.
The example error from innobackupex script looks like this:
"DBD::mysql::db selectall_hashref failed: Lock wait timeout exceeded; try restarting transaction at /usr/bin/innobackupex line 1652.
innobackupex: Error:
Error executing 'SHOW STATUS': DBD::mysql::db selectall_hashref failed: Lock wait timeout exceeded; try restarting transaction at /usr/bin/innobackupex line 1652."
"

It may be more suitable here for innobackupex to run 'set session wsrep_causal_reads=0' before SHOW STATUS. Will check with the xtrabackup developers on this.

b)

It may be necessary to block even for show status, since status
variables like wsrep_last_committed, for instance, depend on
application of the writeset.
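
For illustration, the session-level workaround suggested in (a) could look roughly like the sketch below from a client's side. This is only a sketch, not what innobackupex does: the function name `show_status_non_causal` is hypothetical, and `cursor` is assumed to be any Python DB-API cursor on a connection to the affected node that returns rows as plain tuples. Per (b), values such as wsrep_last_committed read this way may lag behind the cluster.

```python
def show_status_non_causal(cursor):
    """Run SHOW STATUS with causal reads disabled for this session only.

    `cursor` is assumed to be a DB-API cursor that returns rows as
    (Variable_name, Value) tuples.
    """
    # Session scope only: other connections keep the global setting.
    cursor.execute("SET SESSION wsrep_causal_reads = 0")
    try:
        cursor.execute("SHOW STATUS")
        return dict(cursor.fetchall())
    finally:
        # DEFAULT resets the session value to the current global value.
        cursor.execute("SET SESSION wsrep_causal_reads = DEFAULT")
```

The session value is reset in a `finally` block so that a failing SHOW STATUS does not leave the connection with causal reads switched off.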

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Changed bug statuses to reflect:

a) 5.6 has the bug, in that SHOW must block there, currently it
doesn't.

b) Any client willing to get non-causal output when a global
wsrep-causal-reads is set (a global value of wsrep-causal-reads may not
be advisable but that is a different topic) should use session value of
wsrep-causal-reads=0.

c) b applies to backup tools too.

d) There is one more 'bug' here, in that starting a new mysql client also
seemed to block. Need to investigate that.

e) If (b) doesn't work out, then use the causal_read_timeout galera
variable in wsrep-provider-options.
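
As a rough sketch of option (e), a client or admin script could build the statement that changes the timeout. The helper name below is hypothetical, and the option key `repl.causal_read_timeout` with an ISO 8601 duration value (e.g. `PT30S` for the 30s default) is an assumption based on the Galera provider options; verify it against the provider version in use before relying on it.

```python
def causal_read_timeout_statement(seconds):
    """Build a SET GLOBAL statement for the Galera causal read timeout.

    Assumes the provider option is named repl.causal_read_timeout and
    takes an ISO 8601 duration such as PT30S.
    """
    if not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    return (
        "SET GLOBAL wsrep_provider_options = "
        "'repl.causal_read_timeout=PT%dS'" % seconds
    )
```

For example, `causal_read_timeout_statement(90)` yields a statement that would raise the timeout to 90 seconds, under the assumptions above.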

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Reported https://bugs.launchpad.net/percona-xtrabackup/+bug/1320441 for the backup
tool, as described in the last comment.

Yan Zhang (yan.zhang) wrote :
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-989
