FTWRL and wsrep_causal_reads with 'show status'

Bug #1271177 reported by Raghavendra D Prabhu on 2014-01-21
This bug affects 2 people
Affects: MySQL patches by Codership (status tracked in 5.6)
  5.5: Importance: Undecided; Assigned to: Unassigned
  5.6: Importance: Undecided; Assigned to: Yan Zhang
Affects: Percona XtraDB Cluster, moved to https://jira.percona.com/projects/PXC (status tracked in 5.6)
  5.5: Status: Fix Released; Importance: High; Assigned to: Unassigned
  5.6: Status: Fix Released; Importance: High; Assigned to: Unassigned

Bug Description

When FTWRL (FLUSH TABLES WITH READ LOCK) is used on a node with wsrep_causal_reads=1 while there is DML on other nodes,
subsequent SELECT and SHOW STATUS statements can hang or fail on this node.

Cloned from https://bugs.launchpad.net/codership-mysql/+bug/1126316

Example: https://bugs.launchpad.net/codership-mysql/+bug/1126316/comments/1


On 5.6 this is not repeatable for 'SHOW' commands, only for SELECT.

For 5.5, it is repeatable for 'SHOW' as well.

Regarding blocking on select and failing later with lock_wait_timeout, it needs to be seen if this is intended when the provider on that node has been paused. The behavior of apply and commit monitors with a paused provider needs to be checked.

Also note that this returns ER_LOCK_WAIT_TIMEOUT, which can be confusing because changing innodb_lock_wait_timeout has no effect on this behavior. It is causal_read_timeout (default: 30s) that needs to be changed.
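
Since the same error code covers both an InnoDB lock wait and a causal-read wait, a client can at best attach a hint about which timeout to look at. A minimal sketch in Python, assuming the driver exposes the MySQL error number (1205 for ER_LOCK_WAIT_TIMEOUT); the function name and hint texts are hypothetical:

```python
# Hypothetical helper: the name and messages are illustrative only.
ER_LOCK_WAIT_TIMEOUT = 1205  # MySQL error number for "Lock wait timeout exceeded"


def explain_lock_wait_timeout(errno, causal_reads_enabled):
    """Return a hint for a failed statement, or None if the error is unrelated."""
    if errno != ER_LOCK_WAIT_TIMEOUT:
        return None
    if causal_reads_enabled:
        # With wsrep_causal_reads=1 the wait may come from the causal read,
        # so innodb_lock_wait_timeout may not be the relevant setting.
        return ("possible causal-read wait timeout: "
                "check causal_read_timeout, not innodb_lock_wait_timeout")
    return "InnoDB lock wait timeout: check innodb_lock_wait_timeout"
```

The hint is only a guess: the error itself does not say which wait timed out.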

summary: - FTWRTL and wsrep_causal_reads
+ FTWRL and wsrep_causal_reads

It may be desirable to disable causal_reads (i.e. avoid wsrep_causal_wait) when the provider has been paused.

description: updated

Based on discussion, we came to the conclusion that having SELECT hang
with causal_reads under FTWRL is the expected behavior. Any
tuning required will be for causal_read_timeout (which is
dynamic but per-session).

So, marking 5.6 as fixed, but this still needs to be fixed in 5.5
since 'show status' hangs there as well.

summary: - FTWRL and wsrep_causal_reads
+ FTWRL and wsrep_causal_reads with 'show status'
Przemek (pmalkowski) on 2014-04-30
tags: added: i36377

The difference between 5.5 and 5.6 lies in the inconsistent application
of the causal constraint to

  SQLCOM_SHOW_STATUS_FUNC
  SQLCOM_SHOW_STATUS
  SQLCOM_SHOW_STATUS_PROC

In 5.6 the constraint is not applied to these commands, but in 5.5 it is.

a)
As for

"
This is important as it breaks backups made with Percona Xtrabackup, as it does some SHOW STATUS queries after it sets FTWRL.
The example error from innobackupex script looks like this:
"DBD::mysql::db selectall_hashref failed: Lock wait timeout exceeded; try restarting transaction at /usr/bin/innobackupex line 1652.
innobackupex: Error:
Error executing 'SHOW STATUS': DBD::mysql::db selectall_hashref failed: Lock wait timeout exceeded; try restarting transaction at /usr/bin/innobackupex line 1652."
"

It may be more suitable here if innobackupex ran 'set session wsrep_causal_reads=0' before SHOW STATUS. I will check with the xtrabackup developers on this.
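
The suggestion above could look roughly like the following on the client side. This is a sketch only, not innobackupex code: it assumes a Python DB-API 2.0 cursor that returns tuple rows, and the function name is hypothetical.

```python
def show_status_without_causal_wait(cursor):
    """Run SHOW STATUS with causal reads disabled for this session only.

    `cursor` is assumed to be a DB-API 2.0 cursor (tuple rows) connected to
    the node that holds FTWRL.
    """
    # Session scope: other connections keep the global wsrep_causal_reads value.
    cursor.execute("SET SESSION wsrep_causal_reads=0")
    cursor.execute("SHOW STATUS")
    # SHOW STATUS yields (Variable_name, Value) pairs.
    return dict(cursor.fetchall())
```

The setting is not restored afterwards, which suits a dedicated backup connection; a shared connection would need to put the previous value back.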

b)

It may be necessary to block even for SHOW STATUS, since status
variables such as wsrep_last_committed depend on
application of the writeset.

Changed bug statuses to reflect:

a) 5.6 has the bug, in that SHOW must block there but currently
does not.

b) Any client willing to get non-causal output when a global
wsrep-causal-reads is set (a global value of wsrep-causal-reads may not
be advisable but that is a different topic) should use session value of
wsrep-causal-reads=0.

c) b applies to backup tools too.

d) There is another 'bug' here in that starting a new mysql client also
seemed to block. This needs to be investigated.

e) If b doesn't work out, then use the causal_read_timeout Galera
variable in wsrep-provider-options.
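
For option e), a hedged sketch of building the statement in Python. It assumes the full option name is repl.causal_read_timeout and that the value is an ISO 8601 duration such as PT30S; neither detail is stated in this report, so check both against the Galera version in use. The function name is hypothetical.

```python
def causal_read_timeout_statement(seconds):
    """Build a statement that changes the causal read timeout.

    Assumption: the option is spelled repl.causal_read_timeout and accepts an
    ISO 8601 duration (PT<n>S). Verify against your Galera documentation.
    """
    seconds = int(seconds)
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return ("SET GLOBAL wsrep_provider_options="
            "'repl.causal_read_timeout=PT%dS'" % seconds)
```

For example, `causal_read_timeout_statement(90)` produces a statement that would raise the timeout from the 30s default to 90 seconds.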

Reported https://bugs.launchpad.net/percona-xtrabackup/+bug/1320441 against the backup
tool, as mentioned in the last comment.

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-989
