Percona Server crash-safe replication conflicts with WSREP crash recovery

Bug #1182441 reported by Raghavendra D Prabhu
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Fix Released
Importance: High
Assigned to: Raghavendra D Prabhu

Bug Description

This should only affect a PXC node which is a slave to an async master.

This is because:

#ifdef WITH_WSREP
/* We hijack TRX_SYS_MYSQL_MASTER_LOG_INFO, it seems to be completely unused
   otherwise (see comments for MySQL bug #34058). */
/** */
#define TRX_SYS_WSREP_XID_INFO TRX_SYS_MYSQL_MASTER_LOG_INFO

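in trx0sys.h.

However, TRX_SYS_MYSQL_MASTER_LOG_INFO is also used by crash-safe slave
recovery in the Percona Server (PS) code, so both features read and write
the same offset in the TRX_SYS page.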
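
To illustrate the aliasing, here is a minimal, self-contained sketch. It is not the actual InnoDB code: the two writer functions are hypothetical stand-ins, the on-page record layout is simplified to a bare string and a bare byte array, and a 16 KiB page size is assumed. It only shows that a write through TRX_SYS_WSREP_XID_INFO clobbers whatever was stored at TRX_SYS_MYSQL_MASTER_LOG_INFO.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UNIV_PAGE_SIZE 16384 /* assumption: default 16 KiB page */
#define TRX_SYS_MYSQL_MASTER_LOG_INFO (UNIV_PAGE_SIZE - 2000)
/* Same alias as in trx0sys.h when WITH_WSREP is defined. */
#define TRX_SYS_WSREP_XID_INFO TRX_SYS_MYSQL_MASTER_LOG_INFO

/* Hypothetical stand-in for the PS slave-recovery writer (simplified). */
static void write_master_log_info(uint8_t *page, const char *binlog_name)
{
    strcpy((char *)page + TRX_SYS_MYSQL_MASTER_LOG_INFO, binlog_name);
}

/* Hypothetical stand-in for the wsrep XID writer (simplified). */
static void write_wsrep_xid(uint8_t *page, const uint8_t *xid, size_t len)
{
    memcpy(page + TRX_SYS_WSREP_XID_INFO, xid, len);
}

/* Returns 1 if the master log name is still intact after a wsrep XID
   write to the same page, 0 if it was overwritten. */
static int master_info_survives(void)
{
    static uint8_t page[UNIV_PAGE_SIZE];
    const char *name = "mysql-bin.000042"; /* made-up file name */
    uint8_t xid[140];

    memset(page, 0, sizeof page);
    memset(xid, 0xAB, sizeof xid);

    write_master_log_info(page, name);
    write_wsrep_xid(page, xid, sizeof xid);

    return memcmp(page + TRX_SYS_MYSQL_MASTER_LOG_INFO,
                  name, strlen(name) + 1) == 0;
}
```

Calling master_info_survives() returns 0 here, because both macros expand to the same offset.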


Raghavendra D Prabhu (raghavendra-prabhu) wrote :

The quickest solution to this is to add a '} else {' so that they
are mutually exclusive with priority given to wsrep recovery.
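
A rough sketch of what such a mutually exclusive branch could look like is shown here. The enum, the function name, and the two boolean flags are hypothetical and only illustrate the control flow; the real conditions in the server code are not quoted in this report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for which consumer may use the shared offset. */
enum recovery_source {
    RECOVER_NONE,
    RECOVER_WSREP_XID,
    RECOVER_SLAVE_INFO
};

/* Hypothetical helper: wsrep recovery takes priority, and slave
   recovery is only considered in the else branch, so the two can
   never both act on the same TRX_SYS offset. */
static enum recovery_source choose_recovery(bool wsrep_enabled,
                                            bool slave_recovery_enabled)
{
    if (wsrep_enabled) {
        return RECOVER_WSREP_XID;
    } else if (slave_recovery_enabled) {
        return RECOVER_SLAVE_INFO;
    }
    return RECOVER_NONE;
}
```

With both flags set, choose_recovery(true, true) yields RECOVER_WSREP_XID, which is the priority described above. This avoids the clash but means crash-safe slave recovery is not available on a wsrep node.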

Changed in percona-xtradb-cluster:
milestone: none → 5.5.31-24.8
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@codership, note that this affects MariaDB Galera Cluster too, since it uses XtraDB as well.

#define TRX_SYS_MYSQL_MASTER_LOG_INFO (UNIV_PAGE_SIZE - 2000)
#define TRX_SYS_MYSQL_RELAY_LOG_INFO (UNIV_PAGE_SIZE - 1500)
#define TRX_SYS_COMMIT_MASTER_LOG_INFO (UNIV_PAGE_SIZE - 3000)
#define TRX_SYS_COMMIT_RELAY_LOG_INFO (UNIV_PAGE_SIZE - 2500)
#define TRX_SYS_MYSQL_LOG_INFO (UNIV_PAGE_SIZE - 1000)
#define TRX_SYS_WSREP_XID_INFO TRX_SYS_MYSQL_MASTER_LOG_INFO
#define TRX_SYS_WSREP_XID_LEN (4 + 4 + 4 + XIDDATASIZE)
#define TRX_SYS_DOUBLEWRITE (UNIV_PAGE_SIZE - 200)
#define TRX_SYS_FILE_FORMAT_TAG (UNIV_PAGE_SIZE - 16)

Is it not possible to use some other offset instead? It looks like there is a big gap between UNIV_PAGE_SIZE - 1000 and UNIV_PAGE_SIZE - 200 for an XID of length 140.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

As discussed in the meeting, I will check on how innodb_overwrite_relay_log_info affects this.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

OK, I checked but the conflict happens irrespective of the value
of innodb_overwrite_relay_log_info.

For those interested, see trx0trx.c for details
(innodb_overwrite_relay_log_info is handled in ha_innodb.cc).

Ovais Tariq (ovais-tariq) wrote :

So the additional work is done in XtraDB regardless of whether innodb_overwrite_relay_log_info is enabled?

Changed in percona-xtradb-cluster:
importance: Undecided → High
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Ovais, yes, you can look at trx_write_serialisation_history for details.

Changed in percona-xtradb-cluster:
assignee: nobody → Raghavendra D Prabhu (raghavendra-prabhu)
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

From trx_sys.rb:

# The basic structure of a TRX_SYS page is: FIL header, TRX_SYS header,
# empty space, master binary log information, empty space, local binary
# log information, empty space, doublewrite information (repeated twice),
# empty space, and FIL trailer.

So there is plenty of empty space between the TRX_SYS header and the master binary log information. As suggested earlier, UNIV_PAGE_SIZE - 3500 should be good enough (for a total XID size of about 150 bytes).
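
The arithmetic behind that suggestion can be checked with a small sketch. CANDIDATE_WSREP_XID_INFO and the two helper functions are hypothetical names used only for this illustration, a 16 KiB page is assumed, and XIDDATASIZE is taken as 128 (which matches the XID length of 140 mentioned earlier). The check only covers the nearest region above the candidate offset, TRX_SYS_COMMIT_MASTER_LOG_INFO; it does not verify where the TRX_SYS header ends below it.

```c
#include <assert.h>

#define UNIV_PAGE_SIZE 16384 /* assumption: default 16 KiB page */
#define TRX_SYS_COMMIT_MASTER_LOG_INFO (UNIV_PAGE_SIZE - 3000)
#define XIDDATASIZE 128 /* assumption: matches the 140-byte total above */
#define TRX_SYS_WSREP_XID_LEN (4 + 4 + 4 + XIDDATASIZE)

/* Hypothetical name for the proposed new location. */
#define CANDIDATE_WSREP_XID_INFO (UNIV_PAGE_SIZE - 3500)

/* Bytes between the candidate offset and the next defined region. */
static int candidate_bytes_available(void)
{
    return TRX_SYS_COMMIT_MASTER_LOG_INFO - CANDIDATE_WSREP_XID_INFO;
}

/* Returns 1 if a record of len bytes written at the candidate offset
   ends before TRX_SYS_COMMIT_MASTER_LOG_INFO begins. */
static int candidate_fits(int len)
{
    return len <= candidate_bytes_available();
}
```

Under these assumptions, 500 bytes are available at the candidate offset, so both the 140-byte TRX_SYS_WSREP_XID_LEN and a rounded-up 150-byte record fit without touching the commit master log information.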

Changed in percona-xtradb-cluster:
status: New → Fix Committed
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-965
