Possible bug in the in MySQL + Galera and PXC when running PT-table-checksum on a slave.

Bug #1298689 reported by Marco Tusa on 2014-03-27
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Undecided
Unassigned
Percona Toolkit
Undecided
Daniel Nichter

Bug Description

Current nodes setup

 node1 <-> node2 <-> node3
  |
  \/
  Slave

  The issue is that when running pt-table-checksum on node1 all the statement that pt-table-checksum execute to test the data on the nodes, are replicated with ROW replication.
  To note that I am testing on percona toolkit 2.2.7 and the version of pt-table-checksum is implementing the /*!99997*/ trick as describe at https://groups.google.com/forum/#!msg/codership-team/kKqigq31dVQ/TV-DTUI0YYYJ.

  From MySQL point of view I have tested with PXC 5.5.34 and Codership 5.5.43 (wsrep 25.9)

  When pt-table-checksum is executed both the servers print the DEBUG information,
   140327 15:38:48 [Note] WSREP: consistency check: REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc)
   SELECT /*!99997*/ 'sakila', 'staff', '1', NULL, NULL, NULL, COUNT(*) AS cnt,
   COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `staff_id`, `first_name`, `last_name`, `address_id`, `picture`, `email`,
   `store_id`, `active`, `username`, `password`, `last_update` + 0, CONCAT(ISNULL(`picture`), ISNULL(`email`), ISNULL(`password`))))
   AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`staff` /*checksum table*/

   But then the information is passed as ROW format as UPDATE:
  BINLOG '
  SH40UxMjAAAATAAAAA2SAAAAACQAAAAAAAEAB3BlcmNvbmEACWNoZWNrc3VtcwAM/v4DBA/8/P4D
  /gMHDf7A/sAEWAICAv54/nh4Bg==
  SH40UxgjAAAAeQAAAIaSAAAAACQAAAAAAAEADP////949gZzYWtpbGEFc3RhZmYBAAAACDIzMzY2
  OGFlAgAAAEh+NFNw8AZzYWtpbGEFc3RhZmYBAAAANzMqPggyMzM2NjhhZQIAAAAIMjMzNjY4YWUC
  AAAASH40Uw==
  '/*!*/;
  ### UPDATE percona.checksums
  ### WHERE
  ### @1='sakila' /* STRING(192) meta=65216 nullable=0 is_null=0 */
  ### @2='staff' /* STRING(192) meta=65216 nullable=0 is_null=0 */
  ### @3=1 /* INT meta=0 nullable=0 is_null=0 */
  ### @4=NULL /* INT meta=4 nullable=1 is_null=1 */
  ### @5=NULL /* INT meta=600 nullable=1 is_null=1 */
  ### @6=NULL /* INT meta=2 nullable=1 is_null=1 */
  ### @7=NULL /* INT meta=2 nullable=1 is_null=1 */
  ### @8='233668ae' /* STRING(120) meta=65144 nullable=0 is_null=0 */
  ### @9=2 /* INT meta=0 nullable=0 is_null=0 */
  ### @10=NULL /* INT meta=65144 nullable=1 is_null=1 */
  ### @11=NULL /* INT meta=0 nullable=1 is_null=1 */
  ### @12=1395949128 /* TIMESTAMP meta=0 nullable=0 is_null=0 */
  ### SET
  ### @1='sakila' /* STRING(192) meta=65216 nullable=0 is_null=0 */
  ### @2='staff' /* STRING(192) meta=65216 nullable=0 is_null=0 */
  ### @3=1 /* INT meta=0 nullable=0 is_null=0 */
  ### @4=0.166211 /* FLOAT meta=4 nullable=1 is_null=0 */
  ### @5=NULL /* FLOAT meta=600 nullable=1 is_null=1 */
  ### @6=NULL /* FLOAT meta=2 nullable=1 is_null=1 */
  ### @7=NULL /* FLOAT meta=2 nullable=1 is_null=1 */
  ### @8='233668ae' /* STRING(120) meta=65144 nullable=0 is_null=0 */
  ### @9=2 /* INT meta=0 nullable=0 is_null=0 */
  ### @10='233668ae' /* STRING(120) meta=65144 nullable=1 is_null=0 */
  ### @11=2 /* INT meta=0 nullable=1 is_null=0 */
  ### @12=1395949128 /* TIMESTAMP meta=0 nullable=0 is_null=0 */
  # at 37510

 Given that there is no checksum execution on the SLAVE.

  pt-table-checksum code related to the patch
  # ########################################################################
   # Checksum args and the DMS part of the checksum query for each table.
   # ########################################################################
   my %crc_args = $rc->get_crc_args(dbh => $master_dbh);
   my $checksum_dml = "REPLACE INTO $repl_table "
                    . "(db, tbl, chunk, chunk_index,"
                    . " lower_boundary, upper_boundary, this_cnt, this_crc) "
                    . "SELECT"
                    . ($cluster->is_cluster_node($master_cxn) ? ' /*!99997*/' : '')
                    . " ?, ?, ?, ?, ?, ?,";
   my $past_cols = " COUNT(*), '0'";

   HOW to replicate:
   ================================
   1) install a cluster using either PXC or MySQL/Galera both(5.5)
   2) connect another MySQL (simple) as slave to node1
   3) load a test schema
   4) remove some records on the SLAVE
   5) run pt-table-checksum as /pt-table-checksum -u<user> -p<pw> -h 192.168.0.35 -P3306 --databases sakila,test --recursion-method dsn=h=192.168.0.35,P=3306,u=<user>,p=<pw>,D=percona,t=dsns --no-check-binlog-format
      dsns table:
   (root@localhost:pm) [percona]>select * from dsns;
  +----+-----------+---------------------------------------+
  | id | parent_id | dsn |
  +----+-----------+---------------------------------------+
  | 6 | NULL | h=192.168.0.35,P=5510,u=stress,p=tool |
  +----+-----------+---------------------------------------+
  1 row in set (0.00 sec)
 6) review data from the pt-table-checksum and from the query on master/slave
  SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
  FROM percona.checksums
  WHERE (
   master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc))
  GROUP BY db, tbl;

Marco Tusa (marcotusa) wrote :

Forgot to mention that changing:
(root@localhost:pm) [percona]>set global wsrep_forced_binlog_format='STATEMENT';
Query OK, 0 rows affected (0.00 sec)

It obviously work, but this mean changing a global value and all the operations are then write in statement format

> The issue is that when running pt-table-checksum on node1 all the statement that pt-table-checksum execute to test the data
 > on the nodes, are replicated with ROW replication.

This is because STATEMENT binlog format is not supported in wsrep/galera. pt-checksum possibly needs fixes for PXC in this regard.

Download full text (5.3 KiB)

Ragu,
I know very well that STATEMENT is not the official replication way for
galera.

But this is exactly what I am trying to underline, there is an
inconsistency from my point of view.

Let me explain the whole in more details.

If you check the link in the bug report you will see that there was a
"trick" design for the pt-table-checksum, trick that still there given
it report the info in the log when use.
This trick DOES not work when wsrep_forced_binlog_format=ROW.

Given it force the log to stay as ROW whatever instruction.

If we change wsrep_forced_binlog_format=STATEMENT, obviously it works,
but given the incompatibility with the GALERA replication you have 99%
that the node will crash with:
140328 22:43:41 [Warning] WSREP: BF applier failed to
open_and_lock_tables: 1666, fatal: 0 wsrep = (exec_mode: 1
conflict_state: 0 seqno: 291772803)
140328 22:43:41 [ERROR] Slave SQL: Error executing row event: 'Cannot
execute statement: impossible to write to binary log since statement is
in row format and BINLOG_FORMAT = STATEMENT.', Error_code: 1666
140328 22:43:41 [Warning] WSREP: RBR event 2 Update_rows apply warning:
1666, 291772803
140328 22:43:41 [ERROR] WSREP: Failed to apply trx: source:
1f284b88-7401-11e3-afd2-76df52ec0065 version: 2 local: 0 state: APPLYING
flags: 1 conn_id: 37912472 trx_id: 828722030 seqnos (l: 92316516, g:
291772803, s: 291772802, d: 291772802, ts: 1396064621012184336)
140328 22:43:41 [ERROR] WSREP: Failed to apply trx 291772803 10 times
140328 22:43:41 [ERROR] WSREP: Node consistency compromized, aborting...
140328 22:43:41 [Note] WSREP: Closing send monitor...

Testing wsrep_forced_binlog_format=NONE ...
pt-table-checksum is workign fine, the binary log show that the
statement are logged in STATEMENT mode:
#140329 15:03:52 server id 35 end_log_pos 1798 Query
thread_id=44936 exec_time=0 error_code=0
use `sakila`/*!*/;
SET TIMESTAMP=1396119832/*!*/;
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index,
lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/
'sakila', 'actor', '1', NULL, NULL, NULL, COUNT(*) AS cnt,
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `actor_id`,
`first_name`, `last_name`, `last_update` + 0)) AS UNSIGNED)), 10, 16)),
0) AS crc FROM `sakila`.`actor` /*checksum table*/
/*!*/;
# at 1798
#140329 15:03:52 server id 35 end_log_pos 1825 Xid = 306
COMMIT/*!*/;

But this has NOTHING to do with that trick, but with the fact that
galera is not enforcing the ROW format at session level.

As such the pt-table-checksum is able to change it (Line 9097)
          if ( $original_binlog_format !~ /STATEMENT/i ) {
             $sql = q{/*!50108 SET @@binlog_format := 'STATEMENT'*/};
             eval {
                PTDEBUG && _d($dbh, $sql);
                $dbh->do($sql);
             };

At the same time if I have other activity running on the other node ...
or on this Master node, this activity is reported in ROW format:

root@localhost:pm) [sakila]>insert into payment3 select * from payment2;
#140329 15:18:19 server id 3 end_log_pos 4991358 Write_rows: table id
57 flags: STMT_END_F
### INSERT INTO sakila.payment3
### SET
### @...

Read more...

Changed in percona-toolkit:
assignee: nobody → Daniel Nichter (daniel-nichter)
tags: added: pxc rbr
removed: replication row
Download full text (9.6 KiB)

On 30 Mar 2014 01:15, "Marco Tusa" <email address hidden> wrote:
>
> Ragu,
> I know very well that STATEMENT is not the official replication way for
>

>
>
> But this has NOTHING to do with that trick, but with the fact that
> galera is not enforcing the ROW format at session level.

This has been fixed in 5.6 and a warning is emitted in 5.5 (no error here
for backward compatibility).

>
> As such the pt-table-checksum is able to change it (Line 9097)
> if ( $original_binlog_format !~ /STATEMENT/i ) {
> $sql = q{/*!50108 SET @@binlog_format := 'STATEMENT'*/};
> eval {
> PTDEBUG && _d($dbh, $sql);
> $dbh->do($sql);
> };
>
>
> At the same time if I have other activity running on the other node ...
> or on this Master node, this activity is reported in ROW format:
>
> root@localhost:pm) [sakila]>insert into payment3 select * from payment2;
> #140329 15:18:19 server id 3 end_log_pos 4991358 Write_rows: table id
> 57 flags: STMT_END_F
> ### INSERT INTO sakila.payment3
> ### SET
> ### @1=1 /* SHORTINT meta=0 nullable=1 is_null=0 */
> ### @2=1 /* SHORTINT meta=0 nullable=0 is_null=0 */
> ### @3=1 /* TINYINT meta=0 nullable=0 is_null=0 */
> ### @4=76 /* INT meta=0 nullable=1 is_null=0 */
> ### @5=000000002.990000000 /* DECIMAL(5,2) meta=1282 nullable=0
> is_null=0 */
> ### @6=2005-05-25 11:30:37 /* DATETIME meta=0 nullable=0 is_null=0 */
> ### @7=1140059550 /* TIMESTAMP meta=0 nullable=0 is_null=0 */
> ### INSERT INTO sakila.payment3
>
>
> Given all the above, it seems that WE CAN use pt-table-checksum, but I
> am unsure about the GALERA stability, and WHY then we have that trick.
>
> The parameter wsrep_forced_binlog_format was introduced in Galera on the
> 2011-11-08 (Seppo comment
> https://bugs.launchpad.net/codership-mysql/+bug/887564).
> The trick seems from 4/24/12 Alex comment
> (
https://groups.google.com/forum/#!msg/codership-team/kKqigq31dVQ/TV-DTUI0YYYJ
)
>
> SO it comes AFTER, the wsrep_forced_binlog_format, and as such I am tent
> to assume it was introduced because there was a problem on the use of
> wsrep_forced_binlog_format...
>
> So after all this I think we can summarize my question as:
>
> "If we have to use the pt-table-checksum, will be the usage of
> wsrep_forced_binlog_format=NONE to be consider SAFE and CONSISTENT for
> galera replication, OR we have to assume that this parameter should
> NEVER be set/modify?
>
> In the case it cannot be used, what is the way then to use
> pt-table-checksum.
> In any case what is the reason of having the /*!99997*/ trick?
> "
> Should be consider all this as a documentation BUG or an dangerous
> exposed feature or a real bug?
>
>
> ------------------------------------------------
> Marco Tusa,
> Mobile: +1-613-282-7337
> do something for your planet!
> http://www.tusacentral.net/
> ------------------------------------------------
> Save a tree... please don't print this e-mail unless you really need to
>
> On 29/03/2014 03:27, Raghavendra D Prabhu wrote:
> >> The issue is that when running pt-table-checksum on node1 all the
statement that pt-table-checksum execute to test the data
> > >...

Read more...

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers