Comment 3 for bug 1298689

Revision history for this message
Marco Tusa (marcotusa) wrote : Re: [Bug 1298689] Re: Possible bug in the in MySQL + Galera and PXC when running PT-table-checksum on a slave.

Ragu,
I know very well that STATEMENT is not the official replication way for
galera.

But this is exactly what I am trying to underline, there is an
inconsistency from my point of view.

Let me explain the whole in more details.

If you check the link in the bug report you will see that there was a
"trick" design for the pt-table-checksum, trick that still there given
it report the info in the log when use.
This trick DOES not work when wsrep_forced_binlog_format=ROW.

Given it force the log to stay as ROW whatever instruction.

If we change wsrep_forced_binlog_format=STATEMENT, obviously it works,
but given the incompatibility with the GALERA replication you have 99%
that the node will crash with:
140328 22:43:41 [Warning] WSREP: BF applier failed to
open_and_lock_tables: 1666, fatal: 0 wsrep = (exec_mode: 1
conflict_state: 0 seqno: 291772803)
140328 22:43:41 [ERROR] Slave SQL: Error executing row event: 'Cannot
execute statement: impossible to write to binary log since statement is
in row format and BINLOG_FORMAT = STATEMENT.', Error_code: 1666
140328 22:43:41 [Warning] WSREP: RBR event 2 Update_rows apply warning:
1666, 291772803
140328 22:43:41 [ERROR] WSREP: Failed to apply trx: source:
1f284b88-7401-11e3-afd2-76df52ec0065 version: 2 local: 0 state: APPLYING
flags: 1 conn_id: 37912472 trx_id: 828722030 seqnos (l: 92316516, g:
291772803, s: 291772802, d: 291772802, ts: 1396064621012184336)
140328 22:43:41 [ERROR] WSREP: Failed to apply trx 291772803 10 times
140328 22:43:41 [ERROR] WSREP: Node consistency compromized, aborting...
140328 22:43:41 [Note] WSREP: Closing send monitor...

Testing wsrep_forced_binlog_format=NONE ...
pt-table-checksum is workign fine, the binary log show that the
statement are logged in STATEMENT mode:
#140329 15:03:52 server id 35 end_log_pos 1798 Query
thread_id=44936 exec_time=0 error_code=0
use `sakila`/*!*/;
SET TIMESTAMP=1396119832/*!*/;
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index,
lower_boundary, upper_boundary, this_cnt, this_crc) SELECT /*!99997*/
'sakila', 'actor', '1', NULL, NULL, NULL, COUNT(*) AS cnt,
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `actor_id`,
`first_name`, `last_name`, `last_update` + 0)) AS UNSIGNED)), 10, 16)),
0) AS crc FROM `sakila`.`actor` /*checksum table*/
/*!*/;
# at 1798
#140329 15:03:52 server id 35 end_log_pos 1825 Xid = 306
COMMIT/*!*/;

But this has NOTHING to do with that trick, but with the fact that
galera is not enforcing the ROW format at session level.

As such the pt-table-checksum is able to change it (Line 9097)
          if ( $original_binlog_format !~ /STATEMENT/i ) {
             $sql = q{/*!50108 SET @@binlog_format := 'STATEMENT'*/};
             eval {
                PTDEBUG && _d($dbh, $sql);
                $dbh->do($sql);
             };

At the same time if I have other activity running on the other node ...
or on this Master node, this activity is reported in ROW format:

root@localhost:pm) [sakila]>insert into payment3 select * from payment2;
#140329 15:18:19 server id 3 end_log_pos 4991358 Write_rows: table id
57 flags: STMT_END_F
### INSERT INTO sakila.payment3
### SET
### @1=1 /* SHORTINT meta=0 nullable=1 is_null=0 */
### @2=1 /* SHORTINT meta=0 nullable=0 is_null=0 */
### @3=1 /* TINYINT meta=0 nullable=0 is_null=0 */
### @4=76 /* INT meta=0 nullable=1 is_null=0 */
### @5=000000002.990000000 /* DECIMAL(5,2) meta=1282 nullable=0
is_null=0 */
### @6=2005-05-25 11:30:37 /* DATETIME meta=0 nullable=0 is_null=0 */
### @7=1140059550 /* TIMESTAMP meta=0 nullable=0 is_null=0 */
### INSERT INTO sakila.payment3

Given all the above, it seems that WE CAN use pt-table-checksum, but I
am unsure about the GALERA stability, and WHY then we have that trick.

The parameter wsrep_forced_binlog_format was introduced in Galera on the
2011-11-08 (Seppo comment
https://bugs.launchpad.net/codership-mysql/+bug/887564).
The trick seems from 4/24/12 Alex comment
(https://groups.google.com/forum/#!msg/codership-team/kKqigq31dVQ/TV-DTUI0YYYJ)

SO it comes AFTER, the wsrep_forced_binlog_format, and as such I am tent
to assume it was introduced because there was a problem on the use of
wsrep_forced_binlog_format...

So after all this I think we can summarize my question as:

"If we have to use the pt-table-checksum, will be the usage of
wsrep_forced_binlog_format=NONE to be consider SAFE and CONSISTENT for
galera replication, OR we have to assume that this parameter should
NEVER be set/modify?

In the case it cannot be used, what is the way then to use
pt-table-checksum.
In any case what is the reason of having the /*!99997*/ trick?
"
Should be consider all this as a documentation BUG or an dangerous
exposed feature or a real bug?

------------------------------------------------
Marco Tusa,
Mobile: +1-613-282-7337
do something for your planet!
http://www.tusacentral.net/
------------------------------------------------
Save a tree... please don't print this e-mail unless you really need to

On 29/03/2014 03:27, Raghavendra D Prabhu wrote:
>> The issue is that when running pt-table-checksum on node1 all the statement that pt-table-checksum execute to test the data
> > on the nodes, are replicated with ROW replication.
>
> This is because STATEMENT binlog format is not supported in
> wsrep/galera. pt-checksum possibly needs fixes for PXC in this regard.
>