pt-table-sync false-positive error "Cannot nibble table because MySQL chose no index instead of the PRIMARY index"

Reported by Miguel Angel Nieto on 2012-01-18
24
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Percona Toolkit
High
Daniel Nichter

Bug Description

If I try to use pt-table-sync on the employees database (https://launchpad.net/test-db/) to correct some mistakes I receive this error message:

# pt-table-sync --print --replicate percona.checksums u=root,p=msandbox,h=127.0.0.1,P=26165
Cannot nibble table `employees`.`dept_emp` because MySQL chose no index instead of the `PRIMARY` index at /usr/bin/pt-table-sync line 4748. while doing employees.dept_emp on SBslave1
Cannot nibble table `employees`.`salaries` because MySQL chose no index instead of the `PRIMARY` index at /usr/bin/pt-table-sync line 4748. while doing employees.salaries on SBslave1
Cannot nibble table `employees`.`dept_emp` because MySQL chose no index instead of the `PRIMARY` index at /usr/bin/pt-table-sync line 4748. while doing employees.dept_emp on SBslave2
Cannot nibble table `employees`.`salaries` because MySQL chose no index instead of the `PRIMARY` index at /usr/bin/pt-table-sync line 4748. while doing employees.salaries on SBslave2

slave1 [localhost] {msandbox} (employees) > show create table dept_emp\G
*************************** 1. row ***************************
       Table: dept_emp
Create Table: CREATE TABLE `dept_emp` (
  `emp_no` int(11) NOT NULL,
  `dept_no` char(4) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`dept_no`),
  KEY `emp_no` (`emp_no`),
  KEY `dept_no` (`dept_no`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

slave1 [localhost] {msandbox} (employees) > show create table salaries\G
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  KEY `emp_no` (`emp_no`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

tags: added: chunking pt-table-sync
Changed in percona-toolkit:
status: New → Confirmed
Dave Juntgen (djuntgen) wrote :

Daniel - can you give a reason to why and maybe a quick work around?

Baron Schwartz (baron-xaprb) wrote :

I think we're going to need to make the code handle this by testing whether EXPLAIN shows not only no index chosen, but also a large number of rows (larger than expected). If for some reason it's ignoring the index but there aren't many rows, that should be OK.

Baron Schwartz (baron-xaprb) wrote :

Dave, a workaround is probably to comment out that test in the code if you're sure it's OK (as per my previous comment on this bug)

Daniel Nichter (daniel-nichter) wrote :

Dave or Miguel: can you reproduce this problem? Re Baron's comment, if the problem can be reproduced, I'd like to see the full EXPLAIN output, which PTDEBUG should show.

Changed in percona-toolkit:
importance: Undecided → Medium
milestone: none → 2.1.4
Changed in percona-toolkit:
assignee: nobody → Brian Fraser (fraserbn)
Brian Fraser (fraserbn) on 2012-08-16
Changed in percona-toolkit:
status: Confirmed → Incomplete
Brian Fraser (fraserbn) wrote :

It appears that this bug only affected earlier releases of the toolkit, so I'm going to mark it as Invalid. If someone manages to reproduce it with newer versions of pt-table-sync, please either reply here or file a new bug.

Changed in percona-toolkit:
status: Incomplete → Invalid
Changed in percona-toolkit:
milestone: 2.1.4 → none
kedar (k-vaijanapurkar) wrote :

Using percona toolkit 2.1.7 I'm still getting the error:

perl percona-toolkit-2.1.7/bin/pt-table-sync --print --verbose --set-vars wait_timeout=30000 -t xxxxxxx --ignore-columns xxxxxxx --no-foreign-key-checks --sync-to-master --chunk-size=200 --chunk-column ID h=xxxxxxx,u=xxxxxxx,p=xxxxxxx >> 12282012_4.log
Cannot nibble table `xxxxxxx`.`xxxxxxx` because MySQL chose no index instead of the `PRIMARY` index at percona-toolkit-2.1.7/bin/pt-table-sync line 5407. while doing DB.TABLE on HOST.

Do we have any workaround?

Daniel Nichter (daniel-nichter) wrote :

There's no workaround because, afaik, this is the first reported case of this happening. It's generally not safe to workaround unless you understand why MySQL has chosen no index. If the table is small, maybe MySQL thinks it's faster to scan the whole table--and it could be correct.

I'm not sure yet if this a bug or the tool doing what it's supposed to. If you can easily reproduce this and provide PTDEBUG output, we could see if it's a bug. Else, the only workaround of sorts is to comment out the if block that starts like:

      if ( $self->{nibble} == 0 && !$self->{small_table} ) {
         my $explain_index = $self->__get_explain_index($sql);
         if ( lc($explain_index || '') ne lc($s->{index}) ) {
            die 'Cannot nibble table '.$q->quote($args{database}, $args{table})

I.e. prefix each line in that whole if { } block with #. At the moment, there's no easy way to implement an official workaround.

summary: - "Cannot nibble table" when running pt-table-sync
+ pt-table-sync can't nibble table because MySQL chose no index
Changed in percona-toolkit:
status: Invalid → Triaged
assignee: Brian Fraser (fraserbn) → nobody
importance: Medium → Undecided

I have fully reproducible case here.
To repeat the problem, load the history_master.sql table to master node, then history_slave.sql to slave node. Slave will have slightly more rows then master.
Then on master:
pt-table-checksum --databases=drupal
pt-table-sync --execute --print --replicate percona.checksums localhost

Also included full query log from pt-table-sync session as well as the tool's debug log.

Daniel Nichter (daniel-nichter) wrote :

Thanks Przemek, your test case works:

Cannot nibble table `test`.`history` because MySQL chose no index instead of the `PRIMARY` index at bin/pt-table-sync line 5405. while doing test.history on 127.0.0.1

I'll look into and fix this.

Changed in percona-toolkit:
status: Triaged → In Progress
assignee: nobody → Daniel Nichter (daniel-nichter)

I found the problem. First, the differing chunk is what we can an "oob" (out-of-bounds) chunk:

master:
*************************** 3. row ***************************
            db: test
           tbl: history
         chunk: 3
    chunk_time: 0.000658
   chunk_index: PRIMARY
lower_boundary: NULL
upper_boundary: 21,21,1045
      this_crc: 0
      this_cnt: 0
    master_crc: 0
    master_cnt: 0
            ts: 2013-01-23 08:00:17

slave:
*************************** 3. row ***************************
            db: test
           tbl: history
         chunk: 3
    chunk_time: 0.000658
   chunk_index: PRIMARY
lower_boundary: NULL
upper_boundary: 21,21,1045
      this_crc: 0
      this_cnt: 49
    master_crc: 0
    master_cnt: 0
            ts: 2013-01-23 08:00:17

this_cnt differe, and notice that lower_boundary=NULL: this (or upper_boundary=NULL) signal an oob chunk. When lower_boundary=NULL it's pt-table-checksum's chunk WHERE index_cols < first_row. The reason for that is precisely this test case:

master has row 5,6,7,8
slave has rows 1,2,3,4,5,6,7,8

So doing WHERE rows < 5 on master will detect the out-of-sync rows 1,2,3,4 on the slave.

Problem is, pt-table-checksum and pt-table-sync use different chunking methods. We make them compatible by converting ptc boundaries to pts boundaries in sub diff_where(), but that sub is not currently handling oob chunks correctly, resulting in a chunk like:

SELECT /*nibble boundary 0*/ `uid`,`nid`,`timestamp` FROM `test`.`history` FORCE INDEX (`PRIMARY`) WHERE (((`uid` > NULL) OR (`uid` = NULL AND `nid` >= NULL)) AND ((`uid` < '21') OR (`uid` = '21' AND `nid` <= '1045'))) ORDER BY `uid`,`nid` LIMIT 999, 1

which is an impossible WHERE clause, hence MySQL chooses no index.

So the fix is making diff_where() handle oob chunks correctly.

summary: - pt-table-sync can't nibble table because MySQL chose no index
+ pt-table-sync false-positive error "Cannot nibble table because MySQL
+ chose no index instead of the PRIMARY index"
tags: added: false-positive-error
Changed in percona-toolkit:
importance: Undecided → High
milestone: none → 2.1.9

This has been tested and fixed, will be released in2.1.9.

Changed in percona-toolkit:
status: In Progress → Fix Committed
Changed in percona-toolkit:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers