pt-online-schema-change eating portion of a table

Bug #1709650 reported by Rick Pizzi
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Won't Fix
Undecided
Unassigned

Bug Description

We have a situation where pt-online-schema-change is misbehaving, and the altered table is missing about 1/5 of the rows in the original table.

Please consider the following table:

CREATE TABLE `RICK_HOP` (
  `ID` bigint(20) unsigned NOT NULL,
  `ID_BOOKING` bigint(20) unsigned NOT NULL DEFAULT '0' ,
  `ID_FLIGHT` int(1) unsigned NOT NULL DEFAULT '1',
  `ID_LEG` int(11) unsigned NOT NULL DEFAULT '0' ,
  `ID_HOP` int(11) unsigned NOT NULL DEFAULT '0' ,
  `DEPARTURE_AIRPORT` char(3) DEFAULT NULL ,
  `DEPARTURE_DATE` datetime DEFAULT NULL ,
  `DEPARTURE_TERMINAL` varchar(20) DEFAULT NULL ,
  `ARRIVAL_AIRPORT` char(3) DEFAULT NULL ,
  `ARRIVAL_DATE` datetime DEFAULT NULL ,
  `ARRIVAL_TERMINAL` varchar(20) DEFAULT NULL ,
  `ID_AIRLINE` char(3) DEFAULT NULL ,
  `ID_AIRLINE_OPERATOR` char(3) DEFAULT '' ,
  `PNR` varchar(20) DEFAULT '' ,
  `CLASS_AVAILABILITY` char(1) DEFAULT NULL ,
  `CLASS_TO_BOOK` char(1) DEFAULT '' ,
  `FARE_BASIS` varchar(20) DEFAULT NULL ,
  `CLASS_TO_BOOK_C` char(1) DEFAULT '' ,
  `FARE_BASIS_C` varchar(20) DEFAULT NULL ,
  `CLASS_TO_BOOK_I` char(1) DEFAULT '' ,
  `FARE_BASIS_I` varchar(20) DEFAULT NULL ,
  `SEATS` int(2) NOT NULL DEFAULT '0' ,
  `DAY_CHANGE` tinyint(1) DEFAULT '0' ,
  `FLIGHT_CONNECTION` tinyint(1) DEFAULT '0' ,
  `STOPS` int(11) DEFAULT '0' ,
  `AIRPORT_CHANGE` tinyint(1) DEFAULT '0' ,
  `ELAPSED_FLIGHT_TIME` varchar(20) DEFAULT '' ,
  `JOURNAY_DURATION` varchar(20) DEFAULT '' ,
  `ELECTRONIC_TICKETING` tinyint(1) DEFAULT '0' ,
  `STATUS` varchar(20) NOT NULL DEFAULT '',
  `FLIGHT_NUMBER` varchar(15) DEFAULT NULL ,
  `AIRCRAFT` varchar(50) DEFAULT NULL ,
  `VALUE1` varchar(255) CHARACTER SET utf8 DEFAULT NULL ,
  `VALUE2` varchar(255) CHARACTER SET utf8 DEFAULT NULL ,
  `ADT_BAGGAGE_ALLOWANCE` varchar(5) DEFAULT NULL ,
  `CHD_BAGGAGE_ALLOWANCE` varchar(5) DEFAULT NULL ,
  `INF_BAGGAGE_ALLOWANCE` varchar(5) DEFAULT NULL ,
  `LAST_UPDATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  UNIQUE KEY `unique_idx_booking_hop` (`ID_BOOKING`,`ID_FLIGHT`,`ID_LEG`,`ID_HOP`),
  KEY `DEPARTURE_DATE` (`DEPARTURE_DATE`),
  KEY `DEPAIR_DEPDATE` (`DEPARTURE_AIRPORT`,`DEPARTURE_DATE`),
  KEY `LAST_UPDATE` (`LAST_UPDATE`),
  KEY `ID` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Table contains 150,000 rows.

The first column 'ID' is candidate to become an auto_increment primary key for the table.
First 120,000 rows in the table have an increasing value for ID from 1 to 120,000 and we are using the following command line to change the ID column into an auto_increment primary key:

pt-online-schema-change --execute --user rpizzi --ask-pass --no-check-plan --max-lag 300 --no-check-replication-filter --no-check-plan --alter "MODIFY COLUMN ID BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, DROP INDEX ID, AUTO_INCREMENT=500000000" ...

After the above completes successfully, the resulting table has 29,999 less rows than then original one.

This also happens with latest release (pt-online-schema-change 3.0.4).

If we run the above ALTER command directly it works as intended (the 30,000 rows with ID=0 are correctly populated with auto incremental values).

We have noted that running without the "--no-check-plan" flag causes pt-online-schema-change to complain about MySQL only using a portion of the unique index (12 bytes out of 20) but this should not be a problem on this small table so we disabled the check.

We are attaching a dump for the table that you can use to run the above command line and verify the bug.

Revision history for this message
Rick Pizzi (pizzi) wrote :
affects: percona-server → percona-toolkit
Revision history for this message
Rick Pizzi (pizzi) wrote :

If we run the same with ALL rows having ID=0, the tool will only copy ONE row!!

Revision history for this message
Rick Pizzi (pizzi) wrote :

Some investigation brought me to the reason for the tool not copying the rows.
It only copies rows with ID > 0, then it copies the first row which has ID = 0
but for some reason this row doesn't get an AUTO INCREMENT value assigned but is rather
stored with ID=0 in the new table!
As a consequence, all future inserts by the tool are ignored (due to the IGNORE clause in
the INSERT statement that pt-online-schema-change uses).

If we comment out the IGNORE clause from the tool binary and run it again, we get:

017-08-09T17:40:20 Error copying rows from `rick`.`RICK_HOP` to `rick`.`_RICK_HOP_new`: 2017-08-09T17:40:20 DBD::mysql::st execute failed: Duplicate entry '0' for key 'PRIMARY' [for Statement "INSERT LOW_PRIORITY INTO `rick`.`_RICK_HOP_new` (`id`, `id_booking`, `id_flight`, `id_leg`, `id_hop`, `departure_airport`, `departure_date`, `departure_terminal`....

In fact, looks like the first row which gets copied, is copied with ID=0.
Since it is perfectly legit to use zero in an auto_increment column to get a new value, this is
quite surprising.... but it explains why the tool eats all subsequent rows. And why it only copies one row if all rows in source table have ID=0.

Revision history for this message
Rick Pizzi (pizzi) wrote :

Found out that pt-online-schema-change sets the NO_AUTO_VALUE_ON_ZERO SQL_MODE option!!
This explain.

Can you please tell me the reason for such choice?

Revision history for this message
Carlos Salguero (carlos-salguero) wrote :

Thanks for the bug report and for investigating it.

Even when it is not a good practice to have a primary key value=0, pt-online-schema-change needs copy all rows from the source table, even the row having PK=0.
What I can do is:
1) Document this behavior.
2) Maybe I can add an option to suppress setting NO_AUTO_VALUE_ON_ZERO.

Would that work for you?

Revision history for this message
Rick Pizzi (pizzi) wrote :

I would be interested in knowing why the tool explicitly sets NO_AUTO_VALUE_ON_ZERO.
An option to disable this behaviour would be nice. It would be also worth mentioning in the manual that the NO_AUTO_VALUE_ON_ZERO is the default, because this is not something expected IMHO.

Thanks!
Rick

Changed in percona-toolkit:
status: New → Won't Fix
Revision history for this message
Carlos Salguero (carlos-salguero) wrote :
Revision history for this message
Carlos Salguero (carlos-salguero) wrote :

Disregard the last comment please.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-1439

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.