pt-archiver dies with "MySQL server has gone away" when innodb_kill_idle_transaction set to low value and check-slave-lag used

Bug #1449226 reported by Sveta Smirnova
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Released
Medium
Frank Cizmich

Bug Description

How to repeat.

1. Install Percona server.
2. cd mysql-test
3. ./mtr --start --suite=rpl rpl_alter --mysqld=--innodb_kill_idle_transaction=60 &
4. Connect to slave, setup it:

CHANGE MASTER TO master_host='127.0.0.1', master_port=13000, master_user='root', master_password='';
START SLAVE SQL_THREAD;

Do not start IO thread yet!

5. Create test data on master:

mysql> CREATE TABLE test (
    -> `id` int(11) NOT NULL AUTO_INCREMENT,
    -> `nonce` varchar(255) DEFAULT NULL,
    -> `timestamp` int(11) DEFAULT NULL,
    -> `created_at` datetime DEFAULT NULL,
    -> `updated_at` datetime DEFAULT NULL,
    -> `site_id` int(11) unsigned DEFAULT NULL,
    -> PRIMARY KEY (`id`),
    -> UNIQUE KEY `index_oauth_nonces_on_nonce_and_timestamp` (`nonce`,`timestamp`),
    -> KEY `oauth_nonces_ix_site_id` (`site_id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into test(nonce,`timestamp`,created_at,updated_at,site_id) values(md5(rand()),now(),now(),now(),1);
insert into test(nonce,`timestamp`,created_at,updated_at,site_id) select nonce,`timestamp`,created_at,updated_at,site_id from test;

Repeat last statements until you have 2,000,000 rows.

6. Run pt-archiver with following options:

pt-archiver --purge --quick --no-check-charset --statistics --buffer --bulk-delete --primary-key-only --max-lag 1 --check-interval 1 --check-slave-lag h=127.0.0.1,P=13002,u=root --source h=127.0.0.1,P=13001,u=root,D=test,t=test --where="id<=1300000" --limit 1000 --progress 1000 --commit-each

7. You will see something like:

TIME ELAPSED COUNT
2015-04-27T22:10:30 0 0

8. Wait 61 or more seconds, then start slave IO thread.

9. After slave catches up with master you will see something like:

2015-04-27T22:12:38 128 1000
DBD::mysql::st execute failed: MySQL server has gone away [for Statement "DELETE QUICK FROM `test`.`test` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (id<=1300000) LIMIT 1000" with ParamValues: 0='969118', 1='970117'] at /home/sveta/Downloads/percona-toolkit-2.2.13/bin/pt-archiver line 6253.

tags: added: i52898
tags: added: pt-archiver
Changed in percona-toolkit:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Frank Cizmich (frank-cizmich)
milestone: none → 2.3.1
Revision history for this message
Michael Wang (xw73) wrote :

--set-vars=innodb_kill_idle_transaction=10000 does *not* work because innodb_kill_idle_transaction it is not a session variable. Sending a dummy SQL command such as "select null from dual" in the code during the sleep *does* work.

Revision history for this message
Michael Wang (xw73) wrote :

Adding these two lines as shown below fixes the bug:

          while ( !defined $lag || $lag > $o->get('max-lag') ) {
             PTDEBUG && _d('Sleeping: slave lag is', $lag);
             sleep($o->get('check-interval'));
             $lag = $ms->get_slave_lag($lag_dbh);
+ $src->{dbh}->do("SELECT null FROM dual") if $src;
+ $dst->{dbh}->do("SELECT null FROM dual") if $dst;
          }

Revision history for this message
Frank Cizmich (frank-cizmich) wrote :

fixed as suggested

Changed in percona-toolkit:
status: Confirmed → Fix Committed
Changed in percona-toolkit:
status: Fix Committed → Fix Released
tags: added: i59430
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-680

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.