pt-archiver doesn't reconnect and retry its SELECT queries
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Toolkit moved to https://jira.percona.com/projects/PT | Triaged | Undecided | Unassigned |
Bug Description
pt-archiver queries are likely to be killed by automatic query killers. (Ask me how I know.) The tool then exits with a message such as the following:
DBD::mysql::st execute failed: Lost connection to MySQL server during query [for Statement "SELECT /*!40001 SQL_NO_CACHE ...
The code in question is the $get_sth execute call here:
PTDEBUG && _d('Fetching rows in next chunk');
my $select_start = time;
PTDEBUG && _d('Fetched', $get_sth->rows, 'rows');
});
This makes the tool not very resilient. I think it should catch errors there, reconnect, and then try again up to --retries times.
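To make the proposal concrete, here is a minimal sketch of the catch, reconnect, and retry loop. It is not pt-archiver's code: execute_with_retry and its execute/reconnect/retries arguments are hypothetical names, and the database is simulated with a closure so the example runs without DBI. In the real tool the reconnect step would presumably also have to re-prepare the statement handles on the new connection.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pt-archiver): run $execute; if it dies
# with a lost-connection error, call $reconnect and try again, allowing at
# most $retries retries after the first attempt.
sub execute_with_retry {
   my (%args) = @_;
   my ($execute, $reconnect, $retries) = @args{qw(execute reconnect retries)};
   my $attempt = 0;
   while (1) {
      my $result;
      my $ok = eval { $result = $execute->(); 1 };
      return $result if $ok;
      my $err = $@;
      # Retry only connection losses; rethrow every other error unchanged.
      die $err
         unless $err =~ m/Lost connection to MySQL server|MySQL server has gone away/;
      # Give up once the retry budget is spent.
      die $err if $attempt++ >= $retries;
      $reconnect->();
   }
}

# Simulated statement: fails twice with the error from this report, then
# succeeds. The return value stands in for $get_sth->rows.
my $calls      = 0;
my $reconnects = 0;
my $rows = execute_with_retry(
   retries   => 3,
   reconnect => sub { $reconnects++ },
   execute   => sub {
      die "DBD::mysql::st execute failed: Lost connection to MySQL server during query\n"
         if ++$calls <= 2;
      return 42;
   },
);
print "rows=$rows calls=$calls reconnects=$reconnects\n";
# prints "rows=42 calls=3 reconnects=2"
```

Matching on the error text is only for illustration; a real patch could check the driver's error code instead. Errors other than a lost connection are rethrown so they still stop the tool as they do today.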
Changed in percona-toolkit:
status: New → Triaged
tags: added: error-recovery
It also needs to retry the initial SELECT:
$get_sth = $get_first; # Later it may be assigned $get_next
trace('select', sub {
   my $select_start = time;
   $get_sth->execute;
   $last_select_time = time - $select_start;
   $statistics{SELECT} += $get_sth->rows;
});
That can get killed easily, too, if it runs for a long time (which it does sometimes).