pt-archiver doesn't reconnect and retry its SELECT queries

Bug #1046483 reported by Baron Schwartz
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Triaged
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

pt-archiver queries are likely to be killed by automatic query killers. (Ask me how I know.) The tool then exits with a message such as the following:

DBD::mysql::st execute failed: Lost connection to MySQL server during query [for Statement "SELECT /*!40001 SQL_NO_CACHE ...

The code in question is the $get_sth->execute call here:

         PTDEBUG && _d('Fetching rows in next chunk');
         trace('select', sub {
            my $select_start = time;
            $get_sth->execute(@{$lastrow}[@asc_slice]);
            $last_select_time = time - $select_start;
            PTDEBUG && _d('Fetched', $get_sth->rows, 'rows');
            $statistics{SELECT} += $get_sth->rows;
         });

This makes the tool not very resilient. I think it should catch errors there, reconnect, and then try again up to --retries times.
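
The catch, reconnect, and retry behavior proposed above could look roughly like the sketch below. It is not pt-archiver's code: execute_with_retry, its try/reconnect callbacks, and the set of error messages treated as retryable are all assumptions made for illustration, and it treats the retry count as the number of additional attempts, which may not match how --retries is counted in the tool. A real fix would also have to re-prepare $get_sth on the new connection, since a DBI statement handle belongs to the connection that prepared it. The failure here is simulated so the example runs without a database.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pt-archiver): run the "try" callback and,
# if it dies with a lost-connection error, call "reconnect" and run it again,
# up to "retries" additional times. Any other error is rethrown immediately.
sub execute_with_retry {
    my (%args) = @_;
    my $retries   = defined $args{retries} ? $args{retries} : 1;
    my $try       = $args{try};
    my $reconnect = $args{reconnect};

    for my $attempt (0 .. $retries) {
        my $result;
        my $ok = eval { $result = $try->(); 1 };
        return $result if $ok;

        my $error = $@;
        # Assumption: only these two client errors mean the query was killed
        # or the connection dropped; everything else is not retried.
        die $error
            unless $error =~ m/Lost connection to MySQL server|MySQL server has gone away/;
        die $error if $attempt == $retries;   # out of attempts

        # In a real tool this would reconnect and re-prepare the statement.
        $reconnect->();
    }
    return;
}

# Simulated usage: the first attempt fails as if a query killer struck.
my $calls      = 0;
my $reconnects = 0;
my $rows = execute_with_retry(
    retries   => 2,
    try       => sub {
        die "DBD::mysql::st execute failed: Lost connection to MySQL server during query\n"
            if ++$calls == 1;
        return 42;    # stand-in for $get_sth->rows
    },
    reconnect => sub { $reconnects++ },
);
print "rows=$rows calls=$calls reconnects=$reconnects\n";
# prints "rows=42 calls=2 reconnects=1"
```

Passing the work as a callback means the same wrapper could cover both the per-chunk SELECT and the initial one, as long as each callback prepares and executes its statement against the current connection.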

Baron Schwartz (baron-xaprb) wrote :

It also needs to retry the initial SELECT:

   $get_sth = $get_first; # Later it may be assigned $get_next
   trace('select', sub {
      my $select_start = time;
      $get_sth->execute;
      $last_select_time = time - $select_start;
      $statistics{SELECT} += $get_sth->rows;
   });

That can get killed easily, too, if it runs for a long time (which it does sometimes).

Changed in percona-toolkit:
status: New → Triaged
tags: added: error-recovery
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PT-1019
