pt-archiver doesn't reconnect and retry its SELECT queries

Reported by Baron Schwartz on 2012-09-05
This bug affects 1 person
Affects: Percona Toolkit
Importance: Undecided
Assigned to: Unassigned

Bug Description

pt-archiver queries are likely to be killed by automatic query killers. (Ask me how I know.) The tool then exits with a message such as the following:

DBD::mysql::st execute failed: Lost connection to MySQL server during query [for Statement "SELECT /*!40001 SQL_NO_CACHE ...

The code in question is the $get_sth->execute call here:

         PTDEBUG && _d('Fetching rows in next chunk');
         trace('select', sub {
            my $select_start = time;
            $get_sth->execute(@{$lastrow}[@asc_slice]);
            $last_select_time = time - $select_start;
            PTDEBUG && _d('Fetched', $get_sth->rows, 'rows');
            $statistics{SELECT} += $get_sth->rows;
         });

This makes the tool not very resilient. I think it should catch errors there, reconnect, and then retry up to --retries times.
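
A minimal sketch of that catch, reconnect, and retry behavior follows. It is not pt-archiver's code: execute_with_retry and its sth, reconnect, retries, and bind arguments are hypothetical names chosen for illustration. It assumes the handle dies on failure (as DBI does with RaiseError enabled), that any error is worth retrying, and that "--retries times" means that many attempts in addition to the first one.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pt-archiver): execute a statement handle,
# and if that dies, obtain a fresh handle from the reconnect callback and
# try again, up to $args{retries} extra attempts.
#
# Assumptions: execute() dies on error (DBI with RaiseError => 1), and the
# reconnect callback re-opens the connection, re-prepares the same SELECT,
# and returns the new statement handle.
sub execute_with_retry {
    my (%args) = @_;
    my $sth       = $args{sth};
    my $reconnect = $args{reconnect};
    my @bind      = @{ $args{bind} || [] };

    # One initial attempt plus the configured number of retries.
    my $attempts_left = ($args{retries} || 0) + 1;

    while (1) {
        my $ok = eval { $sth->execute(@bind); 1 };
        return $sth if $ok;

        my $err = $@ || "unknown error\n";
        die $err if --$attempts_left <= 0;   # out of attempts: rethrow

        # The old handle belongs to a dead connection, so replace it.
        $sth = $reconnect->();
    }
}
```

The caller has to keep the returned handle (for example by assigning it back to $get_sth), because the original one may have been replaced after a reconnect. A real fix would probably also want to retry only on connection-loss errors rather than on every failure.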

Baron Schwartz (baron-xaprb) wrote :

It also needs to retry the initial SELECT:

         $get_sth = $get_first; # Later it may be assigned $get_next
         trace('select', sub {
            my $select_start = time;
            $get_sth->execute;
            $last_select_time = time - $select_start;
            $statistics{SELECT} += $get_sth->rows;
         });
That can get killed easily, too, if it runs for a long time (which it does sometimes).

Changed in percona-toolkit:
status: New → Triaged
tags: added: error-recovery