pt-online-schema-change hangs when the table being altered is filtered out on the slave

Bug #1730168 reported by Jaime Sicam
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

pt-online-schema-change --execute --alter="engine=innodb" --no-check-replication-filters h=127.0.0.1,P=20594,u=root,p=msandbox,D=employees,t=titles --skip-check-slave-lag=h=127.0.0.1,P=20595,u=root,p=msandbox
Found 2 slaves:
kits-desktop -> SBslave1:20595
kits-desktop -> SBslave2:20596
Will check slave lag on:
kits-desktop -> SBslave1:20595
kits-desktop -> SBslave2:20596
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `employees`.`titles`...
Creating new table...
Created new table employees._titles_new OK.
Waiting forever for new table `employees`.`_titles_new` to replicate to kits-desktop...
Waiting for kits-desktop: 0% 00:00 remain
Waiting for kits-desktop: 0% 00:00 remain
Waiting for kits-desktop: 0% 00:00 remain
Waiting for kits-desktop: 0% 00:00 remain
Waiting for kits-desktop: 0% 00:00 remain
Waiting for kits-desktop: 0% 00:00 remain
Waiting for kits-desktop: 0% 00:00 remain
Waiting for kits-desktop: 0% 00:00 remain
Waiting for kits-desktop: 0% 00:00 remain

To reproduce, add a replication filter for the database to the slave's my.cnf and restart the slave:
replicate-wild-ignore-table=employees.%

Then run pt-online-schema-change on the master against a table in the filtered database:
pt-online-schema-change --execute --alter="engine=innodb" --no-check-replication-filters h=127.0.0.1,P=20594,u=root,p=msandbox,D=employees,t=titles

Tags: i211081
Jaime Sicam (jssicam)
Changed in percona-toolkit:
status: New → Confirmed
Carlos Salguero (carlos-salguero) wrote :

What's the expected behavior?
The documentation says:

If the replicas are configured with any filtering options, you should be careful not to modify any databases or tables that exist on the master and not the replicas, because it could cause replication to fail. For more information on replication rules, see http://dev.mysql.com/doc/en/replication-rules.html.

Jaime Sicam (jssicam) wrote :

Carlos,

Is it feasible to skip the check for whether the table exists on the slave and still check for slave lag? The workaround is --recursion-method=none, but then the slave can fall behind because its lag is no longer checked.

Based on the code, it looks like when the tool detects one or more slaves, it waits until the newly created table exists on each of them:

   if ( $slaves && scalar @$slaves ) {
      foreach my $slave (@$slaves) {
         my ($pr, $pr_first_report);
         if ( $o->get('progress') ) {
            $pr = new Progress(
               jobsize => scalar @$slaves,
               spec => $o->get('progress'),
               name => "Waiting for " . $slave->name(),
            );
            $pr_first_report = sub {
               print "Waiting forever for new table $new_tbl->{name} to replicate "
                  . "to " . $slave->name() . "...\n";
            };
         }
         $pr->start() if $pr;
         my $has_table = 0;
         while ( !$has_table ) {
            $has_table = $tp->check_table(
               dbh => $slave->dbh(),
               db => $new_tbl->{db},
               tbl => $new_tbl->{tbl}
            );
            last if $has_table;
            $pr->update(
               sub { return 0; },
               first_report => $pr_first_report,
            ) if $pr;
            sleep 1;
         }
      }
   }

But when checking for replication lag, it only reads Seconds_Behind_Master from the slave status:

sub get_slave_lag {
   my ( $self, $dbh ) = @_;
   my $stat = $self->get_slave_status($dbh);
   return unless $stat; # server is not a slave
   return $stat->{seconds_behind_master};
}

So, is it possible to simply skip the check for whether the table exists on the slave, or is this difficult to fix?
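
To make the question concrete, here is a minimal sketch of one way the tool could tell that the new table will never arrive on a slave, so that it could skip the wait for that slave and still monitor lag. This is not Percona Toolkit code: wild_pattern_to_regex and table_is_wild_ignored are hypothetical helpers. The sketch assumes that the slave status hash uses lowercased keys (as seconds_behind_master does above), that Replicate_Wild_Ignore_Table is a comma-separated list of patterns, and that patterns are matched case-sensitively against the full db.tbl string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Percona Toolkit): convert one
# replicate-wild-ignore-table pattern, which uses LIKE-style wildcards,
# into an anchored Perl regex.
sub wild_pattern_to_regex {
   my ($pattern) = @_;
   my $re = '';
   # Walk the pattern token by token: escaped char, wildcard, or literal run.
   while ( $pattern =~ /\G(\\.|%|_|[^\\%_]+|\\)/gs ) {
      my $tok = $1;
      if    ( $tok eq '%' )            { $re .= '.*'; }           # any sequence
      elsif ( $tok eq '_' )            { $re .= '.';  }           # any one char
      elsif ( $tok =~ /\A\\(.)\z/s )   { $re .= quotemeta($1); }  # escaped char
      else                             { $re .= quotemeta($tok); }
   }
   return qr/\A$re\z/s;
}

# Hypothetical helper: true if db.tbl matches any pattern listed in the
# replicate_wild_ignore_table field of a slave status hash.
# ASSUMPTION: keys are lowercased and matching is case-sensitive.
sub table_is_wild_ignored {
   my ( $stat, $db, $tbl ) = @_;
   return 0 unless $stat && defined $stat->{replicate_wild_ignore_table};
   foreach my $pattern ( split /\s*,\s*/, $stat->{replicate_wild_ignore_table} ) {
      next unless length $pattern;
      return 1 if "$db.$tbl" =~ wild_pattern_to_regex($pattern);
   }
   return 0;
}

# Example with a hand-built status hash that mirrors the reproduction case.
my $stat = { replicate_wild_ignore_table => 'employees.%' };
print table_is_wild_ignored( $stat, 'employees', '_titles_new' )
   ? "filtered on this slave: skip the wait\n"
   : "not filtered: wait for the table\n";
```

With a check like this, the wait loop above could skip slaves where the new table is filtered out, while get_slave_lag would keep working unchanged because it does not depend on the table existing on the slave. Other filter options (for example replicate-ignore-db or replicate-do-table) are not covered by this sketch.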

Carlos Salguero (carlos-salguero) wrote :

Hi,

Please give me a couple of days to analyze this before giving you an answer.

Regards

tags: added: i211081
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PT-1455
