pt-upgrade test breaks replication

Bug #940972 reported by Baron Schwartz
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: Undecided
Assigned to: Daniel Nichter

Bug Description

The pt-upgrade test suite breaks replication on the test environment, causing subsequent tests to fail. This affects at least MySQL 5.5.20. After running the tests, we have:

12345
12346
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
12347
             Slave_IO_Running: Yes
            Slave_SQL_Running: No

The errors in the server's log:

120225 10:11:04 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statement accesses nontransactional table as well as transactional or temporary table, and writes to any of them. Statement: INSERT INTO film VALUES (1,'ACADEMY DINOSAUR','A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies',2006,1,NULL,6,'0.99',86,'20.99','PG','Deleted Scenes,Behind the Scenes','2006-02-15 05:03:42'),(2,'ACE GOLDFINGER','A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China',2006,1,NULL,3,'4.99',48,'12.99','G','Trailers,Deleted Scenes','2006-02-15 05:03:42'),(3,'ADAPTATION HOLES','A Astounding Reflection of a Lumberjack And a Car who must Sink a Lumberjack in A Baloon Factory',2006,1,NULL,7,'2.99',50,'18.99','NC-17','Trailers,Deleted Scenes','2006-02-15 05:03:42'),(4,'AFFAIR PREJUDICE','A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank',2006,1,NULL,5,'2.99',117
120225 10:15:00 [ERROR] Slave SQL: Error 'Table 't' already exists' on query. Default database: 'test'. Query: 'CREATE TABLE `t` (
  `id` int(10) NOT NULL,
  `name` varchar(255) default NULL,
  `last_login` datetime default NULL,
  PRIMARY KEY (`id`)
)', Error_code: 1050

Baron Schwartz (baron-xaprb) wrote :

Interestingly, if I "prove" t/pt-upgrade, all tests pass, and the replication breaks.

If I "perl" each .t file one at a time, basics.t has 3 failures, beginning with "Can't create database 'test'; database exists"

All other tests pass OK when run with "perl".

Baron Schwartz (baron-xaprb) wrote :

OK, now it's clear what is happening. The test assumes that the server running on port 12347 is a "second master" but it's not, it's replicating from the first "master".

If no server is running on port 12347, this works fine -- the test starts one as a standalone server. But if it's already running, it breaks because the test starts making changes to the 12347 instance that conflict with replication. We need another sandbox instance to be a real "master" instead of hijacking a slave and treating it like one.

Baron Schwartz (baron-xaprb) wrote :

This patch stops the tests from breaking replication, but now the tests fail, because they appear to rely partly on the second server being a replica of the first and partly on it being independent.

Changed in percona-toolkit:
assignee: nobody → Daniel Nichter (daniel-nichter)
Daniel Nichter (daniel-nichter) wrote :
Baron Schwartz (baron-xaprb) wrote :

It's not; it still breaks for me when testing against MySQL 5.0.

I think we need our tests to have another utility function, something like "verify no writes on replica" by doing FLUSH MASTER LOGS before running the test, then afterwards, using mysqlbinlog to look for writes in the binlog that originated on the replica.
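
A rough sketch of the checking half of that idea, in Python rather than the test suite's own language. It scans text already produced by mysqlbinlog for event headers stamped with the replica's own server id. The function name writes_from_server is hypothetical, and the header layout the regex expects ("server id N  end_log_pos N  Type") is an assumption based on 5.x-era mysqlbinlog output; it also assumes the sandbox gives each server a distinct server id.

```python
import re

# Per-event header comment as printed by mysqlbinlog (assumed layout), e.g.
# "#120225 10:15:00 server id 12347  end_log_pos 320   Query   thread_id=9 ..."
EVENT_HEADER = re.compile(
    r"^#\d{6}\s+\d{1,2}:\d{2}:\d{2}\s+server id\s+(\d+)"
    r"\s+end_log_pos\s+\d+\s+(\S+)"
)

# Events a server writes to its own binlog even when nothing writes to it:
# the format description ("Start:"), log rotation, and shutdown.
HOUSEKEEPING = {"Start:", "Rotate", "Stop"}


def writes_from_server(binlog_text, server_id):
    """Return the event header lines that originated on server_id.

    Replicated events keep the server id of the server where they
    originated, so any non-housekeeping event carrying the replica's
    own id was written directly on the replica.
    """
    hits = []
    for line in binlog_text.splitlines():
        m = EVENT_HEADER.match(line)
        if not m:
            continue
        origin, event_type = int(m.group(1)), m.group(2)
        if origin == server_id and event_type not in HOUSEKEEPING:
            hits.append(line)
    return hits
```

A test helper could flush the logs, run the test, feed the mysqlbinlog output of the new binlog to this function, and fail if the returned list is not empty.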

Baron Schwartz (baron-xaprb) wrote :

As I'm running through the test suite there are a lot of things that are stopping on failed replication. I've had to restart it several times in the last run against the branch you mentioned. I think it really is mandatory to have a check that the test didn't break replication; and tests for tools that do anything with replication, such as pt-table-checksum and pt-heartbeat, probably need to have a "replication_is_ok()" test first thing, which either passes or the whole test bails out. Otherwise we will never be able to run these reliably in an automated fashion; test runs will just hang and not complete.
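
As an illustration only, a "replication_is_ok()" check could look something like the following Python sketch. It works on the "Key: value" text of a vertical SHOW SLAVE STATUS listing, like the excerpt in the bug description, and passes only when both replication threads report Yes. The helper parse_slave_status is hypothetical, and how the status text is obtained from the sandbox server is left out.

```python
import re


def parse_slave_status(text):
    # Collect "Key: value" lines from vertical SHOW SLAVE STATUS output
    # into a dict; row separators and other lines are ignored.
    status = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(\w+):\s?(.*)$", line)
        if m:
            status[m.group(1)] = m.group(2).strip()
    return status


def replication_is_ok(status):
    # Both threads must be running. An empty status (the server is not
    # a replica at all) also counts as not OK.
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
    )
```

A test file for a replication-related tool would call this first and bail out of the whole file when it returns false, instead of hanging on a stopped SQL thread.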

tags: added: breaks-replication
tags: added: mysql-5.5
tags: added: test-failure
Changed in percona-toolkit:
status: Confirmed → In Progress
Daniel Nichter (daniel-nichter) wrote :

I've tested this on 5.0, 5.1, and 5.5 (using 2.0 branch) and it no longer breaks replication. I'm pretty certain it was fixed in lp:~percona-toolkit-dev/percona-toolkit/stabilize-test-suite which was merged into 2.0. I'll untarget this bug from 2.0.4 for now and keep it open in case you can reproduce the problem still.

Changed in percona-toolkit:
milestone: 2.0.4 → none
Changed in percona-toolkit:
importance: High → Undecided
Brian Fraser (fraserbn)
Changed in percona-toolkit:
milestone: none → 2.1.2
status: In Progress → Fix Committed
milestone: 2.1.2 → none
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-942
