main.percona_bug1008609 breaks any replication test further in the same run

Bug #1515602 reported by Laurynas Biveinis
This bug affects 1 person
Affects          Status         Importance  Assigned to        Milestone
Percona Server   (moved to https://jira.percona.com/projects/PS; status tracked in 5.7)
  5.6            New            High        Unassigned
  5.7            Fix Released   High        Laurynas Biveinis

Bug Description

After main.percona_bug1008609 completes, every subsequent replication test on any MTR worker fails because the slaves cannot connect to the master. For example:

main.auth_rpl [ fail ]
        Test ended at 2015-11-12 14:06:00

CURRENT_TEST: main.auth_rpl
mysqltest: In included file ./include/wait_for_slave_param.inc at line 156:
included from ./include/wait_for_slave_io_to_start.inc at line 44:
included from ./include/wait_for_slave_to_start.inc at line 30:
included from ./include/start_slave.inc at line 46:
included from ./include/rpl_for_each_connection.inc at line 63:
included from ./include/rpl_start_slaves.inc at line 30:
included from ./include/rpl_init.inc at line 463:
included from ./include/master-slave.inc at line 51:
included from /Users/laurynas/percona/mysql-server/mysql-test/t/auth_rpl.test at line 4:
At line 156: Timeout in include/wait_for_slave_param.inc

The result from queries just before the failure was:
< snip >
relaylog_name = 'No such row'
SHOW RELAYLOG EVENTS IN 'No such row';
Log_name Pos Event_type Server_id End_log_pos Info

**** slave_relay_info on server_1 ****
SELECT * FROM mysql.slave_relay_log_info;
Number_of_lines Relay_log_name Relay_log_pos Master_log_name Master_log_pos Sql_delay Number_of_workers Id Channel_name

**** slave_master_info on server_1 ****
SELECT * FROM mysql.slave_master_info;
Number_of_lines Master_log_name Master_log_pos Host User_name User_password Port Connect_retry Enabled_ssl Ssl_ca Ssl_capath Ssl_cert Ssl_cipher Ssl_key Ssl_verify_server_cert Heartbeat Bind Ignored_server_ids Uuid Retry_count Ssl_crl Ssl_crlpath Enabled_auto_position Channel_name

**** mysql.gtid_executed on server_1 ****
SELECT * FROM mysql.gtid_executed;
source_uuid interval_start interval_end
rpl_topology= 1->2
rand_seed: '' _rand_state: ''
extra debug info if any: ''
rpl_topology=1->2
connection server_2;
safe_process[91283]: Child process: 91284, exit: 1

The slave's error log shows that it fails to connect to the master because its server UUID is identical to the master's.

This is caused by main.percona_bug1008609 running mysqld --bootstrap with --datadir pointing to the master MTR datadir, which is later copied to the worker master and any slave working directories. This datadir contains an auto.cnf, which holds a server UUID and is created during bootstrap. MTR deletes that file after its initial --bootstrap run so that any servers started later generate their own UUIDs. MTR also re-copies the master datadir for servers to use as needed.

However, percona_bug1008609 re-creates auto.cnf with a UUID, which is then copied for the servers to use, resulting in identical UUIDs for masters and slaves.
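
To confirm this diagnosis on a failing run, one could scan the MTR var directory for auto.cnf files that share a server UUID. The following is only a minimal sketch and not part of MTR: the function names are hypothetical, and it assumes each auto.cnf uses the usual [auto] section with a server-uuid key.

```python
import configparser
import os
from collections import defaultdict


def read_server_uuid(auto_cnf_path):
    """Return the server-uuid stored in an auto.cnf file, or None if absent.

    Assumes the usual layout: an [auto] section with a server-uuid key.
    """
    parser = configparser.ConfigParser()
    parser.read(auto_cnf_path)  # a missing file is silently skipped
    return parser.get("auto", "server-uuid", fallback=None)


def find_duplicate_server_uuids(var_dir):
    """Map each shared server UUID to the auto.cnf files that contain it.

    Walks var_dir recursively; only UUIDs that occur in more than one
    auto.cnf are returned.
    """
    paths_by_uuid = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(var_dir):
        if "auto.cnf" in filenames:
            path = os.path.join(dirpath, "auto.cnf")
            uuid = read_server_uuid(path)
            if uuid is not None:
                paths_by_uuid[uuid].append(path)
    return {u: sorted(p) for u, p in paths_by_uuid.items() if len(p) > 1}
```

Calling find_duplicate_server_uuids() on the var directory of a failing run would, under these assumptions, list the master and slave datadirs that received the same auto.cnf. An empty result means no two servers share a UUID.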

Tags: ci
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-942
