Query "stop slave" hangs

Bug #906323 reported by Dreas van Donselaar
Affects: MariaDB
Status: Confirmed
Importance: Undecided
Assigned to: Kristian Nielsen
Milestone: (none)

Bug Description

On several of our slave servers we had problems with replication. They appear to have started after hardware issues (power failures) on our central MariaDB master. Trying to stop replication with "mysql -e 'stop slave'" resulted in the query simply getting stuck, and MySQL no longer responded to any replication-related queries (e.g. show slave status). The only fix appeared to be removing the /var/lib/mysql/master.info file and manually setting the correct replication values again after startup. I've made a full copy of a database in this problematic state (both /var/log/mysql and /var/lib/mysql) and uploaded it to FTP. We are running MariaDB 5.2.9.
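For illustration only, the "manually setting the correct replication values again" step could be sketched as below. This is a minimal sketch, not the reporter's actual procedure: the helper names `sql_quote` and `build_change_master` are hypothetical, and the host, user, password, log file, and position in the usage line are placeholders that would have to come from your own records. Only the `CHANGE MASTER TO` statement and its `MASTER_*` options are real MariaDB/MySQL syntax.

```python
def sql_quote(value: str) -> str:
    """Quote a string literal for a MySQL/MariaDB statement (hypothetical helper)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_change_master(host, user, password, log_file, log_pos, port=3306):
    """Build a CHANGE MASTER TO statement from previously recorded values.

    All arguments are assumptions supplied by the operator; nothing here
    reads master.info or talks to a server.
    """
    if int(log_pos) < 0:
        raise ValueError("log_pos must be non-negative")
    parts = [
        f"MASTER_HOST={sql_quote(host)}",
        f"MASTER_PORT={int(port)}",
        f"MASTER_USER={sql_quote(user)}",
        f"MASTER_PASSWORD={sql_quote(password)}",
        f"MASTER_LOG_FILE={sql_quote(log_file)}",
        f"MASTER_LOG_POS={int(log_pos)}",
    ]
    return "CHANGE MASTER TO " + ", ".join(parts) + ";"


# Placeholder values, not taken from the affected servers.
statement = build_change_master(
    "master.example.com", "repl", "secret", "mysql-bin.000123", 4
)
print(statement)
```

The resulting statement would be run on the slave after it has been restarted without master.info, followed by START SLAVE.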

Elena Stepanova (elenst) wrote :

Based on the data uploaded to FTP separately, I think it's a manifestation of the bug http://bugs.mysql.com/bug.php?id=45940 (part 4) and its duplicate http://bugs.mysql.com/bug.php?id=53985 which describe exactly the same problem, hanging STOP SLAVE.

The analysis in the latter bug says that it happens when the SQL thread is being stopped in the middle of a transaction while the IO thread has already exited, and that it relates to situations where a mix of transactional and non-transactional engines is involved (so rolling back the started group is not safe).

In our case, we have all the same elements, just due to different reasons.

According to the slave error log, the IO thread exited immediately on server start due to ER_MASTER_FATAL_ERROR_READING_BINLOG. The earlier hardware problem on the master can account for that.

The SQL thread started, but its position pointed at the beginning of an unfinished transaction (group). So, as the bugs above describe, it finished executing what it had and started waiting for the rest, which the IO thread of course could not provide. The error log does not even show any sign of the SQL thread attempting to exit when it presumably should have received the STOP command.

As for the mix of transactional and non-transactional engines: instead of that, we have different table engines on master and slave. The transaction itself apparently consisted of only two DML statements (the first was written to the binlog; the second and the COMMIT were not), so there was no mix. But the slave table is Aria, while the master table is most likely InnoDB (judging by the look of the binary log). So, since the binary log is transactional, the SQL thread treats the group as a transaction, but it also raises the flag 'modified_non_transactional_table'.
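The combination of conditions described above can be summarised in a toy model. This is only a sketch of the reasoning in this comment, not the server's actual code: the `SlaveState` class, its field names, and the `stop_slave_outcome` function are hypothetical, and the non-hanging branches are assumptions about what would happen in the other cases.

```python
from dataclasses import dataclass


@dataclass
class SlaveState:
    io_thread_running: bool                  # False here: exited on startup
    in_unfinished_group: bool                # SQL thread is inside a partial transaction
    modified_non_transactional_table: bool   # flag raised because the slave table is Aria
    rest_of_group_in_relay_log: bool         # second statement and COMMIT available?


def stop_slave_outcome(state: SlaveState) -> str:
    """Return the modelled result of issuing STOP SLAVE in the given state."""
    if not state.in_unfinished_group:
        return "stops"                       # nothing pending, safe to stop
    if not state.modified_non_transactional_table:
        return "stops after rollback"        # assumed: partial group can be rolled back
    if state.rest_of_group_in_relay_log or state.io_thread_running:
        return "stops after group end"       # assumed: remaining events can still arrive
    return "hangs"                           # waits for events that will never come


# The situation analysed in this report.
reported = SlaveState(
    io_thread_running=False,
    in_unfinished_group=True,
    modified_non_transactional_table=True,
    rest_of_group_in_relay_log=False,
)
print(stop_slave_outcome(reported))
```

In this model only the last branch hangs, which matches the state reconstructed from the uploaded data: no IO thread, an open group, and the non-transactional flag set.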

I'm assigning it to Kristofer so he can confirm (or deny) this and, importantly, decide whether anything should be done about it in 5.2/5.3. The original bug was fixed in 5.5, according to the bug comments.

Changed in maria:
assignee: nobody → Kristian Nielsen (knielsen)
Elena Stepanova (elenst) wrote :

Sorry, typo: to Kristian, of course.

Changed in maria:
status: New → Confirmed
Dreas van Donselaar (dreas-9) wrote :

The master tables are indeed (still) InnoDB.

Elena Stepanova (elenst)
tags: added: replication upstream