MariaDB 5.1.55 -> 5.3.5 replication fail

Reported by Alexey Zilber on 2012-03-14
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Maria
Incomplete
Undecided
Unassigned

Bug Description

Have a master+slave MariaDB 5.1.55 replicating using statement binlog format, all fine.

Added MariaDB 5.3.5 slave, get this error:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master'

This is unrelated to max packet. max_allowed_packet is 16M on mariadb 5.1.55. It was the same on 5.3.55. Attempted to increase it to 32M to no avail.

From the error log:

120314 8:51:18 [Warning] Slave I/O: Notifying master by SET @master_binlog_checksum= @@global.binlog_checksum failed with error: Unknown system variable 'binlog_checksum', Error_code: 1193
120314 8:51:18 [ERROR] Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master ( server_errno=1236)
120314 8:51:18 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master', Error_code: 1236

attempting to turn off checksums using info from here: http://kb.askmonty.org/en/binlog-event-checksums
Has no effect.

Elena Stepanova (elenst) wrote :

Hi Alexey,

The warning about binlog_checksum seems reasonable, it just means that master does not know about binlog_checksum functionality, which is true since your master server is 5.1.55.

In regard to the real issue (log event entry exceeded max_allowed_packet), could you please provide your master binlog on which the replication breaks, full contents of slave error log file, and cnf files from master and slave? You can FTP them to ftp://ftp.askmonty.org.

Thank you.

Hi Elena,

  Based on the logs, it would seem that when the master receives a
binlog_checksum query (even though I turned those off), it throws an error
and replication breaks. According to the docs, replication is compatible
from earlier versions to newer versions of mysql/Mariadb. This makes me
think it's not so. I'm unable to provide the log file at this time. I'm
reverting back to MariaDB 5.2.10 for the meantime, as I need to migrate the
db.

-Alex

On Wed, Mar 14, 2012 at 6:05 PM, Elena Stepanova
<email address hidden>wrote:

> Hi Alexey,
>
> The warning about binlog_checksum seems reasonable, it just means that
> master does not know about binlog_checksum functionality, which is true
> since your master server is 5.1.55.
>
> In regard to the real issue (log event entry exceeded
> max_allowed_packet), could you please provide your master binlog on
> which the replication breaks, full contents of slave error log file, and
> cnf files from master and slave? You can FTP them to
> ftp://ftp.askmonty.org.
>
> Thank you.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/955057
>
> Title:
> MariaDB 5.1.55 -> 5.3.5 replication fail
>
> Status in Maria:
> New
>
> Bug description:
> Have a master+slave MariaDB 5.1.55 replicating using statement binlog
> format, all fine.
>
> Added MariaDB 5.3.5 slave, get this error:
>
>
> Last_IO_Error: Got fatal error 1236 from master when reading data from
> binary log: 'log event entry exceeded max_allowed_packet; Increase
> max_allowed_packet on master'
>
>
> This is unrelated to max packet. max_allowed_packet is 16M on mariadb
> 5.1.55. It was the same on 5.3.55. Attempted to increase it to 32M to no
> avail.
>
> From the error log:
>
>
> 120314 8:51:18 [Warning] Slave I/O: Notifying master by SET
> @master_binlog_checksum= @@global.binlog_checksum failed with error:
> Unknown system variable 'binlog_checksum', Error_code: 1193
> 120314 8:51:18 [ERROR] Error reading packet from server: log event entry
> exceeded max_allowed_packet; Increase max_allowed_packet on master (
> server_errno=1236)
> 120314 8:51:18 [ERROR] Slave I/O: Got fatal error 1236 from master when
> reading data from binary log: 'log event entry exceeded max_allowed_packet;
> Increase max_allowed_packet on master', Error_code: 1236
>
> attempting to turn off checksums using info from here:
> http://kb.askmonty.org/en/binlog-event-checksums
> Has no effect.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maria/+bug/955057/+subscriptions
>

Elena Stepanova (elenst) wrote :

Hi Alex,

The warning always appears in the slave error log when you have a master that does not recognize the variable; it does not normally break replication. I have 5.1.55 -> 5.3.5 replication setup, it does indeed produce the warning on startup, but continues working as expected.

The query that you are talking about
SET @master_binlog_checksum= @@global.binlog_checksum
is generic, it is executed regardless of the variable value on slave. This way slave finds out whether master is capable of writing binlog checksums, and it helps the dump thread initialize properly.

The fact that it happened the same second as the error that broke replication is irrelevant because, as I understand, it was the moment when your slave threads were initializing. I suppose right before the warning you had the record similar to
[Note] Slave I/O thread: connected to master .... ,replication started in log

That is, the IO thread connected (wrote the note), went through the handshake (wrote the warning), started reading binlog events and found a bad one, which caused the error.

So, while without looking at the logs we cannot completely rule out any relation between the warning and the packet error, it seems rather unlikely. At the same time, regardless binlog checksums, the problem 'log event entry exceeded max_allowed_packet' has a long history in MySQL:
http://bugs.mysql.com/bug.php?id=8215 -- bug of 2005
http://www.linuxquestions.org/questions/linux-newbie-8/got-fatal-error-1236-log-event-entry-exceeded-max_allowed_packet-mysql-replication-839336/ -- message of 2011
.. and plenty in between.
Due to the lack of details and evidence, more often than not real causes of the failure remained suspected but not proved (disk failures, errors in binlog writing, external libraries, to name a few). So, we really need the logs to be able to give investigation a try.

Thank you.

Alexey Zilber (alexeyzilber) wrote :
Download full text (4.0 KiB)

Hi Elena,

  Really appreciate the analysis. The initial binlog offset that I started
on was correct (verified by mysqlbinlog), but the offset where it stopped
and had the error was incorrect. I don't have the original offset anymore
as I've wiped everything on the slave (to revert to 5.2), but I will
attempt again in a few days.

Thanks,
Alex

On Wed, Mar 14, 2012 at 7:02 PM, Elena Stepanova
<email address hidden>wrote:

> Hi Alex,
>
> The warning always appears in the slave error log when you have a master
> that does not recognize the variable; it does not normally break
> replication. I have 5.1.55 -> 5.3.5 replication setup, it does indeed
> produce the warning on startup, but continues working as expected.
>
> The query that you are talking about
> SET @master_binlog_checksum= @@global.binlog_checksum
> is generic, it is executed regardless of the variable value on slave. This
> way slave finds out whether master is capable of writing binlog checksums,
> and it helps the dump thread initialize properly.
>
> The fact that it happened the same second as the error that broke
> replication is irrelevant because, as I understand, it was the moment when
> your slave threads were initializing. I suppose right before the warning
> you had the record similar to
> [Note] Slave I/O thread: connected to master .... ,replication started in
> log
>
> That is, the IO thread connected (wrote the note), went through the
> handshake (wrote the warning), started reading binlog events and found a
> bad one, which caused the error.
>
> So, while without looking at the logs we cannot completely rule out any
> relation between the warning and the packet error, it seems rather
> unlikely. At the same time, regardless binlog checksums, the problem 'log
> event entry exceeded max_allowed_packet' has a long history in MySQL:
> http://bugs.mysql.com/bug.php?id=8215 -- bug of 2005
>
> http://www.linuxquestions.org/questions/linux-newbie-8/got-fatal-error-1236-log-event-entry-exceeded-max_allowed_packet-mysql-replication-839336/-- message of 2011
> .. and plenty in between.
> Due to the lack of details and evidence, more often than not real causes
> of the failure remained suspected but not proved (disk failures, errors in
> binlog writing, external libraries, to name a few). So, we really need the
> logs to be able to give investigation a try.
>
> Thank you.
>
> ** Bug watch added: MySQL Bug System #8215
> http://bugs.mysql.com/bug.php?id=8215
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/955057
>
> Title:
> MariaDB 5.1.55 -> 5.3.5 replication fail
>
> Status in Maria:
> New
>
> Bug description:
> Have a master+slave MariaDB 5.1.55 replicating using statement binlog
> format, all fine.
>
> Added MariaDB 5.3.5 slave, get this error:
>
>
> Last_IO_Error: Got fatal error 1236 from master when reading data from
> binary log: 'log event entry exceeded max_allowed_packet; Increase
> max_allowed_packet on master'
>
>
> This is unrelated to max packet. max_allowed_packet is 16M on mariadb
> 5.1.55. It was the same on 5.3.55. Attempted to increase it to 32M to no
> avail.
>
>...

Read more...

Elena Stepanova (elenst) on 2012-03-16
Changed in maria:
status: New → Incomplete
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.