pt-heartbeat --update --replace causes duplicate key error

Bug #1004567 reported by Mark W
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Released
Medium
Daniel Nichter
1.0
Won't Fix
Undecided
Unassigned
2.0
Fix Committed
Medium
Daniel Nichter
2.1
Fix Released
Medium
Daniel Nichter

Bug Description

I am using pt-heartbeat with MySQL that is configured in a HA setup.
 LoadBalancer(VIP)
              |
M1(RW)<--->M2(RW)
|
          +----+----+
S1(R) S2(R)
I am using pt-heartbeat version 1.0.1

I send out a heartbeat every two minutes to the active writer via cron
to the load balancer Virtual IP Address.

*/2 * * * * /usr/bin/pt-heartbeat -D test --update --replace --create-
table -h LBVIP -P LBPORT --run-time 1

When I do a failover I get duplicate key errors on the secondary
master M2. On failback i get duplicate key error on Master M1.

Last_Error: Error 'Duplicate entry '8806' for key 'PRIMARY'' on query.
Default database: 'test'. Query: 'INSERT INTO `test`.`heartbeat`
(server_id, ts) VALUES ('8806', NOW())'

I am using the --update --replace options but i believe based on my
replication errors that it is doing an insert not a replace.

Am i hitting a bug with pt-heartbeat or do i have to change my method
of sending heartbeat?

Related branches

Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

I can't reproduce this with the latest version of pt-heartbeat, 2.1.1. The binary log event is as follows:

REPLACE INTO `test`.`heartbeat` (ts, server_id, file, position, relay_master_log_file, exec_master_log_pos) VALUES ('2012-05-25T13:16:38.001640', '12345', 'mysql-bin.000001', '3254215', NULL, NULL)

I just noticed that you're not using the latest version in the 1.0 series. Can you either upgrade to the latest release (and there will be a new one in a few days) or upgrade to the latest 1.0 release?

Revision history for this message
Mark W (deviant-dolphin) wrote :

Last night i had another HA Failover and Fail Back.
Upon Fail Back i got the same type of error.

Default database: 'test'. Query: 'INSERT INTO `test`.`heartbeat`
(server_id, ts) VALUES ('8006', NOW())'

I reviewed my binary log and indeed most of the time pt-heartbeat is sending a replace
but it occasionally does do an insert.

In the past I ran into this very same problem and i created a secondary cron job that deletes a row from the table that matches the current HA primary nodes server-id. In retrospect this would not really help because once replication brakes its to late.

So my heartbeat goes out every 2 minutes

*/2 * * * * nagios /usr/bin/pt-heartbeat --defaults-file /home/nagios/.my.cnf -D test --update --replace --create-table -h LBVIP -P LBPORT --run-time 1

Then i had a cron job deleting the row it inserted every 10 minutes. ( as of today i am no longer running this job)
*/10 * * * * nagios mysql -h LBVIP -P LBPORT test -e "DELETE FROM test.heartbeat WHERE server_id <> LBSERVERID "

I think the problem is that when table becomes empty the --create-table option fires off an insert , which does not respect the --update --replace parameters. I am not sure why this breaks replication because the slaves should also have an empty heartbeat table at this point.

I have attached the relevant portions my binlog from last night.

--

Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

Have you tried with the newest version of Percona Toolkit, or are you still looking at the old version?

Revision history for this message
Mark W (deviant-dolphin) wrote :

I updated pt-heartbeat to 2.1.1 but i have not failed over to test.
I also stop the cron job that runs every 10 minutes that cleans up the table.

I have no planned maintenance at this time but something always comes up.
I will report back in on my next failover which probably happen sometime in next 30 days.

tags: added: crash duplicate-key pt-heartbeat
Changed in percona-toolkit:
status: New → Triaged
Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

I believe the problem may be caused by this bit of code:

3586 my $row = $dbh->selectall_arrayref($sql);
3587 if ( scalar @$row == 0 ) {
3588 PTDEBUG && _d('No heartbeat row in table');
3589 if ( $o->get('insert-heartbeat-row') ) {
3590 my $sql = "INSERT INTO $db_tbl ($pk_col, ts) "
3591 . "VALUES ('$pk_val', NOW())";
3592 PTDEBUG && _d($sql);
3593 $dbh->do($sql);
3594 }

This doesn't respect the REPLACE options.

Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

See also bug 885938.

Changed in percona-toolkit:
milestone: none → 2.1.3
Changed in percona-toolkit:
importance: Undecided → High
importance: High → Medium
Changed in percona-toolkit:
status: Triaged → In Progress
assignee: nobody → Daniel Nichter (daniel-nichter)
summary: - pt-heartbeat --update --replace is causing duplicate key errors
+ pt-heartbeat --update --replace causes duplicate key error
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

WontFix 1.0, just the fix but not the test in 2.0 (because the 2.1 test suite is different than 2.0), fix and test in 2.1.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-536

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.