Duplicate entry error for primary key following cluster size change

Bug #1366997 reported by Doug Barth on 2014-09-08
This bug affects 3 people
Affects / Status / Importance / Assigned to / Milestone:

- MySQL patches by Codership (status tracked in 5.6)
  - 5.5: Undecided / Unassigned
  - 5.6: Undecided / Unassigned
- Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
  - 5.5: Fix Released / Undecided / Alexey Kopytov
  - 5.6: Fix Released / Undecided / Unassigned
- percona-cluster (Juju Charms Collection): Undecided / Mario Splivalo
- percona-xtradb-cluster-5.5 (Ubuntu): Undecided / Mario Splivalo
  - Trusty: Medium / Mario Splivalo

Bug Description

[Impact]

 * Duplicate entry errors occur on the primary key for tables; data
   can't be added to the database correctly and in time.

[Test Case]

 * Prepare a .sql file full of INSERT queries:

   INSERT INTO test.t VALUES(NULL,"miguel");

   repeating that line a million times.

 * Start a 3-node cluster.

 * Load the data on node 1 (if you do it several times in parallel you will get the error sooner): cat data.sql | mysql

 * Restart node2 and wait until MySQL is running.

 * Restart node3 and wait until MySQL is running.

 * Repeat steps 4 and 5 several times (sometimes many restarts are needed, sometimes just a few) and the data load on node 1 will fail with a duplicate key entry.
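Step 1 of the test case can be scripted; a minimal sketch (the file name data.sql matches the load step):

```shell
# generate data.sql: the same INSERT repeated a million times
yes 'INSERT INTO test.t VALUES(NULL,"miguel");' | head -n 1000000 > data.sql

# sanity check: count the lines (expect 1000000)
wc -l < data.sql
```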

[Regression Potential]

 * The patch to fix this issue is backported from the next version, 5.6.

 * There is a failed build for the powerpc architecture, but it was not introduced by this change; one can check that the build has been failing in previous releases too:

https://launchpad.net/ubuntu/+source/percona-xtradb-cluster-5.5

If you expand the previous versions (14.04.1 and 14.04.2) you can see that the build was already failing for powerpc.

Buildlog:
https://launchpadlibrarian.net/324600505/buildlog_ubuntu-trusty-powerpc.percona-xtradb-cluster-5.5_5.5.37-25.10+dfsg-0ubuntu0.14.04.4_BUILDING.txt.gz

[Other Info]

* For Ubuntu SRU verification team :
https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1366997/comments/21

We (PagerDuty) have experienced on multiple occasions duplicate entry errors on the primary key for some tables after our cluster's size changes. In the cases we have experienced so far, this is when gracefully adding or removing a node to vertically scale the nodes in the cluster. We don't experience a total failure of all transactions. Instead, a small subset error out. In some cases, the problem corrects itself, but several times now we have had the errors persist for several hours. When the problem persists, we have found that restarting one of the cluster members will sometimes fix the issue.

Restarting all transactions by rebooting the application does not fix the issue. Only a restart of a cluster member can eventually solve the problem.

Here are the packages we are running

ii percona-toolkit 2.2.7 Advanced MySQL and system command-line tools
ii percona-xtrabackup 2.1.9-744-1.lucid Open source backup tool for InnoDB and XtraDB
ii percona-xtradb-cluster-client-5.5 5.5.37-25.10-756.lucid Percona XtraDB Cluster database client binaries
ii percona-xtradb-cluster-common-5.5 5.5.37-25.10-756.lucid Percona XtraDB Cluster database common files (e.g. /e
ii percona-xtradb-cluster-galera-2.x 188.lucid Galera components of Percona XtraDB Cluster
ii percona-xtradb-cluster-server-5.5 5.5.37-25.10-756.lucid Percona XtraDB Cluster database server binaries

Here is an example error that we see.

Duplicate entry '623287' for key 'PRIMARY'

That error will be seen for multiple tables, and sometimes again for the same table.

Please let me know what additional information can help. The MySQL error log doesn't appear to have any interesting details in it (just node join and leave events) but I would be glad to pass it along if you think it would be helpful.


Jay Janssen (jay-janssen) wrote :

Hey Doug,
  Can you confirm the PRIMARY KEY in this case is an auto-increment? If so, PXC modifies the auto_increment_increment and auto_increment_offset settings depending on the cluster size to avoid such collisions as these (assuming you have wsrep_auto_increment_control = ON, which is the default: http://www.percona.com/doc/percona-xtradb-cluster/5.5/wsrep-system-index.html#wsrep_auto_increment_control). I'm thinking this is a bug related to that.
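For illustration only (this sketch is not taken from the PXC source): with wsrep_auto_increment_control ON, a 3-node cluster gets auto_increment_increment=3 and per-node offsets 1..3, so each node draws IDs from a disjoint sequence. The arithmetic in shell:

```shell
# with increment=3, node offsets 1..3 each yield a disjoint ID sequence
for offset in 1 2 3; do
  ids=""
  for n in 0 1 2 3; do
    ids="$ids $((offset + n * 3))"
  done
  echo "node offset $offset:$ids"
done
```

Collisions like the one reported here can only happen if, around a cluster size change, two nodes temporarily hand out IDs from overlapping sequences.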

Doug Barth (dougbarth) wrote :

Hi Jay,

Yep, all our tables have the following definition for primary keys.

  `id` int(11) NOT NULL AUTO_INCREMENT

wsrep_auto_increment_control is left at the default (ON)

mysql> show global variables like 'wsrep_auto_increment_control';
+------------------------------+-------+
| Variable_name | Value |
+------------------------------+-------+
| wsrep_auto_increment_control | ON |
+------------------------------+-------+

I neglected to mention that we currently send all transactions to a single node in the cluster, so we can rule out writeset conflicts.

I posted about this issue the first time it happened on the Codership mailing list: https://groups.google.com/forum/#!topic/codership-team/lDRB9Z6swV8

The suggestion from then was that if we're sending transactions to a single server, we could disable auto increment control, but we would prefer to keep it on for a few reasons. First, we sometimes go multi-master to slowly move application traffic between cluster nodes. Additionally, we like the operational experience of having the cluster manage the increment and offset values when we add/remove nodes.

One other possibly interesting aspect of our cluster setup: we have wsrep_causal_reads enabled globally in our my.cnf file. I don't see how that would affect this case, but I know it's something that makes us unique among other clusters that might exist.

Can you please share the my.cnf and table definition? I'm going to try to reproduce the problem and get a test case.

Doug Barth (dougbarth) wrote :

Hi Miguel,

I have attached a slightly redacted version of our my.cnf file. Also, the primary key issue affects a large portion of our tables when it happens, so I have attached the structure of one of the affected tables, but it isn't the only one.

CREATE TABLE `incidents_acknowledgements` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `incident_id` int(11) NOT NULL,
  `user_id` int(11) DEFAULT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `index_incidents_acknowledgements_on_user_id` (`user_id`),
  KEY `index_incidents_acknowledgements_on_incident_id_and_created_at` (`incident_id`,`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC

Hi.

Thanks for all the info you provided, I've been able to reproduce it with the following test case:

1- Prepare a .sql file full of INSERT queries:

INSERT INTO test.t VALUES(NULL,"miguel");

repeating that line a million times.

2- Start a 3-node cluster.

3- Load the data on node 1 (if you do it several times in parallel you will get the error sooner):

cat data.sql | mysql

4- Restart node2 and wait until MySQL is running.

5- Restart node3 and wait until MySQL is running.

Repeat steps 4 and 5 several times (sometimes many restarts are needed, sometimes just a few) and the data load on node 1 will fail with a duplicate key entry.

I used your my.cnf.

Also confirmed with 5.6.

Seppo Jaakola (seppo-jaakola) wrote :

This bug is being worked on in upstream bug tracker: https://github.com/codership/mysql-wsrep/issues/18

https://github.com/codership/mysql-wsrep/issues/18#issuecomment-63890539 has the workaround for this issue in terms of wsrep_retry_autocommit and wsrep_drupal_282555_workaround

Doug Barth (dougbarth) wrote :

Hey all,

I'm glad to see that this bug was addressed in the latest 5.6 release. Do you know if the fix will be backported to 5.5? I see this bug is filed under the future-5.5 milestone, but I don't know if that means it will necessarily be addressed.

FWIW, we would very much appreciate it being backported as I don't foresee us upgrading to 5.6 in the near future, and we're hitting this bug again when removing nodes from a recently scaled cluster.

Yaguang Tang (heut2008) on 2014-12-02
Changed in percona-server-5.5 (Ubuntu):
assignee: nobody → Yaguang Tang (heut2008)
no longer affects: percona-server-5.5 (Ubuntu)
tags: added: cts
Yaguang Tang (heut2008) on 2014-12-05
description: updated
Yaguang Tang (heut2008) wrote :
Changed in percona-cluster (Juju Charms Collection):
assignee: nobody → Edward Hope-Morley (hopem)
status: New → In Progress
Mario Splivalo (mariosplivalo) wrote :

I was able to reproduce this bug quite easily. One has to deploy percona-xtradb-cluster with 3 nodes (one can use juju with the local provider (https://juju.ubuntu.com/docs/config-LXC.html) to speed up deployment; once set up, just do: juju deploy percona-cluster -n 3). Then set up this database schema:

mysql> CREATE DATABASE pktest;
mysql> CREATE TABLE t1 (id INTEGER PRIMARY KEY AUTO_INCREMENT, datafield VARCHAR(20));

Adjust grants for pktest database to reflect your environment and then connect to one of the nodes, firing up 20 or more simultaneous load of INSERT statements:

$> for i in {1..20}; do (yes "INSERT INTO t1 (datafield) VALUES ('mario');" | mysql -h 10.0.0.10 pktest) & done

As soon as you stop any of the nodes, a primary key violation will occur.

I confirmed this to fail on percona-xtradb-5.5, and I have confirmed it is fixed in percona-xtradb-5.6.

I also confirmed that the workaround mentioned earlier (https://github.com/codership/mysql-wsrep/issues/18#issuecomment-63890539) fixes the issue in percona-xtradb-5.5.

Doug Barth (dougbarth) wrote :

Hi Mario,

I'm fairly certain that issuing FLUSH TABLES on all nodes in the cluster is also a workaround to fix these errors after they have started. It won't prevent the problem from occurring, but it might be useful if you need to keep wsrep_auto_increment_control ON and you don't use autocommit.

My theory (somewhat validated by the 5.6 patch) is that the problem corrects itself with the next statement using a cached table. That could take a while if you have a large table cache setting. FLUSH TABLES will therefore clear that cache and speed the process along.
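That FLUSH TABLES workaround can be scripted per node; a sketch with hypothetical node addresses, written as a dry run that only prints the commands it would issue:

```shell
# dry run: print the FLUSH TABLES command for each (hypothetical) cluster node
for host in 10.0.0.10 10.0.0.11 10.0.0.12; do
  printf 'mysql -h %s -e "FLUSH TABLES;"\n' "$host"
done
```

Drop the printf wrapper and run the mysql commands directly against real node addresses to apply it.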

Mario Splivalo (mariosplivalo) wrote :

Unfortunately we're not always in control of the client apps connecting to the Percona cluster, so we can't implement the workaround in the client code.

The wsrep_drupal_282555_workaround + wsrep_retry_autocommit options seem to be working for the time being, so until a fixed 5.5 is released we can use the workaround.
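For reference, that workaround amounts to two settings in my.cnf (the wsrep_retry_autocommit value here is illustrative; see the linked upstream comment for the recommended values):

```ini
[mysqld]
# retry autocommit statements that fail due to a cluster conflict
wsrep_retry_autocommit = 4
# tolerate duplicate-key errors on auto-increment inserts (named after Drupal bug #282555)
wsrep_drupal_282555_workaround = ON
```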

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

affects: percona-cluster (Ubuntu) → percona-xtradb-cluster-5.5 (Ubuntu)
Changed in percona-xtradb-cluster-5.5 (Ubuntu):
status: New → Confirmed
Changed in percona-xtradb-cluster-5.5 (Ubuntu):
status: New → Confirmed
Mario Splivalo (mariosplivalo) wrote :

I have created a debdiff for this issue.
I've also uploaded the patched debian packages to ppa:mariosplivalo/pxc, and verified that the issue is fixed:

1. create percona-cluster using juju (juju deploy percona-cluster -n3)
2. run the test against the installed cluster - the test will fail
3. add ppa:mariosplivalo/pxc to all three units, upgrade mysql (one by one)
4. run the test again, it will pass.

Changed in percona-cluster (Juju Charms Collection):
assignee: Edward Hope-Morley (hopem) → Mario Splivalo (mariosplivalo)
milestone: none → 15.04
status: In Progress → Fix Released
Changed in percona-xtradb-cluster-5.5 (Ubuntu):
assignee: nobody → Mario Splivalo (mariosplivalo)
Changed in percona-xtradb-cluster-5.5 (Ubuntu Trusty):
assignee: nobody → Mario Splivalo (mariosplivalo)
status: New → Confirmed
Changed in percona-xtradb-cluster-5.5 (Ubuntu):
status: Confirmed → In Progress
Changed in percona-xtradb-cluster-5.5 (Ubuntu Trusty):
status: Confirmed → In Progress

Hi Mario,
this seems to have been punted for way too long.
But just recently you updated the status, which is a sign of life.

What about the Trusty portion of this - will you rebase and attach a new one here?

tags: added: sts sts-sru-needed
Mario Splivalo (mariosplivalo) wrote :

Hi, Cristian!

Yes, I will re-test my patch and re-submit it shortly.

Eric Desrochers (slashd) on 2017-06-13
Changed in percona-xtradb-cluster-5.5 (Ubuntu Trusty):
importance: Undecided → Medium
Mario Splivalo (mariosplivalo) wrote :

This fixes the duplicate primary key bug, as described.

Eric Desrochers (slashd) wrote :

[Note to SRU verification team]

For this Trusty SRU, we need to jump from "5.5.37-25.10+dfsg-0ubuntu0.14.04.2" to "5.5.37-25.10+dfsg-0ubuntu0.14.04.4".

"5.5.37-25.10+dfsg-0ubuntu0.14.04.3" has been burned by a patch that has been dropped in another bug :
https://bugs.launchpad.net/ubuntu/+source/percona-xtradb-cluster-5.5/+bug/1657256/comments/31

Eric Desrochers (slashd) wrote :

Uploaded to Trusty upload queue.

 - Eric

description: updated
tags: added: sts-sponsor-done

Hello Doug, or anyone else affected,

Accepted percona-xtradb-cluster-5.5 into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/percona-xtradb-cluster-5.5/5.5.37-25.10+dfsg-0ubuntu0.14.04.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in percona-xtradb-cluster-5.5 (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Eric Desrochers (slashd) on 2017-06-19
description: updated
Mario Splivalo (mariosplivalo) wrote :

Hello, Łukasz!

I have verified that package which is in trusty-proposed fixes the issue.

I merely repeated the test case explained here: https://bugs.launchpad.net/ubuntu/+source/percona-xtradb-cluster-5.5/+bug/1366997/comments/12, but instead of using my personal PPA I used the trusty-proposed pocket:

1. I used juju to install a 3 node PXC (Percona Xtradb Cluster): juju deploy cs:trusty/percona-cluster -n3

2. I created the test database and the test table in it:
mysql> CREATE DATABASE pktest;
mysql> CREATE TABLE pktest.t1 (id INTEGER PRIMARY KEY AUTO_INCREMENT, datafield VARCHAR(20));

3. I created a 'ubuntu' user which is able to connect to all the databases from any of the hosts:
mysql> GRANT ALL PRIVILEGES ON *.* TO ubuntu@'%';

4. From a separate node I ran this:
$> for i in {1..20}; do (yes "INSERT INTO t1 (datafield) VALUES ('mario');" | mysql -u ubuntu -h 10.0.0.10 pktest) & done

While the INSERTs were running I restarted one of the nodes - the primary key violation occurred instantly.

5. I upgraded my PXC cluster, running this on every node:
# echo 'deb http://archive.ubuntu.com/ubuntu/ trusty-proposed restricted main multiverse universe' >> /etc/apt/sources.list
# apt-get update
# apt-get dist-upgrade

6. After the cluster was completely upgraded and I had confirmed it was running fine, I repeated step (4):
$> for i in {1..20}; do (yes "INSERT INTO t1 (datafield) VALUES ('mario');" | mysql -u ubuntu -h 10.0.0.10 pktest) & done

While INSERTs were running I restarted one of the nodes. This time no error occurred and INSERTs kept on going.

From that I concluded that the package in trusty-proposed fixes the issue.

tags: added: verification-done
removed: verification-needed
tags: added: verification-done-trusty
removed: verification-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package percona-xtradb-cluster-5.5 - 5.5.37-25.10+dfsg-0ubuntu0.14.04.4

---------------
percona-xtradb-cluster-5.5 (5.5.37-25.10+dfsg-0ubuntu0.14.04.4) trusty; urgency=medium

  * d/p/fix_duplicate_primary_key.patch: Fix 'duplicate entry for primary key
    on cluster change' (LP: #1366997)

 -- Mario Splivalo <email address hidden> Mon, 19 Jun 2017 12:03:27 +0200

Changed in percona-xtradb-cluster-5.5 (Ubuntu Trusty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for percona-xtradb-cluster-5.5 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

The attachment "fix-duplicate-primary-key.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Eric Desrochers (slashd) on 2017-06-29
tags: added: sts-sru-done
removed: sts-sru-needed
Changed in percona-xtradb-cluster-5.5 (Ubuntu):
status: In Progress → Fix Released

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1731
