Duplicate entry error for primary key following cluster size change
| Affects | Status | Importance | Assigned to |
|---|---|---|---|
| MySQL patches by Codership | Status tracked in 5.6 | | |
| 5.5 | | Undecided | Unassigned |
| 5.6 | | Undecided | Unassigned |
| Percona XtraDB Cluster | Status tracked in 5.6 | | |
| 5.5 | | Undecided | Alexey Kopytov |
| 5.6 | | Undecided | Unassigned |
| percona-cluster (Juju Charms Collection) | | Undecided | Mario Splivalo |
| percona-xtradb-cluster-5.5 (Ubuntu) | | Undecided | Mario Splivalo |
| Trusty | | Medium | Mario Splivalo |
Bug Description
[Impact]
* Duplicate entry errors occur on the primary key of tables; data cannot be added to the database correctly and in time.
[Test Case]
* Prepare a .sql file full of INSERT queries (see the sketch after this list):
INSERT INTO test.t VALUES(
repeated a million times.
* Start a 3-node cluster.
* Load the data on node 1 (running several loads in parallel triggers the error sooner): cat data.sql | mysql
* Restart node2 and wait until MySQL is running.
* Restart node3 and wait until MySQL is running.
* Repeat steps 4 and 5 several times (sometimes a lot of restarts are needed, sometimes just a few) and the load on node1 will fail with a duplicate key entry.
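A minimal shell sketch of this test case. The table definition is an assumption (the INSERT line above is truncated in this report), so the CREATE TABLE statement and column names are illustrative only:
# Assumed, illustrative table definition; NULL lets auto-increment assign the ids.
mysql -e "CREATE TABLE IF NOT EXISTS test.t (id INTEGER PRIMARY KEY AUTO_INCREMENT, v VARCHAR(10));"
# Generate one million INSERT statements (same yes(1) idiom used later in this report).
yes "INSERT INTO test.t VALUES(NULL, 'x');" | head -n 1000000 > data.sql
# Load on node 1; several parallel loads hit the error sooner.
for j in 1 2 3; do (mysql < data.sql) & done
wait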
[Regression Potential]
* The patch to fix this issue is backported from the nearest version, 5.6.
* There is a failed build for the powerpc architecture, but this was not introduced by this change; one can check that previous releases behaved the same:
https:/
If you expand the previous versions (14.04.1 and 14.04.2) you can see that the build was already failing for powerpc.
[Other Info]
* For the Ubuntu SRU verification team:
https:/
We (PagerDuty) have experienced, on multiple occasions, duplicate entry errors on the primary key for some tables after our cluster's size changes. In the cases we have seen so far, this happens when gracefully adding or removing a node to vertically scale the nodes in the cluster. We don't experience a total failure of all transactions; instead, a small subset error out. In some cases the problem corrects itself, but several times now the errors have persisted for several hours. When the problem persists, we have found that restarting one of the cluster members will sometimes fix the issue.
Restarting all transactions by rebooting the application does not fix the issue. Only a restart of a cluster member can eventually solve the problem.
Here are the packages we are running:
ii percona-toolkit 2.2.7 Advanced MySQL and system command-line tools
ii percona-xtrabackup 2.1.9-744-1.lucid Open source backup tool for InnoDB and XtraDB
ii percona-
ii percona-
ii percona-
ii percona-
Here is an example error that we see.
Duplicate entry '623287' for key 'PRIMARY'
That error will be seen for multiple tables, and sometimes again for the same table.
Please let me know what additional information can help. The MySQL error log doesn't appear to have any interesting details in it (just node join and leave events) but I would be glad to pass it along if you think it would be helpful.
Related branches
- Raghavendra D Prabhu: Pending requested 2015-01-15
  Diff: 65 lines (+24/-3), 2 files modified:
  sql/wsrep_hton.cc (+1/-1)
  storage/innobase/handler/ha_innodb.cc (+23/-2)
- Billy Olsen: Approve on 2015-02-11
- Edward Hope-Morley: Pending requested 2015-02-11
- charmers: Pending requested 2015-02-11
  Diff: 87 lines (+27/-14), 3 files modified:
  config.yaml (+12/-8)
  hooks/percona_hooks.py (+4/-1)
  templates/my.cnf (+11/-5)
- Edward Hope-Morley: Pending requested 2015-02-10
- charmers: Pending requested 2015-02-10
  Diff: 87 lines (+27/-14), 3 files modified:
  config.yaml (+12/-8)
  hooks/percona_hooks.py (+4/-1)
  templates/my.cnf (+11/-5)
| Jay Janssen (jay-janssen) wrote : | #1 |
Hey Doug, can you confirm the PRIMARY KEY in this case is an auto-increment? If so, PXC modifies the auto_increment_increment and auto_increment_offset settings depending on the cluster size to avoid such collisions as these (assuming you have wsrep_auto_increment_control = ON, which is the default: http://www.percona.com/doc/percona-xtradb-cluster/5.5/wsrep-system-index.html#wsrep_auto_increment_control). I'm thinking this is a bug related to that.
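To illustrate the mechanism Jay describes (a sketch, not output from the affected cluster; the per-node values assume a 3-node cluster with wsrep_auto_increment_control = ON):
# In a 3-node cluster, wsrep_auto_increment_control assigns each node the
# equivalent of the following, so concurrently generated ids interleave:
#   node1: auto_increment_increment=3, auto_increment_offset=1 -> ids 1, 4, 7, ...
#   node2: auto_increment_increment=3, auto_increment_offset=2 -> ids 2, 5, 8, ...
#   node3: auto_increment_increment=3, auto_increment_offset=3 -> ids 3, 6, 9, ...
# Inspect the values currently in effect on a node:
mysql -e "SHOW GLOBAL VARIABLES LIKE 'auto_increment%';"
When the cluster size changes these values are recalculated, which is presumably where a stale value can collide with an already-assigned id.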
| Doug Barth (dougbarth) wrote : | #2 |
Hi Jay,
Yep, all our tables have the following definition for primary keys.
`id` int(11) NOT NULL AUTO_INCREMENT
wsrep_auto_increment_control is ON:
mysql> show global variables like 'wsrep_auto_increment_control';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
I neglected to mention that we currently send all transactions to a single node in the cluster, so we can rule out writeset conflicts.
I posted about this issue the first time it happened on the Codership mailing list: https:/
The suggestion from then was that if we're sending transactions to a single server, we could disable auto increment control, but we would prefer to keep it on for a few reasons. First, we sometimes go multi-master to slowly move application traffic between cluster nodes. Additionally, we like the operational experience of having the cluster manage the increment and offset values when we add/remove nodes.
One other possibly interesting aspect of our cluster's setup: we have wsrep_causal_reads enabled globally in our my.cnf file. I don't see how that would affect this case, but I know it's something that makes us unique from other clusters that might exist.
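For reference, a hypothetical excerpt of the my.cnf settings described above (the option names are real wsrep options; the excerpt is illustrative, not taken from the attached file):
[mysqld]
# Cluster manages auto_increment_increment/offset per node (the default)
wsrep_auto_increment_control = ON
# Enabled globally in this deployment, per the comment above
wsrep_causal_reads = ON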
| Miguel wrote : | #3 |
Can you please share the my.cnf and table definition? I'm going to try to reproduce the problem and get a test case.
| Doug Barth (dougbarth) wrote : | #4 |
Hi Miguel,
I have attached a slightly redacted version of our my.cnf file. Also, the primary key issue affects a large portion of our tables when it happens, so I have attached the structure of one of the affected tables, but it isn't the only one.
CREATE TABLE `incidents_
`id` int(11) NOT NULL AUTO_INCREMENT,
`incident_id` int(11) NOT NULL,
`user_id` int(11) DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `index_
KEY `index_
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=
Hi.
Thanks for all the info you provided. I've been able to reproduce it with the following test case:
1- Prepare a .sql file full of INSERT queries:
INSERT INTO test.t VALUES(
that line million of times.
2- Start a 3 node clusters.
3- Load the data on node 1 (if you do several times in parallel you will get the error sooner):
cat data.sql | mysql
4- Restart node2 and wait until MySQL is running.
5- Restart node3 and wait until MySQL is running.
Repeat step 4 and 5 several times (sometimes lot of restarts are needed, sometimes just a few) and the node1 load will fail with duplicate key entry.
I used your my.cnf.
Also confirmed with 5.6.
| Seppo Jaakola (seppo-jaakola) wrote : | #7 |
This bug is being worked on in the upstream bug tracker: https:/
https:/
| Doug Barth (dougbarth) wrote : | #9 |
Hey all,
I'm glad to see that this bug was addressed in the latest 5.6 release. Do you know if the fix will be backported to 5.5? I see this bug is filed under the future-5.5 milestone, but I don't know if that means it will necessarily be addressed.
FWIW, we would very much appreciate it being backported as I don't foresee us upgrading to 5.6 in the near future, and we're hitting this bug again when removing nodes from a recently scaled cluster.
| Changed in percona-server-5.5 (Ubuntu): | |
| assignee: | nobody → Yaguang Tang (heut2008) |
| no longer affects: | percona-server-5.5 (Ubuntu) |
| tags: | added: cts |
| description: | updated |
| Yaguang Tang (heut2008) wrote : | #10 |
The two upstream commits that fix this issue:
https:/
https:/
| Yaguang Tang (heut2008) wrote : | #11 |
| Changed in percona-cluster (Juju Charms Collection): | |
| assignee: | nobody → Edward Hope-Morley (hopem) |
| status: | New → In Progress |
| Mario Splivalo (mariosplivalo) wrote : | #12 |
I was able to reproduce this bug quite easily. One has to deploy percona-
mysql> CREATE DATABASE pktest;
mysql> CREATE TABLE t1 (id INTEGER PRIMARY KEY AUTO_INCREMENT, datafield VARCHAR(20));
Adjust the grants for the pktest database to reflect your environment and then connect to one of the nodes, firing up 20 or more simultaneous loads of INSERT statements:
$> for i in {1..20}; do (yes "INSERT INTO t1 (datafield) VALUES ('mario');" | mysql -h 10.0.0.10 pktest) & done
As soon as you stop any of the nodes, a primary key violation will occur.
I confirmed this to fail on percona-xtradb-5.5, and I have confirmed it is fixed in percona-xtradb-5.6.
I also confirmed that the workaround mentioned earlier (https:/
| Doug Barth (dougbarth) wrote : | #13 |
Hi Mario,
I'm fairly certain that issuing FLUSH TABLES on all nodes in the cluster is also a workaround to fix these errors after they have started. It won't prevent the problem from occurring, but it might be useful if you need to keep wsrep_auto_increment_control enabled.
My theory (somewhat validated by the 5.6 patch) is that the problem corrects itself with the next statement using a cached table. That could take a while if you have a large table cache setting. FLUSH TABLES will therefore clear that cache and speed the process along.
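A sketch of that workaround, run from a host that can reach every cluster node (the addresses are placeholders; only 10.0.0.10 appears elsewhere in this report):
# Issue FLUSH TABLES on every cluster node to drop the cached table state.
for node in 10.0.0.10 10.0.0.11 10.0.0.12; do
  mysql -h "$node" -e "FLUSH TABLES;"
done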
| Mario Splivalo (mariosplivalo) wrote : | #14 |
Unfortunately we're not always in control of the client apps connecting to the Percona cluster, so we can't implement a workaround in the client code.
The wsrep_drupal_
| Launchpad Janitor (janitor) wrote : | #15 |
Status changed to 'Confirmed' because the bug affects multiple users.
| Changed in percona-xtradb-cluster-5.5 (Ubuntu): | |
| status: | New → Confirmed |
| affects: | percona-cluster (Ubuntu) → percona-xtradb-cluster-5.5 (Ubuntu) |
| Changed in percona-xtradb-cluster-5.5 (Ubuntu): | |
| status: | New → Confirmed |
| Mario Splivalo (mariosplivalo) wrote : | #16 |
I have created a debdiff for this issue.
I've also uploaded the patched Debian packages to ppa:mariospliva
1. create a percona-cluster using juju (juju deploy percona-cluster -n3)
2. run the test against the installed cluster - the test will fail
3. add ppa:mariospliva
4. run the test again - it will pass.
| Changed in percona-cluster (Juju Charms Collection): | |
| assignee: | Edward Hope-Morley (hopem) → Mario Splivalo (mariosplivalo) |
| milestone: | none → 15.04 |
| status: | In Progress → Fix Released |
| Changed in percona-xtradb-cluster-5.5 (Ubuntu): | |
| assignee: | nobody → Mario Splivalo (mariosplivalo) |
| Changed in percona-xtradb-cluster-5.5 (Ubuntu Trusty): | |
| assignee: | nobody → Mario Splivalo (mariosplivalo) |
| status: | New → Confirmed |
| Changed in percona-xtradb-cluster-5.5 (Ubuntu): | |
| status: | Confirmed → In Progress |
| Changed in percona-xtradb-cluster-5.5 (Ubuntu Trusty): | |
| status: | Confirmed → In Progress |
| ChristianEhrhardt (paelzer) wrote : | #17 |
Hi Mario,
this seems to have been punted for way too long.
But you recently updated the status, which is a sign of life.
What about the Trusty portion of this - will you rebase and attach a new one here?
| tags: | added: sts sts-sru-needed |
| Mario Splivalo (mariosplivalo) wrote : | #18 |
Hi, Christian!
Yes, I will re-test my patch and re-submit it shortly.
| Changed in percona-xtradb-cluster-5.5 (Ubuntu Trusty): | |
| importance: | Undecided → Medium |
| Mario Splivalo (mariosplivalo) wrote : | #19 |
This fixes the duplicate primary key bug, as described.
| Mario Splivalo (mariosplivalo) wrote : | #20 |
| Eric Desrochers (slashd) wrote : | #21 |
[Note to SRU verification team]
For this Trusty SRU, we need to jump from "5.5.37-
to "5.5.37-
https:/
| Eric Desrochers (slashd) wrote : | #22 |
Uploaded to Trusty upload queue.
- Eric
| description: | updated |
| tags: | added: sts-sponsor-done |
| Łukasz Zemczak (sil2100) wrote : | #23 |
Hello Doug, or anyone else affected,
Accepted percona-
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed.
Further information regarding the verification process can be found at https:/
| Changed in percona-xtradb-cluster-5.5 (Ubuntu Trusty): | |
| status: | In Progress → Fix Committed |
| tags: | added: verification-needed |
| description: | updated |
| Mario Splivalo (mariosplivalo) wrote : | #24 |
Hello, Łukasz!
I have verified that package which is in trusty-proposed fixes the issue.
I merely repeated the test case explained here: https:/
1. I used juju to install a 3-node PXC (Percona XtraDB Cluster): juju deploy cs:trusty/
2. I created the test database and the test table in it:
mysql> CREATE DATABASE pktest;
mysql> CREATE TABLE pktest.t1 (id INTEGER PRIMARY KEY AUTO_INCREMENT, datafield VARCHAR(20));
3. I created a 'ubuntu' user which is able to connect to all the databases from any of the hosts:
mysql> GRANT ALL PRIVILEGES ON *.* TO ubuntu@'%';
4. From a separate node I ran this:
$> for i in {1..20}; do (yes "INSERT INTO t1 (datafield) VALUES ('mario');" | mysql -u ubuntu -h 10.0.0.10 pktest) & done
While the INSERTs were running I restarted one of the nodes - the primary key violation occurred instantly.
5. I upgraded my PXC cluster, running this on every node:
# echo 'deb http://
# apt-get update
# apt-get dist-upgrade
6. After the cluster was completely upgraded and I had confirmed it was running fine, I repeated step (4):
$> for i in {1..20}; do (yes "INSERT INTO t1 (datafield) VALUES ('mario');" | mysql -u ubuntu -h 10.0.0.10 pktest) & done
While INSERTs were running I restarted one of the nodes. This time no error occurred and INSERTs kept on going.
From that I concluded that the package in trusty-proposed fixes the issue.
| tags: |
added: verification-done removed: verification-needed |
| tags: |
added: verification-done-trusty removed: verification-done |
| Launchpad Janitor (janitor) wrote : | #25 |
This bug was fixed in the package percona-
---------------
percona-
* d/p/fix_
on cluster change' (LP: #1366997)
-- Mario Splivalo <email address hidden> Mon, 19 Jun 2017 12:03:27 +0200
| Changed in percona-xtradb-cluster-5.5 (Ubuntu Trusty): | |
| status: | Fix Committed → Fix Released |
The verification of the Stable Release Update for percona-xtradb-cluster-5.5 has completed successfully and the package has now been released to -updates.
The attachment "fix-duplicate-
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]
| tags: | added: patch |
| tags: |
added: sts-sru-done removed: sts-sru-needed |
| Changed in percona-xtradb-cluster-5.5 (Ubuntu): | |
| status: | In Progress → Fix Released |