possible sync issue using commit_id and backup to bring up a slave

Bug #766296 reported by Joe Daly on 2011-04-19
This bug affects 1 person

Bug Description

came from discussion with knielsen

This gets back to the prepare_commit_mutex, which does not exist in the InnoDB replication log. The basics of the bug: depending on where the commit_id is assigned, it is possible for two unrelated transactions T1 and T2 to be written to disk in a different order (T2 then T1). If you were to take a backup at precisely the moment T2 had been written to disk but T1 had not, and provision a slave with that backup, the slave applier would assume replication should start at (T2 + 1), leaving T1 un-applied on the slave.
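The race above can be sketched in a few lines. This is purely illustrative; the names (`txn_log`, `snapshot`, the commit_id values 101/102) are hypothetical and not Drizzle internals:

```python
# Hypothetical sketch of the ordering race: commit_ids are assigned in
# commit order at a higher level...
assigned = [("T1", 101), ("T2", 102)]

# ...but thread scheduling happens to write T2 to the transaction log first.
txn_log = [("T2", 102)]          # backup snapshot taken at this exact moment
snapshot = list(txn_log)
txn_log.append(("T1", 101))      # T1 reaches disk only after the snapshot

# A slave provisioned from the snapshot starts at max(commit_id) + 1,
# so it never applies T1 (commit_id 101).
start_from = max(cid for _, cid in snapshot) + 1
missing = [name for name, cid in assigned
           if cid < start_from and (name, cid) not in snapshot]
print(start_from)  # 103
print(missing)     # ['T1']
```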

Here's some IRC discussion in case my description is confusing:

Apr 19 09:51:02 <knielsen> Shrews: yes, I noticed that in the code. What I didn't understand was if you ensure that this number is allocated in the same sequence as commits are written into the innodb transaction log
Apr 19 09:51:43 <knielsen> Shrews: from what you say, my guess is that you do not ensure this. Which can be the right solution depending on what you want, just trying to figure out how it works
Apr 19 09:52:13 <knielsen> there are a number of advantages to relaxed ordering (between innodb transaction log and commit_id number)
Apr 19 09:52:16 <jdaly> knielsen: it would be possible that two unrelated commits could not be ordered, but related would be held up at a higher level before assignment of the commit_id
Apr 19 09:53:07 <knielsen> there are some disadvantates as well, of course :-) main ones I can think of is applications seeing state on slave that never existed on master, and xtrabackup taking a server snapshot that does not correspond to any commit_id number
Apr 19 09:53:23 <knielsen> jdaly: yes, I agree
Apr 19 09:53:58 <jdaly> why would xtrabackup take a backup that doesnt correspond to a commit_id number. Im naive in that area
Apr 19 09:54:00 <knielsen> jdaly: if two commits depend on one another, then the second cannot start commit until the first is done and releases row locks
Apr 19 09:54:19 <knielsen> jdaly: the issue is the following
Apr 19 09:55:07 <knielsen> jdaly: xtrabackup basically copies the innodb transaction log up to a certain point X, and the resulting backup gives a snapshot of the server at that point
Apr 19 09:55:41 <knielsen> jdaly: now suppose that we commit independent transactions T1 and T2. T1 is assigned commit_id 101 and T2 commit_id 102
Apr 19 09:56:03 <knielsen> jdaly: then thread scheduling just happens to work so that T2 is written into the innodb transaction log before T1.
Apr 19 09:56:26 <knielsen> jdaly: now if we take an xtrabackup at that exact moment, we may get a snapshot that has T2, but not T1
Apr 19 09:56:34 <knielsen> ... which is fine, as T1 and T2 are independent
Apr 19 09:56:51 <knielsen> jdaly: but now suppose we use this backup to provision a new slave
Apr 19 09:57:23 <jdaly> ok Im seeing what your talking about
Apr 19 09:57:35 <knielsen> jdaly: then we have the problem that we don't know which commit_id to start replication from? If we take 101, then we will duplicate T2. If we take 102 then we will be missing T1
Apr 19 09:58:01 <knielsen> jdaly: this problem is the sole reason (AFAIK) that innodb did the prepare_commit_mutex in MySQL, which killed group commit for >5 years :-(
Apr 19 09:59:44 <knielsen> jdaly: on the other hand, it is nice not to take an expensive lock and impose ordering for _every_ commit just for an issue that only occurs for one millisecond of the daily backup
Apr 19 10:00:01 <knielsen> (eg. one could maybe impose ordering only at that split second during backup)
Apr 19 10:02:34 <jdaly> knielsen: thanks for the details, Ill look at what would be involved in forcing order when doing a backup. The max commit_id is stored in the innospace as well, it may be possible to use that somewhere although off the top Im not sure how
Apr 19 10:03:21 <LinuxJedi> surely if you are taking an xtrabackup of the slave then it would require a little logic but we could do a comparison of the replication log between the master and slave for missing transactions (or am I missing something?)
Apr 19 10:03:30 <LinuxJedi> since the replication log will be part of the backup
Apr 19 10:05:47 <jdaly> LinuxJedi: you also could look for a missing commit_id in the replication log
Apr 19 10:05:57 <knielsen> yes, that would be one interesting way of doing it
Apr 19 10:06:06 <LinuxJedi> jdaly: yep :)
Apr 19 10:06:22 <knielsen> you just need some kind of checkpointing so you know how far back you have to look for missing commits
Apr 19 10:07:30 * LinuxJedi thinks a bug should probably be filed to document this
Apr 19 10:07:38 <knielsen> When I did the MariaDB group commit, I though a lot about how to do this issue. I ended up (in MariaDB) enforcing same commit order. But I just wondered what Drizzle was doing. I think it's also interesting to relax the order and then just deal with that on the slave or wherever
Apr 19 10:07:44 <LinuxJedi> (document this conversation so we know to fix it I mean)
Apr 19 10:07:59 <knielsen> so thanks for the info :-)
Apr 19 10:08:06 <jdaly> I can file a bug, it will be a couple hours
Apr 19 10:08:21 <LinuxJedi> knielsen: thanks for the feedback :)
Apr 19 10:08:35 * knielsen hopes Drizzle guys don't mind him asking the nasty questions that cause bugs to be filed, seems I've done that a couple of times now :)
Apr 19 10:08:48 <jdaly> knielsen: yes thanks much (again)
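
The gap-scan idea jdaly and knielsen discuss above (look for a missing commit_id in the backed-up replication log, bounded by a checkpoint) could be sketched as follows. The function name and the log/checkpoint representation are assumptions for illustration only:

```python
def find_missing(commit_ids, checkpoint):
    """Return commit_ids absent between the checkpoint (a point before
    which the log is known complete) and the highest id in the log."""
    present = set(commit_ids)
    tip = max(present)
    return [cid for cid in range(checkpoint, tip + 1) if cid not in present]

# The backup snapshot captured T2 (102) but not T1 (101); the checkpoint
# says everything below 100 is known complete.
print(find_missing([99, 100, 102], checkpoint=100))  # [101]
```

The checkpoint is exactly the piece knielsen notes is needed: without it, the scan would have to walk the whole log to prove no commit is missing.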

Joe Daly (skinny.moey) wrote:

Left unassigned deliberately, as this needs some bouncing around to come up with a workable solution; the amount of code needed for the fix will probably end up being small.

Changed in drizzle:
status: New → Confirmed
importance: Undecided → Medium
milestone: none → 2011-05-09
Changed in drizzle:
milestone: 2011-05-09 → 2011-05-23
Changed in drizzle:
milestone: 2011-05-23 → 2011-06-06