Bug #1595394 “2.4 supports TokuDB backup” : Series 2.4 : Bugs : Percona XtraBackup moved to https://jira.percona.com/projects/PXB

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2016-06-23:

#1

Hi!

The patch seems to turn off checkpointing for the entire time the TokuDB data is being copied. The manual recommends against turning off checkpoints for a long period of time. Also, does it prevent TokuDB from writing to logs and data files?

We need George Lorch to take a look.

bohutang (overred-shuttler) wrote on 2016-06-24:

#2

Hello Sergei,

We must hold the checkpoint lock for the whole time the TokuDB files are being copied; the purpose is to get a snapshot (in the last-checkpoint state) of all files (logs and data files).
The only effect of holding the checkpoint lock is that redo logs keep accumulating until the copy ends.
This method has been used in our production environment (mysql-5.6 + xtrabackup 2.2) for a long time, and it works very well.

bohutang (overred-shuttler) wrote on 2016-06-24:

#3

Hello,

I re-submitted this patch as a pull request on GitHub and updated this bug description.
https://github.com/percona/percona-xtrabackup/pull/218/commits/b43ba1b25deff2ba21eb9030b49413e1851f263b

description:

updated

George Ormond Lorch III (gl-az) wrote on 2016-06-27:

#4

The basic idea is not a bad one, but it has problems, as discussed in another email chain. Here are some of my current thoughts on implementing this idea. It will need buy-in from EMT due to the amount of time it might take to fully implement this and make it production ready up to current XtraBackup standards.

Disabling checkpointing makes this a not-so-hot backup solution and has several potential drawbacks:
1) If backing up a very large data set (we have seen InnoDB backups that run for days) and checkpointing is disabled that entire time, the first checkpoint after re-enabling will be extremely painful and have a definite impact on client performance.
2) If backing up a very large data set, the recovery log can grow to extreme size, possibly causing disk pressure. Since this implementation copies the recovery log under FTWRL after the file copy, it can cause severe server stall if it is large.
3) If a crash in the server were to happen with checkpoint disabled and a large recovery log, recovery can take an extreme amount of time.
4) If a crash in XtraBackup were to happen after checkpoints are disabled, they will remain disabled until someone notices and re-enables them.
5) PerconaFT _will_ continue to write to data files with checkpointing disabled as cache pressure evicts dirty nodes (existing node copies will not be overwritten); what will not happen, though, is the block header map rotation. For an excessively long backup, this can cause the TokuDB data files to grow to over 2x the actual needed disk size due to having two copies of the nodes within the data files.
6) XtraBackup does page-level validation (checksumming) during the backup; this patch does no such thing for TokuDB and possibly should.

Some of these can be made less painful:
1) Not much can be done to address this, checkpointing in general needs to be made fuzzy and not sharp.
2) Copy the recovery log in parallel with the backup similar to how InnoDB REDO log is copied and finalize the copy only under FTWRL.
3) Nothing can be done to address this.
4) Work on the server to re-enable checkpointing when a session disconnects if that session disabled checkpoints; perhaps use some backup-specific semantics for this instead of the current simple global option.
5) Nothing can be done to address this.
6) Can be implemented with some effort.

Things that also need consideration for a proper production ready implementation:
1) XtraBackup would need to go to some lengths to validate that TokuDB is installed and the plugin is running before enabling the TokuDB portion of the backup functionality.
2) Incrementals have to be handled, either by disallowing them if TokuDB is enabled and running and maybe adding a new option to bypass this (--incremental-innodb-only or something), or by giving some kind of strong warning, or by just skipping TokuDB. Whatever is chosen would need to be documented very well, and we know that users don't read documentation and might be surprised if they did a successful incremental backup of an InnoDB + TokuDB server but restored only to find that the TokuDB data was not part of the incremental. This is a can of worms.
3) Compression and encryption: does it make sense for a user to compress TokuDB if TokuDB tables are already compressed? This will just burn CPU and possibly result in a larger backup than the original data.
4) Restoration: since TokuDB would not support incrementals, --apply-log-only would have no effect, but --apply-log might still be useful to fully replay recovery logs in staging before dropping in to the live server.
5) --copy-back functionality needs extending to TokuDB, but it might already 'just work'.

There will definitely be other things that come up where some XB specific InnoDB feature is incompatible with TokuDB that will need addressing.
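
For illustration only, the incremental-backup choices in item 2 above can be written down as a small decision function. This is a sketch under stated assumptions, not XtraBackup code: `--incremental-innodb-only` is the hypothetical option floated in that item and does not exist in XtraBackup, and the function and enum names are made up for this example. It picks the "disallow unless bypassed" variant out of the alternatives listed.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    PROCEED_INNODB_ONLY = "proceed, TokuDB skipped"
    REFUSE = "refuse"


def incremental_policy(tokudb_active, incremental, innodb_only_flag):
    """Decide what to do with a backup request (hypothetical policy).

    tokudb_active    -- True if the TokuDB plugin is installed and running
    incremental      -- True if an incremental backup was requested
    innodb_only_flag -- True if the hypothetical --incremental-innodb-only
                        bypass option was passed
    """
    # Full backups, and servers without TokuDB, are unaffected.
    if not incremental or not tokudb_active:
        return Action.PROCEED
    # The user explicitly accepted that TokuDB is left out of the incremental.
    if innodb_only_flag:
        return Action.PROCEED_INNODB_ONLY
    # Otherwise fail loudly instead of silently producing a partial backup.
    return Action.REFUSE
```

Refusing by default means a user cannot end up with an incremental that silently lacks TokuDB data; the warning-only and skip-only variants from item 2 would replace the last `return`.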

bohutang (overred-shuttler) wrote on 2016-06-28:

#5

The less painful approach is very complex, and this feature is what we need (but done that way, it will take a long time and a lot of work). This may be an XtraBackup issue in general: someday we will want to support MyRocks in XtraBackup, and how would we do it?
On your drawbacks:
1) The first checkpoint after re-enabling is a normal checkpoint like the ones we do periodically. The difference is that it needs to remove more redo logs, and the CP(1) is very fast.
2) Under FTWRL (it's LOCK TABLES FOR BACKUP in PS 5.7), there are just a few TokuDB redo logs to copy (CP(1)); it would be better if we copied them in parallel.
3) Y
4) N, tokudb_checkpoint_lock is set to OFF by mysqld when a session closes.
5) Y, but the data file will be truncated when recovery is done and a checkpoint runs.
6) I can add checksums on top of my patch.

BTW, based on this patch, we have backed up many TokuDB instances whose datasets are about 1.5TB (raw size is 6TB).
After all, this is a NOT BAD one.

George Ormond Lorch III (gl-az) wrote on 2016-06-28:

#6

I don't fully understand your response, but the issues I raise are real and must be addressed before accepting this change into the product. We are willing to do some of the work on our side to make this production ready, but we will not introduce a half-implemented feature set into the product.

On point 2, I think you missed my meaning. The recovery log copy should begin at the start of the backup, after the checkpoint lock has been taken, and the file should be monitored for changes throughout the backup until the FTWRL or LTFB. This is how the InnoDB backup works, and it is necessary for InnoDB because the redo log is a fixed-size rolling log. The PerconaFT log is not a fixed-size rolling log; it will grow until it runs out of disk. Copying and monitoring in parallel shortens the time needed under the FTWRL/LTFB that finalizes the last position/entry within the file.
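
To make point 2 concrete, here is a minimal sketch of "copy and monitor in parallel, finalize under the lock". It is an illustration, not the patch or XtraBackup's log copier: it assumes a single append-only log file (real PerconaFT logs span multiple files), and `follow_copy`, `demo`, and the file names are hypothetical.

```python
import os
import tempfile
import threading
import time


def follow_copy(src_path, dst_path, stop_event, poll_seconds=0.01):
    """Copy src to dst, then keep appending new bytes until stop is requested.

    Once stop_event is set, the loop drains whatever is left and exits,
    which models the short finalize step under FTWRL/LTFB.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            stopping = stop_event.is_set()  # sample BEFORE reading
            chunk = src.read(64 * 1024)
            if chunk:
                dst.write(chunk)
                continue
            if stopping:
                break  # a read issued after the stop request found nothing new
            time.sleep(poll_seconds)


def demo():
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "recovery.log")       # hypothetical name
    dst = os.path.join(workdir, "recovery.log.copy")  # hypothetical name
    with open(src, "wb") as f:
        f.write(b"entry-1\n")

    stop = threading.Event()
    copier = threading.Thread(target=follow_copy, args=(src, dst, stop))
    copier.start()

    # The server keeps logging while the data files are being copied.
    with open(src, "ab") as f:
        for i in range(2, 6):
            f.write(b"entry-%d\n" % i)
            f.flush()

    stop.set()     # stands in for taking FTWRL/LTFB: no further writes arrive
    copier.join()  # only the not-yet-copied tail remains to be read here

    with open(src, "rb") as a, open(dst, "rb") as b:
        return a.read() == b.read()
```

The stop flag is sampled before each read so that any bytes written before the stop request are guaranteed to be picked up by a later read.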

On point 4, you are incorrect, when a session ends/terminates, the tokudb_checkpoint_lock will not be automatically released. It is a session variable that 'acts' like a global, meaning that one session can "set session tokudb_checkpoint_lock=true;" and it will globally disable checkpointing within PerconaFT, then another session can do a "set session tokudb_checkpoint_lock=false;" and globally re-enable checkpointing within PerconaFT. When a session closes, the session variables are simply thrown away, they are not 'unset'. So again, this is a bad way to handle disabling a critical function such as checkpointing and needs a better mechanism to ensure that in the event of an unexpected session termination, checkpointing gets re-enabled within the server.
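
A client-side guard for point 4 could look like the sketch below. It is only an illustration: the two SQL statements are the ones quoted above, while `checkpoint_lock`, `RecordingCursor`, and `run_failing_backup` are hypothetical names, and the cursor is assumed to be any DB-API style object with an `execute` method. A guard like this covers exceptions inside the backup tool, but not a killed process or a dropped connection, which is why a server-side mechanism is still needed.

```python
from contextlib import contextmanager


@contextmanager
def checkpoint_lock(cursor):
    """Disable PerconaFT checkpointing for the duration of the with-block."""
    cursor.execute("SET SESSION tokudb_checkpoint_lock=true")
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, but NOT if the process is
        # killed or the connection is lost; the server must handle that case.
        cursor.execute("SET SESSION tokudb_checkpoint_lock=false")


class RecordingCursor:
    """Stand-in for a real cursor so the sketch runs without a server."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def run_failing_backup(cursor):
    """Simulate a backup that aborts mid-copy; return the statements issued."""
    try:
        with checkpoint_lock(cursor):
            raise IOError("copy failed")
    except IOError:
        pass
    return cursor.statements
```

Even when the simulated copy fails, the statement that re-enables checkpointing is still issued on the way out.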

On point 5, PerconaFT only truncates a PerconaFT data file on open, so unless the file is closed (the owning table happens to get evicted from the MySQL open table cache) or the server is shut down, this space is still allocated to the file during the backup and normal operation thereafter. This means that in theory, in the worst possible case, you would need to have reserve file system capacity of 2x your data size plus whatever excess recovery log size. Due to the way the file map works, truncation may not even do anything if there is live data at the tail of the file (which is a distinct possibility because of the reason it was allocated in the first place: evicted dirty nodes that were not checkpointed), leaving all of the empty space within the file still allocated.

A backup operation is _not_ supposed to cause or alter server (mysqld) behavior or resource needs, which this approach does via file system space demands. Just because it works for you does not mean that it is an acceptable trade-off or solution for everyone; it needs to be properly vetted before the idea is accepted. Any potentially problematic areas need to be identified, fully tested, and documented so any user considering the product understands the tradeoffs.

As far as MyRocks, you are on your own there for the time being. It is not used in production yet by even the people that are developing it and still has many issues. It is not yet supported by Percona in any way.

bohutang (overred-shuttler) wrote on 2016-06-29:

#7

Hello George,

Thanks for your reply.

On point 2:
I agree with you

On point 4:
We should check whether the checkpoint_lock status is ON before we set it OFF.
There may be some misunderstanding here.

On point 5:
No.
The file may be truncated in block_table::note_end_checkpoint, which is called in the first checkpoint after a recovery.

I am looking forward to a final, supported solution in XtraBackup rather than this trick.

George Ormond Lorch III (gl-az) wrote on 2016-06-29:

#8

On point 4, we need to fix/implement a more robust mechanism to ensure that in the event of a backup/client termination, checkpointing gets re-enabled if the session has disabled it. It's not difficult, it just needs to be done. Backups fail/terminate/abort for many reasons and we need to ensure that we do not compromise the host system as a result of a failure.

On point 5, Ack, I missed this code path in my review, good catch. My point is still valid, though. Truncation can only truncate back to the end of the last used block. Because of the way the current block allocator works, the chances of there being blocks allocated at the end of the file are very high. We are already aware that the PerconaFT files do not defragment and shrink themselves correctly in practice. This is of course a different issue, but it is a reality that needs to be tested, documented, and minimized as much as possible. We have people using TokuDB specifically because of the compression. It then becomes a very difficult argument to make to say they may need up to 2x space if they want to be able to use this solution to perform backups.
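
The truncation argument can be shown with a toy model. This is a simplified sketch, not the PerconaFT block allocator: a file is just a size plus a list of live `(offset, length)` blocks, the blocks are assumed not to overlap, and both function names are hypothetical.

```python
def truncatable_bytes(file_size, live_blocks):
    """Bytes that truncation can reclaim: only the free tail after the
    end of the last live block."""
    last_used_end = max((off + length for off, length in live_blocks), default=0)
    return max(file_size - last_used_end, 0)


def free_bytes(file_size, live_blocks):
    """Total unused bytes in the file, wherever they sit."""
    return file_size - sum(length for _, length in live_blocks)


# Same file size and same amount of live data in both layouts.
packed = [(0, 100), (100, 100)]      # live blocks at the front of the file
tail_pinned = [(0, 100), (900, 100)]  # one live block sits at the tail

packed_result = (free_bytes(1000, packed), truncatable_bytes(1000, packed))
pinned_result = (free_bytes(1000, tail_pinned), truncatable_bytes(1000, tail_pinned))
```

Both layouts have 800 free bytes, but only the packed one can give them back: a single live block at the tail leaves nothing for truncation to reclaim.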

I am running some experiments on the side effects of keeping the checkpointer disabled for extended periods of time and will post back my results.

Sergei Glushchenko (sergei.glushchenko) on 2016-07-14

information type:

Private Security → Public

George Ormond Lorch III (gl-az) wrote on 2016-07-14:

#9

Sergei, I will be breaking this down into some specific blueprints since it has changes and needs spanning products.

Sveta Smirnova (svetasmirnova) wrote on 2017-08-29:

#10

Thank you for the reasonable feature request. I will mark it as "Confirmed".

tags:	added: i200485 i203146
Changed in percona-xtrabackup:
status:	New → Confirmed

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-20:

#11

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXB-1392

Affects		Status	Importance	Assigned to	Milestone
	Percona XtraBackup moved to https://jira.percona.com/projects/PXB	Status tracked in 2.4
	2.4	Confirmed	Undecided	Unassigned