changed_page_tracking bitmap files are very sparse.

Bug #1269547 reported by Tom Manville
This bug affects 2 people
Affects: Percona Server moved to https://jira.percona.com/projects/PS (status tracked in 5.7)

Affects  Status     Importance  Assigned to  Milestone
5.1      Won't Fix  Medium      Unassigned
5.5      Triaged    Medium      Unassigned
5.6      Triaged    Medium      Unassigned
5.7      Triaged    Medium      Unassigned

Bug Description

We've seen over 100X size reduction in the files by coalescing the entries in single files. This should probably be performed by Xtrabackup.

Revision history for this message
Andrew Gaul (gaul) wrote :

More specifically, we have seen a large number of bitmap files when the primary key of a database is random, e.g., a hash. Either coalescing these before or after writing to disk would avoid using a lot of space.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

I assume by coalescing entries you mean merging the bitmap files so that, on disk, instead of bitmap A between checkpoints 1 and 2 followed by bitmap B between 2 and 3 we have a joint bitmap A|B between checkpoints 1 and 3? Is this correct or do you have something else in mind?
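To make the A|B idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each bitmap is a byte string covering the same page range; the function name `merge_bitmaps` is hypothetical and not part of the server.

```python
def merge_bitmaps(a: bytes, b: bytes) -> bytes:
    """Return the bitwise OR of two bitmaps that cover the same page range."""
    if len(a) != len(b):
        raise ValueError("bitmaps must cover the same page range")
    return bytes(x | y for x, y in zip(a, b))


# Bitmap A: pages changed between checkpoints 1 and 2 (illustrative data).
bitmap_a = bytes([0b00000101, 0b00000000])
# Bitmap B: pages changed between checkpoints 2 and 3 (illustrative data).
bitmap_b = bytes([0b00000100, 0b10000000])

# Joint bitmap A|B: pages changed between checkpoints 1 and 3.
merged = merge_bitmaps(bitmap_a, bitmap_b)
```

A page that is dirtied in both intervals takes one bit in the merged bitmap instead of one bit in each, which is where the space saving would come from.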

If yes, then I am not sure how to best implement this. A server bitmap writer could add to and overwrite some last bitmap section instead of appending new pages to the end, but that raises the concern of bitmap data loss in case of a crash. What we could do is to have another bitmap page format for sparse bitmaps that is not of a constant 4KB size. These pages would still be appended instead of coalesced, and the writer would choose the format based on the estimated bitmap sparsity for that checkpoint.
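A rough Python sketch of how a writer might pick between the two formats follows. Everything about the layout is an assumption made for illustration: the dense form is treated as a plain bitmap, the sparse form as a list of 32-bit page offsets, bits are numbered least-significant first, and page headers and checksums are ignored.

```python
import struct

OFFSET_BYTES = 4  # assumption: each changed page is stored as one 32-bit offset


def changed_offsets(bitmap: bytes) -> list[int]:
    """List the positions of set bits (least-significant bit first, by assumption)."""
    return [
        index * 8 + bit
        for index, byte in enumerate(bitmap)
        for bit in range(8)
        if byte & (1 << bit)
    ]


def encode(bitmap: bytes) -> tuple[str, bytes]:
    """Pick whichever representation needs fewer bytes for this bitmap."""
    offsets = changed_offsets(bitmap)
    if len(offsets) * OFFSET_BYTES < len(bitmap):
        return "sparse", struct.pack(f"<{len(offsets)}I", *offsets)
    return "dense", bitmap
```

Under these assumptions the break-even point for a 4096-byte dense bitmap is 1024 changed pages; a real format would shift that cutoff by whatever per-page overhead it carries.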

Also, can you quantify "very sparse"? For example, what is the ratio between redo log writes and bitmap writes, i.e. LSN delta divided by bitmap file bytes for that LSN interval?

Revision history for this message
Alexey Kopytov (akopytov) wrote :

Should certainly be performed by the server rather than XtraBackup.

affects: percona-xtrabackup → percona-server
Revision history for this message
Tom Manville (tdmanville) wrote :

Laurynas, yes, that is what I have in mind.

Even increasing the size of the bitmaps and making them variable length would not help us much. We perform small random writes to a very large table. At its worst, the ratio of LSN delta to bitmap size was around 5, but got worse as the db size increased. The end size of the db was around 300GB.

I'm attaching the output from "ls -ltr /var/lib/mysql | grep xdb" that should give the info you requested.

Revision history for this message
Tom Manville (tdmanville) wrote :

Attached is a plot of the info you requested.

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

It seems that this (LSN delta divided by bitmap size being small) is easy enough to reproduce. Start server with:

innodb_track_changed_pages=1
innodb_max_bitmap_file_size=32768

Then apply load like this:

create table tbig(id int primary key, c1 char(255)) engine=InnoDB;
insert into tbig values(1,'a');
insert into tbig select rand()*1000000000, 'a' from tbig;
...
insert into tbig select rand()*1000000000, 'a' from tbig;
flush changed_page_bitmaps;

and check *.xdb file sizes and LSN values:

openxs@ao756:~/dbs/p5.6$ ls -l data/*.xdb
-rw-rw---- 1 openxs openxs 147456 Feb 3 19:28 data/ib_modified_log_1_0.xdb
-rw-rw---- 1 openxs openxs 32768 Feb 3 19:32 data/ib_modified_log_2_12003768.xdb
-rw-rw---- 1 openxs openxs 40960 Feb 3 19:33 data/ib_modified_log_3_12017366.xdb
-rw-rw---- 1 openxs openxs 45056 Feb 3 19:35 data/ib_modified_log_4_13490670.xdb
...

and do calculations like this:

mysql> select (1349067-1201736)/40960;
+-------------------------+
| (1349067-1201736)/40960 |
+-------------------------+
| 3.5969 |
+-------------------------+
1 row in set (0,01 sec)

Note low values like the above.
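The same calculation can be scripted over the whole listing. The Python sketch below assumes the second number in each file name is the LSN at which that file starts, so a file's LSN interval ends where the next file begins; the helper name `lsn_per_bitmap_byte` is made up for this example. Note that the query above uses 1349067 and 1201736, which look like the listed LSNs with the last digit dropped; with the LSNs exactly as listed, the ratio for file 3 comes out at roughly 36 rather than 3.6.

```python
import re

# Matches names like ib_modified_log_3_12017366.xdb (sequence number, start LSN).
NAME_RE = re.compile(r"ib_modified_log_(\d+)_(\d+)\.xdb$")


def lsn_per_bitmap_byte(files):
    """files: iterable of (name, size_in_bytes). Returns [(sequence, ratio), ...]."""
    parsed = []
    for name, size in files:
        match = NAME_RE.search(name)
        if match:
            parsed.append((int(match.group(1)), int(match.group(2)), size))
    parsed.sort()

    ratios = []
    # The last file has no successor, so its LSN interval is unknown and skipped.
    for current, following in zip(parsed, parsed[1:]):
        sequence, start_lsn, size = current
        end_lsn = following[1]
        ratios.append((sequence, (end_lsn - start_lsn) / size))
    return ratios


# Sizes and names taken from the listing above.
listing = [
    ("data/ib_modified_log_1_0.xdb", 147456),
    ("data/ib_modified_log_2_12003768.xdb", 32768),
    ("data/ib_modified_log_3_12017366.xdb", 40960),
    ("data/ib_modified_log_4_13490670.xdb", 45056),
]

for sequence, ratio in lsn_per_bitmap_byte(listing):
    print(f"file {sequence}: {ratio:.4f} LSN bytes per bitmap byte")
```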

Revision history for this message
Tom Manville (tdmanville) wrote :

Hi Valerii,
Thanks for looking into this.

I've run a similar experiment with the default settings and 100,000,000 rows.

I wanted to determine the benefit of coalescing entries in a single xdb file by comparing the number of unique (space, first_page) pairs to the total number of pairs. If each file had only unique entries we could save a significant amount of space.
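The comparison can be sketched in a few lines of Python. This assumes the (space, first_page) pairs have already been extracted from one xdb file into a list; parsing the file itself is not shown, and `coalescing_savings` is a hypothetical helper name.

```python
def coalescing_savings(entries):
    """entries: list of (space_id, first_page_id) pairs read from one xdb file.

    Returns (total, unique, factor), where factor is how many times fewer
    entries the file would hold if each pair appeared only once.
    """
    total = len(entries)
    unique = len(set(entries))
    factor = total / unique if unique else 0.0
    return total, unique, factor


# Illustrative data: the same two page ranges are dirtied again and again.
sample = [(5, 0), (5, 32768), (5, 0), (5, 0), (5, 32768), (5, 0)]
total, unique, factor = coalescing_savings(sample)
```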

I'm attaching graphs showing the possible savings for the synthetic experiment above and our application. In our system, this would change the size of the bitmaps from 400GB to ~4GB.

To realize these savings, we have the following proposal:

In our use case, there are many entries for a single (space_id, page_offset) pair. By coalescing bitmap entries within a single 100MB file, we see a gain of 100x. It therefore seems that coalescing entries within a single file will be sufficient.

XtraDB would track 2 parallel sets of bitmaps.
The first's behavior would remain the same as the current implementation. At each checkpoint, the bitmaps would be flushed to a file as they are now.
The second would cover a coarser granularity. Rather than flush at each checkpoint, the second set would exist in memory until a new xdb file is written. When this occurs, the second bitmap set is flushed to a temporary file and then (atomically) renamed to the just-written xdb file. This file would contain fewer entries and only one LSN range. Only one "last_block" flag would be set.
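The flush-then-rename step could look roughly like the Python sketch below. It is an illustration of the write-to-temporary-file-and-rename pattern only, not server code; `replace_with_coalesced` is a hypothetical name, and the coalesced contents are assumed to be already serialized into bytes.

```python
import os
import tempfile


def replace_with_coalesced(xdb_path: str, coalesced: bytes) -> None:
    """Write the coalesced bitmap to a temp file, then rename it over xdb_path."""
    directory = os.path.dirname(os.path.abspath(xdb_path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(coalesced)
            tmp.flush()
            os.fsync(tmp.fileno())  # make the data durable before the rename
        os.replace(tmp_path, xdb_path)  # atomic replacement on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If a crash happens before the rename, the original per-checkpoint file is still intact and only the temporary file needs to be discarded.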

I would be happy to work on this implementation if you think the community would accept it.

Alternative approaches that may help this issue:
- Using variable length bitmaps. I can check the performance of this in our system if you think this is a viable option.
- Allowing tables to be excluded from backups.

Thanks,
Tom

Revision history for this message
Tom Manville (tdmanville) wrote :

Data from our system:

tags: added: bitmap xtradb
Revision history for this message
Tom Manville (tdmanville) wrote :

Based on a discussion with Laurynas at Percona Live 2014, a simple fix for us may be the ability to exclude certain tables from both the bitmaps and xtrabackup.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Having considered the table exclusion idea for a while, I believe it would work, but, without a bitmap file format change, it's IMHO too dangerous to be included in the trunk. The reason is that it would be impossible to tell from a bitmap file that it's missing some table change data. If some changes to the skipped tables do happen for any reason, then using such a bitmap would result in a silently broken backup.

I am considering the above change with a bitmap file format change, and other fix options.

Should you want to implement the table skip option anyway for your in-house needs, it should be a simple implementation and we could review it for other issues even if we wouldn't merge it to trunk.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Tom and others affected by the bitmap file sparseness -

Here are the current options I see for addressing this, including your suggestions. What do you think?

    - Re. tracking two parallel sets of bitmaps, what would you do with
      the first set - delete its files externally as they are rotated?
      This would also need a limit on the in-memory bitmap size with a
      forced rotation when it is reached for the second set. Also, how
      would the information of the two sets of files be merged
      if there is a need to find the last tracked LSN or by XtraBackup
      if both are present?
    - The option of variable-length bitmap data pages seems to be
      easiest to implement and use (with some reservations re. crash
      recovery, but these should be possible to work out), but the
      savings there will be less as only data inside a single
      checkpoint would be compressed whereas other approaches would
      compress arbitrarily many checkpoints. Would you be willing to
      test a prototype to measure space savings?
    - Re. the table skip option, the biggest concern is silently
      broken backups due to missing data as I wrote before. To make
      backups safe, the information of the skipped tables (and the LSN
      intervals for when the table skips were active) needs to be
      present in the bitmaps somehow. Then XtraBackup would only
      accept bitmaps if the skipped table information is consistent
      with XB --tables options. The skipped tables information could
      be provided by a special data page type in the bitmaps, but, if
      we need to change the format, might as well do the compact
      representation instead.
    - I also considered an option of providing an external utility to
      merge finished bitmap files. It would have to be a part of
      XtraBackup, which might not necessarily be acceptable because of
      other considerations. It would also have to work with
      arbitrarily large bitmap files.
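For the table skip option, the consistency check between the skipped-table information and the tables being backed up might be sketched as follows. This is Python used only for illustration; the record layout (`SkipRecord` with a table name and a half-open LSN interval) is entirely hypothetical, since no such page type exists in the current bitmap format.

```python
from typing import NamedTuple


class SkipRecord(NamedTuple):
    """Hypothetical record: tracking of `table` was skipped for [start_lsn, end_lsn)."""
    table: str
    start_lsn: int
    end_lsn: int


def bitmaps_usable(skips, backup_tables, from_lsn, to_lsn):
    """Accept the bitmaps only if no table to be backed up was skipped
    at any point inside the backup's LSN interval [from_lsn, to_lsn)."""
    for skip in skips:
        overlaps = skip.start_lsn < to_lsn and skip.end_lsn > from_lsn
        if skip.table in backup_tables and overlaps:
            return False
    return True


skips = [SkipRecord("db.logs", 100, 200)]
# Backing up only db.users is safe; db.logs has untracked changes in the interval.
print(bitmaps_usable(skips, {"db.users"}, 0, 1000))
print(bitmaps_usable(skips, {"db.logs"}, 150, 300))
```

When the check fails, a backup tool would need to fall back to a full scan rather than trust the bitmaps.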

Revision history for this message
Tom Manville (tdmanville) wrote :

1. A rename could overwrite the first set with the second. If both files exist, then this means there was a crash while writing the second set. In this case, we should use the first file. I understand this may be relatively complex and hard to implement a correct crash recovery program.
2. I'd be happy to test out any implementations for this.
3. Understood.
4. I have a script that does this in Python. Let me know if I can help with the implementation for XtraDB.

Thanks Laurynas

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Tom and others -

I have pushed lp:~laurynas-biveinis/percona-server/sparse-bitmap-prototype for estimating space saving with a sparse bitmap representation.

The branch has one new option, --innodb-sparse-changed-page-bitmap=TRUE|FALSE, dynamic, disabled by default, that writes compact bitmap pages if considered profitable at each checkpoint. The resulting bitmap files are NOT usable; the branch itself is strictly for experimental purposes.

For both option values it will print a sparsity measure to stderr, and if the option is enabled, it will also print how many bytes were actually written and how many bytes would have been written if the default representation were used instead.

Tested very lightly. I have worked out what I believe is the correct cutoff value between representations, but feel free to tweak it in the code if you find that sparse bitmap takes more bytes than a dense one.

Thanks in advance for any space saving measurements and any other feedback.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

The branch is based on 5.6. Let me know if you prefer 5.5 instead.

Revision history for this message
Tom Manville (tdmanville) wrote :

5.6 is great. I'll test your patch this week.

Thanks!
Tom

tags: added: i43464
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1471
