Comment 7 for bug 1269547

Tom Manville (tdmanville) wrote:

Hi Valerii,
Thanks for looking into this.

I've run a similar experiment with the default settings and 100,000,000 rows.

I wanted to determine the benefit of coalescing entries in a single xdb file by comparing the number of unique (space, first_page) pairs to the total number of pairs. If each file contained only unique entries, we could save a significant amount of space.
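
For reference, here is a minimal sketch of how that comparison could be made, assuming the entries have already been parsed out of an xdb file; BitmapEntry and its field names are illustrative, not XtraDB's actual structures.

#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct BitmapEntry {
    uint32_t space_id;
    uint32_t first_page;
    // bitmap payload omitted
};

// Returns unique_pairs / total_pairs; a value near 0.01 would mean a
// coalesced file could hold roughly 100x fewer entries.
double uniqueness_ratio(const std::vector<BitmapEntry>& entries) {
    if (entries.empty()) return 1.0;
    std::set<std::pair<uint32_t, uint32_t>> keys;
    for (const auto& e : entries)
        keys.insert({e.space_id, e.first_page});
    return static_cast<double>(keys.size()) / entries.size();
}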

I'm attaching graphs showing the possible savings for the synthetic experiment above and for our application. In our system, this would reduce the total size of the bitmaps from 400GB to ~4GB.

To realize these savings, we have the following proposal:

In our use case, there are many entries for a single (space_id, page_offset) pair. By coalescing bitmap entries within a single 100MB file, we see a gain of 100x. It therefore seems that coalescing entries within a single file will be sufficient.
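
To make the coalescing step concrete, here is a sketch under assumed names (not XtraDB internals): entries that share the same (space_id, first_page) key are OR-merged, so repeated modifications of the same page range collapse into a single entry.

#include <array>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Assumed fixed-size bitmap payload; the real on-disk format may differ.
constexpr std::size_t kBitmapWords = 64;
using Bitmap = std::array<std::uint64_t, kBitmapWords>;
using PageKey = std::pair<std::uint32_t, std::uint32_t>;  // (space_id, first_page)

// OR-merge all entries that share a key; the result has one entry per
// unique (space_id, first_page) pair.
std::map<PageKey, Bitmap> coalesce(
        const std::vector<std::pair<PageKey, Bitmap>>& entries) {
    std::map<PageKey, Bitmap> merged;
    for (const auto& entry : entries) {
        auto it = merged.find(entry.first);
        if (it == merged.end()) {
            merged.insert(entry);
        } else {
            for (std::size_t i = 0; i < kBitmapWords; ++i)
                it->second[i] |= entry.second[i];
        }
    }
    return merged;
}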

XtraDB would track two parallel sets of bitmaps.
The first set's behavior would remain the same as the current implementation: at each checkpoint, the bitmaps would be flushed to a file as they are now.
The second set would cover a coarser granularity. Rather than being flushed at each checkpoint, it would stay in memory until a new xdb file is written. When this occurs, the second bitmap set is flushed to a temporary file and then (atomically) renamed over the just-written xdb file. This file would contain fewer entries, only one lsn range, and a single "last_block" flag.
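
A minimal sketch of that flush path, with hypothetical function and parameter names (the serialized payload is passed in, standing in for whatever encoding the real code would use):

#include <cstdio>
#include <fstream>
#include <string>

// Write the coarse set to <xdb_path>.tmp, then atomically rename it over
// the just-written xdb file. serialized_coarse_set stands in for the
// encoded coalesced entries with a single LSN range header and one
// trailing "last_block" flag. A real implementation would also fsync the
// temporary file (and its directory) before the rename.
bool flush_coarse_bitmaps(const std::string& xdb_path,
                          const std::string& serialized_coarse_set) {
    const std::string tmp_path = xdb_path + ".tmp";
    {
        std::ofstream out(tmp_path, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << serialized_coarse_set;
        if (!out.flush()) return false;
    }
    // On POSIX filesystems rename() is atomic, so readers see either the
    // original file or the complete coalesced one, never a partial write.
    return std::rename(tmp_path.c_str(), xdb_path.c_str()) == 0;
}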

I would be happy to work on this implementation if you think the community would accept it.

Alternative approaches that may help this issue:
- Using variable-length bitmaps. I can check the performance of this in our system if you think it is a viable option.
- Allowing tables to be excluded from backups.

Thanks,
Tom