PBXT tables very large on disk

Bug #368693 reported by m00dawg on 2009-04-28
2
Affects Status Importance Assigned to Milestone
PBXT
Undecided
Unassigned

Bug Description

PBXT tables seem to be extremely large in comparison to other engines on disk. Even outputting to a .CSV file still produce a filesize that is less than half of PBXT:

mysql> SHOW CREATE TABLE Logger\G
*************************** 1. row ***************************
       Table: Logger
Create Table: CREATE TABLE `Logger` (
  `timestampOccurred` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `dateOccurred` date NOT NULL,
  `session` char(32) DEFAULT NULL,
  `host` varchar(255) DEFAULT NULL,
  `sslMode` enum('enabled','disabled') DEFAULT 'enabled',
  `requestURI` varchar(255) DEFAULT NULL,
  `referer` varchar(255) DEFAULT NULL,
  `userAgent` varchar(255) DEFAULT NULL,
  `remoteHost` int(10) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> ALTER TABLE Logger ENGINE='PBXT';
Query OK, 945463 rows affected (1 min 21.74 sec)
Records: 945463 Duplicates: 0 Warnings: 0

mysql> SELECT DATA_LENGTH / 1024 / 1024 AS Data, INDEX_LENGTH / 1024 / 1024 AS Indexes FROM information_schema.TABLES WHERE TABLE_SCHEMA='test' AND TABLE_NAME='Logger';
+------------+------------+
| Data | Indexes |
+------------+------------+
| 0.90166473 | 0.00390625 |
+------------+------------+
1 row in set (0.00 sec)

office101-221:test root# ls -lh
total 1195560
-rw-rw---- 1 _mysql staff 3.8M Apr 28 11:13 Logger-1.xtr
-rw-rw---- 1 _mysql staff 8.7K Apr 28 11:12 Logger.frm
-rw-rw---- 1 _mysql staff 580M Apr 28 11:13 Logger.xtd
-rw-rw---- 1 _mysql staff 4.0K Apr 28 11:12 Logger.xti
-rw-rw---- 1 _mysql staff 65B Apr 28 11:01 db.opt

mysql> ALTER TABLE Logger ENGINE='Archive';
Query OK, 945463 rows affected (13.85 sec)
Records: 945463 Duplicates: 0 Warnings: 0

mysql> SELECT DATA_LENGTH / 1024 / 1024 AS Data, INDEX_LENGTH / 1024 / 1024 AS Indexes FROM information_schema.TABLES WHERE TABLE_SCHEMA='test' AND TABLE_NAME='Logger';
+-------------+------------+
| Data | Indexes |
+-------------+------------+
| 27.35079288 | 0.00000000 |
+-------------+------------+
1 row in set (0.00 sec)

office101-221:test root# ls -lh
total 56048
-rw-rw---- 1 _mysql staff 27M Apr 28 11:08 Logger.ARZ
-rw-rw---- 1 _mysql staff 8.7K Apr 28 11:07 Logger.frm
-rw-rw---- 1 _mysql staff 65B Apr 28 11:01 db.opt

mysql> ALTER TABLE Logger ENGINE='MyISAM';
Query OK, 945463 rows affected (8.29 sec)
Records: 945463 Duplicates: 0 Warnings: 0

mysql> SELECT DATA_LENGTH / 1024 / 1024 AS Data, INDEX_LENGTH / 1024 / 1024 AS Indexes FROM information_schema.TABLES WHERE TABLE_SCHEMA='test' AND TABLE_NAME='Logger';
+--------------+------------+
| Data | Indexes |
+--------------+------------+
| 186.35408020 | 0.00097656 |
+--------------+------------+
1 row in set (0.01 sec)

office101-221:test root# ls -lh
total 381696
-rw-rw---- 1 _mysql staff 186M Apr 28 11:14 Logger.MYD
-rw-rw---- 1 _mysql staff 1.0K Apr 28 11:14 Logger.MYI
-rw-rw---- 1 _mysql staff 8.7K Apr 28 11:14 Logger.frm
-rw-rw---- 1 _mysql staff 65B Apr 28 11:01 db.opt

mysql> ALTER TABLE Logger ENGINE='InnoDB';
Query OK, 945463 rows affected (48.27 sec)
Records: 945463 Duplicates: 0 Warnings: 0

mysql> SELECT DATA_LENGTH / 1024 / 1024 AS Data, INDEX_LENGTH / 1024 / 1024 AS Indexes FROM information_schema.TABLES WHERE TABLE_SCHEMA='test' AND TABLE_NAME='Logger';
+--------------+------------+
| Data | Indexes |
+--------------+------------+
| 221.75000000 | 0.00000000 |
+--------------+------------+
1 row in set (0.00 sec)

office101-221:test root# ls -lh
total 475168
-rw-rw---- 1 _mysql staff 8.7K Apr 28 11:16 Logger.frm
-rw-rw---- 1 _mysql staff 232M Apr 28 11:17 Logger.ibd
-rw-rw---- 1 _mysql staff 65B Apr 28 11:01 db.opt

mysql> SELECT * FROM Logger INTO OUTFILE '/tmp/logger.csv';
Query OK, 945463 rows affected (8.04 sec)

office101-221:tmp root# ls -lh logger.csv
-rw-rw-rw- 1 _mysql wheel 215M Apr 28 11:20 logger.csv
office101-221:tmp root#

I wasn't sure if this was by design, but it seems like it could cost a lot of disk I/O? I was talking to a colleague (and fellow PBXT enthusiast) and he seems to think it may be an issue with PBXT not packing keys?

Related branches

m00dawg (tim-moocowproductions) wrote :

My apologies! These tests were run on MySQL 5.1.34 with latest PBXT from Bazaar (revision 614). Both MySQL and PBXT were built from source on my Mac running OS X 10.5.6.

Also, though not directly related to each other perhaps, I ran the same tests for bug 368693 as well as 368692.

Paul McCullagh (paul-mccullagh) wrote :

Hi Tim!

The problem may be the default estimate of the average row size used by PBXT. It may be bad in this case.

I plan to improve this using OPTIMIZE TABLE, but that is future stuff still.

In the meantime you should try the following:

Run CHECK TABLE, and then have a look at the .err log.

PBXT dumps statistics about the table. One of them is the average row size.

For example, you will see something like this:

CHECK TABLE Logger;

In the .err log:
CHECK TABLE: ./test/Logger
Record buffer size = 1072
Fixed length rec. len. = 1069
Handle data record size = 447
Min/max header size = 14/26
Min/avg/max record size = 53/433/1081
Avg row len set for tab = not specified
Rows fixed length = NO
Maximum fixed size = 16384
Minimum variable size = 320
Minimum auto-increment = 0
Number of columns = 9
Number of fixed columns = 5
Columns req. for index = 0
Columns req. for blobs = 0
Number of blob columns = 0
Number of indices = 0
Minumum comp. rec. len. = 194
Average comp. rec. len. = 194
Maximum comp. rec. len. = 194
Free record count = 0
Deleted record count = 0
Allocated record count = 1

Take the "Average comp. rec. len" and do the following (example):

ALTER TABLE Logger AVG_ROW_LENGTH=194;

You can add a bit to the value here, depending on the value of "Maximum comp. rec. len".

What will happen is that all data that exceeds this average row length, overflows into the data log files. Where it is packed together very efficiently.

The data logs file are in pbxt/data, called dlog-*.xt

So you need to add the space taken by these files to the size of the table, after the ALTER TABLE.

Further optimization-:

* Move the fixed length fields to the start of the table
* Move fields more often accessed to the start of the table.

This may save PBXT from reading the extended part of the record in the data log file.

For example:

CREATE TABLE `Logger` (
 `timestampOccurred` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `dateOccurred` date NOT NULL,
 `session` char(32) DEFAULT NULL,
 `remoteHost` int(10) unsigned NOT NULL
 `sslMode` enum('enabled','disabled') DEFAULT 'enabled',
 `host` varchar(255) DEFAULT NULL,
 `requestURI` varchar(255) DEFAULT NULL,
 `referer` varchar(255) DEFAULT NULL,
 `userAgent` varchar(255) DEFAULT NULL,
) ENGINE=PBXT;

Changed in pbxt:
status: New → Incomplete
Paul McCullagh (paul-mccullagh) wrote :

The index storage has been made more compact in this version, provided that rows are inserted in index order.

Otherwise, size of the database can be tuned by setting AVG_ROW_LENGTH, after using CHECK TABLE to determine the distribution of row sizes.

Changed in pbxt:
status: Incomplete → Fix Committed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers