Reducing the time spent with a global read lock

Bug #993714 reported by astrostl on 2012-05-03
20
This bug affects 4 people
Affects Status Importance Assigned to Milestone
MySQL Data Dumper
Wishlist
Max Bubenick

Bug Description

I have some DBs with tens of thousands of tables, perhaps 5% of which are MyISAM. When I start Mydumper, it can take 30+ minutes before it actually begins doing backups. Between verbose output, the MySQL processlist, and an strace, it looks like the following:

- Mydumper requests a global FLUSH TABLES WITH READ LOCK
- Once granted, it performs a 'SHOW TABLE STATUS' on each table
- Once the TABLE STATUS loops finish, it backs up non-InnoDB tables
- Once non-InnoDB tables are backed up, it releases the READ LOCK
- Once the READ LOCK is released, it backs up InnoDB tables

The thing is, with tens of thousands of tables - at least on my abused servers - it can take more than 30 minutes simply to finish the SHOW TABLE STATUS stage. Is it necessary for data integrity to have a READ LOCK as Mydumper is collecting this information? If not, I'd *love* to see the READ LOCK acquired afterward (status -> lock -> non-InnoDB -> release rather than lock -> status -> non-InnoDB -> release).

Whether or not it's deemed necessary to lock first, would it perhaps be faster to query for table data a la 'mysql -Be 'select TABLE_SCHEMA, TABLE_NAME, ENGINE from information_schema.tables where ENGINE not like "InnoDB"'', IF (??) information_schema can provide all the data you need?

astrostl (astrostl) on 2012-05-03
description: updated
description: updated
description: updated
astrostl (astrostl) on 2012-05-03
description: updated
description: updated
description: updated
astrostl (astrostl) on 2012-05-03
description: updated
description: updated
astrostl (astrostl) wrote :

Ideal optimization, presuming a READ LOCK isn't necessary for InnoDB? :

- FLUSH TABLES WITH READ LOCK
- query information_schema for non-InnoDB tables
- (SHOW TABLE STATUS on non-InnoDB tables, if still necessary)
- back up non-InnoDB tables
- release READ LOCK

- query information_schema for InnoDB tables
- (SHOW TABLE STATUS on InnoDB tables, if still necessary)
- back up InnoDB tables

I think that alone, perhaps especially w/o the SHOW TABLE STATUS, would be a big boon for my environment.

Max Bubenick (max-bubenick) wrote :

new option added --less-locking

Changed in mydumper:
milestone: none → 0.6.0
assignee: nobody → Max Bubenick (max-bubenick)
importance: Undecided → Wishlist
status: New → Fix Committed
astrostl (astrostl) wrote :

Whoa, awesome! Which method did you end up using for 'less-locking' ?

Max Bubenick (max-bubenick) wrote :

We reduced FTWRL time by releasing it before dumping non-innodb tables. We do that with read locks only on those tables.
SHOW TABLES must be inside the global read lock for the backup to be consistent, if not we can loose tables created between.

astrostl (astrostl) wrote :

Cool, thanks! Also, what about

    'select TABLE_SCHEMA, TABLE_NAME, ENGINE from information_schema.tables where ENGINE not like "InnoDB"'

as an alternative to a SHOW TABLE STATUS loop? Is it polling for more than the engine type, or something that couldn't/shouldn't be divined from the information_schema table? I find those lookups to be the biggest source of slowdowns, when dealing with massive amounts of schemas/tables.

Max Bubenick (max-bubenick) wrote :

That is in progress, also we are analyzing if we can run only one SELECT FROM I_S.tables.

astrostl (astrostl) wrote :

Awesome. Thanks so much, very glad to see it all in active dev!

Changed in mydumper:
status: Fix Committed → Fix Released
astrostl (astrostl) wrote :

"That is in progress, also we are analyzing if we can run only one SELECT FROM I_S.tables."

Has this analysis concluded, or is it being tracked in another ticket?

Again, thanks so much :)

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers