TokuDB does not allow to re-create table after one of partitions lost

Bug #1657908 reported by Sveta Smirnova
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Won't Fix
Undecided
Unassigned
5.5
Invalid
Undecided
Unassigned
5.6
Won't Fix
Undecided
Unassigned
5.7
Won't Fix
Undecided
Unassigned

Bug Description

After some disaster one of partitions in a table was lost. Other tables were not affected. Restoring full database from backup is not an option due to size. Server crashed during start, so we deleted partition. After that server successfully stopped, but table could not be dropped. Then we dropped database which contained the table, re-created it. After that all tries to re-create the table failed with error 17.

How to repeat:

1. cd into mysql-test directory

2. LD_PRELOAD=$HOME/mysql_packages/Percona-Server-5.7.11-4-Linux.x86_64.ssl100/lib/mysql/libjemalloc.so ./mtr --start innodb &

3. sudo ../bin/ps_tokudb_admin -h 127.0.0.1 -P13001 -uroot -e

4. cat > small_tab.sql
CREATE TABLE `t1` (
  `uid` bigint(20) unsigned NOT NULL,
  `tstamp` int(10) unsigned NOT NULL,
  `value` decimal(15,2) NOT NULL,
  PRIMARY KEY (`uid`,`tstamp`),
  KEY `tstamp` (`tstamp`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1
PARTITION BY RANGE (tstamp)
(PARTITION p1421712000 VALUES LESS THAN (1421712000) ENGINE = TokuDB,
 PARTITION p1421798400 VALUES LESS THAN (1421798400) ENGINE = TokuDB,
 PARTITION p1421884800 VALUES LESS THAN (1421884800) ENGINE = TokuDB,
 PARTITION p1487030400 VALUES LESS THAN (1487030400) ENGINE = TokuDB,
 PARTITION p1487116800 VALUES LESS THAN (1487116800) ENGINE = TokuDB,
 PARTITION p1487203200 VALUES LESS THAN (1487203200) ENGINE = TokuDB,
 PARTITION p1487289600 VALUES LESS THAN (1487289600) ENGINE = TokuDB,
 PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = TokuDB);

^C

5. alias mysqlmtr='../bin/mysql -h127.0.0.1 -P13000 -uroot'

6. mysqlmtr -P13001 test < /home/sveta/issues/160114/bug/small_tab.sql

7. ../bin/mysqladmin -h127.0.0.1 -P13001 -uroot shutdown

8. rm var/mysqld.1/data/_test_t1_P_p1421798400_main_4_2_1d.tokudb

9. LD_PRELOAD=$HOME/mysql_packages/Percona-Server-5.7.11-4-Linux.x86_64.ssl100/lib/mysql/libjemalloc.so ./mtr --start-dirty innodb &

10. mysqlmtr -P13001 test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.6.34-79.1-debug-log Source distribution

Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| t1 |
+----------------+
1 row in set (0.00 sec)

mysql> drop table t1;
ERROR 1051 (42S02): Unknown table 'test.t1'
mysql> show tables;
Empty set (0.00 sec)

mysql> drop database test;
Query OK, 0 rows affected (0.01 sec)

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> \q
Bye

11. mysqlmtr -P13001 test < /home/sveta/issues/160114/bug/small_tab.sql
ERROR 1030 (HY000) at line 1: Got error 17 from storage engine

12. ls var/mysqld.1/data/
auto.cnf log000000000001.tokulog29 tc.log tokudb.directory __tokudb_lock_dont_delete_me_logs
ibdata1 mtr test tokudb.environment __tokudb_lock_dont_delete_me_recovery
ib_logfile0 mysql _test_t1_P_p1421798400_key_tstamp_4_3_1d.tokudb __tokudb_lock_dont_delete_me_data __tokudb_lock_dont_delete_me_temp
ib_logfile1 performance_schema _test_t1_P_p1421798400_status_4_1_1d.tokudb __tokudb_lock_dont_delete_me_environment tokudb.rollback

13. rm var/mysqld.1/data/_test_t1_P_p1421798400_*

14. ls var/mysqld.1/data/
auto.cnf ib_logfile1 mysql test __tokudb_lock_dont_delete_me_data __tokudb_lock_dont_delete_me_recovery
ibdata1 log000000000001.tokulog29 performance_schema tokudb.directory __tokudb_lock_dont_delete_me_environment __tokudb_lock_dont_delete_me_temp
ib_logfile0 mtr tc.log tokudb.environment __tokudb_lock_dont_delete_me_logs tokudb.rollback

15. mysqlmtr -P13001 test < /home/sveta/issues/160114/bug/small_tab.sql
ERROR 1030 (HY000) at line 1: Got error 17 from storage engine

16. tail var/log/mysqld.1.err
2017-01-20 01:29:22 14072 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.34-79.1 started; log sequence number 1632006
2017-01-20 01:29:22 14072 [Note] Server hostname (bind-address): '*'; port: 13001
2017-01-20 01:29:22 14072 [Note] IPv6 is available.
2017-01-20 01:29:22 14072 [Note] - '::' resolves to '::';
2017-01-20 01:29:22 14072 [Note] Server socket created on IP: '::'.
2017-01-20 01:29:22 14072 [Note] Event Scheduler: Loaded 0 events
2017-01-20 01:29:22 14072 [Note] /home/sveta/build/ps-5.6/bin/mysqld: ready for connections.
Version: '5.6.34-79.1-debug-log' socket: '/home/sveta/build/ps-5.6/mysql-test/var/tmp/mysqld.1.sock' port: 13001 Source distribution
2017-01-20 01:29:41 14072 [ERROR] TokuDB: toku dbremove failed

Option tokudb_strip_frm_data does not help.

Tags: tokudb i160114
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

tokudb_strip_frm_data is not for this purpose and should never be used in any attempt of data recovery as it can actually cause further data loss.

This sounds like some TokuDB/PerconaFT data files were deleted out from under the storage engine, causing the PerconaFT directory file mapping to become inconsistent (and possibly inconsistent with recovery log entries that also might refer to that now missing data file). This is not a supported operation and will result in precisely the situation that is described. Restore from a backup or logical dump and reload is the only recovery mechanism available and supported at this time.

tags: added: tokudb
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

Tere are several recovery tasks already in the queue:
https://tokutek.atlassian.net/browse/FT-733 : Create utility to allow editing of ydb directory file
https://tokutek.atlassian.net/browse/DB-820 : Force tokudb recovery to skip certain tables or transactions
https://tokutek.atlassian.net/browse/DB-841 : Need faster rollback time when database crashes during large transaction.
https://tokutek.atlassian.net/browse/DB-918 : Transportable Single-Table Tablespaces Supports
https://tokutek.atlassian.net/browse/DB-947 : DB row corruption recovery

Changed in percona-server:
status: Confirmed → Won't Fix
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

This issue specifically needs the FT-733, the PerconaFT directory map editor.

You would use the editor to remove the bad/dead directory mapping. This 'directory' is the tokudb.directory file mentioned here https://www.percona.com/blog/2016/09/27/tokudb-and-perconaft-database-file-management-part-1-of-2/ and maps the database.table.index names to actual file names.

The issue here is that there have what seems to be an 'orphaned' mapping from a database.table.index that MySQL side thinks doesn't exists but PerconaFT thinks still exists, so when an attempt is made to create a new one, PerconaFT returns an error saying it already exists.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

Tried to repeat the issue with the following mtr test:

================
source include/have_tokudb.inc;

SET DEFAULT_STORAGE_ENGINE='tokudb';
SET GLOBAL tokudb_dir_per_db=ON;

--let $DB= test
--let $DATADIR= `SELECT @@datadir`

CREATE TABLE t1 (a INT PRIMARY KEY, b INT);
--source dir_per_db_show_table_files.inc

--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--shutdown_server
--source include/wait_until_disconnected.inc

--remove_files_wildcard $DATADIR/$DB t1_main_*.tokudb

--enable_reconnect
--exec echo "restart:" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--source include/wait_until_connected_again.inc

--source dir_per_db_show_table_files.inc

SHOW TABLES;

--error 1051
DROP TABLE t1;

SHOW TABLES;

--source dir_per_db_show_table_files.inc

CREATE TABLE t1 (a INT PRIMARY KEY, b INT);
DROP TABLE t1;

SET GLOBAL tokudb_dir_per_db=default;
===========

And I had the following error:

mysqltest: At line 33: query 'CREATE TABLE t1 (a INT PRIMARY KEY, b INT)' failed: 1017: Can't find file: 't1' (errno: 2 - No such file or directory)

and the following record in error-log file:
2017-02-21 13:14:10 3890 [ERROR] TokuDB: toku dbremove failed

But, when I deleted status file too, i.e. instead of
--remove_files_wildcard $DATADIR/$DB t1_main_*.tokudb
I used
--remove_files_wildcard $DATADIR/$DB t1_*.tokudb

I could create table after drop without any problems.

So as workaround I would suggest just to remove '_status.tokudb' file and all files of this table before DROP TABLE, and after DROP TABLE it's possible to create the table once more.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

The above workaround does not work for the case of partition table.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

The simplified version of the mtr test to repeat "error 17":

=============
source include/have_tokudb.inc;

SET GLOBAL tokudb_dir_per_db=ON;

--let $DB= test
--let $DATADIR= `SELECT @@datadir`

--delimiter |
CREATE PROCEDURE create_table()
BEGIN
CREATE TABLE test.t1 (
  a INT
) ENGINE = TokuDB
PARTITION BY RANGE (a)
(PARTITION p100 VALUES LESS THAN (100) ENGINE = TokuDB,
 PARTITION p_to_del VALUES LESS THAN (200) ENGINE = TokuDB);
END|
--delimiter ;

--echo ### Create partitioned table
CALL create_table();
--source dir_per_db_show_table_files.inc

--echo ### Stop server
--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--shutdown_server
--source include/wait_until_disconnected.inc

--echo ### Remove 'main' file of one of the partitions
--remove_files_wildcard $DATADIR/$DB t1_P_p_to_del_main_*.tokudb

--echo ### Start server
--enable_reconnect
--exec echo "restart:" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--source include/wait_until_connected_again.inc

--echo ### Make sure 'main' partition file is deleted
--source dir_per_db_show_table_files.inc

--echo ### Make sure the table still exists
SHOW TABLES;

--echo ### Drop table, expect error
--error 1051
DROP TABLE t1;

--echo ### Make sure the table is dropped despite error
SHOW TABLES;

--echo ### Check what files still exist after DROP TABLE
--source dir_per_db_show_table_files.inc

--echo ### Remove the rest of the files
--remove_files_wildcard $DATADIR/$DB *.tokudb

--echo ### Make sure there are no tokudb files
--source dir_per_db_show_table_files.inc

--echo ### Create the same table once more
CALL create_table();
--source dir_per_db_show_table_files.inc

--echo ### Restore state
DROP TABLE t1;
DROP PROCEDURE create_table;
SET GLOBAL tokudb_dir_per_db=default;
================

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

The scenario of this bug is the following:

1) DROP TABLE t1;
a) drop p100;
b) drop p_to_del:
b.1) try to drop 'main' file, as this file does not exist, the following code returns error:

static int
env_dbremove(DB_ENV * env, DB_TXN *txn, const char *fname, const char *dbname, uint32_t flags) {
...
    r = toku_db_open_iname(db, txn, iname, 0, 0);
    if (txn && r) {
        if (r == EMFILE || r == ENFILE)
            r = toku_ydb_do_error(env, r, "toku dbremove failed because open file limit reached\n");
        else
            r = toku_ydb_do_error(env, r, "toku dbremove failed\n");
        goto exit;
    }
...
}

on the higher level this error is processed as:

int ha_tokudb::delete_or_rename_table (const char* from_name, const char* to_name, bool is_delete) {
...
    error = delete_or_rename_dictionary(from_name, to_name, "main", false, txn, is_delete);
    if (error) { goto cleanup; }
...
}

So if removing "main" file end up with error the "status" file stays untouched and moreover tokudb dir table still contains record for "status" file.

3) CREATE TABLE t1
a) create p100
b) try to create p_to_del:
b.1) create "main" file
b.2) when "status" file is created the following code returns error:

static int
toku_db_open(DB * db, DB_TXN * txn, const char *fname, const char *dbname, DBTYPE dbtype, uint32_t flags, int mode) {
...
    r = toku_db_get(db->dbenv->i->directory, txn, &dname_dbt, &iname_dbt, DB_SERIALIZABLE); // allocates memory for iname
    std::unique_ptr<char[], decltype(&toku_free)> iname(
        static_cast<char *>(iname_dbt.data), &toku_free);
    if (r == DB_NOTFOUND && !is_db_create) {
        r = ENOENT;
    } else if (r==0 && is_db_excl) {
        r = EEXIST; // <------------------------ HERE!
    } else if (r == DB_NOTFOUND) {
...
}

i.e. as tokudb dir table still contains record for the "status" table, this record is used and file name for the "status" file is received from this table. As the "status" file was not deleted the error happens during attempt to create it, as a result - we get error 17 (EEXIST) from engine.

How to fix. I would propose to try to remove all table files ("main", "status") before returning error from ha_tokudb::delete_or_rename_table(). The "recovery" case must be considered separately.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

During work on this bug I found another minor bug.

Each tokudb table consist of the minimum two files - "main" and "status". Each file name contains postfix, this postfix is transaction id of transaction in which the table is created.

If some new table is created and tokudb directory table contains record for one of it's files ("main" or "status") then the file name is obtained from this record and if such file does not exist postfixes for "main" and "status" will be different, what is not expected.

See the following code:
static int
toku_db_open(DB * db, DB_TXN * txn, const char *fname, const char *dbname, DBTYPE dbtype, uint32_t flags, int mode) {
...
    r = toku_db_get(db->dbenv->i->directory, txn, &dname_dbt, &iname_dbt, DB_SERIALIZABLE); // allocates memory for iname
    std::unique_ptr<char[], decltype(&toku_free)> iname(
        static_cast<char *>(iname_dbt.data), &toku_free);
    if (r == DB_NOTFOUND && !is_db_create) {
        r = ENOENT;
    } else if (r==0 && is_db_excl) {
        r = EEXIST;
    } else if (r == DB_NOTFOUND) {
        iname = generate_iname_for_rename_or_open(db->dbenv, txn, dname, true);
        toku_fill_dbt(&iname_dbt, iname.get(), strlen(iname.get()) + 1);
        //
        // put_flags will be 0 for performance only, avoid unnecessary query
        // if we are creating a hot index, per #3166, we do not want the write lock in directory grabbed.
        // directory read lock is grabbed in toku_db_get above
        //
        uint32_t put_flags = 0 | ((is_db_hot_index) ? DB_PRELOCKED_WRITE : 0);
        r = toku_db_put(db->dbenv->i->directory, txn, &dname_dbt, &iname_dbt, put_flags, true);
    }

    // we now have an iname
    if (r == 0) {
        r = toku_db_open_iname(db, txn, iname.get(), flags, mode);
        if (r == 0) {
            db->i->dname = toku_xstrdup(dname);
            env_note_db_opened(db->dbenv, db); // tell env that a new db handle is open (using dname)
        }
    }
...
}

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3627

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.