Corrupt ARCHIVE tables failing to repair

Bug #881383 reported by Dreas van Donselaar on 2011-10-25
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MariaDB
New
Medium
Michael Widenius

Bug Description

We regularly have corrupt ARCHIVE engine tables that fail to repair:
================================
MariaDB [(none)]> database;
Database changed
MariaDB [database]> repair table table;
+-----------------------------+--------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-----------------------------+--------+----------+----------+
| database.table | repair | error | Corrupt |
+-----------------------------+--------+----------+----------+
1 row in set (3.92 sec)

MariaDB [database]>
================================

I'm unsure what causes the corruption, but a repair shouldn't be failing I presume?

I'll upload a copy of an example table to FTP.

Sergei Golubchik (sergii) wrote :

on irc:
<dreas> Basically if you kill MariaDB whilst it's writing to an archive table, you can throw it away afterwards

Changed in maria:
milestone: none → 5.2
importance: Undecided → Medium
Elena Stepanova (elenst) wrote :

Hi Dreas,

There seems to be a workaround which allows to fix the corrupted tables: run REPAIR TABLE <tablename> EXTENDED twice on the table. See the output below (`t` it's pretty much a copy of one of your tables).

Both 'EXTENDED' and 'twice' is important. The second attempt of REPAIR .. EXTENDED takes much longer, and after that the table starts being usable again. I've seen the same effect on other tables you uploaded. Please try it out.

MariaDB [test]> check table t;
+--------+-------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------+-------+----------+----------+
| test.t | check | error | Corrupt |
+--------+-------+----------+----------+
1 row in set (11.52 sec)

MariaDB [test]> repair table t;
+--------+--------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------+--------+----------+----------+
| test.t | repair | error | Corrupt |
+--------+--------+----------+----------+
1 row in set (10.51 sec)

MariaDB [test]> check table t;
+--------+-------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------+-------+----------+----------+
| test.t | check | error | Corrupt |
+--------+-------+----------+----------+
1 row in set (1.65 sec)

MariaDB [test]> insert into t select * from corrupt_archive_tables.t1;
ERROR 1034 (HY000): Incorrect key file for table 't'; try to repair it

MariaDB [test]> repair table t extended;
+--------+--------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------+--------+----------+----------+
| test.t | repair | status | OK |
+--------+--------+----------+----------+
1 row in set (9.88 sec)

MariaDB [test]> check table t;
+--------+-------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------+-------+----------+----------+
| test.t | check | error | Corrupt |
+--------+-------+----------+----------+
1 row in set (6.59 sec)

MariaDB [test]> repair table t extended;
+--------+--------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------+--------+----------+----------+
| test.t | repair | status | OK |
+--------+--------+----------+----------+
1 row in set (40.35 sec)

MariaDB [test]> check table t;
+--------+-------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------+-------+----------+----------+
| test.t | check | status | OK |
+--------+-------+----------+----------+
1 row in set (6.60 sec)

MariaDB [test]> insert into t select * from corrupt_archive_tables.t1;
Query OK, 700000 rows affected (23.28 sec)
Records: 700000 Duplicates: 0 Warnings: 0

Elena Stepanova (elenst) wrote :

Hi Monty,

from IRC:
montywi: if you could just do a quick test of trying to repair an archive table that is 'too short' and check what happens...
montywi: ie, does it abort repair or does it get back most of the rows

One observation that seems to be an error rather than a limitation (I already mentioned it in the previous comment, will put in more details now, and with test SQL to reproduce instead of tables uploaded to our FTP).

After an archive table gets corrupted due to server being killed while writing into the table, REPAIR TABLE does not help, it leaves the table in the same corrupted state, no matter how many times we try. However, REPAIR TABLE .. EXTENDED does help, but only if it's executed twice.
I don't know why REPAIR does not work but EXTENDED does, much less why it requires two iterations, but it doesn't look right.
Noticeably, the second REPAIR..EXTENDED iteration takes much longer, so something definitely goes on there.

In the manual test below steps are described as comments, and actual SQL as commands, so you can copy-paste it into your client in two chunks (before and after server restart).

I tried MariaDB 5.2.10 and MariaDB 5.5.21, got the same results on both.
On MySQL 5.5.21, however, the first REPAIR (normal, not EXTENDED) returns OK, and all consequent commands too.

# Scenario to reproduce:

# start server, no specific parameters needed;
# create and populate a MyISAM table which we'll be reading from, as below:

# BEFORE server restart

DROP TABLE IF EXISTS t1, t2;
CREATE TABLE t1 ( f1 CHAR(16) ) ENGINE=MyISAM;
INSERT INTO t1 VALUES ('abc'),('def'),('ghi'),('jkl');

# create and start populating an archive table, as below:
CREATE TABLE t2 ( f1 CHAR(16) ) ENGINE=ARCHIVE;
INSERT INTO t2 SELECT a.* FROM t1 a, t1 b, t1 c, t1 d, t1 e, t1 f, t1 g, t1 h, t1 i, t1 j, t1 k, t1 l, t1 m;

# End of "BEFORE server restart"

##########################
# while the insert is running, kill the server with kill -9
# (we insert ~65M rows, so there will be time)
# start the server again (no complaints in error log about archive tables)
##########################

# AFTER server restart:

CHECK TABLE t2;
# It says the table is corrupted.
# You can run it several times if you wish, it's still corrupted.

REPAIR TABLE t2;
# Again, it says the table is corrupted,
# and you can run it several times, it's still corrupted.

OPTIMIZE TABLE t2;
# It says "Unknown - internal error 145 during operation"

# 145 is "table marked as crashed", so it's not wrong.
# Running it several times doesn't help, either.

REPAIR TABLE t2 EXTENDED;
# It says OK, but if you run CHECK TABLE now, it's still corrupted:

CHECK TABLE t2;
# (says "corrupt")

# Run repair .. .extended again (takes much longer):

REPAIR TABLE t2 EXTENDED;
# (says "OK")

# And now CHECK TABLE suddenly says it's OK, too:

CHECK TABLE t2;
# (says OK)

# And it's usable again:

SELECT COUNT(*) FROM t2;
# Returns some count>0;

SELECT * FROM t2 LIMIT 1;
# Returns a row

Changed in maria:
assignee: nobody → Michael Widenius (monty)
tags: added: archive corrupt
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers