Percona Server with XtraDB

Percona Server crashes with segmentation fault when disk quota is reached

Reported by Ovais Tariq on 2012-11-16
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server
Low
George Ormond Lorch III
5.1
Undecided
Unassigned
5.5
Low
George Ormond Lorch III

Bug Description

Percona Server crashes with segmentation fault when mysqld tries to write data to a table that would mean extending the tablespace file, and disk quota is reached outside the database directory, but on the same volume.

=== OS:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.4 LTS"

2.6.32-341-ec2 #42-Ubuntu SMP Tue Dec 6 14:56:13 UTC 2011 x86_64 GNU/Linux

=== Percona Server version
Server version: 5.5.24-55-log Percona Server (GPL), Release 26.0

-- Filesystem and mount options
/dev/sdm /vol/ebs1 xfs rw,nodiratime,relatime,nouuid,nobarrier,allocsize=2048k,logbufs=8,logbsize=32k,grpquota 0 0

=== Test case
-- create a new group and set quota of 10M for testing purposes
groupadd free2
xfs_quota -x -c 'limit -g bsoft=10m bhard=10m free2' /vol/ebs1

-- create database and change the group to free2 and setgid for the database directory so that all files created in the directory are owned by the user free2
mysql [(none)]> create database free2test;

chgrp -R free2 /vol/ebs1/mysql/free2test
chmod +s /vol/ebs1/mysql/free2test

-- create the table as follows in the database free2test
mysql [(none)]> use free2test;
Database changed
mysql [free2test]> CREATE TABLE `t1` ( `i` int(11) not null auto_increment primary key, c char(255) default 'dummy_text' ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.03 sec)

-- now create a 9M file outside of the database directory but in /vol/ebs1 where the quota is applied and change its group to free2 (the one on which quota is set)
dd if=/dev/zero of=/vol/ebs1/testfile_9M bs=9000000 count=1
chgrp free2 /vol/ebs1/testfile_9M

-- now insert values into the table till MySQL crashes
insert into t1(i) values (null);
insert into t1 select null, c from t1; <-- 11th time this is executed and MySQL crashes

The gdb backtrace of the crash is attached. The section to look at is "Thread 1 (Thread 10457)", the segmentation fault is caused inside the function btr_page_split_and_insert.

Ovais Tariq (ovais-tariq) wrote :

Is this reproducible with -
- Current 5.5 release?
- Current 5.1 release?
- Is this bug Percona Server-specific or is it reproducible with MySQL too?

Also would be useful to try reproducing this using a debug build.

Alexey Kopytov (akopytov) wrote :

I believe it applies to upstream as well. In which case, it should also have an upstream bug report linked.

Ovais Tariq (ovais-tariq) wrote :

I am able to repeat the crash with the test case provided above on both Percona Server 5.5.24 and 5.5.28-29.1 on CentOS 5.8

Ovais Tariq (ovais-tariq) wrote :

The crash is also repeatable on upstream 5.5.28 and the stack trace is exactly the same. Attached you will find the complete backtrace generated from the core file.

Based on the trace, the call stack is probably like: (in reverse order)

fsp_try_extend_data_file
fsp_reserve_free_extents
fseg_alloc_free_page_general
btr_page_alloc_low
btr_page_alloc
btr_page_split_and_insert

(with no NULL checking in btr_page_alloc and btr_page_split_and_insert -- atleast not for the return case, it is checked as

 if (new_block) {
  buf_block_dbg_add_level(new_block, SYNC_TREE_NODE_NEW);
 }

)

fsp_try_extend_data_file will fail if there is no space or quota limits (ENOSPC or EDQUOT respectively).

Non-innodb code -- my_write in my_write.c handles this with a loop till space is available (with special handling for EDQUOT) whereas os_file_write_func doesn't do that and fails if ENOSPC/EDQUOT.

Ovais Tariq (ovais-tariq) wrote :

I have reported the bug upstream: http://bugs.mysql.com/bug.php?id=67606

Ovais -

Thanks. Could you also check 5.1 (since we will start fixing it there), and if debug builds crash differently?

Ovais Tariq (ovais-tariq) wrote :

Error log of MySQl 5.5.28 crash

Ovais Tariq (ovais-tariq) wrote :

Laurynas,

I am not able to repeat the crash on MySQL 5.1.66. The exact steps when done on 5.1.66 are handled gracefully and when there is no space available the following reported:

mysql> insert into t1 select null, c from t1;
ERROR 1114 (HY000): The table 't1' is full
mysql> insert into t1 select null, c from t1;
ERROR 1114 (HY000): The table 't1' is full

The error log entries are also different:
Version: '5.1.66-community' socket: '/mnt/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL)
121117 3:06:58 InnoDB: Error: Write to file ./free2test/t1.ibd failed at offset 0 131072.
InnoDB: 16384 bytes should have been written, only -1 were written.
InnoDB: Operating system error number 122.
InnoDB: Check that your OS and file system support files of this size.
InnoDB: Check also that the disk is not full or a disk quota exceeded.
InnoDB: Error number 122 means 'Disk quota exceeded'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/operating-system-error-codes.html
121117 3:06:58 [ERROR] /usr/sbin/mysqld: The table 't1' is full
121117 3:07:13 [ERROR] /usr/sbin/mysqld: The table 't1' is full

Note that MySQL 5.5.28 reports "Operating system error number 0" while 5.1.66 reports "Operating system error number 122" which is more correct.

Ovais -
Thanks. Can you test the 5.5 debug build too?

Ovais Tariq (ovais-tariq) wrote :

Laurynas,

I am able to reproduce the crash on upstream 5.5.28 debug-build. The debug-build crashes with an assertion failure instead of segmentation fault. Attached is the gdb backtrace of the crashed 5.5.28 debug-build

Ovais Tariq (ovais-tariq) wrote :

Attached is the error log of the crashed upstream 5.5.28 debug-build

I tried this against 5.5.-28.29.1 on CentOS 6, using xfs and the sequence described above but with no crash. I properly got on the 15th insert:

mysql> insert into t1 select null, c from t1;
ERROR 1114 (HY000): The table 't1' is full

Perhaps there is some minor difference in xfs between Ubuntu, CentOS 5 and Centos 6?

I will try with upstream 5.5.28...

OK, must be some difference in xfs, MySQL 5.5.28 behaves correctly as well (unless I am missing something in the reproduction).

I will try to create a VM with Ubuntu and see if I can reproduce there...

1)

The assertion failure in #12 and #13 is the effect and not the cause of the crash.

btr0btr.c:

 new_block = btr_page_alloc(cursor->index, hint_page_no, direction,
       btr_page_get_level(page, mtr), mtr, mtr);
 new_page = buf_block_get_frame(new_block);

btr_page_alloc returns NULL and buf_block_get_frame fails on that assertion.

It falls for this assertion

 ut_ad(block);

Interestingly, PS has non-debug assertions in place there:

 ut_a(srv_pass_corrupt_table || block);

 if (srv_pass_corrupt_table && !block) {
  return(0);
 }

 ut_ad(block);

Now, the reason why that assertion is not getting triggered in MySQL/PS release builds (and instead crashing with signal 11) is because:

=====================================================

#ifdef UNIV_DEBUG
/*********************************************************************//**
Gets a pointer to the memory frame of a block.
@return pointer to the frame */
UNIV_INLINE
buf_frame_t*
buf_block_get_frame(
/*================*/
 const buf_block_t* block) /*!< in: pointer to the control block */
 __attribute__((pure));
#else /* UNIV_DEBUG */
# define buf_block_get_frame(block) (block ? (block)->frame : 0)
#endif /* UNIV_DEBUG */

==============================================================

ie. in non UNIV_DEBUG builds, it is a macro, hence the backtrace just shows only upto btr_page_split_and_insert

2) Regarding 'The table ... is full', fsp_reserve_free_extents used by btr_cur_pessimistic_insert returns that (and not proceed to btr_page_split_and_insert later). fsp_reserve_free_extents is also used by btr_page_alloc. However, fsp_reserve_free_extents doesn't always try to extend the file, it tries to reuse the free extents in the tablespace (and/or reuse free pages for small single tablespaces).
    @Ovais, can you provide the cnf file you used for the MySQL/PS instances mentioned above? (also was it a shared tablespace (spaceid=0) or single tablespace)

Ovais Tariq (ovais-tariq) wrote :

Here is the my.cnf file I used in the tests:

[client]
socket = /tmp/mysql.sock

[mysqld]
datadir = /mnt/mysql-55
user = mysql
socket = /tmp/mysql.sock

innodb_file_per_table = 1

core-file

[mysqld_safe]
core-file-size = unlimited

Ovais Tariq (ovais-tariq) wrote :

George,

Did you test with innodb_file_per_table enabled?

Ovais, oddly enough, no I did not use file_per_table which is quite odd as I almost always use it. Anyway, I retested on CentOS 6 with PS 5.5.28-29.1 debug builds (both with and without WITH_DEBUG=ON) and file_per table, also tried a few different things on what files are contained in the free2 group and get either:
"ERROR 1114 (HY000): The table 't1' is full"
from mysql if I "chgrp -R free2 /5gb/mysql/free2test; chmod +s /5gb/mysql/free2test" _after_ I create the table or I get:
121120 9:26:02 [ERROR] /usr/local/dev/ps-5.5-27444-debug/bin/mysqld: Disk is full writing './mysql-bin.000003' (Errcode: 122). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
121120 9:26:02 [ERROR] /usr/local/dev/ps-5.5-27444-debug/bin/mysqld: Retry in 60 secs. Message reprinted in 600 secs
from mysqld if I follow the test procedure exactly.

I failed miserably at getting CentOS 5 VM running and building yesterday, wow, Stewart was correct, it really does, uhh, stink...so I will try Ubuntu 10.4 lucid to try and save some time rather than fighting anymore with CentOS 5 and see if I can reproduce.

My specifics:

mount:
/dev/sdb on /5gb type xfs (rw,nodiratime,relatime,nouuid,nobarrier,grpquota)

my.cnf:
[client]
port = 3306
socket = /tmp/mysql.sock

[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 256M
max_allowed_packet = 1M
table_open_cache = 256
sort_buffer_size = 1M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size= 16M
thread_concurrency = 8
log-bin=mysql-bin
binlog_format=mixed
server-id = 1
innodb_data_home_dir = /5gb/mysql/free2test
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /5gb/mysql/free2test
innodb_buffer_pool_size = 256M
innodb_additional_mem_pool_size = 20M
innodb_log_file_size = 64M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_file_per_table=1
core-file

[root@localhost glorch]# ls -l /5gb/mysql/free2test/
total 149616
drwx------. 2 mysql free2 45 Nov 20 09:25 free2test
-rw-rw----. 1 mysql free2 18874368 Nov 20 09:35 ibdata1
-rw-rw----. 1 mysql free2 67108864 Nov 20 09:35 ib_logfile0
-rw-rw----. 1 mysql free2 67108864 Nov 20 09:23 ib_logfile1
-rw-rw----. 1 mysql free2 5 Nov 20 09:23 localhost.pid
drwx------. 2 mysql free2 4096 Nov 20 09:23 mysql
-rw-rw----. 1 mysql free2 27338 Nov 20 09:23 mysql-bin.000001
-rw-rw----. 1 mysql free2 126 Nov 20 09:23 mysql-bin.000002
-rw-rw----. 1 mysql free2 65536 Nov 20 09:35 mysql-bin.000003
-rw-rw----. 1 mysql free2 57 Nov 20 09:23 mysql-bin.index
drwx------. 2 mysql free2 4096 Nov 20 09:23 performance_schema
drwx------. 2 mysql free2 6 Nov 20 09:23 test

[root@localhost glorch]# ls -l /5gb/mysql/free2test/free2test/
total 9232
-rw-rw----. 1 mysql free2 65 Nov 20 09:24 db.opt
-rw-rw----. 1 mysql mysql 8578 Nov 20 09:25 t1.frm
-rw-rw----. 1 mysql mysql 9437184 Nov 20 09:35 t1.ibd

Ovais Tariq (ovais-tariq) wrote :

George,

I only get the crash when I create a file out of MySQL but within the same group on which the quota is applied. Perhaps you didn't actually run the steps in the sequence that I mentioned? And I see that you also have binlogging enabled, so one of the messages that you see "Waiting for someone to free space" is binlogging related.

I will try to run my test case on a CentOS 6 VM and let you know.

OK, I got it Ovais, was just replying when I got your response here. My directory structure was slightly different than yours. I got it under Ubuntu 10.04.4 lucid using your my.cnf and a closer approximation of your directory structure. On initial inspection, Raghavendras analysis of the issue above is spot on. I will switch back to CentOS 6 (my preferred environment) and retry there, then continue on with the fix. Thanks for all the info.

Annnd, I got it in CentOS 6 now too. Thanks a bunch guys...possible fix forthcoming.

Now, that we have verified it, we need to see what should the behaviour be in this case -- whether to return DB_OUT_OF_FILE_SPACE or to spin and wait for disk space to be freed / quota to be increased (how mysql/myisam do). In former case (DB_OUT_OF_FILE_SPACE), there is still a high probability of it crashing again with an assertion failure (ex: btr_insert_on_non_leaf_level_func -- has a ut_a(err == DB_SUCCESS) and btr_cur_pessimistic_update has a ut_a(err == DB_SUCCESS) ).

Also, error handling is required in btr_cur_pessimistic_insert since btr_cur_pessimistic_insert doesn't seem to handle it well either

......
 } else {
  *rec = btr_page_split_and_insert(cursor, entry, n_ext, mtr);
 }
......
(big_rec needs to be set NULL as well as DB_OUT_OF_FILE_SPACE needs to be returned).

Ovais Tariq (ovais-tariq) wrote :

Raghu,

Related to what you are saying that InnoDB should spin and wait for space to be available, there is something similar in upstream 5.7.0 where MySQL retries the operation several times before giving up, see this bug report: http://bugs.mysql.com/bug.php?id=54430

Download full text (11.1 KiB)

OK, so this is rather nasty, in PS 5.5.28:

btr0cur.c:btr_pessimistic_insert(1578) calls fsp_reserve_free_extents to reserve enough real database pages, expanding the space if necessary, to perform the insert.

fsp0fsp.c:fsp_reserve_free_extents(3037) has an unsigned math error when FSP_SIZE is less than FSP_FREE_LIMIT which causes remainder of function to believe that there are an extremely large number of remaining extents, so does not attempt to extend the data file: n_free_up = (size - free_limit) / FSP_EXTENT_SIZE; It returns success without actually extending data file and never detects the quota issue.

So then btr0cur.c:btr_pessimistic_insert(1627) goes on about its businesss, assuming that enough pages have been reserved to perform the insert. The index insertion mechanism _demands_ that pages have already been reserved and has no rational error handling for no space or other page/filesystem errors.

It then calls:
btr0btr.c:btr_page_split_and_insert(2577) which tries to allocate a page via btr_page_alloc (at line 2666), which fails and returns a NULL (which is never tested for). Then it calls buf_block_get_page_zip with returned NULL, and asserts.

OK, so, lets fix the math error in fsp_reserve_free_extents by changing line 3037 from:
    n_free_up = (size - free_limit) / FSP_EXTENT_SIZE;
to:
    if (size <= free_limit) {
        n_free_up = 0;
    } else if (alloc_type == FSP_UNDO) {
        n_free_up = (size - free_limit) / FSP_EXTENT_SIZE;
    }

OK, so this solves the issue for _small_ inserts, like if the test above just did something like:
    for i in `seq 2000`; do mysql -e 'INSERT INTO free2test.t1(i) VALUES (null)'; done;
But, if we follow the sequence prescribed above by doing:
    mysql -e 'INSERT INTO free2test.t1 SELECT null FROM free2test.t1'

Then we have a big problem. There is no space left to perform the rollback. If we set a simple break point at the end of fsp_reserve_free_extents on "if (success && n_pages_added > 0)" :
    b fsp0fsp.c:3088 if space > 0 && (!success || !n_pages_added)
This will stop us every time we 'detect' the error...

Here is the call stack where we first detect the space issue:
#0 fsp_reserve_free_extents (n_reserved=0x7fffdf707940, space=10, n_ext=3, alloc_type=1000000, mtr=0x7fffdf7079e0) at /home/glorch/dev/ps-5.5-27444-debug/Percona-Server/storage/innobase/fsp/fsp0fsp.c:3105
#1 0x0000000000879ecc in btr_cur_pessimistic_insert (flags=0, cursor=0x7fffdf707ed0, entry=0x7fffd000d668, rec=0x7fffdf707ec8, big_rec=0x7fffdf707ec0, n_ext=0, thr=0x7fffd000cca8, mtr=0x7fffdf7079e0)
    at /home/glorch/dev/ps-5.5-27444-debug/Percona-Server/storage/innobase/btr/btr0cur.c:1578
#2 0x0000000000960e12 in row_ins_index_entry_low (mode=<value optimized out>, index=0x7fffd0012968, entry=0x7fffd000d668, n_ext=0, thr=0x7fffd000cca8) at /home/glorch/dev/ps-5.5-27444-debug/Percona-Server/storage/innobase/row/row0ins.c:2190
#3 0x0000000000962681 in row_ins_index_entry (index=0x7fffd0012968, entry=0x7fffd000d668, n_ext=0, foreign=<value optimized out>, thr=0x7fffd000cca8) at /home/glorch/dev/ps-5.5-27444-debug/Percona-Server/storage/innobase/row/row0ins.c:2294
#4 0x00000000009631c5 in row_ins_index_e...

Ovais Tariq (ovais-tariq) wrote :

George,

I repeated the test case with innodb_file_per_table=0 but I could not get InnoDB to crash. The error log entry is same as what is written when InnoDB crashes:

121127 5:38:57 InnoDB: Error: Write to file ./ibdata1 failed at offset 0 20971520.
InnoDB: 1048576 bytes should have been written, only 196608 were written.
InnoDB: Operating system error number 0.
InnoDB: Check that your OS and file system support files of this size.
InnoDB: Check also that the disk is not full or a disk quota exceeded.
InnoDB: Error number 0 means 'Success'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/operating-system-error-codes.html

You can see that the error log entry makes reference to partial write and that the error number was 0, which is exactly what is written to the error log when InnoDB crashed. However, repeatedly executing the insert ... select only results in the following message:

121127 5:38:57 [ERROR] /usr/sbin/mysqld: The table 't1' is full
121127 5:39:04 [ERROR] /usr/sbin/mysqld: The table 't1' is full
121127 5:39:06 [ERROR] /usr/sbin/mysqld: The table 't1' is full

Could this be taken as a workaround "using single shared tablespace with innodb_file_per_table=0 does not produce a crash". May be you could test this out with your own test case that produces the crash.

Ovais, innodb_file_per_table will not change the core issue and that is that innodb does not properly reserve enough space for both the transaction modification _and_ potential rollback space at the same time, nor does it 'hold back' any space to be used exclusively for rollback.

I can make the server crash or fail gracefully by simply varying the number of rows inserted at one time which varies the both the required space to perform the inserts _and_ the space required to perform a rollback. By inserting a single row at a time, there is almost always going to be enough space left to perform a rollback when enough extents can't be reserved for the insert. By increasing the number of rows inserted at once until the balance of space reaches a point where the row size requirement is just less than the available free space is where the rollback crash will occur since then the attempt to reserve any extents to perform the rollback will fail, sending the server into a massive anxiety attack and shutting down.

So changing where the space is allocated is really just pushing peas around the plate.

See http://bugs.mysql.com/bug.php?id=43665

The money quote:
"[17 Mar 2009 9:39] Marko Mäkelä

Fixing this bug would be a major undertaking, and it might affect the performance of InnoDB as well. InnoDB assumes in many places that space is available for the undo log records. We could add a check (and a rollback) to each place where something is written to the undo log, but that would likely move the problem elsewhere, since B-tree delete operations sometimes require extra space. We could also shut down InnoDB in the event of such a failure, but that would require extra checking (possibly even locking) in the MySQL-to-InnoDB interface.

The bottom line is that the root cause (the InnoDB system tablespace being too small) cannot be fixed without a server restart. Currently, InnoDB can't be shut down and restarted independently of MySQL. The fix is too risky to be implemented in the stable releases of MySQL."

Ovais Tariq (ovais-tariq) wrote :

George,

Based on Marko's comments in this bug report http://bugs.mysql.com/bug.php?id=43665, checking for required space wherever needed in different parts of the code would lead to performance regression.
If there really would be any performance regression, I think many users would be more concerned by that rather than the fixes in tablespace management :)

I don't think "Fix Released" is a correct status for this bug. The server still crashes on full disk errors. What has been fixed is bug 1083700 (bug in reserved extent number calculation), and this one should be "Won't Fix" instead.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.