Percona Server with XtraDB

Percona Server Assertion Failure in ha_innodb.cc

Reported by Blake Harps on 2012-12-04
This bug affects 1 person
Affects              Status    Importance  Assigned to  Milestone
Percona Server       Expired   Undecided   Unassigned
Percona Server 5.5   Expired   Undecided   Unassigned

Bug Description

Posted this on the Percona forum and was told to post this over here. Quick background:

Running Percona Server 5.5.22 with over 2,500 databases of ~100 tables each (we're a SaaS provider). I'm continuing to see the assertion failures pop up. Five days will go by without issue, taking both reads and writes, and then it'll crash. After recovery it will immediately crash again 2 days later. Rinse/repeat. I've restored from backups (ec2-consistent-snapshot on AWS) and caught up from the other master via replication, but the issue did not go away.

Error log:

121202 1:38:54 InnoDB: Assertion failure in thread 140707766650624 in file ha_innodb.cc line 4220
InnoDB: Failing assertion: share->idx_trans_tbl.index_count == mysql_num_index
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to x
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB:
InnoDB: about forcing recovery.
01:38:54 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=136
max_threads=500
thread_count=16
connection_count=16
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1126889 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x88dd16c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff91472be58 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7a3ec5]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x67f894]
/lib64/libpthread.so.0(+0xf4a0)[0x7ffbc66644a0]
/lib64/libc.so.6(gsignal+0x35)[0x7ffbc581e885]
/lib64/libc.so.6(abort+0x175)[0x7ffbc5820065]
/usr/sbin/mysqld[0x7fbe18]
/usr/sbin/mysqld(_ZN7handler7ha_openEP5TABLEPKcii+0x3e)[0x6818ae]
/usr/sbin/mysqld(_ZN12ha_partition4openEPKcij+0x36d)[0x95553d]
/usr/sbin/mysqld(_ZN7handler7ha_openEP5TABLEPKcii+0x3e)[0x6818ae]
/usr/sbin/mysqld(_Z21open_table_from_shareP3THDP11TABLE_SHAREPKcjjjP5TABLEb+0x58c)[0x6001dc]
/usr/sbin/mysqld(_Z10open_tableP3THDP10TABLE_LISTP11st_mem_rootP18Open_table_context+0xc53)[0x5535c3]
/usr/sbin/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjjP19Prelocking_strategy+0x486)[0x5542d6]
/usr/sbin/mysqld(_Z20open_and_lock_tablesP3THDP10TABLE_LISTbjP19Prelocking_strategy+0x44)[0x554d24]
/usr/sbin/mysqld[0x5830a4]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1216)[0x586a26]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x333)[0x58a323]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x15b2)[0x58b982]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0xd7)[0x623a97]
/usr/sbin/mysqld(handle_one_connection+0x51)[0x623bd1]
/lib64/libpthread.so.0(+0x77f1)[0x7ffbc665c7f1]
/lib64/libc.so.6(clone+0x6d)[0x7ffbc58d170d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7ff73ac24930): is an invalid pointer
Connection ID (thread ID): 5798290
Status: NOT_KILLED

The manual page at x contains
information that should help you find out what is causing the crash.
121202 01:38:57 mysqld_safe Number of processes running now: 0

I've tried everything to reproduce the issue, but even with the general log on, I can't find the exact statement that crashes the server. The next step would be to dump network traffic and analyze it after the next crash.

Blake Harps (e-b0ake-d) wrote :

Upgraded to 5.5.28 to check whether the issue still occurs. I will update this ticket when either the server crashes again or it survives a full week.

If it continues to crash, I would be interested in investing in a support contract to get this looked at.

Blake Harps (e-b0ake-d) wrote :

Upgrading to 5.5.28 did not fix the issue; the server continues to crash every few days under load.

121223 8:30:11 InnoDB: Assertion failure in thread 140082639791872 in file ha_innodb.cc line 4360
InnoDB: Failing assertion: share->idx_trans_tbl.index_count == mysql_num_index
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
08:30:11 UTC - mysqld got signal 6 ;

I've tried emailing in for a price quote on getting this issue looked at, but haven't received a response back.

Blake Harps <email address hidden> writes:
> I've tried emailing in for a price quote on getting this issue looked
> at, but haven't received a response back.

FYI I've pinged people internally on this.

--
Stewart Smith

Ovais Tariq (ovais-tariq) wrote :

Hi Blake,

I have looked at the error-log entries you posted and the relevant parts of the code. InnoDB is crashing inside the function "innobase_build_index_translation". In simple terms, this function builds a translation table that maps the indexes defined on a table at the MySQL server level to the indexes defined at the InnoDB level. The function raises an assertion when the number of indexes defined at the MySQL server level does not match the actual number of indexes in the InnoDB table. The translation table is built from inside ha_innobase::open, which is executed whenever an InnoDB table is accessed by a user query or an administrative query. Note that when indexes are defined, the MySQL server and the InnoDB storage engine each track them separately.
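To make the mechanism concrete, here is a minimal, hedged Python rendering of what the failing assertion enforces. The real code is C++ in ha_innodb.cc and handles details skipped here (for example InnoDB's implicit GEN_CLUST_INDEX clustered index on tables without an explicit PRIMARY KEY); this is only a sketch of the idea, not the actual implementation:

```python
# Sketch: the server layer passes in the index names declared in the .frm;
# InnoDB resolves each against its own data dictionary to build the
# translation table. When the two sides disagree on how many indexes exist,
# the server aborts rather than run with a mismatched definition.
def build_index_translation(mysql_index_names, innodb_index_names):
    if len(mysql_index_names) != len(innodb_index_names):
        # corresponds to the assertion from the error log:
        # share->idx_trans_tbl.index_count == mysql_num_index
        raise AssertionError("index count mismatch between server and InnoDB")
    # map each server-level index name to its position in InnoDB's dictionary
    return {name: innodb_index_names.index(name) for name in mysql_index_names}
```

For example, a table whose .frm declares PRIMARY and idx_a while InnoDB's dictionary only knows PRIMARY would trip the count check, which is exactly the kind of mismatch the error log is reporting.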

At the moment, I think the inconsistency between the indexes InnoDB reports and the indexes the MySQL server thinks should be present could be caused by data corruption. I suggest running CHECK TABLE to check for corrupt tables, although I realize that with the number of tables you have this could be quite cumbersome. You mentioned that the general log was turned on when the crash happened. The general log records the date and time of each query, so what we can do is filter out the queries that were executed at the time of the crash (as reported in the error log) and then run CHECK TABLE on the tables those queries access.
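This filtering step could be automated along the following lines. Everything here is an assumption for illustration: the crash timestamp is taken from the first error-log excerpt, the log-line shape mimics a 5.5-style general query log, and the regex is a deliberately naive way to pull table names out of queries (real general logs wrap long queries over several lines and would need a more careful parser):

```python
import re

# timestamp prefix from "121202 1:38:54 ... Assertion failure" (hypothetical)
CRASH_MINUTE = "121202  1:38:"

# very rough extraction of db.table names following common SQL keywords
TABLE_RE = re.compile(
    r"\b(?:FROM|INTO|UPDATE|JOIN)\s+`?(\w+)`?\.`?(\w+)`?", re.IGNORECASE)

def tables_near_crash(log_lines, minute_prefix=CRASH_MINUTE):
    """Collect `db`.`table` names from log lines in the crash minute."""
    tables = set()
    for line in log_lines:
        if line.startswith(minute_prefix):
            for db, tbl in TABLE_RE.findall(line):
                tables.add(f"`{db}`.`{tbl}`")
    return sorted(tables)

# hypothetical log excerpt, just to show the shape of the output
sample_log = [
    "121202  1:38:53\t 5798290 Query\tSELECT * FROM brandX_affiliate.stats_daily WHERE day = CURDATE()",
    "121202  1:38:53\t 5798291 Query\tUPDATE brandX_affiliate.clicks SET n = n + 1 WHERE id = 7",
    "121201 23:10:02\t 5790001 Query\tSELECT 1",
]
for t in tables_near_crash(sample_log):
    print(f"CHECK TABLE {t};")
```

The emitted CHECK TABLE statements can then be replayed against a replica rather than the busy primary.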

Blake Harps (e-b0ake-d) wrote :

Hey Ovais,

I've attempted to find a good way to audit indexes that exist in MySQL vs those reported by InnoDB. It's definitely a large undertaking due to the number of tables we have. Querying the information_schema INNODB tables for index information proved extremely resource intensive and ended up locking up the replication thread and eventually the entire server instance. I think I could pull a lot of that data out onto a different test box and analyze it though.

Any other ideas for comparing indexes?
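Blake's "pull the data onto a test box" idea could be sketched as below. The assumptions: two per-table index-count exports exist, one from the server layer (e.g. a GROUP BY over information_schema.STATISTICS) and one from InnoDB itself (the XtraDB INFORMATION_SCHEMA tables INNODB_SYS_INDEXES joined to INNODB_SYS_TABLES), keyed by the 'db/table' names InnoDB uses. Comparing the dumps offline avoids the heavy I_S queries that locked up the live server. This is simplified; among other things, InnoDB's count can legitimately include its implicit clustered index (GEN_CLUST_INDEX) on tables with no explicit PRIMARY KEY:

```python
def index_mismatches(server_counts, innodb_counts):
    """Return {table: (server_count, innodb_count)} where the two disagree."""
    mismatches = {}
    for table in set(server_counts) | set(innodb_counts):
        s = server_counts.get(table)   # None if the table is missing entirely
        i = innodb_counts.get(table)
        if s != i:
            mismatches[table] = (s, i)
    return mismatches

# hypothetical dumps, keyed the way InnoDB names tables and partitions
server_side = {"brandX_affiliate/stats_daily": 5, "brandX_affiliate/clicks": 2}
innodb_side = {"brandX_affiliate/stats_daily": 4, "brandX_affiliate/clicks": 2}
print(index_mismatches(server_side, innodb_side))
```

Any table that appears in the output is a candidate for CHECK TABLE or a dump-and-reload on the live system.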

Ovais Tariq (ovais-tariq) wrote :

Hi Blake,

You do not need to audit the indexes that exist in MySQL against those reported by InnoDB manually; InnoDB does this automatically whenever a table is opened. As you can see in the error-log entry, InnoDB deliberately crashed itself because an assertion related to this index audit failed. This is not a situation that should happen, and the only way it can happen is through data corruption in a particular table. So, again, you do not need to audit the indexes manually; what you need to do is find the table that is corrupted. Since you already have a general log from the time of one of the crashes, it can be used to find the table in question: check the queries in the general log that were executed at the time the crash is reported in the error log, see which tables those queries involve, and run CHECK TABLE on those tables.

Blake Harps (e-b0ake-d) wrote :

We survived almost 9 days without a crash while I was running through and checking tables on a separate slave to try to isolate the bad table(s), but the server ended up dying yesterday morning with a different assertion error that mirrors what you anticipated:

130106 6:25:34 [ERROR] Index PRIMARY of brandX_affiliate/stats_daily#P#p90 has 4 columns unique inside InnoDB, but MySQL is asking statistics for 5 columns. Have you mixed up .frm files from different installations? See http://dev.mysql.com/doc/refman/5.5/en/innodb-troubleshooting.html

130106 6:25:34 InnoDB: Assertion failure in thread 139967370213120 in file ha_innodb.cc line 6845
InnoDB: Failing assertion: ut_strcmp(index->name, key->name) == 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
06:25:34 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=144
max_threads=500
thread_count=13
connection_count=13
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1126896 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x88d0c5e0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f4cb161ae58 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7ab8b5]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x6874b4]
/lib64/libpthread.so.0(+0xf4a0)[0x7f4f6b7884a0]
/lib64/libc.so.6(gsignal+0x35)[0x7f4f6a942885]
/lib64/libc.so.6(abort+0x175)[0x7f4f6a944065]
/usr/sbin/mysqld[0x81c984]
/usr/sbin/mysqld[0x82496b]
/usr/sbin/mysqld[0x82570d]
/usr/sbin/mysqld(_ZN7handler7ha_openEP5TABLEPKcii+0x3e)[0x6898ce]
/usr/sbin/mysqld(_ZN12ha_partition4openEPKcij+0x565)[0x9662f5]
/usr/sbin/mysqld(_ZN7handler7ha_openEP5TABLEPKcii+0x3e)[0x6898ce]
/usr/sbin/mysqld(_Z21open_table_from_shareP3THDP11TABLE_SHAREPKcjjjP5TABLEb+0x58c)[0x6049bc]
/usr/sbin/mysqld(_Z10open_tableP3THDP10TABLE_LISTP11st_mem_rootP18Open_table_context+0xc53)[0x556713]
/usr/sbin/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjjP19Prelocking_strategy+0x48d)[0x55742d]
/usr/sbin/mysqld(_Z20open_and_lock_tablesP3THDP10TABLE_LISTbjP19Prelocking_strategy+0x44)[0x557e74]
/usr/sbin/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4...


Assuming that you eventually restored the affected tables/data, do you still see errors like this:

130106 6:25:34 [ERROR] Index PRIMARY of brandX_affiliate/stats_daily#P#p90 has 4 columns unique inside InnoDB, but MySQL is asking statistics for 5 columns.

and assertions mentioned in this report with recent Percona Server 5.5.x versions?

Changed in percona-server:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Percona Server because there has been no activity for 60 days.]

Changed in percona-server:
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for Percona Server 5.5 because there has been no activity for 60 days.]
