MariaDB Crash executing user-defined variables on binlog

Bug #1037711 reported by DavidDucos
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
New
Undecided
Unassigned

Bug Description

We upgrade from 5.2.5-MariaDB-mariadb99-log to 5.5.25-MariaDB-log on a Slave. The master is 5.2.5-MariaDB-mariadb99-log.
After execute "start slave;", the slave crash.
The binlog execute until reaches a set of a user-defined variable:

# at 10826
#120816 11:02:20 server id 2101 end_log_pos 845941273 Query thread_id=468834444 exec_time=0 error_code=0
SET TIMESTAMP=1345129340/*!*/;
BEGIN
/*!*/;
# at 10905
#120816 11:02:20 server id 2101 end_log_pos 845941320 User_var
SET @`lastId`:=7521389/*!*/;

After recover, it continuous crash when must execute a set of a user-defined variable:

120816 12:56:39 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 5.5.25-MariaDB-log
key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=1
max_threads=1002
thread_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4249215 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x2aaafc016470
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x41cba0b8 thread_stack 0x48000
??:0(my_print_stacktrace)[0xa8783e]
??:0(handle_fatal_signal)[0x6d048c]
??:0(??)[0x3e8c40eb70]
??:0(THD::cleanup_after_query())[0x54a441]
??:0(Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned int))[0x7a7bb9]
??:0(apply_event_and_update_pos(Log_event*, THD*, Relay_log_info*))[0x50ca52]
??:0(exec_relay_log_event(THD*, Relay_log_info*))[0x514710]
??:0(handle_slave_sql)[0x515a24]
??:0(??)[0x3e8c40673d]
??:0(??)[0x3e8bcd44bd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x2aaafc01c2f3): is an invalid pointer
Connection ID (thread ID): 4
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Revision history for this message
DavidDucos (david-ducos) wrote :

I confirm this on versions:
5.2.5-MariaDB-mariadb99-log
5.3.2-MariaDB-beta-mariadb102-log

Steps to reproduce:
1- Install on DB-MASTER 5.2.5-MariaDB-mariadb99-log or 5.3.2-MariaDB-beta-mariadb102-log
2- Install on DB-SLAVE 5.5.25-MariaDB-log
3- Replicate DB-SLAVE from DB-MASTER
4- create a table
5- Insert data like this:
insert into DB.a values (1);
insert into DB.a values (2);
insert into DB.a values (3);
set @ant=4;
insert into DB.a values (@ant);
6- Replication fails and mysqld crashes

Revision history for this message
Elena Stepanova (elenst) wrote :

Hi,

Have there been any versions that you tried and did NOT see the problem?
This information could help us to analyze the issue faster.

Thank you.

Revision history for this message
DavidDucos (david-ducos) wrote :

Hi, I test only 5.2.5-MariaDB-mariadb99-log and 5.3.2-MariaDB-beta-mariadb102-log.

Revision history for this message
Elena Stepanova (elenst) wrote :

Hi,

I haven't been able to reproduce the failure so far (your set of commands works fine for me on a 5.3.2 => 5.5.25 setup).
Could you please attach your binary log file from the short test that you ran? It should be tiny, since you only have to run a few commands.

Thank you.

Revision history for this message
DavidDucos (david-ducos) wrote :
Download full text (4.1 KiB)

This is the mysqlbinlog of the relay-log :
# mysqlbinlog *bin.0* --base64-output=DECODE-ROWS -vv
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#120816 15:30:33 server id 202 end_log_pos 245 Start: binlog v 4, server v 5.5.25-MariaDB-log created 120816 15:30:33
# at 245
#120816 15:30:47 server id 202 end_log_pos 288 Rotate to slave-bin.000003 pos: 4
DELIMITER ;
DELIMITER /*!*/;
# at 4
#120816 15:30:47 server id 202 end_log_pos 245 Start: binlog v 4, server v 5.5.25-MariaDB-log created 120816 15:30:47
# at 245
#691231 19:00:00 server id 201 end_log_pos 0 Rotate to slave-bin.000001 pos: 481
# at 288
#120816 15:12:12 server id 201 end_log_pos 0 Start: binlog v 4, server v 5.3.2-MariaDB-beta-mariadb102-log created 120816 15:12:12
# at 529
#120816 15:31:34 server id 201 end_log_pos 545 Query thread_id=7 exec_time=0 error_code=0
SET TIMESTAMP=1345145494/*!*/;
SET @@session.pseudo_thread_id=7/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=192,@@session.collation_connection=192,@@session.collation_server=192/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 593
#120816 15:31:34 server id 201 end_log_pos 657 Query thread_id=7 exec_time=0 error_code=0
SET TIMESTAMP=1345145494/*!*/;
insert into DBA.olx_items_to_cleanup values (1,2,1,1)
/*!*/;
# at 705
#120816 15:31:34 server id 201 end_log_pos 684 Xid = 24
COMMIT/*!*/;
# at 732
#120816 15:32:08 server id 201 end_log_pos 748 Query thread_id=7 exec_time=0 error_code=0
SET TIMESTAMP=1345145528/*!*/;
BEGIN
/*!*/;
# at 796
#120816 15:32:08 server id 201 end_log_pos 792 User_var
SET @`ant`:=4/*!*/;
# at 840
#120816 15:32:08 server id 201 end_log_pos 907 Query thread_id=7 exec_time=0 error_code=0
SET TIMESTAMP=1345145528/*!*/;
insert into DBA.olx_items_to_cleanup values (2,2,1,@ant)
/*!*/;
# at 955
#120816 15:32:08 server id 201 end_log_pos 934 Xid = 28
COMMIT/*!*/;
DELIMITER ;
DELIMITER /*!*/;
# at 4
#120816 15:32:11 server id 202 end_log_pos 245 Start: binlog v 4, server v 5.5.25-MariaDB-log created 120816 15:32:11
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

This is the show slave status:

mysql [olx-staging-dc][db1-tester] ((none)) > show slave status \G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.100.41
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: slave-bin.000001
          Read_Master_Log_Pos: 934
               Relay_Log_File: slave-bin.000003
                Relay_Log_Pos: 732
        Relay_Master_Log_File: slave-bin.000001
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replic...

Read more...

Revision history for this message
DavidDucos (david-ducos) wrote :

Binlog and Relaylogs have the same name. They are divided by the folder.

Revision history for this message
DavidDucos (david-ducos) wrote :

On test 5.3 -> 5.5:
Master:
Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5.3 (Final)
Slave:
Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5.6 (Final)
On test 5.2 -> 5.5:
Master
Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5.3 (Final)
Slave:
Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5.3 (Final)

Revision history for this message
DavidDucos (david-ducos) wrote :

I think that this issue only affects at 64bit platforms
On my desktop machine (Linux 3.2.0-29-generic-pae #46-Ubuntu SMP Fri Jul 27 17:25:43 UTC 2012 i686 i686 i386 GNU/Linux) works fine between 5.2.5 -> 5.5.25
Have you tested the x86_64 packages?

Revision history for this message
DavidDucos (david-ducos) wrote :

I made another test. I install MariaDB 5.5.25 from yum repository and now it seems to work. So, this issue affects only to the rpm version.

Revision history for this message
Elena Stepanova (elenst) wrote :

Hi,

Yes, I tried the steps you suggested on 5.3.2 => 5.5.25 Linux binary tarballs, both x86_64 and i686, with no luck.

In the archive you attached, MASTER binary logs are empty.
At the same time, SHOW SLAVE STATUS output you quoted points at a master binary log 000001 pos 934, but there is no such position in MASTER/slave-bin.000001.
So, where do the changes to the slave come from?

Also, lets reconcilie our algorithms to reproduce, just to be sure.

Do I understand correctly that the following set of commands executed on MASTER (5.2.5 or 5.3.2) makes your SLAVE (5.5.25) crash?
(It is the same set that you previously suggested, I only added the schema and table creation).

create database if not exists DB;
drop table if exists DB.a;
create table DB.a (i int) engine=MyISAM;
insert into DB.a values (1);
insert into DB.a values (2);
insert into DB.a values (3);
set @ant=4;
insert into DB.a values (@ant);

If it does crash your slave, please send your my.cnf (or whatever configuration files and/or command-line options you use) for both master and slave.

If it does not crash your slave, could you please point out at a difference between these steps and the way you reproduce the issue?

Thank you.

Revision history for this message
DavidDucos (david-ducos) wrote :

Master binlog

Revision history for this message
DavidDucos (david-ducos) wrote :

I attached the master binlog. Before, I uploaded a old file (Sorry!).

The sequence of commands is correct and it crash my slave. As I tell you before, I install via yum and now it is working. I tried to reproduce once again the problem, erasing MariaDB and installing via .rpm files but now the slave is not crashing.

So, we are going to upgrade another server to MariaDB 5.5.25 via yum and see if this issue appears.

I will keep you informed.

Revision history for this message
Elena Stepanova (elenst) wrote :

Did you happen to backup your cnf files before the consequent re-installations? My best guess so far is that there was some peculiar configuration which got overridden after you re-installed... Not that a peculiar configuration is an excuse for a crash, but if it was the reason, it might be quite difficult to guess what caused the problem without seeing the config files.

Revision history for this message
DavidDucos (david-ducos) wrote :

The first time, after install, we recover the config files that were overridden.

We re kick the db with 5.2.5 and decide to upgrade again to 5.5.25 and fails again setting a user variables.

We change the config files to the minimal configuration and it didn't work. It keeps crashing.

Revision history for this message
DavidDucos (david-ducos) wrote :
Revision history for this message
DavidDucos (david-ducos) wrote :

Can you test the procedure again with binlog_format = 'ROW'; ?
ie:
SET SESSION binlog_format = 'ROW';
create database if not exists DB;
drop table if exists DB.a;
create table DB.a (i int) engine=MyISAM;
insert into DB.a values (1);
insert into DB.a values (2);
insert into DB.a values (3);
set @ant=4;
insert into DB.a values (@ant);

Revision history for this message
DavidDucos (david-ducos) wrote :
Revision history for this message
DavidDucos (david-ducos) wrote :

I think that is a bug, I have reproduced in QA and Live environments. I will upload the list of package installed, so you can tell me which ones differ from your test and my test.

Revision history for this message
DavidDucos (david-ducos) wrote :

We execute gdb to debug mariaDB. This is what we found:

(gdb) continue
Continuing.
[New Thread 0x41923940 (LWP 12579)]
[New Thread 0x4196b940 (LWP 12580)]
[New Thread 0x419b3940 (LWP 12581)]
[New Thread 0x4b07e940 (LWP 12582)]
[New Thread 0x4b0c6940 (LWP 12583)]
[Thread 0x41923940 (LWP 12579) exited]
[Thread 0x419b3940 (LWP 12581) exited]
[New Thread 0x419b3940 (LWP 12584)]
[New Thread 0x41923940 (LWP 12585)]
[Thread 0x4b07e940 (LWP 12582) exited]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x41923940 (LWP 12585)]
0x000000000054a441 in THD::cleanup_after_query() ()
(gdb)
Continuing.
Detaching after fork from child process 12840.
[Thread 0x4b0c6940 (LWP 12583) exited]

Program received signal SIGSEGV, Segmentation fault.
0x00000031b800bd72 in pthread_kill () from /lib64/libpthread.so.0
(gdb)
Continuing.
[Thread 0x419b3940 (LWP 12584) exited]
[Thread 0x41923940 (LWP 12585) exited]
[Thread 0x42428940 (LWP 12516) exited]
[Thread 0x40bac940 (LWP 12517) exited]
[Thread 0x42e29940 (LWP 12518) exited]
[Thread 0x4382a940 (LWP 12519) exited]
[Thread 0x4422b940 (LWP 12520) exited]
[Thread 0x44c2c940 (LWP 12521) exited]
[Thread 0x4562d940 (LWP 12522) exited]
[Thread 0x4602e940 (LWP 12523) exited]
[Thread 0x46a2f940 (LWP 12524) exited]
[Thread 0x47430940 (LWP 12525) exited]
[Thread 0x47e31940 (LWP 12526) exited]
[Thread 0x48832940 (LWP 12528) exited]
[Thread 0x49233940 (LWP 12529) exited]
[Thread 0x49c34940 (LWP 12530) exited]
[Thread 0x4a635940 (LWP 12531) exited]
[Thread 0x418db940 (LWP 12532) exited]
[Thread 0x4b036940 (LWP 12533) exited]
[Thread 0x40bf4940 (LWP 12536) exited]
[Thread 0x4196b940 (LWP 12580) exited]

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

# rpm -qf /lib64/libpthread.so.0
glibc-2.5-81.el5_8.4

Revision history for this message
DavidDucos (david-ducos) wrote :

In another execution of gdb, I see:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x4b902940 (LWP 16432)]
0x00000031b7878082 in strcmp () from /lib64/libc.so.6
(gdb) Quit
(gdb)
Continuing.
[Thread 0x416c1940 (LWP 16437) exited]
[Thread 0x40a5c940 (LWP 16440) exited]
[New Thread 0x41799940 (LWP 16448)]
[Thread 0x40900940 (LWP 16441) exited]
[Thread 0x41751940 (LWP 16427) exited]
Detaching after fork from child process 16454.
[Thread 0x417e1940 (LWP 16435) exited]
[Thread 0x40334940 (LWP 15236) exited]
[New Thread 0x417e1940 (LWP 16455)]
[New Thread 0x40334940 (LWP 16456)]
[New Thread 0x41751940 (LWP 16457)]
[Thread 0x41799940 (LWP 16448) exited]

Program received signal SIGSEGV, Segmentation fault.
0x00000031b800bd72 in pthread_kill () from /lib64/libpthread.so.0

Revision history for this message
DavidDucos (david-ducos) wrote :

I'm continuing the debugging:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40820940 (LWP 17747)]
0x0000000000ab6074 in strnmov ()
(gdb) bt
#0 0x0000000000ab6074 in strnmov ()
#1 0x0000000000532035 in create_table_def_key(THD*, char*, TABLE_LIST const*, bool) ()
#2 0x000000000053ae88 in open_table(THD*, TABLE_LIST*, st_mem_root*, Open_table_context*) ()
#3 0x000000000053d89e in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) ()
#4 0x000000000053e5ce in open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) ()
#5 0x000000000056a19a in mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool) ()
#6 0x00000000005827dd in mysql_execute_command(THD*) ()
#7 0x0000000000583a41 in mysql_parse(THD*, char*, unsigned int, Parser_state*) ()
#8 0x00000000007a7bb9 in Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned int) ()
#9 0x000000000050ca52 in apply_event_and_update_pos(Log_event*, THD*, Relay_log_info*) ()
#10 0x0000000000514710 in exec_relay_log_event(THD*, Relay_log_info*) ()
#11 0x0000000000515a24 in handle_slave_sql ()
#12 0x00000031b800677d in start_thread () from /lib64/libpthread.so.0
#13 0x00000031b78d33ed in clone () from /lib64/libc.so.6

Revision history for this message
Elena Stepanova (elenst) wrote :

Hi,

Sorry, I don't quite understand what you meant by

> We re kick the db with 5.2.5 and decide to upgrade again to 5.5.25 and fails again setting a user variables.

> We change the config files to the minimal configuration and it didn't work. It keeps crashing.

But you already had 5.5.25 on the slave, which was crashing. So what changed? You upgraded the master and it started crashing?
The variables you attached -- are they from the master or from the slave?
Can you attach another set (if these were from the master, then attach from the slave, and vice versa)?

You are saying that you can easily reproduce the crash using the test you provided, but the error log seems to be from your non-simpified production scenario. Could you please:

- start fresh servers on new clean datadirs;
- enable general log and error log;
- reproduce the problem;
- pack the datadirs and the logs and upload them along with cnf files and server startup command lines to ftp://ftp.askmonty.org/ (the datadirs should contain binlogs and the data itself; since you're not using InnoDB tables in the scenario, you can drop InnoDB files so the directories should be tiny).

Thank you.

Revision history for this message
DavidDucos (david-ducos) wrote :

Answers:
1- But you already had 5.5.25 on the slave, which was crashing. So what changed? You upgraded the master and it started crashing?
We delete the slave with 5.5.25. We started again and it didn't work.
2- The variables you attached -- are they from the master or from the slave?
They are from the slave.
3- Can you attach another set (if these were from the master, then attach from the slave, and vice versa)?
No, it is not necessary because we suspend the project.

We think and we are pretty sure that MariaDB 5.5.25 and our version of CentOS 5 can not work together. We are planning to upgrade our OS. Do you need the execution of "rpm -qa" to keep analyzing?

Revision history for this message
Elena Stepanova (elenst) wrote :

Hi David,

No, thanks, rpm -qa won't be necessary at this point.
Please let us know whether the upgrade made the problem disappear.

Meanwhile, we will watch similar issues, if we get some more cases, we might be able to chase down the root of the problem. I also believe there was a somewhat similar bug report in the upstream, if it is so indeed, we might be able to see it in the bug system after the fix is released.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.