core dump on gcomm://nonexistanthost

Bug #820348 reported by Henrik Ingo
Affects   Status         Importance   Assigned to     Milestone
Galera    Fix Released   Medium       Teemu Ollakka
0.8       Fix Released   Medium       Teemu Ollakka
1.x       Fix Released   Medium       Teemu Ollakka

Bug Description

Hi. I got a core dump when starting a server and trying to connect to a nonexistent host (a typo, of course) in the gcomm:// connection string.

# ./mysql-galera -g gcomm://nonexistinghost start
Starting mysqld instance with data dir /data/b/mysql-5.1.53-galera-0.8.0b-x86_64/mysql/var and listening at port 3306 and socket /data/b/mysql-5.1.53-galera-0.8.0b-x86_64/mysql/var/mysqld.sock...... Done (PID:26928)
Waiting for wsrep_ready.ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104
./mysql-galera: line 129: 26928 Segmentation fault (core dumped) nohup $VALGRIND $MYSQLD $DEFAULTS_OPTION --user="$MYSQLD_USER" --basedir="$MYSQL_BASE_DIR" --datadir="$MYSQL_DATA_DIR" --pid-file="$MYSQL_PID" --port=$MYSQL_PORT --socket=$MYSQL_SOCKET --skip-external-locking --log_error=$err_log $MYSQLD_OPTS $INNODB_OPTS $WSREP_OPTS $DEBUG_OPTS $LOGGING_OPTS $RBR_OPTS $PLUGIN_OPTS > /dev/null 2>> $err_log
 Done

Let me know if you can reproduce this just like that, or if I should dig into the core dump and log files.

Btw, with all of your different launchpad projects, it's a bit confusing to guess where to report this bug.

Alex Yurchenko (ayurchen) wrote :

Henrik,

You still seem to be using the 0.8.0 release. Please upgrade to 0.8.1. With it I get the following in the error log:

110803 17:17:36 [ERROR] WSREP: failed to open gcomm backend connection: 2: getaddrinfo failed with error code -2 for tcp://nonexisting:4567: 2 (No such file or directory)
  at galerautils/src/gu_resolver.cpp:resolve():475
110803 17:17:36 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():180: Failed to open backend connection: -2 (No such file or directory)
110803 17:17:36 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1215: Failed to open channel 'my_test_cluster' at 'gcomm://nonexisting': -2 (No such file or directory)
110803 17:17:36 [ERROR] WSREP: gcs connect failed: No such file or directory
110803 17:17:36 [ERROR] WSREP: wsrep::connect() failed: 6
110803 17:17:36 [ERROR] Aborting

110803 17:17:36 [Note] WSREP: Service disconnected.
110803 17:17:37 [Note] WSREP: Some threads may fail to exit.
110803 17:17:37 InnoDB: Starting shutdown...
110803 17:17:42 InnoDB: Shutdown completed; log sequence number 10680766899
110803 17:17:42 [Note] /tmp/galera1/mysql/libexec/mysqld: Shutdown complete

If you think that this is the correct behaviour, I guess we can close the bug.

As for bug reporting, I think the mailing list is a good place to report bugs when you're unsure where they belong. But in this case you guessed right: it is a Galera bug.

Henrik Ingo (hingo) wrote :

I tried with the 0.8.1 demo package and it didn't happen again. You may close this bug then.

Alex Yurchenko (ayurchen) wrote :

Fixed in release 0.8.1.

Changed in galera:
status: New → Fix Released
Henrik Ingo (hingo) wrote :

I got this again in 0.8.2:

110905 12:25:19 [Note] WSREP: Start replication
110905 12:25:19 [Warning] WSREP: state file not found: grastate.dat
110905 12:25:19 [Note] WSREP: Assign initial position for certification: -1, protocol version: 1
110905 12:25:19 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
110905 12:25:19 [Note] WSREP: protonet asio version 0
110905 12:25:19 [Note] WSREP: backend: asio
110905 12:25:19 [Note] WSREP: GMCast version 0
terminate called after throwing an instance of 'gu::Exception'
  what(): getaddrinfo failed with error code -2 for tcp://cluster127:4567: 0 (Success)
  at galerautils/src/gu_resolver.cpp:resolve():477
110905 12:25:19 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
...

Hostname "cluster127" was not defined on this host.

I have the corefile saved if you want it.

Changed in galera:
status: Fix Released → New
Teemu Ollakka (teemu-ollakka) wrote :

This is probably just a case of an uncaught exception. Henrik, could you paste the stack trace from that core?

Henrik Ingo (hingo) wrote :

You mean like this?

stack_bottom = (nil) thread_stack 0x40000
/data/mysql-5.1.57-galera-0.8.2-x86_64/mysql/libexec/mysqld(my_print_stacktrace+0x2e) [0x8aaede]
/data/mysql-5.1.57-galera-0.8.2-x86_64/mysql/libexec/mysqld(handle_segfault+0x41c) [0x5e225c]
/lib64/libpthread.so.0 [0x3d4ea0e930]
/data/mysql-5.1.57-galera-0.8.2-x86_64/galera/lib/libgalera_smm.so [0x2aaac7769883]
/data/mysql-5.1.57-galera-0.8.2-x86_64/galera/lib/libgalera_smm.so(gcs_core_set_pkt_size+0x52) [0x2aaac7761482]
/data/mysql-5.1.57-galera-0.8.2-x86_64/galera/lib/libgalera_smm.so [0x2aaac77647a4]
/data/mysql-5.1.57-galera-0.8.2-x86_64/galera/lib/libgalera_smm.so(gcs_open+0x27a) [0x2aaac7766a5a]
/data/mysql-5.1.57-galera-0.8.2-x86_64/galera/lib/libgalera_smm.so(galera::ReplicatorSMM::connect(std::string const&, std::string const&, std::string const&)+0x232) [0x2aaac778$
/data/mysql-5.1.57-galera-0.8.2-x86_64/galera/lib/libgalera_smm.so(galera_connect+0x99) [0x2aaac779ec39]
/data/mysql-5.1.57-galera-0.8.2-x86_64/mysql/libexec/mysqld(wsrep_start_replication()+0xcf) [0x75a67f]
/data/mysql-5.1.57-galera-0.8.2-x86_64/mysql/libexec/mysqld(wsrep_init_startup(bool)+0x3a) [0x75acba]
/data/mysql-5.1.57-galera-0.8.2-x86_64/mysql/libexec/mysqld(main+0xb98) [0x5e84c8]
/lib64/libc.so.6(__libc_start_main+0xf4) [0x3d4de1d994]
/data/mysql-5.1.57-galera-0.8.2-x86_64/mysql/libexec/mysqld [0x520149]

I did just notice that even with the hostname working, I still get another uncaught exception on this host. I think it's the same one I experienced in June, where I had to yum groupinstall "Development Tools" just to run Galera. I was about to file a separate bug on that, but maybe this is also a symptom of that problem. The exceptions are different though.

Teemu Ollakka (teemu-ollakka) wrote :

I meant like:
    shell# gdb mysqld <corefile>
    (gdb) bt
and paste the output of that command.

The exception message already shows at least one problem:
    what(): getaddrinfo failed with error code -2 for tcp://cluster127:4567: 0 (Success)
The errno associated with the exception is zero, which can cause the error to go unnoticed later on. It seems that the error reporting of gu::net::resolve() is not adequate.

However, this exception message does not match the stack trace above. Are they really from the same log file?
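
The errno problem described here can be illustrated with a minimal C++ sketch. It is not Galera's code: ResolveError, resolve() and try_resolve() are hypothetical names, and mapping non-system failures to EHOSTUNREACH is an arbitrary assumption made for illustration. It relies only on documented getaddrinfo() behaviour: failure is reported through the return value (an EAI_* code that gai_strerror() can describe), and errno is meaningful only when that value is EAI_SYSTEM.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical exception type (not gu::Exception) that always carries a
// non-zero errno-style code, so callers cannot mistake it for success.
class ResolveError : public std::runtime_error {
public:
    ResolveError(const std::string& msg, int code)
        : std::runtime_error(msg), code_(code) {
        assert(code != 0);
    }
    int code() const { return code_; }
private:
    int code_;
};

// Resolve host:port for a TCP connection; throws ResolveError on failure.
void resolve(const std::string& host, const std::string& port) {
    addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    addrinfo* result = nullptr;
    errno = 0;
    const int rc = getaddrinfo(host.c_str(), port.c_str(), &hints, &result);
    if (rc != 0) {
        // rc is an EAI_* value (on glibc, EAI_NONAME is -2). errno is only
        // set when rc == EAI_SYSTEM, so it must not be reported otherwise.
        const int saved_errno = errno;
        const bool sys = (rc == EAI_SYSTEM);
        const std::string reason =
            sys ? std::strerror(saved_errno) : gai_strerror(rc);
        // Assumption: EHOSTUNREACH stands in for all non-system failures.
        const int code = (sys && saved_errno != 0) ? saved_errno : EHOSTUNREACH;
        throw ResolveError(
            "getaddrinfo failed for " + host + ":" + port + ": " + reason, code);
    }
    freeaddrinfo(result);
}

// Caller-side wrapper: turns the exception into a negative error code
// instead of letting it propagate and terminate the process.
int try_resolve(const std::string& host, const std::string& port) noexcept {
    try {
        resolve(host, port);
        return 0;
    } catch (const ResolveError& e) {
        return -e.code();
    } catch (...) {
        return -EINVAL;
    }
}
```

A connect routine built this way could log the message and return the negative code to its caller, rather than ending in "terminate called after throwing an instance of ...".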

Henrik Ingo (hingo) wrote :

Aha. I got a little distracted by the fact that there is another mysqld in the path, but this is what you mean:

[root@esitbi128lab mysql-galera]# gdb mysql/libexec/mysqld ../core.2272
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /data/mysql-5.1.57-galera-0.8.2-x86_64/mysql/libexec/mysqld...done.
[New Thread 2297]
[New Thread 2290]
[New Thread 2289]
[New Thread 2288]
[New Thread 2287]
[New Thread 2286]
[New Thread 2284]
[New Thread 2283]
[New Thread 2282]
[New Thread 2281]
[New Thread 2280]
[New Thread 2279]
[New Thread 2278]
[New Thread 2277]
[New Thread 2276]
[New Thread 2275]

warning: .dynamic section for "/usr/lib64/libstdc++.so.6" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/libssl.so.6" is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/usr/lib64/libgssapi_krb5.so.2" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib64/libkrb5.so.3" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/usr/lib64/libk5crypto.so.3" is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/lib64/libcrypto.so.6" is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/usr/lib64/libkrb5support.so.0" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib...


Changed in galera:
assignee: nobody → Teemu Ollakka (teemu-ollakka)
importance: Undecided → Medium
status: New → Confirmed
Changed in galera:
status: Confirmed → Fix Released