corosync segfaults on startup joining another node
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
corosync (Ubuntu) | Expired | Undecided | Unassigned |
Bug Description
Binary package hint: corosync
Architecture: amd64
Date: Thu Apr 21 12:00:35 2011
Dependencies:
adduser 3.112ubuntu1
base-files 5.0.0ubuntu20.
base-passwd 3.5.22
coreutils 7.4-2ubuntu3
debconf 1.5.28ubuntu4
debconf-i18n 1.5.28ubuntu4
debianutils 3.2.2
dpkg 1.15.5.6ubuntu4.5
findutils 4.4.2-1ubuntu1
gcc-4.4-base 4.4.3-4ubuntu5
libacl1 2.2.49-2
libattr1 1:2.4.44-1
libc-bin 2.11.1-0ubuntu7.8
libc6 2.11.1-0ubuntu7.8
libcorosync4 1.2.0-0ubuntu1
libdb4.8 4.8.24-1ubuntu1
libgcc1 1:4.4.3-4ubuntu5
liblocale-
libncurses5 5.7+20090803-
libnspr4-0d 4.8.6-0ubuntu0.
libnss3-1d 3.12.9+
libpam-modules 1.1.1-2ubuntu5
libpam0g 1.1.1-2ubuntu5
libselinux1 2.0.89-4
libsqlite3-0 3.6.22-1
libstdc++6 4.4.3-4ubuntu5
libtext-
libtext-
libtext-
lsb-base 4.0-0ubuntu8
lzma 4.43-14ubuntu2
ncurses-bin 5.7+20090803-
passwd 1:4.1.4.
perl-base 5.10.1-8ubuntu2
sed 4.2.1-6
sensible-utils 0.0.1ubuntu3
tzdata 2011e-0ubuntu0.
zlib1g 1:1.2.3.
DistroRelease: Ubuntu 10.04
InstallationMedia: Ubuntu-Server 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
Package: corosync 1.2.0-0ubuntu1
PackageArchitec
ProblemType: Bug
ProcEnviron:
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcVersionSign
SourcePackage: corosync
Tags: lucid
Uname: Linux 2.6.32-30-server x86_64
It segfaults with or without the other server online. Oddly, these nodes are built on similar machines with essentially the same config throughout. The only difference is that I originally forgot to set the MTU to 9000 on this particular machine, so corosync initially failed to communicate. I brought it back up with the correct interface MTU, and it has been segfaulting ever since. I've tried cleaning up the machine and starting fresh with just the config, but that doesn't work either. I suspect this is fixed in a newer corosync package.
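For anyone trying to reproduce this, a quick way to confirm that the interface MTU is at least as large as the `netmtu` value in corosync.conf might look like the sketch below. The helper name `check_mtu` is hypothetical, and `eth0` in the usage line is an assumption, since the report does not name the interface. It only compares the two numbers and says nothing about the cause of the segfault.

```shell
#!/bin/sh
# check_mtu CONF_FILE MTU_FILE
# Hypothetical helper: compares the first "netmtu:" value found in a
# corosync.conf-style file with an interface MTU read from a file such
# as /sys/class/net/<iface>/mtu.
# Returns 0 if the interface MTU is >= netmtu, 1 if it is smaller,
# and 2 if no netmtu line is present.
check_mtu() {
    conf_file=$1
    mtu_file=$2

    # Take the first "netmtu: <value>" line; awk ignores leading whitespace.
    netmtu=$(awk '$1 == "netmtu:" { print $2; exit }' "$conf_file")
    iface_mtu=$(cat "$mtu_file")

    if [ -z "$netmtu" ]; then
        echo "netmtu not set in $conf_file"
        return 2
    fi

    if [ "$iface_mtu" -lt "$netmtu" ]; then
        echo "MTU mismatch: interface=$iface_mtu netmtu=$netmtu"
        return 1
    fi

    echo "MTU ok: interface=$iface_mtu netmtu=$netmtu"
}

# Example (eth0 is an assumption, substitute the cluster interface):
# check_mtu /etc/corosync/corosync.conf /sys/class/net/eth0/mtu
```

Running this on every node before starting corosync would catch the situation described above, where one machine was left at the default MTU while the config asked for 9000.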
corosync.conf:
# Please read the corosync.conf.5 manual page
totem {
  version: 2
  secauth: off
  threads: 0
  netmtu: 9000
  token: 3000
  token_
  join: 60
  consensus: 5000
  vsftype: none
  max_messages: 20
  clear_
  interface {
    ringnumber: 0
    bindnetaddr: 10.24.98.0
    mcastaddr: 239.18.110.1
    mcastport: 4172
  }
}
logging {
  fileline: off
  to_stderr: yes
  to_logfile: no
  to_syslog: yes
  syslog_facility: daemon
  debug: on
  timestamp: on
  logger_subsys {
    subsys: AMF
    debug: on
  }
}
amf {
  mode: disabled
}
aisexec {
  user: root
  group: root
}
service {
  # Load the Pacemaker Cluster Resource Manager
  name: pacemaker
  ver: 0
}
daemon.log: attached
I have the strace too, if it would help.
Changed in corosync (Ubuntu):
status: Confirmed → Incomplete
status: Incomplete → Invalid
Changed in corosync (Ubuntu):
status: Confirmed → Incomplete
The workaround is to export the updated CIB, modify it for the host, and then run: /etc/init.d/corosync start && cibadmin --replace --xml-file /path/to/modified.xml
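The workaround can be wrapped in a small script, sketched here under some assumptions. The `run` wrapper, the `DRY_RUN` variable, and the function name `apply_workaround` are illustrative and not part of corosync or Pacemaker. The export step (`cibadmin --query` redirected to a file on a node that has the current CIB) and the manual edit for the host are assumed to have been done already.

```shell
#!/bin/sh
# Sketch of the workaround from this report. DRY_RUN defaults to 1 so that
# nothing is executed until the commands have been reviewed.
DRY_RUN=${DRY_RUN:-1}

# run CMD [ARGS...]: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# apply_workaround CIB_FILE
# Assumes CIB_FILE was exported beforehand (for example with
# "cibadmin --query > file" on a node holding the updated CIB)
# and then modified by hand for this host.
apply_workaround() {
    cib_file=$1
    run /etc/init.d/corosync start &&
        run cibadmin --replace --xml-file "$cib_file"
}

# Example:
# DRY_RUN=0 apply_workaround /path/to/modified.xml
```

With the default `DRY_RUN=1` the script only prints the two commands in order, which makes it easy to check the path before touching the cluster.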