corosync segfaults on startup joining another node

Bug #768471 reported by Wes Janzen
This bug affects 2 people
Affects: corosync (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: corosync

Architecture: amd64
Date: Thu Apr 21 12:00:35 2011
Dependencies:
  adduser 3.112ubuntu1
  base-files 5.0.0ubuntu20.10.04.3
  base-passwd 3.5.22
  coreutils 7.4-2ubuntu3
  debconf 1.5.28ubuntu4
  debconf-i18n 1.5.28ubuntu4
  debianutils 3.2.2
  dpkg 1.15.5.6ubuntu4.5
  findutils 4.4.2-1ubuntu1
  gcc-4.4-base 4.4.3-4ubuntu5
  libacl1 2.2.49-2
  libattr1 1:2.4.44-1
  libc-bin 2.11.1-0ubuntu7.8
  libc6 2.11.1-0ubuntu7.8
  libcorosync4 1.2.0-0ubuntu1
  libdb4.8 4.8.24-1ubuntu1
  libgcc1 1:4.4.3-4ubuntu5
  liblocale-gettext-perl 1.05-6
  libncurses5 5.7+20090803-2ubuntu3
  libnspr4-0d 4.8.6-0ubuntu0.10.04.2
  libnss3-1d 3.12.9+ckbi-1.82-0ubuntu0.10.04.1
  libpam-modules 1.1.1-2ubuntu5
  libpam0g 1.1.1-2ubuntu5
  libselinux1 2.0.89-4
  libsqlite3-0 3.6.22-1
  libstdc++6 4.4.3-4ubuntu5
  libtext-charwidth-perl 0.04-6
  libtext-iconv-perl 1.7-2
  libtext-wrapi18n-perl 0.06-7
  lsb-base 4.0-0ubuntu8
  lzma 4.43-14ubuntu2
  ncurses-bin 5.7+20090803-2ubuntu3
  passwd 1:4.1.4.2-1ubuntu2.2
  perl-base 5.10.1-8ubuntu2
  sed 4.2.1-6
  sensible-utils 0.0.1ubuntu3
  tzdata 2011e-0ubuntu0.10.04
  zlib1g 1:1.2.3.3.dfsg-15ubuntu1
DistroRelease: Ubuntu 10.04
InstallationMedia: Ubuntu-Server 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
Package: corosync 1.2.0-0ubuntu1
PackageArchitecture: amd64
ProblemType: Bug
ProcEnviron:
  PATH=(custom, no user)
  LANG=en_US.UTF-8
  SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-30.59-server 2.6.32.29+drm33.13
SourcePackage: corosync
Tags: lucid
Uname: Linux 2.6.32-30-server x86_64

corosync segfaults whether or not the other server is online. Oddly, the nodes are built on similar machines with essentially the same configuration throughout. The only difference is that I originally forgot to set the interface MTU to 9000 on this particular machine, so corosync initially failed to communicate. After I brought it back up with the correct Ethernet MTU, it started segfaulting. I've tried cleaning up the machine and starting fresh with just the config, but that doesn't help either. I suspect this is fixed in a newer corosync package.

corosync.conf:

# Please read the corosync.conf.5 manual page

totem {
 version: 2
 secauth: off
 threads: 0
 netmtu: 9000
 token: 3000
 token_retransmits_before_loss_const: 10
 join: 60
 consensus: 5000
 vsftype: none
 max_messages: 20
 clear_node_high_bit: yes
 interface {
  ringnumber: 0
  bindnetaddr: 10.24.98.0
  mcastaddr: 239.18.110.1
  mcastport: 4172
 }
}

logging {
 fileline: off
 to_stderr: yes
 to_logfile: no
 to_syslog: yes
 syslog_facility: daemon
 debug: on
 timestamp: on
 logger_subsys {
  subsys: AMF
  debug: on
 }
}

amf {
 mode: disabled
}

aisexec {
 user: root
 group: root
}

service {
 # Load the Pacemaker Cluster Resource Manager
 name: pacemaker
 ver: 0
}

daemon.log: attached

I have the strace too, if it would help.
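
Since the only difference between the nodes was the interface MTU, one quick sanity check is to compare netmtu from corosync.conf with the MTU actually set on the cluster interface. The sketch below is only illustrative: the helper names conf_netmtu and check_mtu, the interface name eth0, and the path /etc/corosync/corosync.conf are assumptions and do not come from this report. A passing check does not mean the segfault is resolved.

```shell
#!/bin/sh
# Hypothetical helpers, not part of corosync.

# Print the first netmtu value found in a corosync.conf-style file
# (prints nothing if netmtu is not set).
conf_netmtu() {
    sed -n 's/^[[:space:]]*netmtu:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Compare the configured netmtu with an interface MTU.
# $1 = path to corosync.conf, $2 = interface MTU in bytes
check_mtu() {
    conf_mtu=$(conf_netmtu "$1")
    if_mtu=$2
    if [ -z "$conf_mtu" ]; then
        echo "netmtu not set in $1"
        return 0
    fi
    if [ "$conf_mtu" -gt "$if_mtu" ]; then
        echo "MISMATCH: netmtu $conf_mtu exceeds interface MTU $if_mtu"
        return 1
    fi
    echo "OK: netmtu $conf_mtu fits interface MTU $if_mtu"
}

# Example call (eth0 and the config path are assumptions):
# check_mtu /etc/corosync/corosync.conf "$(cat /sys/class/net/eth0/mtu)"
```

With the configuration above (netmtu: 9000), an interface left at the default MTU of 1500 would be reported as a mismatch.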

Wes Janzen (wes-janzen) wrote :

The workaround is to export the updated CIB, modify it for the host, and then run:
/etc/init.d/corosync start && cibadmin --replace --xml-file /path/to/modified.xml
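
The workaround above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the run and restore_cib helpers and the DRY_RUN switch are hypothetical, and /path/to/modified.xml stands for a CIB that has already been exported from a working node (for example with cibadmin --query) and edited for this host.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands from the workaround.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = path to the exported and host-adjusted CIB XML file
restore_cib() {
    # Start corosync first; only replace the CIB if the start succeeded.
    run /etc/init.d/corosync start || return 1
    run cibadmin --replace --xml-file "$1"
}

# Example call (the path is a placeholder):
# restore_cib /path/to/modified.xml
```

Leaving DRY_RUN at 1 first makes it possible to review the exact commands before touching the cluster.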

Bart Van Assche (bart-vanassche) wrote :

I'm also seeing frequent corosync segfaults when another node joins, but on Ubuntu 12.04.

Changed in corosync (Ubuntu):
status: New → Confirmed
Changed in corosync (Ubuntu):
status: Confirmed → Incomplete
status: Incomplete → Invalid
Bart Van Assche (bart-vanassche) wrote :

Rafael, you shouldn't have changed the status of this bug to Incomplete / Invalid without explaining why.

Changed in corosync (Ubuntu):
status: Invalid → Confirmed
Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Bart, and anyone else affected by this: this is an old bug, and it refers to a crash happening in a fairly common place (node joining) in a pretty old version. That is why I marked it as Incomplete last time (in 2016): there was no crash file for us to analyse for the root cause.

With that in mind, and considering that nothing has been added in the 3 years since you marked it as Confirmed again, I'm marking this as Incomplete again. I'll be more than happy to help with any crash that happens during corosync initialization. Please feel free to re-open this case (or open a new one) for this (or a similar) issue.

Thank you very much

Rafael

Changed in corosync (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for corosync (Ubuntu) because there has been no activity for 60 days.]

Changed in corosync (Ubuntu):
status: Incomplete → Expired