[Dapper] linux-image-server breaks heartbeat/heartbeat-2

Bug #140633 reported by Michael Rivera
Affects: linux (Ubuntu)
Status: Invalid
Importance: High
Assigned to: Unassigned

Bug Description

Using Ubuntu/dapper 6.06 LTS with all package updates.

2 Nodes with heartbeat active/passive running.
After I reboot one machine, heartbeat is started by init again.
Fine so far.

Many seconds later, heartbeat on the rebooted machine no longer sees its own
heartbeat and restarts again... and of course it acquires the resources, even
though the other node owns them.
(heartbeat[4098]: ERROR: No local heartbeat. Forcing restart.)

This is a problem with the linux-image-server. If I switch over to linux-image-686
this problem is gone.

Look here if you need debug output from heartbeat.
http://lists.linux-ha.org/pipermail/linux-ha/2007-September/027528.html
or http://thread.gmane.org/gmane.linux.highavailability.user/19692

Revision history for this message
Michael Carpenter (mcarpent) wrote :

I can confirm this bug.

It was causing random lockups on my passive heartbeat server using linux-image-server (2.6.15-51). When the server restarted, it was unable to detect the master heartbeat server (running linux-image-686, same version) and would grab the resources. After a few minutes (seemingly at random, anywhere from 2 to 20 minutes), the heartbeat server would lose the connection to itself and the master would take over again. My servers have been doing this dance all weekend long!

Neither of the boxes does anything but handle heartbeat/ldirectord, and the load is not high enough for the systems to be declaring each other dead.

After seeing that the two servers were running different kernels, I changed the offending server to use the 686 version, since it was stable, and the issue has now disappeared.

Log files can be provided if necessary.

Revision history for this message
Michael Carpenter (mcarpent) wrote :

Quick follow-up: I'm still having the same issue with the new kernel (2.6.15-51-686). I'm not certain whether it's a hardware issue or the ip_vs_backup/master processes. I've noticed the date command is reporting erroneous times (even moving backwards!), even though the hardware clock continues to run fine. Load is also constantly above 2 even though there's nothing being processed...
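For anyone who wants to check for the backwards-moving clock described above, here is a minimal sketch. It assumes a modern Python 3 (not what shipped with dapper) and compares the wall clock against the monotonic clock; the function names and the tolerance value are my own, purely for illustration. If the kernel's timekeeping is broken at a lower level, the monotonic clock may be affected too, so a clean result does not rule the problem out.

```python
import time


def sample_clocks(count=5, interval=0.2):
    """Collect (monotonic, wall-clock) readings, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        samples.append((time.monotonic(), time.time()))
        time.sleep(interval)
    return samples


def find_clock_anomalies(samples, tolerance=0.05):
    """Return (index, drift) for each step where the wall clock advanced
    by a different amount than the monotonic clock.

    A negative drift means the wall clock fell behind or moved backwards
    relative to the monotonic clock. `tolerance` is in seconds and is an
    arbitrary illustrative threshold.
    """
    anomalies = []
    for i in range(1, len(samples)):
        mono_delta = samples[i][0] - samples[i - 1][0]
        wall_delta = samples[i][1] - samples[i - 1][1]
        drift = wall_delta - mono_delta
        if abs(drift) > tolerance:
            anomalies.append((i, drift))
    return anomalies


if __name__ == "__main__":
    for index, drift in find_clock_anomalies(sample_clocks()):
        direction = "backwards" if drift < 0 else "forwards"
        print(f"sample {index}: wall clock jumped {direction} by {abs(drift):.3f}s")
```

Running it for longer (a larger `count`) on the affected node while heartbeat is active would give a better chance of catching a jump.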

I've read that the ip_vs_backup/master processes have caused issues for others in the past due to ssleep() calls, and I have verified that this kernel still uses this function. I am going to try moving to a newer version, which uses msleep_interruptible() instead, to see if anything changes.
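A possible way to check whether the high load comes from sleeping kernel threads rather than real work: the Linux load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so threads that sleep uninterruptibly, as the ssleep() remark suggests, would raise the load without using any CPU. The sketch below lists D-state tasks from /proc. It assumes Python 3 and a Linux /proc layout; the helper names are hypothetical.

```python
import os


def parse_state(stat_line):
    """Return the one-letter task state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces, so split after the last ')' rather than on whitespace.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def parse_name(stat_line):
    """Return the command name found between the outermost parentheses."""
    return stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, name) for every task currently in state D."""
    if not os.path.isdir(proc_root):
        return []
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as handle:
                line = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        if parse_state(line) == "D":
            tasks.append((int(entry), parse_name(line)))
    return tasks


if __name__ == "__main__":
    if os.path.exists("/proc/loadavg"):
        with open("/proc/loadavg") as handle:
            print("loadavg:", handle.read().strip())
    for pid, name in uninterruptible_tasks():
        print(f"{pid}\t{name}\tD (uninterruptible sleep)")
```

A task in D state is only caught if it happens to be sleeping at the moment of the scan, so it is worth running this a few times. If the sync threads show up consistently while the machine is otherwise idle, that would point at the sleep calls rather than at actual load.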

Revision history for this message
Andreas Moog (ampelbein) wrote : Old standing report

Thank you for taking the time to report this bug and helping to make
Ubuntu better. You reported this bug a while ago and there hasn't been
any activity on it recently. Is this still an issue for you? Could you
try with the latest Ubuntu release? Thanks in advance.

 status incomplete
 assignee andreas-moog
 subscribe

Changed in linux-meta:
assignee: nobody → andreas-moog
status: New → Incomplete
Revision history for this message
Michael Rivera (rivera) wrote :

This is still an issue, at least for dapper.

Incomplete?

Revision history for this message
Andreas Moog (ampelbein) wrote :

Thanks for the update.

Changed in linux-meta:
assignee: andreas-moog → nobody
importance: Undecided → High
status: Incomplete → Confirmed
Revision history for this message
Andy Whitcroft (apw) wrote :

This is not a bug in the linux-meta package, moving to the linux package.

affects: linux-meta (Ubuntu) → linux (Ubuntu)
Revision history for this message
Rolf Leggewie (r0lf) wrote :

I assume these are production servers that can't be easily fiddled with? Would it be possible to see whether hardy or a later release is affected at all?

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Invalidating due to age. If this is an issue in a later version or the development branch, please file a new bug with 'ubuntu-bug linux' so that we can get up-to-date environment information.

~Jfo

Changed in linux (Ubuntu):
status: Confirmed → Invalid