[SRU] drbd fence-peer breaks when using kernel 2.6.32-41

Bug #1000355 reported by Jacob Smith on 2012-05-16
This bug affects 4 people
Affects             Importance   Assigned to
drbd8 (Ubuntu)      Medium       Ante Karamatić
  Lucid             Undecided    Ante Karamatić

Bug Description

SRU Justification

Upstream commit:

e6cbc43 - http://git.drbd.org/gitweb.cgi?p=drbd-8.3.git;a=commitdiff;h=e6cbc43

Description:

The latest 10.04 kernel (2.6.32-41) fixed an issue described in bug 963685. Because of this change, the drbd module built with DKMS regressed and can no longer be used as intended.

Notes (original report):

Ubuntu 10.04 (Lucid) with the 2.6.32-41 kernel and drbd8

Kernel 2.6.32-41 fixed a consistency issue around UMH_WAIT_PROC in this bug:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/963685

This causes the exit codes of the drbd fencing script to be misinterpreted, which breaks drbd fencing.

**** This also affects the linux source in all releases after Lucid whose kernels include the bug 963685 fix above, since the drbd kernel module is mainlined in those more recent kernel versions ****

To replicate:

Have fencing enabled in the drbd config:
In the handlers section: fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
In the disk section: fencing resource-only;

Have both drbd nodes UpToDate, one Primary and one Secondary.
Trigger execution of the fence-peer handler. I did this as follows:
Have drbd under pacemaker control, with both pacemaker nodes online and in sync and drbd Primary on node 1. Put node 1 in standby; the fence-peer handler will then be executed.
The kernel will report that the fence-peer handler exited with 0 (broken), for example:

May 15 09:45:17 kernel: [56645.420714] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
May 15 09:45:17 kernel: [56645.420920] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 0 (0x0)
May 15 09:45:17 kernel: [56645.420925] block drbd0: fence-peer helper broken, returned 0

If you log the debug output of the fence-peer script (crm-fence-peer.sh) when it runs, it actually exits with 4, not the 0 that the kernel reports.

This commit in drbd git should fix this behavior:

http://git.drbd.org/gitweb.cgi?p=drbd-8.3.git;a=commitdiff;h=e6cbc43

As a result, a drbd setup that uses fencing fails completely: it cannot auto-recover or continue without manual intervention and repair.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1000355

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

Unfortunately I am unable to collect such log files, due to the nature of the bug and the changes made since to bring the systems back to working order.

I may have further information to give; if there are specific questions, I will try to answer them.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: lucid regression-update
description: updated
Changed in drbd8 (Ubuntu):
status: New → Confirmed
Ante Karamatić (ivoks) wrote :

Per comment https://bugs.launchpad.net/ubuntu/+source/linux/+bug/963685/comments/13 this bug exists only in 10.04, since there the drbd module is shipped as a separate package.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Ante Karamatić (ivoks) wrote :
Changed in drbd8 (Ubuntu):
importance: Undecided → Medium
summary: - drbd fence-peer breaks when using kernel 2.6.32-41
+ [SRU] drbd fence-peer breaks when using kernel 2.6.32-41
Ante Karamatić (ivoks) on 2012-05-17
description: updated
Changed in drbd8 (Ubuntu):
assignee: nobody → Ante Karamatić (ivoks)
Ante Karamatić (ivoks) on 2012-05-17
description: updated
no longer affects: linux (Ubuntu)
Tim Gardner (timg-tpi) wrote :

Ante - you forgot to add usermodehelper-consistently.dpatch to debian/patches/00list. Attached is the patch that I uploaded.

Changed in drbd8 (Ubuntu Lucid):
status: New → Fix Committed
assignee: nobody → Ante Karamatić (ivoks)
Changed in drbd8 (Ubuntu):
status: Confirmed → Fix Released

Hello Jacob, or anyone else affected,

Accepted drbd8 into lucid-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Hans (listacc) wrote :

Is it known when the final patch will be released?

tmortensen (tmortensen) wrote :

I have tested the supplied patch and it does resolve the issues with the crm fencing.

Jacob Smith (jsmith-argotecinc) wrote :

Hans - the patch is awaiting verification testing. I was unable to test the proposed fix, since I had already mitigated the bug on my servers by the time it was available.

It looks like tmortensen has tested it, so I would expect the patch to be available soon!

Ante Karamatić (ivoks) on 2012-06-06
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package drbd8 - 2:8.3.7-1ubuntu2.3

---------------
drbd8 (2:8.3.7-1ubuntu2.3) lucid-proposed; urgency=low

  * debian/patches/usermodehelper-consistently.dpatch:
    - upstream commit e6cbc43
    - usermodehelper: use UMH_WAIT_PROC consistently
    - (LP: #1000355)
 -- Ante Karamatic <email address hidden> Thu, 17 May 2012 09:29:33 +0200

Changed in drbd8 (Ubuntu Lucid):
status: Fix Committed → Fix Released