Ubuntu 10.04 Lucid with 2.6.32-41 kernel and drbd8
Kernel 2.6.32-41 fixed a consistency issue around UMH_WAIT_PROC in this bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/963685
This change causes the drbd fencing script's exit codes to be interpreted incorrectly, which in turn breaks drbd fencing.
To replicate:
Have fencing enabled in drbd config:
In handlers section: fence-peer "/usr/lib/drbd/crm-fence-peer.sh"
In the disk section: fencing resource-only;
Have both drbd nodes UpToDate, with one primary and one secondary.
Trigger the fence-peer handler. I did this by:
Having drbd under pacemaker control, with both pacemaker nodes online and in sync and drbd primary on node 1. Put node 1 in standby, and fence-peer will be executed.
The kernel will report that the fence-peer helper exited with 0 (broken), such as this:
May 15 09:45:17 kernel: [56645.420714] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
May 15 09:45:17 kernel: [56645.420920] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 0 (0x0)
May 15 09:45:17 kernel: [56645.420925] block drbd0: fence-peer helper broken, returned 0
If you log debug output from the fence-peer script (crm-fence-peer.sh), you can see that it actually exits with 4, not the 0 that the kernel reports.
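One way to capture the script's real exit status is a small logging wrapper. The following is only a sketch: the function name run_and_log_exit and the log file location are hypothetical and not part of drbd or crm-fence-peer.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of drbd): run a command, record the exit
# status it really returned, and hand that same status back to the caller.
run_and_log_exit() {
    log_file=$1
    shift
    status=0
    # Run the wrapped command with its arguments; capture a non-zero status
    # without tripping "set -e" if the calling script uses it.
    "$@" || status=$?
    # Append a timestamped record so it can be compared with the kernel log.
    printf '%s exit=%s cmd=%s\n' "$(date '+%b %d %H:%M:%S')" "$status" "$*" >> "$log_file"
    return "$status"
}
```

For example, a wrapper script could call run_and_log_exit /tmp/fence-peer-exit.log /usr/lib/drbd/crm-fence-peer.sh "$@" and then exit with $?, and the fence-peer handler could be pointed at that wrapper for the duration of the test. The log path here is an assumption; any writable location works. The logged status can then be compared against the "exit code" value in the kernel's "helper command" line.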
This commit in drbd git should fix this behavior:
http://git.drbd.org/gitweb.cgi?p=drbd-8.3.git;a=commitdiff;h=e6cbc43
Until this is fixed, a drbd setup that uses fencing will completely fail to auto-recover or continue, and will need manual intervention and repair.