Infinite loop in helper LVM script for DRBD 8 in Lucid
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
drbd8 (Ubuntu) | Fix Released | High | Unassigned |
Lucid | Fix Released | High | Unassigned |
Bug Description
The script /usr/lib/
This bug is not present in natty, but I humbly request an SRU to Lucid.
This script can be called by DRBD to create an LVM snapshot of a resource before it starts resyncing (thus becoming inconsistent). This script is present (though commented out) in the default drbd configuration file /etc/drbd.
This bug can result in drbd silently failing to resync an outdated resource. Thus, any newer data will be lost if the cluster fails over to the outdated node.
This is a known problem upstream and was fixed in later versions of DRBD. The following patch was committed upstream to address this bug: http://
Steps to reproduce:
On a cluster composed of node A and node B:
- on both nodes, uncomment the line "before-
- on both nodes, configure a drbd resource called "test"
- make the initial synchronization
- on B, do "drbdadm disconnect test"
- on A, make "test" primary and write to it
- on B, do "drbdadm connect test"
=> the script will kick in and fall into an endless loop - top will show it using 100% CPU.
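The disconnect/write/reconnect part of the steps above can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the device path /dev/drbd0 and the dd parameters are hypothetical and not taken from the report, and the function names are made up for this sketch. By default it only prints the commands; run each function on the node named in its comment.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps. Commands are printed, not executed,
# unless RUN is set to an empty string (RUN= ./script).
# ASSUMPTION: the "test" resource is exposed as /dev/drbd0 (hypothetical path).
RUN=${RUN-echo}
RES=test
DEV=${DEV:-/dev/drbd0}

# On node B: drop the replication link so B becomes outdated.
step_b_disconnect() {
    $RUN drbdadm disconnect "$RES"
}

# On node A: promote the resource and write some data to it.
# WARNING: when not in dry-run mode this overwrites data on $DEV.
step_a_write() {
    $RUN drbdadm primary "$RES"
    $RUN dd if=/dev/urandom of="$DEV" bs=1M count=10 oflag=direct
}

# On node B: reconnect; the before-resync-target handler should fire here.
step_b_reconnect() {
    $RUN drbdadm connect "$RES"
}
```

After the reconnect on B, top can be used to check whether the helper script is spinning at 100% CPU.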
lsb_release -rd
Description: Ubuntu 10.04.2 LTS
Release: 10.04
apt-cache policy drbd8-utils
drbd8-utils:
Installed: 2:8.3.7-1ubuntu2.1
Candidate: 2:8.3.7-1ubuntu2.1
Version table:
*** 2:8.3.7-1ubuntu2.1 0
500 http://
100 /var/lib/
2:
500 http://
Lionel Sausin.
=======
SRU Justification
IMPACT:
When the --percent option was used, the script entered an endless loop. This can result in DRBD failing to resync an outdated resource when using LVM.
REPRODUCE (as specified above):
On a cluster composed of node A and node B:
- on both nodes, uncomment the line "before-
- on both nodes, configure a drbd resource called "test"
- make the initial synchronization
- on B, do "drbdadm disconnect test"
- on A, make "test" primary and write to it
- on B, do "drbdadm connect test"
=> the script will kick in and fall into an endless loop - top will show it using 100% CPU.
HOW FIXED:
The fix was taken from upstream. It sources the defaults file first, shifts the options correctly, and uses drbdadm sh-minor to obtain the device minor number instead of guessing it. Together, the problems it corrects could cause the script to enter an infinite loop and fail to resync, as described above.
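To illustrate why shifting options correctly matters, here is a minimal sketch of a POSIX shell option loop. It is not the actual DRBD helper script: the function name parse_opts, the default value of 10, and the output format are assumptions made for this example. The point is that every branch that handles an option must consume it with shift; if the --percent branch did not, "$1" would never change and the while loop would spin forever.

```shell
#!/bin/sh
# Illustrative only: NOT the real DRBD snapshot helper.
# parse_opts, the default of 10 and the output format are hypothetical.
parse_opts() {
    percent=10   # hypothetical default
    while [ $# -gt 0 ]; do
        case "$1" in
            -p|--percent)
                # Refuse a flag without a value instead of shifting past the end.
                if [ $# -lt 2 ]; then
                    echo "missing value for $1" >&2
                    return 1
                fi
                percent=$2
                # Consume the flag AND its value. Without this shift,
                # "$1" stays "--percent" and the loop never terminates.
                shift 2
                ;;
            --)
                shift
                break
                ;;
            -*)
                echo "unknown option: $1" >&2
                return 1
                ;;
            *)
                # First non-option argument: stop parsing.
                break
                ;;
        esac
    done
    echo "percent=$percent rest=$*"
}
```

For example, `parse_opts --percent 25 snap` consumes the flag and its value and leaves `snap` as the remaining argument.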
PATCH:
Attached. Uploaded to lucid-proposed for review.
REGRESSION POTENTIAL:
Minimal. This has been tested thoroughly.
=======
Changed in drbd8 (Ubuntu): | |
assignee: | nobody → Andres Rodriguez (andreserl) |
status: | Triaged → In Progress |
importance: | Wishlist → High |
description: | updated |
description: | updated |
Changed in drbd8 (Ubuntu): | |
assignee: | Andres Rodriguez (andreserl) → nobody |
status: | In Progress → New |
status: | New → Fix Released |
Changed in drbd8 (Ubuntu Lucid): | |
importance: | Undecided → High |
Hi there,
thank you for reporting bugs and trying to make Ubuntu better. I'll look into this bug. Thank you!