metadata operations hang after node outage
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lvm2 (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Binary package hint: clvm
When I shut down a node and then boot it back up, LVM metadata operations hang once it has rejoined the cluster, unless I restart clvmd (killing it with SIGKILL first).
When I boot all 3 nodes in my cluster from a cold start, LVM works fine.
When I shut down one of the nodes cleanly (so that the cluster remains quorate), LVM continues to work fine until the downed node is brought back up.
The LVM operations appear to hang in a connect() call; I've attached an strace. There is nothing interesting in dmesg or syslog.
`cman_tool services` reports that all 3 nodes are running clvmd (and `ps` on each node confirms this).
I can restore functionality by doing `pkill -9 clvmd` and then running clvmd by hand on the node that was just restarted.
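Because a hung LVM command blocks indefinitely, it can help to detect the hang with a time limit before applying the workaround above. The following is a minimal sketch, assuming GNU coreutils `timeout` is available (it exits with status 124 when the command runs past the limit). The helper name `appears_hung` and the timeout values are hypothetical choices for illustration, not part of LVM or clvmd.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 (true) if the given command is still
# running after $1 seconds, i.e. it appears to be hung; returns 1 otherwise.
# Relies on GNU coreutils `timeout`, which exits with status 124 on timeout.
appears_hung() {
    secs=$1
    shift
    timeout "$secs" "$@" >/dev/null 2>&1
    [ "$?" -eq 124 ]
}
```

For example, `appears_hung 30 vgs && pkill -9 clvmd && clvmd` would apply the workaround only if `vgs` has not returned within 30 seconds, run on the node that was just restarted. Note that `timeout` kills the stuck `vgs` process with SIGTERM; whether that fully unblocks a client stuck in connect() has not been verified here.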
This appears to be a deadlock of some sort in clvmd: the daemon itself is hanging on a futex call.