Comment 4 for bug 1123192

Alex Bligh (ubuntu-alex-org) wrote :

Just to document an IRC conversation, another more obvious instance of the problem is:

a) Use OCFS2 with a shared heartbeat service on an iSCSI-connected SAN. Assume this is not in the configuration file for iSCSI (not that it makes much difference).
b) OCFS2 will access the raw block device with O_DIRECT. If writes fail for a time, it will fence the machine (meaning hard reboot it).
c) Now try upgrading open-iscsi. stoptargets() in the init script logs out of all targets, causing OCFS2 heartbeat writes to fail.

I'm not sure how to fix this. The initiator FD appears to be owned by iscsid. If we skip stoptargets and the module removal in the init script, what seems to happen is that the block device (/dev/sdb) becomes stale after iscsid quits, and OCFS2 doesn't want to write to it anyway. Restarting the OCFS2 cluster should fix this, but we have no way of knowing when to do that, and we have only a shortish time window before the machine hard reboots due to kernel fencing.

What I think should happen ideally is something more like the way the nbd client does things (and I'm just making assumptions about how iscsid works here). In essence, the daemon should double fork() for each initiated session, and that fork()'d process should exit ONLY when the kernel is actually done with the session. Upgrading iscsid etc. should not itself kill existing sessions at all. If I understand right (possibly not), iscsid isn't actually doing anything post-negotiation, apart from causing an issue if it dies.
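
To make the double fork() idea concrete, here is a minimal generic sketch in Python. It is not how iscsid or the nbd client is actually implemented; spawn_detached and hold_session are hypothetical names, and hold_session is only a placeholder for "stay alive until the kernel is done with the session". The point it illustrates is that the process holding the session ends up reparented away from the daemon that launched it, so the daemon can stop or be upgraded without taking the holder down with it.

```python
import os
import time


def hold_session(seconds=0.2):
    # Hypothetical placeholder: a real holder would block until the kernel
    # reports that the session is finished. Here it only sleeps.
    time.sleep(seconds)


def spawn_detached(session_body):
    """Run session_body in a double-forked process and return that process's PID."""
    read_fd, write_fd = os.pipe()
    first = os.fork()
    if first == 0:
        # First child: leave the launcher's session, fork again, exit at once.
        os.close(read_fd)
        os.setsid()
        if os.fork() == 0:
            # Grandchild: orphaned once the first child exits, so it is no
            # longer a child of the launcher.
            os.write(write_fd, str(os.getpid()).encode())
            os.close(write_fd)
            try:
                session_body()
            finally:
                os._exit(0)
        os._exit(0)
    # Launcher: reap the short-lived first child, then read the holder's PID.
    os.close(write_fd)
    os.waitpid(first, 0)
    with os.fdopen(read_fd) as pipe:
        return int(pipe.read())
```

Calling spawn_detached(hold_session) returns the PID of the detached holder. The launcher cannot wait() on that PID because it is not its child, which is the property wanted here: the holder's lifetime is tied to the session and not to the daemon that started it.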