open-iscsi removes modules on stop but should not

Bug #1123192 reported by Alex Bligh on 2013-02-12
This bug affects 1 person
Affects: open-iscsi (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

open-iscsi's stop script has within it the following lines:

stop() {
        stoptargets
        log_daemon_msg "Stopping iSCSI initiator service"
        start-stop-daemon --stop --quiet --signal KILL --exec $DAEMON
        rm -f $PIDFILE /lib/init/rw/sendsigs.omit.d/`basename $PIDFILE`
        modprobe -r ib_iser 2>/dev/null
        modprobe -r iscsi_tcp 2>/dev/null
        log_end_msg 0
}

The modprobe -r lines attempt to remove the relevant iscsi modules. I believe this to be a bug, because there may be other users of those modules besides open-iscsi. On an upgrade (for instance) an attempt is made to remove those modules (albeit ignoring errors), and then to modprobe them back in again. However, those modules are not distributed in the open-iscsi package - they are kernel modules - and open-iscsi has no business removing them, as other users may be using them in the meantime.

I can see no reason why they are being removed at all.

A simple fix is to remove the two 'modprobe -r' lines from debian/open-iscsi.init.
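
For illustration, the proposed fix would leave stop() looking roughly like the sketch below. It is the function quoted above with the two 'modprobe -r' lines dropped and the variable expansions quoted. It has not been tested against the actual package, and it assumes the rest of the init script still provides stoptargets, log_daemon_msg, log_end_msg, $DAEMON and $PIDFILE.

```shell
# Sketch only: stop() from debian/open-iscsi.init without module removal.
# Assumes the surrounding init script defines stoptargets, log_daemon_msg,
# log_end_msg, DAEMON and PIDFILE, as in the excerpt quoted in this report.
stop() {
        stoptargets
        log_daemon_msg "Stopping iSCSI initiator service"
        start-stop-daemon --stop --quiet --signal KILL --exec "$DAEMON"
        rm -f "$PIDFILE" "/lib/init/rw/sendsigs.omit.d/$(basename "$PIDFILE")"
        # The 'modprobe -r ib_iser' and 'modprobe -r iscsi_tcp' calls are
        # intentionally gone: the modules belong to the kernel, not to this
        # package, so they stay loaded across a stop/start cycle.
        log_end_msg 0
}
```

With this change the modules remain loaded between the stop and the subsequent start during an upgrade. It does not change what stoptargets does.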

What other users are you talking about?

Alex Bligh (ubuntu-alex-org) wrote :

Ritesh,

We're actually seeing 2 problems:

1. on an upgrade to the open-iscsi module, between the stop and the subsequent start, a daemon is calling iscsiadm and that is failing as the module isn't loaded. Arguably this is our problem.

2. on a separate system, we have something which does something similar to iscsiadm by manipulating /proc/scsi/scsi etc. directly. This is, I believe, permissible. I don't think this is our problem.

I'm curious to know what the reason is for the removal in the first place.

Alex

Alex Bligh (ubuntu-alex-org) wrote :

Actually one problem we are seeing should be pretty universal.

When using OCFS2, it will open the shared iSCSI device with O_DIRECT, and write a heartbeat there. If this errors, then the node self fences (i.e. reboots).

If the open-iscsi package is upgraded, it calls the init with a stop method, then the start method. There may be a gap between these two if lots of packages are being upgraded.

It's actually not just the modprobes that are the problem here; the bigger issue is that it logs out of all iscsi targets WHETHER OR NOT THEY ARE IN THE CONFIG FILE. In one scenario here we aren't using the config file at all - we're calling iscsiadm or equivalent to mount the LUNs directly.

What I think should happen here is that it shuts down the daemon and, arguably, stops the targets it started on boot. I don't see why it's interfering with other targets it had nothing to do with starting. Or at least there should be some /etc/default type setting that says 'please don't interfere with running targets'.
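
As a rough sketch of what such an /etc/default type setting could look like: the variable name ISCSI_LOGOUT_ON_STOP and the helper should_logout_targets below are made up for illustration and are not existing open-iscsi options. The idea is only that the init script would consult the setting before calling stoptargets.

```shell
# Hypothetical knob, e.g. set by the admin in a file under /etc/default:
#   ISCSI_LOGOUT_ON_STOP=no
# The variable name is an assumption for this sketch, not an existing option.

# Returns 0 (true) when stop() should log out of targets, 1 otherwise.
# Defaults to "yes" so behaviour is unchanged unless the admin opts out.
should_logout_targets() {
        case "${ISCSI_LOGOUT_ON_STOP:-yes}" in
                [Nn][Oo]|0|[Ff][Aa][Ll][Ss][Ee]) return 1 ;;
                *) return 0 ;;
        esac
}

# Inside stop(), the existing call would then be guarded like this:
#   if should_logout_targets; then stoptargets; fi
```

Defaulting to "yes" keeps the current behaviour for existing installs. Setups like the OCFS2 one described here would set the value to "no" so that running sessions are left alone.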

What happens if your root directory is on iscsi? (serious question - I'm interested in how it's meant to cope)

Changed in open-iscsi (Ubuntu):
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Dmitrijs Ledkovs (xnox)
Alex Bligh (ubuntu-alex-org) wrote :

Just to document an IRC conversation, another more obvious instance of the problem is:

a) use OCFS2 with a shared heartbeat service on an iSCSI connected SAN - assume this is not in the configuration file for iSCSI (not that it makes much difference)
b) OCFS2 will access the raw block device with O_DIRECT. If writes fail for a time, it will fence the machine (meaning hard reboot it)
c) Now try upgrading open-iscsi. stoptargets() in the init script logs out of all targets, causing OCFS2 heartbeat writes to fail

I'm not sure how to fix this. The initiator FD appears to be owned by iscsid. If we don't do stoptargets and the module removals in the init script, what seems to happen is that the block device (/dev/sdb) becomes stale after iscsid quits and OCFS2 doesn't want to write to it anyway. Restarting the OCFS2 cluster should fix this, but we have no way of knowing when to do that, and we have a shortish time window or the machine hard reboots due to kernel fencing.

What I think should happen ideally is more like the nbd client way of doing things (and I'm just making assumptions about how iscsid works here). That is in essence that the daemon should double fork() for each initiated session, and that fork()'d session should ONLY exit when the kernel is actually done with the session. Upgrading iscsid etc should itself not kill existing sessions at all. If I understand right (possibly not) iscsid isn't actually doing anything post negotiation, apart from causing an issue if it dies.

Changed in open-iscsi (Ubuntu):
assignee: Dimitri John Ledkov (xnox) → nobody