authenticated and unauthenicated iscsi clients fails to complete boot

Bug #838809 reported by James Page on 2011-09-01
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
open-iscsi (Ubuntu)
High
Stéphane Graber
Oneiric
High
Stéphane Graber

Bug Description

Discovered during beta-1 testing of oneiric-server-amd64 images.

Installation of both authenticated and unauthenticated iscsi clients works just fine; they network boot and appear to be OK until the iscsid starts up - at which point boot is really slow with lots of recoverable errors.

I can't logon to the system as it won't complete boot but I have attached two screenshots.

I'm using a lucid iscsi target which has been good for testing so-far this release.

James Page (james-page) wrote :
James Page (james-page) wrote :
James Page (james-page) wrote :

I can of course mount the target disks on the iscsi target host itself if anything on the target filesystem might be of interest.

tags: added: iso-testing
Colin Watson (cjwatson) wrote :

I wonder if this is another case of networking being ripped out from under the iSCSI stack during boot? That's been a problem in the past.

The best way to debug this is usually to boot with init=/bin/sh and then manually bring up the system, running steps from Upstart jobs by hand. It's tedious but it's a lot easier than trying to play guesswork from a full parallel boot ...

Dave Walker (davewalker) on 2011-09-01
tags: added: server-o-rs
Robbie Williamson (robbiew) wrote :

Assigning to canonical-foundations, if that's wrong punt it back to the ubuntu-server team. -Robbie

Stéphane Graber (stgraber) wrote :

Hello!

So I'm expecting to have some time to look at that bug next week. I just had a quick look at how to get a setup in place to reproduce the bug and found the instructions for the ISO testing at: http://testcases.qa.ubuntu.com/Install/ServeriSCSIRoot

Which describe who to get the client VM installed but I couldn't easily find instructions on setting up the server side with I'm assuming dhcp, tftp and the iscsi target. Do we have documentation for that somewhere?
Sadly I'm not done configuring my SAN to act as an iSCSI target yet so I won't be able to use it for that (FC only at the moment) so I'll need to have that running in a VM too.

Dave Walker (davewalker) on 2011-09-03
Changed in ubuntu:
milestone: none → ubuntu-11.10-beta-2
tags: added: server-o-ro
removed: server-o-rs
Tobin Davis (gruemaster) wrote :

This appears to be reproducible on arm platforms as well, using both AMD64 and armel target hosts and armel initiator.

As to documentation, we need better documentation for raw system setup. Not everyone will be running this in a vm (actually, quite few I would imagine).

tags: added: armel
Stéphane Graber (stgraber) wrote :

Successfully reproduced here on kvm.

Initial testing shows that the open-iscsi init script seems to be the cause of the disconnect and input/output errors.
Removing it from the boot sequence lets me boot and get a shell without storage related errors.

I'm not too familiar with iscsi so I'll now have to dig a bit into open-iscsi and see exactly what it's trying to do at boot time.
As iscsi is already in use for the root device, my hope would be that it would keep it as it's an only initialize any other lun you may need.

Would be great if you could confirm that disabled open-iscsi from your target systems doesn't give you any error and that the system behaves as it should (it seems to be taking quite a bit of time to boot here, no idea if that's normal for iscsi boot).

Stéphane Graber (stgraber) wrote :

Starting iscsid manually (which seems to be the part of open-iscsi that breaks the system) gives me:
iscsid: Login failed to authenticate with target iqn.2010-01.com.example:storage.lun1
iscsid: session 1 login rejected: Initiator failed authentication with target

Then everything freezes and I get input/output errors.

So it seems like even though iscsid ran properly from initramfs, the same done on the booted system fails, thereby breaking the already established connection and crashing the system.

Stéphane Graber (stgraber) wrote :

I guess I'll need to talk to someone with a bit more iscsi experience though my feeling so far is that iscsistart works properly from the initramfs, mounts the lun and then mounts the root device.

When iscsid then starts, it tries to disconnect the existing device and re-establish the connection and fails to do so for some reason.
I reproduced the same behaviour on another oneiric system booting from an harddrive and having an iscsi lun mounted by the initramfs, that way I can debug it without having my root filesystem disconnected.

Stéphane Graber (stgraber) wrote :
Download full text (4.3 KiB)

Digging a little deeper, here's the dump of a connection on the server when done from initramfs:
Sep 9 16:33:47 ubuntu ietd: 49 6e 69 74 | 69 61 74 6f | 72 4e 61 6d | 65 3d 69 71 | InitiatorName=iq |
Sep 9 16:33:47 ubuntu ietd: 6e 2e 31 39 | 39 33 2d 30 | 38 2e 6f 72 | 67 2e 64 65 | n.1993-08.org.de |
Sep 9 16:33:47 ubuntu ietd: 62 69 61 6e | 3a 30 31 3a | 33 34 62 31 | 30 35 39 61 | bian:01:34b1059a |
Sep 9 16:33:47 ubuntu ietd: 39 32 61 00 | 49 6e 69 74 | 69 61 74 6f | 72 41 6c 69 | 92a InitiatorAli |
Sep 9 16:33:47 ubuntu ietd: 61 73 3d 28 | 6e 6f 6e 65 | 29 00 54 61 | 72 67 65 74 | as=(none) Target |
Sep 9 16:33:47 ubuntu ietd: 4e 61 6d 65 | 3d 69 71 6e | 2e 32 30 31 | 30 2d 30 31 | Name=iqn.2010-01 |
Sep 9 16:33:47 ubuntu ietd: 2e 63 6f 6d | 2e 65 78 61 | 6d 70 6c 65 | 3a 73 74 6f | .com.example:sto |
Sep 9 16:33:47 ubuntu ietd: 72 61 67 65 | 2e 6c 75 6e | 31 00 53 65 | 73 73 69 6f | rage.lun1 Sessio |
Sep 9 16:33:47 ubuntu ietd: 6e 54 79 70 | 65 3d 4e 6f | 72 6d 61 6c | 00 48 65 61 | nType=Normal Hea |
Sep 9 16:33:47 ubuntu ietd: 64 65 72 44 | 69 67 65 73 | 74 3d 4e 6f | 6e 65 00 44 | derDigest=None D |
Sep 9 16:33:47 ubuntu ietd: 61 74 61 44 | 69 67 65 73 | 74 3d 4e 6f | 6e 65 00 44 | ataDigest=None D |
Sep 9 16:33:47 ubuntu ietd: 65 66 61 75 | 6c 74 54 69 | 6d 65 32 57 | 61 69 74 3d | efaultTime2Wait= |
Sep 9 16:33:47 ubuntu ietd: 32 00 44 65 | 66 61 75 6c | 74 54 69 6d | 65 32 52 65 | 2 DefaultTime2Re |
Sep 9 16:33:47 ubuntu ietd: 74 61 69 6e | 3d 30 00 49 | 46 4d 61 72 | 6b 65 72 3d | tain=0 IFMarker= |
Sep 9 16:33:47 ubuntu ietd: 4e 6f 00 4f | 46 4d 61 72 | 6b 65 72 3d | 4e 6f 00 45 | No OFMarker=No E |
Sep 9 16:33:47 ubuntu ietd: 72 72 6f 72 | 52 65 63 6f | 76 65 72 79 | 4c 65 76 65 | rrorRecoveryLeve |
Sep 9 16:33:47 ubuntu ietd: 6c 3d 30 00 | 49 6e 69 74 | 69 61 6c 52 | 32 54 3d 4e | l=0 InitialR2T=N |
Sep 9 16:33:47 ubuntu ietd: 6f 00 49 6d | 6d 65 64 69 | 61 74 65 44 | 61 74 61 3d | o ImmediateData= |
Sep 9 16:33:47 ubuntu ietd: 59 65 73 00 | 4d 61 78 42 | 75 72 73 74 | 4c 65 6e 67 | Yes MaxBurstLeng |
Sep 9 16:33:47 ubuntu ietd: 74 68 3d 31 | 36 37 37 36 | 31 39 32 00 | 46 69 72 73 | th=16776192 Firs |
Sep 9 16:33:47 ubuntu ietd: 74 42 75 72 | 73 74 4c 65 | 6e 67 74 68 | 3d 32 36 32 | tBurstLength=262 |
Sep 9 16:33:47 ubuntu ietd: 31 34 34 00 | 4d 61 78 4f | 75 74 73 74 | 61 6e 64 69 | 144 MaxOutstandi |
Sep 9 16:33:47 ubuntu ietd: 6e 67 52 32 | 54 3d 31 00 | 4d 61 78 43 | 6f 6e 6e 65 | ngR2T=1 MaxConne |
Sep 9 16:33:47 ubuntu ietd: 63 74 69 6f | 6e 73 3d 31 | 00 44 61 74 | 61 50 44 55 | ctions=1 DataPDU |
Sep 9 16:33:47 ubuntu ietd: 49 6e 4f 72 | 64 65 72 3d | 59 65 73 00 | 44 61 74 61 | InOrder=Yes Data |
Sep 9 16:33:47 ubuntu ietd: 53 65 71 75 | 65 6e 63 65 | 49 6e 4f 72 | 64 65 72 3d | SequenceInOrder= |
Sep 9 16:33:47 ubuntu ietd: 59 65 73 00 | 4d 61 78 52 | 65 63 76 44 | 61 74 61 53 | Yes MaxRecvDataS |
Sep 9 16:33:47 ubuntu ietd: 65 67 6d 65 | 6e 74 4c 65 | 6e 67 74 68 | 3d 32 36 32 | egmentLength=262 |
Sep 9 16:33:47 ubuntu ietd: 31 34 34 00 | | | | 144 |

There everything works fine, now compare to what we get ...

Read more...

Stéphane Graber (stgraber) wrote :

Here is the result of a "grep -i auth" on the level 9 debug output of iscsid on the client.

iscsid: updated 'node.session.auth.authmethod', 'None' => 'None'
iscsid: updated 'node.session.auth.authmethod', 'None' => 'None'
iscsid: updated 'node.conn[0].timeo.auth_timeout', '45' => '45'
iscsid: updated 'node.session.auth.authmethod', 'None' => 'None'
iscsid: authentication setup complete...
iscsid: > AuthMethod=CHAP
iscsid: Login failed to authenticate with target iqn.2010-01.com.example:storage.lun1
iscsid: session 1 login rejected: Initiator failed authentication with target

This confirms that the client indeed tries to authenticate using CHAP instead of None and doesn't respect its own configuration.

Stéphane Graber (stgraber) wrote :

When was the last time that test was successful?

I had a quick look through the package history and can't find anything in open-iscsi that'd explain this bug (package almost didn't change since natty) and my VM now runs Natty's kernel and still reproduces the bug.

James Page (james-page) wrote :

Stephane

These tests passed during alpha3 testing.

James Page (james-page) wrote :

The change that is causing this issue to occur is in upstart:

http://launchpadlibrarian.net/76962956/upstart_1.3-0ubuntu5_1.3-0ubuntu6.diff.gz

If I revert the change to rc-sysinit.conf the system boots fine - open-iscsi is only run on IFACE lo which does an exit 0 so no iscsid is tarted.

With this change in place iscsid runs for lo and eth0 which then results in a large number of IO errors and general system mis-behaviour.

Stéphane Graber (stgraber) wrote :

so the problem is that the init script now does what it's supposed to do.

I did a quick test on Natty and can also confirm that back then iscsid was just exitting and never tried to setup iscsi at boot time.

Stéphane Graber (stgraber) wrote :

As a temporary workaround for beta2, I'd recommend we update the open-iscsi script to:
 - Check if /etc/iscsi/iscsi.initramfs exists
 - If it does, then check if "iscsiadm -m session" reports any established session
 - If it does, then "exit 0"

This will restore the old behaviour we had before the upstart fix, this still won't take care of the use case where your root is on an iscsi LUN and you have other LUNs that need to be mounted by open-iscsi, but my understanding is that it never worked.

affects: Ubuntu Oneiric → open-iscsi (Ubuntu Oneiric)
Changed in open-iscsi (Ubuntu Oneiric):
milestone: ubuntu-11.10-beta-2 → none
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package open-iscsi - 2.0.871-0ubuntu8

---------------
open-iscsi (2.0.871-0ubuntu8) oneiric; urgency=low

  * Disable open-iscsi init script when root is on iscsi (LP: #838809)
 -- Stephane Graber <email address hidden> Thu, 15 Sep 2011 10:01:12 -0400

Changed in open-iscsi (Ubuntu Oneiric):
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers