Cannot use open-iscsi inside LXC container

Bug #1226855 reported by Elizabeth K. Joseph on 2013-09-17
This bug affects 12 people
Affects: linux (Ubuntu); lxc (Ubuntu)

Bug Description

I am trying to use open-iscsi from within an LXC container, but the iSCSI netlink socket does not support multiple network namespaces, so iscsid fails with the error "iscsid: sendmsg: bug? ctrl_fd 6".

Command attempted: iscsiadm -m node -p $ip:$port -T $target --login

Results in:

Exit code: 18
Stdout: 'Logging in to [iface: default, target: $target, portal: $ip,$port] (multiple)'
Stderr: 'iscsiadm: got read error (0/0), daemon died?
iscsiadm: Could not login to [iface: default, target: $target, portal: $ip,$port].
iscsiadm: initiator reported error (18 - could not communicate to iscsid)
iscsiadm: Could not log into all portals'
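To script this reproduction, here is a minimal Python sketch that builds the same `iscsiadm` login command and extracts the numeric initiator error from stderr. The regular expression is an assumption inferred only from the message format quoted above; other iscsiadm versions may word it differently.

```python
import re
import subprocess

# Matches lines like:
# "iscsiadm: initiator reported error (18 - could not communicate to iscsid)"
# Pattern inferred from the log in this report, not from iscsiadm documentation.
_ERR_RE = re.compile(r"initiator reported error \((\d+) - ([^)]*)\)")


def login_argv(ip: str, port: int, target: str) -> list:
    """Build the argv for the node login attempted in this report."""
    return ["iscsiadm", "-m", "node", "-p", f"{ip}:{port}", "-T", target, "--login"]


def parse_initiator_error(stderr: str):
    """Return (code, message) for the first reported initiator error, or None."""
    m = _ERR_RE.search(stderr)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


def try_login(ip: str, port: int, target: str):
    """Run the login and return (exit_code, parsed_error_or_None)."""
    proc = subprocess.run(login_argv(ip, port, target), capture_output=True, text=True)
    return proc.returncode, parse_initiator_error(proc.stderr)
```

In the failure above, the process exit code (18) and the parsed initiator error code agree, which is a quick way to tell "iscsid unreachable" apart from target-side login errors.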

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: lxc 0.9.0-0ubuntu3.4
ProcVersionSignature: Ubuntu 3.8.0-30.44-generic
Uname: Linux 3.8.0-30-generic x86_64
ApportVersion: 2.9.2-0ubuntu8.3
Architecture: amd64
Date: Tue Sep 17 14:38:08 2013
InstallationDate: Installed on 2013-01-15 (245 days ago)
InstallationMedia: Xubuntu 12.10 "Quantal Quetzal" - Release amd64 (20121017.1)
MarkForUpload: True
SourcePackage: lxc
UpgradeStatus: Upgraded to raring on 2013-05-16 (124 days ago)

Elizabeth K. Joseph (lyz) wrote :
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug.

Your example command says '$ip:$port'. Is iscsid running on the host
or in the container? Is $ip the IP of the host?

If $ip is the host IP and you just want iscsiadm in the guest to talk to iscsid on the
host, that should work.

Depending on your configuration, netlink sockets might be used in several
places. Could you post strace -f output showing exactly which call
fails? (iscsiadm itself should only fail if you're trying to use offload, which it
doesn't look like you are.)

Netlink sockets are per-netns, so if you want to be able to connect to a
netlink socket from another netns, then something will need to open a
socket from the target netns and pass that into the other ns. (This
could be arranged with setns, but only from the host).
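The "open in one namespace, pass into the other" approach relies on ordinary file-descriptor passing over a Unix domain socket (`SCM_RIGHTS`). Below is a minimal Python 3.9+ sketch of just that transfer, using `socket.send_fds` and `socket.recv_fds`. In the real workaround a host-side helper would open the netlink socket (for example after `setns` into the host network namespace) and send it to a process in the container over a Unix socket both can reach; the helper names and that deployment layout are hypothetical, not an existing LXC or open-iscsi feature.

```python
import os
import socket


def send_fd(chan: socket.socket, fd: int) -> None:
    """Send one file descriptor across a connected AF_UNIX socket."""
    # A non-empty payload is required; the fd travels as SCM_RIGHTS ancillary data.
    socket.send_fds(chan, [b"F"], [fd])


def recv_fd(chan: socket.socket) -> int:
    """Receive one file descriptor; the kernel installs a new fd in this process."""
    _msg, fds, _flags, _addr = socket.recv_fds(chan, 1, 1)
    if not fds:
        raise RuntimeError("peer sent no file descriptor")
    return fds[0]


if __name__ == "__main__":
    # Demonstration with a pipe standing in for the netlink socket.
    host_side, container_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    send_fd(host_side, w)
    w2 = recv_fd(container_side)
    os.write(w2, b"hello")
    print(os.read(r, 5))
```

The received descriptor refers to the same open socket, so it keeps the namespace it was created in; that is why only the creating side needs to be in the host netns.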

Changed in lxc (Ubuntu):
status: New → Incomplete
Clint Byrum (clint-fewbar) wrote :

I can't answer all of the questions, but the basic idea is that an LXC container could mount an iSCSI target from inside the container with little, if any, cooperation from the host's user space.

I believe other similar systems like nbd use ioctls to configure such devices, but iSCSI uses netlink, which I believe is the crux of the problem.

Changed in lxc (Ubuntu):
status: Incomplete → New
Serge Hallyn (serge-hallyn) wrote :


Thanks. Then I see three possible workarounds:

1. The simplest way would be to have iscsid running on the host, and connect to it over tcp from the container.

2. You could also have a container without its own network namespace, and have iscsid running there.

3. You could open the netlink socket from the host network namespace, and pass that into the container.

If none of these suffices, then I'll mark this as affecting the kernel, and it will take a new kernel feature to make this work. However, controlling host devices from a container is generally considered undesirable (see user namespaces, which may not access many devices at all). To solve the netlink part of the issue, we would have to come up with a way to choose which containers may access the netlink socket.

It would still be useful for future consideration of this bug if you could attach an strace of the netlink failure to this bug.
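Workaround 2 depends on whether the container really shares the host's network namespace. One way to check, sketched in Python below, is to compare the `/proc/<pid>/ns/net` links of a host process and a container process: identical links mean the same namespace. The PIDs are assumed to be host-side PIDs, and reading another process's namespace link needs sufficient privileges.

```python
import os


def netns_id(pid="self") -> str:
    """Return the network-namespace identifier of a process, e.g. 'net:[4026531992]'."""
    return os.readlink(f"/proc/{pid}/ns/net")


def same_netns(pid_a, pid_b) -> bool:
    """True if both processes live in the same network namespace."""
    return netns_id(pid_a) == netns_id(pid_b)
```

For example, `same_netns(1, iscsid_pid)` on the host, where `iscsid_pid` (a hypothetical name) is the host-visible PID of the container's iscsid, should return True only for a container created without its own network namespace.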

Changed in lxc (Ubuntu):
status: New → Incomplete
Elizabeth K. Joseph (lyz) wrote :

Thanks for your reply. I'll chat with Robert and Clint to see if any of these solutions is reasonable for us.

As a reference point, here's the setup we're using:

The host has two guests: an LXC container and a qemu VM.

The host has the iscsi_tcp module loaded, which can then be seen and used by the iSCSI daemon within the container.

What we're attempting to do is provision the qemu VM from the LXC container using OpenStack's baremetal provisioning tools in a virtualized environment (no nested KVM!). Loosely, the procedure is: the LXC container boots the qemu image (we have a nifty power driver), gives it an address via dnsmasq-dhcp, and loads some files via dnsmasq-tftp (this all works); then we use iSCSI to copy data to the qemu VM. Robert or Clint can chime in with more details (or to clarify/correct!).

Today I ran through the test again and attached strace to the iscsid daemon inside the container. Attached is the output from: strace -p 1488 -o iscsid_strace_baremetal.txt

Serge Hallyn (serge-hallyn) wrote :

Thanks, that is an interesting strace. What is - is that the host? Could you also start the daemon in the container by hand under strace for a few seconds so we can see exactly how fd 6 is created? (Presumably it is a connection to iscsi_nl_sock, but I'm confused since (a) it managed to get connected and only was refused on send, and (b) if the daemon is talking over tcp then why is it doing netlink at all).

Elizabeth K. Joseph (lyz) wrote :

That address is the IP of the qemu VM that the LXC container is attempting to provision.

I'll load up my test instance soon to get that additional strace.

Elizabeth K. Joseph (lyz) wrote :

Assuming I should use the init script for this, attached is the output from running the following in the container:

strace -o iscsi-start.txt service open-iscsi start

Serge Hallyn (serge-hallyn) wrote :

Thanks - unfortunately we need the -f flag added to strace to follow forks.

Elizabeth K. Joseph (lyz) wrote :

Aha! Attached: strace -f -o iscsi-start_f.txt service open-iscsi start
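To answer "how is fd 6 created" from a `strace -f -o` log like this one, a small Python filter can list the calls that returned a given descriptor plus any `sendmsg` on it. The line format assumed here (optional leading PID, then `name(args) = result`) is typical of `strace -f -o` output but is an assumption; lines that don't match, such as split `<unfinished ...>` / `resumed` pairs, are simply skipped.

```python
import re

# e.g. "1488  socket(AF_NETLINK, SOCK_RAW, NETLINK_ISCSI) = 6"
_LINE_RE = re.compile(r"^(?:(\d+)\s+)?(\w+)\((.*)\)\s+=\s+(-?\d+)")

# Syscalls whose return value is a new file descriptor.
_CREATORS = {"socket", "open", "openat", "dup", "dup2", "dup3", "accept", "accept4"}


def fd_history(lines, fd: int):
    """Yield (pid, syscall, args, result) for calls that created fd or sendmsg'd on it."""
    for line in lines:
        m = _LINE_RE.match(line.strip())
        if not m:
            continue
        pid, name, args, result = m.groups()
        created = name in _CREATORS and int(result) == fd
        # The first argument of sendmsg is the socket descriptor.
        used = name == "sendmsg" and args.split(",", 1)[0].strip() == str(fd)
        if created or used:
            yield (int(pid) if pid else None, name, args, int(result))
```

Usage: `for entry in fd_history(open("iscsi-start_f.txt"), 6): print(entry)`. A `socket(AF_NETLINK, ...)` entry followed by a failing `sendmsg` would match the situation described earlier in the thread (connected, but refused on send).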

Changed in lxc (Ubuntu):
status: Incomplete → Confirmed
Changed in lxc (Ubuntu):
importance: Undecided → Wishlist
Changed in linux (Ubuntu):
importance: Undecided → Wishlist

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1226855

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jul 31 21:36 seq
 crw-rw---- 1 root audio 116, 33 Jul 31 21:36 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: [Errno 2] No such file or directory
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
IwConfig: Error: [Errno 2] No such file or directory
Lspci: Error: [Errno 2] No such file or directory
Lsusb: Error: [Errno 2] No such file or directory
MachineType: Xen HVM domU
Package: lxc

ProcEnviron:
 PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-32-generic root=/dev/xvda1 ro splash quiet vt.handoff=7
ProcVersionSignature: Ubuntu 3.13.0-32.57-generic
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-32-generic N/A
 linux-backports-modules-3.13.0-32-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
Uname: Linux 3.13.0-32-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)

_MarkForUpload: True
dmi.bios.date: 11/28/2013
dmi.bios.vendor: Xen
dmi.bios.version: 4.1.5
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.1.5:bd11/28/2013:svnXen:pnHVMdomU:pvr4.1.5:cvnXen:ct1:cvr: HVM domU
dmi.product.version: 4.1.5
dmi.sys.vendor: Xen

tags: added: apport-collected trusty

apport information

Kevin Carter (kevin-carter) wrote :

strace -f -o open-iscsi-start-lxc.txt service open-iscsi start

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Kevin Carter (kevin-carter) wrote :

This is still an impacting issue. Has there been any progress on it on any front?
Since apport information was requested, I've provided it from my running systems.
The apport information attached to this issue is from within the LXC container.

The host system is:
Ubuntu 14.04.01
Kernel: "3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux"
LXC Version: 1.0.4-0ubuntu0.1.

Jason Harley (redmind) wrote :

I'm also curious about an update on this issue. I'm running 14.04.1, kernel 3.13.0-39-generic and LXC 1.0.6-0ubuntu0.1. The strace output of open-iscsi looks basically the same as above.

As I understand it, this is related to iSCSI's Netlink implementation not being namespace aware? I would love to find a solution for this and am happy to work with any developer to help make it happen.

Quoting Jason Harley (<email address hidden>):
> I'm also curious about an update on this issue. I'm running 14.04.1,
> Kernel 3.13.0-39-generic and LXC 1.0.6-0ubuntu0.1. 'strace' output of
> open-scsi looks basically the same as above.
> As I understand it, this is related to iSCSI's Netlink implementation
> not being namespace aware? I would love to find a solution for this and
> am happy to work with any developer to help make it happen.

I personally won't have time to work on this this year. I'd recommend
simply sitting down and looking through the kernel code, and getting
your bearings through the netlink code for starters. I'm definitely
interested in this and hope to join in early next year.

M.Morana (mahmoh) wrote :

@Serge, hi. A customer pinged me about this bug; are there any updates? Here's his explanation:

You may recall our conversation about iSCSI connectivity for binding a block volume to a host in OpenStack. The host is a Linux container and the block volume is on an EMC VMAX. The I/O path is over iSCSI. The environment is Ubuntu 14.04 LTS x64. Our observations and a KB article we found online are below.

- Regular volume attach works fine (the iSCSI login occurs between the compute node and the array target). In this context, the compute node is a physical host.
- When creating a bootable volume, the controller node needs to perform an iSCSI login so the instance image can be copied. We are seeing issues with this. In this case, the compute node is an LXC container.
- The same iscsiadm commands that fail within the container run fine outside the container (on the physical controller host).

It looks like:

    There’s an iSCSI kernel bug with mounting a target within a container.
    The issue appears to be with multiple namespaces.

KBs found:!topic/open-iscsi/GhKfxIix4ds

Do we know whether this is fixed?

Jason Harley (redmind) wrote :

@m.morana - to my knowledge, modern kernels still don't have a namespace-aware iSCSI netlink implementation. In an OpenStack context, I recall seeing something about changing nova's volume attach code to use qemu's native iSCSI support, which may work around the need for iscsiadm and native block devices, but I haven't had a chance to look into it myself.
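For reference, qemu's built-in iSCSI initiator (libiscsi-based) addresses a LUN with a URL of the form `iscsi://[<username>[%<password>]@]<host>[:<port>]/<target-iqn-name>/<lun>`, so a guest disk can be attached without a host block device or the kernel netlink path. The Python helpers below are a hedged sketch that assembles such a URL (CHAP credentials omitted) and a `-drive` option; whether a given qemu build has iSCSI support, and how nova would wire this in, are outside what this thread establishes.

```python
def qemu_iscsi_url(host: str, target_iqn: str, lun: int = 0, port: int = 3260) -> str:
    """Build an iscsi://<host>:<port>/<target-iqn>/<lun> URL for qemu's iSCSI driver.

    The optional user%password@ part of the URL is deliberately left out.
    """
    if lun < 0:
        raise ValueError("LUN must be non-negative")
    return f"iscsi://{host}:{port}/{target_iqn}/{lun}"


def qemu_drive_arg(url: str) -> list:
    """Return a qemu '-drive' option pair that uses the URL as the drive file."""
    return ["-drive", f"file={url},format=raw"]
```

Port 3260 is the standard iSCSI port and is used here only as a default.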

Mark Brown (mstevenbrown) wrote :

This is a blocking issue for users of iSCSI-based storage hardware on OpenStack; is there any way of re-prioritizing this issue?

Kevin Carter (kevin-carter) wrote :

Bump. If anyone has time to work on this, it would be a huge benefit to the OpenStack community.

This is a blocking issue for us too, as we're not able to fully use LXC containers in our OpenStack deployments. Specifically, we cannot run nova-compute in an LXC container due to issues with read-write (RW) AF_NETLINK access. In the os-ansible-deployment system we've gone back to installing nova-compute on the host machine by default, which ensures functional deployments, but it would be great to change the default so that everything in the deployment is containerized.

As mentioned, I have gear that I can dedicate to testing things out but I don't have time to work through the problems at present.

Serge Hallyn (serge-hallyn) wrote :

Quoting M.Morana (<email address hidden>):
> @Serge, Hi, had a customer ping me about this bug, any updates? Here's
> his explanation:

Sorry, no, I have not spent any time on this. As far as I know neither
has anyone on the kernel team, and I haven't seen it discussed on any
mailing lists.

From what I've seen, I'm asked about this once or twice per year, and
it's always deemed low priority. If it's now deemed high priority, then
we will simply need to find a person and time to do it. (I don't know
enough about iscsi to even guess as to the time to do it)
