Cannot enlist/commission machines in MAAS 2.1 with usb network adapter

Bug #1639202 reported by Richard Lovell
This bug affects 2 people
Affects                    Status    Importance   Assigned to   Milestone
MAAS                       Invalid   Undecided    Unassigned
maas-images                Invalid   Undecided    Unassigned
initramfs-tools (Ubuntu)   Invalid   Undecided    Dave Chiluk
linux (Ubuntu)             Invalid   Undecided    Unassigned

Bug Description

After upgrade from MAAS 2.0 to MAAS 2.1:

Cannot enlist/commission client machines via network boot (or pre-staging machine) when using usb-c network adapter D59GG (e.g. Precision 5510). This was working ok with MAAS 2.0.

Using the Xenial 16.04 base image for enlist/commission (no minimum kernel set).

Enlist/Commission/Deploy works fine with other laptop and desktop models which have a built in NIC.

From an already deployed Precision 5510 system (16.04.1) I can see the following module is loaded when the usb-c adapter is connected (and working):

$ lsmod
usbnet 45056 1 cdc_ether

Errors received on client during enlisting with MAAS:

no /run/net-bootif.conf
lvmetad is not activated
...invalid path for logical volume.
gave up waiting for root device
common problems: boot args (cat /proc/cmdline)
check rootdelay = (did system wait long enough)
check root = (did the system wait for the right device?)
missing modules (cat /proc/modules)
ALERT! /dev/disk/by-path/ip-<ipaddress>:3260-iscsi-iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily-lun-1 does not exist. Dropping to shell!
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver

- cat /proc/cmdline shows the MAC address of the usb-c network adapter.
- cat /proc/modules includes “usbnet 45056 cdc_ether, Live 0xffffffffc009c000”
- cat /proc/modules | grep usb … includes “ usbnet 45056 and usbhid 49152”

I've tried enlisting with older boot image and different kernel versions (14.04 and 16.04 with ga-16.04, hwe-16.04 set) but get the same problem.

It seems like the usb-c network adapter isn't loading properly or maybe just not quickly enough?

Please let me know if you require any more info. I can provide info from /var/log/maas/* and dpkg -l '*maas*'|cat if need be.

Tags: sts
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi Richard,

This seems like an issue with images, or the kernel rather than MAAS. We'll need to do some more investigation here.

Changed in maas:
status: New → Incomplete
Revision history for this message
Richard Lovell (ralovell) wrote :

Hi Andres,
Thanks for picking this up.
Let me know if you require more info from me
thanks

Dave Chiluk (chiluk)
tags: added: sts
Revision history for this message
Dave Chiluk (chiluk) wrote :

I'm adding the kernel to this, as I'm fairly certain the kernel is responsible for which modules are built into the default initramfs.

In this case it looks like cdc_ether and usbnet need to be added into the initramfs in order to solve a case like this.

I'm not sure if this should be done in the kernel for all installations, or if it should be done only via maas-images.

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1639202

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Richard Lovell (ralovell) wrote :

I'm unable to run apport-collect 1639202 from the Precision 5510 after the enlist stage fails.

Dave Chiluk (chiluk)
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
Dave Chiluk (chiluk) wrote :

Please ignore the comment from brad-figg as the above comment originated from a bot.

That being said, I've had a bit of an opportunity to research this. Initramfs-tools is responsible for updating the initramfs and including modules in the initramfs. Unfortunately I don't think that adding usbnet and cdc_ether to the default initramfs produced by the distro is the correct course of action in this case.

However, it may be possible to create the maas images with all network modules in them, as that would make more sense, in my opinion. I guess this is why Andres originally opened this against maas-images.

Revision history for this message
Dave Chiluk (chiluk) wrote :

I just checked the maas daily images, and both the 4.4.0-45 xenial and 4.8.0-27 yakkety initrd contain the cdc_ether kernel drivers. I also checked the 20160310 maas image that contains the 4.4.0-11-generic kernel, and it contains the drivers as well.

So I now suspect that the initramfs is not correctly configuring the network adapter.
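
A check like this can be repeated by listing the initrd contents (for example with lsinitramfs from initramfs-tools) and looking for the module files. The following is only a minimal Python sketch, assuming the listing has already been captured as text; the function name find_modules and the sample paths are illustrative and were not posted in this thread.

```python
import os

def find_modules(listing, wanted):
    """Map each wanted module name to the listing paths that provide it.

    `listing` is the text of an initrd file listing, one path per line.
    Module files may be compressed (.ko.xz, .ko.gz, ...), so anything
    whose file name contains ".ko" is considered.
    """
    found = {name: [] for name in wanted}
    for line in listing.splitlines():
        path = line.strip()
        base = os.path.basename(path)
        if ".ko" not in base:
            continue
        # The kernel treats '-' and '_' in module names as equivalent.
        stem = base.split(".ko", 1)[0].replace("-", "_")
        if stem in found:
            found[stem].append(path)
    return found

# Illustrative listing only; real output will contain many more entries.
SAMPLE_LISTING = """\
lib/modules/4.4.0-45-generic/kernel/drivers/net/usb/cdc_ether.ko
lib/modules/4.4.0-45-generic/kernel/drivers/net/usb/usbnet.ko
lib/modules/4.4.0-45-generic/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
"""

RESULT = find_modules(SAMPLE_LISTING, ["cdc_ether", "usbnet", "r8152"])
for module, paths in RESULT.items():
    print(module, "present" if paths else "missing")
```

A module that is present in the initrd but never bound to a device points away from packaging and towards device detection or network configuration, which is the direction the investigation takes next.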

Revision history for this message
Dave Chiluk (chiluk) wrote :

@Richard Lovell
Can you please provide me the console log messages that exist above the "no /run/net-bootif.conf"?

Additionally can I get the output of following commands from the laptop after the message "Dropping to shell!"
- ip a # I'm not sure if this is available, but ifconfig or similar would be helpful. Basically I need the name of the nic as discovered. I have a feeling that the initramfs is not correctly checking for this name when it attempts to bring up the interfaces.
- lsmod
- cat /proc/cmdline

More commands may be necessary after I have this output and am able to investigate the initramfs.

Thanks

Changed in maas:
status: Incomplete → Invalid
Changed in initramfs-tools (Ubuntu):
assignee: nobody → Dave Chiluk (chiluk)
Revision history for this message
Richard Lovell (ralovell) wrote :

Hi Dave,

The following output appears. I don't see an error relating to "no net-BOOTIF.conf" now...just what's below...

ipconfig: BOOTIF: SIOCGIFINDEX: no such device
ipconfig: no devices to configure
/scripts/local-top/scsi: line 515: can't open '/run/net-BOOTIF.conf'
lvmetad is not active yet, using direct activation during sysinit

ip a # gives:
1: lo <LOOPBACK> mtu 65536 qdisc noop qlen 1
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enx847beb55c195: <BROADCAST, MULTICAST> mtu 1500 qdisc noop qlen 1000
 link/ether 84:7b:eb:55:c1:95 brd ff:ff:ff:ff:ff:ff

cat /proc/cmdline gives the mac of the usb-c network adapter (different mac to the previous command - above):

BOOT_IMAGE=ubuntu/amd64/generic/xenial/daily/boot-kernel nomodeset iscsi_target_name=iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily iscsi_target_ip=<ip of maas server> iscsi_target_port=3260 iscsi_initiator=maas-enlist ip=::::maas-enlist:BOOTIF ro root=/dev/disk/by-path/ip-<ip of maas server>:3260-iscsi-iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily-lun-1 overlayroot=tmpfs cloud-config-url=http://<ip of maas server>:5420/MAAS/metadata/latest/enlist-preseed/?op=get_enlist_preseed log_host=<ip of maas server> log_port=514 initrd=ubuntu/amd64/generic/xenial/daily/boot-initrd BOOTIF=01-9c-eb-e8-3c-52-cc

lsmod gives:

/bin/sh: lsmod: not found

ipconfig:

ipconfig: no devices to configure

Other info:
Gave up waiting for root device. Common problems:
- Boot args (cat /proc/cmdline)
  - Check rootdelay = ( did the system wait long enough)
  - Check root= (did the system wait for the right device?)
- Missing modules (cat /proc/modules; ls /dev)
ALERT ..dropping to shell.

regards
Richard

Revision history for this message
Dave Chiluk (chiluk) wrote :

So from the best I can determine it looks like configure_networking out of scripts/functions of the initramfs is failing to properly bring up the network device. I'm going through the process now to figure out what logic might actually be stopping it from functioning. I'll let you know more when I have something more concrete.

Revision history for this message
Dave Chiluk (chiluk) wrote :

From the initramfs shell can you provide me the output for

ls -la /sys/class/net/*
and
cat /sys/class/net/*/address

I have a feeling you'll only see lo and enx847beb55c195. Specifically, I'm looking for the device name that matches your BOOTIF mac address. However, if it's showing up, please attempt to run
$ ipconfig -t 100 <name of device>

Then see if the nic comes up from there with ip a.

Revision history for this message
Dave Chiluk (chiluk) wrote :

If the device is not being detected I have a feeling there may be something going wrong with udev.

Revision history for this message
Richard Lovell (ralovell) wrote :

more info, as requested:

ls -la /sys/class/net/*

lrwxrwxrwx 1 0 /sys/class/net/lo -> ../../devices/virtual/net/lo

lrwxrwxrwx 1 0 /sys/class/net/enx847beb55c195 -> ../../devices/pci0000:00/0000:00:1d.6/0000:06:00.0/0000:07:02.0/0000:0a:00.0/usb3/3-1/3-1:1.0/net/enx847beb55c195

cat /sys/class/net/*/address

84:7b:eb:55:c1:95
00:00:00:00:00:00

so there's no reference to 9c-eb-e8-3c-52-cc, which is the mac displayed via BOOTIF =

Revision history for this message
Eric Desrochers (slashd) wrote :

Hi Richard,

Thanks for updating the bug with Dave's latest suggestion in order to isolate this particular situation.

We will come back to you soon with the next steps.

Regards,
Eric

Revision history for this message
Dave Chiluk (chiluk) wrote :

1. So this indicates to me that the kernel is unable to control the usb device. When you mentioned earlier that you were able to investigate the usb-c device with a 16.04.1 machine, what exact kernel version was that machine running? uname -a output should be sufficient.

2. Can you similarly provide the exact kernel version that you are using during commissioning, when dropped to shell?

3. Were you able to utilize the device under this kernel or did it simply load the kernel driver there as well? I.e. do you see a device for the usb device in /sys/class/net/?

4. Also can you please provide the lsusb -vv output from that machine as well? I'd like to look up the device IDs against the cdc_ether driver to see if the device ID was added to one of the later 4.4 kernels.

5. Also, are you using stable images or daily images? If you haven't done so already, can you enable the daily images and check using those?
Instructions are available here
http://maas.io/docs/en/installconfig-images
The images I'm referring to are available here
https://images.maas.io/ephemeral-v2/daily/

Thank you,

Revision history for this message
Dave Chiluk (chiluk) wrote :

One of my colleagues informed me that maas 2.1 is meant to use
https://images.maas.io/ephemeral-v3/daily/
Instead of the ephemeral-v2 images.

Revision history for this message
Richard Lovell (ralovell) wrote :

1. 16.04 kernel version = 4.4.0-42-generic is the result of uname -r from the 5510 that already had 16.04 installed.

2. 4.4.0-47-generic is the kernel version used during commissioning (when it drops to shell, this is what uname -r provides)

3. In /sys/class/net I see: enx847beb55c195 -> ../../devices/pci0000:00/0000:00:1d.6/0000:06:00.0/0000:07:02.0/0000:0a:00.0/usb4/4-1/4-1:1.0/net/enx847beb55c195 (so the usb device is present here)

4. lsusb -vv output is attached.

5. We're using daily images... v3 is being used to download from https://images.maas.io/ephemeral-v3/daily. Then it seems to use v1 to import into the local rack controller (see below from maas.log).

Nov 30 07:07:32 servername maas.import-images: [INFO] Downloading image descriptions from http://images.maas.io/ephemeral-v3/daily/
Nov 30 07:07:33 servername maas.bootsources: [INFO] Updated boot sources cache.
Nov 30 07:07:33 servername maas.bootresources: [INFO] Started importing of boot images from 1 source(s).
Nov 30 07:07:33 servername maas.import-images: [INFO] Downloading image descriptions from http://images.maas.io/ephemeral-v3/daily/
Nov 30 07:07:34 servername maas.bootresources: [INFO] Importing images from source: http://images.maas.io/ephemeral-v3/daily/
Nov 30 07:07:36 servername maas.bootresources: [INFO] Finished importing of boot images from 1 source(s).
Nov 30 07:07:36 servername maas.import-images: [INFO] Started importing boot images.
Nov 30 07:07:36 servername maas.import-images: [INFO] Downloading image descriptions from http://localhost:5240/MAAS/images-stream/streams/v1/index.json
Nov 30 07:07:36 servername maas.import-images: [INFO] Finished importing boot images, the region does not have any new images.

Revision history for this message
Dave Chiluk (chiluk) wrote :

Alright so the problem at present appears to be that the machine is pxe booting off of a nic with a mac address that is not showing up after the kernel boots.

The way the boot works is the bios/efi launches a pxe network stack. This typically makes a dhcp request. The DHCP server responds with an IP address and the address of the PXE/TFTP server (in this case the maas server). The network stack firmware on the client then requests the kernel, initramfs and kernel arguments from the PXE server. The bios/efi pxe network stack then downloads these and executes the kernel.

One of the arguments maas responds with is BOOTIF=01-9c-eb-e8-3c-52-cc. This means the original pxe request originated from this mac address. When the initramfs starts, it runs a script function called configure_networking that attempts to set up the BOOTIF=01-9c-eb-e8-3c-52-cc NIC, but that NIC doesn't appear to exist to the OS.
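
The matching step described above can be illustrated with a short Python sketch. This is not the initramfs code itself (configure_networking is a shell function); it only shows the lookup under the assumption that BOOTIF follows the pxelinux convention of a 01- hardware-type prefix followed by the MAC. The function names are hypothetical, and the sample values are copied from the output posted earlier in this thread, with the kernel command line shortened.

```python
import re

def bootif_mac(cmdline):
    """Return the MAC from a BOOTIF=01-aa-bb-cc-dd-ee-ff argument, or None."""
    m = re.search(
        r"\bBOOTIF=01-((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\b", cmdline
    )
    return m.group(1).replace("-", ":").lower() if m else None

def find_boot_interface(cmdline, interfaces):
    """Return the interface whose address equals the BOOTIF MAC, or None.

    `interfaces` maps interface name -> MAC string, as read from
    /sys/class/net/<name>/address.
    """
    mac = bootif_mac(cmdline)
    if mac is None:
        return None
    for name, addr in interfaces.items():
        if addr.strip().lower() == mac:
            return name
    return None

# Shortened command line and interface list taken from this bug report.
CMDLINE = "nomodeset ip=::::maas-enlist:BOOTIF ro overlayroot=tmpfs BOOTIF=01-9c-eb-e8-3c-52-cc"
INTERFACES = {
    "lo": "00:00:00:00:00:00",
    "enx847beb55c195": "84:7b:eb:55:c1:95",
}

print(bootif_mac(CMDLINE))
print(find_boot_interface(CMDLINE, INTERFACES))
```

With the values reported here the lookup finds no interface, which is consistent with the "ipconfig: no devices to configure" message seen on the console.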

This could mean a few things.
- The NIC doing the initial pxe request is different than the usb-c one. Is there a chance that there's a wireless nic that has a pxe stack that you've configured? I know some newer machines are able to pxe boot off of their network cards so this would be useful to check.
- The mac address is changing between the pxe request and the OS boot.
- IPv6 is in the mix. Are you attempting to boot via ipv6?
- The PXE server is responding with the incorrect mac address in BOOTIF.
The last two can be checked by looking at /var/log/rackd.log on your maas server. You should be able to grep for 01-9c-eb-e8-3c-52-cc or 01-84-7b-eb-55-c1-95 in the rackd.log to see which nic is making the pxe request. If 01-9c-eb-e8-3c-52-cc shows up in the rackd.log then it's pretty definitive that the issue is booting using a nic with that mac somehow.
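
The rackd.log check amounts to a grep for both candidate addresses. As a rough Python sketch of the same idea (the function name and the sample log lines are made up for illustration and are not real rackd.log output):

```python
def count_mac_mentions(log_lines, candidates):
    """Count how many log lines mention each candidate string.

    Matching is a case-insensitive substring test, like `grep -i`.
    """
    counts = {c: 0 for c in candidates}
    for line in log_lines:
        lowered = line.lower()
        for c in candidates:
            if c.lower() in lowered:
                counts[c] += 1
    return counts

CANDIDATES = ["01-9c-eb-e8-3c-52-cc", "01-84-7b-eb-55-c1-95"]

# Placeholder lines; on a real system these would be read from
# /var/log/maas/rackd.log on the maas server.
SAMPLE_LINES = [
    "placeholder: request for pxelinux.cfg/01-9c-eb-e8-3c-52-cc",
    "placeholder: unrelated message",
]

COUNTS = count_mac_mentions(SAMPLE_LINES, CANDIDATES)
for mac, n in COUNTS.items():
    print(mac, n)
```

If only the BOOTIF address ever appears, the PXE request really is arriving from that mac and the server is simply echoing it back.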

Please check the above and let me know what you discover.

Thanks,
Dave.

Revision history for this message
Richard Lovell (ralovell) wrote :

- disabled wireless in BIOS. booted from USB NIC. I get the same problem.
- It's booting via IPv4. I've attached the log info from rackd.log showing the entries during the dhcp process (ipv4.txt).
- Also, another txt file added showing the grep results from the /var/log/maas/rackd.log for 01-9c-eb-e8-3c-52-cc. It's present, but 01-84-7b-eb-55-c1-95 isn't.

Revision history for this message
Richard Lovell (ralovell) wrote :

additional info (as requested).

Revision history for this message
Dave Chiluk (chiluk) wrote :

Do you know where the 9c-eb-e8-3c-52-cc originates from? There is nothing we can do so long as the request to maas originates from the 9c-eb-e8-3c-52-cc mac address. If the firmware is somehow masking the actual mac address with the above then that needs to be fixed in the firmware, or via a firmware setting.

Please attempt to figure out where the 9c-eb-e8-3c-52-cc address comes from.
If it helps, a mac search on 9c-eb-e8 yields: BizLink (Kunshan) Co. Ltd.
A mac search on 84-7b-eb yields: Dell Inc.

I suspect they are really the same device. I have a feeling the usb-c adapter or docking station you are using is really just a BizLink device that Dell has rebranded.
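
The vendor lookup uses the first three octets of each address (the OUI). A small Python sketch of that comparison follows; the table holds only the two prefixes and vendor names quoted in this thread, not a full registry, and the function names are hypothetical.

```python
def oui(mac):
    """Return the first three octets of a 48-bit MAC as lowercase 'aa-bb-cc'."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return "-".join(digits[i:i + 2] for i in (0, 2, 4))

# Only the prefixes mentioned in this bug report.
KNOWN_OUIS = {
    "9c-eb-e8": "BizLink (Kunshan) Co. Ltd.",
    "84-7b-eb": "Dell Inc.",
}

def vendor(mac):
    """Look up the vendor for a MAC, or 'unknown' if the prefix is not listed."""
    return KNOWN_OUIS.get(oui(mac), "unknown")

print(vendor("9c-eb-e8-3c-52-cc"))   # address seen in BOOTIF during PXE
print(vendor("84:7b:eb:55:c1:95"))   # address seen by the OS after boot
```

Two different vendor prefixes on what is physically one adapter is what suggests the address is being rewritten somewhere between firmware and OS.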

This really looks like a firmware bug, where the firmware uses the non-rebranded mac of the device for pxe while in efi, and the rebranded mac address in the OS after boot.

Have you tried updating the firmware on your machine, and possibly also the firmware of the usb-c device?

Thanks,
Dave Chiluk

Revision history for this message
Dave Chiluk (chiluk) wrote :

Another thought is that the pxe firmware of the usb-c device was missed in the rebranding process by dell. This is actually quite likely in my opinion.

Either way we need to engage Dell in order to remedy this.

Revision history for this message
Dave Chiluk (chiluk) wrote :

So apparently this is a feature of the dell-branded usb-c devices. Please see the knowledge base.

http://www.dell.com/support/article/us/en/04/SLN301147

I've heard back from our contacts at dell, and the issue you are seeing is apparently resolved via a firmware update.

Revision history for this message
Richard Lovell (ralovell) wrote :

9c-eb-e8-3c-52-cc is the mac address of the USB C Network adapter. Apologies that I hadn't already made this clear on this thread.

I'd already looked into the mac address pass-through issue before opening this case. At the time, upgrading the BIOS from 1.2.10 to 1.2.14 didn't work. While running MAAS 2.0, BIOS version 1.2.10 worked fine.

There was a BIOS update (1.2.16) released by Dell on the 1st Dec. This seems to have fixed it! To be sure I tested with BIOS version 1.2.10 and 1.2.14 again, but got the same problem. So it looks like the fix is included with 1.2.16.

Now, when I boot up, I immediately see the mac address of the on-board device (not the usb-c)...which allows me to install Ubuntu 16.04.1 from MAAS 2.1 to the Dell Precision 5510.

Dave Chiluk (chiluk)
Changed in initramfs-tools (Ubuntu):
status: New → Invalid
Changed in maas-images:
status: New → Invalid
Revision history for this message
Spyderdyne (spyderdyne) wrote :

This is happening to me on an Intel NUC 5i5MYHE blade on PXE. The error is an iSCSI destination error apparently. Same scenario...

Maas version: MAAS Version 2.1.3+bzr5573-0ubuntu1 (16.04.1)

On node PXE boot console output (sorry, but some of this is cut off on this screen, typing what I can see. will switch to another monitor and reproduce if this is not enough of the error):

lvmetad is not active yet, using sysinit

disk/by-path/ip-192.168.199.2:3260-iscsi-iqn.2004-05.com.ubuntu:maas:ephemeral...
...ubuntu-amd64-generic-xenial-daily-lun-1"

tgtd on the MaaS node seems to be running fine but netstat output does not show the initiator contacting it when I search for the IP that MaaS assigned on PXE init.

tgtd MaaS node is Raspberry Pi 3B, so not a Dell issue specifically:

root@juju-rack2:~# cat /etc/issue
Ubuntu 16.04.1 LTS \n \l

root@juju-rack2:~# uname -r
4.4.43-v7+

MAAS Version 2.1.3+bzr5573-0ubuntu1 (16.04.1)

Intel blade is also running PCIe solid state drive, not a USB device.

Fails about every 60 seconds according to maas.log:

"Feb 8 21:47:17 juju-rack2 maas.service_monitor: [error] While monitoring service 'tgt' an error was encountered: Unable to parse the active state from systemd for service 'tgt', active state reported as 'deactivating'.
Feb 8 21:48:18 juju-rack2 maas.service_monitor: [error] While monitoring service 'tgt' an error was encountered: Unable to parse the active state from systemd for service 'tgt', active state reported as 'deactivating'.
Feb 8 21:49:17 juju-rack2 maas.service_monitor: [error] While monitoring service 'tgt' an error was encountered: Unable to parse the active state from systemd for service 'tgt', active state reported as 'deactivating'."

/var/log/maas/rackd.log output:

2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:ubuntu:amd64:hwe-16.04:xenial: to_add=['20170207'] to_remove=[]
2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:ubuntu:armhf:hwe-16.04:xenial: to_add=['20170207'] to_remove=[]
2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:ubuntu:amd64:ga-16.04-lowlatency:xenial: to_add=['20170207'] to_remove=[]
2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:grub-efi:arm64:generic:uefi: to_add=['20170125.0'] to_remove=[]
2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:grub-efi-signed:amd64:generic:uefi: to_add=['20170125.0'] to_remove=[]
2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:ubuntu:armhf:ga-16.04:xenial: to_add=['20170207'] to_remove=[]
2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:ubuntu:armhf:generic-lpae:xenial: to_add=['20170207'] to_remove=[]
2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:ubuntu:armhf:hwe-16.04-edge:xenial: to_add=['20170207'] to_remove=[]
2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:ubuntu:amd64:ga-16.04:xenial: to_add=['20170207'] to_remove=[]
2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:centos:amd64:generic:centos70: to_add=['20161201_01'] to_remove=[]
2017-02-08 21:32:23 sstreams: [info] maas:v2:download/maas:boot:grub-ieee1275:ppc64el:generic:open-firmware: to_add=['20170125.0'] to_remove...


Revision history for this message
Spyderdyne (spyderdyne) wrote :

Image source: maas.io
Sync: completed

Revision history for this message
Spyderdyne (spyderdyne) wrote :

root@juju-rack2:/var/log/maas# maas refresh
root@juju-rack2:/var/log/maas# netstat -untap | grep 3260
tcp 0 0 0.0.0.0:3260 0.0.0.0:* LISTEN 26516/tgtd
tcp6 0 0 :::3260 :::* LISTEN 26516/tgtd

spyderdyne@juju-rack2:~$ service tgt status
● tgt.service - (i)SCSI target daemon
   Loaded: loaded (/lib/systemd/system/tgt.service; enabled; vendor preset: enabled)
   Active: deactivating (stop-sigterm) (Result: exit-code)
     Docs: man:tgtd(8)
  Process: 5557 ExecStartPost=/usr/sbin/tgt-admin -e -c /etc/tgt/targets.conf (code=exited, status=22)
  Process: 5554 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
 Main PID: 5552 (tgtd)
   Status: "Starting event loop..."
   CGroup: /system.slice/tgt.service
           └─5552 /usr/sbin/tgtd -f

Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgtd[5552]: tgtd: bs_thread_open(437) stopped the worker thread 4
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgtd[5552]: tgtd: bs_thread_open(437) stopped the worker thread 3
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgtd[5552]: tgtd: bs_thread_open(437) stopped the worker thread 2
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgtd[5552]: tgtd: bs_thread_open(437) stopped the worker thread 1
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgtd[5552]: tgtd: bs_thread_open(437) stopped the worker thread 0
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgt-admin[5557]: tgtadm: out of memory
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgt-admin[5557]: Command:
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgt-admin[5557]: tgtadm -C 0 --lld iscsi --op new --mode logicalunit --ti
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgt-admin[5557]: exited with code: 22.
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net systemd[1]: tgt.service: Control process exited, code=exited status=22

Out of memory. Mystery solved. ;)
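
For anyone hitting the same symptom, the two telling pieces of the status output above are the Active: state and the tgt-admin error lines. A minimal Python sketch that pulls them out of captured status text is shown below; the function name is hypothetical, and the sample lines are copied from the output above.

```python
import re

def summarize_status(text):
    """Return (active_state, error_lines) from `service <unit> status` text."""
    m = re.search(r"^\s*Active:\s*(\S+)", text, re.MULTILINE)
    state = m.group(1) if m else None
    errors = [
        line.strip()
        for line in text.splitlines()
        if "out of memory" in line.lower() or "exited with code" in line.lower()
    ]
    return state, errors

# Lines copied from the tgt status output in this comment.
SAMPLE_STATUS = """\
   Active: deactivating (stop-sigterm) (Result: exit-code)
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgt-admin[5557]: tgtadm: out of memory
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgt-admin[5557]: exited with code: 22.
"""

STATE, ERRORS = summarize_status(SAMPLE_STATUS)
print(STATE)
for line in ERRORS:
    print(line)
```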
