unable to deploy 20.04(focal) w/ default kernel for sut after images update on Oct 5

Bug #1898808 reported by Billy Jan
This bug affects 4 people
Affects         Status        Importance  Assigned to  Milestone
MAAS            Fix Released  Undecided   Unassigned
linux (Ubuntu)  Confirmed     Undecided   Unassigned

Bug Description

Note: Per comment #29, this was working up until Oct 5. It looks to me like the last update to the focal boot images is also dated Oct 5, coinciding with the appearance of this issue.

1. MAAS version: 2.4.2 (7034-g2f5deb8b8-0ubuntu1)
2. 18.04 (bionic) deploys fine on the same SUT and environment.
3. Deployment of 20.04 (focal) fails at the SUT's PXE boot stage with the error below:
"Booting under MAAS direction...
error: timeout reading 'ubuntu/amd64/ga-20.04/focal/daily/boot-initrd'.

press any key to continue..."

4. A cross-check confirmed that the 'boot-initrd' file is present on the MAAS server (see attachment).
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.17
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
InstallationDate: Installed on 2020-10-06 (0 days ago)
InstallationMedia: Ubuntu-Server 18.04.3 LTS "Bionic Beaver" - Release amd64 (20190805)
IwConfig:
 lo no wireless extensions.

 eno1np0 no wireless extensions.

 eno2np1 no wireless extensions.
MachineType: Supermicro Super Server
Package: linux (not installed)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=3a412367-9f1c-419b-8d53-39c529aefb9f ro
ProcVersionSignature: Ubuntu 4.15.0-118.119-generic 4.15.18
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-118-generic N/A
 linux-backports-modules-4.15.0-118-generic N/A
 linux-firmware 1.173.19
RfKill:

Tags: bionic uec-images
Uname: Linux 4.15.0-118-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 11/15/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1.0b
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: H12SSW-NT/iN
dmi.board.vendor: Supermicro
dmi.board.version: 0123456789
dmi.chassis.asset.tag: To be filled by O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1.0b:bd11/15/2019:svnSupermicro:pnSuperServer:pvr0123456789:rvnSupermicro:rnH12SSW-NT/iN:rvr0123456789:cvnSupermicro:ct17:cvr0123456789:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: Super Server
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

Billy Jan (billyjan-smc) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1898808

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: focal
Billy Jan (billyjan-smc) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected bionic uec-images
description: updated
Billy Jan (billyjan-smc) wrote : CRDA.txt

apport information

Billy Jan (billyjan-smc) wrote : CurrentDmesg.txt

apport information

Billy Jan (billyjan-smc) wrote : Lspci.txt

apport information

Billy Jan (billyjan-smc) wrote : Lsusb.txt

apport information

Billy Jan (billyjan-smc) wrote : ProcCpuinfo.txt

apport information

Billy Jan (billyjan-smc) wrote : ProcCpuinfoMinimal.txt

apport information

Billy Jan (billyjan-smc) wrote : ProcInterrupts.txt

apport information

Billy Jan (billyjan-smc) wrote : ProcModules.txt

apport information

Billy Jan (billyjan-smc) wrote : UdevDb.txt

apport information

Billy Jan (billyjan-smc) wrote : WifiSyslog.txt

apport information

description: updated
Jonas (jpf-3)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Jeff Lane  (bladernr)
affects: linux (Ubuntu) → maas
affects: maas → linux (Ubuntu)
Jeff Lane  (bladernr)
tags: added: blocks-hwcert-server
acd (alecd-smc) wrote : Re: [maas][focal]unable to deploy 20.04(focal) w/ default kernel for sut

This is to confirm the issue reported here. I have the same issue with two other systems that I'm trying to certify with 20.04 LTS. Neither system could be deployed. Below is the snapshot.

Thanks
Alec

Jonas (jpf-3)
Changed in maas-images:
status: New → Confirmed
Jeff Lane  (bladernr) wrote :

Alec, Billy: are these the same MAAS environments you've previously used to deploy 20.04 for other work, or have you rebuilt the MAAS environments recently?

Jonas (jpf-3) wrote :

Same MAAS environment for me, it seems to be related to the new focal image that was added on the 5th of October.

Jeff Lane  (bladernr) wrote :

Note for investigation: Is it possible that the kernel and initrd boot media are not updated, and thus you're seeing the Carlsville NIC problem with the old initrd? This could mean we need to rebuild the initrd and push that to the stream.

Jonas (jpf-3) wrote :

Just to clarify I was able to deploy 20.04 on a machine with the same hardware configuration in the week before the 5th without problem. I did not update MAAS or anything else in the meantime, so my assumption is that MAAS must have synced the new image and the new image is not working.

Jeff Lane  (bladernr) wrote :

Thanks. Yes, MAAS syncs using a nightly cron job; I just wasn't sure how often that happens, so I had to ask.

summary: [maas][focal]unable to deploy 20.04(focal) w/ default kernel for sut
+ after images update on Oct 5
description: updated
Jeff Lane  (bladernr) wrote : Re: [maas][focal]unable to deploy 20.04(focal) w/ default kernel for sut after images update on Oct 5

Jonas, Billy, Alec: Can you provide me the output of the following:

$ ls -l /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily
$ sha256sum /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/*

I suspect I know what the answers will be, but I want to verify that.
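
For comparing results across servers, the two commands above can be wrapped in a small helper. This is only a sketch: the function name hash_boot_media is made up for illustration, and it assumes a POSIX shell with GNU coreutils sha256sum available. It prints one "<sha256>  <basename>" line per file, so listings taken from differently named directories (for example daily and stable) line up directly.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAAS): print "<sha256>  <basename>"
# for every regular file in the given directory.
# Usage: hash_boot_media DIR
hash_boot_media() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    for f in "$dir"/*; do
        # Skip subdirectories and the literal pattern left by an empty directory.
        [ -f "$f" ] || continue
        sum=$(sha256sum "$f" | cut -d' ' -f1)
        printf '%s  %s\n' "$sum" "$(basename "$f")"
    done
}
```

It could be run as: hash_boot_media /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily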

Billy Jan (billyjan-smc) wrote :

Hi Jeff,
Here are the outputs from my MAAS server:

root@lab-virtual-machine:~#
root@lab-virtual-machine:~# ls -l /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/
total 460488
-rw-r--r-- 3 maas maas 86260608 Oct 6 11:23 boot-initrd
-rw-r--r-- 3 maas maas 11678464 Oct 6 11:23 boot-kernel
-rw-r--r-- 4 maas maas 373587968 Oct 6 11:22 squashfs
root@lab-virtual-machine:~#
root@lab-virtual-machine:~# sha256sum /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/*
273212b1858bc4441b392900123f1433733023986d542e8d2caf458fbb48edb2 /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/boot-initrd
1a8aec22331f411cdbc61c367b7397cf0d6cda7a85afc94c7ebb64ec478c32b8 /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/boot-kernel
0caa3059361ab22a75f9797834ff7bcce372621919122a7d8289b66a1a9c8084 /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/squashfs
root@lab-virtual-machine:~#

Jonas (jpf-3) wrote :

Hi Jeff,

I don't have daily, only stable; however, the checksums seem to match Billy's:

~$ ls -l /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/stable/
total 460484
-rw-r--r-- 3 maas maas 86260608 Oct 15 19:58 boot-initrd
-rw-r--r-- 3 maas maas 11678464 Oct 15 19:58 boot-kernel
-rw-r--r-- 4 maas maas 373587968 Oct 15 19:58 squashfs

~$ sha256sum /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/stable/*
273212b1858bc4441b392900123f1433733023986d542e8d2caf458fbb48edb2 /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/stable/boot-initrd
1a8aec22331f411cdbc61c367b7397cf0d6cda7a85afc94c7ebb64ec478c32b8 /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/stable/boot-kernel
0caa3059361ab22a75f9797834ff7bcce372621919122a7d8289b66a1a9c8084 /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/stable/squashfs

Changed in maas-images:
assignee: nobody → Lee Trager (ltrager)
importance: Undecided → Critical
acd (alecd-smc) wrote :

Hi Jeffrey,

Here's the output of the commands you requested:

1. certuser@maas216-cert:~$ ls -l /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily
********************************
total 460480
-rw-r--r-- 3 maas maas 86260608 Oct 6 00:25 boot-initrd
-rw-r--r-- 3 maas maas 11678464 Sep 29 01:57 boot-kernel
-rw-r--r-- 4 maas maas 373587968 Oct 6 00:25 squashfs
********************************

2. certuser@maas216-cert:~$ sha256sum /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/*
********************************
273212b1858bc4441b392900123f1433733023986d542e8d2caf458fbb48edb2 /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/boot-initrd
1a8aec22331f411cdbc61c367b7397cf0d6cda7a85afc94c7ebb64ec478c32b8 /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/boot-kernel
0caa3059361ab22a75f9797834ff7bcce372621919122a7d8289b66a1a9c8084 /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/squashfs
********************************

Thanks
Alec
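
The three listings in this thread carry the same checksums under different paths (daily on two servers, stable on the third). A quick way to confirm that mechanically is sketched below. The function names normalize_sums and sums_match are hypothetical, and the sketch assumes each server's sha256sum output was saved to a text file and that the paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helpers for comparing saved sha256sum output.

# Reduce sha256sum output to sorted "<basename> <sha256>" lines.
normalize_sums() {
    awk 'NF >= 2 { n = split($2, parts, "/"); print parts[n], $1 }' "$1" | sort
}

# Succeed only if every listing after the first matches the first one.
# Usage: sums_match REFERENCE_LISTING OTHER_LISTING...
sums_match() {
    ref=$(normalize_sums "$1")
    shift
    for listing in "$@"; do
        [ "$(normalize_sums "$listing")" = "$ref" ] || return 1
    done
    return 0
}
```

With the outputs saved as, say, billy.txt, jonas.txt and alec.txt (file names chosen here for illustration), `sums_match billy.txt jonas.txt alec.txt && echo identical` reports whether all three servers hold the same boot media. Matching checksums only show that the servers synced the same files; they say nothing about whether those files boot.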

Lee Trager (ltrager) wrote :

Last week we split the image stream hosted at images.maas.io into candidate and stable. daily now redirects to stable. There have been no changes to the images, only the label has been changed. lp:maas-images only downloads the SquashFS from http://cloud-images.ubuntu.com/ and pulls the kernel out of the Debian package. It has no control over the contents of the image or kernel.

no longer affects: maas-images
Jeff Lane  (bladernr) wrote :

Thanks Lee... I think there are two different issues that need to be resolved.

First, we recently introduced patches to the 5.4 kernel for Intel X710 NICs because 5.4 only saw some of the NIC ports. To solve that problem, we would need new MAAS images spun that include the latest 5.4 kernel SRU for the boot-kernel and boot-initrd, so that MAAS deployments succeed on systems that use the Intel X710 LOM.

Second is this issue. The two look similar at first, but, as Alec reported, focal deployments that did work before Oct 5 no longer work. And looking at the images on maas.io, the most recent focal ones are from Oct 5, which suggests to me that something broke in that image update.

Lee, I wonder, is there a way for them to download the older images from here:
 http://images.maas.io/ephemeral-v3/stable/focal/amd64/20200930/ga-20.04/generic/

to see if restoring the older maas images and boot media resolves the issue?

I looked on one of my MAAS servers and it has only the Oct 5 update, and on another MAAS server, it only has the Sept 30 update. So am I correct in thinking that downloading that media to a new subdirectory in /var/lib/maas/boot-resources and then pointing the "current" symlink to that older media would allow them to boot using the older Sept 30 kernel and initrd?

Just curious whether that would work, if for no other reason than to verify that the breakage is indeed in the Oct 5 images, as a sort of bisect.
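
The symlink step of that idea could look roughly like the sketch below. It is untested against MAAS itself: switch_current is a made-up name, the older media is assumed to already sit in its own directory, and whether MAAS will actually serve from the repointed directory (or rewrite the link on its next sync) is exactly the open question above.

```shell
#!/bin/sh
# Hypothetical helper: report where a symlink points, then repoint it.
# Usage: switch_current LINK NEW_TARGET_DIR
switch_current() {
    link=$1
    new_target=$2
    [ -d "$new_target" ] || { echo "missing directory: $new_target" >&2; return 1; }
    # Refuse to touch a real file or directory that is not a symlink.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace non-symlink: $link" >&2
        return 1
    fi
    if [ -L "$link" ]; then
        # Print the old target so the change can be reverted by hand.
        echo "previous target: $(readlink "$link")"
    fi
    # -n keeps ln from descending into the directory the old link points at.
    ln -sfn "$new_target" "$link"
}
```

Because the previous target is printed first, reverting is a second call with that path as NEW_TARGET_DIR.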

Jonas (jpf-3) wrote :

Jeff, I synced the newest image from the 20th but this image still has the same problem. BTW our machines do not have Intel X710 NICs but some old Intel I350.

Jonas (jpf-3) wrote :

The problem persists with the newest image from the 27th...

Alberto Donato (ack)
summary: - [maas][focal]unable to deploy 20.04(focal) w/ default kernel for sut
- after images update on Oct 5
+ unable to deploy 20.04(focal) w/ default kernel for sut after images
+ update on Oct 5
Björn Tillenius (bjornt) wrote :

Is this still an issue?

Changed in maas:
status: New → Incomplete
Jonas (jpf-3) wrote :

I upgraded MAAS to 2.7 and haven't had a problem since. But the problem might still persist on 2.4.2, which I believe is the default on 18.04.

Björn Tillenius (bjornt) wrote :

Ok, that's good to hear. I'm going to close this issue, since you're not seeing it again. But please feel free to reopen it if you see it again, so that we can do more debugging.

Changed in maas:
status: Incomplete → Fix Released