Unable to boot 4.15 / 4.18 / 5.0 kernel on an Intel Denlow SDP

Bug #1821573 reported by Po-Hsu Lin
This bug affects 1 person
Affects               Status    Importance   Assigned to   Milestone
ubuntu-kernel-tests   Invalid   Undecided    Unassigned
linux (Ubuntu)        Invalid   Undecided    Unassigned
  Bionic              Invalid   Undecided    Unassigned
  Cosmic              Invalid   Undecided    Unassigned
  Disco               Invalid   Undecided    Unassigned

Bug Description

Node "amaura" cannot be deployed with the Bionic / Cosmic / Disco images.

I connected to the IPMI console; there is no output after the prompt:
    Press <F2> to enter setup, <F6> Boot Menu, <F12> Network Boot

The system can be deployed with Trusty / Xenial (T/X) without any issues. I also tried installing the 4.15 HWE kernel on Xenial, but it does not boot either.

BTW, I can see some RAM-related messages in dmesg that probably indicate a hardware issue; I'm not sure whether they have anything to do with this.
[ 0.000000] *BAD*gran_size: 4M chunk_size: 128M num_reg: 10 lose cover RAM: -112M
[ 0.000000] *BAD*gran_size: 4M chunk_size: 256M num_reg: 10 lose cover RAM: -128M
[ 0.000000] *BAD*gran_size: 4M chunk_size: 512M num_reg: 10 lose cover RAM: -192M
[ 0.000000] *BAD*gran_size: 4M chunk_size: 1G num_reg: 10 lose cover RAM: -192M
[ 0.000000] *BAD*gran_size: 4M chunk_size: 2G num_reg: 10 lose cover RAM: -192M
[ 0.000000] gran_size: 8M chunk_size: 8M num_reg: 10 lose cover RAM: 0G
[ 0.000000] *BAD*gran_size: 8M chunk_size: 16M num_reg: 10 lose cover RAM: -8M
....
[ 0.000000] gran_size: 256M chunk_size: 1G num_reg: 5 lose cover RAM: 232M
[ 0.000000] gran_size: 256M chunk_size: 2G num_reg: 5 lose cover RAM: 232M
[ 0.000000] gran_size: 512M chunk_size: 512M num_reg: 5 lose cover RAM: 488M
[ 0.000000] gran_size: 512M chunk_size: 1G num_reg: 5 lose cover RAM: 488M
[ 0.000000] gran_size: 512M chunk_size: 2G num_reg: 5 lose cover RAM: 488M
[ 0.000000] gran_size: 1G chunk_size: 1G num_reg: 4 lose cover RAM: 1000M
[ 0.000000] gran_size: 1G chunk_size: 2G num_reg: 5 lose cover RAM: 1000M
[ 0.000000] gran_size: 2G chunk_size: 2G num_reg: 2 lose cover RAM: 3048M
[ 0.000000] mtrr_cleanup: can not find optimal value
[ 0.000000] please specify mtrr_gran_size/mtrr_chunk_size
[ 0.000000] e820: last_pfn = 0x4e000 max_arch_pfn = 0x400000000
[ 0.000000] found SMP MP-table at [mem 0x000fc980-0x000fc98f] mapped at [ffff8800000fc980]
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] RAMDISK: [mem 0x33466000-0x35a2afff]
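The mtrr_cleanup lines above are a table of candidate gran_size / chunk_size combinations, and the kernel asks for mtrr_gran_size / mtrr_chunk_size to be specified when it cannot pick one itself. As a rough aid for reading such a table, here is a minimal Python sketch that parses these lines and ranks the candidates. It is NOT the kernel's own selection logic: the ranking rule (skip *BAD* rows, require num_reg <= max_regs, then prefer the least lost RAM and fewest registers) and the helper names parse() and best_candidate() are hypothetical, and max_regs (the number of usable variable MTRRs) is an assumption that depends on the hardware.

```python
import re

# Matches one row of the mtrr_cleanup table printed in dmesg, e.g.
# "[ 0.000000] *BAD*gran_size: 4M chunk_size: 128M num_reg: 10 lose cover RAM: -112M"
LINE_RE = re.compile(
    r"(?P<bad>\*BAD\*)?gran_size:\s*(?P<gran>\d+[KMG])\s+"
    r"chunk_size:\s*(?P<chunk>\d+[KMG])\s+"
    r"num_reg:\s*(?P<num_reg>\d+)\s+"
    r"lose cover RAM:\s*(?P<lose>-?\d+[KMG])"
)

# Conversion factors from the unit suffix to megabytes.
UNITS_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def to_mb(size):
    """Convert a size such as '4M', '1G' or '-112M' to megabytes."""
    return int(size[:-1]) * UNITS_MB[size[-1]]


def parse(dmesg):
    """Return one dict per mtrr_cleanup row found in the dmesg text."""
    rows = []
    for m in LINE_RE.finditer(dmesg):
        rows.append({
            "bad": m.group("bad") is not None,
            "gran": m.group("gran"),
            "chunk": m.group("chunk"),
            "num_reg": int(m.group("num_reg")),
            "lose_mb": to_mb(m.group("lose")),
        })
    return rows


def best_candidate(rows, max_regs):
    """Hypothetical heuristic: among rows not flagged *BAD* that fit in
    max_regs variable MTRRs, prefer the least lost RAM, then fewer registers.
    Returns None when no row qualifies."""
    ok = [r for r in rows if not r["bad"] and r["num_reg"] <= max_regs]
    if not ok:
        return None
    return min(ok, key=lambda r: (r["lose_mb"], r["num_reg"]))


if __name__ == "__main__":
    sample = """[ 0.000000] *BAD*gran_size: 4M chunk_size: 128M num_reg: 10 lose cover RAM: -112M
[ 0.000000] gran_size: 8M chunk_size: 8M num_reg: 10 lose cover RAM: 0G
[ 0.000000] gran_size: 256M chunk_size: 1G num_reg: 5 lose cover RAM: 232M"""
    # max_regs=8 is an assumed register budget, for illustration only.
    best = best_candidate(parse(sample), max_regs=8)
    if best:
        print("mtrr_gran_size=%s mtrr_chunk_size=%s" % (best["gran"], best["chunk"]))
```

With the assumed budget of 8 registers, the 8M / 8M row (num_reg 10) is excluded, so the sketch falls back to the 256M / 1G row. Whether these MTRR messages are related to the boot hang at all is still an open question in this report.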

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-generic-hwe-16.04 4.15.0.47.68
ProcVersionSignature: User Name 4.4.0-143.169-generic 4.4.170
Uname: Linux 4.4.0-143-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: amd64
Date: Mon Mar 25 09:51:36 2019
SourcePackage: linux-meta-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

Po-Hsu Lin (cypressyew) wrote :
summary: - Unable to boot 4.15 / 4.18 /5.0 kernel on Intel Denlow SDP
+ Unable to boot 4.15 / 4.18 /5.0 kernel on an Intel Denlow SDP
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1821573

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: New → Incomplete
Changed in linux (Ubuntu Cosmic):
status: New → Incomplete
Po-Hsu Lin (cypressyew)
tags: added: regression-release
Po-Hsu Lin (cypressyew) wrote :

Tried old 4.15 kernels on Xenial.

4.15.0-43
4.15.0-45
4.15.0-46 (updates)
4.15.0-47 (proposed)

None of them work. However, we did get test reports from this node for 4.15.0-45 and 4.15.0-43 earlier, so those kernels used to boot on it.

So it looks like a (strange) HW issue to me.

tags: removed: regression-release
Po-Hsu Lin (cypressyew) wrote :

For node amaura, Sean had the memory replaced, but I can still see the *BAD* gran_size messages in dmesg: https://pastebin.ubuntu.com/p/vH6bP9xkHd/

I also found that another node, "fozzie" (a Dell PowerEdge R320), behaves the same way: it works with T/X but not with B/C/D, and not even with Xenial plus an old 4.15 kernel.
The IPMI console is not helpful either...

Po-Hsu Lin (cypressyew) wrote :

Found another node, "naumann", that only works with X, not with B/C/D.

On node amaura, with earlyprintk added to the GRUB kernel command line, I can see that the output of 4.15.0-47 stops at:
    [ 0.000000] Console: colour VGA+ 80x25

https://pastebin.ubuntu.com/p/tph3dtWzSr/

Booting 4.15.0-30 shows the same behaviour.

Po-Hsu Lin (cypressyew) wrote :

This issue no longer occurs. I can deploy the affected nodes now.

no longer affects: linux-meta-hwe (Ubuntu)
Changed in linux (Ubuntu Cosmic):
status: Incomplete → Invalid
Changed in linux (Ubuntu Disco):
status: Incomplete → Invalid
Changed in ubuntu-kernel-tests:
status: New → Invalid
Changed in linux (Ubuntu Bionic):
status: Incomplete → Invalid
Po-Hsu Lin (cypressyew)
no longer affects: linux-meta-hwe (Ubuntu Bionic)
no longer affects: linux-meta-hwe (Ubuntu Cosmic)
no longer affects: linux-meta-hwe (Ubuntu Disco)