ThinkPad T430u won't boot without noapic workaround

Bug #1808418 reported by bmaupin
This bug affects 13 people
Affects           Status     Importance  Assigned to  Milestone
linux (Ubuntu)    Confirmed  Undecided   Unassigned
  Bionic          New        Undecided   Unassigned

Bug Description

(For anyone coming to this bug for the first time, the current workaround is to add this to the kernel boot parameters: intremap=off)

I've used Ubuntu on this specific machine for years without problems, but starting with the 4.13 kernel that came with the 17.10 HWE, and continuing into the 18.04 kernels, my computer would no longer boot, hanging at various screens:

- A blank screen with a flashing cursor
- A screen with this message:
    Loading Linux 4.13.0-36-generic ...
    Loading initial ramdisk ...
- A screen with kernel messages, ending in:
    ACPI: EC: interrupt blocked

I've encountered the bug in a number of kernels, including:
- 4.13.0-32
- 4.13.0-36
- 4.13.0-37
- 4.15.0-24
- 4.15.0-34
- 4.15.0-36
- 4.15.0-38
- 4.15.0-42
- 4.20.0-rc7

I was able to work around the issue by adding noapic to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash noapic"
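A minimal sketch of making that change from a script instead of by hand, assuming the GRUB_CMDLINE_LINUX_DEFAULT value is wrapped in double quotes as in Ubuntu's default /etc/default/grub, and that no parameter contains spaces or quotes. The helper name grub_param is hypothetical; it removes any existing copy of the parameter before adding it, so running it twice does not create duplicates.

```sh
#!/bin/sh
# Hypothetical helper: add or remove one kernel parameter in the
# GRUB_CMDLINE_LINUX_DEFAULT line of a GRUB defaults file.
# Assumes the value is wrapped in double quotes, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash noapic"
# Usage: grub_param FILE add|remove PARAM
grub_param() {
  local file="$1" action="$2" param="$3" line value kept p l tmp
  line=$(grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | head -n 1)
  [ -n "$line" ] || return 1            # no such line: leave the file alone
  value=${line#GRUB_CMDLINE_LINUX_DEFAULT=}
  value=${value#\"}                     # strip the surrounding quotes
  value=${value%\"}
  kept=""
  set -f                                # split on spaces without globbing
  for p in $value; do
    [ "$p" = "$param" ] || kept="${kept:+$kept }$p"   # drop existing copies
  done
  set +f
  if [ "$action" = add ]; then
    kept="${kept:+$kept }$param"
  fi
  tmp=$(mktemp) || return 1
  # Rewrite only the GRUB_CMDLINE_LINUX_DEFAULT line; copy the rest verbatim.
  while IFS= read -r l || [ -n "$l" ]; do
    case $l in
      GRUB_CMDLINE_LINUX_DEFAULT=*)
        printf 'GRUB_CMDLINE_LINUX_DEFAULT="%s"\n' "$kept" ;;
      *) printf '%s\n' "$l" ;;
    esac
  done < "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, after sourcing the function, grub_param /etc/default/grub remove noapic followed by grub_param /etc/default/grub add intremap=off (both run as root) swaps one workaround for the other; sudo update-grub then regenerates the boot menu. For a one-off test, pressing e at the GRUB menu and editing the linux line avoids touching the file at all.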

I'm not entirely sure of the consequences of this workaround, but one thing I've noticed is significantly reduced battery life.

This bug seems very similar to what's described here for the Thinkpad E485/E585: https://evilazrael.de/node/401

I'll attach screenshots of various times I've encountered this bug over the last year.

$ lsb_release -rd
Description: Ubuntu 18.04.1 LTS
Release: 18.04

$ apt-cache policy linux-image-4.15.0-42-generic
linux-image-4.15.0-42-generic:
  Installed: 4.15.0-42.45
  Candidate: 4.15.0-42.45
  Version table:
 *** 4.15.0-42.45 500
        500 http://us.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
        100 /var/lib/dpkg/status

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-42-generic 4.15.0-42.45
ProcVersionSignature: Ubuntu 4.15.0-42.45-generic 4.15.18
Uname: Linux 4.15.0-42-generic x86_64
NonfreeKernelModules: wl
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: bryan 2349 F.... pulseaudio
CurrentDesktop: XFCE
Date: Thu Dec 13 15:31:15 2018
HibernationDevice: RESUME=UUID=4afa8032-cfe5-45f4-a626-738ab33904ac
InstallationDate: Installed on 2014-05-08 (1680 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
MachineType: LENOVO 3351CTO
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-42-generic root=UUID=cffbc87d-956b-4986-94df-b3b64ae5237f ro quiet splash noapic vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-42-generic N/A
 linux-backports-modules-4.15.0-42-generic N/A
 linux-firmware 1.173.2
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2018-07-12 (154 days ago)
dmi.bios.date: 06/01/2018
dmi.bios.vendor: LENOVO
dmi.bios.version: H6ETA0WW (2.18 )
dmi.board.asset.tag: Not Available
dmi.board.name: 3351CTO
dmi.board.vendor: LENOVO
dmi.board.version: Win8 STD DPK TPG
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvrH6ETA0WW(2.18):bd06/01/2018:svnLENOVO:pn3351CTO:pvrThinkPadT430u:rvnLENOVO:rn3351CTO:rvrWin8STDDPKTPG:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.family: ThinkPad T430u
dmi.product.name: 3351CTO
dmi.product.version: ThinkPad T430u
dmi.sys.vendor: LENOVO

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

Please try the latest mainline kernel [1] without the "noapic" parameter.

[1] https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20-rc7/

bmaupin (bmaupin) wrote :

I just tried kernel v4.20-rc7 and without noapic my computer hangs at a blank screen. With noapic it boots fine.

Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

The next step is to find the last working -rc kernel and first non-working -rc kernel.

From your description, both of them should be <= v4.13.

The kernels can be found in [1].

[1] https://kernel.ubuntu.com/~kernel-ppa/mainline/

bmaupin (bmaupin) wrote :

I'm still using v4.20-rc7, and I wanted to try some more kernel parameters to narrow down the problem. Since the last kernel messages I got using earlyprintk (https://gist.githubusercontent.com/bmaupin/b743bb3325e100341c62ee62e713d8a4/raw/7372dddb5704d3f15ff429807355d5d3686165d2/boot7.png) were related to the Spectre mitigations, I removed noapic and replaced it with spec_store_bypass_disable=off, and it booted. Oddly enough, I then removed spec_store_bypass_disable=off as well, and it booted without any special parameters.

So I don't fully understand what happened, but it seems like my system boots fine now without any special parameters. I'll remove v4.20-rc7 next and see if it still boots with the kernels from Bionic.

bmaupin (bmaupin) wrote :

I tried to boot with an older kernel (4.15.0-43), and it wouldn't boot without noapic. But v4.20-rc7 seems to boot fine. I guess it must've been a fluke that it didn't boot for me the first time.

Should I still try to figure out in which RC kernel the problem started?

Anthony Wong (anthonywong) wrote :

Maybe you can try adding 'apic=debug' and removing the 'quiet splash' parameters when you boot, and see if the kernel shows more messages that can point to the problem.

bmaupin (bmaupin) wrote :

I'm assuming I should add apic=debug to a kernel that has the bug (and not one that boots fine), correct?

Should I use apic=debug with or without noapic?

bmaupin (bmaupin) wrote :

I haven't reformatted my machine in years and was running into some other issues, so I just did a reinstall of 18.04. With both the install media and the latest 18.04 kernel (4.15.something), I needed noapic.

So I installed kernel v4.20.4-042004, and it would not boot without noapic either. However, after I boot to v4.20 with noapic one time, I can remove it and it seems to boot fine. So the issue persists with the latest kernels albeit slightly differently.

When I get a chance I'll try to figure out in which kernel the problem started, since apparently the problem hasn't yet been resolved.

Thanks!

bmaupin (bmaupin) wrote :

I tested a few kernels, and v4.12.14 is the last kernel that works without noapic. Starting with v4.13-rc1 my computer will hang on boot unless I use noapic.

bmaupin (bmaupin) wrote :

Here's the kernel log from v4.12.14

$ uname -a
Linux bryan-ThinkPad-T430u 4.12.14-041214-generic #201709200843 SMP Wed Sep 20 12:46:23 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

bmaupin (bmaupin) wrote :

Here's the kernel log from v4.12.14 (with apic=debug)

$ uname -a
Linux bryan-ThinkPad-T430u 4.12.14-041214-generic #201709200843 SMP Wed Sep 20 12:46:23 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Kai-Heng Feng (kaihengfeng) wrote :

So I guess v4.12 also works? Let's do a kernel bisection:

$ sudo apt build-dep linux
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git bisect start
$ git bisect good v4.12
$ git bisect bad v4.13-rc1
$ make localmodconfig
$ make -j"$(nproc)" deb-pkg
Install the newly built kernel, then reboot into it.
If the issue still happens:
$ git bisect bad
Otherwise:
$ git bisect good
Repeat from "make -j"$(nproc)" deb-pkg" until you find the commit that causes the regression.
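The good/bad step above could be wrapped in a small helper. This is a sketch, assuming it is run from the kernel tree after rebooting into the build under test; the name mark_boot and its outcome words are hypothetical, while git bisect skip is a standard git subcommand for commits that cannot be tested. DRY_RUN=1 prints the command instead of running it.

```sh
#!/bin/sh
# Hypothetical helper: translate the boot outcome of the kernel under test
# into the matching "git bisect" command.
#   boots  -> git bisect good
#   hangs  -> git bisect bad
#   unsure -> git bisect skip   (e.g. the build failed or could not be judged)
# Set DRY_RUN=1 to print the command instead of running it.
mark_boot() {
  local verdict
  case $1 in
    boots)  verdict=good ;;
    hangs)  verdict=bad ;;
    unsure) verdict=skip ;;
    *) echo "usage: mark_boot boots|hangs|unsure" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "git bisect $verdict"
  else
    git bisect "$verdict"
  fi
}
```

Marking an untestable build with skip, rather than guessing, keeps one uncertain result from sending the bisection into the wrong half; git bisect log shows the verdicts recorded so far.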

bmaupin (bmaupin) wrote :

I finally got around to doing the bisect, and here's the result:

7bf1e44f865523aa16e0eb340a82d643da9215b5 is the first bad commit

Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

The commit is not correct. When doing a bisection, please skip stable releases (e.g. 4.12.1) and use mainline releases (e.g. 4.12).

bmaupin (bmaupin) wrote :

I followed your instructions exactly as written. How can I use the mainline release?

Kai-Heng Feng (kaihengfeng) wrote :

This is really weird. I don't think your system has a "ccree" device, which is only present on ARM devices.

bmaupin (bmaupin) wrote :

This is the correct commit, right?:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7bf1e44f865523aa16e0eb340a82d643da9215b5

It looks to me like there are a lot more than just ARM changes there, but I don't know anything about kernel development so I might not be . I'm surprised to see so many changes committed all at once.

Is there any way to narrow down which change is causing the regression?

From skimming the changes, one thing that stands out to me is a change to the Intel microcode, only because when my computer first started having problems I had run into one or more other microcode-related bugs, and I had to upgrade my BIOS to get the latest microcode working:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751584
https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1759920

... but I'm honestly not sure if any of that was related or just a timing coincidence.

Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

You are right.

Apparently something's wrong with my git:

$ git show 7bf1e44f865523aa16e0eb340a82d643da9215b5
commit 7bf1e44f865523aa16e0eb340a82d643da9215b5
Merge: e5770b7bdbfe 32c1431eea48
Author: Greg Kroah-Hartman <email address hidden>
Date: Mon Jun 12 08:20:47 2017 +0200

    Merge 4.12-rc5 into staging-next

    We want the IIO fixes and other staging driver fixes in here as well.

    Signed-off-by: Greg Kroah-Hartman <email address hidden>

diff --cc drivers/staging/ccree/ssi_buffer_mgr.c
index 1ff603f8f8f5,6471d3d2d375..88ebda854377
--- a/drivers/staging/ccree/ssi_buffer_mgr.c
+++ b/drivers/staging/ccree/ssi_buffer_mgr.c
@@@ -136,13 -210,14 +136,14 @@@ void ssi_buffer_mgr_zero_sgl(struct sca
   */
  void ssi_buffer_mgr_copy_scatterlist_portion(
        u8 *dest, struct scatterlist *sg,
 - uint32_t to_skip, uint32_t end,
 + u32 to_skip, u32 end,
        enum ssi_sg_cpy_direct direct)
  {
 - uint32_t nents, lbytes;
 + u32 nents, lbytes;

        nents = ssi_buffer_mgr_get_sgl_nents(sg, end, &lbytes, NULL);
- sg_copy_buffer(sg, nents, (void *)dest, (end - to_skip), 0, (direct == SSI_SG_TO_BUF));
+ sg_copy_buffer(sg, nents, (void *)dest, (end - to_skip + 1), to_skip,
+ (direct == SSI_SG_TO_BUF));
  }

  static inline int ssi_buffer_mgr_render_buff_to_mlli(

Kai-Heng Feng (kaihengfeng) wrote :

Oh, it's a merge point, so there are still some commits in it:
$ git diff e5770b7bdbfe..32c1431eea48

So it's probably a good idea to bisect between e5770b7bdbfe and 32c1431eea48.
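A sketch of how to see what such a merge brought in, assuming a clone of the mainline tree: merge_parents prints the two parents, and merged_commits lists the non-merge commits reachable from the second parent but not from the first. Both function names are hypothetical; git rev-list --parents and the A..B range syntax are standard git.

```sh
#!/bin/sh
# Hypothetical helpers for inspecting a merge commit reported by git bisect.

# Print the parent commit IDs of a commit (two of them for a normal merge).
merge_parents() {
  git rev-list --parents -n 1 "$1" | cut -d' ' -f2-
}

# List the non-merge commits the merge brought in: reachable from the
# second parent ($1^2) but not from the first ($1^1).
merged_commits() {
  git log --oneline --no-merges "$1^1..$1^2"
}
```

For example, merged_commits 7bf1e44f865523aa16e0eb340a82d643da9215b5. If both parents turn out to boot fine on their own, a plain bisect between them has no bad endpoint to work toward.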

bmaupin (bmaupin) wrote :

This is what I'm getting:

63db7c815bc0997c29e484d2409684fdd9fcd93b is the first bad commit

But that feels wrong; it looks like it's a fix for XFS?:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=63db7c815bc0997c29e484d2409684fdd9fcd93b

... and I'm not using XFS:

$ cat /etc/fstab | grep -i xfs
$ mount | grep -i xfs

Kai-Heng Feng (kaihengfeng) wrote :

- First, try the latest mainline kernel (v5.0) again; hopefully there's a fix already.
- Does the issue happen 100% of the time? Otherwise the bisection can be inconclusive.
- Or, take a look at the kernel configs of v4.12 and v4.13-rc1; maybe some new option breaks your system.
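One way to compare the two configurations without redoing any bisects, sketched under the assumption that both kernels' config files are at hand (Ubuntu kernel packages normally install theirs as /boot/config-<version>). The helper names are hypothetical; the kernel tree's own scripts/diffconfig does a similar job.

```sh
#!/bin/sh
# Hypothetical helpers: show which CONFIG_ options differ between two kernel
# config files. "# CONFIG_FOO is not set" is treated as CONFIG_FOO=n so that
# options that were switched off show up as well.

normalize_config() {
  # Rewrite "not set" comments to =n, keep only CONFIG_ lines, sort them.
  sed -n -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/' \
         -e '/^CONFIG_/p' "$1" | LC_ALL=C sort
}

config_diff() {
  local old new
  old=$(mktemp) && new=$(mktemp) || return 1
  normalize_config "$1" > "$old"
  normalize_config "$2" > "$new"
  # "-" lines exist only in the first file, "+" lines only in the second.
  LC_ALL=C comm -23 "$old" "$new" | sed 's/^/-/'
  LC_ALL=C comm -13 "$old" "$new" | sed 's/^/+/'
  rm -f "$old" "$new"
}
```

For example, config_diff /boot/config-4.12.0-041200-generic /boot/config-4.13.0-041300rc1-generic, where both file names are assumptions based on the mainline package naming seen in this thread. An option whose value changed appears twice, once with - and once with +.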

bmaupin (bmaupin) wrote :

> - First, try the latest mainline kernel (v5.0) again; hopefully there's a fix already.
No, it looks like it still won't boot (without noapic).

> - Does the issue happen 100% of the time? Otherwise the bisection can be inconclusive.
Yes, it happens 100% of the time.

> - Or, take a look at the kernel configs of v4.12 and v4.13-rc1; maybe some new option breaks your system.
Sorry to belabour the point, but how can I do this? I have a .config generated from the bisects but aside from deleting it and doing all the bisects over again I don't know how to see the changes. Surely there's an easier way?

Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

Try changing IRQ configs (CONFIG_GENERIC_IRQ_MIGRATION=y to =n) and i2c-designware's (CONFIG_I2C_DESIGNWARE_CORE=y to =m).

Jelle de Jong (jelledejong) wrote :

I can confirm the same behaviour:

$ sudo dmidecode -t bios
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.

Handle 0x000E, DMI type 0, 24 bytes
BIOS Information
 Vendor: LENOVO
 Version: H6ETA0WW (2.18 )
 Release Date: 06/01/2018
 Address: 0xE0000
 Runtime Size: 128 kB
 ROM Size: 12288 kB
 Characteristics:
  PCI is supported
  BIOS is upgradeable
  BIOS shadowing is allowed
  Boot from CD is supported
  Selectable boot is supported
  EDD is supported
  Print screen service is supported (int 5h)
  8042 keyboard services are supported (int 9h)
  Serial services are supported (int 14h)
  Printer services are supported (int 17h)
  CGA/mono video services are supported (int 10h)
  NEC PC-98
  ACPI is supported
  USB legacy is supported
  BIOS boot specification is supported
  Function key-initiated network boot is supported
  Targeted content distribution is supported
  UEFI is supported
 BIOS Revision: 2.18
 Firmware Revision: 2.1

Handle 0x0020, DMI type 13, 22 bytes
BIOS Language Information
 Language Description Format: Abbreviated
 Installable Languages: 7
  en-US
  fr-FR
  ja-JP
  ko-KR
  zh-CHT
  zh-CHS
  ru-RU
 Currently Installed Language: en-US

$ sudo dmidecode -t system
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.

Handle 0x000F, DMI type 1, 27 bytes
System Information
 Manufacturer: LENOVO
 Product Name: 33533MG
 Version: ThinkPad T430u
 Serial Number: PB88L27
 UUID: 507a2801-5144-11cb-8738-9d5ba3e2b1ee
 Wake-up Type: Power Switch
 SKU Number: LENOVO_MT_3353
 Family: ThinkPad T430u

Handle 0x001F, DMI type 12, 5 bytes
System Configuration Options

Handle 0x002D, DMI type 15, 81 bytes
System Event Log
 Area Length: 98 bytes
 Header Start Offset: 0x0000
 Header Length: 16 bytes
 Data Start Offset: 0x0010
 Access Method: General-purpose non-volatile data functions
 Access Address: 0x00F0
 Status: Valid, Not Full
 Change Token: 0x00000005
 Header Format: Type 1
 Supported Log Type Descriptors: 29
 Descriptor 1: Single-bit ECC memory error
 Data Format 1: Multiple-event handle
 Descriptor 2: Multi-bit ECC memory error
 Data Format 2: Multiple-event handle
 Descriptor 3: Parity memory error
 Data Format 3: None
 Descriptor 4: Bus timeout
 Data Format 4: None
 Descriptor 5: I/O channel block
 Data Format 5: None
 Descriptor 6: Software NMI
 Data Format 6: None
 Descriptor 7: POST memory resize
 Data Format 7: None
 Descriptor 8: POST error
 Data Format 8: POST results bitmap
 Descriptor 9: PCI parity error
 Data Format 9: None
 Descriptor 10: PCI system error
 Data Format 10: None
 Descriptor 11: CPU failure
 Data Format 11: None
 Descriptor 12: EISA failsafe timer timeout
 Data Format 12: None
 Descriptor 13: Correctable memory log disabled
 Data Format 13: None
 Descriptor 14: Logging disabled
 Data Format 14: None
 Descriptor 15: System limit exceeded
 Data Format 15: None
 Descriptor 16: Asynchronous hardware timer expired
 Data Format 16: None
 Descriptor 17: System configuration information
 Data Format 17: None
 Descriptor 18: Hard disk information
 Data Format 18: None
 Descriptor 19: System reconfigured
 Data Format 19: None
 Descriptor 20: Uncorrectable CPU-complex error
 Data Format 20:...


bmaupin (bmaupin) wrote :

I'm still seeing this same behaviour for the 5.3 kernel.

I'm just now getting back to this, and I think I'm going to need some hand holding...

> Try changing IRQ configs (CONFIG_GENERIC_IRQ_MIGRATION=y to =n) and i2c-designware's (CONFIG_I2C_DESIGNWARE_CORE=y to =m).

I copied the 4.13.0 configuration from here to .config and modified those two options:

$ curl https://launchpadlibrarian.net/414616578/config-4.13.0-041300rc1-generic > .config
$ cp .config .config.bak
$ sed -i 's/CONFIG_GENERIC_IRQ_MIGRATION=.*/CONFIG_GENERIC_IRQ_MIGRATION=n/' .config
$ sed -i 's/CONFIG_I2C_DESIGNWARE_CORE=.*/CONFIG_I2C_DESIGNWARE_CORE=m/' .config

Now I can confirm the changes:

$ diff .config.bak .config
92c92
< CONFIG_GENERIC_IRQ_MIGRATION=y
---
> CONFIG_GENERIC_IRQ_MIGRATION=n
4073c4073
< CONFIG_I2C_DESIGNWARE_CORE=y
---
> CONFIG_I2C_DESIGNWARE_CORE=m

However, when I build the kernel and look at the .config file, both of them seem to be reverted.

The same thing happens if I run make oldconfig; the changes aren't preserved.

I can't find a way to change them in make menuconfig either; I just see this:

Symbol: GENERIC_IRQ_MIGRATION [=y]
Type : boolean
Defined at kernel/irq/Kconfig:38
Selected by: X86 [=y] && SMP [=y]

Symbol: I2C_DESIGNWARE_CORE [=y]
Type : tristate
Defined at drivers/i2c/busses/Kconfig:480
Depends on: I2C [=y] && HAS_IOMEM [=y]
Selected by: I2C_DESIGNWARE_PLATFORM [=y] && I2C [=y] && HAS_IOMEM [=y] && (ACPI [=y] && COMMON_CLK [=y] || !ACPI [=y]) || I2C_DESIGNWARE

How can I override these two values?

Thanks!
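The menuconfig output above points at the likely reason: both symbols are listed under "Selected by", and Kconfig forces a selected symbol to at least the value of whatever selects it, so make oldconfig (or the build) puts them back. Lowering GENERIC_IRQ_MIGRATION would apparently mean building without SMP, and I2C_DESIGNWARE_CORE could only become m if everything selecting it, such as I2C_DESIGNWARE_PLATFORM, were m or n as well. A sketch for checking which requested values actually survive, assuming the edits are made with the kernel tree's scripts/config (for example scripts/config --disable GENERIC_IRQ_MIGRATION --module I2C_DESIGNWARE_CORE) followed by make olddefconfig; the check_config helper is hypothetical.

```sh
#!/bin/sh
# Hypothetical check: after editing a .config and running "make olddefconfig",
# report whether each requested NAME=VALUE survived. VALUE n also matches
# "# CONFIG_NAME is not set" or a missing line. Returns 1 if anything reverted.
# Usage: check_config .config GENERIC_IRQ_MIGRATION=n I2C_DESIGNWARE_CORE=m
check_config() {
  local cfg="$1" status=0 req name want got
  shift
  for req in "$@"; do
    name=${req%%=*}
    want=${req#*=}
    got=$(sed -n "s/^CONFIG_${name}=//p" "$cfg")
    [ -n "$got" ] || got=n              # "is not set" or absent counts as n
    if [ "$got" = "$want" ]; then
      printf 'ok       CONFIG_%s=%s\n' "$name" "$got"
    else
      printf 'reverted CONFIG_%s=%s (wanted %s)\n' "$name" "$got" "$want"
      status=1
    fi
  done
  return "$status"
}
```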

Kai-Heng Feng (kaihengfeng) wrote :

Hmm, please try the kernel parameter "dis_ucode_ldr" to disable microcode loading.

bmaupin (bmaupin) wrote :

I booted 4.13.0-rc1, removing noapic and adding dis_ucode_ldr. Unfortunately it still doesn't boot.

bmaupin (bmaupin) wrote :

After that I tried booting 5.3.0-7642-generic, removing noapic and adding dis_ucode_ldr. And it booted just fine.

bmaupin (bmaupin) wrote :

Oops, please disregard that last message. I'm multitasking :P

I booted 5.3.0-7642-generic, removing noapic and adding dis_ucode_ldr, and it wouldn't boot (it froze at ACPI: EC: interrupt blocked)

bmaupin (bmaupin) wrote :

I'm still tinkering...

I removed noapic and I tried instead with this, and it booted!:

intremap=off

Is this related by any chance to the link I posted in the original bug report?: https://evilazrael.de/node/401

It's a different model ThinkPad but the symptoms are very similar.

bmaupin (bmaupin) wrote :

The article I linked was for an AMD CPU, so it only helped me up to a certain point.

I'm thinking I must've messed up the second bisect (the one for the merge point) so I'm going to try to do the bisect again.

bmaupin (bmaupin) wrote :

I'm hopeful that doing the bisect over again will find a more relevant commit. While bisecting this time I ran into a weird scenario that I believe may have affected the previous attempts to do a bisect.

The short version of this is that when the computer has failed to boot a kernel (without noapic), all kernels seem to fail until I've successfully booted once with noapic. I have a feeling this has hampered many of my troubleshooting efforts.

For a more detailed idea of what this looks like, this is what happened when I realized the problem:

1. I compiled, installed, and booted to 4.12.0-rc2
2. I rebooted and it worked, so I marked it as good and then powered off the computer
3. Later I powered on to resume the bisect, but I wasn't paying attention at the GRUB menu and accidentally booted 5.3 (without noapic), which failed
4. I then tried to boot 4.12.0-rc2 without noapic, and it wouldn't boot
5. I booted 4.12.0-rc2 with noapic, then rebooted into 4.12.0-rc2 without noapic, and once again it booted successfully
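Given that sequence, it may help to confirm, before typing git bisect good or bad, exactly which kernel was running and whether any workaround was still on its command line. A minimal sketch, assuming the workaround list is just the three parameters discussed in this thread (noapic, nolapic, intremap=off); the function name is hypothetical, while /proc/cmdline and uname -r are standard.

```sh
#!/bin/sh
# Hypothetical helper: print which of the workaround parameters discussed in
# this bug appear on a kernel command line (default: the running kernel's).
# Prints an empty line when none are present.
workarounds_in_cmdline() {
  local cmdline found w
  if [ $# -gt 0 ]; then cmdline=$1; else cmdline=$(cat /proc/cmdline); fi
  found=""
  for w in noapic nolapic intremap=off; do
    # Pad with spaces so "noapic" cannot match inside another word.
    case " $cmdline " in
      *" $w "*) found="${found:+$found }$w" ;;
    esac
  done
  printf '%s\n' "$found"
}
```

For example, echo "$(uname -r) workarounds: [$(workarounds_in_cmdline)]" should show the bisect build's version and an empty list before that boot is marked good. It only inspects the current boot, so it cannot detect the carried-over state described above; going by this thread, a single boot with noapic after any failed boot would still be needed before the next test can be trusted.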

bmaupin (bmaupin) wrote :

Due to the aforementioned issue, I believe I was getting a lot of false negatives and my previous bisects were faulty as a result.

I redid the bisect from the beginning, and this is what I got:

# first bad commit: [c4e1be9ec1130fff4d691cdc0e0f9d666009f9ae] mm, sparsemem: break out of loops early

I was able to boot with the previous commit (7660a6fddcbae344de8583aa4092071312f110c3). With c4e1be9ec1130fff4d691cdc0e0f9d666009f9ae I can only boot using either noapic or intremap=off.

What should I do now?

Kai-Heng Feng (kaihengfeng) wrote :

Please test the latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7-rc2/

Looks like there are fixes on this topic.

bmaupin (bmaupin) wrote :

Very odd... 5.7 boots (without noapic or intremap=off), but not consistently.

It boots a number of times just fine (in my tests, anywhere from 3-5 successful boots) and then after that it will hang at boot. Most of the time after powering off/powering back on it boots fine again, without needing to modify the kernel parameters.

There is one other unrelated issue with 5.7: my BCM43228 wireless doesn't work. Maybe the bcmwl-kernel-source package just isn't compatible with 5.7 yet?

bmaupin (bmaupin) wrote :

> Maybe the bcmwl-kernel-source package just isn't compatible with 5.7 yet?

I see now in the logs that's the case:

Building module:
cleaning build area...
make -j4 KERNELRELEASE=5.7.0-050700rc2-generic -C /lib/modules/5.7.0-050700rc2-generic/build M=/var/lib/dkms/bcmwl/6.30.223.271+bdcom/build....(bad exit status: 2)
ERROR (dkms apport): kernel package linux-headers-5.7.0-050700rc2-generic is not supported
Error! Bad return status for module build on kernel: 5.7.0-050700rc2-generic (x86_64)

But I won't say anything more about that here since it's a separate issue.

bmaupin (bmaupin) wrote :

I just upgraded to 20.04 and I can confirm that this bug affects the default 5.4 kernel that it ships with.

Kai-Heng Feng (kaihengfeng) wrote :

Is there any log when the hang happens on v5.7?

bmaupin (bmaupin) wrote :

I tried booting another half dozen times or so with 5.7 on Ubuntu 20.04 and I didn't have any issues. So maybe it's fixed with that combination? Unfortunately since the wifi doesn't work I can't continue to use the 5.7 kernel.

Kai-Heng Feng (kaihengfeng) wrote :

One possible choice is to use the 5.7 kernel here:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/bootstrap

bmaupin (bmaupin) wrote :

I installed 5.7.0-5 from that link but it looks like the bcmwl module still fails to compile:

Building module:
cleaning build area...
make -j4 KERNELRELEASE=5.7.0-5-generic -C /lib/modules/5.7.0-5-generic/build M=/var/lib/dkms/bcmwl/6.30.223.271+bdcom/build....(bad exit status: 2)
ERROR (dkms apport): kernel package linux-headers-5.7.0-5-generic is not supported
Error! Bad return status for module build on kernel: 5.7.0-5-generic (x86_64)
Consult /var/lib/dkms/bcmwl/6.30.223.271+bdcom/build/make.log for more information.

Kai-Heng Feng (kaihengfeng) wrote :

We are going to have 5.7 land in 20.10 soon, so the DKMS build failure will also be fixed there.

bmaupin (bmaupin) wrote :

Oh cool. Is there a page I can check to see when that happens so I can test it? Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

$ rmadison linux

When groovy jumps to 5.7, it's time to change.
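A small sketch for watching for that switch, assuming rmadison (from the devscripts package) keeps its pipe-separated package | version | suite | architecture layout. The series_versions name is hypothetical; it prints the suite and version for every pocket of one series, such as groovy and groovy-proposed.

```sh
#!/bin/sh
# Hypothetical filter: from rmadison output on stdin, print "suite version"
# for every suite belonging to one series (e.g. groovy, groovy-proposed).
# Assumes the "package | version | suite | arch" layout.
series_versions() {
  awk -F'|' -v s="$1" '{
    suite = $3; ver = $2
    gsub(/[ \t]/, "", suite); gsub(/[ \t]/, "", ver)
    if (suite == s || index(suite, s "-") == 1) print suite, ver
  }'
}
```

For example, rmadison linux | series_versions groovy.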

bmaupin (bmaupin) wrote :

5.8.0-36-generic was just installed as part of the 20.04 HWE and it does not boot without intremap=off.

What does that mean? Was this fixed in 5.7 and broken again in 5.8? Do I need to do another bisect?

pablo (pblvp) wrote :

Same thing with 5.8.0-63-generic, on 20.04.2.

Need a lot of rebooting to load the kernel.

Adding kernel options:

nolapic => loads the kernel, but disables multicore, making the machine impractical to use

intremap=off => loads the kernel, does not seem to have noticeable side effects

bmaupin (bmaupin) wrote :

I just installed a few kernels to see if I could narrow down the latest regression, and this is what I found:

- 5.7.0-050700 boots fine
- 5.7.5-050705 exhibits the problematic behaviour (does not boot)
- The latest kernel (5.13.7-051307) also does not boot

I doubt this is helpful, but these are the last log lines shown on 5.7.0 on a successful boot right before the GUI kicks in; I'm not sure if they're meaningful since they're just a few lines from the log as a whole:

[ 5.999124] ACPI: \_SB_.PCI0.LPCB.EC__.HKEY: BCTG evaluated but flagged as error
[ 5.999126] thinkpad_acpi: Error probing battery 2
[ 5.999135] battery: extension failed to load: ThinkPad Battery Extension
[ 8.097361] Bluetooth: hci0: command 0x1001 tx timeout

I guess at this point I need to do another bisect between 5.7.0 and 5.7.5. Here goes...

bmaupin (bmaupin)
tags: added: focal
bmaupin (bmaupin) wrote :

Can someone with administrative permissions mark that this bug also affects Focal? Thanks!

bmaupin (bmaupin) wrote :

I've done some more testing and built tags v5.7 and v5.7-rc2 from source.

v5.7-rc2 seems to consistently boot without any special kernel flags

v5.7 gives me rather odd behaviour:

- When I'm first powering on the machine, v5.7 does not boot without any special kernel flags
- If I set intremap=off, it will boot every time, but not if I remove it
- If I boot v5.7 once with noapic, it will boot, then it will continue to boot after I remove it, but only if I restart. If I power off, it will not boot

So I would consider v5.7 to be affected by this bug, and I'll do a bisect between the v5.7-rc2 and v5.7 tags.

bmaupin (bmaupin) wrote :

This is what I got with a bisect between tags v5.7-rc2 and v5.7. It looks like a merge commit. Assuming that looks right I'll see if I can figure out how to do a bisect on the individual commits of the merge.

$ git bisect bad
900db15047044ef50b32e23630880f4780ec5b9e is the first bad commit
commit 900db15047044ef50b32e23630880f4780ec5b9e
Merge: 86852175b016 e9bdf7e655b9
Author: Linus Torvalds <email address hidden>
Date: Sat May 30 12:26:21 2020 -0700

    Merge tag 'gpio-v5.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

    Pull GPIO fixes from Linus Walleij:
     "Here are some (very) late fixes for GPIO, none of them very serious
      except the one tagged for stable for enabling IRQ on open drain lines:

       - Fix probing of mvebu chips without PWM

       - Fix error path on ida_get_simple() on the exar driver

       - Notify userspace properly about line status changes when flags are
         changed on lines.

       - Fix a sleeping while holding spinlock in the mellanox driver.

       - Fix return value of the PXA and Kona probe calls.

       - Fix IRQ locking of open drain lines, it is fine to have IRQs on
         open drain lines flagged for output"

    * tag 'gpio-v5.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
      gpio: fix locking open drain IRQ lines
      gpio: bcm-kona: Fix return value of bcm_kona_gpio_probe()
      gpio: pxa: Fix return value of pxa_gpio_probe()
      gpio: mlxbf2: Fix sleeping while holding spinlock
      gpiolib: notify user-space about line status changes after flags are set
      gpio: exar: Fix bad handling for ida_simple_get error path
      gpio: mvebu: Fix probing for chips without PWM

 drivers/gpio/gpio-bcm-kona.c | 2 +-
 drivers/gpio/gpio-exar.c | 7 +++++--
 drivers/gpio/gpio-mlxbf2.c | 6 +++---
 drivers/gpio/gpio-mvebu.c | 15 +++++++++------
 drivers/gpio/gpio-pxa.c | 4 ++--
 drivers/gpio/gpiolib.c | 26 ++++++++++++++++++++++----
 6 files changed, 42 insertions(+), 18 deletions(-)

Kai-Heng Feng (kaihengfeng) wrote :

But that commit didn't modify any file...

bmaupin (bmaupin) wrote :

I don't understand; doesn't the commit say it changed files?

> 6 files changed, 42 insertions(+), 18 deletions(-)

bmaupin (bmaupin) wrote :

I built the two parents of commit 900db15: 8685217 and e9bdf7e

They both booted fine. I built 900db15 again just in case, and it does not boot. So it does indeed seem to be the first bad commit, unless I'm doing something wrong on my end.

Any suggestions?

Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

Can you please specifically check commit 9fefca775c8ddbbbcd97f2860218188b8641819d and its parent commit?

bmaupin (bmaupin) wrote :

9fefca7 does not boot (without intremap=off)
333830a (the parent) boots fine

bmaupin (bmaupin) wrote :

I just upgraded to Ubuntu 22.04 and I'm still affected by this bug. I tried with the latest kernel (5.15.0-47) and it won't boot without intremap=off

Thanks!
