Thinkpad T430u won't boot without noapic workaround

Bug #1808418 reported by bmaupin on 2018-12-13
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

I've used Ubuntu on this specific machine for years without problems, but starting with the 4.13 kernel that came with the 17.10 HWE and continuing into 18.04 kernels my computer would no longer boot, hanging at various screens:

- A blank screen with a flashing cursor
- A screen with this message:
Loading Linux 4.13.0-36-generic ...
Loading initial ramdisk ...
- A screen with kernel messages, ending in:
ACPI: EC: interrupt blocked

I've encountered the bug in a number of kernels, including:
- 4.13.0-32
- 4.13.0-36
- 4.13.0-37
- 4.15.0-24
- 4.15.0-34
- 4.15.0-36
- 4.15.0-38
- 4.15.0-42
- 4.20.0-rc7

I was able to work around the issue by adding noapic to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash noapic"
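
For reference, a minimal sketch of applying this workaround from a shell instead of editing the file by hand. The helper name add_kernel_param is hypothetical (not part of GRUB or Ubuntu), and it assumes the file has a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line like the one above and that the parameter contains no regex metacharacters.

```shell
# add_kernel_param FILE PARAM
# Hypothetical helper: appends PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE
# unless PARAM is already one of the words on that line.
add_kernel_param() {
    grub_file=$1
    param=$2
    # Nothing to do if PARAM is already a word inside the quoted value.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$grub_file"; then
        return 0
    fi
    new_file=$(mktemp)
    # Insert PARAM just before the closing quote of the value.
    sed "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"/" \
        "$grub_file" > "$new_file"
    # Write back in place so the original file's owner and mode are kept.
    cat "$new_file" > "$grub_file"
    rm -f "$new_file"
}
```

On a real system this would be run as root against /etc/default/grub (after making a backup copy), followed by "sudo update-grub" so that the change reaches the generated GRUB configuration.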

I'm not entirely sure of the consequences of this workaround, but one thing I've noticed is significantly reduced battery life.

This bug seems very similar to what's described here for the Thinkpad E485/E585: https://evilazrael.de/node/401

I'll attach screenshots of various times I've encountered this bug over the last year.

$ lsb_release -rd
Description: Ubuntu 18.04.1 LTS
Release: 18.04

$ apt-cache policy linux-image-4.15.0-42-generic
linux-image-4.15.0-42-generic:
  Installed: 4.15.0-42.45
  Candidate: 4.15.0-42.45
  Version table:
 *** 4.15.0-42.45 500
        500 http://us.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
        100 /var/lib/dpkg/status

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-42-generic 4.15.0-42.45
ProcVersionSignature: Ubuntu 4.15.0-42.45-generic 4.15.18
Uname: Linux 4.15.0-42-generic x86_64
NonfreeKernelModules: wl
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: bryan 2349 F.... pulseaudio
CurrentDesktop: XFCE
Date: Thu Dec 13 15:31:15 2018
HibernationDevice: RESUME=UUID=4afa8032-cfe5-45f4-a626-738ab33904ac
InstallationDate: Installed on 2014-05-08 (1680 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
MachineType: LENOVO 3351CTO
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-42-generic root=UUID=cffbc87d-956b-4986-94df-b3b64ae5237f ro quiet splash noapic vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-42-generic N/A
 linux-backports-modules-4.15.0-42-generic N/A
 linux-firmware 1.173.2
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2018-07-12 (154 days ago)
dmi.bios.date: 06/01/2018
dmi.bios.vendor: LENOVO
dmi.bios.version: H6ETA0WW (2.18 )
dmi.board.asset.tag: Not Available
dmi.board.name: 3351CTO
dmi.board.vendor: LENOVO
dmi.board.version: Win8 STD DPK TPG
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvrH6ETA0WW(2.18):bd06/01/2018:svnLENOVO:pn3351CTO:pvrThinkPadT430u:rvnLENOVO:rn3351CTO:rvrWin8STDDPKTPG:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.family: ThinkPad T430u
dmi.product.name: 3351CTO
dmi.product.version: ThinkPad T430u
dmi.sys.vendor: LENOVO


This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

Please try latest mainline kernel [1] without "noapic" parameter.

[1] https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20-rc7/

bmaupin (bmaupin) wrote :

I just tried kernel v4.20-rc7 and without noapic my computer hangs at a blank screen. With noapic it boots fine.

Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

The next step is to find the last working -rc kernel and first non-working -rc kernel.

From your description, both of them should be <= v4.13.

The kernels can be found in [1].

[1] https://kernel.ubuntu.com/~kernel-ppa/mainline/

bmaupin (bmaupin) wrote :

I'm still using v4.20-rc7, and I wanted to try some more kernel parameters to narrow down the problem. Since the last kernel messages I got using earlyprintk (https://gist.githubusercontent.com/bmaupin/b743bb3325e100341c62ee62e713d8a4/raw/7372dddb5704d3f15ff429807355d5d3686165d2/boot7.png) were related to the Spectre mitigations, I removed noapic and replaced it with spec_store_bypass_disable=off, and it booted. Oddly enough, when I then removed spec_store_bypass_disable=off altogether, it booted without any special parameters.

So I don't fully understand what happened, but it seems like my system boots fine now without any special parameters. I'll remove v4.20-rc next and see if it still boots with the kernels from Bionic.

bmaupin (bmaupin) wrote :

I tried to boot with an older kernel (4.15.0-43), and it wouldn't boot without noapic. But v4.20-rc7 seems to boot fine. I guess it must've been a fluke that it didn't boot for me the first time.

Should I still try to figure out which RC kernel the problem started with?

Anthony Wong (anthonywong) wrote :

Maybe you can try adding 'apic=debug' and removing the 'quiet splash' parameters when you boot, and see if the kernel shows more messages that can point to the problem.
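
As an illustration of the suggested change, here is a small sketch that rewrites a kernel command line string accordingly. The function name debug_cmdline is hypothetical; it only transforms text and does not touch GRUB. It drops quiet and splash, replaces any existing apic= setting with apic=debug, and leaves every other parameter (including noapic, if present) untouched.

```shell
# debug_cmdline "CMDLINE"
# Hypothetical helper: prints CMDLINE without 'quiet' and 'splash'
# and with 'apic=debug' appended.
debug_cmdline() {
    out=""
    # Unquoted on purpose: split the command line into words.
    for word in $1; do
        case $word in
            quiet|splash) ;;   # dropped so kernel messages stay visible
            apic=*) ;;         # dropped; replaced by apic=debug below
            *) out="$out${out:+ }$word" ;;
        esac
    done
    printf '%s\n' "$out${out:+ }apic=debug"
}
```

For example, passing it the parameter portion of the ProcKernelCmdLine value from the bug description shows what the edited line would look like; the result can then be typed into the GRUB edit screen for a one-time boot.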

bmaupin (bmaupin) wrote :

I'm assuming I should add apic=debug to a kernel that has the bug (and not one that boots fine), correct?

Should I use apic=debug with or without noapic?

bmaupin (bmaupin) wrote :

I haven't reformatted my machine in years and was running into some other issues, so I just did a reinstall of 18.04. With both the install media and the latest 18.04 kernel (4.15.something), I needed noapic.

So I installed kernel v4.20.4-042004, and it would not boot without noapic either. However, after I boot to v4.20 with noapic one time, I can remove it and it seems to boot fine. So the issue persists with the latest kernels albeit slightly differently.

When I get a chance I'll try to figure out which kernel the problem started with, since apparently the problem hasn't yet been resolved.

Thanks!

bmaupin (bmaupin) wrote :

I tested a few kernels, and v4.12.14 is the last kernel that works without noapic. Starting with v4.13-rc1 my computer will hang on boot unless I use noapic.

bmaupin (bmaupin) wrote :

Here's the kernel log from v4.12.14

$ uname -a
Linux bryan-ThinkPad-T430u 4.12.14-041214-generic #201709200843 SMP Wed Sep 20 12:46:23 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

bmaupin (bmaupin) wrote :

Here's the kernel log from v4.12.14 (with apic=debug)

$ uname -a
Linux bryan-ThinkPad-T430u 4.12.14-041214-generic #201709200843 SMP Wed Sep 20 12:46:23 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Kai-Heng Feng (kaihengfeng) wrote :

So I guess v4.12 also works? Let's do a kernel bisection:

$ sudo apt build-dep linux
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git bisect start
$ git bisect good v4.12
$ git bisect bad v4.13-rc1
$ make localmodconfig
$ make -j`nproc` deb-pkg
Install the newly built kernel, then reboot with it.
If the issue still happens,
$ git bisect bad
Otherwise,
$ git bisect good
Repeat from "make -j`nproc` deb-pkg" until you find the commit that causes the regression.
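
As a rough guide to how many build-and-reboot rounds the procedure above needs: each round about halves the remaining range, so the count grows with the base-2 logarithm of the number of commits between the good and bad points. A minimal sketch, assuming a simple halving model; bisect_steps is a hypothetical helper name, and the estimate git itself prints may differ slightly.

```shell
# bisect_steps N
# Hypothetical helper: prints how many halvings it takes to narrow
# N candidate commits down to one (the ceiling of log2 N).
bisect_steps() {
    remaining=$1
    steps=0
    while [ "$remaining" -gt 1 ]; do
        remaining=$(( (remaining + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}
```

Inside the cloned tree it could be fed the real count with: bisect_steps "$(git rev-list --count v4.12..v4.13-rc1)". At any point, "git bisect log" prints the good/bad decisions recorded so far, which is useful to attach when reporting the result.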

bmaupin (bmaupin) wrote :

I finally got around to doing the bisect, and here's the result:

7bf1e44f865523aa16e0eb340a82d643da9215b5 is the first bad commit

Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

The commit is not correct. When doing a bisection, please skip stable releases (e.g. 4.12.1) and use mainline releases (e.g. 4.12).

bmaupin (bmaupin) wrote :

I followed your instructions exactly as written. How can I use the mainline release?

Kai-Heng Feng (kaihengfeng) wrote :

This is really weird. I don't think your system has a "ccree" device, which is only present on ARM devices.

bmaupin (bmaupin) wrote :

This is the correct commit, right?:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7bf1e44f865523aa16e0eb340a82d643da9215b5

It looks to me like there are a lot more than just ARM changes there, but I don't know anything about kernel development so I might not be right. I'm surprised to see so many changes committed all at once.

Is there any way to narrow down which change is causing the regression?

From skimming the changes, one thing that stood out to me was a change to the Intel microcode, only because when my computer initially started having problems I had run into one or more other bugs related to the microcode and I had to upgrade my BIOS to get the latest microcode working:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1751584
https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1759920

... but I'm honestly not sure if any of that was related or just a timing coincidence.

Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

You are right.

Apparently something's wrong with my git,

$ git show 7bf1e44f865523aa16e0eb340a82d643da9215b5
commit 7bf1e44f865523aa16e0eb340a82d643da9215b5
Merge: e5770b7bdbfe 32c1431eea48
Author: Greg Kroah-Hartman <email address hidden>
Date: Mon Jun 12 08:20:47 2017 +0200

    Merge 4.12-rc5 into staging-next

    We want the IIO fixes and other staging driver fixes in here as well.

    Signed-off-by: Greg Kroah-Hartman <email address hidden>

diff --cc drivers/staging/ccree/ssi_buffer_mgr.c
index 1ff603f8f8f5,6471d3d2d375..88ebda854377
--- a/drivers/staging/ccree/ssi_buffer_mgr.c
+++ b/drivers/staging/ccree/ssi_buffer_mgr.c
@@@ -136,13 -210,14 +136,14 @@@ void ssi_buffer_mgr_zero_sgl(struct sca
   */
  void ssi_buffer_mgr_copy_scatterlist_portion(
        u8 *dest, struct scatterlist *sg,
 - uint32_t to_skip, uint32_t end,
 + u32 to_skip, u32 end,
        enum ssi_sg_cpy_direct direct)
  {
 - uint32_t nents, lbytes;
 + u32 nents, lbytes;

        nents = ssi_buffer_mgr_get_sgl_nents(sg, end, &lbytes, NULL);
- sg_copy_buffer(sg, nents, (void *)dest, (end - to_skip), 0, (direct == SSI_SG_TO_BUF));
+ sg_copy_buffer(sg, nents, (void *)dest, (end - to_skip + 1), to_skip,
+ (direct == SSI_SG_TO_BUF));
  }

  static inline int ssi_buffer_mgr_render_buff_to_mlli(

Kai-Heng Feng (kaihengfeng) wrote :

Oh it's a merge point so there are still some commits:
$ git diff e5770b7bdbfe..32c1431eea48

So probably a good idea to bisect between e5770b7bdbfe and 32c1431eea48.

bmaupin (bmaupin) wrote :

This is what I'm getting:

63db7c815bc0997c29e484d2409684fdd9fcd93b is the first bad commit

But that feels wrong; it looks like it's a fix for XFS?:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=63db7c815bc0997c29e484d2409684fdd9fcd93b

... and I'm not using XFS:

$ cat /etc/fstab | grep -i xfs
$ mount | grep -i xfs

Kai-Heng Feng (kaihengfeng) wrote :

- First, try latest mainline kernel (v5.0) again, hopefully there's a fix already.
- Does the issue happen 100% of the time? Otherwise the bisection can be inconclusive.
- Or, take a look at the kernel configs for v4.12 and v4.13-rc1 - maybe some new option breaks your system.
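
For the last point, a minimal sketch of listing the settings that appear in a newer kernel config but not, in identical form, in an older one. The helper name config_changes is hypothetical. Ubuntu kernel packages install their config as /boot/config-<version>, so two such files (or two saved copies of .config) can serve as input. The kernel source tree also ships scripts/diffconfig for a more readable comparison.

```shell
# config_changes OLD_CONFIG NEW_CONFIG
# Hypothetical helper: prints every option line (set or "is not set")
# that is in NEW_CONFIG but not identically in OLD_CONFIG.
config_changes() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # Keep only option lines; comm needs both inputs sorted the same way.
    grep -E '^(CONFIG_|# CONFIG_)' "$1" | LC_ALL=C sort > "$old_sorted"
    grep -E '^(CONFIG_|# CONFIG_)' "$2" | LC_ALL=C sort > "$new_sorted"
    # -13 suppresses lines unique to OLD and lines common to both.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

Options that were newly introduced, changed value, or were switched off all show up in the output. The list will be long between two kernel versions, so filtering it (for example with grep -i on irq or apic) is probably a sensible first pass.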

bmaupin (bmaupin) wrote :

> - First, try latest mainline kernel (v5.0) again, hopefully there's a fix already.
No, it looks like it still won't boot (without noapic).

> - Does the issue happen 100% of the time? Otherwise the bisection can be inconclusive.
Yes, it happens 100% of the time.

> - Or, take a look at the kernel configs for v4.12 and v4.13-rc1 - maybe some new option breaks your system.
Sorry to belabour the point, but how can I do this? I have a .config generated from the bisects but aside from deleting it and doing all the bisects over again I don't know how to see the changes. Surely there's an easier way?

Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

Try changing IRQ configs (CONFIG_GENERIC_IRQ_MIGRATION=y to =n) and i2c-designware's (CONFIG_I2C_DESIGNWARE_CORE=y to =m).
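
A minimal sketch of flipping those two options in a .config before rebuilding. The helper set_config is hypothetical and written only for illustration; the kernel tree's own scripts/config tool (with --disable and --module) does the same job. Note that an option selected by other options may be switched back by kconfig, so it is worth checking the .config again after running "make olddefconfig".

```shell
# set_config FILE OPTION VALUE
# Hypothetical helper: sets CONFIG_<OPTION> to VALUE (y, m or n) in FILE.
set_config() {
    cfg_file=$1
    opt=$2
    val=$3
    if [ "$val" = n ]; then
        # Disabled options are written as a comment in kernel configs.
        line="# CONFIG_${opt} is not set"
    else
        line="CONFIG_${opt}=${val}"
    fi
    cfg_tmp=$(mktemp)
    # Drop any existing setting for the option, then append the new one.
    grep -v -e "^CONFIG_${opt}=" -e "^# CONFIG_${opt} is not set" "$cfg_file" > "$cfg_tmp"
    printf '%s\n' "$line" >> "$cfg_tmp"
    cat "$cfg_tmp" > "$cfg_file"
    rm -f "$cfg_tmp"
}
```

In the kernel source directory, the suggestion above would then be: set_config .config GENERIC_IRQ_MIGRATION n, followed by set_config .config I2C_DESIGNWARE_CORE m, then "make olddefconfig" and the usual "make -j`nproc` deb-pkg".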
