Lenovo X1 Carbon Gen5 fails to resume

Bug #1708043 reported by Dean Henrichsmeyer on 2017-08-01
This bug affects 6 people
Affects: linux (Ubuntu)
Importance: Critical
Assigned to: Kai-Heng Feng

Bug Description

Last week it worked fine. This week, however, if I open the lid for it to resume, the power light flashes for a bit, then comes on solid, but the screen stays blank and the machine is unresponsive. I have to power it off and on again in order to use it. It might be related to the APST issue with Samsung NVMe SSDs, I'm not sure.

ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: linux-image-4.11.0-10-generic 4.11.0-10.15
ProcVersionSignature: Ubuntu 4.11.0-10.15-generic 4.11.8
Uname: Linux 4.11.0-10-generic x86_64
ApportVersion: 2.20.6-0ubuntu4
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/pcmC0D0p: dean 1652 F...m pulseaudio
 /dev/snd/controlC0: dean 1652 F.... pulseaudio
CurrentDesktop: GNOME
Date: Tue Aug 1 17:39:20 2017
HibernationDevice: RESUME=UUID=fb0675d7-391c-4293-a2c5-d283708f1284
InstallationDate: Installed on 2017-07-23 (9 days ago)
InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Alpha amd64 (20170720)
MachineType: LENOVO 20HRCTO1WW
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.11.0-10-generic.efi.signed root=/dev/mapper/ubuntu--vg-root ro quiet splash nvme_core.default_ps_max_latency_us=0 vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-4.11.0-10-generic N/A
 linux-backports-modules-4.11.0-10-generic N/A
 linux-firmware 1.167
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/04/2017
dmi.bios.vendor: LENOVO
dmi.bios.version: N1MET37W (1.22 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20HRCTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrN1MET37W(1.22):bd07/04/2017:svnLENOVO:pn20HRCTO1WW:pvrThinkPadX1Carbon5th:rvnLENOVO:rn20HRCTO1WW:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.name: 20HRCTO1WW
dmi.product.version: ThinkPad X1 Carbon 5th
dmi.sys.vendor: LENOVO
---
ApportVersion: 2.20.6-0ubuntu4
Architecture: amd64
CurrentDesktop: GNOME
DistroRelease: Ubuntu 17.10
InstallationDate: Installed on 2017-07-23 (9 days ago)
InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Alpha amd64 (20170720)
Package: linux (not installed)
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
Tags: artful wayland-session
Uname: Linux 4.13.0-041300rc3-generic x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True

Dean Henrichsmeyer (dean) wrote :

Adding the results of "nvme get-feature -f 0x0c -H /dev/nvme0".

Dean Henrichsmeyer (dean) wrote :

As an update, I tried setting nvme_core.default_ps_max_latency_us=0 to disable APST and that didn't help. I don't think it's related to that.
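
For anyone checking whether the APST workaround is actually in effect, here is a minimal Python sketch. It parses a kernel command line string and reports the value of nvme_core.default_ps_max_latency_us. The helper name apst_latency_from_cmdline is hypothetical, and the sample string is the ProcKernelCmdLine from the bug description. On a live system the same string could be read from /proc/cmdline.

```python
from typing import Optional


def apst_latency_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the nvme_core.default_ps_max_latency_us value, or None if unset."""
    key = "nvme_core.default_ps_max_latency_us="
    for token in cmdline.split():
        if token.startswith(key):
            return int(token[len(key):])
    return None


# Command line copied from the bug description, used here only as sample input.
SAMPLE = (
    "BOOT_IMAGE=/vmlinuz-4.11.0-10-generic.efi.signed "
    "root=/dev/mapper/ubuntu--vg-root ro quiet splash "
    "nvme_core.default_ps_max_latency_us=0 vt.handoff=7"
)

if __name__ == "__main__":
    value = apst_latency_from_cmdline(SAMPLE)
    if value is None:
        print("parameter not set")
    elif value == 0:
        # Per the comment above, 0 is the value used to disable APST.
        print("APST disabled")
    else:
        print(f"max latency: {value} us")
```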

Changed in linux (Ubuntu):
assignee: nobody → Kai-Heng Feng (kaihengfeng)

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
assignee: Kai-Heng Feng (kaihengfeng) → nobody
Kai-Heng Feng (kaihengfeng) wrote :

BTW, when it "worked fine", did it boot with Linux 4.10 instead of 4.11?

Changed in linux (Ubuntu):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
Dean Henrichsmeyer (dean) wrote :

The kernel on the image that I installed was

linux-image-4.11.0-10-generic 4.11.0-10.15

and it worked fine.

apport information

tags: added: apport-collected
description: updated

Dean Henrichsmeyer (dean) wrote :

I tried the mainline kernel and got similar behavior. The only thing different with the 4.13 kernel is that when I closed the lid, the red dot never went dim and started blinking. It stayed red. After waiting a bit and opening the lid, it failed to resume, the screen did not turn on, and the computer was unresponsive.

Kai-Heng Feng (kaihengfeng) wrote :

Hmm, 4.11.0-10.15 is already the latest one on Artful, and it doesn't have the issue per comment #6.

So which kernel has the suspend/resume issue? Is it the mainline linux (Linux 4.13.0-041300rc3-generic) you installed?

Dean Henrichsmeyer (dean) wrote :

So the problem turned out to be Thunderbolt 3. I noticed issues in the kernel log (one example below). I disabled Thunderbolt 3 in the BIOS, and suspend/resume works fine even with the standard artful kernel (currently 4.11).

Aug 2 09:40:56 valor kernel: [ 30.815914] thunderbolt 0000:08:00.0: NHI initialized, starting thunderbolt
Aug 2 09:40:56 valor kernel: [ 30.815918] thunderbolt 0000:08:00.0: allocating TX ring 0 of size 10
Aug 2 09:40:56 valor kernel: [ 30.815943] thunderbolt 0000:08:00.0: allocating RX ring 0 of size 10
Aug 2 09:40:56 valor kernel: [ 30.815956] thunderbolt 0000:08:00.0: control channel created
Aug 2 09:40:56 valor kernel: [ 30.815957] thunderbolt 0000:08:00.0: control channel starting...
Aug 2 09:40:56 valor kernel: [ 30.815957] thunderbolt 0000:08:00.0: starting TX ring 0
Aug 2 09:40:56 valor kernel: [ 30.815973] thunderbolt 0000:08:00.0: enabling interrupt at register 0x38200 bit 0 (0x0 -> 0x1)
Aug 2 09:40:56 valor kernel: [ 30.815974] thunderbolt 0000:08:00.0: starting RX ring 0
Aug 2 09:40:56 valor kernel: [ 30.815988] thunderbolt 0000:08:00.0: enabling interrupt at register 0x38200 bit 12 (0x1 -> 0x1001)
Aug 2 09:40:56 valor kernel: [ 30.816009] thunderbolt 0000:08:00.0: starting ICM firmware
Aug 2 09:40:56 valor kernel: [ 30.816019] BUG: unable to handle kernel NULL pointer dereference at 0000000000000988
Aug 2 09:40:56 valor kernel: [ 30.816060] IP: pci_write_config_dword+0x5/0x40
Aug 2 09:40:56 valor kernel: [ 30.816078] PGD 0
Aug 2 09:40:56 valor kernel: [ 30.816078] P4D 0
Aug 2 09:40:56 valor kernel: [ 30.816088]
Aug 2 09:40:56 valor kernel: [ 30.816106] Oops: 0000 [#1] SMP
Aug 2 09:40:56 valor kernel: [ 30.816120] Modules linked in: videobuf2_core thunderbolt(+) cfg80211(+) rtsx_pci_ms snd_timer memstick videodev btusb btqca btrtl media btbcm btintel mei_me bluetooth intel_pch_thermal mei snd shpchp soundcore ecdh_generic intel_lpss_acpi intel_lpss mac_hid tpm_crb acpi_pad parport_pc ppdev lp parport ip_tables x_tables autofs4 algif_skcipher af_alg dm_crypt crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc i915 e1000e aesni_intel i2c_algo_bit ptp drm_kms_helper aes_x86_64 syscopyarea crypto_simd sysfillrect glue_helper sysimgblt cryptd rtsx_pci_sdmmc psmouse pps_core fb_sys_fops nvme drm nvme_core rtsx_pci wmi pinctrl_sunrisepoint video pinctrl_intel i2c_hid hid
Aug 2 09:40:56 valor kernel: [ 30.816364] CPU: 0 PID: 427 Comm: systemd-udevd Not tainted 4.13.0-041300rc3-generic #201707301631
Aug 2 09:40:56 valor kernel: [ 30.816393] Hardware name: LENOVO 20HRCTO1WW/20HRCTO1WW, BIOS N1MET37W (1.22 ) 07/04/2017
Aug 2 09:40:56 valor kernel: [ 30.816419] task: ffff92450353df00 task.stack: ffffb1b402110000
Aug 2 09:40:56 valor kernel: [ 30.816441] RIP: 0010:pci_write_config_dword+0x5/0x40
Aug 2 09:40:56 valor kernel: [ 30.816460] RSP: 0018:ffffb1b4021139f8 EFLAGS: 00010286
Aug 2 09:40:56 valor kernel: [ 30.816479] RAX: 0000000040000126 RBX: 0000000000000000 RCX: 0000000000000050
Aug 2 09:40:56 valor kernel: [ 30.816504] RDX: 0000000000000200 RSI: 0000000000000034 RDI: 0000000000000000
Aug 2 09:40:56 valor kernel: [ 30.816529] RBP: ffffb...

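
A log this long is easier to triage with a small filter. The following is a rough Python sketch, not a kernel tool: it looks for the "BUG: unable to handle kernel ..." header and picks up the faulting symbol from the "IP:" line that follows. The function name find_oops and the three-line lookahead are assumptions made for illustration; the sample lines are taken from the log above with the syslog prefix trimmed.

```python
import re
from typing import List, Tuple

OOPS_RE = re.compile(r"BUG: unable to handle kernel (.+?) at ([0-9a-f]+)")
# \b keeps this from matching the later "RIP:" register line.
IP_RE = re.compile(r"\bIP: (\S+)")


def find_oops(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Return (kind, address, faulting symbol) for each oops header found."""
    results = []
    for i, line in enumerate(lines):
        header = OOPS_RE.search(line)
        if header is None:
            continue
        symbol = "unknown"
        # Assumption: the IP line appears within the next three lines.
        for follow in lines[i + 1:i + 4]:
            ip = IP_RE.search(follow)
            if ip:
                symbol = ip.group(1)
                break
        results.append((header.group(1), header.group(2), symbol))
    return results


SAMPLE = [
    "[ 30.816009] thunderbolt 0000:08:00.0: starting ICM firmware",
    "[ 30.816019] BUG: unable to handle kernel NULL pointer dereference at 0000000000000988",
    "[ 30.816060] IP: pci_write_config_dword+0x5/0x40",
    "[ 30.816441] RIP: 0010:pci_write_config_dword+0x5/0x40",
]

if __name__ == "__main__":
    for kind, address, symbol in find_oops(SAMPLE):
        print(f"{kind} at {address} in {symbol}")
```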

Kai-Heng Feng (kaihengfeng) wrote :

The code path with the issue does not exist for 4.11. I think you can safely enable TB3 under 4.11 kernel.

IIUC, icm_firmware_init() should not be called on non-Apple hardware. I'll build a kernel and let you try.

Kai-Heng Feng (kaihengfeng) wrote :

Try this kernel with TB3 enabled:
http://people.canonical.com/~khfeng/lp1708043/

Dean Henrichsmeyer (dean) wrote :

While 4.11 doesn't throw the same error that 4.13rc3 did, it still will not sleep/resume if I have TB3 enabled in the BIOS.

Booting the kernel you provided I don't get the same dump. The only thing I see is:

[ 62.429495] thunderbolt: probe of 0000:08:00.0 failed with error -5

I'll comment whether or not it sleeps/resumes (closing the lid now. :)

Dean Henrichsmeyer (dean) wrote :

I can confirm the kernel at http://people.canonical.com/~khfeng/lp1708043/ does not sleep/resume successfully with TB3 enabled either.

Another symptom of both 4.11 and 4.13 with TB3 enabled: When I close the lid, the red light blinks several times before appearing to enter suspend. When I open the lid, the power light flashes similarly before it becomes solid.

If I disable TB3 running the standard 4.11 kernel, I don't get the blinking. It goes normally to suspend/resume with no issues.

Kai-Heng Feng (kaihengfeng) wrote :

That means your machine also needs to run icm_firmware_init(). I'll build a new kernel.

Dean Henrichsmeyer (dean) wrote :

Booting that kernel I still get:

[ 34.464020] thunderbolt 0000:08:00.0: could not start ICM firmware
[ 34.464024] thunderbolt 0000:08:00.0: stopping RX ring 0
[ 34.464034] thunderbolt 0000:08:00.0: disabling interrupt at register 0x38200 bit 12 (0x1001 -> 0x1)
[ 34.464045] thunderbolt 0000:08:00.0: stopping TX ring 0
[ 34.464057] thunderbolt 0000:08:00.0: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
[ 34.464097] thunderbolt 0000:08:00.0: control channel stopped
[ 34.464109] thunderbolt 0000:08:00.0: freeing RX ring 0
[ 34.464116] thunderbolt 0000:08:00.0: freeing TX ring 0
[ 34.464122] thunderbolt 0000:08:00.0: shutdown
[ 34.464314] thunderbolt: probe of 0000:08:00.0 failed with error -5
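
As a reading aid for the last line: kernel drivers report probe failures as negative errno values, so "error -5" corresponds to EIO (I/O error). The Python sketch below decodes such a line using the standard errno module. The helper name decode_probe_failure is hypothetical.

```python
import errno
import os
import re
from typing import Optional, Tuple

PROBE_RE = re.compile(r"probe of (\S+) failed with error (-?\d+)")


def decode_probe_failure(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (device, errno name, description) for a probe-failure line."""
    match = PROBE_RE.search(line)
    if match is None:
        return None
    device = match.group(1)
    # The kernel logs the negated errno, so take the absolute value.
    code = abs(int(match.group(2)))
    name = errno.errorcode.get(code, "unknown")
    return device, name, os.strerror(code)


SAMPLE = "[ 34.464314] thunderbolt: probe of 0000:08:00.0 failed with error -5"

if __name__ == "__main__":
    print(decode_probe_failure(SAMPLE))
```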

Dean Henrichsmeyer (dean) wrote :

Suspend/resume doesn't work with it either.

AaronMa (mapengyu) wrote :

Hi

Here is a laptop (ThinkPad X1 Carbon 5th) from the Sutton project.

BIOS Version: N1MET29W (1.14 )
Product Name: 20HQZ2YJUS

It doesn't have an Alpine Ridge PCI device; please check the log.
For now this bug cannot be reproduced on this laptop.

This is the latest BIOS from Lenovo.
I searched the Lenovo website and there is a BIOS 1.22,
but I don't know if it can be flashed on this FVT machine.
I will check with Lenovo PM.

Please wait for Lenovo's reply.

Kai-Heng Feng (kaihengfeng) wrote :

At least the null dereference issue is solved, I'll send a patch for that.

Regarding the probe issue, please test this kernel and attach dmesg:
http://people.canonical.com/~khfeng/linux-image-4.13.0-rc3-tbdbg_4.13.0-rc3-tbdbg-1_amd64.deb

Dean, just so that I understand this correctly: do you have anything connected to the Thunderbolt port?

Also can you attach full dmesg when you boot the system using v4.13-rcX kernel?

Can you also try the attached patch? It should apply on top of v4.13-rcX. I'm guessing ICM is not running on the Lenovo system so we should start it but skip all the link reset things. Please post dmesg of this test as well.

Dean Henrichsmeyer (dean) wrote :

~mika-westerberg, no, I don't have anything plugged into any of the thunderbolt (USB-C) ports.

Did you do any settings in BIOS related to Thunderbolt? Sometimes there is an option called "Force power" which basically turns power on the controller always. In normal cases that option should be disabled.

Also can you attach acpidump (along with the other things I requested) of the system to the bug?

tags: added: patch
AaronMa (mapengyu) wrote :

System suspends OK after plugging in a USB-C device on the 4.13-rc3 kernel.

Please check the log.

Thanks Aaron. So as expected the Thunderbolt controller is not there. Only xHCI when USB-C device is connected.

Dean, is there something special you have connected to the machine? Aaron, who has exactly the same machine and BIOS, can't reproduce the issue you have reported.

AaronMa (mapengyu) wrote :

Quoted from Lenovo:

Yoda-1.0 (X1 Carbon 5th) only have Alpine Ridge AR-DP B1 (Device ID = 0x15D3) model.

Lenovo Thunderbolt Dock also uses it.

0x1578 is previous Alpine Ridge chipset (2015), P70 (Payton) only has it.
Or other vendor dock may have it (Ex: HP Elite Thunderbolt 3 65W Dock etc.).

AR_HR_4C_XHC      Alpine Ridge DP (B Step)   1578
AR_HR_C0_4C_XHC   Alpine Ridge DP (C Step)   15D3

I suspect that the customer plugged in a 3rd-party TBT device.

Dean Henrichsmeyer (dean) wrote :

Sleep / resume still doesn't work with TB3 enabled on 4.13.0-rc4-icmtest. I have nothing plugged into the laptop whatsoever, including power. I boot it unplugged, let it finish booting, close the lid. It suspends. When I open the lid, the power LED blinks a few times, stays on, but the screen never comes back on.

There are no weird settings in the BIOS relating to power. I have:

Wake by Thunderbolt 3 - disabled
Security Level - Display Port and USB
Support in Pre Boot Environment for Thunderbolt - disabled

bios version is 1.22

The only way suspend/resume works normally is if I go into the security settings of the bios, I/O Port Access, and disable Thunderbolt 3.

Thanks for the information. I'll go through them.

Aaron, can you try to switch security level of your machine to the same:

Security Level - Display Port and USB

and see if the problem reproduces?

Dean Henrichsmeyer (dean) wrote :

Also, if there is a recommended security level for it with Linux, I'm happy to use that.

All of them are expected to work. Usually the default is "User authorization".

AaronMa (mapengyu) wrote :

I did change all the settings in Security Level when I uploaded the log in comment #28.
It didn't make any difference to TBT or suspend.
Anyway, I have uploaded dmesg again.

The only failure in log is:

[ 192.140684] PM: Finishing wakeup.
[ 192.141012] pci_bus 0000:07: Allocating resources
[ 192.141063] pcieport 0000:07:01.0: bridge window [mem 0x00100000-0x000fffff] to [bus 09-3b] add_size 400000 add_align 100000
[ 192.141089] pcieport 0000:07:04.0: bridge window [mem 0x00100000-0x000fffff] to [bus 3d-70] add_size 400000 add_align 100000
[ 192.141110] pcieport 0000:07:01.0: BAR 14: no space for [mem size 0x00400000]
[ 192.141112] pcieport 0000:07:01.0: BAR 14: failed to assign [mem size 0x00400000]
[ 192.141115] pcieport 0000:07:04.0: BAR 14: no space for [mem size 0x00400000]
[ 192.141117] pcieport 0000:07:04.0: BAR 14: failed to assign [mem size 0x00400000]
[ 192.141120] pcieport 0000:07:04.0: BAR 14: no space for [mem size 0x00400000]
[ 192.141122] pcieport 0000:07:04.0: BAR 14: failed to assign [mem size 0x00400000]
[ 192.141124] pcieport 0000:07:01.0: BAR 14: no space for [mem size 0x00400000]
[ 192.141126] pcieport 0000:07:01.0: BAR 14: failed to assign [mem size 0x00400000]
[ 192.142070] pci_bus 0000:07: Allocating resources
[ 192.142116] pcieport 0000:07:01.0: bridge window [mem 0x00100000-0x000fffff] to [bus 09-3b] add_size 400000 add_align 100000
[ 192.142140] pcieport 0000:07:04.0: bridge window [mem 0x00100000-0x000fffff] to [bus 3d-70] add_size 400000 add_align 100000
[ 192.142156] pcieport 0000:07:01.0: BAR 14: no space for [mem size 0x00400000]
[ 192.142158] pcieport 0000:07:01.0: BAR 14: failed to assign [mem size 0x00400000]
[ 192.142160] pcieport 0000:07:04.0: BAR 14: no space for [mem size 0x00400000]
[ 192.142162] pcieport 0000:07:04.0: BAR 14: failed to assign [mem size 0x00400000]
[ 192.142165] pcieport 0000:07:04.0: BAR 14: no space for [mem size 0x00400000]
[ 192.142167] pcieport 0000:07:04.0: BAR 14: failed to assign [mem size 0x00400000]
[ 192.142169] pcieport 0000:07:01.0: BAR 14: no space for [mem size 0x00400000]
[ 192.142171] pcieport 0000:07:01.0: BAR 14: failed to assign [mem size 0x00400000]
[ 192.179272] OOM killer enabled.
[ 192.180129] Restarting tasks ...
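
To see at a glance which bridges are affected, the repeated "failed to assign" lines can be tallied per device. This is only a sketch in Python under the assumption that the log lines keep the "pcieport <address>: BAR <n>: failed to assign" shape shown above. The function name count_bar_failures is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

FAIL_RE = re.compile(r"pcieport (\S+): (BAR \d+): failed to assign")


def count_bar_failures(lines: Iterable[str]) -> Counter:
    """Count 'failed to assign' messages per (device, BAR) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Four lines copied from the dmesg excerpt above.
SAMPLE = [
    "[ 192.141110] pcieport 0000:07:01.0: BAR 14: no space for [mem size 0x00400000]",
    "[ 192.141112] pcieport 0000:07:01.0: BAR 14: failed to assign [mem size 0x00400000]",
    "[ 192.141115] pcieport 0000:07:04.0: BAR 14: no space for [mem size 0x00400000]",
    "[ 192.141117] pcieport 0000:07:04.0: BAR 14: failed to assign [mem size 0x00400000]",
]

if __name__ == "__main__":
    for (device, bar), n in sorted(count_bar_failures(SAMPLE).items()):
        print(f"{device} {bar}: {n} failed assignment(s)")
```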

AaronMa (mapengyu) wrote :

Got feedback from Lenovo:

We received positive feedback from our firmware team about this issue, can you ask the user to try it?

//

We've seen a similar issue. It happens when Wake on LAN is disabled in BIOS setup.
We will release a new ECFW to solve this issue, but for now you can avoid it by enabling Wake on LAN.

Lenovo Community - https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T/ThinkPad-T470s-BIOS-bug-WOL-and-Thunderbolt-3/m-p/3707059
//

Thanks,

This definitely sounds like a platform/configuration issue rather than a driver bug.

AaronMa (mapengyu) wrote :

I agree this is not a driver issue.

After changing many BIOS settings and restoring the BIOS to defaults,
I finally reproduced the issue on the X1 Carbon 5th, with PCI IDs 1577/1578/15d6.

Actually I don't know exactly how to reproduce this issue; I have reproduced it only 2 times.

When the failure happened, PCI IDs 1577/1578/15d6 were shown instead of 15d3.

The thunderbolt driver then loaded with a call trace, and suspend made the system hang.
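
A quick way to tell the two states apart is to look at the Intel (vendor 8086) device IDs in numeric lspci output. The Python sketch below uses only the IDs named in this thread: 15d3 as the expected one, and 1577/1578/15d6 as the ones seen at failure. The sample text is hypothetical and only imitates the layout of `lspci -n` output; it is not taken from the attached log.

```python
import re
from typing import Dict, List

# IDs as reported in this thread, not an authoritative device list.
EXPECTED_IDS = {"15d3"}
FAILURE_IDS = {"1577", "1578", "15d6"}

# 8086 is Intel's PCI vendor ID.
ID_RE = re.compile(r"\b8086:([0-9a-f]{4})\b")


def classify(lspci_n_output: str) -> Dict[str, List[str]]:
    """Sort Intel device IDs into 'expected' and 'failure' buckets."""
    found: Dict[str, List[str]] = {"expected": [], "failure": []}
    for dev_id in ID_RE.findall(lspci_n_output.lower()):
        if dev_id in EXPECTED_IDS:
            found["expected"].append(dev_id)
        elif dev_id in FAILURE_IDS:
            found["failure"].append(dev_id)
    return found


# Hypothetical sample shaped like `lspci -n` output.
SAMPLE = """\
07:00.0 0604: 8086:1578
08:00.0 0880: 8086:1577
3c:00.0 0c03: 8086:15d6
"""

if __name__ == "__main__":
    result = classify(SAMPLE)
    if result["failure"]:
        print("IDs seen in the failing state:", ", ".join(result["failure"]))
    else:
        print("no failing-state IDs found")
```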

AaronMa (mapengyu) wrote :

lspci result at the time of failure.

I reproduced this issue with Wake on LAN disabled.
So I suggest that the user enable Wake on LAN as a workaround.

Dean Henrichsmeyer (dean) wrote :

I can confirm that enabling Wake on LAN lets suspend/resume work just fine with TB3 enabled. Thanks for all the diligence guys, exceptional.

AaronMa (mapengyu) wrote :

Lenovo's Final solution:

//
It may be a common issue of Kabylake ThinkPads. So you may reproduce it by using other platforms, for example T470s.
Our EC team plans to fix it by end of this month, but the new ECFW web release will need more weeks to do QA.
//

For now we can close this issue with the workaround; the final fix will be in Lenovo's new BIOS on their website.

Changed in linux (Ubuntu):
importance: Undecided → Critical
status: Confirmed → Fix Committed

Can anyone confirm please that Lenovo has actually fixed the problem? I have an X1 Carbon 6th Generation, with the newest BIOS, and I fear it is affected by the bug.

AaronMa (mapengyu) wrote :

Hi NJ:
X1C6 doesn't support S3, so it should not be related to this bug.

Could you describe what issue you have met?
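
Whether a machine offers S3 can be checked from the kernel's /sys/power/mem_sleep file, which lists the available suspend variants with the active one in square brackets; "deep" is the entry that corresponds to S3 suspend-to-RAM. The Python sketch below parses that format from a string. The function names are hypothetical, and the sample strings are illustrative rather than taken from any machine in this thread.

```python
from typing import List, Optional, Tuple


def parse_mem_sleep(text: str) -> Tuple[List[str], Optional[str]]:
    """Return (available modes, currently selected mode)."""
    modes: List[str] = []
    selected: Optional[str] = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        modes.append(token)
    return modes, selected


def supports_s3(text: str) -> bool:
    """True if 'deep' (S3 suspend-to-RAM) is among the available modes."""
    return "deep" in parse_mem_sleep(text)[0]


if __name__ == "__main__":
    # Illustrative strings; on a live system read /sys/power/mem_sleep instead.
    for sample in ("s2idle [deep]", "[s2idle]"):
        modes, selected = parse_mem_sleep(sample)
        print(sample, "->", modes, selected, supports_s3(sample))
```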

Peter Bittner (peter-bittner) wrote :

I have an X1 Carbon 6th Generation (Core i7 8th Gen, 16 GB RAM, 512 GB SSD) and it behaves like described in the first comments above.

Specifically, when I close the notebook it somewhat goes in standby (or not really, I'm not sure), the red dot blinks but underneath the notebook gets hot within a few minutes. And sometimes, not always, it freezes (or stays frozen) instead of waking up when I open the notebook again.

I'm running 18.04 Bionic (development branch).

Ron Ellis (rkeiii) wrote :

I'm on the X1 Carbon 6th gen config Peter Bittner describes. I'm seeing the same behavior.

Schemen (schemen) wrote :

Hi!

I use a Lenovo X1 Carbon 5th Gen. I also have the described suspend behavior. I was able to "solve" the problem by installing a Mainline Kernel (v4.17 RC7 -> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17-rc7/). With that, suspend and resume works as intended again.

My BIOS should also be on the latest released version.

From time to time I run into the problem that my ethernet connection is not working anymore using the Lenovo USB C dock, wlan is still functional. This might be a related problem, maybe not.

Michael Penkov (mpenkov) wrote :

I'm on an X1 Carbon 6th Gen. Suspend has been working flawlessly when on mains power. On battery power however, it fails from time to time.

The work-around of enabling Wake-on-LAN worked for me. I noticed that it was set to "External power only" (sorry, can't remember the exact wording in the BIOS) - setting it to "External and Battery Power" seems to have solved the problem.

I'm using BIOS 1.27, released 2018-07-18.

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Fup (fupduck) wrote :

My laptop (X1, 5th generation, model 20K4001XUS) would sometimes freeze when suspending (but not regularly or reproducibly).

Enabling wake-on-lan in the BIOS fixed this issue.

Brad Figg (brad-figg) on 2019-07-24
tags: added: ubuntu-certified
tags: added: cscc