TPM intermittently fails after cold-boot

Bug #1762672 reported by Alexey Bazhin
This bug affects 5 people
Affects           Status          Importance   Assigned to    Milestone
linux (Ubuntu)    Fix Released    High         Unassigned
Bionic            Fix Released    High         Tyler Hicks

Bug Description

[Impact]
On an 18.04 LTS system with a TPM, the TPM will fail intermittently on cold boots. The problem seems to be that the TPM gets into a state where the partial self-test doesn't return TPM_RC_SUCCESS (meaning all tests have run to completion), but instead returns TPM_RC_TESTING (meaning some tests are still running in the background). A reboot can sometimes restore TPM functionality.
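The error code 2314 that the kernel prints in the log messages quoted in this report is the decimal form of TPM_RC_TESTING. A minimal sketch of the mapping, assuming the constant values from the TCG TPM 2.0 "Structures" specification (RC_WARN = 0x900, TPM_RC_TESTING = RC_WARN + 0x00A); the helper name describe_rc is purely illustrative:

```python
# TPM 2.0 response-code constants (values assumed from the TCG
# "Structures" specification; only the two codes discussed here).
RC_WARN = 0x900
TPM_RC_SUCCESS = 0x000
TPM_RC_TESTING = RC_WARN + 0x00A  # some self-tests are still running


def describe_rc(rc: int) -> str:
    """Map the decimal code printed by the kernel to a symbolic name."""
    names = {
        TPM_RC_SUCCESS: "TPM_RC_SUCCESS",
        TPM_RC_TESTING: "TPM_RC_TESTING",
    }
    return names.get(rc, f"unknown (0x{rc:03X})")


print(describe_rc(2314))  # 2314 == 0x90A
```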

This bug was originally reported on a Dell XPS 13, but has also recently been reported on a Dell Edge Gateway 3000.

The bug has been confirmed to be fixed in the current development release (19.04/Disco).

[Test Case]
Cold boot a Dell XPS 13 or Dell Edge Gateway 3000 running 18.04 LTS Desktop or Server and grep for the following error log message:

"tpm tpm0: A TPM error (2314) occurred continue selftest"

Any attempts at using the TPM via tpm2-tss libraries or tpm2-tools should produce errors.

As this bug is due to a race condition, ideally this test case would be run multiple times (20+ cold boots).

Once the patch is installed, the following error message may still appear in the syslog; however, attempts to use the TPM should work:

"tpm tpm0: A TPM error (2314) occurred attempting the self test"
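The log check in this test case could be scripted roughly as follows. This is only a sketch: the function name and the three labels are hypothetical, and the "benign-warning" result does not by itself prove the TPM is usable, so the tpm2-tools check is still needed.

```python
import re

# Message seen repeatedly on affected boots (before the patch).
RETRY = re.compile(r"A TPM error \(2314\) occurred continue selftest")
# Final failure message on affected boots.
FAILED = re.compile(r"TPM self test failed")
# Message that may still appear once the patch is installed.
BENIGN = re.compile(r"A TPM error \(2314\) occurred attempting the self test")


def classify_boot(log_text: str) -> str:
    """Classify one boot's kernel log using the messages from this report."""
    if FAILED.search(log_text) or RETRY.search(log_text):
        return "affected"
    if BENIGN.search(log_text):
        return "benign-warning"
    return "no-tpm-messages"


sample = "tpm tpm0: A TPM error (2314) occurred continue selftest\n"
print(classify_boot(sample))
```

The text to classify can come from the output of dmesg or from the syslog for a single boot; running it across 20+ cold boots gives a simple tally of affected boots.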

[Regression Potential]
The chance of regression is low, as this patch was written by a well-respected kernel developer with deep TPM experience. The patch is also being cherry-picked from the upstream stable and LTS kernels, and as mentioned, has already landed in Disco.

[Original Description]
After updating a Dell XPS 13 to 18.04 LTS, the TPM started to intermittently fail on cold boot. The following log messages could be observed in syslog:

[ 0.801334] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
[ 0.812132] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.843629] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.895424] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.987230] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.159026] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.490819] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.142530] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.423100] tpm tpm0: TPM self test failed
[ 3.456304] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
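The timestamps in the log above can be used to look at the spacing between self-test attempts. The sketch below parses them and prints the gap between consecutive "tpm tpm0" messages; each gap comes out a little under twice the previous one, which looks like a retry loop with a doubling delay. That reading is inferred from the timestamps alone, not taken from the driver source, and the function name is illustrative.

```python
import re

# The "tpm tpm0" lines from the log above, copied verbatim.
LOG = """\
[ 0.812132] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.843629] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.895424] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.987230] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.159026] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.490819] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.142530] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.423100] tpm tpm0: TPM self test failed
"""

STAMP = re.compile(r"^[ \t]*\[[ \t]*(\d+\.\d+)\] tpm tpm0:", re.MULTILINE)


def retry_gaps(log_text: str) -> list:
    """Return the gaps in seconds between consecutive tpm0 messages."""
    stamps = [float(m.group(1)) for m in STAMP.finditer(log_text)]
    return [round(later - earlier, 6) for earlier, later in zip(stamps, stamps[1:])]


gaps = retry_gaps(LOG)
for gap in gaps:
    print(f"{gap * 1000:.1f} ms")
```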

Discussion https://lkml.org/lkml/2017/12/6/284

Fix https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/char/tpm/tpm2-cmd.c?id=2be8ffed093b91536d52b5cd2c99b52f605c9ba6

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
Joseph Salisbury (jsalisbury) wrote : Re: TPM on Dell XPS 13 stopped working after upgrade to 18.04

Linux-next commit 2be8ffed093b91536d52b5cd2c99b52f605c9ba6 does not apply cleanly to v4.15. The commit was cc'd to upstream stable, but without a backport. I'd like to see what upstream stable applies for a backport.

Changed in linux (Ubuntu Bionic):
status: Confirmed → Triaged
tags: added: kernel-da-key
Tyler Hicks (tyhicks) wrote :

Hello - I've prepared a backport of commit 2be8ffed093b91536d52b5cd2c99b52f605c9ba6 and a kernel test build. If someone affected by this bug could verify that the test kernel fixes it, I'll land this fix in the Bionic kernel. The test kernel is here:

  https://people.canonical.com/~tyhicks/lp1762672-tpm.1/

Thanks!

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
assignee: nobody → Tyler Hicks (tyhicks)
Tony Espy (awe) wrote :

@Tyler I've tested your kernel on a Dell Edge Gateway 3000 which was showing the same TPM selftest log messages as originally described in this bug. When cold-booted with your kernel I only see the following messages now:

14:57:44 [0.000000] ACPI: TPM2 0x0000000076D537C8 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
14:57:44 [2.703384] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
14:57:44 [2.714914] tpm tpm0: A TPM error (2314) occurred attempting the self test

I was able to verify that the TPM is operational by running the following tpm2-tools commands:

$ sudo tpm2_startup -T device --clear
$ tpm2_nvlist
(produces valid output)

Note - in this case, the system is using the in-kernel resource manager, which it appears doesn't initialize the TPM, hence the need for using tpm2_startup to initialize the TPM. The version of tpm2-tools used is 2.1.0.
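For reference, the command that tpm2_startup --clear issues is TPM2_Startup with TPM_SU_CLEAR. The sketch below builds that 12-byte command and decodes the response code from a response header. It is not the tpm2-tools implementation: the constants are assumed from the TCG TPM 2.0 specification, the function names are hypothetical, and the code deliberately does not open any device node.

```python
import struct

# Constants assumed from the TCG TPM 2.0 specification.
TPM_ST_NO_SESSIONS = 0x8001
TPM_CC_STARTUP = 0x00000144
TPM_SU_CLEAR = 0x0000

HEADER = ">HII"  # tag (2 bytes), total size (4 bytes), command or response code (4 bytes)


def build_startup_clear() -> bytes:
    """Build the raw TPM2_Startup(TPM_SU_CLEAR) command."""
    body = struct.pack(">H", TPM_SU_CLEAR)
    size = struct.calcsize(HEADER) + len(body)
    return struct.pack(HEADER, TPM_ST_NO_SESSIONS, size, TPM_CC_STARTUP) + body


def parse_response_code(response: bytes) -> int:
    """Extract the response code from the 10-byte response header."""
    _tag, _size, rc = struct.unpack(HEADER, response[:10])
    return rc


print(build_startup_clear().hex())
```

A response header carrying 0x0000090A in its last four bytes would decode to 2314, the same code the kernel reports in the self-test messages.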

Tony Espy (awe) wrote :

I've run 10 cold boots on the gateway mentioned in my previous comment, and in each case after issuing a tpm2_startup clear command, I've been able to query the NVLIST of the TPM. So the back-ported patch appears to be working as advertised.

Tony Espy (awe)
summary: - TPM on Dell XPS 13 stopped working after upgrade to 18.04
+ TPM intermittently fails after cold-boot
Tony Espy (awe)
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Tony Espy (awe) wrote :

@Khaled

Just curious as to why this is now FixCommitted? Has Tyler's back-port landed in git for the next OEM and/or mainline kernel SRU release?

Tyler Hicks (tyhicks) wrote : Re: [Bug 1762672] Re: TPM intermittently fails after cold-boot

On 2019-02-22 17:02:23, Tony Espy wrote:
> Just curious as to why this is now FixCommitted? Has Tyler's back-port
> landed in git for the next OEM and/or mainline kernel SRU release?

Khaled has applied my backport to the Bionic tree. linux-oem will soon
inherit it (within the same SRU cycle).

Tony Espy (awe) wrote :

@Tyler

So it looks like we landed this just in time for the new SRU cycle, which means we're looking at a tentative release to proposed on Mar 25. Does that sound right? If so, I may check with Anthony to see if there's a possibility that linux-oem could possibly re-spin and release earlier...

Pierre Equoy (pieq) wrote :

Before installing proposed kernel:

admin@1234567:~$ uname -a
Linux 1234567 4.15.0-1031-oem #36-Ubuntu SMP Mon Jan 7 09:40:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

admin@1234567:~$ dmesg | grep -i tpm
[ 0.000000] ACPI: TPM2 0x0000000076D60D78 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
[ 2.438389] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
[ 2.449920] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.482185] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.534448] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.626704] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.798961] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.131219] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.783485] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 5.064239] tpm tpm0: TPM self test failed
[ 5.121846] ima: No TPM chip found, activating TPM-bypass! (rc=-19)

→ Checkbox TPM2-related tests fail.

Install proposed kernel:

admin@1234567:~$ uname -a
Linux 1234567 4.15.0-47-generic #50~tpm.1 SMP Wed Feb 13 15:53:50 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
admin@1234567:~$ dmesg | grep -i tpm
[ 0.000000] Linux version 4.15.0-47-generic (tyhicks@kathleen) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #50~tpm.1 SMP Wed Feb 13 15:53:50 UTC 2019 (Ubuntu 4.15.0-47.50~tpm.1-generic 4.15.18)
[ 0.000000] ACPI: TPM2 0x0000000076D60D78 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
[ 2.454686] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
[ 2.466218] tpm tpm0: A TPM error (2314) occurred attempting the self test
[ 19.908470] CPU: 0 PID: 451 Comm: systemd-udevd Not tainted 4.15.0-47-generic #50~tpm.1

→ Checkbox TPM2-related tests pass.

I've tried cold-booting and re-running Checkbox tests 10 times, and I got the same passed results 10 times.

Pierre Equoy (pieq) wrote :

(the commands run in the previous comment have been run on a Dell Edge Gateway 3000)

Tyler Hicks (tyhicks) wrote :

On 2019-02-25 22:44:17, Tony Espy wrote:
> So it looks like we landed this just in time for the new SRU cycle,
> which means we're looking at a tentative release to proposed on Mar 25.
> Does that sound right?

Yes, that's correct according to the SRU cycle announcement here:

  https://lists.ubuntu.com/archives/kernel-sru-announce/2019-February/000143.html

> If so, I may check with Anthony to see if there's a possibility that
> linux-oem could possibly re-spin and release earlier...

I'm not sure how common that is. I'll leave it up to you to discuss with
Anthony. Thanks!

Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Tony Espy (awe) wrote :

Tested the latest OEM kernel from -proposed on a Dell Edge Gateway 3000 running Ubuntu Server 18.04 LTS:

# rmadison linux-image-oem | grep bionic-proposed
 linux-image-oem | 4.15.0.1035.40 | bionic-proposed | amd64

# dpkg -l | grep linux-image-oem
ii linux-image-oem 4.15.0.1035.40 amd64 OEM Linux kernel image

Cold booted (x5) the system and instead of seeing "TPM error (2314)...selftest" messages, I see the following (expected) messages:

Mar 28 18:35:01 1K5JB02 kernel: ACPI: TPM2 0x76D537C8 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
Mar 28 18:35:01 1K5JB02 kernel: tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
Mar 28 18:35:01 1K5JB02 kernel: tpm tpm0: A TPM error (2314) occurred attempting the self test

Verified that the TPM is operational by running tpm2_listpcrs (version 3.1.3 built from source) using the in-kernel resource manager:

admin@1K5JB02:~$ sudo -i
root@1K5JB02:~# export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
root@1K5JB02:~# export TPM2TOOLS_TCTI_NAME=device TPM2TOOLS_DEVICE_FILE=/dev/tpmrm0
root@1K5JB02:~# tpm2_startup --clear
root@1K5JB02:~# tpm2_listpcrs

Bank/Algorithm: TPM_ALG_SHA1(0x0004)
PCR_00: 51 3f 1d 55 df 26 29 a2 42 ac 0b bf ae 7d 76 54 ef 91 24 d3
.
.
.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Tony Espy (awe) wrote :

I just ran an additional five cycles of the testing described in my previous comment with no failures.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-47.50

---------------
linux (4.15.0-47.50) bionic; urgency=medium

  * linux: 4.15.0-47.50 -proposed tracker (LP: #1819716)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts
    - [Packaging] resync retpoline extraction

  * C++ demangling support missing from perf (LP: #1396654)
    - [Packaging] fix a mistype

  * arm-smmu-v3 arm-smmu-v3.3.auto: CMD_SYNC timeout (LP: #1818162)
    - iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout

  * Crash in nvme_irq_check() when using threaded interrupts (LP: #1818747)
    - nvme-pci: fix out of bounds access in nvme_cqe_pending

  * CVE-2019-9213
    - mm: enforce min addr even if capable() in expand_downwards()

  * CVE-2019-3460
    - Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt

  * amdgpu with mst WARNING on blanking (LP: #1814308)
    - drm/amd/display: Don't use dc_link in link_encoder
    - drm/amd/display: Move wait for hpd ready out from edp power control.
    - drm/amd/display: eDP sequence BL off first then DP blank.
    - drm/amd/display: Fix unused variable compilation error
    - drm/amd/display: Fix warning about misaligned code
    - drm/amd/display: Fix MST dp_blank REG_WAIT timeout

  * tun/tap: unable to manage carrier state from userland (LP: #1806392)
    - tun: implement carrier change

  * CVE-2019-8980
    - exec: Fix mem leak in kernel_read_file

  * raw_skew in timer from the ubuntu_kernel_selftests failed on Bionic
    (LP: #1811194)
    - selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET/other clock
      adjustments are in progress

  * [Packaging] Allow overlay of config annotations (LP: #1752072)
    - [Packaging] config-check: Add an include directive

  * CVE-2019-7308
    - bpf: move {prev_,}insn_idx into verifier env
    - bpf: move tmp variable into ax register in interpreter
    - bpf: enable access to ax register also from verifier rewrite
    - bpf: restrict map value pointer arithmetic for unprivileged
    - bpf: restrict stack pointer arithmetic for unprivileged
    - bpf: restrict unknown scalars of mixed signed bounds for unprivileged
    - bpf: fix check_map_access smin_value test when pointer contains offset
    - bpf: prevent out of bounds speculation on pointer arithmetic
    - bpf: fix sanitation of alu op with pointer / scalar type from different
      paths
    - bpf: add various test cases to selftests

  * CVE-2017-5753
    - bpf: properly enforce index mask to prevent out-of-bounds speculation
    - bpf: fix inner map masking to prevent oob under speculation

  * BPF: kernel pointer leak to unprivileged userspace (LP: #1815259)
    - bpf/verifier: disallow pointer subtraction

  * squashfs hardening (LP: #1816756)
    - squashfs: more metadata hardening
    - squashfs metadata 2: electric boogaloo
    - squashfs: more metadata hardening
    - Squashfs: Compute expected length from inode size rather than block length

  * efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted (LP: #1814982)
    - efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted

  * Update ENA driver to version 2.0.3K (LP: #1816806)...
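Given the fixed version named above (linux 4.15.0-47.50), a running Bionic generic kernel can be checked against it roughly as follows. This is a sketch under stated assumptions: it only understands release strings of the form printed by uname -r for the 4.15 generic flavour, the function name is hypothetical, and it returns None for anything else (linux-oem and other flavours use different ABI numbering) rather than guessing.

```python
import re

# ABI number of the first fixed generic kernel, from "linux (4.15.0-47.50) bionic".
FIXED_ABI_GENERIC = 47

RELEASE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)-(\w+)")


def has_tpm_fix(release: str):
    """Return True/False for 4.15.0-*-generic kernels, None if unknown."""
    m = RELEASE.fullmatch(release)
    if not m:
        return None
    if m.group(1, 2, 3) != ("4", "15", "0") or m.group(5) != "generic":
        return None  # other series or flavour: not covered by this sketch
    return int(m.group(4)) >= FIXED_ABI_GENERIC


print(has_tpm_fix("4.15.0-47-generic"))
```

On a live system the argument would typically be platform.release() from the Python standard library, which reports the same string as uname -r.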

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
Jason Gallas (jgallas) wrote :

After upgrading from 18.04 to 21.04, my XPS 13 fails to boot, giving
the message "a TPM error (2314) occurred attempting the self test".

How can this be fixed? Thanks, jason
