TPM intermittently fails after cold-boot

Bug #1762672 reported by Alexey Bazhin on 2018-04-10
This bug affects 4 people
Affects          Status        Importance  Assigned to  Milestone
linux (Ubuntu)   Fix Released  High        Unassigned
  Bionic         Fix Released  High        Tyler Hicks

Bug Description

[Impact]
On an 18.04 LTS system with a TPM, the TPM will fail intermittently on cold boots. The problem seems to be that the TPM gets into a state where the partial self-test doesn't return TPM_RC_SUCCESS (meaning all tests have run to completion), but instead returns TPM_RC_TESTING (meaning some tests are still running in the background). A reboot can sometimes restore TPM functionality.
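
As a small illustration of the two return codes named above, the sketch below decodes the decimal value that the kernel prints in its "A TPM error (2314)" message. The constant values are taken from the TPM 2.0 library specification (Part 2, Structures); the helper name describe_rc is hypothetical and is not part of any kernel or tpm2-tools API.

```python
# TPM 2.0 response codes, per the TCG TPM 2.0 library specification (Part 2).
TPM_RC_SUCCESS = 0x000
RC_WARN = 0x900                    # base value for warning-class codes
TPM_RC_TESTING = RC_WARN + 0x00A   # self-tests are still running


def describe_rc(rc: int) -> str:
    """Map a numeric TPM response code to a short name (hypothetical helper)."""
    if rc == TPM_RC_SUCCESS:
        return "TPM_RC_SUCCESS"
    if rc == TPM_RC_TESTING:
        return "TPM_RC_TESTING"
    return f"unrecognized (0x{rc:03X})"


# The kernel log prints the code in decimal: "A TPM error (2314) occurred ..."
print(hex(2314), describe_rc(2314))  # 0x90a TPM_RC_TESTING
```

In other words, the 2314 in the log is a warning-class code meaning "still testing", not a hard failure code.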

This bug was originally reported on a Dell XPS 13, but has also recently been reported on a Dell Edge Gateway 3000.

The bug has been confirmed to be fixed in the current development release (19.04/Disco).

[Test Case]
Cold boot a Dell XPS 13 or Dell Edge Gateway 3000 running 18.04 LTS Desktop or Server and grep for the following error log message:

"tpm tpm0: A TPM error (2314) occurred continue selftest"

Any attempts at using the TPM via tpm2-tss libraries or tpm2-tools should produce errors.

As this bug is due to a race condition, ideally this test case would be run multiple times (20+ cold boots).
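
Because the test has to be repeated over many cold boots, it may help to tally the results mechanically. The following is a minimal sketch, assuming the kernel log of each boot has been saved as text (for example from dmesg); it only looks for the message strings quoted in this report. The function names and the three labels are hypothetical.

```python
from collections import Counter

# Strings seen in this report on boots where the TPM was unusable.
FAIL_MARKERS = ("occurred continue selftest", "TPM self test failed")
# String seen in this report on boots where the TPM still worked.
PATCHED_MARKER = "occurred attempting the self test"


def classify_boot(log_text: str) -> str:
    """Label one boot log as 'failed', 'warning-only' or 'clean'."""
    lines = [line for line in log_text.splitlines() if "tpm" in line.lower()]
    if any(marker in line for line in lines for marker in FAIL_MARKERS):
        return "failed"
    if any(PATCHED_MARKER in line for line in lines):
        return "warning-only"
    return "clean"


def tally(boot_logs) -> Counter:
    """Count the labels over a series of boot logs."""
    return Counter(classify_boot(text) for text in boot_logs)


if __name__ == "__main__":
    sample = [
        "tpm tpm0: A TPM error (2314) occurred continue selftest\n"
        "tpm tpm0: TPM self test failed",
        "tpm tpm0: A TPM error (2314) occurred attempting the self test",
    ]
    print(tally(sample))
```

A "warning-only" label says only that the failure signature is absent; the TPM should still be exercised with tpm2-tools to confirm that it works.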

Once the patch is installed, the following error message may still be present in the syslog; however, attempts to use the TPM should work:

"tpm tpm0: A TPM error (2314) occurred attempting the self test"

[Regression Potential]
The chance of regression is low, as this patch was written by a well-respected kernel developer with deep TPM experience. The patch is also being cherry-picked from the upstream stable and LTS kernels and, as mentioned, has already landed in Disco.

[Original Description]
After updating a Dell XPS 13 to 18.04 LTS, the TPM started to intermittently fail on cold boot. The following log messages could be observed in syslog:

[ 0.801334] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
[ 0.812132] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.843629] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.895424] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.987230] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.159026] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.490819] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.142530] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.423100] tpm tpm0: TPM self test failed
[ 3.456304] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
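
The timestamps in the log above hint at how the driver retries the self test. The sketch below simply computes the gap between consecutive attempts from those timestamps; the retry policy itself is not stated anywhere in this report, so any conclusion about back-off is an inference from the numbers. The helper name retry_gaps_ms is hypothetical.

```python
import re

# Self-test lines copied from the log in this report.
LOG = """\
[ 0.812132] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.843629] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.895424] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.987230] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.159026] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.490819] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.142530] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.423100] tpm tpm0: TPM self test failed
"""

STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def retry_gaps_ms(log: str) -> list[int]:
    """Return the gaps between consecutive log lines, in whole milliseconds."""
    stamps = [float(m.group(1)) for m in map(STAMP.match, log.splitlines()) if m]
    return [round((later - earlier) * 1000) for earlier, later in zip(stamps, stamps[1:])]


print(retry_gaps_ms(LOG))  # [31, 52, 92, 172, 332, 652, 1281]
```

Each gap is a little under twice the previous one, which is consistent with a delay that doubles on every attempt plus a small fixed overhead, until the driver gives up about 2.6 seconds after the first attempt.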

Discussion https://lkml.org/lkml/2017/12/6/284

Fix https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/char/tpm/tpm2-cmd.c?id=2be8ffed093b91536d52b5cd2c99b52f605c9ba6

Alexey Bazhin (baz-irc) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High

Linux-next commit 2be8ffed093b91536d52b5cd2c99b52f605c9ba6 does not apply cleanly to v4.15. The commit was cc'd to upstream stable, but without a backport. I'd like to see what upstream stable applies for a backport.

Changed in linux (Ubuntu Bionic):
status: Confirmed → Triaged
tags: added: kernel-da-key
Tyler Hicks (tyhicks) wrote :

Hello - I've prepared a backport of commit 2be8ffed093b91536d52b5cd2c99b52f605c9ba6 and a kernel test build. If someone affected by this bug could verify that the test kernel fixes it, I'll land this fix in the Bionic kernel. The test kernel is here:

  https://people.canonical.com/~tyhicks/lp1762672-tpm.1/

Thanks!

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
assignee: nobody → Tyler Hicks (tyhicks)
Tony Espy (awe) wrote :

@Tyler I've tested your kernel on a Dell Edge Gateway 3000 which was showing the same TPM selftest log messages as originally described in this bug. When cold-booted with your kernel I only see the following messages now:

14:57:44 [0.000000] ACPI: TPM2 0x0000000076D537C8 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
14:57:44 [2.703384] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
14:57:44 [2.714914] tpm tpm0: A TPM error (2314) occurred attempting the self test

I was able to verify that the TPM is operational by running the following tpm2-tools commands:

$ sudo tpm2_startup -T device --clear
$ tpm2_nvlist
(produces valid output)

Note - in this case, the system is using the in-kernel resource manager, which it appears doesn't initialize the TPM, hence the need for using tpm2_startup to initialize the TPM. The version of tpm2-tools used is 2.1.0.

Tony Espy (awe) wrote :

I've run 10 cold boots on the gateway mentioned in my previous comment, and in each case after issuing a tpm2_startup clear command, I've been able to query the NVLIST of the TPM. So the back-ported patch appears to be working as advertised.

Tony Espy (awe) on 2019-02-21
summary: - TPM on Dell XPS 13 stopped working after upgrade to 18.04
+ TPM intermittently fails after cold-boot
Tony Espy (awe) on 2019-02-21
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Tony Espy (awe) wrote :

@Khaled

Just curious as to why this is now FixCommitted? Has Tyler's back-port landed in git for the next OEM and/or mainline kernel SRU release?

Tyler Hicks (tyhicks) wrote :

On 2019-02-22 17:02:23, Tony Espy wrote:
> Just curious as to why this is now FixCommitted? Has Tyler's back-port
> landed in git for the next OEM and/or mainline kernel SRU release?

Khaled has applied my backport to the Bionic tree. linux-oem will soon
inherit it (within the same SRU cycle).

Tony Espy (awe) wrote :

@Tyler

So it looks like we landed this just in time for the new SRU cycle, which means we're looking at a tentative release to proposed on Mar 25. Does that sound right? If so, I may check with Anthony to see if there's a possibility that linux-oem could possibly re-spin and release earlier...

Pierre Equoy (pieq) wrote :

Before installing proposed kernel:

admin@1234567:~$ uname -a
Linux 1234567 4.15.0-1031-oem #36-Ubuntu SMP Mon Jan 7 09:40:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

admin@1234567:~$ dmesg | grep -i tpm
[ 0.000000] ACPI: TPM2 0x0000000076D60D78 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
[ 2.438389] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
[ 2.449920] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.482185] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.534448] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.626704] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.798961] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.131219] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.783485] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 5.064239] tpm tpm0: TPM self test failed
[ 5.121846] ima: No TPM chip found, activating TPM-bypass! (rc=-19)

→ Checkbox TPM2-related tests fail.

After installing proposed kernel:

admin@1234567:~$ uname -a
Linux 1234567 4.15.0-47-generic #50~tpm.1 SMP Wed Feb 13 15:53:50 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
admin@1234567:~$ dmesg | grep -i tpm
[ 0.000000] Linux version 4.15.0-47-generic (tyhicks@kathleen) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #50~tpm.1 SMP Wed Feb 13 15:53:50 UTC 2019 (Ubuntu 4.15.0-47.50~tpm.1-generic 4.15.18)
[ 0.000000] ACPI: TPM2 0x0000000076D60D78 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
[ 2.454686] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
[ 2.466218] tpm tpm0: A TPM error (2314) occurred attempting the self test
[ 19.908470] CPU: 0 PID: 451 Comm: systemd-udevd Not tainted 4.15.0-47-generic #50~tpm.1

→ Checkbox TPM2-related tests pass.

I've cold-booted and re-run the Checkbox tests 10 times, and they passed every time.

Pierre Equoy (pieq) wrote :

(the commands run in the previous comment have been run on a Dell Edge Gateway 3000)

Tyler Hicks (tyhicks) wrote :

On 2019-02-25 22:44:17, Tony Espy wrote:
> So it looks like we landed this just in time for the new SRU cycle,
> which means we're looking at a tentative release to proposed on Mar 25.
> Does that sound right?

Yes, that's correct according to the SRU cycle announcement here:

  https://lists.ubuntu.com/archives/kernel-sru-announce/2019-February/000143.html

> If so, I may check with Anthony to see if there's a possibility that
> linux-oem could possibly re-spin and release earlier...

I'm not sure how common that is. I'll leave it up to you to discuss with
Anthony. Thanks!

Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Tony Espy (awe) wrote :

Tested the latest OEM kernel from -proposed on a Dell Edge Gateway 3000 running Ubuntu Server 18.04 LTS:

# rmadison linux-image-oem | grep bionic-proposed
 linux-image-oem | 4.15.0.1035.40 | bionic-proposed | amd64

# dpkg -l | grep linux-image-oem
ii linux-image-oem 4.15.0.1035.40 amd64 OEM Linux kernel image

Cold booted (x5) the system and instead of seeing "TPM error (2314)...selftest" messages, I see the following (expected) messages:

Mar 28 18:35:01 1K5JB02 kernel: ACPI: TPM2 0x76D537C8 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
Mar 28 18:35:01 1K5JB02 kernel: tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
Mar 28 18:35:01 1K5JB02 kernel: tpm tpm0: A TPM error (2314) occurred attempting the self test

Verified that the TPM is operational by running tpm2_listpcrs (version 3.1.3 built from source) using the in-kernel resource manager:

admin@1K5JB02:~$ sudo -i
root@1K5JB02:~# export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
root@1K5JB02:~# export TPM2TOOLS_TCTI_NAME=device TPM2TOOLS_DEVICE_FILE=/dev/tpmrm0
root@1K5JB02:~# tpm2_startup --clear
root@1K5JB02:~# tpm2_listpcrs

Bank/Algorithm: TPM_ALG_SHA1(0x0004)
PCR_00: 51 3f 1d 55 df 26 29 a2 42 ac 0b bf ae 7d 76 54 ef 91 24 d3
.
.
.
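
For repeated cold-boot runs, the manual check above could be wrapped in a small script. This is only a sketch under stated assumptions: it reuses the environment variables and the two commands exactly as shown in this comment (tpm2-tools 3.1.3 with the in-kernel resource manager at /dev/tpmrm0), treats a zero exit status from both as "operational", and must be run as root. The function name tpm_is_operational and the injectable run parameter are hypothetical.

```python
import os
import subprocess

# Environment and commands copied from the comment above (tpm2-tools 3.1.3).
TPM_ENV = {
    "TPM2TOOLS_TCTI_NAME": "device",
    "TPM2TOOLS_DEVICE_FILE": "/dev/tpmrm0",
}
COMMANDS = (["tpm2_startup", "--clear"], ["tpm2_listpcrs"])


def tpm_is_operational(run=subprocess.run) -> bool:
    """Return True if every command in COMMANDS exits with status 0."""
    env = {**os.environ, **TPM_ENV}
    for cmd in COMMANDS:
        try:
            result = run(cmd, env=env, capture_output=True, text=True)
        except FileNotFoundError:
            return False  # tpm2-tools is not installed or not on PATH
        if result.returncode != 0:
            return False
    return True
```

Calling tpm_is_operational() after each cold boot and recording the result gives the same pass/fail signal as the manual steps. The run parameter exists only so the logic can be exercised without a TPM.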

tags: added: verification-done-bionic
removed: verification-needed-bionic
Tony Espy (awe) wrote :

I just ran an additional five cycles of the testing described in my previous comment with no failures.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-47.50

---------------
linux (4.15.0-47.50) bionic; urgency=medium

  * linux: 4.15.0-47.50 -proposed tracker (LP: #1819716)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts
    - [Packaging] resync retpoline extraction

  * C++ demangling support missing from perf (LP: #1396654)
    - [Packaging] fix a mistype

  * arm-smmu-v3 arm-smmu-v3.3.auto: CMD_SYNC timeout (LP: #1818162)
    - iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout

  * Crash in nvme_irq_check() when using threaded interrupts (LP: #1818747)
    - nvme-pci: fix out of bounds access in nvme_cqe_pending

  * CVE-2019-9213
    - mm: enforce min addr even if capable() in expand_downwards()

  * CVE-2019-3460
    - Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt

  * amdgpu with mst WARNING on blanking (LP: #1814308)
    - drm/amd/display: Don't use dc_link in link_encoder
    - drm/amd/display: Move wait for hpd ready out from edp power control.
    - drm/amd/display: eDP sequence BL off first then DP blank.
    - drm/amd/display: Fix unused variable compilation error
    - drm/amd/display: Fix warning about misaligned code
    - drm/amd/display: Fix MST dp_blank REG_WAIT timeout

  * tun/tap: unable to manage carrier state from userland (LP: #1806392)
    - tun: implement carrier change

  * CVE-2019-8980
    - exec: Fix mem leak in kernel_read_file

  * raw_skew in timer from the ubuntu_kernel_selftests failed on Bionic
    (LP: #1811194)
    - selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET/other clock
      adjustments are in progress

  * [Packaging] Allow overlay of config annotations (LP: #1752072)
    - [Packaging] config-check: Add an include directive

  * CVE-2019-7308
    - bpf: move {prev_,}insn_idx into verifier env
    - bpf: move tmp variable into ax register in interpreter
    - bpf: enable access to ax register also from verifier rewrite
    - bpf: restrict map value pointer arithmetic for unprivileged
    - bpf: restrict stack pointer arithmetic for unprivileged
    - bpf: restrict unknown scalars of mixed signed bounds for unprivileged
    - bpf: fix check_map_access smin_value test when pointer contains offset
    - bpf: prevent out of bounds speculation on pointer arithmetic
    - bpf: fix sanitation of alu op with pointer / scalar type from different
      paths
    - bpf: add various test cases to selftests

  * CVE-2017-5753
    - bpf: properly enforce index mask to prevent out-of-bounds speculation
    - bpf: fix inner map masking to prevent oob under speculation

  * BPF: kernel pointer leak to unprivileged userspace (LP: #1815259)
    - bpf/verifier: disallow pointer subtraction

  * squashfs hardening (LP: #1816756)
    - squashfs: more metadata hardening
    - squashfs metadata 2: electric boogaloo
    - squashfs: more metadata hardening
    - Squashfs: Compute expected length from inode size rather than block length

  * efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted (LP: #1814982)
    - efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted

  * Update ENA driver to version 2.0.3K (LP: #1816806)...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released