TPM intermittently fails after cold-boot

Bug #1762672 reported by Alexey Bazhin on 2018-04-10
This bug affects 4 people
Affects          Status  Importance  Assigned to  Milestone
linux (Ubuntu)           High        Unassigned
Bionic                   High        Tyler Hicks

Bug Description

[Impact]
On an 18.04 LTS system with a TPM, the TPM will fail intermittently on cold boots. The problem seems to be that the TPM gets into a state where the partial self-test doesn't return TPM_RC_SUCCESS (meaning all tests have run to completion), but instead returns TPM_RC_TESTING (meaning some tests are still running in the background). A reboot can sometimes restore TPM functionality.
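
For reference, the numeric code 2314 in the kernel messages quoted in this report is the decimal form of 0x90A, which the TPM 2.0 specification defines as TPM_RC_TESTING (RC_WARN, 0x900, plus 0x00A). A minimal Python sketch of that decoding follows. The helper name describe_tpm_rc is hypothetical, and only the two response codes discussed here are covered.

```python
# Response codes as defined in the TPM 2.0 specification (Part 2, TPM_RC).
TPM_RC_SUCCESS = 0x000
RC_WARN = 0x900
TPM_RC_TESTING = RC_WARN + 0x00A  # 0x90A, i.e. 2314 in decimal


def describe_tpm_rc(rc: int) -> str:
    """Map a TPM 2.0 response code to a short description (hypothetical helper)."""
    if rc == TPM_RC_SUCCESS:
        return "TPM_RC_SUCCESS: all requested tests have completed"
    if rc == TPM_RC_TESTING:
        return "TPM_RC_TESTING: self-tests are still running"
    return f"unhandled response code 0x{rc:03X}"


print(describe_tpm_rc(2314))
```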

This bug was originally reported on a Dell XPS 13, but has also recently been reported on a Dell Edge Gateway 3000.

The bug has been confirmed to be fixed in the current development release (19.04/Disco).

[Test Case]
Cold boot a Dell XPS 13 or Dell Edge Gateway 3000 running 18.04 LTS Desktop or Server and grep for the following error log message:

"tpm tpm0: A TPM error (2314) occurred continue selftest"

Any attempts at using the TPM via tpm2-tss libraries or tpm2-tools should produce errors.

As this bug is due to a race condition, ideally this test case would be run multiple times (20+ cold boots).
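
Because the check has to be repeated over many cold boots, it can help to script it. The sketch below is one possible way to classify a captured kernel log: it distinguishes the "continue selftest" message quoted above from the "attempting the self test" variant that this test case says may remain after the patch. The function name classify_boot and the three labels are hypothetical. A "patched-warning" result only reflects the log text, so the TPM should still be exercised with tpm2-tools.

```python
import re

# Logged repeatedly by an affected kernel while it polls the self-test.
AFFECTED = re.compile(r"A TPM error \(2314\) occurred continue selftest")
# May still be logged once the patch is installed.
PATCHED = re.compile(r"A TPM error \(2314\) occurred attempting the self test")


def classify_boot(log_text: str) -> str:
    """Classify one boot's kernel log (hypothetical helper and labels)."""
    if AFFECTED.search(log_text):
        return "affected"
    if PATCHED.search(log_text):
        return "patched-warning"
    return "clean"
```

The log text could come from the output of dmesg or from the syslog file, read once per cold boot.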

Once the patch is installed the following error message may still be present in the syslog, however attempts to use the TPM should work:

"tpm tpm0: A TPM error (2314) occurred attempting the self test"

[Regression Potential]
The chance of regression is low, as this patch was written by a well-respected kernel developer with deep TPM experience. The patch is also being cherry-picked from the upstream stable and LTS kernels and, as mentioned, has already landed in Disco.

[Original Description]
After updating a Dell XPS 13 to 18.04 LTS, the TPM started to intermittently fail on cold boot. The following log messages could be observed in syslog:

[ 0.801334] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
[ 0.812132] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.843629] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.895424] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.987230] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.159026] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.490819] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.142530] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.423100] tpm tpm0: TPM self test failed
[ 3.456304] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
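
The timestamps in the log above can be used to see how the retries are spaced. The following Python sketch, with the hypothetical helper retry_intervals_ms, extracts the gap between consecutive "continue selftest" messages. The gaps come out at roughly 31, 52, 92, 172, 332, and 652 ms, which is consistent with a wait that doubles on every retry plus a fixed per-command overhead. This is an inference from the timestamps only, not a statement about the driver source.

```python
import re

# The "continue selftest" lines copied from the log above.
LOG = """\
[ 0.812132] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.843629] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.895424] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 0.987230] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.159026] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.490819] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.142530] tpm tpm0: A TPM error (2314) occurred continue selftest
"""

STAMP = re.compile(r"\[\s*(\d+\.\d+)\] tpm tpm0: A TPM error \(2314\)")


def retry_intervals_ms(log: str) -> list[float]:
    """Return the gaps, in milliseconds, between consecutive error messages."""
    stamps = [float(m.group(1)) for m in STAMP.finditer(log)]
    return [(later - earlier) * 1000 for earlier, later in zip(stamps, stamps[1:])]


intervals = retry_intervals_ms(LOG)
print([round(gap, 1) for gap in intervals])
```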

Discussion https://lkml.org/lkml/2017/12/6/284

Fix https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/char/tpm/tpm2-cmd.c?id=2be8ffed093b91536d52b5cd2c99b52f605c9ba6

Alexey Bazhin (baz-irc) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High

Linux-next commit 2be8ffed093b91536d52b5cd2c99b52f605c9ba6 does not apply cleanly to v4.15. The commit was cc'd to upstream stable, but without a backport. I'd like to see what upstream stable applies for a backport.

Changed in linux (Ubuntu Bionic):
status: Confirmed → Triaged
tags: added: kernel-da-key
Tyler Hicks (tyhicks) wrote :

Hello - I've prepared a backport of commit 2be8ffed093b91536d52b5cd2c99b52f605c9ba6 and a kernel test build. If someone affected by this bug could verify that the test kernel fixes it, I'll land this fix in the Bionic kernel. The test kernel is here:

  https://people.canonical.com/~tyhicks/lp1762672-tpm.1/

Thanks!

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
assignee: nobody → Tyler Hicks (tyhicks)
Tony Espy (awe) wrote :

@Tyler I've tested your kernel on a Dell Edge Gateway 3000 which was showing the same TPM selftest log messages as originally described in this bug. When cold-booted with your kernel I only see the following messages now:

14:57:44 [0.000000] ACPI: TPM2 0x0000000076D537C8 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
14:57:44 [2.703384] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
14:57:44 [2.714914] tpm tpm0: A TPM error (2314) occurred attempting the self test

I was able to verify that the TPM is operational by running the following tpm2-tools commands:

$ sudo tpm2_startup -T device --clear
$ tpm2_nvlist
(produces valid output)

Note - in this case, the system is using the in-kernel resource manager, which it appears doesn't initialize the TPM, hence the need for using tpm2_startup to initialize the TPM. The version of tpm2-tools used is 2.1.0.
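
For repeated cold-boot runs, the two commands above could be wrapped in a small script. The Python sketch below is one way to do that, under two assumptions: the tools use the tpm2-tools 2.1.0 syntax shown in this comment, and an exit status of 0 means the command succeeded. It does not inspect the tpm2_nvlist output, so it is a weaker check than reading the output by hand. The names run_command and tpm_is_operational are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# Commands taken from the comment above (tpm2-tools 2.1.0 syntax).
COMMANDS = (
    ("sudo", "tpm2_startup", "-T", "device", "--clear"),
    ("tpm2_nvlist",),
)


def run_command(argv: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def tpm_is_operational(runner: Callable[[Sequence[str]], int] = run_command) -> bool:
    """Return True only if every command exits with status 0.

    Commands run in order and the check stops at the first failure.
    """
    return all(runner(argv) == 0 for argv in COMMANDS)
```

Calling tpm_is_operational() with no arguments runs the real commands; the runner parameter exists so the logic can be exercised without a TPM.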

Tony Espy (awe) wrote :

I've run 10 cold boots on the gateway mentioned in my previous comment, and in each case after issuing a tpm2_startup clear command, I've been able to query the NVLIST of the TPM. So the back-ported patch appears to be working as advertised.

Tony Espy (awe) on 2019-02-21
summary: - TPM on Dell XPS 13 stopped working after upgrade to 18.04
+ TPM intermittently fails after cold-boot
Tony Espy (awe) on 2019-02-21
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Tony Espy (awe) wrote :

@Khaled

Just curious as to why this is now FixCommitted? Has Tyler's back-port landed in git for the next OEM and/or mainline kernel SRU release?

On 2019-02-22 17:02:23, Tony Espy wrote:
> Just curious as to why this is now FixCommitted? Has Tyler's back-port
> landed in git for the next OEM and/or mainline kernel SRU release?

Khaled has applied my backport to the Bionic tree. linux-oem will soon
inherit it (within the same SRU cycle).

Tony Espy (awe) wrote :

@Tyler

So it looks like we landed this just in time for the new SRU cycle, which means we're looking at a tentative release to proposed on Mar 25. Does that sound right? If so, I may check with Anthony to see if there's a possibility that linux-oem could possibly re-spin and release earlier...

Pierre Equoy (pieq) wrote :

Before installing proposed kernel:

admin@1234567:~$ uname -a
Linux 1234567 4.15.0-1031-oem #36-Ubuntu SMP Mon Jan 7 09:40:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

admin@1234567:~$ dmesg | grep -i tpm
[ 0.000000] ACPI: TPM2 0x0000000076D60D78 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
[ 2.438389] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
[ 2.449920] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.482185] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.534448] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.626704] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 2.798961] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.131219] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 3.783485] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 5.064239] tpm tpm0: TPM self test failed
[ 5.121846] ima: No TPM chip found, activating TPM-bypass! (rc=-19)

→ Checkbox TPM2-related tests fail.

Install proposed kernel:

admin@1234567:~$ uname -a
Linux 1234567 4.15.0-47-generic #50~tpm.1 SMP Wed Feb 13 15:53:50 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
admin@1234567:~$ dmesg | grep -i tpm
[ 0.000000] Linux version 4.15.0-47-generic (tyhicks@kathleen) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #50~tpm.1 SMP Wed Feb 13 15:53:50 UTC 2019 (Ubuntu 4.15.0-47.50~tpm.1-generic 4.15.18)
[ 0.000000] ACPI: TPM2 0x0000000076D60D78 000034 (v03 Tpm2Tabl 00000001 AMI 00000000)
[ 2.454686] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 4)
[ 2.466218] tpm tpm0: A TPM error (2314) occurred attempting the self test
[ 19.908470] CPU: 0 PID: 451 Comm: systemd-udevd Not tainted 4.15.0-47-generic #50~tpm.1

→ Checkbox TPM2-related tests pass.

I've tried cold-booting and re-running Checkbox tests 10 times, and I got the same passed results 10 times.

Pierre Equoy (pieq) wrote :

(the commands run in the previous comment have been run on a Dell Edge Gateway 3000)

Tyler Hicks (tyhicks) wrote :

On 2019-02-25 22:44:17, Tony Espy wrote:
> So it looks like we landed this just in time for the new SRU cycle,
> which means we're looking at a tentative release to proposed on Mar 25.
> Does that sound right?

Yes, that's correct according to the SRU cycle announcement here:

  https://lists.ubuntu.com/archives/kernel-sru-announce/2019-February/000143.html

> If so, I may check with Anthony to see if there's a possibility that
> linux-oem could possibly re-spin and release earlier...

I'm not sure how common that is. I'll leave it up to you to discuss with
Anthony. Thanks!

Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic