"fwts uefirtmisc" causes reboot in recent kernels

Bug #1743799 reported by Rod Smith
This bug affects 1 person
Affects                   Status    Importance  Assigned to  Milestone
Checkbox Provider - Base  Invalid   Undecided   Unassigned
linux (Ubuntu)            Triaged   Medium      Unassigned

Bug Description

While doing Meltdown regression testing, we found that one server, lucuma, a Dell PowerEdge T710, would spontaneously reboot when running the miscellanea/fwts_test under Ubuntu 14.04.5 with a 3.13.0-139-generic #188-Ubuntu kernel. This test has been dropped from our test list for 16.04, so it was not initially tested under more recent kernels. This server has been out of our testing queue for years, so it hasn't been tested since Ubuntu 10.04. Thus, we don't yet know whether the problem is a regression or has existed for a long time. We need to do more testing to better isolate the issue; stay tuned for more information.

The system's entry on C3 is:

https://certification.canonical.com/hardware/200910-4539/

The submission showing the error is:

https://certification.canonical.com/hardware/200910-4539/submission/126589/

Rod Smith (rodsmith) wrote :

I've done more testing. The problem occurs beginning with the 3.13.0-139 kernel; the 3.13.0-138 kernel is unaffected. The 4.4.0-109 kernel also has the problem, but I've not tested others in this series. Links to relevant test results:

* 3.13.0-138:
  https://certification.canonical.com/hardware/200910-4539/submission/126597/
* 3.13.0-139:
  https://certification.canonical.com/hardware/200910-4539/submission/126593/
* 4.4.0-109:
  https://certification.canonical.com/hardware/200910-4539/submission/126595/
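
The boundary described above can be checked against the running kernel with a small helper. This is only a minimal sketch and not part of the original testing: the function name kernel_at_or_after is hypothetical, it assumes GNU sort with -V for version ordering, and it draws conclusions only within the 3.13.0 series, since 3.13.0-138, 3.13.0-139, and 4.4.0-109 are the only kernels tested at this point.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 sorts at or after version $2.
# Assumes GNU sort with -V (version sort).
kernel_at_or_after() {
    first=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$first" = "$2" ]
}

# First kernel in the 3.13.0 series observed to reboot during the test.
FIRST_BAD="3.13.0-139"

# Strip a trailing flavour suffix (e.g. "-generic") from the kernel release.
running=$(uname -r | sed 's/-[a-z][a-z0-9]*$//')

case "$running" in
    3.13.0-*)
        if kernel_at_or_after "$running" "$FIRST_BAD"; then
            echo "$running: at or after $FIRST_BAD, reboot possible on affected hardware"
        else
            echo "$running: before $FIRST_BAD, no reboot observed in testing"
        fi
        ;;
    *)
        echo "$running: outside the 3.13.0 series, see the per-kernel results"
        ;;
esac
```

On any kernel outside the 3.13.0 series the script deliberately makes no claim and just points back to the individual test results.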

The problem is with the miscellanea/fwts_test result. Note that the test fails in all cases (a near-100% false alarm rate is why we dropped the test); however, the failures with 3.13.0-138 and before did not result in a system reboot. Beginning with 3.13.0-139, the test results in a system reboot, which is recorded in the test results as "Failed after resuming execution."

tags: added: pti
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1743799

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Rod Smith (rodsmith) wrote : Re: miscellanea/fwts_test causes reboot

I've managed to isolate this to the fwts "uefirtmisc" test:

$ sudo fwts uefirtmisc
Running 1 tests, results appended to results.log
Test: UEFI miscellaneous runtime service interface tests.
  Test for UEFI miscellaneous runtime service interfaces. 1 passed, 5 skipped
Timeout, server 10.1.11.176 not responding.e service interfaces. : 33.3% |

(That last line is partially overwritten by the SSH termination message.) At 33.3% through that test, the system spontaneously reboots. None of the other fwts tests causes any problem.
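
Because the reboot cuts off whatever is on the terminal, it may help to write each line of output to disk as it arrives, so the last step reached can be read back after the machine comes up again. The following is a rough sketch under stated assumptions: run_and_sync is a hypothetical helper name, the log path is arbitrary, and the commented example assumes that fwts accepts "-r stdout" to send its results log to standard output.

```shell
#!/bin/sh
# Hypothetical helper: run a command and append each output line to a log,
# calling sync after every line so the log survives an abrupt reboot.
# Usage: run_and_sync LOGFILE COMMAND [ARGS...]
run_and_sync() {
    log=$1
    shift
    "$@" 2>&1 | while IFS= read -r line; do
        printf '%s\n' "$line" >> "$log"
        sync
    done
}

# Example invocation (commented out, since it reboots affected machines;
# "-r stdout" is assumed to redirect the fwts results log to stdout):
# run_and_sync "$HOME/uefirtmisc.log" sudo fwts uefirtmisc -r stdout
```

Calling sync after every line is slow, but the test produces little output, and the point is only to keep the final lines before the reboot.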

Oddly, the problem disappeared briefly during my testing. This occurred just after I removed the 4.4.0-109 kernel, and a run of fwts produced a message about it being unable to run some tests because it couldn't parse EFI data, but that a reboot should fix the problem. (Sorry, I didn't save the exact wording of the message.) I suspect this was related to EFI variable storage and garbage collection. When I rebooted, the problem reappeared.

Some further notes:

* The system is rather old (we got it in 2009).
* The system is running the latest firmware, version 6.4.0, released in 2013.
* The UEFI version is rather old (2.1.0).
* It's booted in UEFI mode (of course).
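
For anyone repeating the test on other hardware, the boot mode can be confirmed before running fwts. This is a minimal sketch: booted_in_uefi is a hypothetical helper name, and it relies on the kernel exposing /sys/firmware/efi only when the system was booted through UEFI.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given directory exists. It defaults to
# /sys/firmware/efi, which is present only on a UEFI-booted system.
booted_in_uefi() {
    [ -d "${1:-/sys/firmware/efi}" ]
}

if booted_in_uefi; then
    echo "UEFI boot detected"
else
    echo "No /sys/firmware/efi: legacy BIOS boot or EFI runtime disabled"
fi
```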

I'm going to try re-creating this problem with a Dell T110 of similar vintage. For the moment, I'll leave lucuma, the system with the problem, in its current Ubuntu 14.04 configuration. I can run further tests as directed. As this looks like a kernel bug, or possibly a bug in fwts, I'll mark this as "invalid" for plainbox-provider-checkbox.

Changed in plainbox-provider-checkbox:
status: New → Invalid
summary: - miscellanea/fwts_test causes reboot
+ "fwts uefirtmisc" causes reboot in recent kernels
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Rod Smith (rodsmith) wrote :

I tried running apport-collect, as requested by the Ubuntu Kernel Bot, but it didn't work; it tried to open a (text-based) browser, but that hung on "Opening socket....", presumably because the system in question is behind strict firewalls in 1SS.

Changed in linux (Ubuntu):
status: Triaged → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Rod Smith (rodsmith) wrote :

I was unable to reproduce this problem on the Dell T110; however, I did notice that it failed the test at the point where the T710 rebooted. ("Stress test for UEFI miscellaneous runtime service")

Rod Smith (rodsmith) wrote :

This bug also affects hogplum, a Dell PowerEdge T610, running Ubuntu 14.04 with a 3.13.0-140 kernel. Here's the uname data from it:

$ uname -a
Linux hogplum 3.13.0-140-generic #189-Ubuntu SMP Mon Jan 15 16:06:29 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Running "sudo fwts uefirtmisc" causes a reboot with "33.3%" shown as the percent complete, just as with the Dell T710.

Rod Smith (rodsmith) wrote :

A further update: This problem does NOT occur under Artful on the Dell PowerEdge T610 with the following kernel:

$ uname -a
Linux hogplum 4.13.0-30-generic #33-Ubuntu SMP Mon Jan 15 19:45:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I have not yet tested with Xenial or with other kernel series on the PowerEdge T710.
