linux-hwe 4.15.0-133.137~16.04.1 ADT test failure with linux-hwe ubuntu_qrt_kernel_panic

Bug #1913188 reported by Kelsey Steele
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QA Regression Testing
New
Undecided
Unassigned
ubuntu-kernel-tests
New
Undecided
Unassigned
linux-hwe (Ubuntu)
New
Undecided
Unassigned

Bug Description

Happened at least once before on 4.15.0-126.129~16.04.1

Testing failed on:
    i386: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-xenial/xenial/i386/l/linux-hwe/20210123_133436_7de7b@/log.gz

09:19:41 INFO | Test suite HEAD SHA1: 7ccda2e
09:19:43 INFO | GOOD ubuntu_qrt_kernel_panic.setup ubuntu_qrt_kernel_panic.setup timestamp=1611393583 localtime=Jan 23 09:19:43 completed successfully
09:19:43 INFO | END GOOD ubuntu_qrt_kernel_panic.setup ubuntu_qrt_kernel_panic.setup timestamp=1611393583 localtime=Jan 23 09:19:43
09:19:43 DEBUG| Persistent state client._record_indent now set to 1
09:19:43 DEBUG| Persistent state client.unexpected_reboot deleted
09:19:43 DEBUG| Test has timeout: 1200 sec.
09:19:43 INFO | START ubuntu_qrt_kernel_panic.test-kernel-panic.py ubuntu_qrt_kernel_panic.test-kernel-panic.py timestamp=1611393583 timeout=1200 localtime=Jan 23 09:19:43
09:19:43 DEBUG| Persistent state client._record_indent now set to 2
09:19:43 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_qrt_kernel_panic.test-kernel-panic.py', 'ubuntu_qrt_kernel_panic.test-kernel-panic.py')
09:19:43 DEBUG| Waiting for pid 4648 for 1200 seconds
09:19:43 DEBUG| Running 'python2 ./test-kernel-panic.py -v'
09:19:43 DEBUG| [stdout] Running test: './test-kernel-panic.py' distro: 'Ubuntu 16.04' kernel: '4.15.0-133.137~16.04.1 (Ubuntu 4.15.0-133.137~16.04.1-generic 4.15.18)' arch: 'i386' uid: 0/0 SUDO_USER: 'ubuntu')
09:19:43 ERROR| [stderr] test_cve_2013_2094 (__main__.KernelPanicTest)
09:19:43 DEBUG| [stdout] Skipping private tests
09:19:43 ERROR| [stderr] test linux kernel perf out-of-bounds access (CVE-2013-2094) ... (skipped: test works on amd64 only) ok
09:19:43 ERROR| [stderr] test_cve_2015_7550 (__main__.KernelPanicTest)
09:19:43 DEBUG| [stdout]
09:39:45 INFO | Timer expired (1200 sec.), nuking pid 4648
09:39:46 INFO | ERROR ubuntu_qrt_kernel_panic.test-kernel-panic.py ubuntu_qrt_kernel_panic.test-kernel-panic.py timestamp=1611394786 localtime=Jan 23 09:39:46 Test timeout expired, rc=15
09:39:46 INFO | END ERROR ubuntu_qrt_kernel_panic.test-kernel-panic.py ubuntu_qrt_kernel_panic.test-kernel-panic.py timestamp=1611394786 localtime=Jan 23 09:39:46
09:39:46 DEBUG| Persistent state client._record_indent now set to 1
09:39:46 DEBUG| Persistent state client.unexpected_reboot deleted
09:39:46 INFO | END GOOD ---- ---- timestamp=1611394786 localtime=Jan 23 09:39:46

tags: added: kernel-adt-failure
tags: added: 4.15 kqa-blocker sru-20200104 ubuntu-qrt-kernel-panic xenial
description: updated
summary: - linux-hwe 4.15.0-133.137~16.04.1 ADT test failure with linux-hwe
- 4.15.0-133.137~16.04.1
+ linux-hwe 4.15.0-133.137~16.04.1 ADT test failure with linux-hwe ubuntu-
+ qrt-kernel-panic i386
summary: - linux-hwe 4.15.0-133.137~16.04.1 ADT test failure with linux-hwe ubuntu-
- qrt-kernel-panic i386
+ linux-hwe 4.15.0-133.137~16.04.1 ADT test failure with linux-hwe
+ ubuntu_qrt_kernel_panic i386
Revision history for this message
Kelsey Steele (kelsey-steele) wrote : Re: linux-hwe 4.15.0-133.137~16.04.1 ADT test failure with linux-hwe ubuntu_qrt_kernel_panic i386
Revision history for this message
Kelsey Steele (kelsey-steele) wrote :

 Focal/hwe-5.4 above should be Bionic/hwe-5.4

summary: linux-hwe 4.15.0-133.137~16.04.1 ADT test failure with linux-hwe
- ubuntu_qrt_kernel_panic i386
+ ubuntu_qrt_kernel_panic
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

This testcase hasn't been changed in several months, so we can discard any recent testcase change causing the timeout.

However, looking at the logs for focal/linux-hwe-5.8 runs on arm64 there are some interesting data.

The last time the testcase was successful was with version 5.8.0-36.40~20.04.1:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/arm64/l/linux-hwe-5.8/20210108_133824_09617@/log.gz

Looking at the timestamps the test took a little bit over 19 min to finish.

Starting from version 5.8.0-37.42~20.04.1 it stated to fail:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal/focal/arm64/l/linux-hwe-5.8/20210113_031753_cd06b@/log.gz

The timeout reached was 1200 sec, i.e. 20 min.

This test has been running for a little under the timeout limit for the slower architectures, and at least for hwe-5.8 the changes introduced after -36 could not explain any change on the execution time of this testcase.

Therefore, I believe these failures have been caused by variations on the test systems and the testcase timeout should be adapted for expected longer runs.

Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

I have assigned this bug to qa-regression-testing and ubuntu-kernel-tests projects.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

I have the test timeout bumped to 30min:
https://kernel.ubuntu.com/git/ubuntu/autotest-client-tests.git/commit/?id=805c4ff2be97961a2838c27c2e710a1df0fc1159

ADT test for 4.15.0-133.137~16.04.1 i386 retriggered, let's see how it goes.

Revision history for this message
Steve Beattie (sbeattie) wrote :

Hi,

The particular test that is timing out actually is supposed to emit periodic output to stdout; it basically is running the test program from kernel commit b4a1b4f5047e4f54e194681125c74c0aa64d637d 100000 times and attempts to emit a count every 1000 iterations, writing to and flushing stdout.

The test itself spins up four threads that do keyctl_read() and keyctl_revoke(), to try to trigger a race condition in the kernel.

Is the test timeout a strict timeout, or is it not seeing forward progress via output to stdout?
I know in the kernel autotests stdout gets buffered in some way, so I could adjust the script to emit the counter to stderr instead, if that would help indicate that forward progress is being made.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Hello Steve,

this timeout I believe is not a strict limitation. I am not sure if I understand your "or is it not seeing forward progress via output to stdout?" correctly, this timeout is implemented in the autotest framework, if the test didn't finish in time, it will be killed and emit this timeout error.
Thanks

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

BTW in this ADT test re-run for 4.15.0-133.137~16.04.1 i386, the test passed with about 10min (if the stdout is correct)

05:42:26 INFO | START ubuntu_qrt_kernel_panic.test-kernel-panic.py ubuntu_qrt_kernel_panic.test-kernel-panic.py timestamp=1611726146 timeout=1800 localtime=Jan 27 05:42:26
....
05:49:23 INFO | GOOD ubuntu_qrt_kernel_panic.test-kernel-panic.py ubuntu_qrt_kernel_panic.test-kernel-panic.py timestamp=1611726563 localtime=Jan 27 05:49:23 completed successfully

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-xenial/xenial/i386/l/linux-hwe/20210127_065050_dfa30@/log.gz

tags: added: bionic sru-20211108
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.