dev test from ubuntu_stress_smoke_tests hangs with T-3.13 on some AWS instances
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ubuntu-kernel-tests | New | Undecided | Unassigned |
Bug Description
Issue found on 3.13.0-190 with stress-ng commit 8a8add4: the dev test hangs on the following AWS cloud instance types:
* c5n.large
* i3.metal
* i3en.24xlarge
* m5a.large
* r5.large
* r5.metal
* t3.medium
* t3a.2xlarge
Note that the test was skipped on the following instance types because they are too old:
* c3.xlarge
* c4.large
* t2.small
* x1e.xlarge
This looks like a test-case issue: the test passes with stress-ng V0.13.07.
Test output:
$ time sudo ./stress-ng -v -t 5 --dev 4 --dev-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable
stress-ng: debug: [2615] stress-ng 0.13.11 gde14c6695830
stress-ng: debug: [2615] system: Linux ip-172-31-14-180 3.13.0-190-generic #241-Ubuntu SMP Tue May 31 12:06:16 UTC 2022 x86_64
stress-ng: debug: [2615] RAM total: 15.4G, RAM free: 15.1G, swap free: 0.0
stress-ng: debug: [2615] 2 processors online, 2 processors configured
stress-ng: info: [2615] setting to a 5 second run per stressor
stress-ng: info: [2615] dispatching hogs: 4 dev
stress-ng: debug: [2615] cache allocate: shared cache buffer size: 33792K
stress-ng: debug: [2615] starting stressors
stress-ng: debug: [2616] stress-ng-dev: started [2616] (instance 0)
stress-ng: debug: [2615] 4 stressors started
stress-ng: debug: [2617] stress-ng-dev: started [2617] (instance 1)
stress-ng: debug: [2619] stress-ng-dev: started [2619] (instance 3)
stress-ng: debug: [2618] stress-ng-dev: started [2618] (instance 2)
stress-ng: debug: [2617] stress-ng-dev: exited [2617] (instance 1)
stress-ng: debug: [2619] stress-ng-dev: exited [2619] (instance 3)
stress-ng: debug: [2618] stress-ng-dev: exited [2618] (instance 2)
(test hangs here, system is still responsive)
syslog:
stress-ng: info: [1379] stress-ng-dev: 15 of 64 devices opened and exercised
kernel: [ 1324.186794] INFO: task stress-ng:1392 blocked for more than 120 seconds.
kernel: [ 1324.189937] Not tainted 3.13.0-190-generic #241-Ubuntu
kernel: [ 1324.192595] "echo 0 > /proc/sys/
kernel: [ 1324.196298] stress-ng D ffff88042d013b80 0 1392 1380 0x00000004
kernel: [ 1324.196303] ffff880414063cf0 0000000000000086 ffff88041719c800 0000000000013b80
kernel: [ 1324.196305] ffff880414063fd8 0000000000013b80 ffff88041719c800 ffff8804119e8c28
kernel: [ 1324.196307] ffffffff00000002 ffff8804119e8c30 ffff88041719c800 7fffffffffffffff
kernel: [ 1324.196309] Call Trace:
kernel: [ 1324.196316] [<ffffffff81740
kernel: [ 1324.196318] [<ffffffff8173f
kernel: [ 1324.196323] [<ffffffff81323
kernel: [ 1324.196325] [<ffffffff81744
kernel: [ 1324.196330] [<ffffffff81465
kernel: [ 1324.196333] [<ffffffff8145e
kernel: [ 1324.196337] [<ffffffff811dc
kernel: [ 1324.196339] [<ffffffff811dc
kernel: [ 1324.196342] [<ffffffff8174d
kernel: [ 1324.196344] INFO: task stress-ng:1393 blocked for more than 120 seconds.
kernel: [ 1324.199457] Not tainted 3.13.0-190-generic #241-Ubuntu
kernel: [ 1324.202194] "echo 0 > /proc/sys/
kernel: [ 1324.205856] stress-ng D ffff880411c7e000 0 1393 1380 0x00000004
kernel: [ 1324.205859] ffff880412bd9cf0 0000000000000086 ffff880411c7e000 0000000000013b80
kernel: [ 1324.205862] ffff880412bd9fd8 0000000000013b80 ffff880411c7e000 ffff8804119e8c28
kernel: [ 1324.205866] ffffffff00000002 ffff8804119e8c30 ffff880411c7e000 7fffffffffffffff
kernel: [ 1324.205868] Call Trace:
kernel: [ 1324.205874] [<ffffffff81740
kernel: [ 1324.205877] [<ffffffff8173f
kernel: [ 1324.205881] [<ffffffff81323
kernel: [ 1324.205885] [<ffffffff81744
kernel: [ 1324.205895] [<ffffffff81465
kernel: [ 1324.205899] [<ffffffff8145e
kernel: [ 1324.205904] [<ffffffff811ce
kernel: [ 1324.205908] [<ffffffff811dc
kernel: [ 1324.205911] [<ffffffff811ce
kernel: [ 1324.205914] [<ffffffff811dc
kernel: [ 1324.205918] [<ffffffff8174d
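To pull the blocked task names and PIDs out of a log like the one above, a small filter can help. This is a minimal sketch: `blocked_tasks` is a hypothetical helper name introduced here, and it assumes the hung-task message format shown in this report.

```shell
#!/bin/sh
# blocked_tasks: hypothetical helper (not part of stress-ng or the test suite).
# Reads kernel log lines on stdin and prints "<task name> <pid>" for every
# hung-task warning of the form seen in this report:
#   INFO: task stress-ng:1392 blocked for more than 120 seconds.
blocked_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than.*/\1 \2/p'
}
```

It can be fed from the kernel log, for example `dmesg | blocked_tasks | sort -u`. For the two warnings in the syslog excerpt above it prints `stress-ng 1392` and `stress-ng 1393`.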
tags: added: 3.13 amd64 sru-20220509 trusty ubuntu-stress-smoke-test
Comment copied from Colin's comment in the upstream stress-ng project [1]:
This appears to be a kernel issue triggered by the stress-ng dev stressor on a tty device. One can narrow this down by working through the devices one at a time, specifying each with --dev-file, for example: stress-ng --dev 0 --dev-file /dev/tty0
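Working through the devices one by one could be scripted roughly as follows. This is only a sketch under assumptions: the `STRESS_NG` and `PROBE_TIMEOUT` variables and the `probe_device` function are names made up for illustration, the stress-ng binary is assumed to sit in the current directory as in the test output above, and the `timeout` wrapper comes from GNU coreutils rather than from stress-ng.

```shell
#!/bin/sh
# Hypothetical per-device probe loop (a sketch, not part of stress-ng).
# Usage, as root on a disposable instance: sh probe-dev.sh /dev/tty*
# STRESS_NG and PROBE_TIMEOUT are illustrative variables introduced here.
STRESS_NG="${STRESS_NG:-./stress-ng}"
PROBE_TIMEOUT="${PROBE_TIMEOUT:-30}"

probe_device() {
    dev="$1"
    rc=0
    # Run one dev stressor against a single device file for 5 seconds.
    # timeout(1) sends TERM after PROBE_TIMEOUT seconds and KILL 5 seconds
    # later if the command is still running.
    timeout -k 5 "$PROBE_TIMEOUT" \
        "$STRESS_NG" --dev 1 --dev-file "$dev" -t 5 >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "ok $dev"
    else
        echo "suspect $dev (exit status $rc)"
    fi
}

# Probe every device path given on the command line, one at a time.
for dev in "$@"; do
    probe_device "$dev"
done
```

A device reported as `suspect` is only a candidate for the hang. A worker stuck in uninterruptible sleep cannot be killed, so a stuck process may be left behind after the wrapper gives up, which is why a disposable instance is assumed here.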
Judging by the commit, it could be due to the enablement of the TIOCGETD ioctl() call. The commit has the following change:
/*
* On some older 3.13 kernels this can lock up, need to add
* a method to detect and skip this somehow. For the moment
* disable this stress test.
*/
-#if defined(TIOCGETD) && 0
+#if defined(TIOCGETD)
{
{
int ldis;
@@ -705,9 +779,13 @@ static void stress_dev_tty(
ret = ioctl(fd, TIOCSETD, &ldis);
if (ret == 0) {
}
+#else
+ UNEXPECTED
#endif
This seems to match up with the ldisc issue shown in the kernel trace. So I suspect there is a kernel fix for this that needs to be backported as this does not occur with newer kernels.
[1] https://github.com/ColinIanKing/stress-ng/issues/202