linux-hwe-5.4 ADT test failure (ubuntu_stress_smoke_test) with linux-hwe-5.4/5.4.0-100.113~18.04.1

Bug #1961076 reported by Kleber Sacilotto de Souza
This bug affects 1 person
Affects                     Status         Importance   Assigned to      Milestone
Stress-ng                   Fix Released   Medium       Colin Ian King
ubuntu-kernel-tests         New            Undecided    Unassigned
linux-hwe-5.4 (Ubuntu)      New            Undecided    Unassigned
  Bionic                    New            Undecided    Unassigned

Bug Description

The 'dev-shm' stress-ng test is failing with bionic/linux-hwe-5.4 5.4.0-100.113~18.04.1 on ADT, only on ppc64el.

Testing failed on:
    ppc64el: https://autopkgtest.ubuntu.com/results/autopkgtest-bionic/bionic/ppc64el/l/linux-hwe-5.4/20220216_115416_c1d6c@/log.gz

11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] stress-ng 0.13.11 g48be8ff4ffc4
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] system: Linux autopkgtest 5.4.0-100-generic #113~18.04.1-Ubuntu SMP Mon Feb 7 15:02:55 UTC 2022 ppc64le
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] RAM total: 7.9G, RAM free: 3.3G, swap free: 1023.9M
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] 4 processors online, 4 processors configured
11:35:08 DEBUG| [stdout] stress-ng: info: [26897] setting to a 5 second run per stressor
11:35:08 DEBUG| [stdout] stress-ng: info: [26897] dispatching hogs: 4 dev-shm
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] cache allocate: using cache maximum level L1
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] cache allocate: shared cache buffer size: 32K
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] starting stressors
11:35:08 DEBUG| [stdout] stress-ng: debug: [26899] stress-ng-dev-shm: started [26899] (instance 0)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26900] stress-ng-dev-shm: started [26900] (instance 1)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26901] stress-ng-dev-shm: started [26901] (instance 2)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26902] stress-ng-dev-shm: started [26902] (instance 3)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] 4 stressors started
11:35:08 DEBUG| [stdout] stress-ng: debug: [26899] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 0)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26902] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 3)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26901] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 2)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26900] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 1)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26899] (stress-ng-dev-shm) terminated on signal: 9 (Killed)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26899] (stress-ng-dev-shm) was killed by the OOM killer
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26899] terminated
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26900] (stress-ng-dev-shm) terminated on signal: 9 (Killed)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26900] (stress-ng-dev-shm) was possibly killed by the OOM killer
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26900] terminated
11:35:08 DEBUG| [stdout] stress-ng: debug: [26901] stress-ng-dev-shm: exited [26901] (instance 2)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26901] terminated
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26902] (stress-ng-dev-shm) terminated on signal: 9 (Killed)
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26902] (stress-ng-dev-shm) was killed by the OOM killer
11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26902] terminated
11:35:08 DEBUG| [stdout] stress-ng: info: [26897] successful run completed in 5.06s
11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 0 corrupted bogo-ops counter, 14 vs 0
11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 0 hash error in bogo-ops counter and run flag, 2146579844 vs 0
11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 1 corrupted bogo-ops counter, 13 vs 0
11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 1 hash error in bogo-ops counter and run flag, 1093487894 vs 0
11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 3 corrupted bogo-ops counter, 13 vs 0
11:35:08 DEBUG| [stdout] info: 5 failures reached, aborting stress process
11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 3 hash error in bogo-ops counter and run flag, 1093487894 vs 0
11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] metrics-check: stressor metrics corrupted, data is compromised
11:35:08 DEBUG| [stdout]
11:35:08 DEBUG| [stdout] [ 3840.094607] stress-ng invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=1000
11:35:08 DEBUG| [stdout] [ 3840.094611] CPU: 2 PID: 26903 Comm: stress-ng Tainted: P OE 5.4.0-100-generic #113~18.04.1-Ubuntu
11:35:08 DEBUG| [stdout] [ 3840.094612] Call Trace:
11:35:08 DEBUG| [stdout] [ 3840.094618] [c00000011edd77d0] [c000000000f05e28] dump_stack+0xbc/0x104 (unreliable)
11:35:08 DEBUG| [stdout] [ 3840.094623] [c00000011edd7810] [c0000000003863fc] dump_header+0x5c/0x2c0
11:35:08 DEBUG| [stdout] [ 3840.094625] [c00000011edd78a0] [c000000000386c1c] oom_kill_process+0x1ac/0x2a0
11:35:08 DEBUG| [stdout] [ 3840.094626] [c00000011edd78e0] [c000000000387d78] out_of_memory+0x128/0x750
11:35:08 DEBUG| [stdout] [ 3840.094630] [c00000011edd7980] [c00000000046a948] mem_cgroup_out_of_memory+0x118/0x150
11:35:08 DEBUG| [stdout] [ 3840.094632] [c00000011edd7a00] [c000000000470ab4] try_charge+0xa04/0xac0
11:35:08 DEBUG| [stdout] [ 3840.094634] [c00000011edd7b00] [c00000000047477c] mem_cgroup_try_charge+0xdc/0x330
11:35:08 DEBUG| [stdout] [ 3840.094635] [c00000011edd7b50] [c000000000474a0c] mem_cgroup_try_charge_delay+0x3c/0x80
11:35:08 DEBUG| [stdout] [ 3840.094637] [c00000011edd7b90] [c0000000003aa748] shmem_getpage_gfp+0x218/0xd00
11:35:08 DEBUG| [stdout] [ 3840.094639] [c00000011edd7c90] [c0000000003ad268] shmem_fallocate+0x348/0x610
11:35:08 DEBUG| [stdout] [ 3840.094641] [c00000011edd7d60] [c00000000048c994] vfs_fallocate+0x174/0x330
11:35:08 DEBUG| [stdout] [ 3840.094642] [c00000011edd7db0] [c00000000048e428] ksys_fallocate+0x68/0xf0
11:35:08 DEBUG| [stdout] [ 3840.094643] [c00000011edd7e00] [c00000000048e4d8] sys_fallocate+0x28/0x40
11:35:08 DEBUG| [stdout] [ 3840.094646] [c00000011edd7e20] [c00000000000b378] system_call+0x5c/0x68
11:35:08 DEBUG| [stdout] [ 3840.094649] --- interrupt: c01 at 0x71d6d5855ab0
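
The "corrupted bogo-ops counter" and "hash error" lines appear consistent with a scheme where the parent cross-checks each instance's counter against a copy and a hash that the child only writes when it exits normally. An instance SIGKILLed by the OOM killer would leave both at 0, while instance 2, which exited normally, reports no errors. The sketch below is a hypothetical illustration of that kind of check, not stress-ng's actual code; the struct, field and function names (`instance_stats_t`, `stats_finalize`, `stats_check`) and the choice of FNV-1a as the hash are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-instance record living in memory shared with the parent.
 * Field names are illustrative only, not stress-ng's. */
typedef struct {
    uint64_t counter;      /* bogo-ops, bumped while the stressor runs */
    uint64_t counter_copy; /* copy of counter, written only on a clean exit */
    uint32_t hash;         /* hash of counter and run flag, written on a clean exit */
    bool run_ok;           /* set only on a clean exit */
} instance_stats_t;

/* FNV-1a over the counter bytes plus the run flag (an arbitrary choice for this sketch). */
static uint32_t stats_hash(uint64_t counter, bool run_ok)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < 8; i++) {
        h ^= (uint8_t)(counter >> (8 * i));
        h *= 16777619u;
    }
    h ^= run_ok ? 1u : 0u;
    h *= 16777619u;
    return h;
}

/* Child side: called just before a normal exit. A child killed by
 * SIGKILL (for example by the OOM killer) never gets here. */
static void stats_finalize(instance_stats_t *s)
{
    s->run_ok = true;
    s->counter_copy = s->counter;
    s->hash = stats_hash(s->counter, s->run_ok);
}

/* Parent side: called after reaping the child; returns the number of errors found. */
static int stats_check(const instance_stats_t *s, int instance)
{
    int errors = 0;
    uint32_t expected = stats_hash(s->counter, s->run_ok);

    if (s->counter != s->counter_copy) {
        printf("instance %d corrupted bogo-ops counter, %llu vs %llu\n", instance,
               (unsigned long long)s->counter, (unsigned long long)s->counter_copy);
        errors++;
    }
    if (expected != s->hash) {
        printf("instance %d hash error in bogo-ops counter and run flag, %u vs %u\n",
               instance, (unsigned)expected, (unsigned)s->hash);
        errors++;
    }
    return errors;
}
```

With `counter = 14`, calling `stats_finalize()` before `stats_check()` reports no errors. Skipping `stats_finalize()`, as a SIGKILLed child would, reports both a counter mismatch and a hash mismatch against 0, which resembles the failures logged for instances 0, 1 and 3.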

This doesn't seem to be a kernel regression: the test had already failed with 5.4.0-94.106~18.04.1 (https://autopkgtest.ubuntu.com/results/autopkgtest-bionic/bionic/ppc64el/l/linux-hwe-5.4/20220108_161257_2ca9f@/log.gz) but passed with versions -97 and -99, and it doesn't fail on the regression-testing infrastructure. The focal/linux main kernel also had no failures.

I have checked the stress-ng repo and didn't find any recent change to this test case.

tags: added: kernel-adt-failure
summary: - linux-hwe-5.4 ADT test failure with linux-hwe-5.4/5.4.0-100.113~18.04.1
+ linux-hwe-5.4 ADT test failure (ubuntu_stress_smoke_test) with linux-hwe-5.4/5.4.0-100.113~18.04.1
no longer affects: stress-ng (Ubuntu)
no longer affects: stress-ng (Ubuntu Bionic)
description: updated
Kleber Sacilotto de Souza (kleber-souza) wrote :

Found again with linux-meta-hwe-5.4/5.4.0.132.148~18.04.109, only on ppc64el:

https://autopkgtest.ubuntu.com/results/autopkgtest-bionic/bionic/ppc64el/l/linux-hwe-5.4/20221105_134608_3ac96@/log.gz

Colin Ian King (colin-king) wrote :

I'll add an OOM check wrapper around this stressor so it can detect when it gets OOM-killed and handle the error sanely.
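
A minimal sketch of what such a wrapper could look like, assuming a fork()/waitpid() design in which any child that dies from SIGKILL is treated as probably OOM-killed and restarted a bounded number of times. This is not the actual stress-ng change; `run_oomable_child`, `stressor_fn`, the `OOM_WRAP_*` results and the `demo_*` stressors are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stressor body: gets the attempt number, returns an exit code. */
typedef int (*stressor_fn)(int attempt);

/* Results of the hypothetical wrapper. */
enum { OOM_WRAP_OK = 0, OOM_WRAP_FAILED = 1, OOM_WRAP_GAVE_UP = 2 };

/*
 * Run fn in a forked child. A child that dies from SIGKILL is assumed to have
 * been OOM-killed (an assumption: SIGKILL can also come from elsewhere) and is
 * restarted, at most max_restarts times. *restarts reports how many restarts
 * were needed.
 */
static int run_oomable_child(stressor_fn fn, int max_restarts, int *restarts)
{
    *restarts = 0;
    for (;;) {
        pid_t pid = fork();
        if (pid < 0)
            return OOM_WRAP_FAILED;
        if (pid == 0)
            _exit(fn(*restarts));   /* child: run the stressor, never return */

        int status;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)
                return OOM_WRAP_FAILED;
        }

        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) {
            if (*restarts >= max_restarts)
                return OOM_WRAP_GAVE_UP;   /* let the caller report it cleanly */
            (*restarts)++;
            fprintf(stderr, "child %d killed by SIGKILL, assuming OOM, restarting\n",
                    (int)pid);
            continue;
        }
        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? OOM_WRAP_OK
                                                               : OOM_WRAP_FAILED;
    }
}

/* Demo stressors for the sketch: one is killed on its first attempt only,
 * the other on every attempt. */
static int demo_killed_once(int attempt)
{
    if (attempt == 0)
        raise(SIGKILL);
    return 0;
}

static int demo_always_killed(int attempt)
{
    (void)attempt;
    raise(SIGKILL);
    return 0;
}
```

With `max_restarts = 3`, `demo_killed_once` is restarted once and then succeeds. `demo_always_killed` ends with `OOM_WRAP_GAVE_UP` once the restarts are used up, so the caller can report the OOM condition explicitly instead of checking counters left behind by a killed instance.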

Colin Ian King (colin-king) wrote :
Changed in stress-ng:
status: New → Fix Released
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → Medium