bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM

Bug #1863989 reported by Po-Hsu Lin on 2020-02-20
This bug affects 1 person
Affects              Status        Importance  Assigned to     Milestone
Stress-ng            Fix Released  High        Colin Ian King
ubuntu-kernel-tests  Fix Released  Undecided   Colin Ian King
linux (Ubuntu)       Incomplete    Undecided   Unassigned

Bug Description

Issue found on Eoan zVM node kernel03

The test run hung at the bad-altstack test.

Reproduction rate: 4 out of 4 attempts

02:36:12 DEBUG| [stdout] aiol STARTING
02:36:17 DEBUG| [stdout] aiol RETURNED 0
02:36:17 DEBUG| [stdout] aiol PASSED
02:36:17 DEBUG| [stdout] bad-altstack STARTING
+ ARCHIVE=/var/lib/jenkins/jobs/smoke__E_s390x.zVM-generic__using_kernel03__for_kernel/builds/3/archive
+ scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -r ubuntu@kernel03:kernel-test-results /var/lib/jenkins/jobs/smoke__E_s390x.zVM-generic__using_kernel03__for_kernel/builds/3/archive

dmesg only shows:
[ 102.352136] Adding 1048572k swap on /home/ubuntu/autotest/client/tmp/ubuntu_stress_smoke_test/src/stress-ng/swap.img. Priority:-3 extents:95 across:26763272k SSFS
[ 122.402895] NET: Registered protocol family 38

This looks like it is caused by an OOM condition: the x3270 console was flooded with OOM error messages.

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: linux-image-5.3.0-41-generic 5.3.0-41.33
ProcVersionSignature: Ubuntu 5.3.0-41.33-generic 5.3.18
Uname: Linux 5.3.0-41-generic s390x
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.11-0ubuntu8.4
Architecture: s390x
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Thu Feb 20 06:06:04 2020
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: root=/dev/mapper/kl03vg01-kl03root crashkernel=196M BOOT_IMAGE=0
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-41-generic N/A
 linux-backports-modules-5.3.0-41-generic N/A
 linux-firmware 1.183.4
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to eoan on 2019-09-30 (142 days ago)

Po-Hsu Lin (cypressyew) wrote :

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1863989

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Po-Hsu Lin (cypressyew) wrote :

The following appears later in the dmesg output:

[ 1374.068457] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=gdb,pid=11199,uid=0
[ 1374.068467] Out of memory (oom_kill_allocating_task): Killed process 11199 (gdb) total-vm:26508kB, anon-rss:3872kB, file-rss:1740kB, shmem-rss:0kB
[ 1374.070884] oom_reaper: reaped process 11199 (gdb), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 1377.008811] gdb invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ 1377.008816] CPU: 4 PID: 10868 Comm: gdb Tainted: P O 5.3.0-41-generic #33-Ubuntu
[ 1377.008817] Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
[ 1377.008817] Call Trace:
[ 1377.008826] ([<00000000b70db9ae>] show_stack+0x8e/0xd0)
[ 1377.008831] [<00000000b7958e0a>] dump_stack+0x8a/0xb8
[ 1377.008836] [<00000000b72cf262>] dump_header+0x62/0x250
[ 1377.008838] [<00000000b72ce332>] oom_kill_process+0x172/0x178
[ 1377.008839] [<00000000b72ce48e>] out_of_memory.part.0+0x156/0x4e0
[ 1377.008841] [<00000000b72cf0fe>] out_of_memory+0x6e/0xf8
[ 1377.008844] [<00000000b732c33a>] __alloc_pages_slowpath+0xda2/0xeb0
[ 1377.008846] [<00000000b732c6ee>] __alloc_pages_nodemask+0x2a6/0x318
[ 1377.008847] [<00000000b72c7736>] pagecache_get_page+0xde/0x2f8
[ 1377.008848] [<00000000b72c933c>] filemap_fault+0x4bc/0xa40
[ 1377.008852] [<00000000b746f87c>] ext4_filemap_fault+0x4c/0x70
[ 1377.008854] [<00000000b7307410>] __do_fault+0x50/0xe8
[ 1377.008855] [<00000000b730c4ce>] do_fault+0x266/0x560
[ 1377.008856] [<00000000b730d214>] __handle_mm_fault+0x76c/0x910
[ 1377.008857] [<00000000b730d47e>] handle_mm_fault+0xc6/0x1a0
[ 1377.008859] [<00000000b70ec4a4>] do_exception+0x12c/0x3e0
[ 1377.008860] [<00000000b70ed30a>] do_dat_exception+0x2a/0x58
[ 1377.008864] [<00000000b7979af8>] pgm_check_handler+0x1cc/0x220
[ 1377.008865] Mem-Info:
[ 1377.008869] active_anon:486067 inactive_anon:121575 isolated_anon:0
                active_file:160 inactive_file:115 isolated_file:0
                unevictable:4209 dirty:0 writeback:0 unstable:0
                slab_reclaimable:16367 slab_unreclaimable:55796
                mapped:1794 shmem:0 pagetables:30144 bounce:0
                free:13635 free_pcp:1380 free_cma:207
[ 1377.008872] Node 0 active_anon:1944268kB inactive_anon:486300kB active_file:640kB inactive_file:460kB unevictable:16836kB isolated(anon):0kB isolated(file):0kB mapped:7176kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1377.008873] Node 0 DMA free:44452kB min:39184kB low:42196kB high:45208kB active_anon:1196556kB inactive_anon:307972kB active_file:680kB inactive_file:884kB unevictable:0kB writepending:0kB present:2097152kB managed:2097056kB mlocked:0kB kernel_stack:8960kB pagetables:69696kB bounce:0kB free_pcp:3940kB local_pcp:84kB free_cma:0kB
[ 1377.008877] lowmem_reserve[]: 0 1751 1751
[ 1377.008879] Node 0 Normal free:10088kB min:10464kB low:13080kB high:15696kB active_anon:747712kB inactive_anon:178328kB active_file:376kB inactive_file:280kB unevictable:16836kB writepending:0kB present:2...
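
The Mem-Info block above reports counts in pages, while the per-node lines report kB. A minimal sketch for cross-checking the two, assuming a 4 KiB page size (the `PAGE_KB` constant and the `pages_to_kb` helper are illustrative names, not part of any tool used in this report):

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages on this s390x kernel

# Page counts copied from the Mem-Info block in the dmesg output above.
mem_info = """
active_anon:486067 inactive_anon:121575 isolated_anon:0
active_file:160 inactive_file:115 isolated_file:0
unevictable:4209 dirty:0 writeback:0 unstable:0
free:13635 free_pcp:1380 free_cma:207
"""

def pages_to_kb(text, page_kb=PAGE_KB):
    """Parse 'name:count' pairs and convert page counts to kB."""
    return {name: int(count) * page_kb
            for name, count in re.findall(r"(\w+):(\d+)", text)}

kb = pages_to_kb(mem_info)
for name in ("active_anon", "inactive_anon", "unevictable", "free"):
    print(f"{name}: {kb[name]} kB")
```

Under that assumption the converted values agree with the kB figures in the log: active_anon comes to 1944268 kB and inactive_anon to 486300 kB, as on the Node 0 line, and free comes to 54540 kB, which equals the DMA (44452 kB) plus Normal (10088 kB) free figures. That leaves roughly 53 MiB free against about 2.3 GiB of anonymous memory.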


tags: added: sru-20200217 ubuntu-stress-smoke-test
tags: added: ubuntu-stress-smoke-tests
removed: ubuntu-stress-smoke-test
tags: added: ubuntu-stress-smoke-test
removed: ubuntu-stress-smoke-tests
Po-Hsu Lin (cypressyew) wrote :

@Colin, I have imported your keys; if needed, please feel free to access this instance as ubuntu@kernel03.

Thanks.

Changed in stress-ng:
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → High
Colin Ian King (colin-king) wrote :

This is probably fixed with commit: https://kernel.ubuntu.com/git/ubuntu/autotest-client-tests.git/commit/?id=4db07fef60449c786364638d7978b239676624eb

I've run this a few times with the fix above and can't reproduce this issue.

Changed in ubuntu-kernel-tests:
status: New → Fix Committed
Changed in stress-ng:
status: New → Invalid
Changed in linux (Ubuntu):
assignee: nobody → Colin Ian King (colin-king)
assignee: Colin Ian King (colin-king) → nobody
Changed in ubuntu-kernel-tests:
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

@Sam, can you re-run the test? If it passes, I no longer need the kernel03 instance.

Po-Hsu Lin (cypressyew) wrote :

Hi Colin,
with HEAD SHA1 fcc82c0, the test hung at bad-altstack in 2 of 3 attempts (each running the whole smoke test suite).

I am now running it a fourth time.

Po-Hsu Lin (cypressyew) wrote :

The 4th attempt has passed.

Colin Ian King (colin-king) wrote :

I found this was failing on kernel03 because there was very little disk space for the test to enable a large swap file. I cleaned the machine up and was then unable to reproduce the failure. I'm assuming the tests were failing on kernel03; if not, which machine were they being run on?
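
A pre-flight check along these lines could catch the low-disk-space condition before the swap file is created. This is only a sketch: the helper names and the 512 MiB headroom are hypothetical, and how the smoke test actually sets up its swap file is not shown in this report. The swap size is the 1048572k value from the dmesg line in the bug description.

```python
import shutil

KIB = 1024

def has_room_for_swap(free_bytes, swap_kib, headroom_kib=512 * 1024):
    """Return True if free_bytes covers the swap file plus headroom.

    headroom_kib is a hypothetical safety margin, not a value from the test.
    """
    return free_bytes >= (swap_kib + headroom_kib) * KIB

def check_path(path, swap_kib):
    """Check the filesystem that holds `path` for enough free space."""
    # shutil.disk_usage returns (total, used, free) in bytes.
    return has_room_for_swap(shutil.disk_usage(path).free, swap_kib)

swap_kib = 1048572  # swap size reported by dmesg in this report
print("enough space for swap:", check_path(".", swap_kib))
```

Failing early with a clear message when this check returns False would make the cause visible in the test log instead of surfacing later as a hang and OOM kills.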

Colin Ian King (colin-king) wrote :

Can this be re-tested to see whether it still fails now that I have cleaned up kernel03?

Po-Hsu Lin (cypressyew) wrote :

Hello Colin,
with disk space freed up to 8.3G on /, this node has become more stable: bad-altstack failed once in 5 attempts.

I think we can call this fixed for now.
Thanks!

Changed in stress-ng:
status: Invalid → Fix Committed
Changed in ubuntu-kernel-tests:
status: Fix Committed → Fix Released
Changed in stress-ng:
status: Fix Committed → Fix Released