OOM by msgstress04 in ubuntu_ltp_syscalls caused network connectivity lost on openstack P8 with B-hwe-5.4

Bug #2039520 reported by Po-Hsu Lin
This bug affects 1 person
Affects               Status   Importance   Assigned to   Milestone
ubuntu-kernel-tests   New      Undecided    Unassigned

Bug Description

This is not a regression. It has been reproducible since cycle 2023.07.10 with B-hwe-5.4.0-156.173~18.04.1, when we first started running ubuntu_ltp_controllers on OpenStack instances.

This only affects P8 instances on OpenStack.

Just like bug 2039515, the instance gets disconnected while running the msgstress04 test. Test output:
04:36:36 INFO | START ubuntu_ltp_syscalls.msgstress04 ubuntu_ltp_syscalls.msgstress04 timestamp=1697430996 timeout=3600 localtime=Oct 16 04:36:36
04:36:36 DEBUG| Persistent state client._record_indent now set to 2
04:36:36 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_ltp_syscalls.msgstress04', 'ubuntu_ltp_syscalls.msgstress04')
04:36:36 DEBUG| Waiting for pid 14884 for 3600 seconds
Connection to 10.43.123.7 closed by remote host.
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(231) [Receiver=3.2.7]

TEST SYSTEM FAILURE DETECTED
A failure of the system under test has been detected.
Please review log files for a potential panic, hang or unexpected reboot

-------------------------------------------------------------------------------------------------------
R E S U L T S
-------------------------------------------------------------------------------------------------------

With a manual run you can see that this test triggers OOM and kills sshd:
Oct 17 05:25:44 10 kernel: [ 297.512158] LTP: starting msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479487] msgstress04 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Oct 17 05:25:52 10 kernel: [ 305.479493] CPU: 1 PID: 8725 Comm: msgstress04 Not tainted 5.4.0-165-generic #182~18.04.1-Ubuntu
Oct 17 05:25:52 10 kernel: [ 305.479496] Call Trace:
Oct 17 05:25:52 10 kernel: [ 305.479502] [c00000006aa2b370] [c000000000f2da68] dump_stack+0xbc/0x104 (unreliable)
Oct 17 05:25:52 10 kernel: [ 305.479506] [c00000006aa2b3b0] [c00000000038e53c] dump_header+0x5c/0x2c0
Oct 17 05:25:52 10 kernel: [ 305.479508] [c00000006aa2b440] [c00000000038ed9c] oom_kill_process+0x19c/0x2c0
Oct 17 05:25:52 10 kernel: [ 305.479510] [c00000006aa2b480] [c000000000390088] out_of_memory+0x128/0x790
Oct 17 05:25:52 10 kernel: [ 305.479512] [c00000006aa2b520] [c00000000040cb34] __alloc_pages_slowpath+0xb64/0xea0
Oct 17 05:25:52 10 kernel: [ 305.479514] [c00000006aa2b6e0] [c00000000040d188] __alloc_pages_nodemask+0x318/0x3d0
Oct 17 05:25:52 10 kernel: [ 305.479517] [c00000006aa2b760] [c000000000436bf8] alloc_pages_vma+0xb8/0x300
Oct 17 05:25:52 10 kernel: [ 305.479519] [c00000006aa2b7d0] [c0000000003dbd10] wp_page_copy+0xb0/0xf70
Oct 17 05:25:52 10 kernel: [ 305.479520] [c00000006aa2b8a0] [c0000000003dfab4] do_wp_page+0xd4/0xae0
Oct 17 05:25:52 10 kernel: [ 305.479522] [c00000006aa2b8f0] [c0000000003e36a0] __handle_mm_fault+0x11b0/0x1ae0
Oct 17 05:25:52 10 kernel: [ 305.479523] [c00000006aa2b9e0] [c0000000003e40d0] handle_mm_fault+0x100/0x1d0
Oct 17 05:25:52 10 kernel: [ 305.479525] [c00000006aa2ba20] [c00000000008b65c] __do_page_fault+0x30c/0xec0
Oct 17 05:25:52 10 kernel: [ 305.479527] [c00000006aa2baf0] [c00000000000a908] handle_page_fault+0x10/0x30
Oct 17 05:25:52 10 kernel: [ 305.479531] --- interrupt: 301 at schedule_tail+0x88/0x140
Oct 17 05:25:52 10 kernel: [ 305.479531] LR = schedule_tail+0x80/0x140
Oct 17 05:25:52 10 kernel: [ 305.479532] [c00000006aa2bdf0] [c0000000001977a4] schedule_tail+0x24/0x140 (unreliable)
Oct 17 05:25:52 10 kernel: [ 305.479534] [c00000006aa2be20] [c00000000000b69c] ret_from_fork+0x4/0x54
Oct 17 05:25:52 10 kernel: [ 305.479535] Mem-Info:
Oct 17 05:25:52 10 kernel: [ 305.479540] active_anon:52102 inactive_anon:42 isolated_anon:0
Oct 17 05:25:52 10 kernel: [ 305.479540] active_file:16 inactive_file:0 isolated_file:1
Oct 17 05:25:52 10 kernel: [ 305.479540] unevictable:0 dirty:0 writeback:0 unstable:0
Oct 17 05:25:52 10 kernel: [ 305.479540] slab_reclaimable:617 slab_unreclaimable:3576
Oct 17 05:25:52 10 kernel: [ 305.479540] mapped:0 shmem:127 pagetables:1702 bounce:0
Oct 17 05:25:52 10 kernel: [ 305.479540] free:2728 free_pcp:18 free_cma:0
Oct 17 05:25:52 10 kernel: [ 305.479543] Node 0 active_anon:3334528kB inactive_anon:2688kB active_file:1024kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):64kB mapped:0kB dirty:0kB writeback:0kB shmem:8128kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2648064kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Oct 17 05:25:52 10 kernel: [ 305.479544] Node 0 Normal free:174592kB min:180224kB low:225280kB high:270336kB active_anon:3328320kB inactive_anon:2688kB active_file:7872kB inactive_file:0kB unevictable:0kB writepending:0kB present:4194304kB managed:4144512kB mlocked:0kB kernel_stack:29104kB pagetables:108928kB bounce:0kB free_pcp:1152kB local_pcp:192kB free_cma:0kB
Oct 17 05:25:52 10 kernel: [ 305.479548] lowmem_reserve[]: 0 0 0
Oct 17 05:25:52 10 kernel: [ 305.479549] Node 0 Normal: 114*64kB (UME) 21*128kB (E) 3*256kB (UE) 26*512kB (M) 13*1024kB (UM) 1*2048kB (M) 1*4096kB (U) 16*8192kB (M) 0*16384kB = 174592kB
Oct 17 05:25:52 10 kernel: [ 305.479556] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 17 05:25:52 10 kernel: [ 305.479558] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 17 05:25:52 10 kernel: [ 305.479558] 147 total pagecache pages
Oct 17 05:25:52 10 kernel: [ 305.479560] 0 pages in swap cache
Oct 17 05:25:52 10 kernel: [ 305.479561] Swap cache stats: add 0, delete 0, find 0/0
Oct 17 05:25:52 10 kernel: [ 305.479561] Free swap = 0kB
Oct 17 05:25:52 10 kernel: [ 305.479562] Total swap = 0kB
Oct 17 05:25:52 10 kernel: [ 305.479562] 65536 pages RAM
Oct 17 05:25:52 10 kernel: [ 305.479563] 0 pages HighMem/MovableOnly
Oct 17 05:25:52 10 kernel: [ 305.479563] 778 pages reserved
Oct 17 05:25:52 10 kernel: [ 305.479564] 0 pages cma reserved
Oct 17 05:25:52 10 kernel: [ 305.479564] 0 pages hwpoisoned
Oct 17 05:25:52 10 kernel: [ 305.479566] Tasks state (memory values in pages):
Oct 17 05:25:52 10 kernel: [ 305.479566] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Oct 17 05:25:52 10 kernel: [ 305.479570] [ 398] 0 398 771 65 29696 0 0 systemd-journal
Oct 17 05:25:52 10 kernel: [ 305.479572] [ 410] 0 410 322 53 27392 0 -1000 systemd-udevd
Oct 17 05:25:52 10 kernel: [ 305.479574] [ 411] 0 411 112 22 26368 0 0 blkmapd
Oct 17 05:25:52 10 kernel: [ 305.479576] [ 414] 0 414 1266 22 30976 0 0 lvmetad
Oct 17 05:25:52 10 kernel: [ 305.479578] [ 455] 0 455 88 26 26368 0 0 rpc.idmapd
Oct 17 05:25:52 10 kernel: [ 305.479580] [ 469] 62583 469 1411 63 28160 0 0 systemd-timesyn
Oct 17 05:25:52 10 kernel: [ 305.479581] [ 470] 0 470 173 46 26880 0 0 rpcbind
Oct 17 05:25:52 10 kernel: [ 305.479583] [ 531] 0 531 147 105 30976 0 0 haveged
Oct 17 05:25:52 10 kernel: [ 305.479585] [ 883] 100 883 428 68 28160 0 0 systemd-network
Oct 17 05:25:52 10 kernel: [ 305.479587] [ 906] 101 906 276 69 32256 0 0 systemd-resolve
Oct 17 05:25:52 10 kernel: [ 305.479589] [ 940] 0 940 142 45 30720 0 0 rpc.mountd
Oct 17 05:25:52 10 kernel: [ 305.479591] [ 1072] 0 1072 266 72 32256 0 0 systemd-logind
Oct 17 05:25:52 10 kernel: [ 305.479593] [ 1073] 0 1073 3786 77 34304 0 0 accounts-daemon
Oct 17 05:25:52 10 kernel: [ 305.479594] [ 1075] 0 1075 1739 207 30464 0 0 networkd-dispat
Oct 17 05:25:52 10 kernel: [ 305.479596] [ 1079] 0 1079 166 34 27136 0 0 cron
Oct 17 05:25:52 10 kernel: [ 305.479598] [ 1080] 0 1080 101 31 26368 0 0 atd
Oct 17 05:25:52 10 kernel: [ 305.479599] [ 1084] 0 1084 2381 47 26880 0 0 lxcfs
Oct 17 05:25:52 10 kernel: [ 305.479601] [ 1085] 103 1085 186 53 31232 0 -900 dbus-daemon
Oct 17 05:25:52 10 kernel: [ 305.479603] [ 1086] 0 1086 1331 8 26112 0 0 iprdump
Oct 17 05:25:52 10 kernel: [ 305.479604] [ 1091] 0 1091 1331 42 31232 0 0 irqbalance
Oct 17 05:25:52 10 kernel: [ 305.479606] [ 1092] 102 1092 3513 52 32256 0 0 rsyslogd
Oct 17 05:25:52 10 kernel: [ 305.479608] [ 1097] 0 1097 1841 205 30976 0 0 unattended-upgr
Oct 17 05:25:52 10 kernel: [ 305.479610] [ 1104] 0 1104 118 27 26368 0 0 rtas_errd
Oct 17 05:25:52 10 kernel: [ 305.479611] [ 1213] 0 1213 269 67 32000 0 -1000 sshd
Oct 17 05:25:52 10 kernel: [ 305.479613] [ 1258] 0 1258 3741 117 37632 0 0 polkitd
Oct 17 05:25:52 10 kernel: [ 305.479614] [ 1270] 0 1270 54 9 26112 0 0 iprinit
Oct 17 05:25:52 10 kernel: [ 305.479616] [ 1273] 0 1273 54 9 26112 0 0 iprupdate
Oct 17 05:25:52 10 kernel: [ 305.479618] [ 1350] 0 1350 130 16 30720 0 0 agetty
Oct 17 05:25:52 10 kernel: [ 305.479620] [ 1364] 0 1364 96 16 30464 0 0 agetty
Oct 17 05:25:52 10 kernel: [ 305.479622] [ 1490] 0 1490 334 107 28416 0 0 sshd
Oct 17 05:25:52 10 kernel: [ 305.479623] [ 1505] 1000 1505 315 78 32000 0 0 systemd
Oct 17 05:25:52 10 kernel: [ 305.479625] [ 1507] 1000 1507 1682 140 29440 0 0 (sd-pam)
Oct 17 05:25:52 10 kernel: [ 305.479627] [ 1620] 1000 1620 334 105 28416 0 0 sshd
Oct 17 05:25:52 10 kernel: [ 305.479629] [ 1621] 1000 1621 186 45 31232 0 0 bash
Oct 17 05:25:52 10 kernel: [ 305.479630] [ 1633] 0 1633 230 62 31744 0 0 sudo
Oct 17 05:25:52 10 kernel: [ 305.479632] [ 1635] 0 1635 55 11 29952 0 0 runltp
Oct 17 05:25:52 10 kernel: [ 305.479634] [ 1769] 0 1769 52 7 25856 0 0 ltp-pan
Oct 17 05:25:52 10 kernel: [ 305.479636] [ 1770] 0 1770 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479637] [ 6836] 0 6836 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479639] [ 6838] 0 6838 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479641] [ 6839] 0 6839 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479642] [ 6841] 0 6841 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479644] [ 6842] 0 6842 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479646] [ 6843] 0 6843 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480800] [ 9051] 0 9051 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480801] [ 9052] 0 9052 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480803] [ 9053] 0 9053 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480804] [ 9054] 0 9054 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480805] [ 9055] 0 9055 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480807] [ 9056] 0 9056 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480808] [ 9057] 0 9057 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480810] [ 9058] 0 9058 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480811] [ 9059] 0 9059 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480813] [ 9060] 0 9060 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480815] [ 9061] 0 9061 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480816] [ 9062] 0 9062 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480818] [ 9063] 0 9063 178 75 30720 0 0 msgstress04
...
Oct 17 05:25:52 10 kernel: [ 305.482178] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/networkd-dispatcher.service,task=networkd-dispat,pid=1075,uid=0
Oct 17 05:25:52 10 kernel: [ 305.482242] Out of memory: Killed process 1075 (networkd-dispat) total-vm:111296kB, anon-rss:13120kB, file-rss:128kB, shmem-rss:0kB, UID:0 pgtables:29kB oom_score_adj:0
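The task dump above can be read mechanically to see which processes the OOM killer is allowed to pick. The following is a minimal sketch, not part of the test suite: the `ROW` pattern and the `parse_tasks` helper are hypothetical names, and the sample rows are copied from the dump. An `oom_score_adj` of -1000 exempts a task from OOM killing, which in this dump applies to the listening sshd (pid 1213) but not to the per-session sshd processes.

```python
import re

# A few rows copied from the task dump above (syslog prefix included).
SAMPLE = """\
Oct 17 05:25:52 10 kernel: [ 305.479572] [ 410] 0 410 322 53 27392 0 -1000 systemd-udevd
Oct 17 05:25:52 10 kernel: [ 305.479594] [ 1075] 0 1075 1739 207 30464 0 0 networkd-dispat
Oct 17 05:25:52 10 kernel: [ 305.479611] [ 1213] 0 1213 269 67 32000 0 -1000 sshd
Oct 17 05:25:52 10 kernel: [ 305.479622] [ 1490] 0 1490 334 107 28416 0 0 sshd
Oct 17 05:25:52 10 kernel: [ 305.479625] [ 1507] 1000 1507 1682 140 29440 0 0 (sd-pam)
Oct 17 05:25:52 10 kernel: [ 305.479636] [ 1770] 0 1770 178 75 30976 0 0 msgstress04
"""

# Columns: [pid] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
# The "[ 305.479572]" timestamp does not match because it contains a dot.
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<adj>-?\d+)\s+(?P<name>\S+)\s*$"
)


def parse_tasks(text):
    """Return one dict per task row found in the given log text."""
    tasks = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            row = m.groupdict()
            name = row.pop("name")
            task = {key: int(value) for key, value in row.items()}
            task["name"] = name
            tasks.append(task)
    return tasks


tasks = parse_tasks(SAMPLE)
# Tasks with oom_score_adj == -1000 are never selected by the OOM killer.
protected = [t["name"] for t in tasks if t["adj"] == -1000]
# rss (in pages) is only a rough proxy for how the kernel ranks candidates.
killable = sorted((t for t in tasks if t["adj"] > -1000),
                  key=lambda t: t["rss"], reverse=True)

print("protected:", protected)
print("largest killable:", killable[0]["name"], killable[0]["rss"], "pages")
```

Sorting by rss is a simplification: the kernel's ranking also accounts for swap entries and page tables, so this only approximates the victim order.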

$ grep Killed /var/log/syslog
Oct 17 05:25:52 10 kernel: [ 305.482242] Out of memory: Killed process 1075 (networkd-dispat) total-vm:111296kB, anon-rss:13120kB, file-rss:128kB, shmem-rss:0kB, UID:0 pgtables:29kB oom_score_adj:0
Oct 17 05:25:52 10 kernel: [ 305.687351] Out of memory: Killed process 1097 (unattended-upgr) total-vm:117824kB, anon-rss:13120kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0
Oct 17 05:26:00 10 kernel: [ 312.817879] Out of memory: Killed process 1507 ((sd-pam)) total-vm:107648kB, anon-rss:8960kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:28kB oom_score_adj:0
Oct 17 05:26:00 10 kernel: [ 312.859713] Out of memory: Killed process 1258 (polkitd) total-vm:239424kB, anon-rss:7616kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:36kB oom_score_adj:0
Oct 17 05:26:52 10 kernel: [ 364.547278] Out of memory: Killed process 14222 (msgstress04) total-vm:11392kB, anon-rss:4800kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0
Oct 17 05:26:52 10 kernel: [ 364.952675] Out of memory: Killed process 14221 (msgstress04) total-vm:11392kB, anon-rss:4800kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0
Oct 17 05:26:52 10 kernel: [ 365.401469] Out of memory: Killed process 13876 (msgstress04) total-vm:11392kB, anon-rss:4800kB, file-rss:64kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0
Oct 17 05:26:53 10 kernel: [ 365.499581] Out of memory: Killed process 13787 (msgstress04) total-vm:11392kB, anon-rss:4800kB, file-rss:448kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0
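To see at a glance which services were lost, the `Killed process` lines can be tallied per process name. This is a rough sketch under stated assumptions: `KILLED` and `summarize_kills` are hypothetical names, and the sample lines are the ones shown above, trimmed after the `anon-rss` field.

```python
import re
from collections import Counter

# Lines from the grep output above, trimmed after the anon-rss field.
SAMPLE = """\
Oct 17 05:25:52 10 kernel: [ 305.482242] Out of memory: Killed process 1075 (networkd-dispat) total-vm:111296kB, anon-rss:13120kB
Oct 17 05:25:52 10 kernel: [ 305.687351] Out of memory: Killed process 1097 (unattended-upgr) total-vm:117824kB, anon-rss:13120kB
Oct 17 05:26:00 10 kernel: [ 312.817879] Out of memory: Killed process 1507 ((sd-pam)) total-vm:107648kB, anon-rss:8960kB
Oct 17 05:26:00 10 kernel: [ 312.859713] Out of memory: Killed process 1258 (polkitd) total-vm:239424kB, anon-rss:7616kB
Oct 17 05:26:52 10 kernel: [ 364.547278] Out of memory: Killed process 14222 (msgstress04) total-vm:11392kB, anon-rss:4800kB
Oct 17 05:26:52 10 kernel: [ 364.952675] Out of memory: Killed process 14221 (msgstress04) total-vm:11392kB, anon-rss:4800kB
Oct 17 05:26:52 10 kernel: [ 365.401469] Out of memory: Killed process 13876 (msgstress04) total-vm:11392kB, anon-rss:4800kB
Oct 17 05:26:53 10 kernel: [ 365.499581] Out of memory: Killed process 13787 (msgstress04) total-vm:11392kB, anon-rss:4800kB
"""

# The non-greedy name group keeps inner parentheses, e.g. "(sd-pam)".
KILLED = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>.+?)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def summarize_kills(lines):
    """Count OOM kills and sum the anon-rss (kB) per process name."""
    counts = Counter()
    anon_rss_kb = Counter()
    for line in lines:
        m = KILLED.search(line)
        if m:
            counts[m["name"]] += 1
            anon_rss_kb[m["name"]] += int(m["anon_rss"])
    return counts, anon_rss_kb


counts, anon_rss_kb = summarize_kills(SAMPLE.splitlines())
for name, n in counts.most_common():
    print(f"{name}: killed {n}x, anon-rss total {anon_rss_kb[name]} kB")
```

On a live system the same function could be fed the log directly, for example `summarize_kills(open("/var/log/syslog"))`.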

Memory on this instance:
$ free -mh
              total        used        free      shared  buff/cache   available
Mem:           4.0G        300M        3.5G        7.9M        140M        3.3G
Swap:            0B          0B          0B
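The numbers in the Mem-Info dump can be cross-checked against this 4.0G figure. The sketch below assumes 64 kB base pages; the report does not state the page size, but it is consistent with the smallest free block size in the dump (64kB) and with the page and kB counters agreeing. All values are copied from the OOM report; the variable names are illustrative only.

```python
# Assumption: 64 kB base pages (inferred from the dump, not stated in the report).
PAGE_KB = 64

# Page counts from the Mem-Info section of the OOM report.
pages = {"active_anon": 52102, "free": 2728, "ram": 65536}

# The matching kB figures from the "Node 0" lines of the same report.
reported_kb = {"active_anon": 3334528, "free": 174592, "ram": 4194304}

computed_kb = {name: count * PAGE_KB for name, count in pages.items()}
for name in pages:
    status = "ok" if computed_kb[name] == reported_kb[name] else "MISMATCH"
    print(f"{name}: {pages[name]} pages -> {computed_kb[name]} kB ({status})")

# Zone watermarks from "Node 0 Normal free:174592kB min:180224kB".
free_kb, min_kb = 174592, 180224
below_min = free_kb < min_kb
print("free below min watermark:", below_min)

# Share of RAM held by anonymous memory, and how much of that is THP
# ("anon_thp: 2648064kB" in the Node 0 line).
anon_share = reported_kb["active_anon"] / reported_kb["ram"]
thp_share = 2648064 / reported_kb["active_anon"]
print(f"active_anon is {anon_share:.0%} of RAM, {thp_share:.0%} of it THP")
```

With no swap configured (Total swap = 0kB), anonymous memory cannot be reclaimed, which fits the `all_unreclaimable? yes` flag in the dump.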

Po-Hsu Lin (cypressyew) wrote :

This issue also affects Focal OpenStack PowerPC VMs.

tags: added: focal
Po-Hsu Lin (cypressyew) wrote :

This issue can also be found on a B-4.15.0-221.232 OpenStack ppc64el VM.

When this happens, the msgstress04 test output is incomplete:
19:43:38 INFO | START ubuntu_ltp_syscalls.msgstress04 ubuntu_ltp_syscalls.msgstress04 timestamp=1706211818 timeout=3600 localtime=Jan 25 19:43:38
19:43:38 DEBUG| Persistent state client._record_indent now set to 2
19:43:38 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_ltp_syscalls.msgstress04', 'ubuntu_ltp_syscalls.msgstress04')
19:43:38 DEBUG| Waiting for pid 17012 for 3600 seconds
-------------------------------------------------------------------------------------------------------
R E S U L T S
-------------------------------------------------------------------------------------------------------

tags: added: 4.15 sru-s20231030