OOM by cpuacct_100_100 in ubuntu_ltp_controllers caused network connectivity lost on openstack P8 with B-hwe-5.4

Bug #2039515 reported by Po-Hsu Lin
This bug affects 1 person
Affects: ubuntu-kernel-tests
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

This is not a regression. It has been reproducible since cycle 2023.07.10 with B-hwe-5.4.0-156.173~18.04.1, when we first started running ubuntu_ltp_controllers on OpenStack instances.

This only affects P8 instances on OpenStack.

The instance loses connectivity while running the cpuacct_100_100 test because systemd-networkd (shown as systemd-network in the kernel log) gets killed. Test output:
04:51:04 INFO | START ubuntu_ltp_controllers.cpuacct_100_1 ubuntu_ltp_controllers.cpuacct_100_1 timestamp=1697431864 timeout=4500 localtime=Oct 16 04:51:04
04:51:04 DEBUG| Persistent state client._record_indent now set to 2
04:51:04 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_ltp_controllers.cpuacct_100_1', 'ubuntu_ltp_controllers.cpuacct_100_1')
04:51:04 DEBUG| Waiting for pid 10009 for 4500 seconds
04:51:05 INFO | Checking for required user/group ids
04:51:05 INFO |
04:51:05 INFO | 'root' user id and group found.
04:51:05 INFO | 'nobody' user id and group found.
04:51:05 INFO | 'bin' user id and group found.
04:51:05 INFO | 'daemon' user id and group found.
04:51:05 INFO | Users group found.
04:51:05 INFO | Sys group found.
04:51:05 INFO | Required users/groups exist.
04:51:05 INFO | no big block device was specified on commandline.
04:51:05 INFO | Tests which require a big block device are disabled.
04:51:05 INFO | You can specify it with option -z
04:51:05 INFO | INFO: Test start time: Mon Oct 16 04:51:04 UTC 2023
04:51:05 INFO | COMMAND: /opt/ltp/bin/ltp-pan -q -e -S -a 10013 -n 10013 -f /tmp/ltp-p3OGf1KQRt/alltests -l /dev/null -C /dev/null -T /dev/null
04:51:05 INFO | LOG File: /dev/null
04:51:05 INFO | FAILED COMMAND File: /dev/null
04:51:05 INFO | TCONF COMMAND File: /dev/null
04:51:05 INFO | Running tests.......
04:51:05 INFO | cpuacct 1 TINFO: timeout per run is 0h 5m 0s
04:51:05 INFO | tst_pid.c:84: TINFO: Cannot read session user limits from '/sys/fs/cgroup/user.slice/user-1000.slice/pids.max'
04:51:05 INFO | tst_pid.c:94: TINFO: Found limit of processes 10331 (from /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max)
04:51:05 INFO | cpuacct 1 TINFO: task limit fulfilled (approximate need 100, limit 10119)
04:51:05 INFO | cpuacct 1 TINFO: cpuacct: /sys/fs/cgroup/cpu,cpuacct
04:51:05 INFO | cpuacct 1 TINFO: Creating 100 subgroups each with 1 processes
04:51:05 INFO | cpuacct 1 TPASS: cpuacct.usage is not equal to 0 for every subgroup
04:51:05 INFO | cpuacct 1 TPASS: cpuacct.usage equal to subgroup*/cpuacct.usage
04:51:05 INFO | cpuacct 2 TINFO: removing created directories
04:51:05 INFO |
04:51:05 INFO | Summary:
04:51:05 INFO | passed 2
04:51:05 INFO | failed 0
04:51:05 INFO | broken 0
04:51:05 INFO | skipped 0
04:51:05 INFO | warnings 0
04:51:05 INFO | INFO: ltp-pan reported all tests PASS
04:51:05 INFO | LTP Version: 20230516
04:51:05 INFO | INFO: Test end time: Mon Oct 16 04:51:05 UTC 2023
04:51:06 INFO | GOOD ubuntu_ltp_controllers.cpuacct_100_1 ubuntu_ltp_controllers.cpuacct_100_1 timestamp=1697431866 localtime=Oct 16 04:51:06 completed successfully
04:51:06 INFO | END GOOD ubuntu_ltp_controllers.cpuacct_100_1 ubuntu_ltp_controllers.cpuacct_100_1 timestamp=1697431866 localtime=Oct 16 04:51:06
04:51:06 DEBUG| Persistent state client._record_indent now set to 1
04:51:06 DEBUG| Persistent state client.unexpected_reboot deleted
04:51:06 DEBUG| Test has timeout: 4500 sec.
04:51:06 INFO | START ubuntu_ltp_controllers.cpuacct_100_100 ubuntu_ltp_controllers.cpuacct_100_100 timestamp=1697431866 timeout=4500 localtime=Oct 16 04:51:06
04:51:06 DEBUG| Persistent state client._record_indent now set to 2
04:51:06 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_ltp_controllers.cpuacct_100_100', 'ubuntu_ltp_controllers.cpuacct_100_100')
04:51:06 DEBUG| Waiting for pid 10507 for 4500 seconds
# system disconnects here, test interrupted
-------------------------------------------------------------------------------------------------------
R E S U L T S
-------------------------------------------------------------------------------------------------------

Running the test manually shows that it triggers the OOM killer, which kills systemd-network:
Oct 17 02:48:47 10 systemd[1]: Started Session 11 of user ubuntu.
Oct 17 02:50:25 10 kernel: [ 1435.609205] LTP: starting cpuacct_100_100 (cpuacct.sh 100 100)
Oct 17 02:50:45 10 kernel: [ 1455.360651] cpuacct_task invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Oct 17 02:50:45 10 kernel: [ 1455.360659] CPU: 0 PID: 31263 Comm: cpuacct_task Not tainted 5.4.0-165-generic #182~18.04.1-Ubuntu
Oct 17 02:50:45 10 kernel: [ 1455.360660] Call Trace:
Oct 17 02:50:45 10 kernel: [ 1455.360667] [c000000013b6f7c0] [c000000000f2da68] dump_stack+0xbc/0x104 (unreliable)
Oct 17 02:50:45 10 kernel: [ 1455.360671] [c000000013b6f800] [c00000000038e53c] dump_header+0x5c/0x2c0
Oct 17 02:50:45 10 kernel: [ 1455.360673] [c000000013b6f890] [c00000000038ed9c] oom_kill_process+0x19c/0x2c0
Oct 17 02:50:45 10 kernel: [ 1455.360675] [c000000013b6f8d0] [c000000000390088] out_of_memory+0x128/0x790
Oct 17 02:50:45 10 kernel: [ 1455.360677] [c000000013b6f970] [c00000000040cb34] __alloc_pages_slowpath+0xb64/0xea0
Oct 17 02:50:45 10 kernel: [ 1455.360679] [c000000013b6fb30] [c00000000040d188] __alloc_pages_nodemask+0x318/0x3d0
Oct 17 02:50:45 10 kernel: [ 1455.360681] [c000000013b6fbb0] [c000000000436bf8] alloc_pages_vma+0xb8/0x300
Oct 17 02:50:45 10 kernel: [ 1455.360683] [c000000013b6fc20] [c0000000003e2dc4] __handle_mm_fault+0x8d4/0x1ae0
Oct 17 02:50:45 10 kernel: [ 1455.360685] [c000000013b6fd10] [c0000000003e40d0] handle_mm_fault+0x100/0x1d0
Oct 17 02:50:45 10 kernel: [ 1455.360687] [c000000013b6fd50] [c00000000008b65c] __do_page_fault+0x30c/0xec0
Oct 17 02:50:45 10 kernel: [ 1455.360689] [c000000013b6fe20] [c00000000000a908] handle_page_fault+0x10/0x30
Oct 17 02:50:45 10 kernel: [ 1455.360693] --- interrupt: 301 at 0x789a6c9b6f60
Oct 17 02:50:45 10 kernel: [ 1455.360693] LR = 0x789a6c989a24
Oct 17 02:50:45 10 kernel: [ 1455.360693] Mem-Info:
Oct 17 02:50:45 10 kernel: [ 1455.360698] active_anon:40193 inactive_anon:44 isolated_anon:0
Oct 17 02:50:45 10 kernel: [ 1455.360698] active_file:7 inactive_file:7 isolated_file:25
Oct 17 02:50:45 10 kernel: [ 1455.360698] unevictable:0 dirty:0 writeback:0 unstable:0
Oct 17 02:50:45 10 kernel: [ 1455.360698] slab_reclaimable:782 slab_unreclaimable:6253
Oct 17 02:50:45 10 kernel: [ 1455.360698] mapped:22 shmem:129 pagetables:4689 bounce:0
Oct 17 02:50:45 10 kernel: [ 1455.360698] free:2804 free_pcp:31 free_cma:0
Oct 17 02:50:45 10 kernel: [ 1455.360701] Node 0 active_anon:2572352kB inactive_anon:2816kB active_file:448kB inactive_file:448kB unevictable:0kB isolated(anon):0kB isolated(file):1600kB mapped:1408kB dirty:0kB writeback:0kB shmem:8256kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6144kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Oct 17 02:50:45 10 kernel: [ 1455.360702] Node 0 Normal free:179456kB min:180224kB low:225280kB high:270336kB active_anon:2572352kB inactive_anon:2816kB active_file:448kB inactive_file:448kB unevictable:0kB writepending:0kB present:4194304kB managed:4144512kB mlocked:0kB kernel_stack:76496kB pagetables:300096kB bounce:0kB free_pcp:1984kB local_pcp:960kB free_cma:0kB
Oct 17 02:50:45 10 kernel: [ 1455.360706] lowmem_reserve[]: 0 0 0
Oct 17 02:50:45 10 kernel: [ 1455.360707] Node 0 Normal: 238*64kB (UME) 53*128kB (UME) 279*256kB (UME) 164*512kB (UM) 2*1024kB (UM) 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 179456kB
Oct 17 02:50:45 10 kernel: [ 1455.360713] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 17 02:50:45 10 kernel: [ 1455.360714] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 17 02:50:45 10 kernel: [ 1455.360715] 168 total pagecache pages
Oct 17 02:50:45 10 kernel: [ 1455.360716] 0 pages in swap cache
Oct 17 02:50:45 10 kernel: [ 1455.360717] Swap cache stats: add 0, delete 0, find 0/0
Oct 17 02:50:45 10 kernel: [ 1455.360718] Free swap = 0kB
Oct 17 02:50:45 10 kernel: [ 1455.360718] Total swap = 0kB
Oct 17 02:50:45 10 kernel: [ 1455.360719] 65536 pages RAM
Oct 17 02:50:45 10 kernel: [ 1455.360719] 0 pages HighMem/MovableOnly
Oct 17 02:50:45 10 kernel: [ 1455.360720] 778 pages reserved
Oct 17 02:50:45 10 kernel: [ 1455.360720] 0 pages cma reserved
Oct 17 02:50:45 10 kernel: [ 1455.360721] 0 pages hwpoisoned
Oct 17 02:50:45 10 kernel: [ 1455.360721] Tasks state (memory values in pages):
Oct 17 02:50:45 10 kernel: [ 1455.360722] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Oct 17 02:50:45 10 kernel: [ 1455.360728] [ 396] 0 396 771 69 33280 0 0 systemd-journal
Oct 17 02:50:45 10 kernel: [ 1455.360730] [ 411] 0 411 1266 22 26624 0 0 lvmetad
Oct 17 02:50:45 10 kernel: [ 1455.360731] [ 412] 0 412 319 51 31744 0 -1000 systemd-udevd
Oct 17 02:50:45 10 kernel: [ 1455.360734] [ 457] 62583 457 1411 67 28416 0 0 systemd-timesyn
Oct 17 02:50:45 10 kernel: [ 1455.360736] [ 869] 100 869 428 68 32256 0 0 systemd-network
Oct 17 02:50:45 10 kernel: [ 1455.360738] [ 887] 101 887 276 77 27904 0 0 systemd-resolve
Oct 17 02:50:45 10 kernel: [ 1455.360739] [ 1015] 0 1015 166 33 30976 0 0 cron
Oct 17 02:50:45 10 kernel: [ 1455.360741] [ 1016] 0 1016 1331 40 31232 0 0 irqbalance
Oct 17 02:50:45 10 kernel: [ 1455.360743] [ 1025] 0 1025 1739 208 30720 0 0 networkd-dispat
Oct 17 02:50:45 10 kernel: [ 1455.360745] [ 1026] 0 1026 1331 8 25856 0 0 iprdump
Oct 17 02:50:45 10 kernel: [ 1455.360746] [ 1027] 0 1027 101 30 26368 0 0 atd
Oct 17 02:50:45 10 kernel: [ 1455.360755] [ 1028] 0 1028 2381 15 31232 0 0 lxcfs
Oct 17 02:50:45 10 kernel: [ 1455.360757] [ 1029] 0 1029 268 73 27648 0 0 systemd-logind
Oct 17 02:50:45 10 kernel: [ 1455.360759] [ 1030] 0 1030 3787 75 33792 0 0 accounts-daemon
Oct 17 02:50:45 10 kernel: [ 1455.360760] [ 1035] 102 1035 3513 51 32256 0 0 rsyslogd
Oct 17 02:50:45 10 kernel: [ 1455.360762] [ 1037] 103 1037 186 57 26880 0 -900 dbus-daemon
Oct 17 02:50:45 10 kernel: [ 1455.360764] [ 1041] 0 1041 269 70 32000 0 -1000 sshd
Oct 17 02:50:45 10 kernel: [ 1455.360766] [ 1042] 0 1042 118 26 26368 0 0 rtas_errd
Oct 17 02:50:45 10 kernel: [ 1455.360768] [ 1164] 0 1164 3741 85 33536 0 0 polkitd
Oct 17 02:50:45 10 kernel: [ 1455.360769] [ 1169] 0 1169 54 9 26112 0 0 iprupdate
Oct 17 02:50:45 10 kernel: [ 1455.360771] [ 1173] 0 1173 1841 205 30976 0 0 unattended-upgr
Oct 17 02:50:45 10 kernel: [ 1455.360772] [ 1189] 0 1189 54 9 25856 0 0 iprinit
Oct 17 02:50:45 10 kernel: [ 1455.360774] [ 1334] 0 1334 130 16 30720 0 0 agetty
Oct 17 02:50:45 10 kernel: [ 1455.360776] [ 1346] 0 1346 96 17 30208 0 0 agetty
Oct 17 02:50:45 10 kernel: [ 1455.360778] [ 3408] 0 3408 173 46 26880 0 0 rpcbind
Oct 17 02:50:45 10 kernel: [ 1455.360780] [ 4148] 0 4148 88 26 30208 0 0 rpc.idmapd
Oct 17 02:50:45 10 kernel: [ 1455.360782] [ 4149] 0 4149 142 45 26624 0 0 rpc.mountd
Oct 17 02:50:45 10 kernel: [ 1455.360784] [ 4268] 0 4268 147 64 30720 0 0 haveged
Oct 17 02:50:45 10 kernel: [ 1455.360786] [ 25518] 0 25518 334 107 28160 0 0 sshd
Oct 17 02:50:45 10 kernel: [ 1455.360788] [ 25520] 1000 25520 315 78 28160 0 0 systemd
Oct 17 02:50:45 10 kernel: [ 1455.360790] [ 25521] 1000 25521 2719 157 34048 0 0 (sd-pam)
Oct 17 02:50:45 10 kernel: [ 1455.360792] [ 25599] 1000 25599 334 105 27904 0 0 sshd
Oct 17 02:50:45 10 kernel: [ 1455.360794] [ 25600] 1000 25600 188 47 27392 0 0 bash
Oct 17 02:50:45 10 kernel: [ 1455.360796] [ 25612] 0 25612 334 107 32512 0 0 sshd
Oct 17 02:50:45 10 kernel: [ 1455.360797] [ 25677] 1000 25677 334 105 32256 0 0 sshd
Oct 17 02:50:45 10 kernel: [ 1455.360799] [ 25678] 1000 25678 185 46 26624 0 0 bash
Oct 17 02:50:45 10 kernel: [ 1455.360800] [ 25690] 1000 25690 124 9 26624 0 0 dmesg
Oct 17 02:50:45 10 kernel: [ 1455.360802] [ 25730] 0 25730 230 61 27904 0 0 sudo
Oct 17 02:50:45 10 kernel: [ 1455.360804] [ 25731] 0 25731 55 11 30208 0 0 runltp
Oct 17 02:50:45 10 kernel: [ 1455.360806] [ 25867] 0 25867 52 8 25856 0 0 ltp-pan
Oct 17 02:50:45 10 kernel: [ 1455.360808] [ 25868] 0 25868 58 22 26112 0 0 cpuacct.sh
Oct 17 02:50:45 10 kernel: [ 1455.360810] [ 25887] 0 25887 49 7 25856 0 0 tst_timeout_kil
Oct 17 02:50:45 10 kernel: [ 1455.360811] [ 26507] 0 26507 52 7 25856 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.360813] [ 26508] 0 26508 52 7 25856 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.360815] [ 26509] 0 26509 52 7 25856 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.360816] [ 26510] 0 26510 52 7 25856 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.360818] [ 26511] 0 26511 52 7 26112 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.504029] [ 28610] 0 28610 52 7 26112 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.504030] [ 28611] 0 28611 52 8 25856 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.504032] [ 28613] 0 28613 52 7 25856 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.504033] [ 28614] 0 28614 52 7 25856 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.504034] [ 28615] 0 28615 52 7 29952 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.504036] [ 28616] 0 28616 52 7 25856 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.504037] [ 28617] 0 28617 52 7 26112 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.504038] [ 28618] 0 28618 52 8 25856 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.504039] [ 28619] 0 28619 52 7 30208 0 0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.504041] [ 28620] 0 28620 52 7 30208 0 0 cpuacct_task
....
Oct 17 02:50:45 10 kernel: [ 1455.507546] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/systemd-networkd.service,task=systemd-network,pid=869,uid=100
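
When triaging several runs it can help to pull the victim out of the final "oom-kill:" summary line automatically. The following is only a minimal sketch: the helper name parse_oom_kill is hypothetical, and it assumes the comma-separated key=value layout seen in the line above (a task name containing a comma would break it).

```python
# Summary line copied from the kernel log above (timestamp prefix dropped).
OOM_LINE = (
    "oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,"
    "mems_allowed=0,global_oom,"
    "task_memcg=/system.slice/systemd-networkd.service,"
    "task=systemd-network,pid=869,uid=100"
)


def parse_oom_kill(line):
    """Return the fields of an 'oom-kill:' summary line as a dict.

    Hypothetical helper: assumes comma-separated key=value items, with
    bare items such as 'global_oom' treated as boolean flags.
    """
    _, _, payload = line.partition("oom-kill:")
    fields = {}
    for item in payload.strip().split(","):
        key, sep, value = item.partition("=")
        # Items without '=' (e.g. global_oom) are stored as flags.
        fields[key] = value if sep else True
    return fields


if __name__ == "__main__":
    info = parse_oom_kill(OOM_LINE)
    print(info["task"], info["pid"], info["task_memcg"])
```

For the log above this reports systemd-network (pid 869) in /system.slice/systemd-networkd.service as the victim of a global OOM.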

Memory on this instance:
$ free -mh
              total        used        free      shared  buff/cache   available
Mem:           4.0G        300M        3.5G        7.9M        140M        3.3G
Swap:            0B          0B          0B
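
As a rough cross-check of the Mem-Info dump against the idle figures above, the page counts can be converted to kB. This is a sketch under one assumption: the page size is not stated anywhere in this report, so it is inferred here from "present:4194304kB" and "65536 pages RAM". All other numbers are copied from the kernel log.

```python
# Figures copied from the Mem-Info section of the kernel log.
PRESENT_KB = 4194304        # "present:4194304kB"
PAGES_RAM = 65536           # "65536 pages RAM"
FREE_KB = 179456            # "Node 0 Normal free:179456kB"
MIN_WATERMARK_KB = 180224   # "min:180224kB"
ACTIVE_ANON_PAGES = 40193   # "active_anon:40193"
PAGETABLE_PAGES = 4689      # "pagetables:4689"

# Assumption: page size inferred from the two totals, not reported directly.
PAGE_KB = PRESENT_KB // PAGES_RAM


def pages_to_kb(pages):
    """Convert a page count from Mem-Info to kB using the inferred page size."""
    return pages * PAGE_KB


summary = {
    "page_kb": PAGE_KB,
    "active_anon_kb": pages_to_kb(ACTIVE_ANON_PAGES),
    "pagetables_kb": pages_to_kb(PAGETABLE_PAGES),
    "free_below_min": FREE_KB < MIN_WATERMARK_KB,
}

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value}")
```

The converted values match the per-node kB figures in the log (active_anon 2572352kB, pagetables 300096kB), and free memory was below the min watermark when the OOM killer fired, with no swap available.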

Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew) wrote :

On the second attempt, sshd was killed.

summary: - OOM caused by cpuacct_100_100 in ubuntu_ltp_controllers kills systemd-
- network on openstack P8 with B-hwe-5.4
+ OOM by cpuacct_100_100 in ubuntu_ltp_controllers caused network
+ connectivity lost on openstack P8 with B-hwe-5.4
Po-Hsu Lin (cypressyew) wrote :

This issue also affects Focal OpenStack PowerPC VMs.

tags: added: focal