ubuntu_ltp_controllers:memcg_stress: 1 TBROK: Test timed out, sending SIGTERM!

Bug #1946348 reported by Stefan Bader
Affects: ubuntu-kernel-tests
Status: Fix Released
Importance: Undecided
Assigned to: Po-Hsu Lin

Bug Description

This happened on lagalla (amd64) with focal:linux-hwe-5.11 (5.11.0-38.42~20.04.1) on sru-20210927. I have not seen this on any other node, but lowlatency was only run on this host and generic on a different one. It does not feel like a kernel issue; rather, it looks either lowlatency-related in general or specific to that host.

  Running tests.......
  memcg_stress_test 1 TINFO: timeout per run is 0h 35m 0s
  memcg_stress_test 1 TINFO: Calculated available memory 35621 MB
  memcg_stress_test 1 TINFO: Testing 150 cgroups, using 236 MB, interval 5
  memcg_stress_test 1 TINFO: Starting cgroups
  memcg_stress_test 1 TINFO: Testing cgroups for 900s
  memcg_stress_test 1 TINFO: Killing groups
  memcg_stress_test 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
  memcg_stress_test 1 TBROK: test terminated
  memcg_stress_test 1 TINFO: Test is still running, waiting 10s
  memcg_stress_test 1 TINFO: AppArmor enabled, this may affect test results
  memcg_stress_test 1 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root)
  memcg_stress_test 1 TINFO: Test is still running, waiting 9s
  memcg_stress_test 1 TINFO: loaded AppArmor profiles: none
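
The TBROK hint in the log above suggests exporting LTP_TIMEOUT_MUL > 1. As a rough sketch of what that implies for the 35-minute limit, the snippet below scales the base timeout and prints it in the same "0h 35m 0s" style. It assumes the effective timeout is simply base × multiplier, rounded up; the helper names are hypothetical and this is not LTP's own code.

```python
import math


def format_ltp_timeout(seconds: int) -> str:
    """Render seconds in the "0h 35m 0s" style seen in the TINFO line."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


def scaled_timeout(base_seconds: int, multiplier: float) -> int:
    """Assumed model: effective timeout = base * multiplier, rounded up."""
    return math.ceil(base_seconds * multiplier)


# Taken from the log line "timeout per run is 0h 35m 0s".
BASE_TIMEOUT_S = 35 * 60

# Illustrative multipliers only; none of these values come from the bug report.
for mul in (1, 1.5, 2):
    print(mul, format_ltp_timeout(scaled_timeout(BASE_TIMEOUT_S, mul)))
```

Under that assumption, a multiplier of 2 would give the test 1h 10m instead of 35m.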

Kleber Sacilotto de Souza (kleber-souza) wrote :

I'm seeing the same with bionic/linux-hwe-5.4 5.4.0-89.100~18.04.1 on different architectures. It's hard to say whether this is a new kernel issue: ubuntu_ltp_controllers has been split from ubuntu_ltp, which used to run on different nodes, so these tests had never been run on the nodes they run on now.

tags: added: 5.4
Po-Hsu Lin (cypressyew) wrote :

It's probably hardware-dependent; I will give it a try with a longer timeout.

Found on:
  * Focal 5.13.0-22 with node rizzo (lowlatency)
  * Focal 5.13.0-23 with node rumford (generic)
  * Hirsute 5.11.0-44.48 with node lagalla / s2lp4

Changed in ubuntu-kernel-tests:
assignee: nobody → Po-Hsu Lin (cypressyew)
tags: added: sru-20211129
Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
status: New → In Progress
Po-Hsu Lin (cypressyew) wrote :

It took almost 35 minutes (34 minutes and 45 seconds) to run on node rizzo with focal-intel-5.13.0-1009.9:

INFO: Test start time: Wed Jan 12 01:48:44 UTC 2022
 COMMAND: /opt/ltp/bin/ltp-pan -q -e -S -a 102857 -n 102857 -f /tmp/ltp-rlnaiGTSwf/alltests -l /dev/null -C /dev/null -T /dev/null
 LOG File: /dev/null
 FAILED COMMAND File: /dev/null
 TCONF COMMAND File: /dev/null
 Running tests.......
 memcg_stress_test 1 TINFO: timeout per run is 0h 35m 0s
 memcg_stress_test 1 TINFO: Calculated available memory 9571 MB
 memcg_stress_test 1 TINFO: Testing 150 cgroups, using 62 MB, interval 5
 memcg_stress_test 1 TPASS: mkdir /dev/memcg passed as expected
 memcg_stress_test 1 TPASS: mount -t cgroup -omemory memcg /dev/memcg passed as expected
 memcg_stress_test 1 TINFO: Starting cgroups
 memcg_stress_test 1 TINFO: Testing cgroups for 900s
 memcg_stress_test 1 TINFO: Killing groups
 memcg_stress_test 1 TPASS: Test passed
 memcg_stress_test 1 TPASS: umount /dev/memcg passed as expected
 memcg_stress_test 1 TPASS: rmdir /dev/memcg passed as expected
 memcg_stress_test 2 TINFO: Testing 1 cgroups, using 9570 MB, interval 5
 memcg_stress_test 2 TPASS: mkdir /dev/memcg passed as expected
 memcg_stress_test 2 TPASS: mount -t cgroup -omemory memcg /dev/memcg passed as expected
 memcg_stress_test 2 TINFO: Starting cgroups
 memcg_stress_test 2 TINFO: Testing cgroups for 900s
 memcg_stress_test 2 TINFO: Killing groups
 memcg_stress_test 2 TPASS: Test passed
 memcg_stress_test 2 TPASS: umount /dev/memcg passed as expected
 memcg_stress_test 2 TPASS: rmdir /dev/memcg passed as expected

 Summary:
 passed 10
 failed 0
 broken 0
 skipped 0
 warnings 0
 INFO: ltp-pan reported all tests PASS
 LTP Version: 20210927
 INFO: Test end time: Wed Jan 12 02:23:29 UTC 2022

And the second attempt just failed, so yes, we should bump the timeout for this.
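
To show how thin the margin was on the passing run, here is a small sketch that derives the elapsed time from the "Test start time" and "Test end time" lines in the log above and compares it with the 35-minute timeout. Note that those timestamps bracket the whole ltp-pan invocation, so treating them as the test's own runtime is an approximation.

```python
from datetime import datetime

# Format of the "Test start time" / "Test end time" stamps in the log.
STAMP_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def parse_stamp(text: str) -> datetime:
    """Parse a timestamp such as 'Wed Jan 12 01:48:44 UTC 2022'."""
    return datetime.strptime(text, STAMP_FORMAT)


start = parse_stamp("Wed Jan 12 01:48:44 UTC 2022")
end = parse_stamp("Wed Jan 12 02:23:29 UTC 2022")

elapsed_s = int((end - start).total_seconds())
timeout_s = 35 * 60  # "timeout per run is 0h 35m 0s"
headroom_s = timeout_s - elapsed_s

minutes, seconds = divmod(elapsed_s, 60)
print(f"elapsed: {minutes}m {seconds}s, headroom: {headroom_s}s")
```

This gives 34m 45s elapsed, leaving only 15 seconds before the timeout, which fits with the second attempt failing on the same node.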

Po-Hsu Lin (cypressyew) wrote :

It took about 44m 26s on node rizzo to finish with focal-intel-5.13.0-1009.9.
The bumped timeout does help with this issue:
https://git.launchpad.net/~canonical-kernel-team/+git/autotest-client-tests/commit/?id=c5546d04bc7bc89091b1830243da2a0469487269

I will mark this as fix-committed, and it will be marked as fix-released after this cycle.
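
For reference, a back-of-the-envelope sketch of how much larger than the original 35 minutes the timeout has to be to cover the 44m 26s run. It does not reproduce what the linked commit actually changes; it only estimates the smallest multiplier that would fit, again assuming effective timeout = base × multiplier.

```python
import math

BASE_TIMEOUT_S = 35 * 60      # original "timeout per run is 0h 35m 0s"
observed_s = 44 * 60 + 26     # the 44m 26s run on node rizzo

# Smallest multiplier that covers the observed runtime, with no safety margin.
exact_multiplier = observed_s / BASE_TIMEOUT_S

# Rounded up to a whole number, in case only integer multipliers are usable.
whole_multiplier = math.ceil(exact_multiplier)

print(f"exact: {exact_multiplier:.2f}, whole: {whole_multiplier}")
```

The 44m 26s run needs roughly 1.27× the original timeout, so a whole-number multiplier of 2 would leave comfortable slack.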

Changed in ubuntu-kernel-tests:
status: In Progress → Fix Committed
Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
status: Fix Committed → Fix Released