ltp-syscalls: msgstress03 / msgstress04 fails because systemd limits number of processes

Bug #1783881 reported by Thadeu Lima de Souza Cascardo on 2018-07-26
This bug affects 2 people
Affects                  Importance   Assigned to
ubuntu-kernel-tests      Undecided    Sean Feole
linux (Ubuntu)           Low          Canonical Kernel Team
  Xenial                 Undecided    Unassigned
  Bionic                 Undecided    Unassigned
linux-azure (Ubuntu)     Undecided    Unassigned
  Xenial                 Undecided    Unassigned
  Bionic                 Undecided    Unassigned
systemd (Ubuntu)         Undecided    Unassigned
  Xenial                 Undecided    Unassigned
  Bionic                 Undecided    Unassigned

Bug Description

Because systemd limits the number of processes, this test fails: it cannot fork enough processes. This only happens when the test is run after logging in as user 1000 and then running sudo. I guess that logging in directly as root may not trigger it.

# ./testcases/bin/msgstress03
Fork failed (may be OK if under stress)
Fork failed (may be OK if under stress)
msgstress03 1 TFAIL : msgstress03.c:157: Fork failed (may be OK if under stress)
#

Changed in linux (Ubuntu Cosmic):
importance: Undecided → Low
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1783881

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

What about calling it like this:

$ sudo systemd-run ./testcases/bin/msgstress03

Does that make it pass correctly?

Which resource in particular is exhausted, and can it be tuned using any of the directives listed at https://www.freedesktop.org/software/systemd/man/systemd.directives.html ?

Changed in systemd (Ubuntu Cosmic):
status: New → Incomplete

The number of processes (in systemd, tasks).

The command below works for me.

Now, should we change the default on our test systems? Running under systemd-run does not look like a good option.

# systemd-run -p TasksMax=1000000 ./testcases/bin/msgstress03
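
To confirm which task limit the shell is actually running under, one can look at the pids cgroup it belongs to. This is only a minimal sketch: it assumes a cgroup v1 layout with the pids controller mounted at /sys/fs/cgroup/pids (an assumption, not something stated in this report), and pids_cgroup_of is a hypothetical helper name, not part of LTP or systemd.

```shell
# Hypothetical helper: print the cgroup path of the pids controller
# from a /proc/<pid>/cgroup style file (format: ID:controllers:path).
pids_cgroup_of() {
    awk -F: '$2 ~ /(^|,)pids(,|$)/ { print $3 }' "$1"
}

# Assumption: cgroup v1 with the pids controller mounted here.
PIDS_ROOT=${PIDS_ROOT:-/sys/fs/cgroup/pids}

cg=$(pids_cgroup_of /proc/self/cgroup 2>/dev/null || true)
if [ -n "$cg" ] && [ -r "$PIDS_ROOT$cg/pids.max" ]; then
    echo "cgroup:       $cg"
    echo "pids.max:     $(cat "$PIDS_ROOT$cg/pids.max")"
    echo "pids.current: $(cat "$PIDS_ROOT$cg/pids.current")"
else
    echo "no readable pids cgroup found for this shell"
fi
```

The limit may be set on a parent cgroup (for example the user slice) rather than on the session scope itself, so it is worth checking pids.max in the parent directories as well.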

Dimitri John Ledkov (xnox) wrote :

Hi,

I recommend changing the configuration of your test system.

For example, you can modify /etc/systemd/system.conf and change DefaultTasksMax there, but that only applies to units started by systemd.
Note that TasksMax can these days also be given as a percentage of the kernel-configured maximum number of tasks, so one can set it to 100%, for example. The upstream default is 15%; we reverted that, which means it is set to unlimited.

However something odd is going on.

I wonder if you are actually hitting UserTasksMax instead (which appears to be under-documented).

I wonder if setting UserTasksMax=1000000 in the [Login] section of /etc/systemd/logind.conf, restarting systemd-logind, and creating a brand new user session (log out of _all_ sessions, then log back in) would actually solve your problem?
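
The UserTasksMax suggestion could be tried with a drop-in rather than by editing logind.conf directly. The sketch below is hedged: write_user_tasks_max and the file name bump-user-tasks-max.conf are hypothetical, and DROPIN_DIR defaults to a scratch directory so the snippet is harmless to run. On a real test system the directory would be /etc/systemd/logind.conf.d and root would be required. As far as I know, newer systemd releases dropped UserTasksMax in favour of TasksMax on the user slice, so this would only apply to the older versions shipped in Xenial and Bionic.

```shell
# Hypothetical helper: write a logind drop-in that raises UserTasksMax.
# $1 = drop-in directory, $2 = limit
write_user_tasks_max() {
    mkdir -p "$1" &&
    printf '[Login]\nUserTasksMax=%s\n' "$2" > "$1/bump-user-tasks-max.conf"
}

# Assumption: a scratch directory by default; set DROPIN_DIR to
# /etc/systemd/logind.conf.d (as root) on a real test system.
DROPIN_DIR=${DROPIN_DIR:-$(mktemp -d)}
write_user_tasks_max "$DROPIN_DIR" 1000000
cat "$DROPIN_DIR/bump-user-tasks-max.conf"
```

After installing the file for real, systemd-logind still has to be restarted and all sessions closed and reopened, as described above.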

PS: you can also use a "drop-in" instead of modifying a config file, as all systemd config files support .d drop-in directories.

Instead of modifying /etc/systemd/system.conf, one can install a file like this:

/{lib,etc,run}/systemd/system.conf.d/bump-tasks-max.conf

with contents like:
   [Manager]
   DefaultTasksMax=1000000

Depending on whether you want it to be packaged in a package, be a config file, or be a runtime adjustment.
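
The choice between the three locations can be written down as a small lookup. This is only an illustrative sketch; dropin_dir_for and the labels package, config and runtime are hypothetical names chosen here, while the three directories are the ones listed above.

```shell
# Hypothetical helper: map the intent to the system.conf drop-in directory.
dropin_dir_for() {
    case "$1" in
        package) echo /lib/systemd/system.conf.d ;;  # shipped in a package
        config)  echo /etc/systemd/system.conf.d ;;  # administrator config file
        runtime) echo /run/systemd/system.conf.d ;;  # runtime adjustment, gone after reboot
        *) echo "unknown kind: $1" >&2; return 1 ;;
    esac
}

dropin_dir_for runtime
```

Once the drop-in is in place and the manager has re-read its configuration, `systemctl show --property=DefaultTasksMax` should report the value in effect.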

tags: added: cosmic
no longer affects: systemd (Ubuntu Cosmic)
Po-Hsu Lin (cypressyew) on 2019-06-24
tags: added: amd64 linux-kvm sru-20190603 ubuntu-ltp-syscalls
Sean Feole (sfeole) on 2019-11-25
Changed in ubuntu-kernel-tests:
status: New → Triaged
Sean Feole (sfeole) on 2019-12-16
tags: added: sru-20191202
tags: added: s390x
Sean Feole (sfeole) wrote :

This failure occurs in the cloud, on Amazon AWS. It can be reproduced on c4/c5.large and c5.metal,

whereas it passes on c5n.xlarge, i3.metal, m3.large and m4.large.

I will try the suggested steps above on one of the affected flavor types and see if we can get to the bottom of this.

Changed in ubuntu-kernel-tests:
assignee: nobody → Sean Feole (sfeole)
tags: added: sqa
Sean Feole (sfeole) on 2019-12-16
tags: added: azure
no longer affects: linux (Ubuntu Cosmic)
Sean Feole (sfeole) on 2020-01-24
tags: added: sru-20200106
Sean Feole (sfeole) on 2020-01-27
tags: added: 5.3
Sean Feole (sfeole) on 2020-02-05
tags: added: sru-20200127
Sean Feole (sfeole) on 2020-02-11
tags: added: gke
Sean Feole (sfeole) on 2020-02-20
tags: added: sru-20200217
tags: added: gcp
Sean Feole (sfeole) on 2020-02-20
tags: added: 5.0
Sean Feole (sfeole) on 2020-07-16
tags: added: sru-20200629
tags: added: sru-20200921

msgstress03 and msgstress04 are failing with similar behavior on Focal aws : 5.4.0-1026.26 : amd64


Updating bug to include msgstress04 failure.

msgstress04 - part of log from F aws : 5.4.0-1026.26 : amd64

12969. 09/23 17:30:15 DEBUG| utils:0153| [stdout] startup='Wed Sep 23 17:11:41 2020'
12970. 09/23 17:30:15 DEBUG| utils:0153| [stdout] msgstress04 0 TINFO : Found 32000 available message queues
12971. 09/23 17:30:15 DEBUG| utils:0153| [stdout] msgstress04 0 TINFO : Using upto 2097063 pids
12972. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9217
12973. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9237
12974. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9202
12975. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9251
12976. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9273
12977. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9277
12978. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9275
12979. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9276
12980. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9278
12981. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9279
12982. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9248
12983. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9262
12984. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9274
12985. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9310
12986. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9290
12987. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9314
12988. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9315
12989. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9316
12990. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9309
12991. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9325
12992. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9326
12993. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9298
12994. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9327
12995. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9308
12996. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9324
12997. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9357
12998. 09/23 17:30:15 DEBUG| utils:0153| [stdout] msgstress04 1 TFAIL : msgstress04.c:204: Fork failed ...


summary: - ltp-syscalls: msgstress03 fails because systemd limits number of
- processes
+ ltp-syscalls: msgstress03 / msgstress04 fails because systemd limits
+ number of processes
tags: added: 5.4 aws
tags: added: 4.15 bionic
tags: added: focal
Po-Hsu Lin (cypressyew) wrote :

Still visible on 4.15.0-1059.65 (oracle),
on instance VM.DenseIO2.8 only.

tags: added: sru-20201109

Found on Groovy/linux 5.8.0-31.33

tags: added: 5.8 groovy
tags: added: sru-20201130
tags: added: sru-20210104