ltp-syscalls: msgstress03 / msgstress04 fails because systemd limits number of processes

Bug #1783881 reported by Thadeu Lima de Souza Cascardo
This bug affects 2 people
Affects                 Status        Importance  Assigned to            Milestone
ubuntu-kernel-tests     Fix Released  Undecided   Krzysztof Kozlowski
linux (Ubuntu)          Invalid       Low         Canonical Kernel Team
  Xenial                Invalid       Undecided   Unassigned
  Bionic                Invalid       Undecided   Unassigned
linux-azure (Ubuntu)    Invalid       Undecided   Unassigned
  Xenial                Invalid       Undecided   Unassigned
  Bionic                Invalid       Undecided   Unassigned
systemd (Ubuntu)        Invalid       Undecided   Unassigned
  Xenial                Won't Fix     Undecided   Unassigned
  Bionic                Invalid       Undecided   Unassigned

Bug Description

Because systemd limits the number of processes, this test fails: it cannot fork enough processes. This only happens when the test is run after logging in as user 1000 and then running sudo. I guess that logging in directly as root may not trigger it.

# ./testcases/bin/msgstress03
Fork failed (may be OK if under stress)
Fork failed (may be OK if under stress)
msgstress03 1 TFAIL : msgstress03.c:157: Fork failed (may be OK if under stress)
#
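
To see which limit is actually being hit, it may help to read the pids controller files of the cgroup the shell lives in. The following is only a sketch: the helper name show_pids_limit is made up for illustration, and the example path assumes a cgroup v1 layout with the pids controller mounted under /sys/fs/cgroup/pids, which may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the task limit and current task count
# found in a given cgroup directory (pids.max / pids.current).
show_pids_limit() {
    dir=$1
    if [ -r "$dir/pids.max" ]; then
        # pids.current may be missing or unreadable; print '?' in that case.
        printf 'max=%s current=%s\n' \
            "$(cat "$dir/pids.max")" \
            "$(cat "$dir/pids.current" 2>/dev/null || echo '?')"
    else
        printf 'no pids.max in %s\n' "$dir"
    fi
}
```

For a session of user 1000 one would probably call it as show_pids_limit /sys/fs/cgroup/pids/user.slice/user-1000.slice (path is an assumption). A value of "max" in pids.max means no limit is set at that level.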

Changed in linux (Ubuntu Cosmic):
importance: Undecided → Low
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1783881

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Dimitri John Ledkov (xnox) wrote : Re: ltp-syscalls: msgstress03 fails because systemd limits number of processes

What about calling it like this:

$ sudo systemd-run ./testcases/bin/msgstress03

Does that make it pass correctly?

Which resource in particular is exhausted? And can it be adjusted using any of the directives listed at https://www.freedesktop.org/software/systemd/man/systemd.directives.html ?

Changed in systemd (Ubuntu Cosmic):
status: New → Incomplete
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

The number of processes (in systemd, tasks).

The command below works for me.

Now, should we change the default on our test systems? Running under systemd-run does not look like a good option.

# systemd-run -p TasksMax=1000000 ./testcases/bin/msgstress03
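
If a test harness did want to wrap individual tests this way, a small helper along these lines could do it. This is a sketch, not what the test suite does: the function name run_with_tasks_max and the DRY_RUN variable are invented here, and only the systemd-run -p TasksMax=... invocation itself comes from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command in a transient unit with a given
# TasksMax. With DRY_RUN set to a non-empty value, only print the
# command line instead of executing it.
run_with_tasks_max() {
    limit=$1
    shift
    ${DRY_RUN:+echo} systemd-run -p "TasksMax=$limit" "$@"
}
```

For example, run_with_tasks_max 1000000 ./testcases/bin/msgstress03 would start the test in a transient unit. Note that plain systemd-run returns as soon as the unit is started, so the test output ends up in the journal rather than on the terminal.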

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Hi,

I recommend changing the configuration of your test system.

For example, you can modify /etc/systemd/system.conf and change DefaultTasksMax there. But that only applies to units started by systemd...
Note that TasksMax these days also accepts a percentage of the kernel-configured maximum number of tasks, so one can, for example, set it to 100%. The upstream default is 15%; we reverted that, which means it is set to unlimited.
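
To get a feeling for what such a percentage means in absolute numbers, the arithmetic can be sketched as below. The helper name tasks_for_percent is made up, and using kernel.pid_max as the base is an assumption: systemd may also take other kernel limits (such as kernel.threads-max) into account when resolving the percentage.

```shell
#!/bin/sh
# Hypothetical helper: integer percentage of a task limit, rounded down.
tasks_for_percent() {
    limit=$1
    pct=$2
    echo $(( limit * pct / 100 ))
}
```

For example, tasks_for_percent "$(cat /proc/sys/kernel/pid_max)" 15 gives 4915 on a machine where kernel.pid_max is 32768.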

However something odd is going on.

I wonder if you are actually hitting UserTasksMax instead (which appears to be under-documented).

I wonder if setting UserTasksMax=1000000 in the [Login] section of /etc/systemd/logind.conf, restarting systemd-logind, and creating a brand new user session (log out of _all_ sessions, then log back in) would actually solve your problem?

PS. You can also use a "drop-in" instead of modifying a config file, as all config files in systemd support .d drop-in directories. Instead of modifying /etc/systemd/system.conf, one can install a file like this:

/{lib,etc,run}/systemd/system.conf.d/bump-tasks-max.conf

with contents like
   [Manager]
   DefaultTasksMax=1000000

Which of the three directories to use depends on whether you want it to be shipped in a package (lib), be a config file (etc), or be a runtime adjustment (run).
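
As a rough sketch, installing the /etc variant of that drop-in could be scripted like this. The function name write_tasks_max_dropin and its root-prefix argument are invented for illustration (the prefix makes it possible to stage the file somewhere else first); the file name and contents are the ones from the example above.

```shell
#!/bin/sh
# Hypothetical helper: write the DefaultTasksMax drop-in below a root
# prefix. Pass "" as the prefix to write to the live system (needs root).
write_tasks_max_dropin() {
    root=$1
    value=$2
    dir="$root/etc/systemd/system.conf.d"
    mkdir -p "$dir" || return 1
    printf '[Manager]\nDefaultTasksMax=%s\n' "$value" > "$dir/bump-tasks-max.conf"
}
```

Calling write_tasks_max_dropin "" 1000000 as root would create /etc/systemd/system.conf.d/bump-tasks-max.conf. The manager reads this configuration when it starts, so a reboot is the conservative way to make sure the new default is picked up.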

tags: added: cosmic
no longer affects: systemd (Ubuntu Cosmic)
Po-Hsu Lin (cypressyew)
tags: added: amd64 linux-kvm sru-20190603 ubuntu-ltp-syscalls
Sean Feole (sfeole)
Changed in ubuntu-kernel-tests:
status: New → Triaged
Sean Feole (sfeole)
tags: added: sru-20191202
tags: added: s390x
Revision history for this message
Sean Feole (sfeole) wrote :

This failure occurs in the cloud, on Amazon AWS. It can be reproduced on c4/c5.large and c5.metal,

whereas it passes on c5n.xlarge, i3.metal, m3.large and m4.large.

I will try the suggested steps above on one of the affected flavor types and see if we can get to the bottom of this.

Changed in ubuntu-kernel-tests:
assignee: nobody → Sean Feole (sfeole)
tags: added: sqa
Sean Feole (sfeole)
tags: added: azure
no longer affects: linux (Ubuntu Cosmic)
Sean Feole (sfeole)
tags: added: sru-20200106
Sean Feole (sfeole)
tags: added: 5.3
Sean Feole (sfeole)
tags: added: sru-20200127
Sean Feole (sfeole)
tags: added: gke
Sean Feole (sfeole)
tags: added: sru-20200217
tags: added: gcp
Sean Feole (sfeole)
tags: added: 5.0
Sean Feole (sfeole)
tags: added: sru-20200629
tags: added: sru-20200921
Revision history for this message
Kelsey Steele (kelsey-steele) wrote :

msgstress03 and msgstress04 are failing with similar behavior on Focal aws : 5.4.0-1026.26 : amd64

Revision history for this message
Kelsey Steele (kelsey-steele) wrote :

Updating bug to include msgstress04 failure.

msgstress04 - part of log from F aws : 5.4.0-1026.26 : amd64

12969. 09/23 17:30:15 DEBUG| utils:0153| [stdout] startup='Wed Sep 23 17:11:41 2020'
12970. 09/23 17:30:15 DEBUG| utils:0153| [stdout] msgstress04 0 TINFO : Found 32000 available message queues
12971. 09/23 17:30:15 DEBUG| utils:0153| [stdout] msgstress04 0 TINFO : Using upto 2097063 pids
12972. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9217
12973. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9237
12974. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9202
12975. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9251
12976. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9273
12977. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9277
12978. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9275
12979. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9276
12980. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9278
12981. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9279
12982. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9248
12983. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9262
12984. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9274
12985. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9310
12986. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9290
12987. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9314
12988. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9315
12989. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9316
12990. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9309
12991. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9325
12992. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9326
12993. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9298
12994. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9327
12995. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9308
12996. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the second child of child group 9324
12997. 09/23 17:30:15 DEBUG| utils:0153| [stdout] Fork failure in the first child of child group 9357
12998. 09/23 17:30:15 DEBUG| utils:0153| [stdout] msgstress04 1 TFAIL : msgstress04.c:204: Fork failed ...

summary: - ltp-syscalls: msgstress03 fails because systemd limits number of
- processes
+ ltp-syscalls: msgstress03 / msgstress04 fails because systemd limits
+ number of processes
tags: added: 5.4 aws
tags: added: 4.15 bionic
tags: added: focal
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Still visible on 4.15.0-1059.65 - oracle, on instance VM.DenseIO2.8 only.

tags: added: sru-20201109
Revision history for this message
Kelsey Steele (kelsey-steele) wrote :

Found on Groovy/linux 5.8.0-31.33

tags: added: 5.8 groovy
tags: added: sru-20201130
tags: added: sru-20210104
tags: added: sru-20210315
Revision history for this message
Marcelo Cerri (mhcerri) wrote :

Still happening with linux-azure 5.8.0-1027.29 for cycle sru-20210315 but also for sru-20210222.

tags: added: sru-20210222
Revision history for this message
Krzysztof Kozlowski (krzk) wrote :

Found on focal/azure-5.8 5.8.0-1031.33~20.04.1

tags: added: sru-20210412
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Found on B-5.4/aws, cycle sru-20210531.

tags: added: sru-20210531
Changed in ubuntu-kernel-tests:
assignee: Sean Feole (sfeole) → Krzysztof Kozlowski (krzk)
Revision history for this message
Krzysztof Kozlowski (krzk) wrote :
Changed in ubuntu-kernel-tests:
status: Triaged → In Progress
Revision history for this message
Dan Streetman (ddstreet) wrote :

marking invalid for systemd as this doesn't seem like a systemd bug

Changed in systemd (Ubuntu):
status: Incomplete → Invalid
Changed in systemd (Ubuntu Xenial):
status: New → Won't Fix
Changed in systemd (Ubuntu Bionic):
status: New → Invalid
tags: added: sru-20210621
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Observed in B/aws-5.4, cycle sru-20210621.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Observed in B/aws (kernel 4.15), cycle sru-20210621.

Changed in ubuntu-kernel-tests:
status: In Progress → Fix Released
Changed in linux (Ubuntu Xenial):
status: New → Invalid
Changed in linux (Ubuntu Bionic):
status: New → Invalid
Changed in linux-azure (Ubuntu):
status: New → Invalid
Changed in linux-azure (Ubuntu Xenial):
status: New → Invalid
Changed in linux-azure (Ubuntu Bionic):
status: New → Invalid
Changed in linux (Ubuntu):
status: Incomplete → Invalid