Call to fork/clone fails with EAGAIN (before encountering resource limits)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Medium | Unassigned |
Bug Description
I wrote a test program that forks processes until the fork calls start to fail. It forks around 12000 processes and then the fork calls start failing with EAGAIN. According to the fork man page, there are four conditions that could cause EAGAIN to be returned:
- the RLIMIT_NPROC soft resource limit, which limits the number of processes and threads for a real user ID, was reached
- the kernel's system-wide limit on the number of processes and threads, /proc/sys/
- the maximum number of PIDs, /proc/sys/
- the caller is operating under the SCHED_DEADLINE scheduling policy and does not have the reset-on-fork flag set
On my machine:
- Before running the program, ~250 processes / ~500 threads are running (as determined by ps)
- RLIMIT_NPROC (soft and hard) is 31616
- threads-max is 63233
- pid_max is 32768
- the program runs with the SCHED_NORMAL scheduling policy (so, not SCHED_DEADLINE)
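The values listed above can be collected in one place with something like the following. This is only a rough sketch, not the attached program: the function names are made up for illustration, the full /proc paths are taken from the fork(2) man page rather than from the text above, and the numeric value of SCHED_DEADLINE is hard-coded on the assumption that this runs on Linux.

```python
import os
import resource

# Linux policy number from <linux/sched.h>. Python's os module does not
# export a SCHED_DEADLINE constant, so it is defined by hand (assumption:
# this runs on Linux).
SCHED_DEADLINE = 6

# Full paths as given in fork(2); assumed, not copied from the report.
THREADS_MAX_PATH = "/proc/sys/kernel/threads-max"
PID_MAX_PATH = "/proc/sys/kernel/pid_max"


def read_int(path):
    """Return the integer stored in a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def fork_limits():
    """Collect the values relevant to the four documented EAGAIN causes."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    policy = os.sched_getscheduler(0)  # 0 means the calling process
    # The reset-on-fork flag is OR'ed into the returned policy value.
    reset_on_fork = bool(policy & os.SCHED_RESET_ON_FORK)
    base_policy = policy & ~os.SCHED_RESET_ON_FORK
    return {
        "rlimit_nproc_soft": soft,
        "rlimit_nproc_hard": hard,
        "threads_max": read_int(THREADS_MAX_PATH),
        "pid_max": read_int(PID_MAX_PATH),
        "sched_deadline": base_policy == SCHED_DEADLINE,
        "reset_on_fork": reset_on_fork,
    }


if __name__ == "__main__":
    for name, value in fork_limits().items():
        print(f"{name}: {value}")
```

If none of these values is anywhere near the point at which fork starts failing, the limit being hit is presumably not one of the four listed in the man page.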
It seems strange that the fork calls fail after ~12000 forks; I would expect them to fail only at around 31616 (the RLIMIT_NPROC value). Some more technical details:
- Reproducible on Ubuntu 16.04.1 running with kernel 4.4.0-36-generic.
- Reproducible when tested with mainline kernel 4.8.0-040800rc6
- Doesn't occur on Ubuntu 12.04 running with kernel 3.2.0-23-generic
- Monitoring thread usage, it appears to fail at exactly the 12,500 thread mark
- According to strace, clone is the syscall actually being used behind the scenes (per the clone man page, it should have the same EAGAIN error semantics).
- From using systemtap and ftrace, it looks like copy_process in _do_fork returns an error when this case is hit. Maybe from sched_trace? It's hard to tell - the ftrace output doesn't seem complete.
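For anyone wanting to watch the thread count while reproducing this: the report does not say how thread usage was monitored, so the following is just one possible approach. It reads the fourth field of /proc/loadavg, which proc(5) documents as "runnable/total" kernel scheduling entities (processes and threads) on the whole system. The function names are illustrative.

```python
def total_tasks(loadavg_text):
    """Return the total number of scheduling entities from /proc/loadavg text.

    The fourth whitespace-separated field has the form 'runnable/total'.
    """
    fields = loadavg_text.split()
    _runnable, total = fields[3].split("/")
    return int(total)


def read_total_tasks(path="/proc/loadavg"):
    """Read the current system-wide count of processes and threads."""
    with open(path) as f:
        return total_tasks(f.read())


if __name__ == "__main__":
    print(read_total_tasks())
```

Polling `read_total_tasks()` in a loop while the fork test runs gives a rough view of where the count stops growing. Note that it is system-wide, so it includes tasks belonging to other users.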
I'm attaching the test fork program I've been using, which has some code to also print the aforementioned values.
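The attachment is not reproduced here. As a minimal sketch of the same idea (not the actual attached program), the following forks idle children until fork fails and reports how many were created and which errno was returned. The function name and the `max_children` safety cap are additions for illustration only.

```python
import os


def fork_until_failure(max_children=None):
    """Fork idle children until fork() fails or max_children is reached.

    Returns (count, err): the number of children created and the errno of
    the failed fork, or None if the cap was reached without a failure.
    """
    children = []
    err = None
    # Children block on this pipe until the parent closes the write end.
    read_fd, write_fd = os.pipe()
    try:
        while max_children is None or len(children) < max_children:
            try:
                pid = os.fork()
            except OSError as exc:
                err = exc.errno
                break
            if pid == 0:
                # Child: drop the inherited write end, wait for EOF, then
                # exit without running the parent's cleanup code.
                try:
                    os.close(write_fd)
                    os.read(read_fd, 1)
                finally:
                    os._exit(0)
            children.append(pid)
    finally:
        # Closing the last write end wakes every child with EOF.
        os.close(write_fd)
        os.close(read_fd)
        for pid in children:
            os.waitpid(pid, 0)
    return len(children), err
```

Calling `fork_until_failure()` with no cap mimics the test described above; on an affected system the returned errno would be expected to equal `errno.EAGAIN` at a count well below RLIMIT_NPROC. Passing a small `max_children` is a safer way to check that the function itself works.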
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-
ProcVersionSign
Uname: Linux 4.4.0-36-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/
/dev/snd/
CurrentDesktop: GNOME-Flashback
Date: Thu Sep 15 11:07:16 2016
EcryptfsInUse: Yes
HibernationDevice: RESUME=
InstallationDate: Installed on 2016-09-12 (3 days ago)
InstallationMedia: Ubuntu 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
IwConfig:
lo no wireless extensions.
eno1 no wireless extensions.
MachineType: Dell Inc. Precision T1600
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 1.157.3
RfKill:
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/11/2011
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A02
dmi.board.name: 06NWYK
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 6
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.
dmi.product.name: Precision T1600
dmi.product.
dmi.sys.vendor: Dell Inc.
tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: cscc
This change was made by a bot.