Puppet agent using 100% CPU, stuck in a sched_yield() loop. This looks like an issue in ruby2.3 that has been fixed, but the fix has not yet made it into Ubuntu.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ruby2.3 (Debian) | Fix Released | Unknown | |
ruby2.3 (Ubuntu) | | | |
Xenial | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
Ruby processes can sometimes get stuck in a loop consuming 100% CPU, as described upstream and in the Debian bug report. It has most commonly been seen in the puppet agent.
[Test Case]
It's not easy to reproduce. It has been suggested that this script eventually reproduces the problem:
while nice -n19 /opt/puppetlabs
Where sched_yield_loop.rb comes from "https:/
I personally haven't seen it happen with the script, but maybe it could take days.
[Regression Potential]
Races with threads can be hard to reproduce, and so can regressions.
The patch was applied upstream and in Debian more than a year ago.
[Other Info]
Bionic has ruby 2.5.1, which already includes this fix, as do all later Ubuntu releases.
[Original Description]
Ubuntu 16.04
ruby 2.3.1-2~16.04.12
kernel 4.4.0-148-generic
We've noticed an issue across multiple servers where puppet agent will seem to get stuck and consume 100% CPU for days or weeks on end until manually killed.
root@ps-
root 1412 0.0 0.2 143716 38680 ? Ssl Jun11 0:39 /usr/bin/ruby /usr/bin/puppet agent
root 34884 74.4 0.3 286848 53724 ? Rs Jun23 1141:44 puppet agent: applying configuration
root 111481 94.1 0.3 288572 54996 ? Rs Jun18 8642:32 puppet agent: applying configuration
root 128479 54.8 0.3 286744 53596 ? Rs 10:30 250:17 puppet agent: applying configuration
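A process listing like the one above can be screened automatically. The following is only a minimal sketch: it assumes the lines follow the usual `ps aux` column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND) and that TIME is in MMM:SS form. The function names and both thresholds are hypothetical and chosen purely for illustration.

```python
# Sample lines copied from the listing above.
PS_LINES = """\
root 1412 0.0 0.2 143716 38680 ? Ssl Jun11 0:39 /usr/bin/ruby /usr/bin/puppet agent
root 34884 74.4 0.3 286848 53724 ? Rs Jun23 1141:44 puppet agent: applying configuration
root 111481 94.1 0.3 288572 54996 ? Rs Jun18 8642:32 puppet agent: applying configuration
root 128479 54.8 0.3 286744 53596 ? Rs 10:30 250:17 puppet agent: applying configuration
""".splitlines()


def cpu_minutes(time_field):
    """Convert a TIME field of the form MMM:SS into whole minutes."""
    minutes, _seconds = time_field.split(":")
    return int(minutes)


def stuck_agents(lines, min_cpu_pct=50.0, min_minutes=60):
    """Return PIDs of puppet runs that look stuck.

    Both thresholds are hypothetical defaults, not values from the bug report.
    """
    pids = []
    for line in lines:
        # Split into at most 11 fields so COMMAND keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, cpu, stat = fields[1], fields[2], fields[7]
        time_field, command = fields[9], fields[10]
        if "puppet agent: applying configuration" not in command:
            continue
        # A stuck run is runnable (state R), busy, and has burned a lot of CPU time.
        if (stat.startswith("R")
                and float(cpu) >= min_cpu_pct
                and cpu_minutes(time_field) >= min_minutes):
            pids.append(int(pid))
    return pids


print(stuck_agents(PS_LINES))
```

With the sample lines, the long-lived daemon (PID 1412) is skipped because its command line does not match, and the three "applying configuration" children are reported.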
Strace shows it in a sched_yield() loop:
root@ps-
strace: Process 34884 attached
^Cstrace: Process 34884 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.002130 0 123189 sched_yield
------ ----------- ----------- --------- --------- ----------------
100.00 0.002130 123189 total
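The summary above is dominated by a single syscall, which is the signature of this bug. As a rough sketch, a summary table in this layout could be checked programmatically. The helper names are hypothetical, and the parser assumes the column layout shown above (% time, seconds, usecs/call, calls, optional errors, syscall).

```python
# Summary table copied from the strace output above.
STRACE_SUMMARY = """\
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.002130 0 123189 sched_yield
------ ----------- ----------- --------- --------- ----------------
100.00 0.002130 123189 total
"""


def syscall_counts(summary):
    """Map syscall name to call count, skipping header, separator and total rows."""
    counts = {}
    for line in summary.splitlines():
        fields = line.split()
        # Data rows start with a percentage and have at least five columns.
        if len(fields) < 5 or not fields[0].replace(".", "", 1).isdigit():
            continue
        if fields[-1] == "total":
            continue
        # The call count is the fourth column; the syscall name is the last.
        counts[fields[-1]] = int(fields[3])
    return counts


def dominant_syscall(summary):
    """Return (name, share of all calls) for the most frequent syscall, or None."""
    counts = syscall_counts(summary)
    if not counts:
        return None
    total = sum(counts.values())
    name = max(counts, key=counts.get)
    return name, counts[name] / total


print(dominant_syscall(STRACE_SUMMARY))
```

A result where `sched_yield` accounts for nearly all calls matches the busy loop described in this report. Other workloads may also call `sched_yield` heavily, so this is a hint rather than proof.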
Some googling shows this is a common issue which was supposedly fixed/backported to ruby 2.3:
https:/
https:/
https:/
The following open Ubuntu bugs appear to describe the same issue and suggest that the fix made it into Debian but never into Ubuntu:
https:/
https:/
Related branches
- Christian Ehrhardt (community): Approve
- Canonical Server: Pending requested
Diff: 104 lines (+70/-1), 4 files modified
debian/changelog (+7/-0)
debian/control (+2/-1)
debian/patches/do-not-wakeup-inside-child-processes.patch (+60/-0)
debian/patches/series (+1/-0)
tags: added: server-next
Changed in ruby2.3 (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in ruby2.3 (Debian):
status: Unknown → Fix Released
Changed in ruby2.3 (Ubuntu):
status: Triaged → In Progress
assignee: nobody → Andreas Hasenack (ahasenack)
description: updated
description: updated
description: updated
no longer affects: ruby2.3 (Ubuntu)
https://salsa.debian.org/ruby-team/ruby/commit/50d860d0bd7834e95214a2b1ff5b8e0ede7910a1 seems to be the fix.
If I build packages in a ppa, can you test for us, prior to the upload to proposed?