pcsd runaway processes use 100% CPU
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
pcs (Ubuntu) | New | Undecided | Unassigned |
Bug Description
On multiple Ubuntu 16.04 servers where I have set up pacemaker clusters with pcs, one or more pcsd processes occasionally end up consuming 100% of a single CPU core each. Since this occurs on multiple nodes on a seemingly random basis, it appears to be a problem with the packaged version of the software rather than a localized issue.
Here is an example of 4 such offending processes on a single node from `ps aux --forest`:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 6103 0.0 0.3 1076744 59200 ? Ssl Apr06 59:09 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/
root 17548 99.3 0.2 873648 46308 ? Rl Apr18 43356:57 \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/
root 16688 98.9 0.3 941160 49472 ? Rl May01 24300:52 \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/
root 6009 98.8 0.3 942188 49688 ? R May02 22607:08 \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/
root 15556 98.8 0.3 1076344 51836 ? R May03 21410:12 \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/
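For anyone who wants to check other nodes for the same symptom, here is a minimal sketch that scans `ps aux` text for pcsd processes above a CPU threshold. The function name `find_runaway`, the 90% threshold, and matching on the `/var/lib/pcsd` substring are my own assumptions, not anything provided by pcs:

```python
# Hypothetical helper: flag pcsd processes whose %CPU is at or above a threshold.
# Input is the text printed by `ps aux` (with or without --forest).

SAMPLE = r"""USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 6103 0.0 0.3 1076744 59200 ? Ssl Apr06 59:09 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/
root 17548 99.3 0.2 873648 46308 ? Rl Apr18 43356:57 \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/
root 16688 98.9 0.3 941160 49472 ? Rl May01 24300:52 \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/
"""


def find_runaway(ps_output, needle="/var/lib/pcsd", cpu_threshold=90.0):
    """Return (pid, cpu) pairs for matching processes at or above cpu_threshold."""
    hits = []
    for line in ps_output.splitlines():
        # `ps aux` prints 11 columns; COMMAND is last and may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        try:
            pid, cpu = int(fields[1]), float(fields[2])
        except ValueError:
            continue  # header line or anything else that is not a process row
        if needle in fields[10] and cpu >= cpu_threshold:
            hits.append((pid, cpu))
    return hits


if __name__ == "__main__":
    for pid, cpu in find_runaway(SAMPLE):
        print(pid, cpu)
```

On a live node the same function could be fed the output of `ps aux` captured via `subprocess`, which I have left out here to keep the sketch self-contained.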
I don't see anything of note in /var/log/
I ran `strace -p <pid>`, and the screen filled with the following line, repeating as fast as my terminal could render:
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
I redirected this into a file for about 1 second and it filled with about 20,000 of those lines.
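To put a number on a capture like this, here is a rough sketch that tallies system call names in saved strace output. It assumes the plain `name(args) = result` line format shown above, optionally prefixed with `[pid N]`; the function name `tally_syscalls` is made up for illustration:

```python
import re
from collections import Counter

# Matches the syscall name at the start of a strace line,
# allowing for an optional "[pid 1234] " prefix (as printed with -f).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def tally_syscalls(strace_text):
    """Count how many times each syscall name appears in strace output text."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    captured = "sched_yield() = 0\n" * 3
    for name, count in tally_syscalls(captured).most_common():
        print(name, count)
```

A capture dominated by a single call such as `sched_yield` would suggest the process is busy-waiting rather than doing useful work, though this alone does not identify the cause.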
`ltrace -p <pid> -S` showed something similar to strace:
SYS_sched_
SYS_sched_
SYS_sched_
I'm not very fluent in introspecting processes, so I'm afraid this is the limit of what I know how to do so far. I am happy to run more commands, as suggested, on a server with such a runaway process. I have a node with a runaway process on it right now, which I will probably leave running in case there is more introspection I can do.
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: pcs 0.9.149-1ubuntu1.1
ProcVersionSign
Uname: Linux 4.4.0-34-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
Date: Wed May 23 18:50:52 2018
InstallationDate: Installed on 2016-08-22 (639 days ago)
InstallationMedia: Ubuntu-Server 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
ProcEnviron:
LANGUAGE=en_US:
TERM=screen
PATH=(custom, no user)
LANG=en_US
SHELL=/bin/bash
SourcePackage: pcs
UpgradeStatus: No upgrade log present (probably fresh install)
modified.
mtime.conffile.
Any response on this? It is affecting a lot of servers where we are attempting to use it.
I believe this bug, as well as #1772098, would be remedied by simply providing a newer version of pcs.