pcsd runaway processes use 100% CPU

Bug #1772998 reported by Casey & Gina
This bug affects 1 person
Affects: pcs (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

On multiple Ubuntu 16.04 servers where I have set up pacemaker clusters with pcs, one or more pcsd processes occasionally end up consuming 100% of a single CPU core. Since this occurs on multiple nodes, seemingly at random, it appears to be a problem with the packaged version of the software rather than a localized issue.

Here is an example of four such offending processes on a single node, from `ps aux --forest`:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 6103 0.0 0.3 1076744 59200 ? Ssl Apr06 59:09 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
root 17548 99.3 0.2 873648 46308 ? Rl Apr18 43356:57 \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
root 16688 98.9 0.3 941160 49472 ? Rl May01 24300:52 \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
root 6009 98.8 0.3 942188 49688 ? R May02 22607:08 \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
root 15556 98.8 0.3 1076344 51836 ? R May03 21410:12 \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
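
For checking other nodes, a listing like the one above can be scanned automatically. This is only a minimal sketch in Python: the function name `find_runaway`, the 90% threshold, and the choice to match on the `ssl.rb` path are my own assumptions for illustration, not anything provided by pcs.

```python
def find_runaway(ps_output, needle="/usr/share/pcsd/ssl.rb", cpu_threshold=90.0):
    """Return (pid, cpu) pairs for matching processes at or above the threshold.

    Expects text in the 11-column layout printed by `ps aux`
    (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).
    """
    hits = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps its embedded spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and any short or blank lines
        try:
            pid, cpu = int(fields[1]), float(fields[2])
        except ValueError:
            continue  # not a process row
        if needle in fields[10] and cpu >= cpu_threshold:
            hits.append((pid, cpu))
    return hits
```

The text passed in could come from something like `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`. Note that %CPU in `ps` is averaged over the lifetime of the process, so a process that only recently started spinning may not cross the threshold yet.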

I don't see anything of note in /var/log/pcsd/pcsd.log; it seems to contain only normal activity logged by the master process, which is not itself runaway. I have enabled debugging in /etc/default/pcsd in the hope of seeing more the next time it happens.

I ran `strace -p <pid>`, and the screen filled with the following line, repeating as fast as my terminal could render it:
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0

I redirected this into a file for about 1 second and it filled with about 20,000 of those lines.
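
A capture like that can be summarised without scrolling through it. The following is a rough sketch, assuming the strace output was saved as plain text with one syscall per line; `syscall_histogram` is a hypothetical helper name, and the optional `[pid N]` prefix handling is there only in case strace was run with `-f`.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a line, optionally after "[pid N]".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")

def syscall_histogram(strace_text):
    """Count how many times each syscall name appears in saved strace output."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

If nearly every line in the histogram is `sched_yield`, that is consistent with a thread spinning in a yield loop rather than blocking on I/O.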

`ltrace -p <pid> -S` showed something similar to strace:

SYS_sched_yield(0x7f0ebc3f5c40, 0x7f0ebc3f5c40, 0, 0x7273752f3a6e6962) = 0
SYS_sched_yield(0x7f0ebc3f5c40, 0x7f0ebc3f5c40, 0, 0x7273752f3a6e6962) = 0
SYS_sched_yield(0x7f0ebc3f5c40, 0x7f0ebc3f5c40, 0, 0x7273752f3a6e6962) = 0

I'm not very experienced at introspecting processes, so I'm afraid this is the limit of what I know how to do so far. I am happy to run any other suggested commands on a server with such a runaway process. I have a node with a runaway process on it right now, which I will probably leave running in case there is more introspection I can do.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: pcs 0.9.149-1ubuntu1.1
ProcVersionSignature: Ubuntu 4.4.0-34.53-generic 4.4.15
Uname: Linux 4.4.0-34-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
Date: Wed May 23 18:50:52 2018
InstallationDate: Installed on 2016-08-22 (639 days ago)
InstallationMedia: Ubuntu-Server 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
ProcEnviron:
 LANGUAGE=en_US:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US
 SHELL=/bin/bash
SourcePackage: pcs
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.default.pcsd: [modified]
mtime.conffile..etc.default.pcsd: 2018-05-23T18:36:58.697084

Casey & Gina (caseyandgina) wrote :

Any response on this? It is affecting many of the servers where we are attempting to use pcs.

I believe this, as well as #1772098, would be remedied by simply providing a newer version of PCS.

Casey & Gina (caseyandgina) wrote :

Can somebody please respond?

Casey & Gina (caseyandgina) wrote :

Well, in case anybody ever looks at this bug, it may be caused by a bug in the Ruby interpreter. This is fixed in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=876377

It would be great to get this fix backported to Ubuntu 16.04, as otherwise pcs is pretty much unusable.

I've tried installing the PCS 0.9.164 .deb from Ubuntu 18 on Ubuntu 16, and can confirm that it still exhibits the same problem.
