Comment 0 for bug 1856256

Revision history for this message
Bill F (bill-f-666) wrote : Online profile replacement appears to cause severe CPU contention

First of all, I would like to clearly disclaim that this report is very qualitative in nature, and I cannot reliably reproduce the problem. However, my hope is that a clear description of the observed behavior will be sufficient to indicate potential underlying causes to someone with the right knowledge.

That said, here is the behavior:

While "apparmor_parser -r" was running, on a large (> 50 cores) host with ~50 apache server processes (which use AA subprofiles, primarily for file system isolation), the volume of requests handled dropped precipitously (from ~thousands/minute to <100/minute), while both apparmor_parser and the apache processes appear to be consuming high amounts of CPU continuously (apparmor_parser, specifically, was consuming at least a full core in all samples observed).

Furthermore, and this is the interesting part, *every 10 seconds*, two things occur:

1) AA logs one message of the form below, where the sequence of $profile_name values (again, one every 10 seconds) represents a lexicographically-ordered iteration over all of the relevant AA profiles on this host.

$timestamp $host kernel: [...] audit: type=1400 audit(...): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="$profile_name" pid=<...> comm="apparmor_parser"

2) A burst of requests are successfully processed.

Unfortunately, someone rebooted the host before I was able to capture any additional information (or attempt any finer-grained remediation measures), and the host came back up in a nominal state.

Again, I realize all of this is very vague/high-level, but two things stand out to me as particularly significant here:

1) The behavior described above began *immediately* after the invocation of "appamor_parser -r", and that "make some progress every 10 seconds" periodicity strongly implies some sort of resource contention. And, for whatever it's worth from a random person on the internet, a reasonably thorough forensic analysis has uncovered absolutely no other potential sources of that contention/periodicity.

2) From https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html: "If a CPU is still holding out 10 seconds into the grace period, RCU will invoke resched_cpu() on it regardless of its nohz_full state."

My initial thought was that this behavior smelled like some kind of spinlock contention -- but I simply could not explain the consistency of the 10-second intervals, at which both apparmor_parser and the apache processes (and, for that matter, various other processes running on the host) were able to make progress.

The only thing I can think of that might explain this magical "every 10 seconds" behavior would be some kind of an RCU-related bug (e.g. an infinite loop inside of a read-side critical section).

One other bit of nuance that might be worth sharing (which I elided in the description above in the hopes of increasing the odds of someone reading this far :)) is that the system degradation occurred in two phases:

a) Immediately after apparmor_parser started running, the degradation (in particular, in terms of request-processing throughput) began, but after about ~15 seconds, the request volume briefly appeared to recover.

b) However, about ~10 seconds later, during the same second that the very first "same as current profile, skipping" message was logged, the degradation (and the unmistakeable 10-second periodicity) began in earnest and continued until reboot.

Kernel info:
$ uname -a
Linux my-host 5.0.0-1025-gcp #26~18.04.1-Ubuntu SMP Mon Nov 11 13:09:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Package info:
$ dpkg -l | grep -i apparmor
ii apparmor 2.12-4ubuntu5.1 amd64 user-space parser utility for AppArmor
ii apparmor-profiles 2.12-4ubuntu5.1 all experimental profiles for AppArmor security policies
ii apparmor-utils 2.12-4ubuntu5.1 amd64 utilities for controlling AppArmor
ii libapparmor1:amd64 2.12-4ubuntu5.1 amd64 changehat AppArmor library
ii python3-apparmor 2.12-4ubuntu5.1 amd64 AppArmor Python3 utility library
ii python3-libapparmor 2.12-4ubuntu5.1 amd64 AppArmor library Python3 bindings