AppArmor

Online profile replacement appears to cause severe CPU contention (likely RCU-related?)

Bug #1856256 reported by Bill F on 2019-12-12

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	AppArmor	New	Undecided	Unassigned

Bug Description

First of all, I would like to loudly disclaim that this report is very qualitative in nature, and I cannot reliably reproduce the problem. However, my hope is that a clear description of the observed behavior will be sufficient to indicate potential underlying causes to someone with the right knowledge.

That said, here is the behavior:

While "apparmor_parser -r" is running, on a large (> 50 cores) host with ~50 apache server processes (which use AA subprofiles, primarily for file system isolation), the volume of requests handled drops precipitously (from ~thousands/minute to <100/minute), while both apparmor_parser and the apache processes appear to be consuming high amounts of CPU continuously (apparmor_parser, specifically, was consuming at least a full core in all samples observed).

Furthermore, and this is the interesting part, *every 10 seconds*, two things occur:

1) AA logs one message of the form below, where the sequence of $profile_name values (again, one every 10 seconds) represents a lexicographically-ordered iteration over all of the relevant AA profiles on this host.

$timestamp $host kernel: [...] audit: type=1400 audit(...): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="$profile_name" pid=<...> comm="apparmor_parser"

2) A burst of requests are successfully processed.

Unfortunately, someone rebooted the host before I was able to capture any additional information (or attempt any finer-grained remediation measures), and the host came back up in a nominal state.

Again, I realize all of this is very vague/high-level, but two things stand out to me as particularly significant here:

1) The behavior described above began *immediately* after the invocation of "appamor_parser -r", and that "make some progress every 10 seconds" periodicity strongly implies some sort of resource contention. And, for whatever it's worth from a random person on the internet, a reasonably thorough forensic analysis has uncovered absolutely no other potential sources of that contention/periodicity.

2) From https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html: "If a CPU is still holding out 10 seconds into the grace period, RCU will invoke resched_cpu() on it regardless of its nohz_full state."

My initial thought was that this behavior smelled like some kind of spinlock contention -- but I simply could not explain the consistency of the 10-second intervals, at which both apparmor_parser and the apache processes (and, for that matter, various other processes running on the host) were able to make progress.

The only thing I can think of that might explain this magical "every 10 seconds" behavior would be some kind of an RCU-related bug (e.g. an infinite loop inside of a read-side critical section).

One other bit of nuance that might be worth sharing (which I elided in the description above in the hopes of increasing the odds of someone reading this far :)) is that the system degradation occurred in two phases:

a) Immediately after apparmor_parser started running, the degradation (in particular, in terms of request-processing throughput) began, but after about ~15 seconds, the request volume briefly appeared to recover.

b) However, about ~10 seconds later, during the same second that the very first "same as current profile, skipping" message was logged, the degradation (and the unmistakeable 10-second periodicity) began in earnest and continued until reboot.

Kernel info:
$ uname -a
Linux my-host 5.0.0-1025-gcp #26~18.04.1-Ubuntu SMP Mon Nov 11 13:09:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Package info:
$ dpkg -l | grep -i apparmor
ii apparmor 2.12-4ubuntu5.1 amd64 user-space parser utility for AppArmor
ii apparmor-profiles 2.12-4ubuntu5.1 all experimental profiles for AppArmor security policies
ii apparmor-utils 2.12-4ubuntu5.1 amd64 utilities for controlling AppArmor
ii libapparmor1:amd64 2.12-4ubuntu5.1 amd64 changehat AppArmor library
ii python3-apparmor 2.12-4ubuntu5.1 amd64 AppArmor Python3 utility library
ii python3-libapparmor 2.12-4ubuntu5.1 amd64 AppArmor library Python3 bindings

See original description

Bill F (bill-f-666) on 2019-12-12

summary:

Online profile replacement appears to cause severe CPU contention
+ (likely RCU-related?)

Bill F (bill-f-666) on 2019-12-13

description:

updated

Revision history for this message

John Johansen (jjohansen) wrote on 2019-12-13:

How many profiles and subprofiles are there?

Can you provide the profile set used in this bug report to test against?

How many processors does the system have?

How much memory does the system have? And how loaded is the system?

So first I will provide a generic overview of profile replacement and then I will try to get to your individual points.

AppArmor does make use of rcu to try to avoid take locks as much as possible during mediation. This is a deliberate design decision as profile replacement should take place far less than the actual mediation checks. If there are a lot of objects flooding the rcu queue, rcu is known to regress and cause its own slow downs. Exactly how will depend on the kernel.

Profile replacement is also can be a costly operation in that locking and cache overheads come into play.

Profile replacement takes place in phases. Most of them can overlap with concurrent replacements.

1. Compile - there is check done to see if there is a current previously compiled version of the profile to load. If there is the apparmor_parser will load that into the kernel. If not a compile will be done which can be cpu intensive (use 100% of the cpu), and can use multiple cpus (depending on how --jobs is set, the default is the number of cpus).

Currently the cache check and compile is done for each profile file. So for multiple profiles this step can overlap the following steps.

2. Profile load. The profile is loaded into the kernel and the kernel unpacks and verifies the profile being loaded.

3. Duplicate elimination check. The loaded profile is checked against the current profiles if it is found to be an exact duplicate. A message is logged that the profile is a duplicate and the rest of the steps are skipped.

4. Profile replacement. This involves taking locks, that will block other profile replacements (steps 1-3 can happen currently) and will block certain tasks like the creation of new labels. So depending on the operation done this could cause a slow down due to lock contention.

The live profile is replaced with the new profile and the label tree is walked to replace all labels with references to the old profile to the new profile.

The locks are released and profile replacement is done from a userspace perspective.

5. Task profile replacement. Individual tasks detect that their confinement label is stale and replace it with the new label generated in step 4. Putting their references to the old label.

6. Once all references are put freeing of labels use rcu to put the label on rcu call back.

Profile replacement will also cause cache hits to miss forcing revalidations which means fast past mediation will be lost until the cache is replaced. Hence mediation in general will be slowed down for a time due to replacement. The exact amount of time is dependent on the application access patterns and kernel version.