Increased lock contention in aa_{get,set}_buffer

Bug #2069478 reported by Sebastian Mayr
This bug affects 1 person
Affects: linux-aws (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

After upgrading our images from 5.15.0-1052-aws to 5.15.0-1053-aws, we noticed increased CPU usage caused by lock contention in the AppArmor code paths. We have since upgraded past that version, but the problem persisted.

Digging further, we found that rapid, parallel invocation of getaddrinfo was the cause of the contention: the syscalls it issues (openat, stat, etc.) end up spinning on that lock, which in turn maxed out the CPU on those instances. These are c6i.24xlarge instances, so 96 CPUs. I'm attaching a perf flamegraph of the exact stack traces we're seeing.
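To illustrate the access pattern described above, here is a minimal sketch of many threads calling getaddrinfo at once. It is not the reporter's actual workload: the function name `resolve_in_parallel`, the host "localhost", and the call and worker counts are all assumptions chosen for illustration, and the sketch is not guaranteed to reproduce the contention on any particular kernel.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve_in_parallel(host, calls, workers, resolver=socket.getaddrinfo):
    """Run `calls` lookups of `host` spread across `workers` threads."""

    def lookup(_index):
        # Each real lookup may open and stat resolver configuration files,
        # which is where openat/stat syscalls like those in the report arise.
        return resolver(host, None)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lookup, range(calls)))


if __name__ == "__main__":
    # Hypothetical load: 10,000 lookups of "localhost" from 96 threads.
    results = resolve_in_parallel("localhost", calls=10_000, workers=96)
    print(len(results), "lookups completed")
```

The `resolver` parameter exists only so the lookup can be swapped out (for example with a stub) without touching the system resolver.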

We did not expect a patch-level kernel upgrade to cause this increase in CPU usage. However, we managed to work around the issue via caching, so this report is mostly informational and a check on whether the change was intended. Looking at the upstream kernel, it seems this was eventually fixed with per-CPU buffers.
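The report does not say how the caching workaround was implemented. As one possible shape, here is a small sketch of a time-limited cache in front of getaddrinfo, so that repeated lookups of the same host skip the resolver and its syscalls. The class name `CachingResolver`, the 30-second TTL, and the injectable `resolver` and `clock` parameters are all hypothetical.

```python
import socket
import threading
import time


class CachingResolver:
    """Cache lookup results per (host, port) for a fixed time-to-live."""

    def __init__(self, ttl_seconds=30.0, resolver=socket.getaddrinfo,
                 clock=time.monotonic):
        self._ttl = ttl_seconds
        self._resolver = resolver
        self._clock = clock
        self._entries = {}  # (host, port) -> (expiry time, result)
        self._lock = threading.Lock()

    def lookup(self, host, port=None):
        key = (host, port)
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry[0] > now:
                return entry[1]  # Fresh hit: no resolver call, no syscalls.
        # Miss or expired: resolve outside the lock so a slow lookup
        # does not block readers of other keys.
        result = self._resolver(host, port)
        with self._lock:
            self._entries[key] = (now + self._ttl, result)
        return result


if __name__ == "__main__":
    cache = CachingResolver(ttl_seconds=30.0)
    first = cache.lookup("localhost")
    second = cache.lookup("localhost")  # Served from the cache.
    print(first == second)
```

Concurrent misses for the same key may each call the resolver once. That keeps the sketch simple at the cost of a few duplicate lookups when an entry expires.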

The only related change we found in the diff between these two versions was the addition of AppArmor checks for mqueues, which I don't believe we're using explicitly anywhere. I'm not sure whether that change had other effects, though.

This is all on Ubuntu 20.04 with the AWS-specific kernel package, although I'm assuming it applies to the Ubuntu kernel generally. I was unable to map these to the "real" release versions, though, so I'm reporting against this package, for which I have the exact version numbers.
