Increased lock contention in aa_{get,set}_buffer
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-aws (Ubuntu) | New | Undecided | Unassigned |
Bug Description
After upgrading our images from 5.15.0-1052-aws to 5.15.0-1053-aws, we noticed increased CPU usage caused by lock contention in the AppArmor code paths. We've since continued upgrading past that version, but the problem persisted.
Digging further, we found that rapid, parallel invocation of getaddrinfo was the cause of the contention: the openat, stat, and similar syscalls it issues end up spinning on that lock, which in turn maxed out the CPU on those instances. These are c6i.24xlarge instances, so 96 CPUs each. I'm attaching a perf flamegraph of the exact stack traces we're seeing.
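For anyone trying to reproduce this, the access pattern described above can be approximated with a small load generator. This is only a sketch under assumptions: the function name `hammer_getaddrinfo`, its parameters, and the default host are hypothetical and not taken from the affected workload, and a thread pool in a single Python process may not generate as much parallel syscall pressure as the original services did, so running several processes side by side may be needed.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def hammer_getaddrinfo(host="localhost", workers=96, calls_per_worker=1000,
                       resolve=socket.getaddrinfo):
    """Call resolve(host, None) from many threads at once.

    Returns (successful_calls, elapsed_seconds). All names and defaults
    here are illustrative assumptions, not part of the original report.
    """

    def worker(_index):
        ok = 0
        for _ in range(calls_per_worker):
            try:
                # Each lookup typically opens and stats resolver config
                # files, which is where the syscalls in the report come from.
                resolve(host, None)
                ok += 1
            except OSError:
                # socket.gaierror is a subclass of OSError; failed lookups
                # are simply not counted.
                pass
        return ok

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(worker, range(workers)))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    done, seconds = hammer_getaddrinfo()
    print(f"{done} lookups in {seconds:.2f}s")
```

Comparing the elapsed time and a perf profile of this script on both kernel versions should show whether the same AppArmor stacks appear.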
We did not expect a patch-level kernel upgrade to cause this increase in CPU usage. However, we managed to work around the issue via caching, so this report is mainly informational.
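The report does not say how the caching was done, so the following is only one possible shape of such a workaround: an in-process cache with a time-to-live in front of getaddrinfo. The class name `CachedResolver`, the 30-second default TTL, and the injectable `resolve` and `clock` parameters are all assumptions made for illustration.

```python
import socket
import threading
import time


class CachedResolver:
    """Cache getaddrinfo results per (host, port) for a fixed TTL.

    Hypothetical helper: not the workaround actually deployed, just a
    sketch of the general idea of avoiding repeated lookups.
    """

    def __init__(self, ttl_seconds=30.0, resolve=socket.getaddrinfo,
                 clock=time.monotonic):
        self._ttl = ttl_seconds
        self._resolve = resolve
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}  # (host, port) -> (expires_at, result)

    def getaddrinfo(self, host, port=None):
        key = (host, port)
        now = self._clock()
        with self._lock:
            hit = self._entries.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]
        # Miss or expired entry: resolve outside the lock so a slow lookup
        # does not block callers asking for other hosts.
        result = self._resolve(host, port)
        with self._lock:
            self._entries[key] = (now + self._ttl, result)
        return result
```

With this design, concurrent misses for the same key may each perform a lookup, but steady-state traffic for a hot hostname touches the resolver (and therefore the filesystem syscalls) only once per TTL window.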
The only related change we found in the diff between these two versions was the addition of AppArmor checks for mqueues, which I don't believe we use explicitly anywhere. I'm not sure whether that change had other effects, though.
This is all on Ubuntu 20.04 with the AWS-specific kernel package, although I assume it applies to the Ubuntu kernel generally. I was unable to map these to the "real" release versions, though, so I'm reporting against this package since I have its exact version numbers.