Severe performance degradation after updating to 5.0.0-15 due to MDS mitigation

Bug #1829255 reported by munbi
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

After updating to the latest kernel 5.0.0-15 I'm experiencing a severe performance degradation on my machine with one of my applications (a gbc emulator developed by me).

I saw about a 27% performance loss, going from 55fps to 40.
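For reference, the arithmetic behind that figure can be checked with a short sketch. The helper names below are made up for illustration. Going from 55 fps to 40 fps also means each frame takes roughly 18.2 ms before and 25 ms after, i.e. 37.5% more time per frame.

```python
def percent_drop(before, after):
    """Relative loss, in percent, when a rate falls from `before` to `after`."""
    return (before - after) / before * 100.0


def frame_time_ms(fps):
    """Time budget of a single frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"fps drop: {percent_drop(55, 40):.1f}%")
    print(f"frame time: {frame_time_ms(55):.2f} ms -> {frame_time_ms(40):.2f} ms")
```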

Booting an old 5.0.0-13 kernel or disabling mds mitigation at boot with 'mds=off' solves the problem.
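A minimal sketch for checking which state a running system is in, assuming a Linux kernel that carries the MDS patches and therefore exposes /sys/devices/system/cpu/vulnerabilities/mds. The helper names are hypothetical; the script only reads the kernel command line and the sysfs status file.

```python
from pathlib import Path
from typing import Optional

# Status file exposed by kernels that include the MDS mitigation patches.
MDS_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/mds")
CMDLINE = Path("/proc/cmdline")


def mds_cmdline_setting(cmdline: str) -> Optional[str]:
    """Return the value of the last mds= option on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("mds="):
            value = token[len("mds="):]
    return value


def read_text_or_none(path: Path) -> Optional[str]:
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text_or_none(CMDLINE) or ""
    print("mds= on command line:", mds_cmdline_setting(cmdline))
    print("kernel-reported MDS status:", read_text_or_none(MDS_SYSFS))
```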

I know there isn't much to do about it (apart from disabling the mitigation), but I'm reporting this
because I spent half a day bisecting with git because I thought the problem was with my latest commits.

Security updates are obviously important and a priority, but I think that, at least with this kind
of mitigation that causes a big performance degradation, a message or a simple alert should be shown
to the user during the update. Something like: "Please be aware that this kernel update can
result in performance loss. See this page for more info." and then link the specific SecurityTeam
page (which in this case is https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/MDS )

ProblemType: Bug
DistroRelease: Ubuntu 19.04
Package: linux-image-5.0.0-15-generic 5.0.0-15.16
ProcVersionSignature: Ubuntu 5.0.0-15.16-generic 5.0.6
Uname: Linux 5.0.0-15-generic x86_64
ApportVersion: 2.20.10-0ubuntu27
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: gabriele 2045 F.... pulseaudio
 /dev/snd/controlC1: gabriele 2045 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Wed May 15 16:36:36 2019
HibernationDevice: RESUME=UUID=e43b75a3-a278-4d4e-83e8-54a9e0c95276
InstallationDate: Installed on 2018-05-31 (349 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
MachineType: Dell Inc. Latitude E6540
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.0.0-15-generic root=UUID=8a8a023d-2a12-4e14-b187-7beaf4d89763 ro quiet splash vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-5.0.0-15-generic N/A
 linux-backports-modules-5.0.0-15-generic N/A
 linux-firmware 1.178
SourcePackage: linux
UpgradeStatus: Upgraded to disco on 2019-04-23 (21 days ago)
dmi.bios.date: 10/09/2018
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A26
dmi.board.name: 0725FP
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA26:bd10/09/2018:svnDellInc.:pnLatitudeE6540:pvr00:rvnDellInc.:rn0725FP:rvrA00:cvnDellInc.:ct9:cvr:
dmi.product.name: Latitude E6540
dmi.product.sku: 05BE
dmi.product.version: 00
dmi.sys.vendor: Dell Inc.

munbi (gabriele) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Tyler Hicks (tyhicks) wrote :

Hello and thanks for the bug report. We hate to hear that you're seeing such a performance hit on your application when the MDS mitigations are enabled. Unfortunately, we are simply following Intel's recommendations[1] for mitigating MDS attacks. The kernel changes are relatively simple and the overhead comes from the kernel calling into the CPU microcode to flush the internal CPU buffers as well as the inefficiencies involved with flushing such buffers. Since the recommendation includes flushing the buffers before exiting from the kernel to userspace, workloads which are syscall heavy are likely to see the largest performance hit.
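One rough way to see this effect on a given machine is to time a call that stays in user space against a call that enters the kernel, once with the mitigation on and once with 'mds=off'. The sketch below is only an illustration: the function names are made up, os.getppid() is used as a cheap syscall, and Python's interpreter overhead is included in both numbers, so only the difference between the two is meaningful.

```python
import os
import time


def time_per_call_ns(func, iterations=200_000):
    """Average wall-clock nanoseconds per call of `func` over a tight loop."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        func()
    return (time.perf_counter_ns() - start) / iterations


def userspace_only():
    """Baseline: a call that never leaves user space."""
    return None


def one_syscall():
    """A call that enters the kernel and returns to user space once."""
    return os.getppid()


if __name__ == "__main__":
    base = time_per_call_ns(userspace_only)
    with_syscall = time_per_call_ns(one_syscall)
    print(f"user-space call: {base:.0f} ns")
    print(f"syscall:         {with_syscall:.0f} ns")
    print(f"extra per kernel round trip: {with_syscall - base:.0f} ns")
```

Comparing the last line across the two boot configurations gives an estimate of the per-syscall cost added by the buffer flush on that CPU.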

I like your idea of alerting the user of such a potential performance hit, on the surface. However, the vast majority of users won't know how to handle that information and, even worse, it could scare users out of taking the update even though the mitigations may not significantly impact their typical usage. Very few users will have the need to bisect kernel changes to identify a performance decrease that they've measured.

Another problem is that there's not a consistent way to alert users with pertinent information. The updates are provided to desktop systems, to headless servers, packaged in pre-built cloud images, delivered automatically to IoT devices that don't support typical user logins, etc. Even across something like desktop systems, users apply the updates in a variety of ways (manually with apt, automatically with unattended-upgrades, with a GUI such as update-manager, etc.). This is why we provide out-of-band information like this in Ubuntu Security Announcements[2] and, in some cases, more verbose KnowledgeBase articles.

What I can promise is that we'll continue to work with Intel and the upstream kernel community in the case that future improvements are identified for the existing MDS mitigations.

Thanks again for opening this bug report and please don't take the "Won't Fix" bug status as your voice being ignored. At the very least, when writing up the next KnowledgeBase article, I now know that any time we spend describing performance impacts will be much appreciated by someone out there. :)

[1] https://software.intel.com/security-software-guidance/insights/deep-dive-intel-analysis-microarchitectural-data-sampling
[2] https://usn.ubuntu.com

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix
Tyler Hicks (tyhicks) wrote :

I'll point out that munbi is seeing this hit using the following CPU sig and microcode revision:

  sig=0x306c3, pf=0x10, revision=0x27
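Assuming the sig value is the CPUID leaf 1 EAX signature (the format the kernel's Intel microcode loader logs), it can be decoded into family, model and stepping with the standard bit layout. The function name below is made up for illustration. For 0x306c3 this gives family 6, model 0x3c, stepping 3, which corresponds to a Haswell client part.

```python
def decode_cpu_signature(sig):
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended model bits only apply to family 6 and family 15 parts.
    if family in (0x6, 0xF):
        model |= ext_model << 4
    # The extended family is only added for family 15 parts.
    if family == 0xF:
        family += ext_family
    return family, model, stepping


if __name__ == "__main__":
    family, model, stepping = decode_cpu_signature(0x306C3)
    print(f"family={family} model={model:#x} stepping={stepping}")
```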

munbi (gabriele) wrote :

Tyler, thanks a lot for taking the time to examine the issue and writing such a complete response.
I agree with all your points.

I'll take this as an opportunity to try to learn more about profiling and user/kernel space switching and optimization, because this made me curious:

> Since the recommendation includes flushing the buffers before exiting from the kernel to userspace,
> workloads which are syscall heavy are likely to see the largest performance hit.

My emulator is basically a simple asm parser with SDL2 as the graphics backend, and I'm not doing (that I'm aware of) any ioctl or syscall. So I need to understand what is happening and what I'm doing wrong.

Back to study now :-) and thanks again

munbi (gabriele) wrote :

Just a quick follow-up. The impact was so severe in my code because I was calling SDL_PollEvent() very often to read the key status, which was basically blocking my code.

That was one of the first things I wrote and forgot about it.

So that was really a context switching overhead problem. Thanks again.
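The usual fix for this pattern is to read input once per frame and reuse the result for every emulated CPU step. The sketch below is not the emulator's actual code: poll_events, step_cpu and steps_per_frame are hypothetical names, with poll_events standing in for a loop that drains SDL_PollEvent().

```python
def run_frame(poll_events, step_cpu, steps_per_frame):
    """Run one emulated frame, reading input once instead of once per CPU step."""
    keys = poll_events()              # single trip through the event queue
    for _ in range(steps_per_frame):
        step_cpu(keys)                # every step reuses the cached key state
    return keys


if __name__ == "__main__":
    calls = {"poll": 0, "step": 0}

    def fake_poll():
        # Stand-in for draining the SDL event queue.
        calls["poll"] += 1
        return {"A": False, "START": True}

    def fake_step(keys):
        # Stand-in for executing one emulated instruction.
        calls["step"] += 1

    run_frame(fake_poll, fake_step, steps_per_frame=1000)
    print(calls)
```

With this structure the number of trips into the event system scales with the frame rate rather than with the number of emulated instructions.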
