Win10 guest unusable after a few minutes

Bug #1877716 reported by Xavier Ruppen
32
This bug affects 4 people
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned

Bug Description

On Arch Linux, the recent qemu package update seems to misbehave on some systems. In my case, my Windows 10 guest runs fine for around 5 minutes and then start to get really sluggish, even unresponsive. It needs to be forced off. I could reproduce this on a minimal VM with no passthrough, although my current testing setup involves an nvme pcie passthrough.

I bisected it to the following commit which rapidly starts to run sluggishly on my setup:
https://github.com/qemu/qemu/commit/73fd282e7b6dd4e4ea1c3bbb3d302c8db51e4ccf

I've ran the previous commit ( https://github.com/qemu/qemu/commit/b321051cf48ccc2d3d832af111d688f2282f089b ) for the entire night without an issue so far.

I believe this might be a duplicate of https://bugs.launchpad.net/qemu/+bug/1873032 , although I'm not sure.

Linux cc 5.6.10-arch1-1 #1 SMP PREEMPT Sat, 02 May 2020 19:11:54 +0000 x86_64 GNU/Linux
AMD Ryzen 7 2700X Eight-Core Processor

Revision history for this message
Xavier Ruppen (zkrx) wrote :
Revision history for this message
Xavier Ruppen (zkrx) wrote :

Note that bisecting is difficult due to the nature of the bug (does not appear before 5 to 10 minutes on my machine).

Revision history for this message
Xavier Ruppen (zkrx) wrote :
Revision history for this message
Xavier Ruppen (zkrx) wrote :

I can also replicate the problem on current master. I can solve it by building master with --disable-linux-io-uring.

I also tried building Linux 5.4.39 where the issue happens too:
Linux cc 5.4.39qemu #1 SMP PREEMPT Sat May 9 12:11:38 CEST 2020 x86_64 GNU/Linux

I attached the logs of that latest run.

Xavier Ruppen (zkrx)
summary: - Win10 guest unsuable after a few minutes
+ Win10 guest unusable after a few minutes
Revision history for this message
Stefan Hajnoczi (stefanha) wrote : Re: [Bug 1877716] [NEW] Win10 guest unsuable after a few minutes

On Sat, May 9, 2020 at 9:16 AM Xavier <email address hidden> wrote:
>
> Public bug reported:
>
> On Arch Linux, the recent qemu package update seems to misbehave on some
> systems. In my case, my Windows 10 guest runs fine for around 5 minutes
> and then start to get really sluggish, even unresponsive. It needs to be
> forced off. I could reproduce this on a minimal VM with no passthrough,
> although my current testing setup involves an nvme pcie passthrough.
>
> I bisected it to the following commit which rapidly starts to run sluggishly on my setup:
> https://github.com/qemu/qemu/commit/73fd282e7b6dd4e4ea1c3bbb3d302c8db51e4ccf

Thanks for bisecting this bug! Arch Linux can work around it in the
short term by building with ./configure --disable-linux-io-uring
and/or removing the liburing build dependency.

I will try to reproduce the issue and send a QEMU patch to fix it.

Revision history for this message
Stefan Hajnoczi (stefanha) wrote :

On Mon, May 11, 2020 at 10:12 AM Stefan Hajnoczi <email address hidden> wrote:
> On Sat, May 9, 2020 at 9:16 AM Xavier <email address hidden> wrote:
> >
> > Public bug reported:
> >
> > On Arch Linux, the recent qemu package update seems to misbehave on some
> > systems. In my case, my Windows 10 guest runs fine for around 5 minutes
> > and then start to get really sluggish, even unresponsive. It needs to be
> > forced off. I could reproduce this on a minimal VM with no passthrough,
> > although my current testing setup involves an nvme pcie passthrough.
> >
> > I bisected it to the following commit which rapidly starts to run sluggishly on my setup:
> > https://github.com/qemu/qemu/commit/73fd282e7b6dd4e4ea1c3bbb3d302c8db51e4ccf
>
> Thanks for bisecting this bug! Arch Linux can work around it in the
> short term by building with ./configure --disable-linux-io-uring
> and/or removing the liburing build dependency.

Hmm...a brief look at the Arch Linux package source suggests QEMU is
not being built with io_uring enabled. Anatol, please confirm whether
this is correct.

If io_uring is not enabled then this bug may affect most existing
users on Linux. Initially I thought it was because Arch Linux had
enabled the new io_uring feature but I was probably mistaken.

Revision history for this message
post-factum (post-factum) wrote :

Stefan,

Arch explicitly disabled io_uring for qemu after discovering this bug. That's why you don't see it enabled in the recent version.

5.0.0-6 doesn't have io_uring enabled.
5.0.0-5 does have it, and you can grab it here: [1].

[1] https://archive.archlinux.org/packages/q/qemu/qemu-5.0.0-5-x86_64.pkg.tar.zst

Revision history for this message
Alexander Bus (busimus) wrote :

As of version 5.0.0-6 (released a day ago), qemu on Arch is no longer built with io_uring support because of this bug. The previous version (5.0.0-5) was built with io_uring support and this bug was happening on my machine. Before the fixed version was released I myself added --disable-linux-io-uring to the build file of 5.0.0-5 and that fixed the issue for me. Now I'm running 5.0.0-6 and it's working as good as ever.
You can track the changes of qemu build files here: https://git.archlinux.org/svntogit/packages.git/log/trunk?h=packages/qemu

Revision history for this message
Stefan Hajnoczi (stefanha) wrote :

That's good news! Most users do not have io_uring enabled yet.

Revision history for this message
Stefan Hajnoczi (stefanha) wrote :

I have been able to reproduce the issue and found that nodes are not being removed from the AioContext->aio_handlers list when aio_set_fd_handler() is called. perf shows that large amounts of CPU time are spent in aio_pending().

Working on getting to the bottom of the issue and fixing it.

Revision history for this message
Anatol Pomozov (anatol) wrote :

Thank you Stefan for looking at this issue.

As Alexander and @postfactum mentioned Arch disabled io_uring feature after this bug has been discovered. Here is an Arch Linux issue that tracks it https://bugs.archlinux.org/task/66578

Revision history for this message
Stefan Hajnoczi (stefanha) wrote :
Revision history for this message
post-factum (post-factum) wrote :

Unofficial x86_64 build of v5.0.0 with those 2 patches for Arch is here: [1].

[1] https://download.opensuse.org/repositories/home:/post-factum/Arch/x86_64/

Revision history for this message
Xavier Ruppen (zkrx) wrote :

Hi Stefan,

I applied your series on top of master with io_uring enabled and I no longer experience the issue. Let me know if you need additional testing.

Thank you for fixing this so promptly.

Revision history for this message
Stefan Hajnoczi (stefanha) wrote : Re: [Bug 1877716] Re: Win10 guest unusable after a few minutes

On Tue, May 12, 2020, 16:25 zkrx <email address hidden> wrote:

> Hi Stefan,
>
> I applied your series on top of master with io_uring enabled and I no
> longer experience the issue. Let me know if you need additional testing.
>
> Thank you for fixing this so promptly.
>

That's great! Thanks for your bug report and time investigating this issue.

Stefan

>

Revision history for this message
Anatol Pomozov (anatol) wrote :

Thank you Stefan for the fixes. Once these patches land the upstream repo I'll pull it into the Arch package and reenable io_uring.

Revision history for this message
Stefan Hajnoczi (stefanha) wrote :
Thomas Huth (th-huth)
Changed in qemu:
status: New → Fix Committed
Thomas Huth (th-huth)
Changed in qemu:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.