-icount,sleep=off mode is broken (target slows down or hangs)

Bug #1790460 reported by Artem Pisarenko
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
QEMU
Undecided
Unassigned

Bug Description

QEMU running with options "-icount,sleep=off -rtc clock=vm" doesn't execute emulation at maximum possible speed.
Target virtual clock may run faster or slower than realtime clock by N times, where N value depends on various unrelated conditions (i.e. random from the user point of view). The worst case is when target hangs (hopefully, early in booting stage).
Example scenarios I've described here: http://lists.nongnu.org/archive/html/qemu-discuss/2018-08/msg00102.html

QEMU process just sleeps most of the time (polling, waiting some condition, etc.). I've tried to debug issue and came to 99% conclusion that there are racing somewhere in qemu internals.

The feature is broken since v2.6.0 release.
Bad commit is 281b2201e4e18d5b9a26e1e8d81b62b5581a13be by Pavel Dovgalyuk, 03/10/2016 05:56 PM:

  icount: remove obsolete warp call

  qemu_clock_warp call in qemu_tcg_wait_io_event function is not needed
  anymore, because it is called in every iteration of main_loop_wait.

  Reviewed-by: Paolo Bonzini <email address hidden>

  Signed-off-by: Pavel Dovgalyuk <email address hidden>
  Message-Id: <20160310115603.4812.67559.stgit@PASHA-ISP>
  Signed-off-by: Paolo Bonzini <email address hidden>

I've reverted commit to all major releases and latest git master branch. Issue was fixed for all of them. My adaptation is trivial: just restoring removed function call before "qemu_cond_wait(...)" line.

I'm sure following bugs are just particular cases of the issue: #1774677, #1653063 .

Revision history for this message
Artem Pisarenko (artem.pisarenko) wrote :

This modification also fixes issue:
https://git.greensocs.com/qemu/qbox/commit/a8ed106032e375e715a531d6e93e4d9ec295dbdb
Although I tested it only on v2.12.0 and didn't noticed performance improvement.

Revision history for this message
Artem Pisarenko (artem.pisarenko) wrote :

Long-term testing showed that my trivial adaptation didn't fixed issue and I sticked to modification from QBox. It didn't failed yet.
And yes, issue still relevant to current master branch.

Revision history for this message
Peter Maydell (pmaydell) wrote :

I think we fixed this bug in commit 013aabdc665e4256b38d which would have been in the 3.1.0 release (this is why we closed #1774677, which as you say is the same issue).

Changed in qemu:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers