Activity log for bug #1894942

Date Who What changed Old value New value Message
2020-09-09 05:19:46 bugproxy bug added bug
2020-09-09 05:19:48 bugproxy tags architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004
2020-09-09 05:19:49 bugproxy attachment added Debug only patch that simulates PV guest IO on a non-PV machine https://bugs.launchpad.net/bugs/1894942/+attachment/5408906/+files/0001-DEBUG-add-option-to-fake-IO.patch
2020-09-09 05:19:51 bugproxy ubuntu: assignee Skipper Bug Screeners (skipper-screen-team)
2020-09-09 05:19:55 bugproxy affects ubuntu qemu (Ubuntu)
2020-09-09 05:31:53 Frank Heimes bug task added ubuntu-z-systems
2020-09-09 05:32:21 Frank Heimes ubuntu-z-systems: assignee Skipper Bug Screeners (skipper-screen-team)
2020-09-09 05:32:34 Frank Heimes qemu (Ubuntu): assignee Skipper Bug Screeners (skipper-screen-team) Canonical Server Team (canonical-server)
2020-09-09 05:32:40 Frank Heimes ubuntu-z-systems: importance Undecided High
2020-09-09 05:32:51 Frank Heimes bug added subscriber Christian Ehrhardt 
2020-09-09 06:40:06 Christian Ehrhardt  nominated for series Ubuntu Xenial
2020-09-09 06:40:06 Christian Ehrhardt  bug task added qemu (Ubuntu Xenial)
2020-09-09 06:40:06 Christian Ehrhardt  nominated for series Ubuntu Focal
2020-09-09 06:40:06 Christian Ehrhardt  bug task added qemu (Ubuntu Focal)
2020-09-09 06:40:06 Christian Ehrhardt  nominated for series Ubuntu Bionic
2020-09-09 06:40:06 Christian Ehrhardt  bug task added qemu (Ubuntu Bionic)
2020-09-09 06:40:14 Christian Ehrhardt  qemu (Ubuntu Xenial): status New Incomplete
2020-09-09 06:40:16 Christian Ehrhardt  qemu (Ubuntu Bionic): status New Incomplete
2020-09-09 06:40:19 Christian Ehrhardt  qemu (Ubuntu Focal): status New Triaged
2020-09-09 06:40:23 Christian Ehrhardt  qemu (Ubuntu Focal): importance Undecided Medium
2020-09-09 06:40:26 Christian Ehrhardt  qemu (Ubuntu): importance Undecided High
2020-09-09 06:40:29 Christian Ehrhardt  qemu (Ubuntu): status New In Progress
2020-09-09 06:49:54 Frank Heimes ubuntu-z-systems: status New Triaged
2020-09-09 07:01:48 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/390440
2020-09-09 14:10:20 bugproxy attachment added The domainxml of the guest I used to debug https://bugs.launchpad.net/bugs/1894942/+attachment/5409074/+files/f31-qnet.xml
2020-09-14 06:28:31 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/390655
2020-09-14 06:30:16 Launchpad Janitor merge proposal unlinked https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/390655
2020-09-21 12:41:33 Launchpad Janitor qemu (Ubuntu): status In Progress Fix Released
2020-09-22 07:25:48 Frank Heimes nominated for series Ubuntu Groovy
2020-09-22 07:25:48 Frank Heimes bug task added qemu (Ubuntu Groovy)
2020-10-13 05:39:48 Christian Ehrhardt  qemu (Ubuntu Bionic): status Incomplete Triaged
2020-10-13 05:39:50 Christian Ehrhardt  qemu (Ubuntu Xenial): status Incomplete Triaged
2020-10-13 05:39:56 Christian Ehrhardt  qemu (Ubuntu Xenial): importance Undecided Low
2020-10-13 05:39:58 Christian Ehrhardt  qemu (Ubuntu Bionic): importance Undecided Medium
2020-10-13 06:41:12 Christian Ehrhardt  description
Old value:
Problem Description: When irqfds are not used, setting of the adapter interruption host-->guest notifier bit is accomplished by the QEMU function virtio_set_ind_atomic(). The atomic_cmpxchg() loop in virtio_set_ind_atomic() is broken because we occasionally end up with old and _old having different values (a legit compiler can generate code that accesses *ind_addr again to pick up a value for _old instead of using the value of old that was already fetched according to the rules of the abstract machine). This means the underlying CS instruction may use a different old (_old) than the one we intended to use if atomic_cmpxchg() performed the xchg part.
The direct consequence of the problem is that host --> guest notifications can get lost. The indirect consequence is that queues may get stuck and the devices may cease to operate normally. We stumbled on this while debugging a choked virtio-net interface (one that used the qemu driver and not vhost), but it can affect other virtio-ccw devices as well.
If irqfds are used for host->guest notifications, then we are safe because notifier bit manipulation is done in the kernel (and it's done correctly).
The problem described above is fixed upstream by commit 1a8242f7c3 ("virtio-ccw: fix virtio_set_ind_atomic"). All upstream versions since v2.0.0 are (potentially) affected.
The same mistake was made in QEMU in another place, and is fixed by 45175361f1 ("s390x/pci: fix set_ind_atomic"). We can file a separate BZ for it if necessary.
New value:
[Impact]
* Host -> Guest notifications can be lost, which can kill I/O; see the original bug report below for more details.
* Backport the fix that ensures the generated code has to re-load variables properly, avoiding the issue.
[Test Case]
* Set up iperf in the host and run the server "iperf -s"
* Get a guest using driver=qemu like:
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <driver name='qemu'/>
    </interface>
* In the guest, run a loop of iperf runs connecting to the server on the host:
    #!/bin/bash
    for i in $(seq 1 1000); do
      echo Try $i
      iperf -c 192.168.122.1 || break
    done
* Depending on the HW model, the machine saturation and such, the above test seems to be either fairly reproducible or not reproducible at all. That is bad, but we haven't found a much better repro. Fortunately IBM, who reported this issue (and created the fix), can recreate this on their end and are willing to do so again for the SRU verification.
[Regression Potential]
* The changed code path is s390x only, and there only in the virtio-ccw handling. Therefore regressions, if any, would be isolated to s390x and would manifest in virtio-ccw based I/O.
[Other Info]
* n/a
----
Problem Description: When irqfds are not used, setting of the adapter interruption host-->guest notifier bit is accomplished by the QEMU function virtio_set_ind_atomic(). The atomic_cmpxchg() loop in virtio_set_ind_atomic() is broken because we occasionally end up with old and _old having different values (a legit compiler can generate code that accesses *ind_addr again to pick up a value for _old instead of using the value of old that was already fetched according to the rules of the abstract machine). This means the underlying CS instruction may use a different old (_old) than the one we intended to use if atomic_cmpxchg() performed the xchg part.
The direct consequence of the problem is that host --> guest notifications can get lost. The indirect consequence is that queues may get stuck and the devices may cease to operate normally. We stumbled on this while debugging a choked virtio-net interface (one that used the qemu driver and not vhost), but it can affect other virtio-ccw devices as well.
If irqfds are used for host->guest notifications, then we are safe because notifier bit manipulation is done in the kernel (and it's done correctly).
The problem described above is fixed upstream by commit 1a8242f7c3 ("virtio-ccw: fix virtio_set_ind_atomic"). All upstream versions since v2.0.0 are (potentially) affected.
The same mistake was made in QEMU in another place, and is fixed by 45175361f1 ("s390x/pci: fix set_ind_atomic"). We can file a separate BZ for it if necessary.
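For readers unfamiliar with the failure mode in the description above, the following is a minimal sketch of a compare-and-swap loop that cannot suffer from it. It is not the upstream patch: it uses C11 <stdatomic.h> instead of QEMU's atomic_cmpxchg() macro, and the function name set_ind_atomic_sketch is hypothetical. The point it illustrates is that the "expected" value is read from memory exactly once per attempt, and the value the failed compare-and-swap actually observed is reused for the next attempt rather than re-reading *ind_addr through a plain, non-atomic access.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): atomically OR the bits in
 * to_be_set into *ind_addr and return the value that was there before.
 */
static uint8_t set_ind_atomic_sketch(_Atomic uint8_t *ind_addr,
                                     uint8_t to_be_set)
{
    /* One atomic load; the compiler may not silently re-read the byte. */
    uint8_t expected = atomic_load(ind_addr);

    /*
     * On failure, atomic_compare_exchange_strong() writes the value it
     * actually found into 'expected', so the retry compares against a
     * value that was observed by the atomic operation itself.
     */
    while (!atomic_compare_exchange_strong(ind_addr, &expected,
                                           (uint8_t)(expected | to_be_set))) {
        /* retry with the refreshed 'expected' */
    }

    /* On success 'expected' still holds the pre-update value. */
    return expected;
}
```

Calling the helper on an indicator byte holding 0x01 with to_be_set 0x04 leaves 0x05 in memory and returns 0x01; bits set concurrently by another writer between the load and the swap are preserved because the swap fails and is retried.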
2020-10-13 07:11:35 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/392163
2020-10-13 07:12:09 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/392164
2020-10-13 07:12:33 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/392165
2020-10-19 09:55:20 Christian Ehrhardt  qemu (Ubuntu Xenial): status Triaged In Progress
2020-10-19 09:55:22 Christian Ehrhardt  qemu (Ubuntu Bionic): status Triaged In Progress
2020-10-19 09:55:24 Christian Ehrhardt  qemu (Ubuntu Focal): status Triaged In Progress
2020-10-19 10:03:00 Frank Heimes ubuntu-z-systems: status Triaged In Progress
2020-10-27 21:10:00 Brian Murray qemu (Ubuntu Focal): status In Progress Fix Committed
2020-10-27 21:10:02 Brian Murray bug added subscriber Ubuntu Stable Release Updates Team
2020-10-27 21:10:04 Brian Murray bug added subscriber SRU Verification
2020-10-27 21:10:10 Brian Murray tags architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-needed verification-needed-focal
2020-10-27 21:11:24 Brian Murray qemu (Ubuntu Bionic): status In Progress Fix Committed
2020-10-27 21:11:32 Brian Murray tags architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-needed verification-needed-focal architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-needed verification-needed-bionic verification-needed-focal
2020-10-27 21:12:51 Brian Murray qemu (Ubuntu Xenial): status In Progress Fix Committed
2020-10-27 21:13:00 Brian Murray tags architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-needed verification-needed-bionic verification-needed-focal architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-needed verification-needed-bionic verification-needed-focal verification-needed-xenial
2020-10-28 06:33:58 Frank Heimes ubuntu-z-systems: status In Progress Fix Committed
2020-11-03 11:26:54 Frank Heimes tags architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-needed verification-needed-bionic verification-needed-focal verification-needed-xenial architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-done-focal verification-needed verification-needed-bionic verification-needed-xenial
2020-11-04 00:03:25 Chris Halse Rogers removed subscriber Ubuntu Stable Release Updates Team
2020-11-04 00:03:50 Launchpad Janitor qemu (Ubuntu Focal): status Fix Committed Fix Released
2020-11-04 13:37:58 Frank Heimes tags architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-done-focal verification-needed verification-needed-bionic verification-needed-xenial architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-done-bionic verification-done-focal verification-needed verification-needed-xenial
2020-11-04 18:42:51 Andrew Cloke tags architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-done-bionic verification-done-focal verification-needed verification-needed-xenial architecture-s39064 bugnameltc-184605 severity-high targetmilestone-inin2004 verification-done verification-done-bionic verification-done-focal verification-done-xenial
2020-11-05 10:09:19 Launchpad Janitor qemu (Ubuntu Bionic): status Fix Committed Fix Released
2020-11-05 10:10:13 Launchpad Janitor qemu (Ubuntu Xenial): status Fix Committed Fix Released
2020-11-05 10:52:44 Frank Heimes ubuntu-z-systems: status Fix Committed Fix Released