kvm_unit_tests: emulator test fails on 4.4 / 4.15 kernel, timeout

Bug #1932966 reported by Guilherme G. Piccoli
This bug affects 3 people
Affects               Importance  Assigned to
ubuntu-kernel-tests   Medium      Po-Hsu Lin
linux (Ubuntu)        Medium      Unassigned
  Xenial              Undecided   Unassigned
  Bionic              Medium      Po-Hsu Lin

Bug Description

[Impact]
Our Bionic 4.15 kernel lacks movups/movupd emulation support.

The following commit was added to the emulator test in
ubuntu_kvm_unit_tests:
  commit 8726f9771911d6749dbd36ab2fc70f0f25e2b1a9
  Author: Jacob Xu <email address hidden>
  Date: Wed Apr 21 16:12:57 2021 -0700

      x86: add movups/movupd sse testcases to emulator.c

      Here we add movups/movupd tests corresponding to functionality
      introduced in commit 29916968c486 ("kvm: Add emulation for movups/movupd").

      Signed-off-by: Jacob Xu <email address hidden>
      Message-Id: <email address hidden>
      Signed-off-by: Paolo Bonzini <email address hidden>

This causes the emulator test in ubuntu_kvm_unit_tests to fail with a timeout:
  ...
  PASS: movdqu (read)
  PASS: movdqu (write)
  PASS: movaps (read)
  PASS: movaps (write)
  PASS: movapd (read)
  PASS: movapd (write)
  KVM internal error. Suberror: 1
  emulation failure
  RAX=000000000000000a RBX=ffffffffffffe000 RCX=00000000000003fd RDX=00000000000003f8
  RSI=0000000000419991 RDI=0000000000419991 RBP=000000000051b490 RSP=000000000051b470
  R8 =000000000000000a R9 =00000000000003f8 R10=000000000000000d R11=0000000000000000
  R12=ffffffffffffe000 R13=1111111111111111 R14=ffffffffffffd000 R15=3333333333333333
  RIP=0000000000400a1f RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
  CS =0008 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
  SS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
  DS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
  FS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
  GS =0010 000000000051a510 ffffffff 00c09300 DPL=0 DS [-WA]
  LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
  TR =0080 000000000041207a 0000ffff 00008b00 DPL=0 TSS64-busy
  GDT= 000000000041100a 0000106f
  IDT= 0000000000410000 00000fff
  CR0=80010011 CR2=0000000000000000 CR3=0000000001007000 CR4=00000220
  DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
  DR6=00000000ffff0ff0 DR7=0000000000000400
  EFER=0000000000000500
  Code=00 c7 45 e8 03 00 00 00 c7 45 ec 04 00 00 00 66 0f 6f 45 e0 <0f> 11 03 48 89 de 48 8d 7d e0 e8 e5 f9 ff ff 0f b6 f8 be a1 8f 41 00 b8 00 00 00 00 e8 07
  qemu-system-x86_64: terminating on signal 15 from pid 15758 (timeout)
  FAIL emulator (timeout; duration=90s)
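
The Code= line pinpoints the instruction that could not be emulated: QEMU wraps the byte at RIP in angle brackets, and here the bytes from that point are 0f 11 03, i.e. the store form of movups (opcode 0F 11) followed by a ModRM byte of 03. A minimal Python sketch for pulling this out of a dump follows. The function names are hypothetical helpers, not part of QEMU or kvm-unit-tests, and SAMPLE is an excerpt of the dump above.

```python
def split_code_dump(code_line):
    """Split a QEMU 'Code=' line into (bytes before RIP, bytes from RIP on)."""
    tokens = code_line.strip().split("=", 1)[1].split()
    for i, tok in enumerate(tokens):
        # QEMU marks the byte at RIP as '<xx>'.
        if tok.startswith("<") and tok.endswith(">"):
            return tokens[:i], [tok[1:-1]] + tokens[i + 1:]
    raise ValueError("no <..> marker in Code= line")


def decode_modrm(byte_hex):
    """Return the (mod, reg, rm) fields of a ModRM byte given as hex text."""
    value = int(byte_hex, 16)
    return value >> 6, (value >> 3) & 0b111, value & 0b111


# Excerpt of the dump above (illustrative variable name).
SAMPLE = "Code=66 0f 6f 45 e0 <0f> 11 03 48 89 de"

before, at_rip = split_code_dump(SAMPLE)
print(at_rip[:3])               # opcode bytes and ModRM byte at RIP
print(decode_modrm(at_rip[2]))  # (mod, reg, rm)
```

For this dump the ModRM fields are mod 0, reg 0, rm 3. With no REX prefix that is xmm0 as the source and (%rbx) as the memory destination, so the failing instruction is movups %xmm0,(%rbx), which matches the first test case the log never reaches.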

[Fix]
* 29916968c48691 kvm: Add emulation for movups/movupd

This patch can be cherry-picked into Bionic.

It fixes our test failure and, as mentioned in the commit message,
emulation failures with OpenBSD as a guest and with Windows 10 with
Intel HD graphics passthrough.

[Test]
Test kernel can be found here:
https://people.canonical.com/~phlin/kernel/lp-1932966-kvm-emulator/

Run the emulator test from ubuntu_kvm_unit_tests. With this patch
applied, it passes without any issue:
  ...
  PASS: movdqu (read)
  PASS: movdqu (write)
  PASS: movaps (read)
  PASS: movaps (write)
  PASS: movapd (read)
  PASS: movapd (write)
  PASS: movups (read)
  PASS: movups (write)
  PASS: movupd (read)
  PASS: movupd (write)
  PASS: movups unaligned
  PASS: movupd unaligned
  PASS: unaligned movaps exception
  PASS: movups unaligned crosspage
  PASS: movups crosspage exception
  PASS: movq (mmx, read)
  PASS: movq (mmx, write)
  PASS: movb $imm, 0(%rip)
  PASS: shld (cl)
  PASS: shrd (cl)
  PASS: mov null, %ss
  PASS: mov null, %ss (with ss.rpl != cpl)
  PASS: Test ret/iret with a nullified segment
  PASS: ltr
  PASS: cmovnel
  SKIP: skipping register-only tests, use kvm.force_emulation_prefix=1 to enable
  PASS: push16
  PASS: cross-page mmio read
  PASS: cross-page mmio write
  PASS: string_io_mmio
  PASS: jump to non-canonical address
  SKIP: illegal movbe
  SUMMARY: 135 tests, 2 skipped
  PASS emulator (135 tests, 2 skipped)
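
To confirm that the movups/movupd cases themselves went from crashing to passing, rather than relying only on the final PASS line, the output can be checked per test name. The following is a small Python sketch; parse_results, not_passing, and the REQUIRED list are illustrative names chosen here, and the test names are copied from the log above.

```python
import re

# Matches result lines such as "PASS: movups (read)", with or without a
# log prefix in front of them.
RESULT_RE = re.compile(r"(PASS|FAIL|SKIP):\s*(.+?)\s*$")

# Test names taken from the passing log above.
REQUIRED = [
    "movups (read)",
    "movups (write)",
    "movupd (read)",
    "movupd (write)",
    "movups unaligned",
    "movupd unaligned",
]


def parse_results(log_text):
    """Map each reported test name to its PASS/FAIL/SKIP verdict."""
    results = {}
    for line in log_text.splitlines():
        match = RESULT_RE.search(line.rstrip("\r"))
        if match:
            results[match.group(2)] = match.group(1)
    return results


def not_passing(results, required=REQUIRED):
    """Return required tests that are missing or did not report PASS."""
    return [name for name in required if results.get(name) != "PASS"]
```

On the failing run the test binary dies before printing any movups line, so every required name shows up as missing; on the patched kernel the returned list is empty.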

[Where problems could occur]
I did not find any patch in the upstream tree that claims to fix this
one. The only problem I can think of for the moment is that we might
see other failures once this emulation is in use.

[Original Bug Report]
Found this on B/KVM, current cycle (sru-20210531):

18:27:11 DEBUG| [stdout] PASS: movapd (write)
18:27:11 DEBUG| [stderr] KVM internal error. Suberror: 1
18:27:11 DEBUG| [stderr] emulation failure
18:27:11 DEBUG| [stderr] RAX=000000000000000a RBX=ffffffffffffe000 RCX=00000000000003fd RDX=00000000000003f8
18:27:11 DEBUG| [stderr] RSI=0000000000419991 RDI=0000000000419991 RBP=000000000051b440 RSP=000000000051b420
18:27:11 DEBUG| [stderr] R8 =000000000000000a R9 =00000000000003f8 R10=000000000000000d R11=0000000000000000
18:27:11 DEBUG| [stderr] R12=ffffffffffffe000 R13=1111111111111111 R14=ffffffffffffd000 R15=3333333333333333
18:27:11 DEBUG| [stderr] RIP=0000000000400a0c RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
18:27:11 DEBUG| [stderr] ES =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
18:27:11 DEBUG| [stderr] CS =0008 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
18:27:11 DEBUG| [stderr] SS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
18:27:11 DEBUG| [stderr] DS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
18:27:11 DEBUG| [stderr] FS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
18:27:11 DEBUG| [stderr] GS =0010 000000000051a4d0 ffffffff 00c09300 DPL=0 DS [-WA]
18:27:11 DEBUG| [stderr] LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
18:27:11 DEBUG| [stderr] TR =0080 000000000041207a 0000ffff 00008b00 DPL=0 TSS64-busy
18:27:11 DEBUG| [stderr] GDT= 000000000041100a 0000106f
18:27:11 DEBUG| [stderr] IDT= 0000000000410000 00000fff
18:27:11 DEBUG| [stderr] CR0=80010011 CR2=0000000000000000 CR3=0000000001007000 CR4=00000220
18:27:11 DEBUG| [stderr] DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
18:27:11 DEBUG| [stderr] DR6=00000000ffff0ff0 DR7=0000000000000400
18:27:11 DEBUG| [stderr] EFER=0000000000000500
18:27:11 DEBUG| [stderr] Code=00 c7 45 e8 03 00 00 00 c7 45 ec 04 00 00 00 66 0f 6f 45 e0 <0f> 11 03 48 89 de 48 8d 7d e0 e8 f8 f9 ff ff 0f b6 f8 be a1 8f 41 00 b8 00 00 00 00 e8 05
18:28:40 DEBUG| [stderr] qemu-system-x86_64: terminating on signal 15 from pid 13634 (timeout)
18:28:40 DEBUG| [stdout] FAIL emulator (timeout; duration=90s)
[...]
TestError: Test failed for emulator
18:28:40 ERROR| child process failed
18:28:40 DEBUG| Traceback (most recent call last):
18:28:40 DEBUG| File "/home/ubuntu/autotest/client/parallel.py", line 25, in fork_start
18:28:40 DEBUG| l()
18:28:40 DEBUG| File "/home/ubuntu/autotest/client/job.py", line 505, in <lambda>
18:28:40 DEBUG| l = lambda: test.runtest(self, url, tag, args, dargs)
18:28:40 DEBUG| File "/home/ubuntu/autotest/client/test.py", line 125, in runtest
18:28:40 DEBUG| job.sysinfo.log_after_each_iteration)
18:28:40 DEBUG| File "/home/ubuntu/autotest/client/shared/test.py", line 913, in runtest
18:28:40 DEBUG| mytest._exec(args, dargs)
18:28:40 DEBUG| File "/home/ubuntu/autotest/client/shared/test.py", line 411, in _exec
18:28:40 DEBUG| _call_test_function(self.execute, *p_args, **p_dargs)
18:28:40 DEBUG| File "/home/ubuntu/autotest/client/shared/test.py", line 823, in _call_test_function
18:28:40 DEBUG| return func(*args, **dargs)
18:28:40 DEBUG| File "/home/ubuntu/autotest/client/shared/test.py", line 291, in execute
18:28:40 DEBUG| postprocess_profiled_run, args, dargs)
18:28:40 DEBUG| File "/home/ubuntu/autotest/client/shared/test.py", line 212, in _call_run_once
18:28:40 DEBUG| self.run_once(*args, **dargs)
18:28:40 DEBUG| File "/home/ubuntu/autotest/client/tests/ubuntu_kvm_unit_tests/ubuntu_kvm_unit_tests.py", line 82, in run_once
18:28:40 DEBUG| raise error.TestError("Test failed for {}".format(test_name))
18:28:40 DEBUG| TestError: Test failed for emulator
18:28:41 INFO | ERROR ubuntu_kvm_unit_tests.emulator ubuntu_kvm_unit_tests.emulator timestamp=1624040921 localtime=Jun 18 18:28:41 Test failed for emulator
18:28:41 INFO | END ERROR ubuntu_kvm_unit_tests.emulator ubuntu_kvm_unit_tests.emulator timestamp=1624040921 localtime=Jun 18 18:28:41

Changed in linux-kvm (Ubuntu):
status: New → Confirmed
Changed in linux-kvm (Ubuntu Bionic):
status: New → Confirmed
Changed in linux-kvm (Ubuntu):
importance: Undecided → Medium
Changed in linux-kvm (Ubuntu Bionic):
importance: Undecided → Medium
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux-kvm (Ubuntu):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Po-Hsu Lin (cypressyew) wrote :

While debugging this, please use the hirsute-WIP branch of our own kvm-unit-tests (don't use the disco branch).
Thanks!

Po-Hsu Lin (cypressyew) wrote :

This test passes with the disco branch, so out of curiosity I ran a bisect against kvm-unit-tests, and this test case seems to be the cause:

8726f9771911d6749dbd36ab2fc70f0f25e2b1a9 is the first bad commit
commit 8726f9771911d6749dbd36ab2fc70f0f25e2b1a9
Author: Jacob Xu <email address hidden>
Date: Wed Apr 21 16:12:57 2021 -0700

    x86: add movups/movupd sse testcases to emulator.c

    Here we add movups/movupd tests corresponding to functionality
    introduced in commit 29916968c486 ("kvm: Add emulation for movups/movupd").

    Signed-off-by: Jacob Xu <email address hidden>
    Message-Id: <email address hidden>
    Signed-off-by: Paolo Bonzini <email address hidden>

$ git bisect log
git bisect start
# bad: [d4123fddf8a39b504fecd89ccb3dde61e338b4ee] Merge branch 's390x-pull-2021-22-06' into 'master'
git bisect bad d4123fddf8a39b504fecd89ccb3dde61e338b4ee
# good: [764aa0b88d9556520457f13b38d5cea21600545b] arm: powerpc: comment halt(code)
git bisect good 764aa0b88d9556520457f13b38d5cea21600545b
# good: [f3154609b29ad92746c77d46bca03e3b79431437] x86: use a non-negative number in shift
git bisect good f3154609b29ad92746c77d46bca03e3b79431437
# good: [a7eb7780d525b69a8fcfc8a3cfba85570a321c33] lib/list.h: add list_add_tail
git bisect good a7eb7780d525b69a8fcfc8a3cfba85570a321c33
# good: [956e3107eb042b3b9f9806c4850eebe04da0bc43] update git tree location in MAINTAINERS to point at gitlab
git bisect good 956e3107eb042b3b9f9806c4850eebe04da0bc43
# bad: [83d815a2bf92890d6fddcb5b9503006550285d2b] s390x: Add more Ultravisor command structure definitions
git bisect bad 83d815a2bf92890d6fddcb5b9503006550285d2b
# bad: [7fbcef02aaef9797724b818898bca662293cfff8] s390x: css: Store CSS Characteristics
git bisect bad 7fbcef02aaef9797724b818898bca662293cfff8
# good: [142ff6358dacd32a170602de608f2d920cc924d2] x86: msr: Verify 64-bit only MSRs fault on 32-bit hosts
git bisect good 142ff6358dacd32a170602de608f2d920cc924d2
# bad: [0b6f6cedcd6ef4a881fb8806cefde35b88a0363a] nSVM: Test VMLOAD/VMSAVE intercepts
git bisect bad 0b6f6cedcd6ef4a881fb8806cefde35b88a0363a
# bad: [88f0bb17adea42f5d86e00c243c70104c3598620] x86: msr: Test that always-canonical MSRs #GP on non-canonical value
git bisect bad 88f0bb17adea42f5d86e00c243c70104c3598620
# bad: [e5e76263b544c4f1032d0201611ffb7b9b7f408c] x86: add additional test cases for sse exceptions to emulator.c
git bisect bad e5e76263b544c4f1032d0201611ffb7b9b7f408c
# bad: [8726f9771911d6749dbd36ab2fc70f0f25e2b1a9] x86: add movups/movupd sse testcases to emulator.c
git bisect bad 8726f9771911d6749dbd36ab2fc70f0f25e2b1a9
# first bad commit: [8726f9771911d6749dbd36ab2fc70f0f25e2b1a9] x86: add movups/movupd sse testcases to emulator.c
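
The bisect log above is regular enough to post-process, for example to attach the verdicts and the first bad commit to a report automatically. A minimal Python sketch, assuming the log format shown above; summarize_bisect is a hypothetical helper name, not a git feature.

```python
import re

MARK_RE = re.compile(r"^git bisect (good|bad) ([0-9a-f]{40})$")
FIRST_BAD_RE = re.compile(r"^# first bad commit: \[([0-9a-f]{40})\] (.*)$")


def summarize_bisect(log_text):
    """Return ([(sha, verdict), ...], (first_bad_sha, subject) or None)."""
    marks = []
    first_bad = None
    for raw in log_text.splitlines():
        line = raw.strip()
        mark = MARK_RE.match(line)
        if mark:
            marks.append((mark.group(2), mark.group(1)))
            continue
        found = FIRST_BAD_RE.match(line)
        if found:
            first_bad = (found.group(1), found.group(2))
    return marks, first_bad
```

Fed the log above, this yields the twelve good/bad verdicts in order and 8726f9771911d6749dbd36ab2fc70f0f25e2b1a9 as the first bad commit.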

This leads to the following kernel commit, which I think is the fix:
https://github.com/torvalds/linux/commit/29916968c48691c94be466a0b47cc9adcea9cb8d

Sean Feole (sfeole)
tags: added: hinted
Changed in ubuntu-kernel-tests:
assignee: Guilherme G. Piccoli (gpiccoli) → nobody
Changed in linux-kvm (Ubuntu Bionic):
assignee: Guilherme G. Piccoli (gpiccoli) → nobody
Changed in linux-kvm (Ubuntu):
assignee: Guilherme G. Piccoli (gpiccoli) → nobody
Po-Hsu Lin (cypressyew) wrote :

Found on node riccioli with 4.15.0-153.160

tags: added: sru-20210719
Krzysztof Kozlowski (krzk) wrote :

Also on: xenial/linux-oracle/4.15.0-1080.88~16.04.1

Po-Hsu Lin (cypressyew)
affects: linux-kvm (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-kernel-tests:
assignee: nobody → Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
status: Confirmed → In Progress
Changed in linux (Ubuntu Bionic):
status: Confirmed → In Progress
Po-Hsu Lin (cypressyew)
description: updated
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
description: updated
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew) wrote :
description: updated
Po-Hsu Lin (cypressyew) wrote :

I found that this is failing on X-4.4 as well. Since it is an ESM series now, I don't think we will fix it.

Po-Hsu Lin (cypressyew)
summary: - kvm_unit_tests: emulator test fails on 4.15 kernel, timeout
+ kvm_unit_tests: emulator test fails on 4.4 / 4.15 kernel, timeout
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Xenial):
status: New → Confirmed
Krzysztof Kozlowski (krzk) wrote :

Found on xenial/linux-aws/4.4.0-1132.146

tags: added: 4.4 aws
tags: added: sru-20210816
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Po-Hsu Lin (cypressyew) wrote :

Test passed with 4.15 kernel, verified on X-Azure 4.15.0-1124.137~16.04.1 and B-4.15 4.15.0-157.164

tags: added: verificationeon-d-bionic
removed: verification-needed-bionic
tags: added: verification-done-bionic
removed: verificationeon-d-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-159.167

---------------
linux (4.15.0-159.167) bionic; urgency=medium

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.09.06)

  * dell300x: rsi wifi and bluetooth crash after suspend and resume
    (LP: #1940488)
    - Revert "rsi: Use resume_noirq for SDIO"

  * LRMv5: switch primary version handling to kernel-versions data set
    (LP: #1928921)
    - [Packaging] switch to kernel-versions

  * kvm_unit_tests: emulator test fails on 4.4 / 4.15 kernel, timeout
    (LP: #1932966)
    - kvm: Add emulation for movups/movupd

  * memory leaking when removing a profile (LP: #1939915)
    - security/apparmor/label.c: Clean code by removing redundant instructions
    - apparmor: Fix memory leak of profile proxy

  * ubunut_kernel_selftests: memory-hotplug: avoid spamming logs with
    dump_page() (LP: #1941829)
    - selftests: memory-hotplug: avoid spamming logs with dump_page(), ratio limit
      hot-remove error test

  * Bionic update: upstream stable patchset 2021-08-27 (LP: #1941916)
    - btrfs: mark compressed range uptodate only if all bio succeed
    - regulator: rt5033: Fix n_voltages settings for BUCK and LDO
    - r8152: Fix potential PM refcount imbalance
    - qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()
    - net: Fix zero-copy head len calculation.
    - Revert "Bluetooth: Shutdown controller after workqueues are flushed or
      cancelled"
    - KVM: do not allow mapping valid but non-reference-counted pages
    - Revert "watchdog: iTCO_wdt: Account for rebooting on second timeout"
    - spi: mediatek: Fix fifo transfer
    - padata: validate cpumask without removed CPU during offline
    - Revert "ACPICA: Fix memory leak caused by _CID repair function"
    - ALSA: seq: Fix racy deletion of subscriber
    - clk: stm32f4: fix post divisor setup for I2S/SAI PLLs
    - omap5-board-common: remove not physically existing vdds_1v8_main fixed-
      regulator
    - scsi: sr: Return correct event when media event code is 3
    - media: videobuf2-core: dequeue if start_streaming fails
    - net: natsemi: Fix missing pci_disable_device() in probe and remove
    - nfp: update ethtool reporting of pauseframe control
    - mips: Fix non-POSIX regexp
    - bnx2x: fix an error code in bnx2x_nic_load()
    - net: pegasus: fix uninit-value in get_interrupt_interval
    - net: fec: fix use-after-free in fec_drv_remove
    - net: vxge: fix use-after-free in vxge_device_unregister
    - Bluetooth: defer cleanup of resources in hci_unregister_dev()
    - USB: usbtmc: Fix RCU stall warning
    - USB: serial: option: add Telit FD980 composition 0x1056
    - USB: serial: ch341: fix character loss at high transfer rates
    - USB: serial: ftdi_sio: add device ID for Auto-M3 OP-COM v2
    - usb: gadget: f_hid: added GET_IDLE and SET_IDLE handlers
    - usb: gadget: f_hid: fixed NULL pointer dereference
    - usb: gadget: f_hid: idle uses the highest byte for duration
    - usb: otg-fsm: Fix hrtimer list corruption
    - scripts/tracing: fix the bug that can't parse raw_trace_func
    - staging: rtl8723bs: Fix a resource lea...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Thadeu Lima de Souza Cascardo (cascardo) wrote :

I found this failing on 3.13.0-187 and 3.13.0-188 as well, though after doing movdqu (write).
