QEMU touchpad input erratic after wakeup from sleep

Bug #1834113 reported by Avi Eis on 2019-06-25
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Undecided
Unassigned
libvirt (Ubuntu)
Low
Unassigned
qemu (Ubuntu)
Low
Unassigned

Bug Description

Using Ubuntu host and guest. Normally the touchpad works great. Within the last few days, suddenly, apparently after a wake from sleep, the touchpad will behave erratically. For example, it will take two clicks to select something, and when moving the cursor it will act as though it is dragging even with the button not clicked.

A reboot fixes the issue temporarily.

ProblemType: Bug
DistroRelease: Ubuntu 19.04
Package: qemu 1:3.1+dfsg-2ubuntu3.1
Uname: Linux 5.1.14-050114-generic x86_64
ApportVersion: 2.20.10-0ubuntu27
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Mon Jun 24 20:55:44 2019
Dependencies:

EcryptfsInUse: Yes
InstallationDate: Installed on 2019-02-20 (124 days ago)
InstallationMedia: Ubuntu 18.04 "Bionic" - Build amd64 LIVE Binary 20180608-09:38
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 002: ID 8087:0025 Intel Corp.
 Bus 001 Device 003: ID 0c45:671d Microdia
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. Precision 5530
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.1.14-050114-generic root=UUID=18e8777c-1764-41e4-a19f-62476055de23 ro mem_sleep_default=deep mem_sleep_default=deep acpi_rev_override=1 scsi_mod.use_blk_mq=1 nouveau.modeset=0 nouveau.runpm=0 nouveau.blacklist=1 acpi_backlight=none acpi_osi=Linux acpi_osi=!
SourcePackage: qemu
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/26/2019
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.10.1
dmi.board.name: 0FP2W2
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.10.1:bd04/26/2019:svnDellInc.:pnPrecision5530:pvr:rvnDellInc.:rn0FP2W2:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: Precision
dmi.product.name: Precision 5530
dmi.product.sku: 087D
dmi.sys.vendor: Dell Inc.

Avi Eis (ikes73) wrote :

There wasn't an update in that area that I'd know of except maybe in the kernel (which has too many updates to track all of them).

Sorry I'm really not a UI expert, lets check a few things still:
- The suspend/wakeup was that just the guest or did you supend/wakeup the host?
- you targetted this for qemu, is the bad effect only happening in the guest UI?
- If you go back to the release kernel instead of the last update does it still happen?
- In general you can always go back to packages in the release pocket, doing so can you identify an
  update to one of the packages that caused this to happen?

Changed in qemu (Ubuntu):
status: New → Incomplete
Avi Eis (ikes73) wrote :

1. Suspend wakeup was on host, not guest.

2. Yes. Works fine on host, but two guests are both experiencing this.

I was using 5.1.6, had this issue, updated to latest and it went away, but came back on sleep/awake. If it's a kernel issue, it was introduced before 5.1.6.

It also doesn't seem to happen on every sleep, and I'm not sure if I can reproduce it. After it happened twice in two days it was enough for me to report.

Avi Eis (ikes73) wrote :

The fact that it doesn't happen every time makes it difficult to test against different versions of packages.

Bryce Harrington (bryce) wrote :

For disco there has been a single qemu update for security, with the following changes:

      * SECURITY UPDATE: Add support for exposing md-clear functionality
        to guests
        - d/p/ubuntu/enable-md-clear.patch
        - d/p/ubuntu/enable-md-no.patch
        - CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091
      * SECURITY UPDATE: heap overflow when loading device tree blob
        - d/p/ubuntu/CVE-2018-20815.patch: specify how large the buffer to
          copy the device tree blob into is.
        - CVE-2018-20815
      * SECURITY UPDATE: device driver denial of service via NULL pointer
        dereference
        - d/p/ubuntu/CVE-2019-5008.patch: Define skeleton 'power_mem_read'
          routine
        - CVE-2019-5008
      * SECURITY UPDATE: information leak in SLiRP
        - d/p/ubuntu/CVE-2019-9824.patch: check sscanf result when
          emulating ident.
        - CVE-2019-9824

$ git show b818da7a1a0dfa55c0f4edf0be10394fe4d7f3f8 | diffstat
 changelog | 23 ++++++++++++
 patches/series | 5 ++
 patches/ubuntu/CVE-2018-20815.patch | 38 +++++++++++++++++++
 patches/ubuntu/CVE-2019-5008.patch | 43 ++++++++++++++++++++++
 patches/ubuntu/CVE-2019-9824.patch | 49 +++++++++++++++++++++++++
 patches/ubuntu/enable-md-clear.patch | 67 +++++++++++++++++++++++++++++++++++
 patches/ubuntu/enable-md-no.patch | 28 ++++++++++++++
 7 files changed, 253 insertions(+)

I took a cursory look through the five patches, but none leap out as anything relating to touchpads, and don't appear to be related to power management, but hard to say for certain.

Bryce Harrington (bryce) wrote :

touchpad issues with power management can be challenging to sort out, and it's not unusual for them to reproduce non-reliably. Power management problems are almost always kernel-related, though I know it can be labor intensive to test.

However, I've seen the double-action behavior myself with touchpads and keyboards, and the problem wasn't the kernel; in at least one of those cases the cause was a second package that was consuming input events, which resolved through a combination of apt-get purges and reboots. Reviewing the tail end of /var/log/apt/history.log and rolling things back one by one might reveal something, but you'd need to do multiple suspend/resume cycles to test each time. You mentioned seeing this behavior starting a couple days ago, so you could focus attention on changes within the past week or so. (And check when the qemu 1:3.1+dfsg-2ubuntu3.1 update installed (and when you subsequently rebooted)).

An alternative thing to test would be to see if there are differences in what processes are running when the bug is reproducing, vs. when it is not. You'd want to examine the process tables on both the host and guest. But its possible something starts stealing events after resume, that wasn't doing so before, and diffing process tables won't show that; instead, the way to diagnose this would be to kill X clients one by one (e.g. `xlsclients -la`).

Beyond that, I can just offer some of the standard troubleshooting techniques for input device troubles:

  * Check if your bios firmware is up to date
  * Identify your touchpad device and driver (xinput / sudo lsinput / sudo lshw -C input)
  * Check input device properties if using evdev/synaptics (i.e. have any settings changed?)
  * xev is a helpful testing tool
  * Good luck

Changed in qemu (Ubuntu):
status: Incomplete → New
status: New → Incomplete
Avi Eis (ikes73) wrote :

Rebooting the guest when this is happening does not fix the issue, for what that's worth.

Last upgrades before this happened are:

Start-Date: 2019-06-20 23:46:55
Commandline: apt dist-upgrade
Requested-By: a (1001)
Upgrade: intel-microcode:amd64 (3.20190514.0ubuntu0.19.04.3, 3.20190618.0ubuntu0.19.04.1)
End-Date: 2019-06-20 23:47:11

Start-Date: 2019-06-24 08:23:45
Commandline: apt dist-upgrade
Requested-By: a (1001)
Upgrade: snapd:amd64 (2.38+19.04, 2.39.2+19.04), firefox:amd64 (67.0.3+build1-0ubuntu0.19.04.1, 67.0.4+build1-0ubuntu0.19.04.1)
End-Date: 2019-06-24 08:24:00

I also see a bunch of updates to libvirt on 2019-06-19.

Should I try downgrading intel-microcode?

Avi Eis (ikes73) wrote :

Looks like the 3.20190514.0ubuntu0.19.04.3 version of intel-microcode is no longer published, how can I revert to it for testing?

If you don't have the cache the archive only leaves the release version as well as the latest one in -updates and the latest one in -security. It will not be that easy from apt to use any other.
But Launchpad keeps all of the publishing history [1] and there you'll find the version still [2] and from there at the amd64 build the deb file [3].

But that said, I doubt that intel-microcode is really involved - not saying it would not be worth a try, but my hopes on it are low to affect this case.

[1]: https://launchpad.net/ubuntu/+source/intel-microcode/+publishinghistory
[2]: https://launchpad.net/ubuntu/+source/intel-microcode/3.20190514.0ubuntu0.18.04.3
[3]: https://launchpad.net/~ubuntu-security-proposed/+archive/ubuntu/ppa/+build/16832070/+files/intel-microcode_3.20190514.0ubuntu0.18.04.3_amd64.deb

Avi Eis (ikes73) wrote :

Looking into libvirt, the release skipped several version numbers and went straight from 4.6.0 to 5.0.0, which were released 5 months apart. Also, 5.0.0 is itself several releases behind.

I'll test 4.60 first, then try the latest version of libvirt and see if either fixes the issue.

Avi Eis (ikes73) wrote :

Actually, the version of libvirt I'd upgraded from was 5.0.0-1ubuntu2.2 -> 5.0.0-1ubuntu2.3.

Downgrading all of libvirt-clients libvirt-daemon libvirt-daemon-driver-storage-rbd libvirt-daemon-system libvirt0 to 5.0.0-1ubuntu2.2 seems to have fixed this after several sleep-resume cycles, although it's hard to be sure. Does any change in libvirt seem relevant?

Related changes are
    3 * SECURITY UPDATE: DoS via incorrect permissions check
    4 - debian/patches/CVE-2019-3886-1.patch: disallow virDomainGetHostname
    5 for read-only connections in src/libvirt-domain.c.
    6 - debian/patches/CVE-2019-3886-2.patch: enforce ACL write permission
    7 for getting guest time & hostname in src/remote/remote_protocol.x.
    8 - CVE-2019-3886
    9 * SECURITY UPDATE: privilege escalation via incorrect socket permissions
   10 - debian/patches/CVE-2019-10132-1.patch: reject clients unless their
   11 UID matches the current UID in src/admin/admin_server_dispatch.c.
   12 - debian/patches/CVE-2019-10132-2.patch: restrict sockets to mode 0600
   13 in src/locking/virtlockd-admin.socket.in,
   14 src/locking/virtlockd.socket.in.
   15 - debian/patches/CVE-2019-10132-3.patch: restrict sockets to mode 0600
   16 in src/logging/virtlogd-admin.socket.in,
   17 src/logging/virtlogd.socket.in.
   18 - CVE-2019-10132

None of these is important for mouse integration :-/
So it might be a red herring.

Avi Eis (ikes73) wrote :

I'll try a few full boot-sleep-resume cycles on both versions and see how often it replicates

Avi Eis (ikes73) wrote :

The issue replicated on the older libvirt, so it wasn't that. Only thing left to try is intel-microcode now

Bryce Harrington (bryce) wrote :

intel-microcode is closely related to kernel behavior, and so wouldn't surprise me if it's involved - like I mentioned earlier invariably input device + power management bugs are something kernel related.

However, looking at the diff for the intel-microcode upgrade the changes are highly processor specific, and affects a small handful:

http://launchpadlibrarian.net/424908874/intel-microcode_3.20190514.0ubuntu0.19.04.1_3.20190514.0ubuntu0.19.04.3.diff.gz

I'm guessing for a qemu environment these aren't even relevant, but if one of the lines matches your host cpu then perhaps this would be worth more investigation. Otherwise, probably another red herring.

There is more you could try though. I suggested some ideas in my previous comment. You could also run xlsclients before and after reproducing the error, and see if there are any new X clients running that might have a grab on the cursor, and then kill them until the touchpad comes back. (See http://who-t.blogspot.com/2010/11/high-level-overview-of-grabs.html)

I hate to mention this as a possibility, but like I mentioned earlier, sometimes these bugs can reproduce very non-reliably. I've seen cases where, for instance, the root cause always existed but it was some change in usage or other random things, that caused the input behavior to suddenly start happening, only to disappear again - quite mysteriously - after some other system change.

The way input devices work, at least in context of this particular bug, is that each movement or click generates an "event", that gets communicated up through the system through various layers until an application consumes it. You can read about this in more extensive detail at https://www.x.org/wiki/Development/Documentation/InputEventProcessing/
That leads us to two questions for this case: A) Is the event getting generated at all? and B) If it is, then is something unexpectedly consuming it? So a good first step would be to eliminate one or the other of these. You've made some progress towards ruling out B.

For testing if the event is getting generated, the tool 'xev' is one of the easiest and handiest places to start from. Have you had a chance to give that a test? Run it from the command line when you've got the non-responsive touchpad, and use the touchpad and see if anything prints in the xev window. You can do some googling to get some tips and tricks for filtering xev output and to understand what its output means.

'xdotool' can also be useful; it's intended for automation but it lets you manhandle mouse events, such as force a click or mouse up/down. Longshot but at least is easy / low risk to try.

My guess though is the event isn't even getting generated. In that case i'd proceed with the standard troubleshooting techniques to see if something's wrong with the device itself, and go from there.

Avi Eis (ikes73) wrote :

>Core Gen8 Mobile

That's my i9-8950HK.

What I don't understand is why it works perfectly on the host but not on the guest. And the fact that it persists even when rebooting the guest implies it's not an issue with the guest runtime or anything. It seems like the issue must be with the way qemu is sending the events through to the guest, which I have no idea about.

Also, it never ignores my clicks. It's generating clicks that I never made - specifically, when moving the cursor it's acting as if I had clicked and dragged. So moving the cursor on a webpage just selects a bunch of text, moving it across the desktop draws a window, etc.

I will try some of the other troubleshooting methods.

Avi Eis (ikes73) wrote :

1. seems like same issue on older intel-microcode.

2. I checked xev on the guest while issue was occuring with the following results:

when moving the cursor, a buttonpress event is generated along with a bunch of motionnotify events. After moving it, if I click or touch the touchpad without moving it it shows only a buttonrelease but no buttonpress.

This is consistent with the behavior I'm seeing: when I move, it's as if I clicked right before, producing the dragging motion. And once it's registered buttonpress, another buttonpress event won't be generated until a buttonrelease one is generated.

xev works as expected on the host.

The issue is a phantom buttonpress event being generated on the guest somehow.

Avi Eis (ikes73) wrote :

xlsclients has the same output both times.

Hi Avi,
for the sake of giving it a try I had a second level guest and suspended/resumed the first level guest a few times. I can't reproduce it.

OTOH you seem to have a hard time to identify which change introduced this - if it was any change at all and not just by accident not showing up before.

I feel bad for you, but right now there isn't much we could action to further help.
Especially bad since you were always so prompt in feedback to our questions and suggestions :-/

I'll monitor the bug if you come up with new insights or questions on your debugging I'll try to help as I did so far. Thanks to Bryce who understands UI more than I do for his input as well.

But given the current state, this most likely will stay incomplete and un-actionable :-/

Changed in qemu (Ubuntu):
importance: Undecided → Low
Changed in libvirt (Ubuntu):
status: New → Incomplete
importance: Undecided → Low
Thomas Huth (th-huth) on 2019-08-16
Changed in qemu:
status: New → Incomplete

Avi,

Something I have realized we missed as a feedback here - or maybe I missed checking previous comments - is how your mouse is being setup for the guest. Is it being PS/2 emulated (default) or is it being given as an USB device (when qemu cmd line has "-usb -device usb-tablet"). Also, are you using SPICE protocol (perhaps with USB direction option ?).

Are you able to tell which xserver-xorg-input-XX module is being used inside the guest ? You will probably find that information from Xorg log files (check if you're using xf86-input-wacom or xserver-xorg-input-evdev or some other).

Another thing that comes to my mind as well, are you using powersaving features ? Specifically the I2C bus I'm concerned. Using "powertop", you are able to change "Runtime PM for I2C Adapter" option under the Tunables Tab (turning the power mgmt to off). I would like to know if you are able to reproduce the issue without having power management enabled for I2C. You can try disabling only I2C and then disabling all PM options as a second attempt.

From your host:

Device #1

[ 2.834320] input: WCOM488E:00 056A:488E Mouse as /devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-1/i2c-WCOM488E:00/0018:056A:488E.0001/input/input12

[ 3.064686] input: Wacom HID 488E Finger as /devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-1/i2c-WCOM488E:00/0018:056A:488E.0001/input/input17

Device #2

[ 2.834860] input: SYNA2393:00 06CB:7A13 Mouse as /devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-6/i2c-SYNA2393:00/0018:06CB:7A13.0002/input/input13

[ 2.834929] input: SYNA2393:00 06CB:7A13 Touchpad as /devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-6/i2c-SYNA2393:00/0018:06CB:7A13.0002/input/input14

Could you describe your input devices ? How many mice, trackpads, pens, etc, you are using connected to the host ?

Thanks! And sorry for so many questions =).

Avi Eis (ikes73) wrote :

Right now it stopped happening, although I did see something briefly last week that fixed itself on a reboot.

If it happens again I'll check those details.

Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for libvirt (Ubuntu) because there has been no activity for 60 days.]

Changed in libvirt (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for qemu (Ubuntu) because there has been no activity for 60 days.]

Changed in qemu (Ubuntu):
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers