Repeatable hang within 5 minutes using stress-ng + sleep + usb mouse
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
xorg-server (Ubuntu) | Confirmed | Undecided | Unassigned | | |
Bug Description
If I run a thermal transition test script (30 seconds stress-ng, 30 seconds sleep, in a loop) and move a local USB mouse, Kubuntu reliably crashes, usually in the first couple of runs and almost 100% of the time by run 6.
This appears to be hardware-linked, but not due to a specific piece of bad hardware: I have swapped literally every piece of hardware in the system.
It shows up (while running the test script listed below):
- On both an MSI B450 Gaming Plus Max and an MSI MPG X570 Gaming Plus mainboard.
- On both an AMD Ryzen 5 3600 and 3600X CPU.
- With one or two sticks of RAM. I've tested both sticks individually, in more than one mainboard slot.
- Regardless of whether the mainboard is in/attached to a case.
- Regardless of whether there is an m.2 SSD installed or I'm running off a live Kubuntu 19.10 USB stick with no hard disk attached.
- Regardless of which of two mice I use (an old Logitech one, or a GTX 133 Gaming mouse).
- Regardless of whether I'm using a Corsair VS650 or Corsair AX850 PSU.
- Regardless of whether I'm using an AMD RX 5700 XT or a Gigabyte Nvidia GeForce RTX 2070 Super (with open source drivers in both cases).
- Regardless of whether I'm using KDE or XFCE.
- Regardless of whether I'm using the default KDE DM or switch to GDM3 and set WaylandEnable=
- Regardless of whether I use the default 5.3.0-29-generic kernel or 5.4.17-
- Regardless of whether I go directly into the graphical environment or start in runlevel 3 and then manually run startx.
- Regardless of whether it's on the rising or falling edge of the stress-script's temperature changes.
- Regardless of BIOS version on the X570 mainboard (the one it shipped with, or the newest one released in January 2020).
- Regardless of whether XMP is on or off in the BIOS.
- Regardless of whether I use the default or set "Global C-state Control" to Disabled in the BIOS.
- Regardless of whether I add processor.
- Regardless of whether or not speakers are plugged in.
- Regardless of whether I'm using a USB port that is directly on the motherboard or is on the front of the case.
- Regardless of which monitor it is attached to.
It doesn't show up:
- On an old i7-4771 machine I have, also running Kubuntu 19.10, while running the test script.
- When I use a mouse remotely with ssh -Y [ip of the machine I am reporting this from] xeyes, while running the test script.
- When I do non-mouse USB input, i.e. via a USB keyboard or USB wifi dongle, including under saturated network load, while running the test script.
- During stress tests of the GPU, CPU, etc. Tools like memtest, mprime, Unigine Superposition, repeated kernel compiles, etc run stably overnight.
- When the system is entirely idle aside from mouse movement.
- When I start in runlevel 3 and run the same test script, using the mouse with gpm.
- Running the same test script without mouse movement: this was stable overnight, then crashed within a couple of minutes of moving the mouse.
It also shows up under loads other than the stress-ng + sleep script, but much less reliably. I'm writing this bug report on the affected machine, with Firefox open; under these conditions crashes occur at least once a week, but not frequently.
Crashes occur with sensor-reported CPU temperatures of 32 to 41 degrees Celsius. Nothing is overheating, and the system is stable at much higher temperatures under sustained stress tests.
The symptoms of the crash: the display stops updating and the system does not respond to any further input, including via the network or the magic SysRq key. Nothing related to the crash appears in syslog or journalctl, even when journalctl -f is running at the time of the crash.
The test script:
#!/bin/bash
# Alternate 30 s of full CPU load with 30 s of idle to force thermal transitions.
for x in {1..10000}
do
    echo "Run $x at $(date)"
    stress-ng --cpu 12 --cpu-method all --verify -t 30s --metrics-brief
    sleep 30
done
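Since the hang leaves nothing in syslog or journalctl, one possible way to narrow down when it happens is a variant of the script above that writes a timestamped heartbeat to disk at each phase boundary. The following is only a sketch, not something that was run for this report: `LOGFILE`, `heartbeat` and `run_cycles` are hypothetical names, and it assumes that a line flushed with `sync` survives the hard reset.

```shell
#!/bin/bash
# Sketch only: LOGFILE, heartbeat and run_cycles are illustrative names,
# not part of the original report.
LOGFILE="${LOGFILE:-./thermal-test.log}"

heartbeat() {
    # $1 = free-form message; append it with a timestamp and flush to disk
    printf '%s %s\n' "$(date '+%F %T')" "$1" >> "$LOGFILE"
    sync
}

run_cycles() {
    # $1 = number of runs, $2 = seconds of load, $3 = seconds of idle
    local runs=$1 load=$2 idle=$3 x
    for ((x = 1; x <= runs; x++)); do
        heartbeat "run $x: load start"
        stress-ng --cpu 12 --cpu-method all --verify -t "${load}s" --metrics-brief
        heartbeat "run $x: load end, sleep start"
        sleep "$idle"
        heartbeat "run $x: sleep end"
    done
}
```

Calling `run_cycles 10000 30 30` would reproduce the timing of the original script. After a hang and reboot, the last line in the log shows which run and which phase (load or idle) was in progress.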
ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: xorg 1:7.7+19ubuntu12
Uname: Linux 5.4.17-
ApportVersion: 2.20.11-0ubuntu8.2
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CompositorRunning: None
CurrentDesktop: KDE
Date: Fri Feb 7 00:02:22 2020
DistUpgraded: Fresh install
DistroCodename: eoan
DistroVariant: ubuntu
GraphicsCard:
NVIDIA Corporation Device [10de:1e84] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd Device [1458:4008]
InstallationDate: Installed on 2020-01-30 (7 days ago)
InstallationMedia: Kubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
MachineType: Micro-Star International Co., Ltd. MS-7C37
ProcKernelCmdLine: BOOT_IMAGE=
SourcePackage: xorg
Symptom: display
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/08/2020
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: A.71
dmi.board.
dmi.board.name: MPG X570 GAMING PLUS (MS-7C37)
dmi.board.vendor: Micro-Star International Co., Ltd.
dmi.board.version: 2.0
dmi.chassis.
dmi.chassis.type: 3
dmi.chassis.vendor: Micro-Star International Co., Ltd.
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.family: To be filled by O.E.M.
dmi.product.name: MS-7C37
dmi.product.sku: To be filled by O.E.M.
dmi.product.
dmi.sys.vendor: Micro-Star International Co., Ltd.
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.99-1ubuntu1
version.
affects: ubuntu → xorg (Ubuntu)
no longer affects: linux (Ubuntu)
Changed in xorg-server (Ubuntu):
status: Incomplete → New
Thank you for taking the time to report this bug and helping to make Ubuntu better. It sounds like some part of the system has crashed. To help us find the cause of the crash please follow these steps:
1. Look in /var/crash for crash files and if found run:
ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.
2. If step 1 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.
3. If step 2 also failed then apply the workaround from bug 994921, reboot, reproduce the crash, and retry step 1.
Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.
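Steps 1 and 2 above could be scripted roughly as follows. This is a sketch under assumptions: the function names `list_crash_files` and `errors_url` are hypothetical, the default paths are the ones quoted in the steps, and reading the whoopsie ID file may require root.

```shell
#!/bin/bash
# Sketch only: list_crash_files and errors_url are illustrative names.

list_crash_files() {
    # $1 = crash directory (defaults to /var/crash); prints one .crash path per line
    local dir="${1:-/var/crash}" f
    for f in "$dir"/*.crash; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

errors_url() {
    # $1 = file holding the whoopsie ID (defaults to /var/lib/whoopsie/whoopsie-id)
    local idfile="${1:-/var/lib/whoopsie/whoopsie-id}" id
    [ -r "$idfile" ] || return 1
    id="$(tr -d '[:space:]' < "$idfile")"
    printf 'https://errors.ubuntu.com/user/%s\n' "$id"
}
```

If `list_crash_files` prints anything, each listed file can be passed to `ubuntu-bug` as in step 1. If it prints nothing, `errors_url` gives the page to check in step 2.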