Running I/O heavy operation (cp) inside terminal in X server session will lock up entire system while running it in a TTY works fine

Bug #1682402 reported by ellie
This bug affects 2 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  High        Unassigned

Bug Description

Running an I/O-heavy operation (cp) inside a terminal in an X server session locks up the entire system, while running the same operation in a TTY works fine.

I am filing this as a kernel bug because the following steps will still hang the system:

1. Open up gnome-terminal
2. Start copying a really large file between two different EXT4 partitions on my local SSD with a "cp" command
3. Switch to a TTY
4. Log in to the TTY and observe htop or iotop

Expected result: I can see the operation happen and after a while it completes

Actual result: Keyboard drops out some time during operation (TTY stops responding, caps lock light no longer responds to pressing caps lock, computer appears effectively dead with last screen image frozen)

When I start the "cp" operation on the TTY in the first place, it works as expected with no trouble, regardless of whether the X session is running in parallel.
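The reproduction steps above can be approximated with a small script. This is only a sketch under stated assumptions: the original report used the "cp" command, whereas this copies in-process with Python's shutil, and it is not confirmed that this triggers the same hang. The function names, the file name "bigfile.bin", and the example mount points are hypothetical and not part of the report.

```python
import os
import shutil
import time


def make_large_file(path, size_bytes, chunk_bytes=1024 * 1024):
    """Write size_bytes of random data to path, one chunk at a time."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            f.write(os.urandom(n))
            remaining -= n
        # Push the data to disk so the later copy really reads from storage.
        f.flush()
        os.fsync(f.fileno())


def timed_copy(src, dst):
    """Copy src to dst, force the data to disk, return elapsed seconds."""
    start = time.monotonic()
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst)
        fdst.flush()
        os.fsync(fdst.fileno())
    return time.monotonic() - start


def reproduce(src_dir, dst_dir, size_bytes, name="bigfile.bin"):
    """Create a large file in src_dir and copy it into dst_dir.

    src_dir and dst_dir are assumed to live on two different partitions,
    mirroring step 2 of the report. Returns the copy duration in seconds.
    """
    src = os.path.join(src_dir, name)
    dst = os.path.join(dst_dir, name)
    make_large_file(src, size_bytes)
    return timed_copy(src, dst)


# Hypothetical usage from a terminal inside the X session (paths are assumptions):
#   reproduce("/mnt/part_a", "/mnt/part_b", 8 * 1024**3)
```

Running it once from gnome-terminal and once from a TTY would mirror the comparison made in the report, with htop or iotop observed from a second TTY in both cases.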

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-generic 4.10.0.19.21
ProcVersionSignature: Ubuntu 4.10.0-19.21-generic 4.10.8
Uname: Linux 4.10.0-19-generic x86_64
ApportVersion: 2.20.4-0ubuntu4
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/pcmC0D0p: jonas 2297 F...m pulseaudio
 /dev/snd/controlC0: jonas 2297 F.... pulseaudio
CurrentDesktop: GNOME
Date: Thu Apr 13 13:04:03 2017
InstallationDate: Installed on 2017-04-08 (4 days ago)
InstallationMedia: Ubuntu 16.10 "Yakkety Yak" - Release amd64 (20161012.2)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 004: ID 138a:0090 Validity Sensors, Inc.
 Bus 001 Device 003: ID 13d3:5248 IMC Networks
 Bus 001 Device 005: ID 056a:5047 Wacom Co., Ltd
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: LENOVO 20FDCTO1WW
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.10.0-19-generic.efi.signed root=/dev/mapper/fedora-root ro rootdelay=5 quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-19-generic N/A
 linux-backports-modules-4.10.0-19-generic N/A
 linux-firmware 1.164
SourcePackage: linux
UpgradeStatus: Upgraded to zesty on 2017-04-12 (1 days ago)
dmi.bios.date: 11/16/2015
dmi.bios.vendor: LENOVO
dmi.bios.version: N1GET35W (1.12 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20FDCTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40700 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrN1GET35W(1.12):bd11/16/2015:svnLENOVO:pn20FDCTO1WW:pvrThinkPadYoga260:rvnLENOVO:rn20FDCTO1WW:rvrSDK0J40700WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.name: 20FDCTO1WW
dmi.product.version: ThinkPad Yoga 260
dmi.sys.vendor: LENOVO

ellie (et1234567) wrote :

Just to rule out hardware issues:

smartctl --all on the SSD shows no sector reallocations, barely any media wearout, no uncorrectable errors, and only a few CRC errors (9). I am currently running a long self-test and will report back whether it shows any errors.

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
ellie (et1234567) wrote :

Some additional info:

- a short self-test of the SSD has already completed without errors

- there was barely any memory usage while the bug occurred, so it shouldn't be an out-of-memory issue

- iotop always shows ~5 seconds of 99% I/O usage by the "cp" command at the top with nothing else notable going on, then ~8 seconds of ~50% usage by the "dmcrypt_write" command at the top, again with nothing else notable going on. This might be normal; I only mention it because the switching back and forth between those two looked a bit odd

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc7

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
ellie (et1234567) wrote :

I just tested the latest upstream 4.11-rc7 which works fine. I didn't test a previous version of Ubuntu, since I just switched to 17.04 from Fedora 25.

tags: added: kernel-fixed-upstream
ellie (et1234567)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
ellie (et1234567) wrote :

I did some more testing with the upstream kernel. The upstream kernel similarly has the occasional freeze where the entire desktop stops responding or updating (while the hardware mouse cursor is usually still movable but clicking doesn't do anything). However, unlike the Ubuntu 17.04 kernel, it always recovers after 1-3 seconds and responds again. Ubuntu's kernel just gets stuck for me forever - one time I waited over a minute to see if something would wake up again, but the image stayed frozen.

Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a reverse bisect to figure out which commit upstream fixes this regression. It would be very helpful to know the last kernel that had this issue and the first kernel that did not.

Can you test the following kernels and report back? We are looking for the earliest kernel version that does not have this bug:

v4.11-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc4/

If v4.11-rc4 exhibits the bug then test v4.11-rc6:
v4.11-rc6: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc6/

If v4.11-rc4 does not exhibit the bug then test v4.11-rc2:
v4.11-rc2: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc2/

Thanks in advance!
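The test order requested above follows a binary search between the last kernel known to have the bug and the first kernel known not to. A minimal sketch of that decision logic is given here, assuming the bug is present in every candidate before the fix and absent in every candidate after it. The function name and the candidate list are illustrative assumptions: only rc2, rc4, rc6 and rc7 are named in this thread, and the other release candidates are included just to make the list contiguous.

```python
# Candidate kernels, ordered oldest to newest (assumed list, see above).
KERNELS = ["v4.11-rc%d" % i for i in range(1, 8)]


def next_kernel_to_test(versions, results):
    """Return the next kernel to test, or None once the boundary is pinned.

    versions: kernel names ordered oldest to newest.
    results:  maps a kernel name to True (bug present) or False (bug absent);
              untested kernels are simply missing from the mapping.
    """
    # Index of the newest kernel known to have the bug (-1 if none tested bad).
    last_bad = max(
        (i for i, v in enumerate(versions) if results.get(v) is True),
        default=-1,
    )
    # Index of the oldest kernel known to be free of the bug.
    first_good = min(
        (i for i, v in enumerate(versions) if results.get(v) is False),
        default=len(versions),
    )
    # Nothing untested remains between the two: the boundary is found.
    if first_good - last_bad <= 1:
        return None
    # Midpoint of the untested range, rounded up.
    return versions[(last_bad + first_good + 1) // 2]


# Starting point from this thread: v4.11-rc7 does not show the bug.
results = {"v4.11-rc7": False}
print(next_kernel_to_test(KERNELS, results))
```

Each test result is added to the mapping and the function is called again. Once it returns None, the oldest kernel recorded as False is the earliest one without the bug.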

tags: added: performing-bisect