Comment 677 for bug 1690085

kmueller (kmueller-linux-kernel-bugs) wrote:

(In reply to Brendan Long from comment #589)
> I strongly suspect that the graphics driver was the problem since my lockups
> would cause the screen to become completely unresponsive, but sound
> continued working, and in one case I had a lockup during a video call and
> the other person could still see and hear me.

What you're describing here is a new "feature" introduced between kernels 4.19.16 and 4.19.17; I see exactly the same behavior here with radeon hardware. The system keeps working completely (even VMs on the host run fine) except for graphics; sometimes even the tty terminals keep working. When I SSH into the machine, I always see log entries like these:

radeon 0000:0a:00.0: ring 0 stalled for more than 14084msec
radeon 0000:0a:00.0: GPU lockup (current fence id 0x0000000000053ed7 last fence id 0x0000000000053f0f on ring 0)
...

I'm currently trying to narrow it down using git bisect. The suspicious commits remaining at the moment are:

2019-01-22 arm64: Don't trap host pointer auth use to EL2 Mark Rutland bad
2019-01-22 arm64/kvm: consistently handle host HCR_EL2 flags Mark Rutland
2019-01-22 scsi: target: iscsi: cxgbit: fix csk leak Varun Prakash
2019-01-22 scsi: target: iscsi: cxgbit: fix csk leak Varun Prakash
2019-01-22 Revert "scsi: target: iscsi: cxgbit: fix csk leak" Sasha Levin
2019-01-22 mmc: sdhci-msm: Disable CDR function on TX Loic Poulain
2019-01-22 netfilter: nf_conncount: fix argument order to find_next_bit
2019-01-22 netfilter: nf_conncount: speculative garbage collection on empty lists Pablo Neira Ayuso
2019-01-22 netfilter: nf_conncount: move all list iterations under spinlock Pablo Neira Ayuso
2019-01-22 netfilter: nf_conncount: merge lookup and add functions Florian Westphal

2019-01-22 netfilter: nf_conncount: restart search when nodes have been erased Florian Westphal ?
2019-01-22 netfilter: nf_conncount: split gc in two phases Florian Westphal
2019-01-22 netfilter: nf_conncount: don't skip eviction when age is negative Florian Westphal
2019-01-22 netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS Shawn Bohrer
2019-01-22 can: gw: ensure DLC boundaries after CAN frame modification Oliver Hartkopp
2019-01-22 tty: Don't hold ldisc lock in tty_reopen() if ldisc present Dmitry Safonov
2019-01-22 tty: Simplify tty->count math in tty_reopen() Dmitry Safonov
2019-01-22 tty: Hold tty_ldisc_lock() during tty_reopen() Dmitry Safonov
2019-01-22 tty/ldsem: Wake up readers after timed out down_write() Dmitry Safonov

As you correctly describe, the problem seems to be network-related. I also get this error when watching videos from the internet. I'm currently testing the changes between "restart search when nodes have been erased" and "Wake up readers after timed out down_write()".
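For readers unfamiliar with how git bisect narrows a range like the one above, here is a minimal Python sketch of the underlying binary search. It is an illustration, not git's implementation: the commit IDs are hypothetical placeholders, the list is assumed to be ordered oldest to newest with the newest entry known bad, and `is_bad` stands in for the manual "build, boot, watch for the radeon lockup, then run `git bisect good` or `git bisect bad`" step.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (earliest bad commit, number of tests performed).

    Assumes commits are ordered oldest -> newest, the last one is bad, and
    once a commit is bad every later commit is bad too (the monotonicity
    git bisect relies on).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one build/boot/test cycle
        if is_bad(commits[mid]):
            hi = mid  # culprit is at mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return commits[hi], tests


if __name__ == "__main__":
    # Nine hypothetical commits, oldest -> newest; pretend c6 introduced the bug.
    commits = [f"c{i}" for i in range(9)]
    culprit, n = first_bad(commits, lambda c: int(c[1:]) >= 6)
    print(culprit, n)  # prints: c6 3
```

Each test roughly halves the remaining range, so nine candidates need at most four test kernels. For an intermittent lockup, a commit marked good only because the hang did not happen to trigger during the test breaks the monotonicity assumption and steers the search away from the real culprit, so longer test sessions per step are worth the time.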