Activity log for bug #1850238

Date Who What changed Old value New value Message
2019-10-29 01:21:11 Nickolay Ponomarev bug added bug
2019-10-29 01:30:10 Ubuntu Kernel Bot linux (Ubuntu): status New Confirmed
2019-10-29 01:30:13 Ubuntu Kernel Bot tags amd64 apport-bug eoan package-from-proposed amd64 apport-bug disco eoan package-from-proposed
2019-10-30 00:52:25 Terry Rudd bug added subscriber Terry Rudd
2019-11-03 18:49:06 Igor Maculan bug added subscriber Igor Maculan
2019-11-08 11:56:45 Ashish Kumar Singh bug added subscriber Ashish Kumar Singh
2019-11-24 00:20:32 Per Christian Henden bug added subscriber Per Christian Henden
2019-12-27 21:38:22 Human bug added subscriber Human
2020-01-03 16:45:20 Nahuel Pastorale bug added subscriber Nahuel Pastorale
2020-01-04 21:43:37 Rion bug added subscriber Rion
2020-01-07 14:58:30 Nahuel Pastorale bug task added linux (Arch Linux)
2020-01-26 14:43:43 Anthony Cunningham bug added subscriber Anthony Cunningham
2020-01-28 03:18:35 j0rg3 bug added subscriber j0rg3
2020-02-17 07:45:57 Gregor Darius bug added subscriber Gregor Darius
2020-02-17 19:57:29 Tim Sweeney tags amd64 apport-bug disco eoan package-from-proposed amd64 apport-bug disco eoan focal package-from-proposed
2020-02-22 03:20:44 Ayman Rady bug added subscriber Ayman Rady
2020-02-23 09:52:24 Sergei Petunin bug added subscriber Sergei Petunin
2020-02-29 13:49:26 Gabriel Miretti bug added subscriber Gabriel Miretti
2020-03-26 13:05:56 Kai-Heng Feng description

Old value:

Short version
=============
I'm experiencing a 50-second hang each time I resume from a "deep" (suspend-to-RAM) sleep. It happens with the newer kernel (5.3 series; I'm currently running the version from eoan-proposed), but not with the version from the Ubuntu 18.04.3 LTS (uname says "5.0.0-31-generic #33~18.04.1-Ubuntu SMP"). [I haven't yet tried to test the mainline builds, nor to find/confirm the regression range, as this seems like something that will take me another week, and I'm not sure if it would be helpful.]

I narrowed the problem down to what I believe is a broken USB Type-C controller on the NVIDIA GPU: the ucsi_ccg driver for /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.3/i2c-0/0-0008 reports a timeout for both the initial PPM_RESET command (on system startup) and for the SET_NOTIFICATION_ENABLE command the driver runs on resume. I guess the hang is the driver waiting for a response to SET_NOTIFICATION_ENABLE; it appears to have been added recently in https://github.com/torvalds/linux/commit/a94ecde41f7e51e2742e53b5f151aee662c54d39, which could explain why I don't see the hang with 5.0.x.

Creating /etc/modprobe.d/dell.conf with a `blacklist ucsi_ccg` line (and rebooting) makes the hang go away.

Steps to reproduce
==================
(these are not the actual steps one can take to reproduce, starting from a new install; let me know if those will be useful)

1. Boot Ubuntu 19.10 with NVIDIA GPU drivers uninstalled and the following kernel parameters <https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter>:
 nouveau.modeset=0 nouveau.runpm=0 # force using integrated graphics
                                   # (the problem can be reproduced using NVIDIA's proprietary driver too, but I
                                   # guessed it's better to avoid it, and nouveau prints lots of errors with this GPU)
 mem_sleep_default=deep # suspend to RAM; suspend-to-idle has its own problems on this system

2. Run `dmesg -w` and wait a minute or two until a message like the following is printed:
 [ 175.611346] ucsi_ccg 0-0008: failed to reset PPM!
 [ 175.611355] ucsi_ccg 0-0008: PPM init failed (-110)
(attempting to suspend before the PPM init timeout will fail to enter sleep at all.)
(if your system doesn't report PPM init timeout, you probably won't see the hang on resume either)

3. Run `sudo pm-suspend` (using the power button to suspend causes other problems)
...wait for the laptop to go to sleep and the fans to turn off.

4. Press Enter on the built-in keyboard to resume. (Although the way we wake up the system doesn't seem to matter.)

5. Observe a hang lasting for almost a minute before the system is operational, with dmesg reporting:
 [ 299.331393] ata1.00: configured for UDMA/100
 <note the 47 second long gap>
 [ 346.133024] ucsi_ccg 0-0008: PPM NOT RESPONDING
 [ 346.133039] PM: dpm_run_callback(): ucsi_ccg_resume+0x0/0x20 [ucsi_ccg] returns -110
 [ 346.133042] PM: Device 0-0008 failed to resume: error -110
 ...
 [ 346.141504] Restarting tasks ... done.
 [ 346.340221] PM: suspend exit

System info
===========
My Dell G3 3590 laptop has an NVIDIA "GeForce GTX 1660 Ti with Max-Q Design" GPU. NVIDIA's "Turing" chips include USB Type-C controller on the GPU (I read future VR headsets are supposed to use it <https://github.com/envytools/envytools/search?q=4d151a19358579c77487ea3f72c32dc97c0250f7..ffd2dc9146482a5469209bbc861ed80adb066d31&type=Commits>), and indeed I'm seeing:

# lspci -tv
-[0000:00]-+-00.0 Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers
           +-01.0-[01]--+-00.0 NVIDIA Corporation TU116M [GeForce GTX 1660 Ti Mobile]
           |            +-00.1 NVIDIA Corporation Device 1aeb
           |            +-00.2 NVIDIA Corporation Device 1aec
           |            \-00.3 NVIDIA Corporation Device 1aed
...

Where the '1aed' device is detected as "NVIDIA USB Type-C Port Policy Controller" in Windows. I'm not sure if it's serving any useful purpose on this laptop, and it certainly doesn't seem to function properly:

If I enable UCSI logging on startup (root's crontab):
 @reboot bash -c 'echo 1 > /sys/kernel/debug/tracing/events/ucsi/enable'
..the steps to reproduce above result in the following /sys/kernel/debug/tracing/trace:

# tracer: nop
#
# entries-in-buffer/entries-written: 10/10 #P:12
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
     kworker/6:2-679 [006] .... 68.593915: ucsi_command: control=00000001 (PPM_RESET)
     kworker/6:1-187 [006] .... 151.599387: ucsi_notify: CCI=00000000
     kworker/6:2-679 [006] .... 175.617158: ucsi_reset_ppm: PPM_RESET -> FAIL (err=-110)
     kworker/6:1-187 [006] .... 211.582572: ucsi_notify: CCI=00000000
     kworker/6:1-187 [006] .... 253.577823: ucsi_notify: CCI=00000000
     kworker/6:1-187 [006] .... 295.574520: ucsi_notify: CCI=00000000
      pm-suspend-3448 [007] .... 298.115894: ucsi_command: control=dbe70005 (SET_NOTIFICATION_ENABLE)
      pm-suspend-3448 [005] .... 346.138850: ucsi_run_command: SET_NOTIFICATION_ENABLE -> FAIL (err=-110)
     kworker/6:1-187 [006] .... 370.904651: ucsi_notify: CCI=00000000
     kworker/6:1-187 [006] .... 412.901709: ucsi_notify: CCI=00000000

I updated the BIOS to the latest available (08/28/2019) and installed (by booting into Windows) all the other updates available for this system from the vendor. I don't know how to check what is the firmware version of the USB-C chip on the GPU and whether it even exists...

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: linux-image-5.3.0-20-generic 5.3.0-20.21
ProcVersionSignature: Ubuntu 5.3.0-20.21-generic 5.3.7
Uname: Linux 5.3.0-20-generic x86_64
ApportVersion: 2.20.11-0ubuntu8
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: nickolay 1668 F.... pulseaudio
 /dev/snd/controlC0: nickolay 1668 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Tue Oct 29 01:21:28 2019
InstallationDate: Installed on 2019-10-20 (8 days ago)
InstallationMedia: Ubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
MachineType: Dell Inc. G3 3590
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-20-generic root=UUID=0b40d72f-d832-47f6-ab77-faccfb6547fe ro nouveau.modeset=0 nouveau.runpm=0 mem_sleep_default=deep quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-20-generic N/A
 linux-backports-modules-5.3.0-20-generic N/A
 linux-firmware 1.183.1
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/28/2019
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.7.1
dmi.board.name: 061RYD
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.7.1:bd08/28/2019:svnDellInc.:pnG33590:pvr:rvnDellInc.:rn061RYD:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: GSeries
dmi.product.name: G3 3590
dmi.product.sku: 0949
dmi.sys.vendor: Dell Inc.

New value:

=== SRU Justification ===
[Impact]
Some systems have a "phantom" Nvidia UCSI, which prevents systems from suspending.

[Fix]
ucsi_ccg is stuck in its probe routine because the i2c bus never times out. Let it time out so that probe can fail, since it's just a phantom device.

[Test]
After applying this patch, the system can suspend/resume successfully.

[Regression Potential]
Low. It's a trivial change to correctly handle timeout.

=== Original Bug Report ===

Short version
=============
I'm experiencing a 50-second hang each time I resume from a "deep" (suspend-to-RAM) sleep. It happens with the newer kernel (5.3 series; I'm currently running the version from eoan-proposed), but not with the version from the Ubuntu 18.04.3 LTS (uname says "5.0.0-31-generic #33~18.04.1-Ubuntu SMP"). [I haven't yet tried to test the mainline builds, nor to find/confirm the regression range, as this seems like something that will take me another week, and I'm not sure if it would be helpful.]

I narrowed the problem down to what I believe is a broken USB Type-C controller on the NVIDIA GPU: the ucsi_ccg driver for /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.3/i2c-0/0-0008 reports a timeout for both the initial PPM_RESET command (on system startup) and for the SET_NOTIFICATION_ENABLE command the driver runs on resume. I guess the hang is the driver waiting for a response to SET_NOTIFICATION_ENABLE; it appears to have been added recently in https://github.com/torvalds/linux/commit/a94ecde41f7e51e2742e53b5f151aee662c54d39, which could explain why I don't see the hang with 5.0.x.

Creating /etc/modprobe.d/dell.conf with a `blacklist ucsi_ccg` line (and rebooting) makes the hang go away.

Steps to reproduce
==================
(these are not the actual steps one can take to reproduce, starting from a new install; let me know if those will be useful)

1. Boot Ubuntu 19.10 with NVIDIA GPU drivers uninstalled and the following kernel parameters <https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter>:
 nouveau.modeset=0 nouveau.runpm=0 # force using integrated graphics
                                   # (the problem can be reproduced using NVIDIA's proprietary driver too, but I
                                   # guessed it's better to avoid it, and nouveau prints lots of errors with this GPU)
 mem_sleep_default=deep # suspend to RAM; suspend-to-idle has its own problems on this system

2. Run `dmesg -w` and wait a minute or two until a message like the following is printed:
 [ 175.611346] ucsi_ccg 0-0008: failed to reset PPM!
 [ 175.611355] ucsi_ccg 0-0008: PPM init failed (-110)
(attempting to suspend before the PPM init timeout will fail to enter sleep at all.)
(if your system doesn't report PPM init timeout, you probably won't see the hang on resume either)

3. Run `sudo pm-suspend` (using the power button to suspend causes other problems)
...wait for the laptop to go to sleep and the fans to turn off.

4. Press Enter on the built-in keyboard to resume. (Although the way we wake up the system doesn't seem to matter.)

5. Observe a hang lasting for almost a minute before the system is operational, with dmesg reporting:
 [ 299.331393] ata1.00: configured for UDMA/100
 <note the 47 second long gap>
 [ 346.133024] ucsi_ccg 0-0008: PPM NOT RESPONDING
 [ 346.133039] PM: dpm_run_callback(): ucsi_ccg_resume+0x0/0x20 [ucsi_ccg] returns -110
 [ 346.133042] PM: Device 0-0008 failed to resume: error -110
 ...
 [ 346.141504] Restarting tasks ... done.
 [ 346.340221] PM: suspend exit

System info
===========
My Dell G3 3590 laptop has an NVIDIA "GeForce GTX 1660 Ti with Max-Q Design" GPU. NVIDIA's "Turing" chips include USB Type-C controller on the GPU (I read future VR headsets are supposed to use it <https://github.com/envytools/envytools/search?q=4d151a19358579c77487ea3f72c32dc97c0250f7..ffd2dc9146482a5469209bbc861ed80adb066d31&type=Commits>), and indeed I'm seeing:

# lspci -tv
-[0000:00]-+-00.0 Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers
           +-01.0-[01]--+-00.0 NVIDIA Corporation TU116M [GeForce GTX 1660 Ti Mobile]
           |            +-00.1 NVIDIA Corporation Device 1aeb
           |            +-00.2 NVIDIA Corporation Device 1aec
           |            \-00.3 NVIDIA Corporation Device 1aed
...

Where the '1aed' device is detected as "NVIDIA USB Type-C Port Policy Controller" in Windows. I'm not sure if it's serving any useful purpose on this laptop, and it certainly doesn't seem to function properly:

If I enable UCSI logging on startup (root's crontab):
 @reboot bash -c 'echo 1 > /sys/kernel/debug/tracing/events/ucsi/enable'
..the steps to reproduce above result in the following /sys/kernel/debug/tracing/trace:

# tracer: nop
#
# entries-in-buffer/entries-written: 10/10 #P:12
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
     kworker/6:2-679 [006] .... 68.593915: ucsi_command: control=00000001 (PPM_RESET)
     kworker/6:1-187 [006] .... 151.599387: ucsi_notify: CCI=00000000
     kworker/6:2-679 [006] .... 175.617158: ucsi_reset_ppm: PPM_RESET -> FAIL (err=-110)
     kworker/6:1-187 [006] .... 211.582572: ucsi_notify: CCI=00000000
     kworker/6:1-187 [006] .... 253.577823: ucsi_notify: CCI=00000000
     kworker/6:1-187 [006] .... 295.574520: ucsi_notify: CCI=00000000
      pm-suspend-3448 [007] .... 298.115894: ucsi_command: control=dbe70005 (SET_NOTIFICATION_ENABLE)
      pm-suspend-3448 [005] .... 346.138850: ucsi_run_command: SET_NOTIFICATION_ENABLE -> FAIL (err=-110)
     kworker/6:1-187 [006] .... 370.904651: ucsi_notify: CCI=00000000
     kworker/6:1-187 [006] .... 412.901709: ucsi_notify: CCI=00000000

I updated the BIOS to the latest available (08/28/2019) and installed (by booting into Windows) all the other updates available for this system from the vendor. I don't know how to check what is the firmware version of the USB-C chip on the GPU and whether it even exists...

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: linux-image-5.3.0-20-generic 5.3.0-20.21
ProcVersionSignature: Ubuntu 5.3.0-20.21-generic 5.3.7
Uname: Linux 5.3.0-20-generic x86_64
ApportVersion: 2.20.11-0ubuntu8
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: nickolay 1668 F.... pulseaudio
 /dev/snd/controlC0: nickolay 1668 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Tue Oct 29 01:21:28 2019
InstallationDate: Installed on 2019-10-20 (8 days ago)
InstallationMedia: Ubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
MachineType: Dell Inc. G3 3590
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-20-generic root=UUID=0b40d72f-d832-47f6-ab77-faccfb6547fe ro nouveau.modeset=0 nouveau.runpm=0 mem_sleep_default=deep quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-20-generic N/A
 linux-backports-modules-5.3.0-20-generic N/A
 linux-firmware 1.183.1
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/28/2019
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.7.1
dmi.board.name: 061RYD
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.7.1:bd08/28/2019:svnDellInc.:pnG33590:pvr:rvnDellInc.:rn061RYD:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: GSeries
dmi.product.name: G3 3590
dmi.product.sku: 0949
dmi.sys.vendor: Dell Inc.
2020-03-26 16:22:20 Kai-Heng Feng nominated for series Ubuntu Bionic
2020-03-26 16:22:20 Kai-Heng Feng bug task added linux (Ubuntu Bionic)
2020-03-26 16:22:20 Kai-Heng Feng nominated for series Ubuntu Focal
2020-03-26 16:22:20 Kai-Heng Feng bug task added linux (Ubuntu Focal)
2020-03-26 16:22:20 Kai-Heng Feng nominated for series Ubuntu Eoan
2020-03-26 16:22:20 Kai-Heng Feng bug task added linux (Ubuntu Eoan)
2020-03-26 16:22:31 Kai-Heng Feng bug task added linux-oem-osp1 (Ubuntu)
2020-03-26 16:22:42 Kai-Heng Feng linux-oem-osp1 (Ubuntu Eoan): status New Won't Fix
2020-03-26 16:22:45 Kai-Heng Feng linux-oem-osp1 (Ubuntu Focal): status New Won't Fix
2020-03-26 16:22:48 Kai-Heng Feng linux (Ubuntu Bionic): status New Won't Fix
2020-03-26 21:39:08 Timo Aaltonen linux-oem-osp1 (Ubuntu Bionic): status New Fix Committed
2020-04-03 17:51:33 Launchpad Janitor linux-oem-osp1 (Ubuntu): status New Confirmed
2020-04-03 17:51:33 Launchpad Janitor linux (Ubuntu Eoan): status New Confirmed
2020-04-03 21:40:51 Kelsey Steele linux (Ubuntu Eoan): status Confirmed Fix Committed
2020-04-07 09:36:39 Ubuntu Kernel Bot tags amd64 apport-bug disco eoan focal package-from-proposed amd64 apport-bug disco eoan focal package-from-proposed verification-needed-eoan
2020-04-07 11:52:13 Oliver Klee bug added subscriber Oliver Klee
2020-04-07 12:46:01 Kai-Heng Feng tags amd64 apport-bug disco eoan focal package-from-proposed verification-needed-eoan amd64 apport-bug disco eoan focal package-from-proposed verification-done-eoan
2020-04-07 14:12:01 Launchpad Janitor linux-oem-osp1 (Ubuntu Bionic): status Fix Committed Fix Released
2020-04-21 13:56:12 Launchpad Janitor linux-oem-osp1 (Ubuntu Eoan): status Won't Fix Fix Released
2020-04-21 13:56:11 Launchpad Janitor linux-oem-osp1 (Ubuntu Eoan): status Won't Fix Fix Released
2020-04-25 02:24:24 Can Yildirim bug added subscriber Can Yildirim
2020-04-28 19:47:28 Launchpad Janitor linux (Ubuntu Eoan): status Fix Committed Fix Released
2020-04-28 19:47:28 Launchpad Janitor cve linked 2020-11884
2020-05-22 13:52:55 Hélio Márcio Filho bug added subscriber Hélio Márcio Filho
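
The description recorded in the 2020-03-26 13:05:56 entry identifies the resume hang by a 47-second gap between consecutive dmesg timestamps. A minimal sketch of how such a gap could be located automatically is given here. It assumes the default `[seconds.microseconds]` dmesg prefix, and the helper name `largest_gap` is hypothetical (it is not part of any kernel or Ubuntu tool). The sample lines are copied from the report.

```python
import re

# Matches the "[  299.331393] message" prefix that dmesg prints by default.
TIMESTAMP_RE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s*(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, message_before, message_after) for the widest
    gap between consecutive timestamped lines, or None if fewer than two
    timestamped lines are present."""
    previous = None
    best = None
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if not match:
            continue  # skip lines that carry no timestamp, such as "..."
        current = (float(match.group(1)), match.group(2))
        if previous is not None:
            gap = current[0] - previous[0]
            if best is None or gap > best[0]:
                best = (gap, previous[1], current[1])
        previous = current
    return best


# Sample dmesg lines taken from the bug description.
SAMPLE = """\
[ 299.331393] ata1.00: configured for UDMA/100
[ 346.133024] ucsi_ccg 0-0008: PPM NOT RESPONDING
[ 346.133039] PM: dpm_run_callback(): ucsi_ccg_resume+0x0/0x20 [ucsi_ccg] returns -110
[ 346.133042] PM: Device 0-0008 failed to resume: error -110
[ 346.141504] Restarting tasks ... done.
[ 346.340221] PM: suspend exit
""".splitlines()

gap, before, after = largest_gap(SAMPLE)
print(f"{gap:.1f} s between {before!r} and {after!r}")
```

On a live system the same function could be fed the output of `dmesg` saved to a file. The line that follows the widest gap points at the driver that was being waited on, which in the report is ucsi_ccg.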