Ubuntu Desktop ISO fails to boot with nouveau on a displayport

Bug #1723619 reported by Chris Glass on 2017-10-14
This bug affects 8 people
Affects Status Importance Assigned to Milestone
Linux
Unknown
Medium
Release Notes for Ubuntu
Undecided
Unassigned
linux (Ubuntu)
Undecided
Unassigned

Bug Description

On the latest daily artful image (/current as of 2017-10-14), the following occurs on a system with NVIDIA graphics (a GTX 970 on my system) when connected to a 4k display over DisplayPort: https://photos.app.goo.gl/gLUva3Vgvtv0lAmj2

Steps to reproduce:
- Download latest daily image, check checksums, burn to USB
- Boot from USB.
- Choose "Install Ubuntu" or "try ubuntu" at the menu.

This happens on the livecd, but also upon upgrading the system from a previous daily with a different kernel: my "old" daily booted and installed fine with kernel 4.12.0-12, then failed with the same error once dist-upgraded.

On the faulty system I could test with, it seems to *only* happen over DisplayPort on a 4k screen (I don't own a non-4k DisplayPort screen, so I cannot rule out the 4k resolution as the trigger).

summary: - System fails to boot with nouveau
+ Ubuntu Desktop ISO fails to boot with nouveau

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1723619

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

I cannot run apport-collect because the livecd doesn't boot :(

Chris Glass (tribaal) wrote :

The livecd boots fine on another machine of mine with an NVIDIA GT 640 (so, an older generation), from the same daily and same image burn.

I am out of extra hardware to try this on.

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1723619

tags: added: iso-testing
Chris Glass (tribaal) wrote :

Another machine (GTX950m on an integrated screen) boots as well.

From my limited understanding of the error message in the screenshot - perhaps the problem is that the screen is connected to displayport (the GT 640 uses HDMI).

The failing system has a 4k screen connected to displayport.

Chris Glass (tribaal) wrote :

Switching the faulty system to using HDMI + 1080p succeeded. Updating bug description.

summary: - Ubuntu Desktop ISO fails to boot with nouveau
+ Ubuntu Desktop ISO fails to boot with nouveau on a displayport
description: updated
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Andy Whitcroft (apw) wrote :

I think from your description you can install this system from the new ISO using HDMI. Would you therefore be able to check out some of the interim kernels between 4.12.0-12 and 4.13.0-16 to see which of those work? They should be available in the Launchpad librarian. Don't forget to install both linux-image and linux-image-extra when testing.

Chris Glass (tribaal) wrote :

I've done further testing as you suggested. Here's a breakdown of my findings:

I tested the following kernel versions on 4k + displayport (installed from launchpad librarian, with their matching -extra package and an "update-grub"):

- 4.12.0-12: Boots. System seems to crash after a few seconds on the 4k desktop (cannot switch VTs). Maybe a separate bug however, at least it makes it to login+desktop.

- 4.12.0-13: Boots. Same symptoms as previous kernel (crashes after a few seconds on the desktop).

- 4.13.0-10: Does not boot. Fails with a slightly different output than initially reported (see below)

- 4.13.0-11: Does not boot. Same symptoms.

- 4.13.0-12: Does not boot. Same symptoms. Screenshot https://photos.app.goo.gl/ztJvK7Im2LVCK0ss1

- 4.13.0-15: Does not boot. Same symptoms.

- 4.13.0-16: Does not boot. Same symptoms.

None of the tested kernels exhibit any problematic behavior when the HDMI + 1080p screen is plugged in.

Kai-Heng Feng (kaihengfeng) wrote :

Please try mainline kernel 4.13-rc*, so we can know which one contains the first bad commit that causes the regression.

Chris Glass (tribaal) wrote :

Tested all mainline kernels I could find for 4.13-rc*:

- 4.13-rc1: DisplayPort setup boots. Login screen appears. Choosing the default option (wayland) results in broken desktop (https://photos.app.goo.gl/xisx9EgTKTFoLXmJ3). Choosing the xorg fallback works perfectly.

- 4.13-rc2: DisplayPort setup boots. Login screen appears. Choosing the default option (wayland) results in broken desktop. Choosing the xorg fallback works perfectly.

- 4.13-rc3: DisplayPort setup does NOT boot (https://photos.app.goo.gl/ASCw1tDI6fHD1i6w1)

- 4.13-rc4: DisplayPort setup does NOT boot.

(could not find a -rc5)

- 4.13-rc6: DisplayPort setup does NOT boot.

- 4.13-rc7: DisplayPort setup does NOT boot.
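
The results above reduce to a single "last good..first bad" range, which is the form git expects when listing candidate commits. A minimal sketch, assuming "boots to a login screen" counts as good; the helper name first_bad is made up for illustration:

```shell
#!/usr/bin/env bash
# Read "version result" pairs in test order and print the range between
# the last good version and the first bad one.
first_bad() {
    local last_good='' version result
    while read -r version result; do
        if [ "$result" = bad ]; then
            printf '%s..%s\n' "$last_good" "$version"
            return 0
        fi
        last_good=$version
    done
    return 1   # no bad version seen
}

# Results from the mainline release candidates tested above.
first_bad <<'EOF'
v4.13-rc1 good
v4.13-rc2 good
v4.13-rc3 bad
v4.13-rc4 bad
v4.13-rc6 bad
v4.13-rc7 bad
EOF
```

With the data above this prints v4.13-rc2..v4.13-rc3, which can be passed directly to "git log" in a kernel tree.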

Kai-Heng Feng (kaihengfeng) wrote :

Then it's highly likely that the culprit is one of the following commits:

$ git log --pretty=oneline v4.13-rc2..v4.13-rc3 drivers/gpu/drm | grep -E 'dp|nouveau'
38bcb208f60924a031b9f809f7cd252ea4a94e5f drm/nouveau/bar/gf100: fix access to upper half of BAR2
a90e049cacd965dade4dae7263b4d3fd550e78b6 drm/nouveau/disp/nv50-: bump max chans to 21
746c842d1f64caad81d82f0054c0e063c8aa5399 drm/nouveau/kms: remove call to drm_crtc_vblank_off() during unload/suspend
4a5431af19bc52c4dd491e989543c66a52380f00 drm/nouveau/kms/nv50: update vblank state in response to modeset actions
587f577e0beb4d20ee60bac8d21134b4c5a9fd29 drm/nouveau/disp: add tv encoders to output resource mapping
13a86519202c5d119d83640d6f781f3181205d2c drm/nouveau/i2c/gf119-: add support for address-only transactions
967003bb2cae121d345fd807eb757d9422229713 drm/dp: Don't trust drm_dp_downstream_id()
c11a93f5fd9229dc7c8b90570c75cf70bc3976c2 drm/dp: Fix read pointer for drm_dp_downsteam_debug()

Kai-Heng Feng (kaihengfeng) wrote :

Do you know how to build a kernel? I can build test kernels in deb format if you don't know how to make one.

tags: added: nouveau
Chris Glass (tribaal) wrote :

As discussed on IRC, kaihengfeng and I will sync up tomorrow and bisect the problem together.

Kai-Heng Feng (kaihengfeng) wrote :

Try kernel here: http://people.canonical.com/~khfeng/lp1723619-1/

Built on commit
4a5431af19bc52c4dd491e989543c66a52380f00 drm/nouveau/kms/nv50: update vblank state in response to modeset actions

Changed in linux (Ubuntu):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
Chris Glass (tribaal) wrote :

The kernel at #14 fails to boot.

Kai-Heng Feng (kaihengfeng) wrote :

But "blacklist=nouveau modprobe.blacklist=nouveau" can boot, right?

13a86519202c5d119d83640d6f781f3181205d2c drm/nouveau/i2c/gf119-: add support for address-only transactions:
http://people.canonical.com/~khfeng/lp1723619-2/

Chris Glass (tribaal) wrote :

Blacklisting nouveau allows a full boot (at a very low screen resolution, obviously).

Kernel at #16 fails to boot.

Andy Whitcroft (apw) on 2017-10-17
Changed in ubuntu-release-notes:
status: New → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

Built on commit prior to c11a93f5fd9229dc7c8b90570c75cf70bc3976c2:
http://people.canonical.com/~khfeng/lp1723619-3/

Kai-Heng Feng (kaihengfeng) wrote :

The previous one is good.

Built on commit 967003bb2cae121d345fd807eb757d9422229713 drm/dp: Don't trust drm_dp_downstream_id():
http://people.canonical.com/~khfeng/lp1723619-4/

Kai-Heng Feng (kaihengfeng) wrote :

Chris said 967003bb2cae121d345fd807eb757d9422229713 is good.
So the bad commit is
13a86519202c5d119d83640d6f781f3181205d2c drm/nouveau/i2c/gf119-: add support for address-only transactions
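
The same narrowing can also be driven by git bisect instead of building hand-picked commits. A rough sketch, assuming a local clone of the mainline kernel tree; the function name and the restriction to drivers/gpu/drm are illustrative choices, not what was actually run in this bug:

```shell
#!/usr/bin/env bash
# Start a path-limited bisect between the first bad (v4.13-rc3) and the
# last good (v4.13-rc2) release candidates.
start_dp_bisect() {
    local tree=$1   # path to a mainline kernel clone (assumed to exist)
    git -C "$tree" bisect start v4.13-rc3 v4.13-rc2 -- drivers/gpu/drm
}

# After building and booting each kernel that git checks out, record the
# outcome from inside the tree:
#   git bisect good    # the DisplayPort screen came up
#   git bisect bad     # the boot failed
# When git names the first bad commit, clean up with:
#   git bisect reset
```

Limiting the bisect to drivers/gpu/drm keeps the number of build-and-boot cycles small, at the cost of missing a culprit outside that directory.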

Kai-Heng Feng (kaihengfeng) wrote :

Mainline kernel with 13a86519202c5d119d83640d6f781f3181205d2c reverted:
http://people.canonical.com/~khfeng/lp1723619-revert/

Seth Forshee (sforshee) wrote :

Unfortunately that commit says that it fixes regressions from something else, so I think it's too risky to simply revert it.

Have you tested any 4.14-rc kernels to see if the problem still exists there?

amano (jyaku) wrote :

uname -r
4.14.0-rc5-lp1723619+revert

That's (sadly) not my issue (the one that cropped up on October 6th). It again took me 4 attempts before Plymouth stopped freezing and GDM started up.

Chris Glass (tribaal) wrote :

I just tested 4.15-rc5 and the problem still persists.

A (bit painful) workaround if you don't have a choice:

- Boot liveCD, enable "expert mode" (F6 at the boot menu)

- Edit the boot line to have "blacklist=nouveau modprobe.blacklist=nouveau" BOTH before AND after the "--" at the end of the line. The second part should transfer to your installed system.

- Boot into the live session. You now have horribly low resolution but it should allow you to run ubiquity.

- Install Ubuntu normally using horrible graphics

- Reboot. Hopefully your kernel cmdline parameters were saved and therefore you will reboot into a happy (still horrible) fresh system. Congratulations.

- If your reboot crashes, odds are that part of the command line was not carried over. No worries: spam ESC to get into the grub menu, then edit the kernel's entry to have both "blacklist=nouveau" and "modprobe.blacklist=nouveau". Boot into your ugly new system as in the previous point.

- Install the nvidia proprietary drivers: "sudo ubuntu-drivers" then install the appropriate nvidia-XXX package (was nvidia-384 in my case but it depends on your hardware).

- Reboot into hopefully glorious native resolution.
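
If the parameters were not carried over to the installed system, they can be made permanent in the GRUB defaults file rather than retyped at every boot. A sketch, assuming the stock Ubuntu layout (/etc/default/grub plus update-grub); the helper name is hypothetical, and it is demonstrated on a scratch file rather than the real one:

```shell
#!/usr/bin/env bash
# Append the nouveau-blacklisting parameters to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file. Does nothing if they are already present.
add_nouveau_blacklist() {
    local file=$1
    local params='blacklist=nouveau modprobe.blacklist=nouveau'
    if grep -q 'modprobe\.blacklist=nouveau' "$file"; then
        return 0
    fi
    # Insert the parameters just before the closing quote (GNU sed -i).
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${params}\"/" "$file"
}

# Demonstration on a scratch copy.
demo=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$demo"
add_nouveau_blacklist "$demo"
cat "$demo"
rm -f "$demo"
```

On a real system this would be run as root against /etc/default/grub, followed by "sudo update-grub" so the change reaches the boot menu.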

Chris Glass (tribaal) wrote :

For the record since I forgot to comment previously: kernel in #21 (mainline kernel with suspicious patch reverted) does boot fine with nouveau.

(However just reverting it will probably cause other regressions)

As discovered when testing the latest release of Ubuntu with (mainline) trees, commit 13a86519202c5d119d83640d6f781f3181205d2c seems to introduce a regression when booting with a screen attached to displayport (the system otherwise boots fine with the same kernel using an HDMI output).

An original bug was filed with the Ubuntu bug tracker and contains information about bisection: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1723619

A (set of) dmesg outputs can be found here: https://pastebin.ubuntu.com/25767385/

4.14-rc5 still exhibits this behavior on the tested system.

Please don't hesitate to reach out to me should you need more information - one affected system happens to be my main workstation.

Chris Glass (tribaal) wrote :

A bug has been filed upstream here: https://bugs.freedesktop.org/show_bug.cgi?id=103351

This appears to be a GM204.
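
One way to confirm the chipset is to pull the name out of the line nouveau prints when it probes the card. A small sketch; the helper name is hypothetical, and the sample input follows the format nouveau logs for a GM204 card:

```shell
#!/usr/bin/env bash
# Extract the chipset name from nouveau's probe line on stdin,
# e.g. "nouveau 0000:01:00.0: NVIDIA GM204 (124000a1)".
nouveau_chipset() {
    sed -n 's/.*nouveau [0-9a-f:.]*: NVIDIA \([A-Z0-9]*\) .*/\1/p'
}

echo '[ 6.828726] nouveau 0000:01:00.0: NVIDIA GM204 (124000a1)' | nouveau_chipset
```

On an affected machine the input would come from "dmesg" instead of the echoed sample line.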

Note that 13a8651920 fixed a regression for most people precisely of the type that it caused for you, i.e. DDC failing.

So it sounds like the address-only transactions were actually working well for you before (which basically is impossible since *size - 1 would have been 0xffffffff and have overwritten the whole ctrl), and these have now been broken.
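
To illustrate the arithmetic: in C, a zero-length (address-only) transfer makes an unsigned 32-bit "size - 1" wrap around to all ones. A shell sketch that emulates the wrap by masking to 32 bits; the function names and the sample ctrl value are made up for illustration and are not taken from the driver:

```shell
#!/usr/bin/env bash
# Emulate C's 32-bit unsigned "size - 1".
u32_size_minus_one() {
    printf '0x%08x\n' $(( ($1 - 1) & 0xffffffff ))
}

# OR the wrapped length into a (hypothetical) control word, as a driver
# might do when encoding the transfer size.
ctrl_with_size() {
    local ctrl=$1 size=$2
    printf '0x%08x\n' $(( (ctrl | (size - 1)) & 0xffffffff ))
}

u32_size_minus_one 16         # ordinary 16-byte transfer: 0x0000000f
u32_size_minus_one 0          # address-only transfer:     0xffffffff
ctrl_with_size 0x80000000 0   # every bit of ctrl ends up set: 0xffffffff
```

With a size of 0 the OR sets every bit of the control word, which is why such a transaction could not have been encoded correctly before the commit.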

This leads me to believe that a different bit is now the address-only transaction bit. In gm200_i2c_aux_xfer, we assume it's 0x100 (same as for GF119+). Ben, did you trace it on GM200+ separately?

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Mathew Hodson (mhodson) on 2017-10-22
Changed in ubuntu-release-notes:
status: Confirmed → Fix Released

This issue persists in 4.14-rc6.

Let me know if there is anything I can do to help here.

Steve Langasek (vorlon) on 2017-10-26
description: updated

I just tested 4.14-rc7 and the issue is still present.

I ran into the same issue on a Lenovo T420 with 01:00.0 VGA compatible controller: NVIDIA Corporation GF119M [Quadro NVS 4200M] (rev a1).

As soon as I plug an external monitor via DP, Nouveau oopses at nvkm_dp_train_drive(). If I try to boot with the monitor plugged in the system doesn't boot at all. I attached the relevant log at the end.

The external monitor works perfectly on 4.12.9-300.fc26.x86_64 and breaks on 4.13.1-301.fc27.x86_64; the commit mentioned in this issue was introduced between these two versions.

Dec 25 01:41:47 localhost gnome-shell[1662]: Failed to apply DRM plane transform 0: Invalid argument
Dec 25 01:41:47 localhost gnome-shell[1662]: Failed to apply DRM plane transform 0: Invalid argument
Dec 25 01:41:48 localhost gnome-shell[1662]: JS WARNING: [resource:///org/gnome/shell/ui/workspaceThumbnail.js 892]: reference to undefined property "_switchWorkspaceNotifyId"
Dec 25 01:41:48 localhost gsd-color[1301]: no xrandr-Dell Inc.-DELL U2415-7MT0167S57AS device found: Failed to find output xrandr-Dell Inc.-DELL U2415-7MT0167S57AS
Dec 25 01:41:48 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
Dec 25 01:41:48 localhost kernel: IP: (null)
Dec 25 01:41:48 localhost kernel: PGD 0
Dec 25 01:41:48 localhost kernel: P4D 0
Dec 25 01:41:48 localhost kernel:
Dec 25 01:41:48 localhost kernel: Oops: 0010 [#1] SMP
Dec 25 01:41:48 localhost kernel: Modules linked in: rfcomm fuse ccm nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support mei_wdt irqbypass intel_cstate intel_uncore intel_rapl_perf arc4 joydev wmi_bmof uvcvideo iwldvm btusb btrtl mac80211 btbcm btintel bluetooth videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
Dec 25 01:41:48 localhost kernel: videobuf2_core videodev snd_hda_codec_hdmi i2c_i801 thinkpad_acpi media iwlwifi lpc_ich snd_hda_codec_conexant snd_hda_codec_generic mei_me snd_hda_intel ecdh_generic mei snd_hda_codec cfg80211 snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd shpchp tpm_tis soundcore tpm_tis_core tpm rfkill dm_crypt nouveau crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw mxm_wmi i2c_algo_bit drm_kms_helper sdhci_pci ttm sdhci e1000e mmc_core drm ptp pps_core wmi video
Dec 25 01:41:48 localhost kernel: CPU: 0 PID: 68 Comm: kworker/u16:1 Not tainted 4.13.1-301.fc27.x86_64 #1
Dec 25 01:41:48 localhost kernel: Hardware name: LENOVO 4180PC4/4180PC4, BIOS 83ET76WW (1.46 ) 07/05/2013
Dec 25 01:41:48 localhost kernel: Workqueue: nvkm-disp gf119_disp_super [nouveau]
Dec 25 01:41:48 localhost kernel: task: ffff8f14901b8000 task.stack: ffffa0d8c1250000
Dec 25 01:41:48 localhost kernel: RIP: 0010: ...

Nikita (nikital) wrote :

I stumbled upon this bug independently running Fedora 27 on Lenovo T420 with GeForce 119M, but because I plugged the monitor only after the boot I have a stacktrace with hopefully helpful dmesg log. I attached the log to the upstream bug report.

(In reply to Nikita from comment #4)
> I ran into the same issue on a Lenovo T420 with 01:00.0 VGA compatible
> controller: NVIDIA Corporation GF119M [Quadro NVS 4200M] (rev a1).

This is a different issue, what you are seeing with GF119 is:

 https://bugs.freedesktop.org/show_bug.cgi?id=103421

Ersin (ersin-ertan) wrote :

After today's upgrade from 17.10 to 18.04, using a 4k or 1080p monitor over DisplayPort or HDMI with a Radeon RX 460, the system boots as far as the BIOS screen, then the screen flashes with jumbled colors every few seconds; shutdown displays the Ubuntu logo.
Removing the graphics card and using the onboard HDMI gives the same result for both monitors, except that the Ubuntu loading animation is shown, followed by no signal.
Starting without a monitor and then plugging in the 1080p one shows the attached picture between flashes.

This still exists on 4.17-rc3 with my GTX 980 (GM204).
Reverting 13a86519202c5d119d83640d6f781f3181205d2c gets my DP monitor working again. (My HDMI monitor works in both cases)

The DP monitor works (low-resolution console) until nouveau is loaded, then it goes blank.

dmesg output on stock 4.17-rc3 source:
[ 6.790387] fb: switching to nouveaufb from VESA VGA
[ 6.800232] input: HDA Intel PCH Front Mic as /devices/pci0000:00/0000:00:1b.0/sound/card0/input11
[ 6.828574] Console: switching to colour dummy device 80x25
[ 6.828726] nouveau 0000:01:00.0: NVIDIA GM204 (124000a1)
[ 6.829091] input: HDA Intel PCH Rear Mic as /devices/pci0000:00/0000:00:1b.0/sound/card0/input12
[ 6.829150] input: HDA Intel PCH Line as /devices/pci0000:00/0000:00:1b.0/sound/card0/input13
[ 6.829192] input: HDA Intel PCH Line Out Front as /devices/pci0000:00/0000:00:1b.0/sound/card0/input14
[ 6.829235] input: HDA Intel PCH Line Out Surround as /devices/pci0000:00/0000:00:1b.0/sound/card0/input15
[ 6.829276] input: HDA Intel PCH Line Out CLFE as /devices/pci0000:00/0000:00:1b.0/sound/card0/input16
[ 6.829322] input: HDA Intel PCH Line Out Side as /devices/pci0000:00/0000:00:1b.0/sound/card0/input17
[ 6.829369] input: HDA Intel PCH Front Headphone as /devices/pci0000:00/0000:00:1b.0/sound/card0/input18
[ 6.841234] fuse init (API version 7.26)
[ 6.888971] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: data=ordered
[ 6.911531] nouveau 0000:01:00.0: bios: version 84.04.2f.00.4b
[ 6.913746] nouveau 0000:01:00.0: fb: 4096 MiB GDDR5
[ 6.913762] nouveau 0000:01:00.0: bus: MMIO write of 8000012c FAULT at 10eb14 [ IBUS ]
[ 6.976881] [TTM] Zone kernel: Available graphics memory: 8185092 kiB
[ 6.976886] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 6.976889] [TTM] Initializing pool allocator
[ 6.976892] [TTM] Initializing DMA pool allocator
[ 6.976902] nouveau 0000:01:00.0: DRM: VRAM: 4096 MiB
[ 6.976904] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[ 6.976907] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 6.976909] nouveau 0000:01:00.0: DRM: DCB version 4.1
[ 6.976911] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f02 00020030
[ 6.976913] nouveau 0000:01:00.0: DRM: DCB outp 01: 02000f00 00000000
[ 6.976915] nouveau 0000:01:00.0: DRM: DCB outp 02: 04811f96 04400020
[ 6.976916] nouveau 0000:01:00.0: DRM: DCB outp 03: 04011f92 00020020
[ 6.976918] nouveau 0000:01:00.0: DRM: DCB outp 04: 02822f76 04400020
[ 6.976920] nouveau 0000:01:00.0: DRM: DCB outp 05: 02022f72 00020020
[ 6.976922] nouveau 0000:01:00.0: DRM: DCB outp 06: 02033f62 00020010
[ 6.976923] nouveau 0000:01:00.0: DRM: DCB outp 07: 04844f86 04400010
[ 6.976925] nouveau 0000:01:00.0: DRM: DCB outp 08: 04044f82 00020010
[ 6.976927] nouveau 0000:01:00.0: DRM: DCB outp 15: 01df6ff8 00000000
[ 6.976929] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
[ 6.976930] nouveau 0000:01:00.0: DRM: DCB conn 01: 02000146
[ 6.976932] nouveau 0000:01:00.0: DRM: DCB conn 02: 00020246
[ 6.976933] nouveau 0000:01:00.0: DRM: DCB conn 03: 00010361
[ 6.976935] nouveau 0000:01:00.0: DRM: DCB co...

Just something I noticed... does this help? (Obviously without a revert applied.)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/gm200.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/gm200.c
index a23c5f315221..ff94c6cb9a29 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/gm200.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/gm200.c
@@ -26,7 +26,7 @@

 static const struct nvkm_i2c_func
 gm200_i2c = {
- .pad_x_new = gf119_i2c_pad_x_new,
+ .pad_x_new = gm200_i2c_pad_x_new,
        .pad_s_new = gm200_i2c_pad_s_new,
        .aux = 8,
        .aux_stat = gk104_aux_stat,

Chris (cseilus) wrote :

Have a similar problem. Ubuntu 18.04 with nvidia gtx 970. After the screen locks, it's impossible to wake it up. Need to disconnect and reconnect power to get it working again.

(In reply to Ilia Mirkin from comment #7)

Hi Ilia,
I tried that change on a clean 4.17-rc4 kernel but it had no effect for me; identical nouveau prints in dmesg as before.

Thanks,
Allen

chrone (chrone81) wrote :

Could not install on a GTX 960 either. Hopefully the 18.04.1 ISO installer, due on July 26th, will fix this issue with NVIDIA DisplayPort.

Kai-Heng Feng (kaihengfeng) wrote :

Please try drm-tip [1] for this issue, there are some new runtime PM fixes for nouveau.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/current/

Changed in linux (Ubuntu):
assignee: Kai-Heng Feng (kaihengfeng) → nobody
Ersin (ersin-ertan) wrote :

This problem is still present within the 18.10 beta release

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1141.

Changed in linux:
status: Confirmed → Unknown