oops in nvkm_udevice_info() [nouveau]

Bug #1898130 reported by dann frazier
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: dann frazier

Bug Description

The 5.8.0-20.21 kernel reliably trips an Oops on startup when initializing the nouveau driver.

[ 221.032875] nouveau 0000:07:00.0: unknown chipset (170000a1)
[ 221.039893] nouveau 0000:07:00.0: unknown chipset (170000a1)
[ 221.046328] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 221.054098] #PF: supervisor read access in kernel mode
[ 221.059830] #PF: error_code(0x0000) - not-present page
[ 221.065559] PGD 0 P4D 0
[ 221.068383] Oops: 0000 [#1] SMP NOPTI
[ 221.072469] CPU: 32 PID: 1627 Comm: kworker/32:1 Not tainted 5.8.0-20-generic #21-Ubuntu
[ 221.081497] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 0.25 06/30/2020
[ 221.090729] Workqueue: events work_for_cpu_fn
[ 221.095865] RIP: 0010:nvkm_udevice_info+0x180/0x340 [nouveau]
[ 221.102277] Code: 10 49 89 45 08 4d 85 c9 74 10 48 85 c0 74 0b 41 8b 51 70 48 29 d0 49 89 45 10 49 8b 86 c0 00 00 00 49 8d 7d 18 ba 10 00 00 00 <48> 8b 30 e8 58 bb ff d5 49 8b 76 28 49 8d 7d 28 ba 40 00 00 00 e8
[ 221.123232] RSP: 0018:ffffa7c9dd153b70 EFLAGS: 00010246
[ 221.129058] RAX: 0000000000000000 RBX: 0000000000000068 RCX: 00000000000000c6
[ 221.137020] RDX: 0000000000000010 RSI: ffff994808386320 RDI: ffff994808386338
[ 221.144979] RBP: ffffa7c9dd153ba0 R08: 0000000000000000 R09: 0000000000000000
[ 221.152939] R10: 0000000000000088 R11: 0000000000000000 R12: ffff994808245680
[ 221.160900] R13: ffff994808386320 R14: ffff99483377f800 R15: 0000000000000000
[ 221.168862] FS: 0000000000000000(0000) GS:ffff99484e000000(0000) knlGS:0000000000000000
[ 221.177890] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 221.184299] CR2: 0000000000000000 CR3: 000000b23ac0a000 CR4: 0000000000340ee0
[ 221.192259] Call Trace:
[ 221.195049] ? nvkm_object_init+0x8d/0x110 [nouveau]
[ 221.200669] nvkm_udevice_mthd+0x51/0xb0 [nouveau]
[ 221.206073] nvkm_object_mthd+0x1a/0x30 [nouveau]
[ 221.211373] nvkm_ioctl_mthd+0x65/0x70 [nouveau]
[ 221.216574] nvkm_ioctl+0xf0/0x190 [nouveau]
[ 221.221418] nvkm_client_ioctl+0x12/0x20 [nouveau]
[ 221.226818] nvif_object_ioctl+0x4e/0x60 [nouveau]
[ 221.232213] nvif_object_mthd+0x9f/0x150 [nouveau]
[ 221.237609] ? nvif_object_init+0x10a/0x1a0 [nouveau]
[ 221.243294] nvif_device_init+0x4f/0x60 [nouveau]
[ 221.248618] nouveau_cli_init+0x199/0x450 [nouveau]
[ 221.254127] nouveau_drm_device_init+0x54/0x2d0 [nouveau]
[ 221.260215] nouveau_drm_probe+0x132/0x1f0 [nouveau]
[ 221.265747] local_pci_probe+0x48/0x80
[ 221.269926] work_for_cpu_fn+0x1a/0x30
[ 221.274107] process_one_work+0x1e8/0x3b0
[ 221.278577] worker_thread+0x218/0x370
[ 221.282759] kthread+0x12f/0x150
[ 221.286357] ? process_one_work+0x3b0/0x3b0
[ 221.291023] ? __kthread_bind_mask+0x70/0x70
[ 221.295788] ret_from_fork+0x22/0x30
[ 221.299773] Modules linked in: nouveau(+) mxm_wmi wmi video nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd amd_energy kvm_amd kvm efi_pstore rapl ipmi_ssif input_leds cdc_ether usbnet mii ccp k10temp acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ses enclosure hid_generic usbhid uas hid usb_storage ib_uverbs ib_core crct10dif_pclmul ast crc32_pclmul drm_vram_helper ghash_clmulni_intel drm_ttm_helper aesni_intel ttm drm_kms_helper crypto_simd syscopyarea sysfillrect cryptd glue_helper sysimgblt fb_sys_fops mlx5_core igb cec mpt3sas pci_hyperv_intf dca raid_class nvme rc_core i2c_algo_bit scsi_transport_sas tls xhci_pci nvme_core mlxfw drm xhci_pci_renesas i2c_piix4
[ 221.389734] CR2: 0000000000000000
[ 221.393431] ---[ end trace f0b36cab4e2bf100 ]---
[ 222.034293] RIP: 0010:nvkm_udevice_info+0x180/0x340 [nouveau]
[ 222.034300] Code: 10 49 89 45 08 4d 85 c9 74 10 48 85 c0 74 0b 41 8b 51 70 48 29 d0 49 89 45 10 49 8b 86 c0 00 00 00 49 8d 7d 18 ba 10 00 00 00 <48> 8b 30 e8 58 bb ff d5 49 8b 76 28 49 8d 7d 28 ba 40 00 00 00 e8
[ 222.061663] RSP: 0018:ffffa7c9dd153b70 EFLAGS: 00010246
[ 222.067491] RAX: 0000000000000000 RBX: 0000000000000068 RCX: 00000000000000c6
[ 222.075450] RDX: 0000000000000010 RSI: ffff994808386320 RDI: ffff994808386338
[ 222.083409] RBP: ffffa7c9dd153ba0 R08: 0000000000000000 R09: 0000000000000000
[ 222.091369] R10: 0000000000000088 R11: 0000000000000000 R12: ffff994808245680
[ 222.099329] R13: ffff994808386320 R14: ffff99483377f800 R15: 0000000000000000
[ 222.107290] FS: 0000000000000000(0000) GS:ffff99484e000000(0000) knlGS:0000000000000000
[ 222.116316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 222.122723] CR2: 0000000000000000 CR3: 000000b23ac0a000 CR4: 0000000000340ee0
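
The "unknown chipset (170000a1)" lines above print a raw 32-bit identification value. As a rough sketch only, assuming the bit layout that nouveau applies to NV10-and-later GPUs (chipset ID in bits 28..20, revision in bits 7..0), the value can be split like this. The struct and function names are invented for illustration and are not taken from the driver.

```c
#include <stdint.h>

/* Fields pulled out of the 32-bit boot/identification value.
 * Hypothetical helper type, not a nouveau structure. */
struct boot0_fields {
    uint32_t chipset; /* assumed to live in bits 28..20 */
    uint32_t chiprev; /* assumed to live in bits 7..0 */
};

/* Split a raw value such as 0x170000a1 into chipset and revision.
 * The masks are an assumption about the register layout. */
static struct boot0_fields decode_boot0(uint32_t boot0)
{
    struct boot0_fields f;

    f.chipset = (boot0 & 0x1ff00000u) >> 20;
    f.chiprev = boot0 & 0x000000ffu;
    return f;
}
```

Under that assumption, 0x170000a1 splits into chipset 0x170 and revision 0xa1, an ID the 5.8 driver reports as unknown.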
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Oct 1 22:36 seq
 crw-rw---- 1 root audio 116, 33 Oct 1 22:36 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu48
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.10
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: NVIDIA DGXA100 920-23687-2530-000
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.8.0-20-generic root=UUID=57d67f63-474e-4352-88e2-c29fd5276a03 ro console=ttyS1,115200n8 iommu=pt modprobe.blacklist=nouveau
ProcVersionSignature: Ubuntu 5.8.0-20.21-generic 5.8.10
RelatedPackageVersions:
 linux-restricted-modules-5.8.0-20-generic N/A
 linux-backports-modules-5.8.0-20-generic N/A
 linux-firmware 1.190
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: groovy uec-images
Uname: Linux 5.8.0-20-generic x86_64
UpgradeStatus: Upgraded to groovy on 2020-09-30 (1 days ago)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 06/30/2020
dmi.bios.release: 0.25
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0.25
dmi.board.asset.tag: 00000000000000000000000000000000
dmi.board.name: DGXA100
dmi.board.vendor: NVIDIA
dmi.board.version: 555.Z5601.D001
dmi.chassis.asset.tag: 00000000000000000000000000000000
dmi.chassis.type: 23
dmi.chassis.vendor: NVIDIA
dmi.chassis.version: 920-23687-2530-000
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0.25:bd06/30/2020:br0.25:svnNVIDIA:pnDGXA100920-23687-2530-000:pvrv1.0:rvnNVIDIA:rnDGXA100:rvr555.Z5601.D001:cvnNVIDIA:ct23:cvr920-23687-2530-000:
dmi.product.family: DGX A100
dmi.product.name: DGXA100 920-23687-2530-000
dmi.product.sku: Default string
dmi.product.version: v1.0
dmi.sys.vendor: NVIDIA

Revision history for this message
dann frazier (dannf) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1898130

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: groovy
dann frazier (dannf) wrote : CRDA.txt

apport information

description: updated
tags: added: apport-collected uec-images
description: updated
dann frazier (dannf) wrote : CurrentDmesg.txt

apport information

dann frazier (dannf) wrote : Lspci.txt

apport information

dann frazier (dannf) wrote : Lspci-vt.txt

apport information

dann frazier (dannf) wrote : Lsusb.txt

apport information

dann frazier (dannf) wrote : Lsusb-t.txt

apport information

dann frazier (dannf) wrote : Lsusb-v.txt

apport information

dann frazier (dannf) wrote : ProcCpuinfo.txt

apport information

dann frazier (dannf) wrote : ProcCpuinfoMinimal.txt

apport information

dann frazier (dannf) wrote : ProcInterrupts.txt

apport information

dann frazier (dannf) wrote : ProcModules.txt

apport information

dann frazier (dannf) wrote : UdevDb.txt

apport information

dann frazier (dannf) wrote : WifiSyslog.txt

apport information

dann frazier (dannf) wrote : acpidump.txt

apport information

dann frazier (dannf) wrote :

Also impacts latest upstream:

[ 213.131657] nouveau 0000:07:00.0: unknown chipset (170000a1)
[ 213.138547] nouveau 0000:07:00.0: unknown chipset (170000a1)
[ 213.144938] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 213.152704] #PF: supervisor read access in kernel mode
[ 213.158433] #PF: error_code(0x0000) - not-present page
[ 213.164162] PGD 0 P4D 0
[ 213.166985] Oops: 0000 [#1] SMP NOPTI
[ 213.171068] CPU: 32 PID: 206 Comm: kworker/32:0 Not tainted 5.9.0-rc7+ #1
[ 213.178639] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 0.25 06/30/2020
[ 213.187866] Workqueue: events work_for_cpu_fn
[ 213.192761] RIP: 0010:nvkm_udevice_mthd+0x1ed/0x7d0 [nouveau]
[ 213.199170] Code: 10 49 89 47 08 4d 85 c9 74 10 48 85 c0 74 0b 41 8b 51 70 48 29 d0 49 89 47 10 49 8b 86 c0 00 00 00 49 8d 7f 18 ba 10 00 00 00 <48> 8b 30 e8 6b 91 89 c0 49 8b 76 28 49 8d 7f 28 ba 40 00 00 00 e8
[ 213.220121] RSP: 0018:ffffae0619d47b48 EFLAGS: 00010246
[ 213.225948] RAX: 0000000000000000 RBX: ffff9cefab819580 RCX: 00000000000000c6
[ 213.233907] RDX: 0000000000000010 RSI: 0000000000000000 RDI: ffff9cef988f0578
[ 213.241864] RBP: ffffae0619d47b80 R08: 0000000000000000 R09: 0000000000000000
[ 213.249813] R10: 0000000000000088 R11: 0000000001320122 R12: 0000000000000000
[ 213.257762] R13: 0000000000000068 R14: ffff9cef6107c400 R15: ffff9cef988f0560
[ 213.265721] FS: 0000000000000000(0000) GS:ffff9cefce000000(0000) knlGS:0000000000000000
[ 213.274747] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 213.281153] CR2: 0000000000000000 CR3: 0000007f3019c000 CR4: 0000000000350ee0
[ 213.289104] Call Trace:
[ 213.291854] ? nvkm_object_insert+0x6f/0x80 [nouveau]
[ 213.297509] nvkm_object_mthd+0x1a/0x30 [nouveau]
[ 213.302773] nvkm_ioctl_mthd+0x65/0x70 [nouveau]
[ 213.307940] nvkm_ioctl+0xf0/0x190 [nouveau]
[ 213.312735] nvkm_client_ioctl+0x12/0x20 [nouveau]
[ 213.318097] nvif_object_ioctl+0x4f/0x60 [nouveau]
[ 213.323460] nvif_object_mthd+0x9f/0x150 [nouveau]
[ 213.328822] ? nvif_object_ctor+0x14b/0x1d0 [nouveau]
[ 213.334473] nvif_device_ctor+0x61/0x70 [nouveau]
[ 213.339749] nouveau_cli_init+0x1a3/0x460 [nouveau]
[ 213.345215] ? nouveau_drm_device_init+0x3e/0x780 [nouveau]
[ 213.351454] nouveau_drm_device_init+0x77/0x780 [nouveau]
[ 213.357479] ? pci_read_config_word+0x27/0x40
[ 213.362337] ? pci_enable_device_flags+0x14f/0x170
[ 213.367705] nouveau_drm_probe+0x132/0x1f0 [nouveau]
[ 213.373241] local_pci_probe+0x48/0x80
[ 213.377419] work_for_cpu_fn+0x1a/0x30
[ 213.381598] process_one_work+0x1e8/0x3b0
[ 213.386068] worker_thread+0x53/0x420
[ 213.390149] kthread+0x12f/0x150
[ 213.393745] ? process_one_work+0x3b0/0x3b0
[ 213.398406] ? __kthread_bind_mask+0x70/0x70
[ 213.403169] ret_from_fork+0x22/0x30
[ 213.407153] Modules linked in: nouveau(+) mxm_wmi wmi video nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua amd64_edac_mod edac_mce_amd amd_energy kvm_amd kvm rapl efi_pstore ipmi_ssif input_leds cdc_ether usbnet mii ccp k10temp acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_...


Changed in linux (Ubuntu):
status: Incomplete → Confirmed
dann frazier (dannf) wrote :

Focal didn't have this problem, so I bisected v5.4..v5.8 to find where it was introduced and landed on:

commit 24d5ff40a732633dceab68c6559ba723784f4a68
Author: Karol Herbst <email address hidden>
Date: Tue Apr 28 18:54:02 2020 +0200

    drm/nouveau/device: rework mmio mapping code to get rid of second map

    Fixes warnings on GPUs with smaller a smaller mmio region like vGPUs.

    Signed-off-by: Karol Herbst <email address hidden>
    Signed-off-by: Ben Skeggs <email address hidden>

dann frazier (dannf) wrote :
dann frazier (dannf) wrote :
Changed in linux (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → dann frazier (dannf)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-22.23

---------------
linux (5.8.0-22.23) groovy; urgency=medium

  * groovy/linux: 5.8.0-22.23 -proposed tracker (LP: #1899099)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * oops in nvkm_udevice_info() [nouveau] (LP: #1898130)
    - drm/nouveau/device: return error for unknown chipsets

  * python3-venv is gone (LP: #1896801)
    - SAUCE: doc: remove python3-venv dependency

  * *-tools-common packages descriptions have typo "PGKVER" (LP: #1898903)
    - [Packaging] Fix typo in -tools template s/PGKVER/PKGVER/

  * Enable brightness control on HP DreamColor panel (LP: #1898865)
    - SAUCE: drm/i915/dpcd_bl: Skip testing control capability with force DPCD
      quirk
    - SAUCE: drm/dp: HP DreamColor panel brigntness fix

  * Groovy update: v5.8.14 upstream stable release (LP: #1898853)
    - io_uring: always delete double poll wait entry on match
    - btrfs: fix filesystem corruption after a device replace
    - mmc: sdhci: Workaround broken command queuing on Intel GLK based IRBIS
      models
    - USB: gadget: f_ncm: Fix NDP16 datagram validation
    - Revert "usbip: Implement a match function to fix usbip"
    - usbcore/driver: Fix specific driver selection
    - usbcore/driver: Fix incorrect downcast
    - usbcore/driver: Accommodate usbip
    - gpio: siox: explicitly support only threaded irqs
    - gpio: mockup: fix resource leak in error path
    - gpio: tc35894: fix up tc35894 interrupt configuration
    - gpio: amd-fch: correct logic of GPIO_LINE_DIRECTION
    - clk: samsung: Keep top BPLL mux on Exynos542x enabled
    - clk: socfpga: stratix10: fix the divider for the emac_ptp_free_clk
    - scsi: iscsi: iscsi_tcp: Avoid holding spinlock while calling getpeername()
    - i2c: i801: Exclude device from suspend direct complete optimization
    - Input: i8042 - add nopnp quirk for Acer Aspire 5 A515
    - iio: adc: qcom-spmi-adc5: fix driver name
    - ftrace: Move RCU is watching check after recursion check
    - tracing: Fix trace_find_next_entry() accounting of temp buffer size
    - memstick: Skip allocating card when removing host
    - drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
    - xen/events: don't use chip_data for legacy IRQs
    - clocksource/drivers/timer-gx6605s: Fixup counter reload
    - vboxsf: Fix the check for the old binary mount-arguments struct
    - mt76: mt7915: use ieee80211_free_txskb to free tx skbs
    - libbpf: Remove arch-specific include path in Makefile
    - drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices
    - Revert "wlcore: Adding suppoprt for IGTK key in wlcore driver"
    - drm/sun4i: mixer: Extend regmap max_register
    - hv_netvsc: Cache the current data path to avoid duplicate call and message
    - net: dec: de2104x: Increase receive ring size for Tulip
    - rndis_host: increase sleep time in the query-response loop
    - nvme-pci: disable the write zeros command for Intel 600P/P3100
    - nvme-core: get/put ctrl and transport module in nvme_dev_open/release()
    - fuse: fix the ->direct_IO() treatment of iov_iter
    - drivers/net/wan/lapbether: Make skb->protocol co...

Changed in linux (Ubuntu):
status: In Progress → Fix Released
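
The changelog entry for this bug reads "drm/nouveau/device: return error for unknown chipsets", and the register dumps show RAX = 0 at the faulting load, which is consistent with a per-chipset pointer that was never set. The general pattern can be sketched as below. This is not the nouveau patch: every type, table entry, and function name here is hypothetical, and the sketch only assumes that identification should fail outright when no table entry matches.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chipset description; not the nouveau structure. */
struct chip_desc {
    uint32_t chipset;
    const char *name;
};

/* Hypothetical device object holding a pointer to its chipset entry. */
struct gpu_device {
    const struct chip_desc *chip; /* NULL until a chipset is matched */
};

/* Made-up table entries, for illustration only. */
static const struct chip_desc known_chips[] = {
    { 0x140, "example-a" },
    { 0x160, "example-b" },
};

/* Look up the chipset. On no match, return an error so the caller
 * aborts initialization instead of continuing with chip == NULL. */
static int device_identify(struct gpu_device *dev, uint32_t chipset)
{
    size_t i;

    dev->chip = NULL;
    for (i = 0; i < sizeof(known_chips) / sizeof(known_chips[0]); i++) {
        if (known_chips[i].chipset == chipset) {
            dev->chip = &known_chips[i];
            return 0;
        }
    }
    return -ENODEV;
}

/* Safe only after device_identify() returned 0; otherwise this would
 * dereference NULL, the same class of failure as the oops above. */
static const char *device_name(const struct gpu_device *dev)
{
    return dev->chip->name;
}
```

A caller is expected to check the return value of device_identify() and stop probing on failure, so later info queries such as device_name() are never reached with a NULL chip pointer.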