[witherspoon] removing module nouveau causes cpu hard lockup

Bug #1811470 reported by Manoj Iyer
Affects                           Status     Importance  Assigned to            Milestone
The Ubuntu-power-systems project  Won't Fix  Medium      Canonical Kernel Team
linux (Ubuntu)                    Won't Fix  Medium      Canonical Kernel Team

Bug Description

Installed 18.04 and upgraded the kernel to linux-image-generic-hwe-18.04 (4.18.0-13-generic #14~18.04.1-Ubuntu SMP Thu Dec 6 14:03:47). Copied the gv100 firmware to /lib/firmware/nvidia, then removed and reloaded nouveau (modprobe -r, then modprobe). When I tried to remove nouveau again with modprobe -r, I saw the trace below. After a while the modprobe -r command completed and the module was removed successfully.
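
The unload/reload/unload sequence described above could be scripted roughly as follows. This is only a sketch: the helper names `repro_commands` and `run_repro` are hypothetical, it assumes the gv100 firmware is already in place and that the script runs as root, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def repro_commands(module="nouveau"):
    """Return the command sequence from the report: unload, reload, unload again."""
    return [
        ["modprobe", "-r", module],  # first removal
        ["modprobe", module],        # reload the module
        ["modprobe", "-r", module],  # second removal, where the lockup was seen
    ]


def run_repro(module="nouveau", dry_run=True):
    """Print the commands (dry run) or execute them in order; needs root when executing."""
    commands = repro_commands(module)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_repro(dry_run=True)
```

Calling `run_repro(dry_run=False)` on an affected machine would be expected to stall for a while on the last command, as in the report, so the dry run is the safer default.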

[ 618.185258] nouveau 0035:04:00.0: DRM: failed to idle channel 1 [DRM]
[ 630.314599] watchdog: CPU 4 self-detected hard LOCKUP @ ioread32+0x2c/0x170
[ 630.314601] watchdog: CPU 4 TB:415266697100, last heartbeat TB:410146341428 (10000ms ago)
[ 630.314601] Modules linked in: nouveau(-) ofpart at24 cmdlinepart uio_pdrv_genirq ipmi_powernv ipmi_devintf powernv_flash uio mtd opal_prd ipmi_msghandler ibmpowernv vmx_crypto sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_vpmsum ahci crc32c_vpmsum tg3 libahci drm_panel_orientation_quirks [last unloaded: nouveau]
[ 630.314629] CPU: 4 PID: 6623 Comm: modprobe Not tainted 4.18.0-13-generic #14~18.04.1-Ubuntu
[ 630.314630] NIP: c000000000729afc LR: c00800000f766990 CTR: c000000000729ad0
[ 630.314630] REGS: c000003fffd87d80 TRAP: 0900 Not tainted (4.18.0-13-generic)
[ 630.314631] MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002824 XER: 00000000
[ 630.314638] CFAR: c00800000f7b2ac4 IRQMASK: 1
[ 630.314639] GPR00: c00800000f7ad3f8 c000003f6c1bb850 c00000000178c200 c00c00008e7e0000
[ 630.314642] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000008
[ 630.314644] GPR08: c000003fffffa800 000000007fffffff c00c00008e7e0000 c00800000f7b2ab0
[ 630.314647] GPR12: c000000000729ad0 c000003fffffa800 0000000000000001 0000000000000000
[ 630.314649] GPR16: 0000000000000000 0000000000000000 0000039a7cc60074 0000039a7cc3c978
[ 630.314652] GPR20: 0000039a7cc60070 0000000000000000 00007ffff8448f88 0000000000000000
[ 630.314654] GPR24: 0000039a8bf40ee8 0000000000000000 c00800000f82dca0 c000003fed072098
[ 630.314657] GPR28: c000203968a2c290 c000203968840908 c000203968840900 0000000000000001
[ 630.314660] NIP [c000000000729afc] ioread32+0x2c/0x170
[ 630.314660] LR [c00800000f766990] nouveau_bo_rd32+0x48/0x70 [nouveau]
[ 630.314661] Call Trace:
[ 630.314662] [c000003f6c1bb850] [c000003f6c1bb890] 0xc000003f6c1bb890 (unreliable)
[ 630.314663] [c000003f6c1bb880] [c000003f6c1bb8b0] 0xc000003f6c1bb8b0
[ 630.314664] [c000003f6c1bb8a0] [c00800000f7ad3f8] nv84_fence_read+0x40/0x60 [nouveau]
[ 630.314666] [c000003f6c1bb8c0] [c00800000f7aab3c] nouveau_fence_update+0x44/0x100 [nouveau]
[ 630.314667] [c000003f6c1bb900] [c00800000f7ab5d8] nouveau_fence_done+0x100/0x180 [nouveau]
[ 630.314668] [c000003f6c1bb940] [c00800000f7ab8c8] nouveau_fence_wait+0x90/0x150 [nouveau]
[ 630.314669] [c000003f6c1bb970] [c00800000f7a8f90] nouveau_channel_idle+0xd8/0x140 [nouveau]
[ 630.314670] [c000003f6c1bba00] [c00800000f75f75c] nouveau_accel_fini+0x74/0xe0 [nouveau]
[ 630.314671] [c000003f6c1bba30] [c00800000f75f8e8] nouveau_drm_unload+0x60/0x130 [nouveau]
[ 630.314672] [c000003f6c1bba60] [c008000013b1b118] drm_dev_unregister+0x70/0x160 [drm]
[ 630.314673] [c000003f6c1bbaa0] [c008000013b1b3f0] drm_put_dev+0x48/0xa0 [drm]
[ 630.314675] [c000003f6c1bbb10] [c00800000f760f8c] nouveau_drm_device_remove+0x54/0x90 [nouveau]
[ 630.314676] [c000003f6c1bbb50] [c0000000007a746c] pci_device_remove+0x6c/0x120
[ 630.314677] [c000003f6c1bbb90] [c0000000008a7014] device_release_driver_internal+0x294/0x380
[ 630.314678] [c000003f6c1bbbe0] [c0000000008a719c] driver_detach+0x7c/0x140
[ 630.314679] [c000003f6c1bbc20] [c0000000008a5304] bus_remove_driver+0x84/0x170
[ 630.314680] [c000003f6c1bbc90] [c0000000008a7ef8] driver_unregister+0x48/0x90
[ 630.314681] [c000003f6c1bbd00] [c0000000007a52b8] pci_unregister_driver+0x38/0x150
[ 630.314682] [c000003f6c1bbd50] [c00800000f7af048] nouveau_drm_exit+0x30/0xfc08 [nouveau]
[ 630.314683] [c000003f6c1bbd70] [c0000000001e6b14] sys_delete_module+0x1d4/0x310
[ 630.314684] [c000003f6c1bbe30] [c00000000000b288] system_call+0x5c/0x70
[ 630.314685] Instruction dump:
[ 630.314686] 60000000 3c4c0106 38422730 fbe1fff8 f821ffd1 3920ffff 79290060 7c6a1b78
[ 630.314690] 7fa34840 409d0030 7c0004ac 83e30000 <0c1f0000> 4c00012c 2f9fffff 7bff0020
[ 633.222292] nouveau 0035:04:00.0: DRM: failed to idle channel 0 [DRM]
[ 633.222397] watchdog: CPU 4 became unstuck TB:416755750721
[ 633.222451] CPU: 4 PID: 37 Comm: ksoftirqd/4 Not tainted 4.18.0-13-generic #14~18.04.1-Ubuntu
[ 633.222452] Call Trace:
[ 633.222455] [c000003fe6af3910] [c000000000d4950c] dump_stack+0xb0/0xf4 (unreliable)
[ 633.222459] [c000003fe6af3950] [c00000000002f75c] wd_smp_clear_cpu_pending+0x41c/0x430
[ 633.222462] [c000003fe6af3a00] [c00000000002fec4] wd_timer_fn+0x64/0x3f0
[ 633.222465] [c000003fe6af3ac0] [c0000000001bfae0] call_timer_fn+0x50/0x1c0
[ 633.222467] [c000003fe6af3b40] [c0000000001bfd88] expire_timers+0x138/0x1f0
[ 633.222470] [c000003fe6af3bb0] [c0000000001bff48] run_timer_softirq+0x108/0x270
[ 633.222473] [c000003fe6af3c50] [c000000000d6bd58] __do_softirq+0x158/0x3d4
[ 633.222475] [c000003fe6af3d40] [c00000000011c1c4] run_ksoftirqd+0x64/0x90
[ 633.222478] [c000003fe6af3d60] [c000000000149130] smpboot_thread_fn+0x250/0x290
[ 633.222481] [c000003fe6af3dc0] [c000000000142d98] kthread+0x1a8/0x1b0
[ 633.222484] [c000003fe6af3e30] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80
[ 648.259266] nouveau 0035:03:00.0: DRM: failed to idle channel 1 [DRM]
[ 663.296153] nouveau 0035:03:00.0: DRM: failed to idle channel 0 [DRM]
[ 673.329216] watchdog: CPU 4 self-detected hard LOCKUP @ ioread32+0x2c/0x170
[ 673.329217] watchdog: CPU 4 TB:437290181288, last heartbeat TB:432168104943 (10004ms ago)
[ 673.329218] Modules linked in: nouveau(-) ofpart at24 cmdlinepart uio_pdrv_genirq ipmi_powernv ipmi_devintf powernv_flash uio mtd opal_prd ipmi_msghandler ibmpowernv vmx_crypto sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_vpmsum ahci crc32c_vpmsum tg3 libahci drm_panel_orientation_quirks [last unloaded: nouveau]
[ 673.329245] CPU: 4 PID: 6623 Comm: modprobe Not tainted 4.18.0-13-generic #14~18.04.1-Ubuntu
[ 673.329246] NIP: c000000000729afc LR: c00800000f766990 CTR: c000000000729ad0
[ 673.329247] REGS: c000003fffd87d80 TRAP: 0900 Not tainted (4.18.0-13-generic)
[ 673.329247] MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002884 XER: 00000000
[ 673.329254] CFAR: c00800000f7b2ac4 IRQMASK: 1
[ 673.329255] GPR00: c00800000f7ad3f8 c000003f6c1bb850 c00000000178c200 c00c00008b240010
[ 673.329258] GPR04: 0000000000000010 0000000000000000 0000000000000001 0000000000000008
[ 673.329260] GPR08: c000003fffffa800 000000007fffffff c00c00008b240010 c00800000f7b2ab0
[ 673.329263] GPR12: c000000000729ad0 c000003fffffa800 0000000000000001 0000000000000000
[ 673.329265] GPR16: 0000000000000000 0000000000000000 0000039a7cc60074 0000039a7cc3c978
[ 673.329268] GPR20: 0000039a7cc60070 0000000000000000 00007ffff8448f88 0000000000000000
[ 673.329270] GPR24: 0000039a8bf40ee8 0000000000000000 c00800000f82dca0 c000003fed064098
[ 673.329273] GPR28: c000003fed6d2290 c000003fbd4ffe08 c000003fbd4ffe00 0000000000000000
[ 673.329275] NIP [c000000000729afc] ioread32+0x2c/0x170
[ 673.329276] LR [c00800000f766990] nouveau_bo_rd32+0x48/0x70 [nouveau]
[ 673.329277] Call Trace:
[ 673.329277] [c000003f6c1bb850] [c000003f6c1bb890] 0xc000003f6c1bb890 (unreliable)
[ 673.329279] [c000003f6c1bb880] [c000003f6c1bb8b0] 0xc000003f6c1bb8b0
[ 673.329280] [c000003f6c1bb8a0] [c00800000f7ad3f8] nv84_fence_read+0x40/0x60 [nouveau]
[ 673.329281] [c000003f6c1bb8c0] [c00800000f7aab3c] nouveau_fence_update+0x44/0x100 [nouveau]
[ 673.329282] [c000003f6c1bb900] [c00800000f7ab5d8] nouveau_fence_done+0x100/0x180 [nouveau]
[ 673.329284] [c000003f6c1bb940] [c00800000f7ab8c8] nouveau_fence_wait+0x90/0x150 [nouveau]
[ 673.329285] [c000003f6c1bb970] [c00800000f7a8f90] nouveau_channel_idle+0xd8/0x140 [nouveau]
[ 673.329286] [c000003f6c1bba00] [c00800000f75f714] nouveau_accel_fini+0x2c/0xe0 [nouveau]
[ 673.329287] [c000003f6c1bba30] [c00800000f75f8e8] nouveau_drm_unload+0x60/0x130 [nouveau]
[ 673.329288] [c000003f6c1bba60] [c008000013b1b118] drm_dev_unregister+0x70/0x160 [drm]
[ 673.329289] [c000003f6c1bbaa0] [c008000013b1b3f0] drm_put_dev+0x48/0xa0 [drm]
[ 673.329290] [c000003f6c1bbb10] [c00800000f760f8c] nouveau_drm_device_remove+0x54/0x90 [nouveau]
[ 673.329291] [c000003f6c1bbb50] [c0000000007a746c] pci_device_remove+0x6c/0x120
[ 673.329292] [c000003f6c1bbb90] [c0000000008a7014] device_release_driver_internal+0x294/0x380
[ 673.329293] [c000003f6c1bbbe0] [c0000000008a719c] driver_detach+0x7c/0x140
[ 673.329294] [c000003f6c1bbc20] [c0000000008a5304] bus_remove_driver+0x84/0x170
[ 673.329296] [c000003f6c1bbc90] [c0000000008a7ef8] driver_unregister+0x48/0x90
[ 673.329297] [c000003f6c1bbd00] [c0000000007a52b8] pci_unregister_driver+0x38/0x150
[ 673.329298] [c000003f6c1bbd50] [c00800000f7af048] nouveau_drm_exit+0x30/0xfc08 [nouveau]
[ 673.329299] [c000003f6c1bbd70] [c0000000001e6b14] sys_delete_module+0x1d4/0x310
[ 673.329300] [c000003f6c1bbe30] [c00000000000b288] system_call+0x5c/0x70
[ 673.329301] Instruction dump:
[ 673.329302] 60000000 3c4c0106 38422730 fbe1fff8 f821ffd1 3920ffff 79290060 7c6a1b78
[ 673.329306] 7fa34840 409d0030 7c0004ac 83e30000 <0c1f0000> 4c00012c 2f9fffff 7bff0020
[ 678.329001] nouveau 0004:05:00.0: DRM: failed to idle channel 1 [DRM]
[ 678.329115] watchdog: CPU 4 became unstuck TB:439850390082
[ 678.329165] CPU: 4 PID: 37 Comm: ksoftirqd/4 Not tainted 4.18.0-13-generic #14~18.04.1-Ubuntu
[ 678.329166] Call Trace:
[ 678.329169] [c000003fe6af3910] [c000000000d4950c] dump_stack+0xb0/0xf4 (unreliable)
[ 678.329173] [c000003fe6af3950] [c00000000002f75c] wd_smp_clear_cpu_pending+0x41c/0x430
[ 678.329176] [c000003fe6af3a00] [c00000000002fec4] wd_timer_fn+0x64/0x3f0
[ 678.329178] [c000003fe6af3ac0] [c0000000001bfae0] call_timer_fn+0x50/0x1c0
[ 678.329180] [c000003fe6af3b40] [c0000000001bfd88] expire_timers+0x138/0x1f0
[ 678.329182] [c000003fe6af3bb0] [c0000000001bff48] run_timer_softirq+0x108/0x270
[ 678.329184] [c000003fe6af3c50] [c000000000d6bd58] __do_softirq+0x158/0x3d4
[ 678.329186] [c000003fe6af3d40] [c00000000011c1c4] run_ksoftirqd+0x64/0x90
[ 678.329188] [c000003fe6af3d60] [c000000000149130] smpboot_thread_fn+0x250/0x290
[ 678.329191] [c000003fe6af3dc0] [c000000000142d98] kthread+0x1a8/0x1b0
[ 678.329193] [c000003fe6af3e30] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80
[ 693.373756] nouveau 0004:05:00.0: DRM: failed to idle channel 0 [DRM]
[ 708.414472] nouveau 0004:04:00.0: DRM: failed to idle channel 1 [DRM]
[ 723.455144] nouveau 0004:04:00.0: DRM: failed to idle channel 0 [DRM]
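
For anyone triaging similar logs, the stall length on the watchdog lines can be cross-checked from the two timebase (TB) values. The sketch below assumes a 512 MHz timebase, which is the usual value on POWER systems but is not stated in this report; the function name `stall_durations` and the constant `TB_HZ` are made up for illustration.

```python
import re

# Assumption: the timebase ticks at 512 MHz (not stated in the report).
TB_HZ = 512_000_000

# Matches lines like:
#   watchdog: CPU 4 TB:415266697100, last heartbeat TB:410146341428 (10000ms ago)
HEARTBEAT_RE = re.compile(
    r"watchdog: CPU (\d+) TB:(\d+), last heartbeat TB:(\d+) \((\d+)ms ago\)"
)


def stall_durations(log_text):
    """Return (cpu, ms_computed_from_tb, ms_reported) for each heartbeat line."""
    results = []
    for match in HEARTBEAT_RE.finditer(log_text):
        cpu, tb_now, tb_last, reported_ms = (int(g) for g in match.groups())
        # Convert the tick delta to whole milliseconds.
        computed_ms = (tb_now - tb_last) * 1000 // TB_HZ
        results.append((cpu, computed_ms, reported_ms))
    return results


if __name__ == "__main__":
    sample = (
        "[ 630.314601] watchdog: CPU 4 TB:415266697100, "
        "last heartbeat TB:410146341428 (10000ms ago)\n"
        "[ 673.329217] watchdog: CPU 4 TB:437290181288, "
        "last heartbeat TB:432168104943 (10004ms ago)\n"
    )
    for cpu, computed, reported in stall_durations(sample):
        print(f"CPU {cpu}: computed {computed} ms, reported {reported} ms")
```

Under that assumption the computed values agree with the "ms ago" figures in the log, which suggests CPU 4 missed its heartbeat for roughly ten seconds in each episode while spinning in ioread32.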

Manoj Iyer (manjo)
description: updated
bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-174680 severity-high targetmilestone-inin18043
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
assignee: nobody → bugproxy (bugproxy)
Andrew Cloke (andrew-cloke) wrote :

The next step is for Manoj to check with the Canonical kernel team to see whether they have seen this elsewhere.

Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
importance: High → Medium
Manoj Iyer (manjo)
Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
no longer affects: linux
Changed in ubuntu-power-systems:
assignee: bugproxy (bugproxy) → Canonical Kernel Team (canonical-kernel-team)
status: New → Triaged
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1811470

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: cosmic
Manoj Iyer (manjo)
Changed in linux (Ubuntu):
status: Incomplete → Won't Fix
Changed in ubuntu-power-systems:
status: Triaged → Won't Fix
bugproxy (bugproxy)
tags: added: targetmilestone-inin---
removed: targetmilestone-inin18043