Stalls on CPUs/tasks on VisionFive 2 with external GPU

Bug #2039782 reported by Heinrich Schuchardt
This bug affects 1 person
Affects: linux-riscv (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am trying to install Ubuntu Mantic on the StarFive VisionFive 2 1.3B board using https://cdimage.ubuntu.com/releases/23.10/release/ubuntu-23.10-live-server-riscv64.img.gz

I have connected an Nvidia GT710 graphics card to the NVMe connector and see rcu_sched stalls. I have not observed this behavior on StarFive VisionFive 2 1.3B boards without an external GPU.

The U-Boot installed on SPI flash is
https://launchpad.net/~ubuntu-risc-v-team/+archive/ubuntu/release/+files/u-boot-starfive_2023.09.22-next-5d2fae79c7d6-0ubuntu1~ppa5_riscv64.deb

[ 93.102845] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 93.114452] rcu: 0-...!: (1 GPs behind) idle=c69c/1/0x4000000000000002 softirq=2431/2431 fqs=41
[ 93.128724] rcu: (detected by 2, t=15008 jiffies, g=4353, q=2369 ncpus=4)
[ 93.140996] Task dump for CPU 0:
[ 93.149549] task:swapper/0 state:R running task stack:0 pid:0 ppid:0 flags:0x00000000
[ 93.164907] Call Trace:
[ 93.172715] [<ffffffff80ce749c>] __schedule+0x27a/0x82e
[ 93.183385] rcu: rcu_sched kthread timer wakeup didn't happen for 14937 jiffies! g4353 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x200
[ 93.200202] rcu: Possible timer handling issue on cpu=0 timer-softirq=890
[ 93.212733] rcu: rcu_sched kthread starved for 14945 jiffies! g4353 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=0
[ 93.228777] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 93.243573] rcu: RCU grace-period kthread stack dump:
[ 93.254522] task:rcu_sched state:R stack:0 pid:15 ppid:2 flags:0x00000000
[ 93.268895] Call Trace:
[ 93.277340] [<ffffffff80ce749c>] __schedule+0x27a/0x82e
[ 93.288646] [<ffffffff80ce7a9e>] schedule+0x4e/0xde
[ 93.299623] [<ffffffff80ced874>] schedule_timeout+0x8c/0x15e
[ 93.311380] [<ffffffff800b0e26>] rcu_gp_fqs_loop+0x2fc/0x3d4
[ 93.323170] [<ffffffff800b3322>] rcu_gp_kthread+0x11a/0x142
[ 93.334901] [<ffffffff80044fe6>] kthread+0xc4/0xe4
[ 93.345833] [<ffffffff80003f82>] ret_from_fork+0xe/0x20

Heinrich Schuchardt (xypron) wrote :
Opvolger (opvolger) wrote :

I have the same problem.

[ 14.873826] Console: switching to colour frame buffer device 240x67
[ 15.120436] radeon 0001:01:00.0: [drm] fb0: radeondrmfb frame buffer device
[ 75.138845] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 75.145779] rcu: 0-...0: (2 ticks this GP) idle=ae24/1/0x4000000000000002 softirq=881/882 fqs=7501
[ 75.156363] rcu: hardirqs softirqs csw/system
[ 75.162878] rcu: number: 17595499 0 0
[ 75.169396] rcu: cputime: 0 0 0 ==> 30016(ms)
[ 75.177538] rcu: (detected by 3, t=15011 jiffies, g=121, q=273 ncpus=4)
[ 75.185375] Task dump for CPU 0:
[ 75.189151] task:swapper/0 state:R running task stack:0 pid:0 ppid:0 flags:0x00000008
[ 75.200753] Call Trace:
[ 75.203615] [<ffffffff80ce749c>] __schedule+0x27a/0x82e
Timed out for waiting the udev queue being empty.
[ 198.633202] watchdog: Watchdog detected hard LOCKUP on cpu 0
[ 198.639832] Modules linked in: radeon(+) hid_generic usbhid hid motorcomm video drm_suballoc_helper i2c_algo_bit drm_ttm_helper ttm drm_display_helper dwmac_starfive stmmac_platform cec rc_core stmmac drm_kms_helper drm axp20x_regulator pcs_xpcs xhci_pci dw_mmc_starfive dw_mmc_pltfm phylink backlight xhci_pci_renesas pinctrl_starfive_jh7110_aon dw_mmc clk_starfive_jh7110_aon axp20x_i2c jh7110_trng clk_starfive_jh7110_isp clk_starfive_jh7110_vout axp20x spi_cadence_quadspi phy_jh7110_usb
[ 242.654895] INFO: task kworker/1:2:83 blocked for more than 120 seconds.
[ 242.662754] Not tainted 6.5.0-10-generic #10.1-Ubuntu
[ 242.669286] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.678454] task:kworker/1:2 state:D stack:0 pid:83 ppid:2 flags:0x00000000
[ 242.688246] Workqueue: events output_poll_execute [drm_kms_helper]
[ 242.695736] Call Trace:
[ 242.698600] [<ffffffff80ce749c>] __schedule+0x27a/0x82e
[ 242.704733] [<ffffffff80ce7a9e>] schedule+0x4e/0xde
[ 242.710452] [<ffffffff80ce7eb4>] schedule_preempt_disabled+0x18/0x20
[ 242.717901] [<ffffffff80ce907a>] __mutex_lock.constprop.0+0x3ce/0x6e0
[ 242.725452] [<ffffffff80ce949c>] __mutex_lock_slowpath+0x1a/0x26
[ 242.732495] [<ffffffff80ce94f0>] mutex_lock+0x48/0x58
[ 242.738419] [<ffffffff0246a11a>] drm_client_dev_hotplug+0x7c/0x10a [drm]
[ 242.746934] [<ffffffff025d4a2e>] output_poll_execute+0x1e2/0x21c [drm_kms_helper]
[ 242.755890] [<ffffffff8003c3a0>] process_one_work+0x1dc/0x3b4
[ 242.762628] [<ffffffff8003ca34>] worker_thread+0x88/0x456
[ 242.768959] [<ffffffff80044fe6>] kthread+0xc4/0xe4
[ 242.774576] [<ffffffff80003f82>] ret_from_fork+0xe/0x20
[ 242.780717] INFO: task (udev-worker):117 blocked for more than 120 seconds.
[ 242.788875] Not tainted 6.5.0-10-generic #10.1-Ubuntu
[ 242.795410] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.804577] task:(udev-worker) state:D stack:0 pid:117 ppid:111 flags:0x00000006
[ 242.814360] Call Trace:
[ 242.817230] [<ffffffff80ce749c>] __schedule+0x27a/0x82e
[ 242.823366] [<ffffffff80ce7a9e>] schedule+0x4e/0xde
[ 242.829088] [<ffffffff80ced910>] sched...

Changed in linux-riscv (Ubuntu):
status: New → Confirmed
Heinrich Schuchardt (xypron) wrote :

@opvolger

Thank you for confirming the issue.

The upstreaming of PCIe support for the StarFive VisionFive 2 board is not finalized yet. The kernel team will revisit this issue once there is proper upstream support. They picked up what was available on the kernel mailing list, which brought us NVMe support, for which I am grateful.

Opvolger (opvolger) wrote :

@xypron

I know, https://rvspace.org/en/project/JH7110_Upstream_Plan

Can't wait until everything is green on this site :)

Good to know that NVMe support is working correctly.

I was very surprised to see that an external GPU worked "out of the box" with Ubuntu 23.10. But then it crashed a lot :(
