RISC-V: Illegal instruction

Bug #1934548 reported by Pierce Andjelkovic
This bug affects 4 people
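| Affects | Status | Importance | Assigned to |
|---|---|---|---|
| linux-riscv (Ubuntu) | Fix Released | High | Unassigned |
| linux-riscv (Ubuntu Focal) | Won't Fix | Undecided | Unassigned |
| linux-riscv (Ubuntu Hirsute) | Fix Released | Undecided | Unassigned |
| linux-riscv-5.11 (Ubuntu) | Fix Released | High | Unassigned |
| linux-riscv-5.11 (Ubuntu Focal) | Fix Released | Undecided | Unassigned |
| linux-riscv-5.11 (Ubuntu Hirsute) | Invalid | Undecided | Unassigned |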

Bug Description

When booting the Ubuntu Server image on the SiFive HiFive Unmatched, I get the following error.
The last known working version was 1012.
The issue is being tracked on the SiFive forums at https://forums.sifive.com/t/u-boot-says-unhandled-exception-illegal-instruction/4898

```
Starting kernel ...

[ 0.000000] Linux version 5.11.0-1014-generic (buildd@riscv64-qemu-lcy01-084) (gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0, GNU ld (GNU Binutils for Ubuntu) 2.36.1) #14-Ubuntu SMP Wed Jun 30 17:56:50 UTC 2021 (Ubuntu 5.11.0-1014.14-generic 5.11.22)
[ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[ 0.000000] earlycon: sifive0 at MMIO 0x0000000010010000 (options '')
[ 0.000000] printk: bootconsole [sifive0] enabled
[ 0.000000] efi: UEFI not found.
[ 0.000000] Initial ramdisk at: 0x(____ptrval____) (183422976 bytes)
[ 0.000000] cma: Reserved 32 MiB at 0x00000000fe000000
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x000000047fffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080200000-0x000000047fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x000000047fffffff]
[ 0.000000] DMA32 zone: 512 pages in unavailable ranges
[ 0.000000] SBI specification v0.2 detected
[ 0.000000] SBI implementation ID=0x1 Version=0x9
[ 0.000000] SBI v0.2 TIME extension detected
[ 0.000000] SBI v0.2 IPI extension detected
[ 0.000000] SBI v0.2 RFENCE extension detected
[ 0.000000] software IO TLB: mapped [mem 0x00000000fa000000-0x00000000fe000000] (64MB)
[ 0.000000] SBI v0.2 HSM extension detected
[ 0.000000] CPU with hartid=0 is not available
[ 0.000000] CPU with hartid=0 is not available
[ 0.000000] riscv: ISA extensions acdfim
[ 0.000000] riscv: ELF capabilities acdfim
[ 0.000000] percpu: Embedded 26 pages/cpu s69272 r8192 d29032 u106496
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 4128264
[ 0.000000] Kernel command line: root=/dev/nvme0n1p1 ro earlycon
[ 0.000000] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[ 0.000000] Sorting __ex_table...
[ 0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
[ 0.000000] Memory: 16165452K/16775168K available (9854K kernel code, 5763K rwdata, 8192K rodata, 2519K init, 997K bss, 576948K reserved, 32768K cma-reserved)
[ 0.000000] random: get_random_u64 called from kmem_cache_open+0x36/0x338 with crng_init=0
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] ftrace: allocating 38893 entries in 152 pages
[ 0.000000] Oops - illegal instruction [#1]
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-1014-generic #14-Ubuntu
[ 0.000000] epc: ffffffe00000920e ra : ffffffe000009384 sp : ffffffe001803d30
[ 0.000000] gp : ffffffe001a14240 tp : ffffffe00180f440 t0 : ffffffe07fe38000
[ 0.000000] t1 : ffffffe0019cd338 t2 : 0000000000000000 s0 : ffffffe001803d70
[ 0.000000] s1 : 0000000000000000 a0 : ffffffe0000095aa a1 : 0000000000000001
[ 0.000000] a2 : 0000000000000002 a3 : 0000000000000000 a4 : 0000000000000000
[ 0.000000] a5 : 0000000000000000 a6 : 0000000000000004 a7 : 0000000052464e43
[ 0.000000] s2 : 0000000000000002 s3 : 0000000000000001 s4 : 0000000000000000
[ 0.000000] s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
[ 0.000000] s8 : ffffffe001a170c0 s9 : 0000000000000001 s10: 0000000000000001
[ 0.000000] s11: 00000000fffcc5d0 t3 : 0000000000000068 t4 : 000000000000000b
[ 0.000000] t5 : ffffffe0019cd3e0 t6 : ffffffe001803cd8
[ 0.000000] status: 0000000200000100 badaddr: 000000000513f187 cause: 0000000000000002
[ 0.000000] ---[ end trace f67eb9af4d8d492b ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
```

affects: charm-nrpe → linux-kernel-headers
tags: added: sifive
tags: added: rv64 rv64gc
no longer affects: linux-kernel-headers
Jessica Clarke (jrtc27)
affects: linux (Ubuntu) → linux-riscv (Ubuntu)
Pierce Andjelkovic (pierceandjelkovic) wrote :

This appears to be resolved in 5.11.0-1015-generic

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-riscv (Ubuntu):
status: New → Confirmed
Changed in ubuntu:
status: New → Confirmed
affects: ubuntu → linux-riscv-5.11 (Ubuntu)
Dimitri John Ledkov (xnox) wrote :

I cannot reliably reproduce this issue.
At first I experienced it too, but when I tried to bisect it by installing multiple kernel ABIs, they all booted fine.

So I was able to reproduce the boot failure with the 1014, 1015, and 1016 ABIs, and then they started to boot fine.

I wonder whether it depends on how the board is booted (cold boot, hard reset, or soft reboot), or whether the SD card is faulty and comes back to life when the kernel image gets written to a different location because of wear leveling.

We are currently preparing a point release of Ubuntu 20.04 with Unmatched support. It also ships the 1015 ABI.

If possible, please test the Unmatched image from https://cdimage.ubuntu.com/ubuntu-server/focal/daily-preinstalled/pending/ and report whether it works for you.

Paul Larson (pwlars) wrote :

That's the image I'm using, and it's reproducible every single time for me.

William Wilson (jawn-smith) wrote :

I downloaded the daily build from August 9 and reproduced this in 5 out of 5 boot attempts. My serial output is attached:

https://paste.ubuntu.com/p/gfpb8tmk9h/

My board revision ends in 3A0.

Dimitri John Ledkov (xnox) wrote :

```
[ 0.000000] XNOX kernel/trace/ftrace.c(6255) ftrace_process_locs:
[ 0.000000] XNOX kernel/trace/ftrace.c(6272) ftrace_process_locs:
[ 0.000000] Oops - illegal instruction [#1]
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-1015-generic #16~20.04.1
[ 0.000000] epc: ffffffe00000920e ra : ffffffe000009384 sp : ffffffe001603d30
```

```c
/* kernel/trace/ftrace.c, ftrace_process_locs(), with a debug pr_info() added */
	if (!mod)
		local_irq_save(flags);
	pr_info("XNOX %s(%d) %s:\n", __FILE__, __LINE__, __func__);
	ftrace_update_code(mod, start_pg);
```

So I think it is blowing up in ftrace_update_code.

Dimitri John Ledkov (xnox) wrote :

I added more pr_info calls inside ftrace_update_code... and now things boot for me. =(

Dimitri John Ledkov (xnox) wrote :

https://people.canonical.com/~xnox/lp1934548/

I have published my findings at the above location. I am giving up at this point and need to hand this over to someone else.

Is there a way to attach gdb over JTAG to the Unmatched board while it is attempting to boot?
Is there a way to analyze the vmlinuz image to figure out what is supposed to be at the crashing addresses?

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Looking at the crashing address (by running objdump on the vmlinux from the ddeb), it lands in the middle of an instruction:

```
ffffffe000009204: 99c080e7 jalr -1636(ra) # ffffffe000006b9c <riscv_cpuid_to_hartid_mask>
ffffffe000009208: 0180e797 auipc a5,0x180e
ffffffe00000920c: f187b783 ld a5,-232(a5) # ffffffe001817120 <__sbi_send_ipi>
ffffffe000009210: fd040513 addi a0,s0,-48
ffffffe000009214: 9782 jalr a5
ffffffe000009216: fd843703 ld a4,-40(s0)
```

Perhaps the patching done by ftrace is causing this.

There is upstream commit afc76b8b80112189b6f11e67e19cf58301944814 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT"), which doesn't reference any bug that it fixes, but changes the way patches are applied, so perhaps it would help fix this issue. It may be worth a try until this is properly debugged. I can't test it myself, so @xnox, would you be able to try this upstream commit?

Cascardo.

Thadeu Lima de Souza Cascardo (cascardo) wrote :

An alternative theory is that this always breaks in sbi_send_cpumask_ipi, which is patched at:

```
ffffffe0000091e2: fffff097 auipc ra,0xfffff
ffffffe0000091e6: dfe080e7 jalr -514(ra) # ffffffe000007fe0 <ftrace_stub>
```

I found ffffffe0000091e2 among the addresses being patched (see __start_mcount_loc). The patching code ends up calling flush_icache_range (which is really flush_icache_all), and that, in turn, ends up sending some form of IPI.

So there may be a race here: the IPI code is executed after it has been patched, but before the icache is flushed. It's hard to believe that this segment lies across a cache boundary, but something else is certainly necessary to synchronize here.

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hey, @xnox.

Can you try the attached patch and see if that works out? It's possible that other functions need to be marked notrace.

Cascardo.

tags: added: patch
Dimitri John Ledkov (xnox) wrote :
Changed in linux-riscv-5.11 (Ubuntu):
status: Confirmed → In Progress
Changed in linux-riscv (Ubuntu):
status: Confirmed → In Progress
Changed in linux-riscv-5.11 (Ubuntu):
importance: Undecided → High
Changed in linux-riscv (Ubuntu):
importance: Undecided → High
Changed in linux-riscv (Ubuntu):
status: In Progress → Fix Committed
Changed in linux-riscv-5.11 (Ubuntu):
status: In Progress → Fix Committed
Changed in linux-riscv (Ubuntu Hirsute):
status: New → Fix Committed
Changed in linux-riscv (Ubuntu):
status: Fix Committed → Confirmed
Changed in linux-riscv-5.11 (Ubuntu Hirsute):
status: New → Fix Committed
Changed in linux-riscv-5.11 (Ubuntu):
status: Fix Committed → Confirmed
Dimitri John Ledkov (xnox) wrote :

Hirsute:
```
$ uname -a
Linux ubuntu 5.11.0-1017-generic #18-Ubuntu SMP Wed Aug 11 18:02:14 UTC 2021 riscv64 riscv64 riscv64 GNU/Linux

$ systemctl is-system-running
running

$ sudo dmesg | grep gcc
[ 0.000000] Linux version 5.11.0-1017-generic (buildd@riscv64-qemu-lcy01-065) (gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0, GNU ld (GNU Binutils for Ubuntu) 2.36.1) #18-Ubuntu SMP Wed Aug 11 18:02:14 UTC 2021 (Ubuntu 5.11.0-1017.18-generic 5.11.22)
```

(Based on the gcc version, this is the Hirsute build of this ABI.)

System booted fine.

tags: added: verification-done-hirsute
Dimitri John Ledkov (xnox) wrote :

Focal

```
ubuntu@ubuntu:~$ dmesg | grep gcc
[ 0.000000] Linux version 5.11.0-1017-generic (buildd@riscv64-qemu-lcy01-062) (gcc-10 (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #18~20.04.1-Ubuntu SMP Thu Aug 12 00:38:00 UTC 2021 (Ubuntu 5.11.0-1017.18~20.04.1-generic 5.11.22)

ubuntu@ubuntu:~$ systemctl is-system-running
running
```

Tested the kernel from the kernel team's PPA on Focal, on the Unmatched. All is good.

tags: added: verification-done-focal
William Wilson (jawn-smith) wrote :

The 1017 kernel from the kernel team PPA is reliably booting on my hardware.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-riscv-5.11 - 5.11.0-1017.18~20.04.1

---------------
linux-riscv-5.11 (5.11.0-1017.18~20.04.1) focal; urgency=medium

  * focal/linux-riscv-5.11: 5.11.0-1017.18~20.04.1 -proposed tracker
    (LP: #1939588)

  * Packaging resync (LP: #1786013)
    - [Packaging] update variants

  [ Ubuntu: 5.11.0-1017.18 ]

  * hirsute/linux-riscv: 5.11.0-1017.18 -proposed tracker (LP: #1939589)
  * RISC-V: Illegal instruction (LP: #1934548)
    - SAUCE: RISC-V: prevent sbi_send_cpumask_ipi race with ftrace
  * hirsute/linux: 5.11.0-31.33 -proposed tracker (LP: #1939553)
  * REGRESSION: shiftfs lets sendfile fail with EINVAL (LP: #1939301)
    - SAUCE: shiftfs: fix sendfile() invocations

  [ Ubuntu: 5.11.0-1016.17 ]

  * hirsute/linux-riscv: 5.11.0-1016.17 -proposed tracker (LP: #1936501)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * large_dir in ext4 broken (LP: #1933074)
    - SAUCE: ext4: fix directory index node split corruption
  * Add l2tp.sh in net from ubuntu_kernel_selftests back (LP: #1934293)
    - Revert "UBUNTU: SAUCE: selftests/net -- disable l2tp.sh test"
  * icmp_redirect.sh in net from ubuntu_kernel_selftests failed on F-OEM-5.6 /
    F-OEM-5.10 / F-OEM-5.13 / F / G / H (LP: #1880645)
    - selftests: icmp_redirect: support expected failures
  * Mute/mic LEDs no function on some HP platfroms (LP: #1934878)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 450 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 630 G8
  * [SRU][OEM-5.10/H] Fix HDMI output issue on Intel TGL GPU (LP: #1934864)
    - drm/i915: Fix HAS_LSPCON macro for platforms between GEN9 and GEN10
  * mute/micmute LEDs no function on HP EliteBook 830 G8 Notebook PC
    (LP: #1934239)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 830 G8 Notebook PC
  * ubuntu-host driver lacks lseek ops (LP: #1934110)
    - ubuntu-host: add generic lseek op
  * ubuntu_kernel_selftests ftrace fails on arm64 F / aws-5.8 / amd64 F
    azure-5.8 (LP: #1927749)
    - selftests/ftrace: fix event-no-pid on 1-core machine
  * Hirsute update: upstream stable patchset 2021-06-29 (LP: #1934012)
    - proc: Track /proc/$pid/attr/ opener mm_struct
    - ASoC: max98088: fix ni clock divider calculation
    - ASoC: amd: fix for pcm_read() error
    - spi: Fix spi device unregister flow
    - spi: spi-zynq-qspi: Fix stack violation bug
    - bpf: Forbid trampoline attach for functions with variable arguments
    - net/nfc/rawsock.c: fix a permission check bug
    - usb: cdns3: Fix runtime PM imbalance on error
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Lenovo Miix 3-830 tablet
    - vfio-ccw: Reset FSM state to IDLE inside FSM
    - vfio-ccw: Serialize FSM IDLE state with I/O completion
    - ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
    - spi: sprd: Add missing MODULE_DEVICE_TABLE
    - usb: chipidea: udc: assign interrupt number to USB gadget structure
    - isdn: mISDN: netjet: Fix crash in nj_probe:
    - bonding: init notify_work earlier t...

Changed in linux-riscv-5.11 (Ubuntu):
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-riscv - 5.11.0-1017.18

---------------
linux-riscv (5.11.0-1017.18) hirsute; urgency=medium

  * hirsute/linux-riscv: 5.11.0-1017.18 -proposed tracker (LP: #1939589)

  * RISC-V: Illegal instruction (LP: #1934548)
    - SAUCE: RISC-V: prevent sbi_send_cpumask_ipi race with ftrace

  [ Ubuntu: 5.11.0-31.33 ]

  * hirsute/linux: 5.11.0-31.33 -proposed tracker (LP: #1939553)
  * REGRESSION: shiftfs lets sendfile fail with EINVAL (LP: #1939301)
    - SAUCE: shiftfs: fix sendfile() invocations

linux-riscv (5.11.0-1016.17) hirsute; urgency=medium

  * hirsute/linux-riscv: 5.11.0-1016.17 -proposed tracker (LP: #1936501)

  [ Ubuntu: 5.11.0-26.28 ]

  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * large_dir in ext4 broken (LP: #1933074)
    - SAUCE: ext4: fix directory index node split corruption
  * Add l2tp.sh in net from ubuntu_kernel_selftests back (LP: #1934293)
    - Revert "UBUNTU: SAUCE: selftests/net -- disable l2tp.sh test"
  * icmp_redirect.sh in net from ubuntu_kernel_selftests failed on F-OEM-5.6 /
    F-OEM-5.10 / F-OEM-5.13 / F / G / H (LP: #1880645)
    - selftests: icmp_redirect: support expected failures
  * Mute/mic LEDs no function on some HP platfroms (LP: #1934878)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 450 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 630 G8
  * [SRU][OEM-5.10/H] Fix HDMI output issue on Intel TGL GPU (LP: #1934864)
    - drm/i915: Fix HAS_LSPCON macro for platforms between GEN9 and GEN10
  * mute/micmute LEDs no function on HP EliteBook 830 G8 Notebook PC
    (LP: #1934239)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 830 G8 Notebook PC
  * ubuntu-host driver lacks lseek ops (LP: #1934110)
    - ubuntu-host: add generic lseek op
  * ubuntu_kernel_selftests ftrace fails on arm64 F / aws-5.8 / amd64 F
    azure-5.8 (LP: #1927749)
    - selftests/ftrace: fix event-no-pid on 1-core machine
  * Hirsute update: upstream stable patchset 2021-06-29 (LP: #1934012)
    - proc: Track /proc/$pid/attr/ opener mm_struct
    - ASoC: max98088: fix ni clock divider calculation
    - ASoC: amd: fix for pcm_read() error
    - spi: Fix spi device unregister flow
    - spi: spi-zynq-qspi: Fix stack violation bug
    - bpf: Forbid trampoline attach for functions with variable arguments
    - net/nfc/rawsock.c: fix a permission check bug
    - usb: cdns3: Fix runtime PM imbalance on error
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Lenovo Miix 3-830 tablet
    - vfio-ccw: Reset FSM state to IDLE inside FSM
    - vfio-ccw: Serialize FSM IDLE state with I/O completion
    - ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
    - spi: sprd: Add missing MODULE_DEVICE_TABLE
    - usb: chipidea: udc: assign interrupt number to USB gadget structure
    - isdn: mISDN: netjet: Fix crash in nj_probe:
    - bonding: init notify_work earlier to avoid uninitialized use
    - netlink: disable IRQs for netlink_lock_table()
    - net: mdiobus: get rid of a BUG_ON()
    - cgro...

Changed in linux-riscv (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Changed in linux-riscv (Ubuntu Focal):
status: New → Won't Fix
Changed in linux-riscv-5.11 (Ubuntu Focal):
status: New → Fix Released
Changed in linux-riscv-5.11 (Ubuntu Hirsute):
status: Fix Committed → Invalid
Changed in linux-riscv (Ubuntu):
status: Confirmed → Fix Released