rtnetlink.sh test in ubuntu_kselftests_net causes kernel panic on ARM64 node scobee-kernel with J-5.15

Bug #2065350 reported by Po-Hsu Lin
This bug affects 1 person
Affects                Status         Importance    Assigned to    Milestone
ubuntu-kernel-tests    New            Undecided     Unassigned
linux (Ubuntu)         Invalid        Undecided     Unassigned
  Jammy                In Progress    Undecided     Ike Panhc

Bug Description

Issue found with 5.15.0-111.121, verified manually with 5.15.0-106-generic.

Running the rtnetlink.sh test from ubuntu_kselftests_net causes a kernel panic on the ARM64 node scobee-kernel.

The reproduction rate is 100%: just run rtnetlink.sh from the Jammy tree.

Test log:
ubuntu@scobee-kernel:~/autotest/client/tmp/ubuntu_kselftests_net/src/linux/tools/testing/selftests/net$ sudo ./rtnetlink.sh
PASS: policy routing
PASS: route get
PASS: preferred_lft addresses have expired
PASS: promote_secondaries complete
PASS: tc htb hierarchy
PASS: gre tunnel endpoint
PASS: gretap
PASS: ip6gretap
PASS: erspan
PASS: ip6erspan
PASS: bridge setup
PASS: ipv6 addrlabel
PASS: set ifalias 0b5c1e3b-f766-4ec3-bdf0-08779e7d3601 for test-dummy0
PASS: vrf
PASS: vxlan
PASS: fou
(system hangs here)

You will need to use the console to see the error message from dmesg:
[ 274.758075] MACsec IEEE 802.1AE
[ 275.075237] kernel BUG at mm/vmalloc.c:2716!
[ 275.079520] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 275.085590] Modules linked in: macsec fou vxlan ip6_udp_tunnel udp_tunnel vrf 8021q garp mrp bridge stp llc ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre cls_u32 sch_htb dummy binfmt_misc nls_iso8859_1 hisi_zip hisi_hpre hisi_sec2 hns_roce_hw_v2 hisi_qm arm_spe_pmu ecdh_generic libcurve25519_generic uacce ecc ipmi_ssif authenc hisi_trng_v2 hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler hisi_uncore_pmu cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure mlx5_ib ib_uverbs ib_core realtek hibmc_drm crct10dif_ce drm_vram_helper ghash_ce sha2_ce drm_ttm_helper mlx5_core ttm sha256_arm64 sha1_ce i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt hns3 fb_sys_fops cec
[ 275.085753] hisi_sas_v3_hw mlxfw rc_core hisi_sas_main psample hclge tls drm libsas xhci_pci hnae3 xhci_pci_renesas ahci scsi_transport_sas spi_dw_mmio spi_dw gpio_dwapb aes_neon_blk crypto_simd cryptd aes_ce_cipher
[ 275.190870] CPU: 75 PID: 0 Comm: swapper/75 Not tainted 5.15.0-106-generic #116-Ubuntu
[ 275.198753] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B160.01 01/15/2020
[ 275.207584] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 275.214514] pc : vunmap+0x50/0x54
[ 275.217824] lr : dma_common_free_remap+0x5c/0x70
[ 275.222427] sp : ffff80000825bc50
[ 275.225728] x29: ffff80000825bc50 x28: ffffb7e0ce3efb40 x27: ffffb7e0cbc0ea50
[ 275.232831] x26: ffff203f7fb9e9b0 x25: 0000000000000000 x24: ffffb7e0ccd9be90
[ 275.239934] x23: 00000000f7eff000 x22: ffff20205cfa7480 x21: 0000000000000001
[ 275.247036] x20: ffff80001c1db000 x19: ffff80001c1db000 x18: 0000000000000000
[ 275.254139] x17: ffff685eb2070000 x16: ffffb7e0cbc15324 x15: 63657363616d3d45
[ 275.261242] x14: ffffb7e0ce30bf30 x13: ffffb7e0ce30ba18 x12: 000000000000004b
[ 275.268345] x11: 0000000000000000 x10: ffff202006e44000 x9 : ffffb7e0cbc1af5c
[ 275.275447] x8 : 000000000000001f x7 : 0000000000000061 x6 : 0000000000000021
[ 275.282550] x5 : ffff80001c1dbfff x4 : 0000000000000000 x3 : ffff80001b12dfff
[ 275.289652] x2 : ffffb7e0ce7ab5d8 x1 : 0000000000000100 x0 : ffff80001c1db000
[ 275.296756] Call trace:
[ 275.299194] vunmap+0x50/0x54
[ 275.302149] __iommu_dma_free+0xc4/0x10c
[ 275.306061] iommu_dma_free+0x44/0x60
[ 275.309706] dma_free_attrs+0xe0/0xec
[ 275.313356] sec_cipher_uninit+0x54/0x70 [hisi_sec2]
[ 275.318303] sec_aead_exit+0x34/0x80 [hisi_sec2]
[ 275.322902] sec_aead_xcm_ctx_exit+0x30/0x40 [hisi_sec2]
[ 275.328190] crypto_aead_exit_tfm+0x28/0x3c
[ 275.332361] crypto_destroy_tfm+0x48/0xa0
[ 275.336359] free_rxsa+0x28/0x50 [macsec]
[ 275.340354] rcu_do_batch+0x16c/0x450
[ 275.344001] rcu_core+0x160/0x310
[ 275.347302] rcu_core_si+0x18/0x2c
[ 275.350688] __do_softirq+0x15c/0x410
[ 275.354336] irq_exit+0xa0/0xe0
[ 275.357468] handle_domain_irq+0x6c/0xa0
[ 275.361378] gic_handle_irq+0xec/0x1b0
[ 275.365110] call_on_irq_stack+0x20/0x2c
[ 275.369016] do_interrupt_handler+0x5c/0x70
[ 275.373182] el1_interrupt+0x30/0x50
[ 275.376750] el1h_64_irq_handler+0x18/0x2c
[ 275.380830] el1h_64_irq+0x7c/0x80
[ 275.384216] arch_cpu_idle+0x18/0x3c
[ 275.387776] default_idle_call+0x44/0x150
[ 275.391771] cpuidle_idle_call+0x174/0x200
[ 275.395851] do_idle+0xac/0x100
[ 275.398979] cpu_startup_entry+0x2c/0x70
[ 275.402885] secondary_start_kernel+0xfc/0x190
[ 275.407316] __secondary_switched+0x90/0x94
[ 275.411485] Code: f9400bf3 a8c27bfd d50323bf d65f03c0 (d4210000)
[ 275.417555] ---[ end trace 6fe56b1fa29bb224 ]---
[ 275.427929] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[ 275.435292] SMP: stopping secondary CPUs
[ 275.439278] Kernel Offset: 0x37e0c3ab0000 from 0xffff800008000000
[ 275.445342] PHYS_OFFSET: 0x0
[ 275.448210] CPU features: 0x0,00000441,a3202c40
[ 275.452721] Memory Limit: none
[ 275.461227] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]---
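
As a reading aid for the call trace above, here is a minimal Python sketch that pulls out the frames annotated with a module name from a trace pasted as text. The helper name `module_frames` and the regular expression are illustrative assumptions, not part of any kernel tooling; the sample lines are copied from the trace above.

```python
import re

# Matches a dmesg call-trace frame such as
# "[ 275.313356] sec_cipher_uninit+0x54/0x70 [hisi_sec2]".
# The trailing "[module]" part is optional (built-in symbols have none).
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def module_frames(log_text):
    """Return (symbol, module) pairs for frames that carry a module tag."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group("module"):
            frames.append((match.group("symbol"), match.group("module")))
    return frames


# Excerpt of the call trace from the report (innermost frame first).
TRACE = """\
[ 275.299194] vunmap+0x50/0x54
[ 275.302149] __iommu_dma_free+0xc4/0x10c
[ 275.306061] iommu_dma_free+0x44/0x60
[ 275.309706] dma_free_attrs+0xe0/0xec
[ 275.313356] sec_cipher_uninit+0x54/0x70 [hisi_sec2]
[ 275.318303] sec_aead_exit+0x34/0x80 [hisi_sec2]
[ 275.322902] sec_aead_xcm_ctx_exit+0x30/0x40 [hisi_sec2]
[ 275.328190] crypto_aead_exit_tfm+0x28/0x3c
[ 275.332361] crypto_destroy_tfm+0x48/0xa0
[ 275.336359] free_rxsa+0x28/0x50 [macsec]
[ 275.340354] rcu_do_batch+0x16c/0x450
"""

for symbol, module in module_frames(TRACE):
    print(f"{module}: {symbol}")
```

Run on the excerpt, this lists three hisi_sec2 frames and one macsec frame. In the trace, free_rxsa [macsec] is invoked from rcu_do_batch under __do_softirq and ends in vunmap through the hisi_sec2 teardown path.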

Po-Hsu Lin (cypressyew) wrote :

It looks like this is caused by the "macsec" test.

Here is the output from the -91 kernel with the test code from -106:
$ sudo ./rtnetlink.sh
PASS: policy routing
PASS: route get
PASS: preferred_lft addresses have expired
PASS: promote_secondaries complete
PASS: tc htb hierarchy
PASS: gre tunnel endpoint
PASS: gretap
PASS: ip6gretap
PASS: erspan
PASS: ip6erspan
PASS: bridge setup
PASS: ipv6 addrlabel
PASS: set ifalias e15265b2-91be-4a42-a08f-682913cb1767 for test-dummy0
PASS: vrf
PASS: vxlan
PASS: fou
PASS: macsec
PASS: ipsec
PASS: ipsec_offload
PASS: bridge fdb get
PASS: neigh get
PASS: bridge_parent_id
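
The reasoning in this comment (the hanging run stops after "fou", and a complete run shows "macsec" as the next test) can be sketched in Python. This is only an illustration: `passed_tests` and `next_after_last_pass` are hypothetical helpers, and the sample logs are abbreviated from the output above.

```python
def passed_tests(log_text):
    """Collect names from 'PASS: <name>' lines, tolerating a leading '# ' (TAP output)."""
    names = []
    for line in log_text.splitlines():
        line = line.strip().lstrip("#").strip()
        if line.startswith("PASS: "):
            names.append(line[len("PASS: "):])
    return names


def next_after_last_pass(hung_log, good_log):
    """Return the test that follows the hung run's last PASS in a complete run.

    Returns None when the last passing test cannot be located in the
    complete run or when it was the final test.
    """
    hung = passed_tests(hung_log)
    good = passed_tests(good_log)
    if not hung:
        return good[0] if good else None
    last = hung[-1]
    if last not in good:
        return None
    index = good.index(last)
    return good[index + 1] if index + 1 < len(good) else None


# Abbreviated from the two runs in this report.
HUNG_RUN = "PASS: vrf\nPASS: vxlan\nPASS: fou\n"
GOOD_RUN = "PASS: vrf\nPASS: vxlan\nPASS: fou\nPASS: macsec\nPASS: ipsec\n"

print(next_after_last_pass(HUNG_RUN, GOOD_RUN))  # prints: macsec
```

The comparison only makes sense when both runs use the same version of the test script, as is the case here (test code from -106 in both runs).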

Po-Hsu Lin (cypressyew) wrote :

Failed with -97 + test code from -106.
Failed with -94 + test code from -106.
Passed with -92 + test code from -106.

So it looks like this issue was introduced between -92 and -94.
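
The narrowing step above can be written down as a small Python sketch. The function name `regression_window` is hypothetical, and the sketch assumes that once a build fails, every later build fails as well; the results are taken from the comments in this report (kernel ABI numbers only).

```python
def regression_window(results):
    """Given {build_number: passed}, return (last_good, first_bad).

    Returns None when there is no passing build older than the oldest
    failing build, i.e. when the window cannot be determined.
    """
    bad = [build for build, ok in results.items() if not ok]
    if not bad:
        return None
    first_bad = min(bad)
    earlier_good = [build for build, ok in results.items() if ok and build < first_bad]
    if not earlier_good:
        return None
    return max(earlier_good), first_bad


# Pass/fail results reported in this bug for 5.15.0-<N> kernels.
RESULTS = {91: True, 92: True, 94: False, 97: False, 106: False}

print(regression_window(RESULTS))  # prints: (92, 94)
```

Any build between the two ends of the window that was not tested remains a candidate for the first bad build.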

We didn't catch this because scobee-kernel was added to the pool in the 2024.04.01 cycle, but at that time ubuntu_kselftests_net failed with a deployment error on 5.15.0-106.116.

We finally got it tested with -111 in 2024.04.29.

description: updated
Changed in linux (Ubuntu):
status: New → Invalid
Ike Panhc (ikepanhc)
Changed in linux (Ubuntu Jammy):
assignee: nobody → Ike Panhc (ikepanhc)
Ike Panhc (ikepanhc)
Changed in linux (Ubuntu Jammy):
status: New → In Progress
Po-Hsu Lin (cypressyew) wrote :

Still exists with J-hwe-6.8.0-39-generic-64k #39~22.04.1

 Running 'make run_tests -C net TEST_PROGS=rtnetlink.sh TEST_GEN_PROGS='' TEST_CUSTOM_PROGS='''
 make: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kselftests_net/src/linux/tools/testing/selftests/net'
 TAP version 13
 1..1
 # timeout set to 0
 # selftests: net: rtnetlink.sh
 # PASS: policy routing
 # PASS: route get
 # PASS: preferred_lft addresses have expired
 # PASS: promote_secondaries complete
 # PASS: tc htb hierarchy
 # PASS: gre tunnel endpoint
 # PASS: gretap
 # PASS: ip6gretap
 # PASS: erspan
 # PASS: ip6erspan
 # PASS: bridge setup
 # PASS: ipv6 addrlabel
 # PASS: set ifalias bfa064a6-93c6-4c02-bbe0-afa6bc5f037b for test-dummy0
 # PASS: vrf
(system hangs here)

[ 871.770786] kernel BUG at mm/vmalloc.c:2865!
[ 871.775074] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 871.781158] Modules linked in: fou vxlan ip6_udp_tunnel udp_tunnel vrf 8021q garp mrp bridge stp llc ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre cls_u32 sch_htb macvtap macvlan tap dummy aria_generic aes_ce_ccm sm4_generic ccm sm4_neon poly1305_generic libpoly1305 poly1305_neon chacha_generic chacha_neon libchacha chacha20poly1305 binfmt_misc nls_iso8859_1 ipmi_ssif onboard_usb_hub hisi_hpre ecdh_generic hisi_sec2 hisi_zip hns_roce_hw_v2 libcurve25519_generic acpi_ipmi hisi_qm uacce ecc authenc ipmi_si hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu ipmi_devintf hisi_uncore_pmu arm_spe_pmu ipmi_msghandler arm_smmuv3_pmu hisi_trng_v2 cppc_cpufreq dm_multipath sch_fq_codel scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 mlx5_ib ib_uverbs macsec ib_core realtek crct10dif_ce polyval_ce polyval_generic ses ghash_ce enclosure mlx5_core hclge
[ 871.781275] sm4 sha2_ce mlxfw hibmc_drm hisi_sas_v3_hw sha256_arm64 drm_vram_helper psample sha1_ce hisi_sas_main drm_ttm_helper tls hns3 libsas pci_hyperv_intf xhci_pci ttm hnae3 xhci_pci_renesas ahci scsi_transport_sas i2c_algo_bit spi_dw_mmio gpio_dwapb spi_dw aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher [last unloaded: test_bpf]
[ 871.900751] CPU: 38 PID: 246 Comm: ksoftirqd/38 Not tainted 6.8.0-39-generic-64k #39~22.04.1-Ubuntu
[ 871.909755] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B160.01 01/15/2020
[ 871.918587] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 871.925520] pc : vunmap+0x6c/0x90
[ 871.928843] lr : dma_common_free_remap+0x74/0xa0
[ 871.933454] sp : ffff80009720fb30
[ 871.936754] x29: ffff80009720fb30 x28: ffffac87e676b19c x27: ffff005ff6e9edb8
[ 871.943858] x26: 000000000000000a x25: 0000000000000000 x24: ffffac87e7ec0470
[ 871.950963] x23: 00000000f84e0000 x22: ffff0040181050b0 x21: 0000000000000001
[ 871.958068] x20: ffff8000d0e80000 x19: ffff8000d0e80000 x18: ffff800097220078
[ 871.965174] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 871.972280] x14: 0000000000000000 x13: 0000000000000000 x12: 0101010101010101
[ 871.979386] x11: 0000000000000000 x10: 0000000000000000 x9 ...


Mehmet Basaran (mehmetbasaran) wrote :

I have seen this with jammy:linux-lowlatency-hwe-6.8 6.8.0-44.44.1~22.04.1 on metal:scobee-kernel (flavour:lowlatency-64k).

kselftests_net is also run as an adt test on arm here: https://autopkgtest.ubuntu.com/results/autopkgtest-jammy/jammy/arm64/l/linux-lowlatency-hwe-6.8/20240826_235140_fdde8@/log.gz

But it doesn't cause a crash there, so this might be either instance-specific or related to lowlatency-64k.

Kevin Becker (kevinbecker) wrote :

I saw this with jammy:linux-realtime 5.15.0-1071.79 on scobee-kernel. I think this was the first time this test used scobee-kernel in a while, so that's probably why we didn't see it before.
