rtnetlink.sh test in ubuntu_kselftests_net causes kernel panic on ARM64 node scobee-kernel with J-5.15

Bug #2065350 reported by Po-Hsu Lin
This bug affects 1 person
Affects               Status       Importance  Assigned to  Milestone
ubuntu-kernel-tests   New          Undecided   Unassigned
linux (Ubuntu)        Invalid      Undecided   Unassigned
  Jammy               In Progress  Undecided   Ike Panhc

Bug Description

Issue found with 5.15.0-111.121 and verified manually with 5.15.0-106-generic.

Running the rtnetlink.sh test from ubuntu_kselftests_net causes a kernel panic on the ARM64 node scobee-kernel.

The reproduction rate is 100%: just run rtnetlink.sh from the Jammy tree.

Test log:
ubuntu@scobee-kernel:~/autotest/client/tmp/ubuntu_kselftests_net/src/linux/tools/testing/selftests/net$ sudo ./rtnetlink.sh
PASS: policy routing
PASS: route get
PASS: preferred_lft addresses have expired
PASS: promote_secondaries complete
PASS: tc htb hierarchy
PASS: gre tunnel endpoint
PASS: gretap
PASS: ip6gretap
PASS: erspan
PASS: ip6erspan
PASS: bridge setup
PASS: ipv6 addrlabel
PASS: set ifalias 0b5c1e3b-f766-4ec3-bdf0-08779e7d3601 for test-dummy0
PASS: vrf
PASS: vxlan
PASS: fou
(system hangs here)

You will need to use the console to see the error message from dmesg:
[ 274.758075] MACsec IEEE 802.1AE
[ 275.075237] kernel BUG at mm/vmalloc.c:2716!
[ 275.079520] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 275.085590] Modules linked in: macsec fou vxlan ip6_udp_tunnel udp_tunnel vrf 8021q garp mrp bridge stp llc ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre cls_u32 sch_htb dummy binfmt_misc nls_iso8859_1 hisi_zip hisi_hpre hisi_sec2 hns_roce_hw_v2 hisi_qm arm_spe_pmu ecdh_generic libcurve25519_generic uacce ecc ipmi_ssif authenc hisi_trng_v2 hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler hisi_uncore_pmu cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure mlx5_ib ib_uverbs ib_core realtek hibmc_drm crct10dif_ce drm_vram_helper ghash_ce sha2_ce drm_ttm_helper mlx5_core ttm sha256_arm64 sha1_ce i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt hns3 fb_sys_fops cec
[ 275.085753] hisi_sas_v3_hw mlxfw rc_core hisi_sas_main psample hclge tls drm libsas xhci_pci hnae3 xhci_pci_renesas ahci scsi_transport_sas spi_dw_mmio spi_dw gpio_dwapb aes_neon_blk crypto_simd cryptd aes_ce_cipher
[ 275.190870] CPU: 75 PID: 0 Comm: swapper/75 Not tainted 5.15.0-106-generic #116-Ubuntu
[ 275.198753] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B160.01 01/15/2020
[ 275.207584] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 275.214514] pc : vunmap+0x50/0x54
[ 275.217824] lr : dma_common_free_remap+0x5c/0x70
[ 275.222427] sp : ffff80000825bc50
[ 275.225728] x29: ffff80000825bc50 x28: ffffb7e0ce3efb40 x27: ffffb7e0cbc0ea50
[ 275.232831] x26: ffff203f7fb9e9b0 x25: 0000000000000000 x24: ffffb7e0ccd9be90
[ 275.239934] x23: 00000000f7eff000 x22: ffff20205cfa7480 x21: 0000000000000001
[ 275.247036] x20: ffff80001c1db000 x19: ffff80001c1db000 x18: 0000000000000000
[ 275.254139] x17: ffff685eb2070000 x16: ffffb7e0cbc15324 x15: 63657363616d3d45
[ 275.261242] x14: ffffb7e0ce30bf30 x13: ffffb7e0ce30ba18 x12: 000000000000004b
[ 275.268345] x11: 0000000000000000 x10: ffff202006e44000 x9 : ffffb7e0cbc1af5c
[ 275.275447] x8 : 000000000000001f x7 : 0000000000000061 x6 : 0000000000000021
[ 275.282550] x5 : ffff80001c1dbfff x4 : 0000000000000000 x3 : ffff80001b12dfff
[ 275.289652] x2 : ffffb7e0ce7ab5d8 x1 : 0000000000000100 x0 : ffff80001c1db000
[ 275.296756] Call trace:
[ 275.299194] vunmap+0x50/0x54
[ 275.302149] __iommu_dma_free+0xc4/0x10c
[ 275.306061] iommu_dma_free+0x44/0x60
[ 275.309706] dma_free_attrs+0xe0/0xec
[ 275.313356] sec_cipher_uninit+0x54/0x70 [hisi_sec2]
[ 275.318303] sec_aead_exit+0x34/0x80 [hisi_sec2]
[ 275.322902] sec_aead_xcm_ctx_exit+0x30/0x40 [hisi_sec2]
[ 275.328190] crypto_aead_exit_tfm+0x28/0x3c
[ 275.332361] crypto_destroy_tfm+0x48/0xa0
[ 275.336359] free_rxsa+0x28/0x50 [macsec]
[ 275.340354] rcu_do_batch+0x16c/0x450
[ 275.344001] rcu_core+0x160/0x310
[ 275.347302] rcu_core_si+0x18/0x2c
[ 275.350688] __do_softirq+0x15c/0x410
[ 275.354336] irq_exit+0xa0/0xe0
[ 275.357468] handle_domain_irq+0x6c/0xa0
[ 275.361378] gic_handle_irq+0xec/0x1b0
[ 275.365110] call_on_irq_stack+0x20/0x2c
[ 275.369016] do_interrupt_handler+0x5c/0x70
[ 275.373182] el1_interrupt+0x30/0x50
[ 275.376750] el1h_64_irq_handler+0x18/0x2c
[ 275.380830] el1h_64_irq+0x7c/0x80
[ 275.384216] arch_cpu_idle+0x18/0x3c
[ 275.387776] default_idle_call+0x44/0x150
[ 275.391771] cpuidle_idle_call+0x174/0x200
[ 275.395851] do_idle+0xac/0x100
[ 275.398979] cpu_startup_entry+0x2c/0x70
[ 275.402885] secondary_start_kernel+0xfc/0x190
[ 275.407316] __secondary_switched+0x90/0x94
[ 275.411485] Code: f9400bf3 a8c27bfd d50323bf d65f03c0 (d4210000)
[ 275.417555] ---[ end trace 6fe56b1fa29bb224 ]---
[ 275.427929] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[ 275.435292] SMP: stopping secondary CPUs
[ 275.439278] Kernel Offset: 0x37e0c3ab0000 from 0xffff800008000000
[ 275.445342] PHYS_OFFSET: 0x0
[ 275.448210] CPU features: 0x0,00000441,a3202c40
[ 275.452721] Memory Limit: none
[ 275.461227] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]---
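When only a console capture like the one above is available, a short script can pull out the BUG location and the call-trace frames that belong to loadable modules. In this trace, those frames are the macsec RCU callback (`free_rxsa`) and the hisi_sec2 AEAD teardown. This is a minimal sketch. It assumes the console output was saved to a text file in the format shown above. The helper name `summarize_oops` and the file name `console.log` are hypothetical. Only frames ending in a `[module]` tag are kept, so core-kernel frames such as `dma_free_attrs` and `vunmap` are dropped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise an oops from a saved console log.
# Prints the "kernel BUG at" location, then each call-trace frame that
# carries a [module] tag, in the order printed (innermost frame first).
summarize_oops() {
    awk '
        { sub(/\r$/, "") }                        # tolerate CRLF console captures
        /kernel BUG at/ {
            loc = $0
            sub(/^.*kernel BUG at /, "", loc)     # leaves "file:line!"
            sub(/!$/, "", loc)
            print "BUG at: " loc
        }
        /Call trace:/ { in_trace = 1; next }
        /---\[ end trace/ { in_trace = 0 }
        in_trace && /\[[A-Za-z0-9_]+\]$/ {
            # e.g. "[ 275.3] func+0x54/0x70 [hisi_sec2]":
            # $(NF - 1) is the symbol+offset, $NF is the module tag
            print "frame: " $(NF - 1) " " $NF
        }
    ' "$1"
}
```

Usage: `summarize_oops console.log`. On the log in this report it should print `BUG at: mm/vmalloc.c:2716`, followed by the three `[hisi_sec2]` frames and the `free_rxsa` `[macsec]` frame.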

Po-Hsu Lin (cypressyew) wrote:

It looks like it's caused by the "macsec" test.

Here is the output from -91 with the test code from -106:
$ sudo ./rtnetlink.sh
PASS: policy routing
PASS: route get
PASS: preferred_lft addresses have expired
PASS: promote_secondaries complete
PASS: tc htb hierarchy
PASS: gre tunnel endpoint
PASS: gretap
PASS: ip6gretap
PASS: erspan
PASS: ip6erspan
PASS: bridge setup
PASS: ipv6 addrlabel
PASS: set ifalias e15265b2-91be-4a42-a08f-682913cb1767 for test-dummy0
PASS: vrf
PASS: vxlan
PASS: fou
PASS: macsec
PASS: ipsec
PASS: ipsec_offload
PASS: bridge fdb get
PASS: neigh get
PASS: bridge_parent_id

Po-Hsu Lin (cypressyew) wrote:

Failed with -97 + test code from -106.
Failed with -94 + test code from -106.
Passed with -92 + test code from -106.

So it looks like this regression was introduced between -92 and -94.
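The narrowing above is a last-pass / first-fail scan over the tested ABIs (the NNN in 5.15.0-NNN). Here is a minimal sketch in shell. The helper name `find_regression_window` is hypothetical and is not part of the autotest or kselftest tooling. The sketch assumes results are listed in ascending ABI order with a single PASS to FAIL transition. The input is the set of results reported in this bug, all run with the rtnetlink.sh test code from -106.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "ABI RESULT" pairs in ascending ABI order and
# report the last passing ABI and the first failing ABI.
# Assumes a single PASS -> FAIL transition (one regression).
find_regression_window() {
    local last_good="" first_bad="" abi result
    while read -r abi result; do
        [ -z "$abi" ] && continue            # skip blank lines
        if [ "$result" = "PASS" ]; then
            last_good=$abi
        elif [ -z "$first_bad" ]; then
            first_bad=$abi
        fi
    done
    echo "last good: -${last_good}, first bad: -${first_bad}"
}

# Results reported in this bug (5.15.0-<ABI> kernels, test code from -106).
find_regression_window <<'EOF'
91 PASS
92 PASS
94 FAIL
97 FAIL
106 FAIL
EOF
# prints: last good: -92, first bad: -94
```

If a later ABI passed again, the window would be ambiguous and this scan would report it wrongly. That is why the single-transition assumption matters.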

We didn't catch this earlier because scobee-kernel was added to the pool in the 2024.04.01 cycle, but at that time ubuntu_kselftests_net failed with a deployment error on 5.15.0-106.116.

We finally got it tested with -111 in 2024.04.29.

description: updated
Changed in linux (Ubuntu): status: New → Invalid
Ike Panhc (ikepanhc) changed in linux (Ubuntu Jammy): assignee: nobody → Ike Panhc (ikepanhc)
Ike Panhc (ikepanhc) changed in linux (Ubuntu Jammy): status: New → In Progress