pmtu.sh in net from ubuntu_kernel_selftests crashes SUT with K-5.19

Bug #2000778 reported by Po-Hsu Lin
This bug affects 1 person
Affects               Status         Importance   Assigned to   Milestone
ubuntu-kernel-tests   Fix Released   Undecided    Unassigned
linux (Ubuntu)        Fix Released   Undecided    Unassigned
  Kinetic             Fix Released   Undecided    Unassigned

Bug Description

Issue found with Kinetic 5.19.0-27.28 and 5.19.0-28.29 in this cycle (20221114) on these SUTs
 * P9 baltar
 * ARM64 kuzzle
 * ARM64 howzit-kernel

This should not be considered a regression, because the net tests could not be built with 5.19.0-24.25.

Test log:
ubuntu@baltar:~/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/net$ sudo ./pmtu.sh
TEST: ipv4: PMTU exceptions [ OK ]
TEST: ipv4: PMTU exceptions - nexthop objects [ OK ]
TEST: ipv6: PMTU exceptions [ OK ]
TEST: ipv6: PMTU exceptions - nexthop objects [ OK ]
TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [ OK ]
TEST: ICMPv4 with DSCP and ECN: PMTU exceptions - nexthop objects [ OK ]
'socat' command not found; skipping tests
TEST: UDPv4 with DSCP and ECN: PMTU exceptions [SKIP]
TEST: IPv4 over vxlan4: PMTU exceptions [ OK ]
TEST: IPv4 over vxlan4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6 over vxlan4: PMTU exceptions [ OK ]
TEST: IPv6 over vxlan4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4 over vxlan6: PMTU exceptions [ OK ]
TEST: IPv4 over vxlan6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6 over vxlan6: PMTU exceptions [ OK ]
TEST: IPv6 over vxlan6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4 over geneve4: PMTU exceptions [ OK ]
TEST: IPv4 over geneve4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6 over geneve4: PMTU exceptions [ OK ]
TEST: IPv6 over geneve4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4 over geneve6: PMTU exceptions [ OK ]
TEST: IPv4 over geneve6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6 over geneve6: PMTU exceptions [ OK ]
TEST: IPv6 over geneve6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4, bridged vxlan4: PMTU exceptions [ OK ]
TEST: IPv4, bridged vxlan4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6, bridged vxlan4: PMTU exceptions [ OK ]
TEST: IPv6, bridged vxlan4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4, bridged vxlan6: PMTU exceptions [ OK ]
TEST: IPv4, bridged vxlan6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6, bridged vxlan6: PMTU exceptions [ OK ]
TEST: IPv6, bridged vxlan6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4, bridged geneve4: PMTU exceptions [ OK ]
TEST: IPv4, bridged geneve4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6, bridged geneve4: PMTU exceptions [ OK ]
TEST: IPv6, bridged geneve4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4, bridged geneve6: PMTU exceptions [ OK ]
TEST: IPv4, bridged geneve6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6, bridged geneve6: PMTU exceptions [ OK ]
TEST: IPv6, bridged geneve6: PMTU exceptions - nexthop objects [ OK ]
  ovs_bridge not supported
TEST: IPv4, OVS vxlan4: PMTU exceptions [SKIP]
  ovs_bridge not supported
TEST: IPv6, OVS vxlan4: PMTU exceptions [SKIP]
  ovs_bridge not supported
TEST: IPv4, OVS vxlan6: PMTU exceptions [SKIP]
  ovs_bridge not supported
TEST: IPv6, OVS vxlan6: PMTU exceptions [SKIP]
  ovs_bridge not supported
TEST: IPv4, OVS geneve4: PMTU exceptions [SKIP]
  ovs_bridge not supported
TEST: IPv6, OVS geneve4: PMTU exceptions [SKIP]
  ovs_bridge not supported
TEST: IPv4, OVS geneve6: PMTU exceptions [SKIP]
  ovs_bridge not supported
TEST: IPv6, OVS geneve6: PMTU exceptions [SKIP]
TEST: IPv4 over fou4: PMTU exceptions [ OK ]
TEST: IPv4 over fou4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6 over fou4: PMTU exceptions [ OK ]
TEST: IPv6 over fou4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4 over fou6: PMTU exceptions [ OK ]
TEST: IPv4 over fou6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6 over fou6: PMTU exceptions [ OK ]
TEST: IPv6 over fou6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4 over gue4: PMTU exceptions [ OK ]
TEST: IPv4 over gue4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6 over gue4: PMTU exceptions [ OK ]
TEST: IPv6 over gue4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4 over gue6: PMTU exceptions [ OK ]
TEST: IPv4 over gue6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6 over gue6: PMTU exceptions [ OK ]
TEST: IPv6 over gue6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4 over IPv4: PMTU exceptions [ OK ]
TEST: IPv4 over IPv4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6 over IPv4: PMTU exceptions [ OK ]
TEST: IPv6 over IPv4: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv4 over IPv6: PMTU exceptions [ OK ]
TEST: IPv4 over IPv6: PMTU exceptions - nexthop objects [ OK ]
TEST: IPv6 over IPv6: PMTU exceptions [ OK ]
TEST: IPv6 over IPv6: PMTU exceptions - nexthop objects [ OK ]
TEST: vti6: PMTU exceptions [ OK ]
TEST: vti4: PMTU exceptions [ OK ]
'nettest' command not found; skipping tests
  xfrm6udp not supported
TEST: vti6: PMTU exceptions (ESP-in-UDP) [SKIP]
'nettest' command not found; skipping tests
  xfrm4udp not supported
TEST: vti4: PMTU exceptions (ESP-in-UDP) [SKIP]
'nettest' command not found; skipping tests
  xfrm6udprouted not supported
TEST: vti6: PMTU exceptions, routed (ESP-in-UDP) [SKIP]
'nettest' command not found; skipping tests
  xfrm4udprouted not supported
TEST: vti4: PMTU exceptions, routed (ESP-in-UDP) [SKIP]
TEST: vti4: default MTU assignment [ OK ]
TEST: vti6: default MTU assignment [ OK ]
TEST: vti4: MTU setting on link creation [ OK ]
TEST: vti6: MTU setting on link creation [ OK ]
TEST: vti6: MTU changes on link changes [ OK ]
TEST: ipv4: cleanup of cached exceptions [ OK ]
TEST: ipv4: cleanup of cached exceptions - nexthop objects [ OK ]
TEST: ipv6: cleanup of cached exceptions [ OK ]
TEST: ipv6: cleanup of cached exceptions - nexthop objects [FAIL]
  can't delete veth device in a timely manner, PMTU dst likely leaked
(system disconnected here and rebooted)
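
For triage, the log above can be reduced to a per-status tally and a list of failing cases. The following is a minimal sketch, not part of the original report: the function name `tally_results` is hypothetical, and it assumes the `TEST: <name> [ OK ]` / `[SKIP]` / `[FAIL]` line shape seen in this log.

```python
import re
from collections import Counter

# Matches result lines such as "TEST: ipv4: PMTU exceptions [ OK ]".
# The line shape is an assumption taken from the log in this report.
RESULT_RE = re.compile(
    r"^TEST: (?P<name>.+?)[ \t]+\[[ \t]*(?P<status>OK|FAIL|SKIP)[ \t]*\][ \t]*$",
    re.MULTILINE,
)


def tally_results(log_text):
    """Return (Counter of statuses, list of failed test names)."""
    counts = Counter()
    failed = []
    for match in RESULT_RE.finditer(log_text):
        status = match.group("status")
        counts[status] += 1
        if status == "FAIL":
            failed.append(match.group("name"))
    return counts, failed
```

Feeding it the saved output of `sudo ./pmtu.sh` should single out the one case that failed before the system went down, while informational lines such as "ovs_bridge not supported" are ignored.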

dmesg output:
[ 481.010254] Kernel attempted to read user page (ffb6e0000) - exploit attempt? (uid: 0)
[ 481.010281] BUG: Unable to handle kernel data access on read at 0xffb6e0000
[ 481.010299] Faulting instruction address: 0xc000000000b5d89c
[ 481.010309] Oops: Kernel access of bad area, sig: 11 [#1]
[ 481.010318] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[ 481.010338] Modules linked in: dummy esp4 ip_vti esp6 xfrm_user xfrm_algo ip6_vti xfrm6_tunnel fou6 ip6_tunnel tunnel6 sit ipip tunnel4 fou ip_tunnel bridge stp llc geneve vxlan ip6_udp_tunnel udp_tunnel act_csum act_pedit cls_flower sch_prio veth cfg80211 input_leds joydev mac_hid binfmt_misc ipmi_powernv ofpart at24 cmdlinepart uio_pdrv_genirq ibmpowernv uio powernv_flash ipmi_devintf opal_prd mtd ipmi_msghandler vmx_crypto ramoops dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua pstore_blk pstore_zone reed_solomon ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic ses enclosure scsi_transport_sas usbhid hid ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_vpmsum crc32c_vpmsum drm i40e xhci_pci aacraid xhci_pci_renesas drm_panel_orientation_quirks
[ 481.010578] CPU: 33 PID: 0 Comm: swapper/33 Not tainted 5.19.0-28-generic #29-Ubuntu
[ 481.010599] NIP: c000000000b5d89c LR: c0000000011d4504 CTR: c0000000011d4690
[ 481.010619] REGS: c000000fffe7ba80 TRAP: 0300 Not tainted (5.19.0-28-generic)
[ 481.010639] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24002222 XER: 00000000
[ 481.010667] CFAR: c0000000011d4500 DAR: 0000000ffb6e0000 DSISR: 40000000 IRQMASK: 0
[ 481.010667] GPR00: c0000000011d4504 c000000fffe7bd20 c000000002afe000 c00000061fcf6b00
[ 481.010667] GPR04: ffffffffffffffff 0000000000000020 0000000000000000 0000000000000000
[ 481.010667] GPR08: 0000000000000000 0000000000000000 0000000ffb6e0000 0000000000002000
[ 481.010667] GPR12: c0000000011d4690 c000000ffffbbe80 c000000ffa317f90 0000000000000000
[ 481.010667] GPR16: 0000000000000001 c000000002b33a80 000000010000b0bd c000000002187888
[ 481.010667] GPR20: 000000000000000a c000000002210800 0000000000000000 0000000000000000
[ 481.010667] GPR24: c0000000028f6080 c000000ffd8f1e78 000000000000000a c000000ffd8f1e00
[ 481.010667] GPR28: 0000000000000001 c000000fffe7bdb0 0000000000000000 c00000061fcf6b00
[ 481.010843] NIP [c000000000b5d89c] percpu_counter_add_batch+0x2c/0x120
[ 481.010865] LR [c0000000011d4504] dst_destroy+0x184/0x1c0
[ 481.010885] Call Trace:
[ 481.010899] [c000000fffe7bd20] [c00000000f781980] 0xc00000000f781980 (unreliable)
[ 481.010921] [c000000fffe7bd60] [c0000000011d4504] dst_destroy+0x184/0x1c0
[ 481.010942] [c000000fffe7bd90] [c00000000027b2c8] rcu_do_batch+0x178/0x560
[ 481.010962] [c000000fffe7be30] [c000000000282e6c] rcu_core+0x15c/0x2a0
[ 481.010982] [c000000fffe7be80] [c0000000014e673c] __do_softirq+0x16c/0x47c
[ 481.011003] [c000000fffe7bf90] [c000000000017fec] do_softirq_own_stack+0x4c/0xb0
[ 481.011025] [c00000000f8d3ab0] [c00000000218b0b8] tick_cpu_sched+0x0/0xe0
[ 481.011056] [c00000000f8d3af0] [c0000000001957d8] __irq_exit_rcu+0xd8/0x140
[ 481.011086] [c00000000f8d3b10] [c000000000196560] irq_exit+0x20/0x40
[ 481.011107] [c00000000f8d3b30] [c000000000029b28] timer_interrupt+0x188/0x3a0
[ 481.011138] [c00000000f8d3b90] [c000000000017bb4] replay_soft_interrupts+0x144/0x320
[ 481.011159] [c00000000f8d3d80] [c000000000017f44] arch_local_irq_restore.part.0+0x1b4/0x1c0
[ 481.011190] [c00000000f8d3db0] [c0000000010f8280] cpuidle_enter_state+0x120/0x770
[ 481.011222] [c00000000f8d3e30] [c0000000010f8984] cpuidle_enter+0x54/0x90
[ 481.011251] [c00000000f8d3e70] [c00000000021bf70] cpuidle_idle_call+0x1f0/0x330
[ 481.011282] [c00000000f8d3ec0] [c00000000021c1b0] do_idle+0x100/0x200
[ 481.011313] [c00000000f8d3f10] [c00000000021c52c] cpu_startup_entry+0x3c/0x50
[ 481.011343] [c00000000f8d3f40] [c00000000006ac2c] start_secondary+0x2ac/0x340
[ 481.011374] [c00000000f8d3f90] [c00000000000d154] start_secondary_prolog+0x10/0x14
[ 481.011395] Instruction dump:
[ 481.011411] 4bffff68 3c4c01fa 38420790 7c0802a6 fbe1fff8 fba1ffe8 fbc1fff0 7c7f1b78
[ 481.011436] f8010010 f821ffc1 e94d0030 e9230020 <7faa482e> 7fbe07b4 7fde2214 7fc9fe76
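
The two most useful lines in the oops above are the symbolised NIP and LR entries, which place the fault in `percpu_counter_add_batch` with `dst_destroy` as the caller. A minimal sketch for pulling them out of a saved dmesg follows; the helper name `faulting_symbols` is hypothetical and the regular expression only assumes the powerpc line layout shown in this report.

```python
import re

# Matches the symbolised register lines of a powerpc oops, for example
# "NIP [c000000000b5d89c] percpu_counter_add_batch+0x2c/0x120".
# The raw "NIP: ... LR: ..." register dump line is deliberately not matched.
SYMBOL_RE = re.compile(
    r"\b(?P<reg>NIP|LR) \[(?P<addr>[0-9a-f]+)\] "
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def faulting_symbols(dmesg_text):
    """Map 'NIP' and 'LR' to (symbol, offset) for the first oops found."""
    result = {}
    for match in SYMBOL_RE.finditer(dmesg_text):
        # setdefault keeps the first occurrence if several oopses are logged.
        result.setdefault(
            match.group("reg"),
            (match.group("sym"), int(match.group("off"), 16)),
        )
    return result
```

Comparing these pairs across machines is a cheap way to check whether two crashes are likely the same bug before reading the full call trace.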

Po-Hsu Lin (cypressyew)
tags: added: 5.19 kinetic
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2000778

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Kinetic):
status: New → Incomplete
Luke Nowakowski-Krijger (lukenow) wrote :

Looks like a similar stack trace was reported here: https://www.spinics.net/lists/kernel/msg4555769.html

Francis Ginther (fginther) wrote :

Still failing on baltar.ppc64el.9 during the 2023.02.27 SRU cycle. kuzzle and scobee (another ARM64 server) passed.

tags: added: sru-20230227
Po-Hsu Lin (cypressyew) wrote :

kuzzle and scobee are actually failing with 64k generic and 64k lowlatency.

The only ARM64 box that can pass with 64k is ARM64 node starmie-kernel. Others like dazzle, kuzzle, helo-kernel, kopter-kernel, scobee-kernel are all failing with 64k. I wonder why we're having this difference.
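
Since the failures cluster on the 64k flavours, and the ppc64el oops above also reports PAGE_SIZE=64K, it may help to record the page size of the running kernel next to each test result. A minimal sketch, assuming that "64k" here refers to the kernel page size; the helper names are hypothetical.

```python
import os


def page_size_bytes():
    """Page size of the running kernel, as reported by sysconf."""
    return os.sysconf("SC_PAGE_SIZE")


def is_64k_pages(size=None):
    """True if the given (or current) page size is 64 KiB."""
    if size is None:
        size = page_size_bytes()
    return size == 64 * 1024
```

Logging `is_64k_pages()` on starmie-kernel and on one of the failing nodes would confirm whether both really run with the same page size.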

Luke Nowakowski-Krijger (lukenow) wrote :

I could not reproduce this on baltar or kopter-kernel just by running the kernel net selftest suite a few weeks ago, and no one else seems to be experiencing similar problems that I can find.

I suspect that some VRF configuration left over from previous tests/runs might be influencing this crash. Still need to do more debugging on it.
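
One way to test the leftover-configuration theory is to list the VRF devices present on the SUT right before pmtu.sh starts. This is a minimal sketch, not something used in the original investigation: it assumes iproute2 with JSON output (`ip -j`) is installed, and both function names are hypothetical.

```python
import json
import subprocess


def vrf_names_from_json(ip_json):
    """Extract interface names from `ip -j link show` JSON output."""
    return [link["ifname"] for link in json.loads(ip_json) if "ifname" in link]


def leftover_vrfs():
    """Names of VRF devices currently present in the default namespace."""
    proc = subprocess.run(
        ["ip", "-j", "link", "show", "type", "vrf"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Treat empty output the same as an empty JSON list.
    return vrf_names_from_json(proc.stdout.strip() or "[]")
```

A non-empty list before the run would support the theory; an empty list on a machine that still crashes would argue against it.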

Po-Hsu Lin (cypressyew) wrote :

This issue seems to be fixed with 5.19.0-40 this cycle (sru-20230320)
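
A test hint could be gated on the kernel release string so that the crash is only expected on builds older than the one reported as good. The sketch below takes 5.19.0-40 from the comment above as the threshold; that value is an observation from this report rather than a confirmed fix version, the helper names are hypothetical, and the comparison is only meaningful within the 5.19 series.

```python
import re

# Assumption: first ABI observed to pass, taken from the comment in this report.
FIXED_ABI = (5, 19, 0, 40)


def parse_release(release):
    """Turn '5.19.0-28-generic' into (5, 19, 0, 28)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_reported_fix(release, fixed=FIXED_ABI):
    """True if the release is at or beyond the ABI reported as passing."""
    return parse_release(release) >= fixed
```

On a SUT the input would typically be `platform.release()` or the output of `uname -r`.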

Po-Hsu Lin (cypressyew) wrote :

Checked with K-5.19, K-lowlatency-5.19, J-hwe-5.19 and J-hwe-lowlatency-5.19. This issue does not exist anymore, therefore I am closing this bug.

Changed in ubuntu-kernel-tests:
status: New → Fix Released
Changed in linux (Ubuntu Kinetic):
status: Incomplete → Fix Released
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Po-Hsu Lin (cypressyew) wrote :

I just noticed that we have this hint for Lunar as well, so I will set the bug status back to Incomplete to verify it.

Changed in ubuntu-kernel-tests:
status: Fix Released → Incomplete
Changed in linux (Ubuntu):
status: Invalid → Incomplete
Po-Hsu Lin (cypressyew) wrote :

I can confirm that this issue does not exist in L / L-lowlatency.
Hints removed.

Changed in ubuntu-kernel-tests:
status: Incomplete → Fix Released
Changed in linux (Ubuntu):
status: Incomplete → Fix Released