IPv6 TCP in reuseport_bpf_cpu from ubuntu_kernel_selftests/net crash P8 node entei (Oops: Exception in kernel mode, sig: 4 [#1])

Bug #1927076 reported by Po-Hsu Lin
This bug affects 2 people
Affects                           Status      Importance  Assigned to  Milestone
The Ubuntu-power-systems project  Incomplete  Undecided   Unassigned
ubuntu-kernel-tests               New         Undecided   Unassigned
linux (Ubuntu)                    Incomplete  Undecided   Unassigned
  Focal                           Confirmed   Undecided   Unassigned
  Hirsute                         Won't Fix   Undecided   Unassigned
  Impish                          Won't Fix   Undecided   Unassigned

Bug Description

It looks like our P8 node "entei" tends to fail the IPv6 TCP test from reuseport_bpf_cpu in ubuntu_kernel_selftests/net on 5.8 kernels:

 # send cpu 119, receive socket 119
 # send cpu 121, receive socket 121
 # send cpu 123, receive socket 123
 # send cpu 125, receive socket 125
 # send cpu 127, receive socket 127
 # ---- IPv6 TCP ----
publish-job-status: using request.json

It failed silently here. This can be reproduced 100% of the time with Groovy 5.8 and Focal 5.8.

This causes ubuntu_kernel_selftests to be interrupted, so the results of the other tests cannot be processed onto our results page.

Please find attached the complete "net" test result from this node with Groovy 5.8.0-52.59.

Adding the kqa-blocker tag as this might need to be manually verified.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :
description: updated
tags: added: 5.8 focal groovy kqa-blocker ppc64el sru-20210412 ubuntu-kernel-selftests
description: updated
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

It looks like this test causes a system reboot, without any suspicious error messages in syslog:

ubuntu@entei:~/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/net$ sudo ./reuseport_bpf_cpu
....
send cpu 125, receive socket 125
send cpu 127, receive socket 127
---- IPv6 TCP ----
packet_write_wait: Connection to 10.245.71.180 port 22: Broken pipe
(system rebooted)

In syslog:
May 5 06:09:30 entei systemd[1]: motd-news.service: Succeeded.
May 5 06:09:30 entei systemd[1]: Finished Message of the Day.
May 5 06:09:31 entei systemd[1]: apt-daily-upgrade.service: Succeeded.
May 5 06:09:31 entei systemd[1]: Finished Daily apt upgrade and clean activities.
May 5 06:14:32 entei PackageKit: daemon quit
May 5 06:14:32 entei systemd[1]: packagekit.service: Succeeded.
May 5 06:14:53 entei ntpd[42145]: kernel reports TIME_ERROR: 0x2041: Clock Unsynchronized
May 5 06:15:12 entei systemd[1]: Started Session 4 of user ubuntu.
^@^@^@ [... run of NUL bytes written across the unclean reboot ...] May 5 06:21:15 entei systemd-sysctl[1479]: Not setting net/ipv4/conf/all/promote_secondaries (explicit setting exists).
May 5 06:21:15 entei systemd-sysctl[1479]: Not setting net/ipv4/conf/default/promote_secondaries (explicit setting exists).
May 5 06:21:15 entei lvm[1468]: /dev/sdc: open failed: No medium found

System rebooted around 06:15:12

This can be reproduced with 5.8.0-50-generic as well.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

OK, it's a combination effect; this issue can be reproduced with the following steps:
1. Run the cpu-hotplug test
   sudo ./autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/cpu-hotplug/cpu-on-off-test.sh
2. Run the reuseport_bpf_cpu test
   sudo ./autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/net/reuseport_bpf_cpu

You may need to run reuseport_bpf_cpu multiple times to trigger this.
But it looks OK if the cpu-hotplug test is not executed first.

[ 287.477797] Oops: Exception in kernel mode, sig: 4 [#1]
[ 287.477841] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[ 287.477990] Modules linked in: binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev input_leds mac_hid ofpart cmdlinepart plx_dma powernv_flash mtd at24 ipmi_powernv uio_pdrv_genirq powernv_rng ipmi_devintf ibmpowernv ipmi_msghandler opal_prd uio vmx_crypto sch_fq_codel ip_tables x_tables autofs4 btrfs blake2b_generic hid_generic raid10 raid456 usbhid uas async_raid6_recov hid async_memcpy async_pq usb_storage async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast drm_vram_helper drm_ttm_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core crct10dif_vpmsum crc32c_vpmsum drm ahci tg3 libahci drm_panel_orientation_quirks xhci_pci xhci_pci_renesas
[ 287.478276] CPU: 0 PID: 3267 Comm: reuseport_bpf_c Not tainted 5.8.0-50-generic #56-Ubuntu
[ 287.478294] NIP: c008000001592094 LR: c000000000ea092c CTR: c008000001592094
[ 287.478313] REGS: c0000007ff6eb510 TRAP: 0e40 Not tainted (5.8.0-50-generic)
[ 287.478330] MSR: 900000000288b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002488 XER: 20000000
[ 287.478356] CFAR: c000000000ea0928 IRQMASK: 0
[ 287.478356] GPR00: c000000000ea0b04 c0000007ff6eb7a0 c0000000020dd900 c000000712caf2e0
[ 287.478356] GPR04: c008000001260038 c008000001260000 c000000712caf2e0 0000000000000028
[ 287.478356] GPR08: 0000000129432812 0000000000000000 c00000077f82bd58 0000000000000000
[ 287.478356] GPR12: c008000001592094 c000000002380000 c000000002003e80 00000000000022b8
[ 287.478356] GPR16: 00000000000049c3 000000000000000a 0000000000000001 0000000000000001
[ 287.478356] GPR20: c00000077f82bd48 0000000000000000 00000000000022b8 0000000000000001
[ 287.478356] GPR24: 0000000000000001 0000000000000000 c008000001260000 0000000000000080
[ 287.478356] GPR28: c000000712caf2e0 0000000000000028 0000000000000028 c008000001260000
[ 287.478628] NIP [c008000001592094] 0xc008000001592094
[ 287.478645] LR [c000000000ea092c] __bpf_prog_run_save_cb+0x5c/0x190
[ 287.478660] Call Trace:
[ 287.478671] [c0000007ff6eb7a0] [c000000000f3f84c] __ip_queue_xmit+0x18c/0x4d0 (unreliable)
[ 287.478691] [c0000007ff6eb810] [c000000000ea0b04] run_bpf_filter+0xa4/0x1f0
[ 287.478709] [c0000007ff6eb870] [c000000000ea0cd0] reuseport_select_sock+0x80/0x170
[ 287.478728] [c0000007ff6eb8b0] [c0000000010838ec] inet6_lhash2_lookup+0x1dc/0x200
[ 287.478748] [c0000007ff6eb930] [c000000001083a7c] inet6_lookup_listener+0x16c/0x180
[ 287.478768] [c0000007ff6eba00] [c00000000105e968] tcp_v6_rcv+0x828/0xf50
[ 287.478785] [c000...

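For context on the trace above (inet6_lookup_listener → reuseport_select_sock → run_bpf_filter → the JITed program at the faulting NIP): reuseport_bpf_cpu builds an SO_REUSEPORT group with one listening socket per CPU and attaches a classic BPF filter that returns the current CPU id, which the group uses to pick the receiving socket. The sketch below shows roughly what that attach step looks like; it is illustrative only and simplified from the selftest (no per-CPU bind/listen loop, minimal error handling):

```
/* Hedged sketch: attach a CPU-selecting classic BPF filter to an
 * SO_REUSEPORT socket, roughly what reuseport_bpf_cpu does for its
 * receive group. The per-CPU bind/listen loop is omitted. */
#include <linux/filter.h>      /* struct sock_filter, SKF_AD_OFF, SKF_AD_CPU */
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>

#ifndef SO_ATTACH_REUSEPORT_CBPF
#define SO_ATTACH_REUSEPORT_CBPF 51   /* asm-generic/socket.h */
#endif

int main(void)
{
	/* cBPF program: A = current CPU id; return A.
	 * The reuseport group uses the return value to index its sockets. */
	struct sock_filter code[] = {
		{ BPF_LD | BPF_W | BPF_ABS, 0, 0, SKF_AD_OFF + SKF_AD_CPU },
		{ BPF_RET | BPF_A, 0, 0, 0 },
	};
	struct sock_fprog prog = {
		.len = sizeof(code) / sizeof(code[0]),
		.filter = code,
	};
	int one = 1;
	int fd = socket(AF_INET6, SOCK_STREAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) ||
	    setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
		       &prog, sizeof(prog))) {
		perror("setsockopt");
		return 1;
	}
	printf("CPU-selecting cBPF filter attached to fd %d\n", fd);
	return 0;
}
```

It is the JITed form of this kind of filter that the traces show crashing via reuseport_select_sock → run_bpf_filter → __bpf_prog_run_save_cb.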

summary: - IPv6 TCP in reuseport_bpf_cpu from ubuntu_kernel_selftests/net tend to
- fail on P8 node entei with 5.8 kernel
+ IPv6 TCP in reuseport_bpf_cpu from ubuntu_kernel_selftests/net crash P8
+ node entei with 5.8 kernel (Oops: Exception in kernel mode, sig: 4 [#1])
summary: IPv6 TCP in reuseport_bpf_cpu from ubuntu_kernel_selftests/net crash P8
- node entei with 5.8 kernel (Oops: Exception in kernel mode, sig: 4 [#1])
+ node entei on 5.8 kernel (Oops: Exception in kernel mode, sig: 4 [#1])
tags: removed: kqa-blocker
Revision history for this message
Po-Hsu Lin (cypressyew) wrote : Re: IPv6 TCP in reuseport_bpf_cpu from ubuntu_kernel_selftests/net crash P8 node entei on 5.8 kernel (Oops: Exception in kernel mode, sig: 4 [#1])

Removing the kqa-blocker tag, as this can be reproduced with the kernel already in -updates.

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1927076

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-192677 severity-high targetmilestone-inin---
Revision history for this message
Andrew Cloke (andrew-cloke) wrote : Re: IPv6 TCP in reuseport_bpf_cpu from ubuntu_kernel_selftests/net crash P8 node entei on 5.8 kernel (Oops: Exception in kernel mode, sig: 4 [#1])

Since the groovy 5.8 kernel is now EOL, can this be reproduced with the 5.11 kernel?

Or can we close this bug out?

Changed in ubuntu-power-systems:
status: New → Incomplete
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Hi Andrew,

I just retested this manually on node entei with the steps in comment #3, and this issue can be reproduced (the system gets rebooted) with a different message on the IPMI console.

[ 417.696448] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[ 417.696522] Faulting instruction address: 0x00000000
[ 417.696677] Oops: Kernel access of bad area, sig: 11 [#1]
[ 417.696693] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[ 417.696715] Modules linked in: binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev input_leds mac_hid ofpart plx_dma cmdlinepart ipmi_powernv powernv_flash ipmi_devintf ibmpowernv at24 vmx_crypto opal_prd ipmi_msghandler powernv_rng mtd uio_pdrv_genirq uio sch_fq_codel ip_tables x_tables autofs4 btrfs blake2b_generic uas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor hid_generic usbhid hid usb_storage async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast drm_vram_helper i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_vpmsum cec crc32c_vpmsum rc_core drm ahci tg3 xhci_pci libahci drm_panel_orientation_quirks xhci_pci_renesas
[ 417.697008] CPU: 0 PID: 3117 Comm: reuseport_bpf_c Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
[ 417.697034] NIP: 0000000000000000 LR: c000000000e77ba8 CTR: 0000000000000000
[ 417.697055] REGS: c0000007ff6e74d0 TRAP: 0400 Not tainted (5.11.0-27-generic)
[ 417.697077] MSR: 9000000040009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022444 XER: 20000000
[ 417.697309] CFAR: c000000000010300 IRQMASK: 0
[ 417.697309] GPR00: c000000000e77b80 c0000007ff6e7770 c000000001e99600 c000000014cecd00
[ 417.697309] GPR04: c008000004230038 c000000014cecd00 0000000000000008 0000000000000001
[ 417.697309] GPR08: 0000000000000001 0000000000000000 c0000000501e9580 0000000000000000
[ 417.697309] GPR12: 0000000000000000 c000000002150000 0000000000000000 0000000000000000
[ 417.697309] GPR16: 0000000000000040 c00000078e09a480 0000000000000001 0000000000000001
[ 417.697309] GPR20: 00000000000022b8 0000000000000000 000000000000cfb3 000000000100007f
[ 417.697309] GPR24: 0000000000000000 0000000000000008 c000000001dba880 c008000004230000
[ 417.697309] GPR28: 0000000000000080 c000000003292000 0000000090dd40dc c000000014cecd00
[ 417.697503] NIP [0000000000000000] 0x0
[ 417.697517] LR [c000000000e77ba8] reuseport_select_sock+0x108/0x3f0
[ 417.697541] Call Trace:
[ 417.697550] [c0000007ff6e7810] [c000000000f64314] udp4_lib_lookup2+0x1a4/0x2b0
[ 417.697576] [c0000007ff6e7890] [c000000000f65928] __udp4_lib_lookup+0x358/0x540
[ 417.697602] [c0000007ff6e79d0] [c000000000f66978] __udp4_lib_rcv+0x608/0xe10
[ 417.697626] [c0000007ff6e7af0] [c000000000f0fa20] ip_protocol_deliver_rcu+0x60/0x2c0
[ 417.697813] [c0000007ff6e7b40] [c000000000f0fcf0] ip_local_deliver_finish+0x70/0x90
[ 417.697838] [c0000007ff6e7b60] [c000000000f0fda0] ip_local_deliver+0x90/0x180
[ 417.697861] [c0000007ff6e7be0] [c000000000f0f140] ip_rcv_finish+0xc0/0xf0
[ 417.697883] [c0000007ff6e7c20] [c000000000f0ffa8] ip_rcv+0x118/0x130
[ 417.697904] [c0000007ff6e7ca0] [c000000000e3a3b4] __netif_receive_skb_one_...


Po-Hsu Lin (cypressyew)
Changed in ubuntu-power-systems:
status: Incomplete → Confirmed
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

This issue can be reproduced on P8 node entei with:
  * F-5.4 (5.4.0-81-generic)
  * F-5.11 (5.11.0-27-generic #29~20.04.1-Ubuntu)
  * H-5.11 (5.11.0-31-generic)

Revision history for this message
Patricia Domingues (patriciasd) wrote :

Po-Hsu Lin,
Thanks for the info.
I was able to reproduce the issue on 2 other Power8 servers, but only by running the `reuseport_bpf_cpu` test more than once (as you mentioned in comment #3).
I've tested this with focal-hwe (Linux thiel 5.11.0-27-generic) and hirsute (5.11.0-31-generic).
steps:
1. Run the cpu-hotplug test; 2. Run the reuseport_bpf_cpu test; 3. re-run the reuseport_bpf_cpu test.

```
thiel login:
[24669.414656] Oops: Exception in kernel mode, sig: 4 [#1]
[24669.414710] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
```

```
gulpin login:
[277274.876010] Oops: Exception in kernel mode, sig: 4 [#1]
[277274.876235] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
```

I've also tested the 2 servers above with the upstream kernel from `https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11.22/`.
I ran the `reuseport_bpf_cpu` test in a 20x loop and did not hit the issue, so I'd say we may have an issue with the Ubuntu kernel.
All 20 runs reported success and the server did not reboot.
```
send cpu 157, receive socket 157
send cpu 159, receive socket 159
SUCCESS
```

Revision history for this message
Krzysztof Kozlowski (krzk) wrote :

Can you repeat the tests with the latest kernels? I could not reproduce it on Focal with the following configurations:
1. Power8: P8LPAR05 MAAS
2. Power9: QEMU with 4 or 128 CPUs (and 4 GB of RAM)

Tested kernels:
F/5.4.0-84-generic
F/5.11.0-34-generic

Tried steps:
1. Freshly boot machine.
2. Log in via ssh.
3. sudo ./autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/cpu-hotplug/cpu-on-off-test.sh

4. for i in `seq 100`; do echo $i ; sleep 2 ; sudo ./autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/net/reuseport_bpf_cpu ; done

Revision history for this message
Patricia Domingues (patriciasd) wrote :
Revision history for this message
Patricia Domingues (patriciasd) wrote :
Revision history for this message
Patricia Domingues (patriciasd) wrote :

OK, I've re-run the test with the latest kernel versions on the same systems:

`thiel` (8001-22C) with focal-hwe (5.11.0-34-generic):
```
[ 3255.763649] Oops: Exception in kernel mode, sig: 5 [#1]
[ 3255.763723] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
```

And

`gulpin` (8335-GTA) with hirsute (5.11.0-34-generic)
2nd run of `reuseport_bpf_cpu`:
```
[ 760.451968] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[ 760.452035] Faulting instruction address: 0x00000000
[ 760.452196] Oops: Kernel access of bad area, sig: 11 [#1]
[ 760.452212] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
```

Revision history for this message
Patricia Domingues (patriciasd) wrote :

Also re-ran on `entei`, which is also a POWER8 (8335-GTA), with the latest Hirsute kernel (5.11.0-34-generic).

Hit the same error on the second run of `reuseport_bpf_cpu`:
```
[ 232.349547] Oops: Exception in kernel mode, sig: 4 [#1]
[ 232.349647] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
...
[ 232.355607] LR [000008d19f3b15a8] 0x8d19f3b15a8
[ 232.355855] --- interrupt: c00
[ 232.355869] Instruction dump:
[ 232.356114] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[ 232.356374] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[ 232.356905] ---[ end trace c99c88cea832039b ]---
[ 232.508560]
[ 233.508662] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 233.950570] Rebooting in 10 seconds..

```

Revision history for this message
Patricia Domingues (patriciasd) wrote :
Revision history for this message
Patricia Domingues (patriciasd) wrote :

Krzysztof, you were on a PowerVM LPAR (P8LPAR05); just let me know if there's anything that needs to be tested.

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, Patricia.

Can you clarify whether you always need to run the hotplug test before reuseport_bpf_cpu in order to reproduce? I wonder what state the hotplug test leaves the system in, and how it is being run. The makefile runs it with the -a option, which on the systems I have available fails to offline the last CPU (which is expected, and different from x86, where cpu0 cannot be offlined). Running it without any options would only offline the last CPU and online it again.

I tried looking for differences between our kernels and 5.11.22 and the only cpuset changes I noticed were already present in the kernel you have just tested, and they were on paths unrelated to hotplug or BPF, so I am still baffled as to the real differences here.

And given that the different systems fail differently, it looks like this will require a dump or xmon so we can debug it.

Thanks for all the help. I may ask for system access next week in order to help there.
Cascardo.

Revision history for this message
Patricia Domingues (patriciasd) wrote :

Cascardo, I was trying to reproduce the issue as Po-Hsu Lin mentioned (#3); the hotplug test leaves the system in the same state (it shows this output):
```
./cpu-hotplug/cpu-on-off-test.sh
pid 21291's current affinity mask: ffffffffffffffffffffffffffffffff
pid 21291's new affinity mask: 1
CPU online/offline summary:
present_cpus = 0-127 present_max = 127
  Cpus in online state: 0-127
  Cpus in offline state: 0
Limited scope test: one hotplug cpu
  (leaves cpu in the original state):
  online to offline to online: cpu 127
```
I was running this way:
```
ubuntu@gulpin:~/ubuntu-hirsute/tools/testing/selftests$
make TARGETS=net
./cpu-hotplug/cpu-on-off-test.sh
./net/reuseport_bpf_cpu
```
Let me know if there's anything else needed.
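For reference, the "Limited scope test" step shown in the output above boils down to taking one CPU offline and bringing it back online through sysfs. A hedged userspace sketch of that step (CPU 127 is taken from the output above; this needs root):

```
/* Hedged sketch: toggle one CPU offline and back online via sysfs,
 * the same state change the limited-scope hotplug test performs. */
#include <stdio.h>

static int set_cpu_online(int cpu, int online)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/online", cpu);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%d\n", online);
	return fclose(f);	/* 0 on success */
}

int main(void)
{
	int cpu = 127;		/* last CPU, as in the output above */

	if (set_cpu_online(cpu, 0))	/* offline */
		return 1;
	if (set_cpu_online(cpu, 1))	/* back online */
		return 1;
	printf("cpu%d: offline -> online cycle done\n", cpu);
	return 0;
}
```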

Revision history for this message
Krzysztof Kozlowski (krzk) wrote :

Thanks, Patricia, for the tests. It looks like this was seen before: lp:1909286. I will mark it as a duplicate.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Hirsute):
status: New → Confirmed
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

This is also failing with focal/linux on the node dryden.

The latest result is from 5.4.0-85.95, which stopped around this test:

04:57:26 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
[...]
04:57:26 DEBUG| [stdout] # ---- IPv6 UDP ----
[...]
04:57:26 DEBUG| [stdout] # send cpu 145, receive socket 145
04:57:26 DEBUG| [stdout] # send cpu 147, receive socket 147
04:57:26 DEBUG| [stdout] # send cpu 149, receive socket 149
04:57:26 DEBUG| [stdout] # send cpu 151, receive socket 151
04:57:26 DEBUG| [stdout] # ---- IPv4 TCP ----

This is from an automated test run, so I don't have access to the kernel logs.

I have found the issue in all the regression test runs on this system, going back to the oldest results we still have (5.4.0-76.85).

Changed in linux (Ubuntu Focal):
status: New → Confirmed
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

I was looking at the latest changes to the powerpc64 BPF JIT since 5.11 and found the following commit:

commit 20ccb004bad659c186f9091015a956da220d615d
Author: Naveen N. Rao <email address hidden>
Date: Wed Jun 9 14:30:24 2021 +0530

    powerpc/bpf: Use bctrl for making function calls

    blrl corrupts the link stack. Instead use bctrl when making function
    calls from BPF programs.

    Reported-by: Anton Blanchard <email address hidden>
    Signed-off-by: Naveen N. Rao <email address hidden>
    Signed-off-by: Michael Ellerman <email address hidden>
    Link: https://<email address hidden>

Though the link stack is unarchitected (that is, it should be transparent to the user aside from branch prediction performance), perhaps there is a bug in the implementation. Considering we have only observed this on POWER8, and with different stack traces, I wouldn't discard the possibility.

As this is not present on 5.13 either, I am building a test kernel with a backport so it can be tested.

Cascardo.

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :
Revision history for this message
Krzysztof Kozlowski (krzk) wrote :

Thadeu, it is present on v5.13 (tested v5.13.17).

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Krzysztof mentioned that this has been found on 5.14 as well. Using a system he lent me (huggins), I also tested with the commit that changed the call to use CTR, and it failed as well. But it always failed when __bpf_prog_run_save_cb was calling the JITed bpf_func, and CTR always matched NIP (though in that case, it is the CTR from __bpf_prog_run_save_cb, not the JITed code). Sometimes it was NULL (all zeroes), sometimes it looked like a legit kernel address, and I got one 0xfe800000fe80000000 (or something like it), which looks like some corruption of the bpf_prog.

Also, I noticed it doesn't always happen on CPU 0, which would be odd on its own, though CPU 0 seems more likely. And either it's very hard to reproduce without doing the CPU hotplug, or the hotplug really is necessary; I left the program running in a loop for a long time without it and did not have any luck.

I also changed it to an eBPF program instead of cBPF, but still of the socket filter type, and used get_smp_processor_id instead of raw_processor_id (though I recall these being the same on ppc64el), and it still reproduced. When I returned a constant instead of doing the call, it also reproduced. No wonder: when it fails, the program never runs. So the way those programs are compiled makes no difference (a hedged sketch of such an eBPF variant follows below).

Cascardo.
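The eBPF variant mentioned above (still a socket filter, just returning the current CPU id from a helper call) can be built with the raw bpf() syscall along these lines. This is an illustrative, hedged sketch, not the exact test change; the two-instruction program and SO_ATTACH_REUSEPORT_EBPF are standard, but the surrounding setup is simplified:

```
/* Hedged sketch: load a two-instruction eBPF socket filter that returns
 * bpf_get_smp_processor_id() and attach it to an SO_REUSEPORT socket. */
#include <linux/bpf.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef SO_ATTACH_REUSEPORT_EBPF
#define SO_ATTACH_REUSEPORT_EBPF 52   /* asm-generic/socket.h */
#endif

int main(void)
{
	/* r0 = bpf_get_smp_processor_id(); exit; */
	struct bpf_insn prog[] = {
		{ .code = BPF_JMP | BPF_CALL,
		  .imm  = BPF_FUNC_get_smp_processor_id },
		{ .code = BPF_JMP | BPF_EXIT },
	};
	union bpf_attr attr;
	int one = 1, bpf_fd, sk;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr.insn_cnt  = sizeof(prog) / sizeof(prog[0]);
	attr.insns     = (uintptr_t)prog;
	attr.license   = (uintptr_t)"GPL";

	bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
	if (bpf_fd < 0) {
		perror("BPF_PROG_LOAD");
		return 1;
	}

	sk = socket(AF_INET6, SOCK_STREAM, 0);
	if (sk < 0 ||
	    setsockopt(sk, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) ||
	    setsockopt(sk, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
		       &bpf_fd, sizeof(bpf_fd))) {
		perror("attach");
		return 1;
	}
	printf("eBPF CPU-id filter attached\n");
	return 0;
}
```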

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote (last edit ):

I tested that when reuseport_bpf_cpu did not consider the last CPU (the one that had been hotplugged), it didn't crash. It didn't set affinity to that CPU and didn't even allocate a socket for it.

Then I realized that attaching the BPF code was happening on that CPU, as it happened right after the tests were run and the last test had set the CPU affinity to that CPU. So I set the affinity to CPU 0 right before attaching the BPF code. So far, the system did not crash either.

Cascardo.

Scratch that. It looks like the system was not reproducing the issue until I rebooted and tested it again. The last test didn't work out; that is, even setting affinity to a different CPU before attaching the BPF program resulted in a crash.
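The affinity experiment described above (pin the process to CPU 0 before attaching the BPF program, so the attach does not run on the freshly re-onlined CPU) amounts to roughly this hedged sketch:

```
/* Hedged sketch: pin the calling thread to CPU 0 before doing the
 * SO_ATTACH_REUSEPORT_{C,E}BPF call, as in the experiment above. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static int pin_to_cpu0(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);
	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}

int main(void)
{
	if (pin_to_cpu0())
		return 1;
	/* ... socket setup and BPF attach (see the earlier sketches) ... */
	printf("pinned to CPU 0\n");
	return 0;
}
```

As noted above, this did not end up avoiding the crash.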

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

I will mark the other 2 bugs as duplicates of this one, since Thadeu has provided some further investigation here.

Revision history for this message
Krzysztof Kozlowski (krzk) wrote (last edit ):

Since this is the leading bug, I will also copy & paste here the list of environments where this reproduces, taken from the other bug:

Also reproduced on (huggins, POWER8NVL, 8335-GTB):
* 5.11.0-20-generic mainline (v5.11.22).
* 5.13.17-051317-generic mainline fails even on first run of reuseport_bpf_cpu test.
* 5.14.4-051404-generic mainline after 4 tries of the test.

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

9f:mon> di c008000013566000 1000
c008000013566000 7fe00008 trap
 ...
c008000013566eb8 60000000 nop
 ...
c008000013566ec0 7c0802a6 mflr r0
c008000013566ec4 f8010010 std r0,16(r1)
c008000013566ec8 f821ffa1 stdu r1,-96(r1)
c008000013566ecc 3d80c000 lis r12,-16384
c008000013566ed0 798c07c6 rldicr r12,r12,32,31
c008000013566ed4 658c0036 oris r12,r12,54
c008000013566ed8 618c51e0 ori r12,r12,20960
c008000013566edc 7d8903a6 mtctr r12
c008000013566ee0 4e800421 bctrl
c008000013566ee4 7c681b78 mr r8,r3
c008000013566ee8 38210060 addi r1,r1,96
c008000013566eec e8010010 ld r0,16(r1)
c008000013566ef0 7c0803a6 mtlr r0
c008000013566ef4 7d034378 mr r3,r8
c008000013566ef8 4e800020 blr
c008000013566efc 7fe00008 trap
 ...
9f:mon> r
R00 = c0000000000173d8 R16 = c000007fe13e8cb0
R01 = c0000040074efda0 R17 = c000007f8f6a0000
R02 = c0000000022d9900 R18 = c000007f8f6a0080
R03 = c0000040074efbd8 R19 = c000007f8f6a0080
R04 = 0000000000200000 R20 = c0000000012ff767
R05 = 0000000000030000 R21 = c000007f8f6a0080
R06 = 0000000000020000 R22 = c00000000136cc78
R07 = 0000000000517782 R23 = 0000000000000001
R08 = 0000001ebf7e3a55 R24 = 000000000000009f
R09 = 0000000000000000 R25 = 0000000000000e60
R10 = 0000000000000001 R26 = 0000000000000900
R11 = 0000000000000f8e R27 = 0000000000000500
R12 = 0000000000004400 R28 = 0000000000000a00
R13 = c000007fff6f4f80 R29 = 0000000000000f00
R14 = 0000000000000000 R30 = 0000000000000002
R15 = c0000000012f6020 R31 = 0000000000000003
pc = c000000000017038 replay_soft_interrupts+0x68/0x2e0
cfar= 0000000000000000
lr = c0000000000173d8 arch_local_irq_restore+0x128/0x160
msr = 9000000000001033 cr = 24004428
ctr = c000000000042468 xer = 0000000020000000 trap = 500
9f:mon> c0
[link register ] c000000000f36d4c __bpf_prog_run_save_cb+0x5c/0x190
[c000003ffffa3780] c000000000fdf76c __ip_finish_output+0x8c/0x140 (unreliable)
[c000003ffffa37f0] c000000000f36f2c run_bpf_filter+0xac/0x200
[c000003ffffa3850] c000000000f37104 reuseport_select_sock+0x84/0x170
[c000003ffffa3890] c00000000112c1f8 inet6_lhash2_lookup+0x1c8/0x200
[c000003ffffa3910] c00000000112c48c inet6_lookup_listener+0x25c/0x280
[c000003ffffa3a00] c000000001105e58 tcp_v6_rcv+0x7b8/0xf50
[c000003ffffa3b50] c0000000010b79c0 ip6_protocol_deliver_rcu+0x110/0x630
[c000003ffffa3bc0] c0000000010b803c ip6_input+0x10c/0x130
[c000003ffffa3c40] c0000000010b76c4 ipv6_rcv+0x194/0x1c0
[c000003ffffa3cc0] c000000000ef68f4 __netif_receive_skb_one_core+0x74/0xb0
[c000003ffffa3d10] c000000000ef6d68 process_backlog+0x138/0x280
[c000003ffffa3d80] c000000000ef7e00 napi_poll+0x100/0x3c0
[c000003ffffa3e10] c000000000ef81b4 net_rx_action+0xf4/0x2d0
[c000003ffffa3ea0] c0000000011815f0 __do_softirq+0x150/0x428
[c000003ffffa3f90] c00000000002caec call_do_softirq+0x14/0x24
[c000000008fcf630] c000000000017448 do_softirq_own_stack+0x38/0x50
[c000000008fcf650] c0000000001640a0 do_softirq+0xa0/0xb0
[c000000008fcf680] c0000000001641a8 __local_bh_enable_ip+0xf8/0x120
[c000000008fcf6a0] c0000000010b1a98 ip6_finish_output2+0x248/0x7c0
[c...


Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

About this latest comment: CPU #0 crashed at pc = c008000013566eb8, its CTR and r12 match, same as usual, and it was called by __bpf_prog_run_save_cb as the BPF JITed program. Dumping the program from CPU #0's perspective, it has traps at that address.

It turns out the JIT fills a whole page with traps and puts the JITed BPF program at a random offset within that page (see kernel/bpf/core.c:bpf_jit_binary_alloc).

When we go to the hotplugged CPU, however, CPU #9f (159), that same page looks different, with the code placed where it was expected.

Still, it looks like fp->aux->jit_data is NULL on both CPUs, which is not as expected.

I am wondering whether either the icache is not being flushed properly or RCU is not operating correctly. As other issues are not seen, it is more likely something related to the icache. But I don't see any IPIs involved when flushing the icache, so is it possibly firmware or micro-architecture related?

Cascardo.
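To make the layout described above concrete: the allocation the JIT hands back is pre-filled with trap instructions and the image sits at a randomized offset inside it, so a CPU that still sees a stale or inconsistent view of that memory fetches traps or garbage instead of the program, which would be consistent with the mix of sig: 4, sig: 5 and NULL-instruction-fetch oopses in this bug. The toy below is a userspace illustration only, not kernel code; the trap encoding and the sample instructions are copied from the xmon dump above, and the 64K size matches PAGE_SIZE=64K from the oops headers:

```
/* Userspace illustration of the allocation layout described above --
 * not kernel code. A 64K buffer is filled with the PPC "trap" encoding
 * and a small image is copied to a randomized, aligned offset; any
 * stale view of the buffer decodes as traps rather than the image. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SZ   (64 * 1024)	/* PAGE_SIZE=64K per the oops headers */
#define PPC_TRAP 0x7fe00008u	/* "trap" opcode seen in the xmon dump */

int main(void)
{
	/* A few real instructions from the dump above:
	 * mflr r0; std r0,16(r1); stdu r1,-96(r1); blr */
	uint32_t image[] = { 0x7c0802a6, 0xf8010010, 0xf821ffa1, 0x4e800020 };
	uint32_t *buf = malloc(BUF_SZ);
	size_t i, off;

	if (!buf)
		return 1;
	for (i = 0; i < BUF_SZ / sizeof(uint32_t); i++)
		buf[i] = PPC_TRAP;	/* pre-fill with traps, as observed in the dump */

	srand(1);			/* fixed seed, demo only */
	off = ((size_t)rand() % (BUF_SZ - sizeof(image))) & ~63UL;
	memcpy((char *)buf + off, image, sizeof(image));

	printf("image placed at offset 0x%zx; everything else decodes as trap\n", off);
	free(buf);
	return 0;
}
```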

Revision history for this message
Krzysztof Kozlowski (krzk) wrote :
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Sent a request upstream:

https://lore.kernel.org/linuxppc-dev/YUpIqytZqpohq4EM@mussarela/T/#u

I will ping some folks for some help there.

Cascardo.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Hey Krzysztof and Thadeu,
Thanks for the follow-up and the info!

bugproxy (bugproxy)
tags: added: bugnameltc-194783 severity-medium
removed: bugnameltc-192677 severity-high
Po-Hsu Lin (cypressyew)
summary: IPv6 TCP in reuseport_bpf_cpu from ubuntu_kernel_selftests/net crash P8
- node entei on 5.8 kernel (Oops: Exception in kernel mode, sig: 4 [#1])
+ node entei (Oops: Exception in kernel mode, sig: 4 [#1])
bugproxy (bugproxy)
tags: added: bugnameltc-192677 severity-high
removed: bugnameltc-194783 severity-medium
Revision history for this message
Daniel Axtens (daxtens) wrote (last edit ):

I can repro this with the latest Focal kernel (5.4.0-90) on:

    description: PowerNV
    product: 8247-22L (IBM Power System S822L)

Trying to see if I can repro it upstream.

FWIW my opening hypothesis is that something in a percpu data structure isn't getting updated over hotplug.

Revision history for this message
Daniel Axtens (daxtens) wrote :

I can repro on upstream, all the way back to 5.4.0. It might have existed before that - I haven't tested any earlier yet.

Was the test methodology changed just before this was found? I'm just wondering why it suddenly appeared ~a year after Focal was released. I thought it might have been a patch picked up for an SRU, but it's looking like the problem predates Focal by some way...

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

I can repro this with the latest Focal kernel on:

Revision history for this message
Daniel Axtens (daxtens) wrote :

I've made some good progress here.

I found that older versions like 4.19 work, so I ran git bisect. I'm still doing the final check, but it looks like the series that causes the issue is the one containing these:

d53d2f78cead bpf: Use vmalloc special flag
1a7b7d922081 modules: Use vmalloc special flag
868b104d7379 mm/vmalloc: Add flag for freeing of special permsissions

In particular:

commit 868b104d7379e28013e9d48bdd2db25e0bdcf751 (HEAD)
Author: Rick Edgecombe <email address hidden>
Date: Thu Apr 25 17:11:36 2019 -0700

    mm/vmalloc: Add flag for freeing of special permsissions

    Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to
    immediately clear executable TLB entries before freeing pages, and handle
    resetting permissions on the directmap. This flag is useful for any kind
    of memory with elevated permissions, or where there can be related
    permissions changes on the directmap. Today this is RO+X and RO memory.

    Although this enables directly vfreeing non-writeable memory now,
    non-writable memory cannot be freed in an interrupt because the allocation
    itself is used as a node on deferred free list. So when RO memory needs to
    be freed in an interrupt the code doing the vfree needs to have its own
    work queue, as was the case before the deferred vfree list was added to
    vmalloc.

    For architectures with set_direct_map_ implementations this whole operation
    can be done with one TLB flush when centralized like this. For others with
    directmap permissions, currently only arm64, a backup method using
    set_memory functions is used to reset the directmap. When arm64 adds
    set_direct_map_ functions, this backup can be removed.

    When the TLB is flushed to both remove TLB entries for the vmalloc range
    mapping and the direct map permissions, the lazy purge operation could be
    done to try to save a TLB flush later. However today vm_unmap_aliases
    could flush a TLB range that does not include the directmap. So a helper
    is added with extra parameters that can allow both the vmalloc address and
    the direct mapping to be flushed during this operation. The behavior of the
    normal vm_unmap_aliases function is unchanged.

and

commit d53d2f78ceadba081fc7785570798c3c8d50a718
Author: Rick Edgecombe <email address hidden>
Date: Thu Apr 25 17:11:38 2019 -0700

    bpf: Use vmalloc special flag

    Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special
    permissioned memory in vmalloc and remove places where memory was set RW
    before freeing which is no longer needed. Don't track if the memory is RO
    anymore because it is now tracked in vmalloc.

This is _extremely_ deep in "subtly breaks under the hash MMU" territory.

Hopefully this is enough to get some Power MMU experts to weigh in. I will keep working on it.

Revision history for this message
Brian Murray (brian-murray) wrote :

The Hirsute Hippo has reached End of Life, so this bug will not be fixed for that release.

Changed in linux (Ubuntu Hirsute):
status: Confirmed → Won't Fix
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Marking as "incomplete" while waiting for input from Power MMU experts.

Changed in ubuntu-power-systems:
status: Confirmed → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-02-22 09:29 EDT-------
(In reply to comment #32)
> Marking as "incomplete" while waiting for input from Power MMU experts.

Adding a couple of developers from MM team to review...

Po-Hsu Lin (cypressyew)
tags: added: 5.13 impish
bugproxy (bugproxy)
tags: added: bugnameltc-194783 severity-medium
removed: bugnameltc-192677 severity-high
Po-Hsu Lin (cypressyew)
tags: added: sru-20220711
bugproxy (bugproxy)
tags: added: bugnameltc-192677 severity-high
removed: bugnameltc-194783 severity-medium
bugproxy (bugproxy)
tags: added: bugnameltc-194783 severity-medium
removed: bugnameltc-192677 severity-high
Revision history for this message
Frank Heimes (fheimes) wrote :

Meanwhile Impish has reached its end of life.
Starting with Ubuntu 22.04 LTS, POWER9 and POWER10 processors are supported,
and support for POWER8 ended with Ubuntu 21.10, respectively Ubuntu 20.04 LTS
(https://ubuntu.com/download/server/power).
As this bug is limited to P8,
I'm going to set the 'affects Impish' entry to 'Won't Fix'.

Changed in linux (Ubuntu Impish):
status: New → Won't Fix
bugproxy (bugproxy)
tags: added: bugnameltc-192677 severity-high
removed: bugnameltc-194783 severity-medium
