mptcp BUG 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr

Bug #2101120 reported by Krister Johansen
This bug affects 2 people
Affects           Status         Importance   Assigned to             Milestone
linux (Ubuntu)    Invalid        Undecided    Canonical Kernel Team
Noble             Fix Released   Undecided    Unassigned
Oracular          Fix Released   Undecided    Unassigned

Bug Description

[Impact]

If mptcp endpoints are configured on a host using an address that is external to the host, then the kernel will create an implicit endpoint with the host's local address when mptcp receives its first flow. If multiple packets for these local interfaces arrive in parallel, more than one caller may end up in mptcp_pm_nl_append_new_local_addr because none of them found the address in local_addr_list during their call to mptcp_pm_nl_get_local_id. In this case, each concurrent mptcp_pm_nl_append_new_local_addr call may delete the address entry created by the previous caller. These deletes use synchronize_rcu, which may sleep and is therefore not permitted in some of the contexts where this function may be called. During packet receive, the caller may be in an RCU read-side critical section and have preemption disabled.

This can lead to a BUG / panic because synchronize_rcu is called in softirq context.
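
The interleaving described above can be replayed deterministically in userspace. The following is only a rough sketch under stated assumptions, not kernel code: every name in it (model_entry, model_pernet, model_lookup, model_append_replace, model_race_blocking_waits) is hypothetical, the address is an arbitrary example value, and a counter stands in for the point where the kernel would call synchronize_rcu. Every entry in the model is treated as an implicit endpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_entry {
	unsigned int addr;          /* stand-in for the local address */
	struct model_entry *next;
};

struct model_pernet {
	struct model_entry *head;   /* models local_addr_list */
	int blocking_waits;         /* models calls to synchronize_rcu */
};

/* Models the lookup each caller performs before deciding to append. */
static bool model_lookup(const struct model_pernet *p, unsigned int addr)
{
	for (const struct model_entry *e = p->head; e; e = e->next)
		if (e->addr == addr)
			return true;
	return false;
}

/* Models the pre-fix append: an existing implicit duplicate is unlinked and
 * freed, which in the kernel needs a blocking grace-period wait first. */
static void model_append_replace(struct model_pernet *p, unsigned int addr)
{
	struct model_entry *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->addr = addr;

	for (struct model_entry **pp = &p->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->addr == addr) {
			struct model_entry *old = *pp;

			*pp = old->next;
			p->blocking_waits++;   /* the kernel would sleep here */
			free(old);
			break;
		}
	}
	n->next = p->head;
	p->head = n;
}

/* Replays the interleaving: both callers miss in lookup, then both append. */
int model_race_blocking_waits(void)
{
	struct model_pernet p = { NULL, 0 };
	const unsigned int addr = 0x0a000001u;  /* arbitrary example address */
	bool a_missed = !model_lookup(&p, addr);
	bool b_missed = !model_lookup(&p, addr);

	if (a_missed)
		model_append_replace(&p, addr);
	if (b_missed)
		model_append_replace(&p, addr);

	/* Release whatever is left on the list. */
	while (p.head) {
		struct model_entry *e = p.head;

		p.head = e->next;
		free(e);
	}
	return p.blocking_waits;
}
```

In this model the second caller ends up replacing the first caller's entry, so model_race_blocking_waits() reports one blocking wait. In the kernel that wait would happen while the caller cannot sleep.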

An example stack:

   BUG: scheduling while atomic: swapper/2/0/0x00000302

   Call Trace:
   <IRQ>
   dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
   dump_stack (lib/dump_stack.c:124)
   __schedule_bug (kernel/sched/core.c:5943)
   schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
   __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
   schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
   schedule_timeout (kernel/time/timer.c:2160)
   wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
   __wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
   synchronize_rcu (kernel/rcu/tree.c:3609)
   mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
   mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
   mptcp_pm_get_local_id (net/mptcp/pm.c:420)
   subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
   subflow_v4_route_req (net/mptcp/subflow.c:305)
   tcp_conn_request (net/ipv4/tcp_input.c:7216)
   subflow_v4_conn_request (net/mptcp/subflow.c:651)
   tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
   tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
   tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
   ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
   ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
   ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
   ip_sublist_rcv (net/ipv4/ip_input.c:640)
   ip_list_rcv (net/ipv4/ip_input.c:675)
   __netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
   netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
   napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114)
   igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
   __napi_poll (net/core/dev.c:6582)
   net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
   handle_softirqs (kernel/softirq.c:553)
   __irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636)
   irq_exit_rcu (kernel/softirq.c:651)
   common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
   </IRQ>

[Backport]

Cherry-pick the following patch from upstream:

022bfe24aad8 ("mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr")

This patch fixes the problem by skipping the replacement operation in mptcp_pm_nl_append_new_local_addr when the new implicit endpoint duplicates an existing one. Instead of the last implicit endpoint replacing the previous one, the new entry is discarded before it is ever inserted in local_addr_list, so no synchronize_rcu is needed, and the old copy is kept. This mode is only selected in mptcp_pm_nl_get_local_id.
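
As an illustration only, the behavior described above can be sketched in userspace C. This is not the upstream patch: the names (sketch_entry, sketch_list, sketch_append, sketch_free), the replace parameter as written here, the id allocation, and the -1 return for a non-implicit duplicate are all assumptions made for the model, and a counter again stands in for synchronize_rcu. The real function has further conditions that this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; these are not the kernel's types. */
struct sketch_entry {
	unsigned int addr;   /* stand-in for the local address */
	int id;              /* stand-in for the endpoint id */
	bool implicit;       /* true for kernel-created implicit endpoints */
	struct sketch_entry *next;
};

struct sketch_list {
	struct sketch_entry *head;  /* models local_addr_list */
	int next_id;                /* last id handed out by the model */
	int grace_waits;            /* models calls to synchronize_rcu */
};

/* Returns the id in use for addr, or -1 if a non-implicit entry exists.
 * replace == false: keep the existing entry, discard the new one, no wait.
 * replace == true:  swap out an implicit duplicate, costing one wait. */
int sketch_append(struct sketch_list *l, unsigned int addr,
		  bool implicit, bool replace)
{
	struct sketch_entry *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->addr = addr;
	n->implicit = implicit;
	n->id = 0;
	n->next = NULL;

	for (struct sketch_entry **pp = &l->head; *pp; pp = &(*pp)->next) {
		struct sketch_entry *cur = *pp;

		if (cur->addr != addr)
			continue;
		if (!cur->implicit) {
			free(n);
			return -1;        /* duplicate of a user endpoint */
		}
		if (!replace) {
			free(n);          /* never inserted, so no wait */
			return cur->id;
		}
		n->id = cur->id;          /* the replacement inherits the id */
		*pp = cur->next;
		l->grace_waits++;         /* the kernel would sleep here */
		free(cur);
		break;
	}
	if (n->id == 0)
		n->id = ++l->next_id;
	n->next = l->head;
	l->head = n;
	return n->id;
}

/* Releases every entry still on the list. */
void sketch_free(struct sketch_list *l)
{
	while (l->head) {
		struct sketch_entry *e = l->head;

		l->head = e->next;
		free(e);
	}
}
```

In this model, two back-to-back calls with replace set to false for the same address return the same id and leave grace_waits at zero, which corresponds to the packet receive path. A later call with replace set to true, standing in for a user adding an endpoint, still replaces the implicit entry and counts one wait.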

[Test]

This patch has passed the upstream mptcp test suites and has also been tested against the reproducer that triggered the panic (adding and removing mptcp endpoints with an external address that differs from the internal address). Prior to this patch, the problem would trigger in less than a minute. With this patch applied, the test has run for hours without incident.

[Potential Regression]

The regression potential is low since the behavior change is small. Implicit endpoints still get created and deleted, but they are only replaced when a user adds an endpoint with the same local address as an existing implicit address. No replacements via mptcp_pm_nl_get_local_id will occur anymore.

Revision history for this message
Krister Johansen (kmjohansen) wrote :

I have a patch for this accepted upstream that I'll send to the Ubuntu kernel team in short order. This has been merged to Linus's tree but has yet to be picked up by Stable. It's tagged to go there, it just hasn't been picked up by the robots yet. It affects all releases from 5.17 onward, which should put it in scope for Noble, Oracular, and Plucky.

description: updated
Krister Johansen (kmjohansen) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Tim Whisonant (tswhison)
Changed in linux (Ubuntu):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Oracular):
status: New → Fix Committed
Changed in linux (Ubuntu Noble):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.11.0-26.26 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-oracular-linux' to 'verification-done-oracular-linux'. If the problem still exists, change the tag 'verification-needed-oracular-linux' to 'verification-failed-oracular-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-oracular-linux-v2 verification-needed-oracular-linux
Juerg Haefliger (juergh)
tags: added: kernel-daily-bug
Krister Johansen (kmjohansen) wrote :

I have tested this in proposed and validated that it fixes the reported bug.

tags: added: verification-done-oracular-linux
removed: verification-needed-oracular-linux
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.8.0-60.63 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux' to 'verification-done-noble-linux'. If the problem still exists, change the tag 'verification-needed-noble-linux' to 'verification-failed-noble-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-v2 verification-needed-noble-linux
Krister Johansen (kmjohansen) wrote :

I have tested the noble kernel in -proposed and validated that it fixes this bug.

tags: added: verification-done-noble-linux
removed: verification-needed-noble-linux
Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 6.8.0-60.63

---------------
linux (6.8.0-60.63) noble; urgency=medium

  * noble/linux: 6.8.0-60.63 -proposed tracker (LP: #2107138)

  * Packaging resync (LP: #1786013)
    - [Packaging] debian.master/dkms-versions -- update from kernel-versions
      (main/2025.04.14)

  * Missing upstream commits for LP: #2102181 (LP: #2107336)
    - libperf cpumap: Add any, empty and min helpers
    - libperf cpumap: Ensure empty cpumap is NULL from alloc

  * Noble update: upstream stable patchset 2025-04-10 (LP: #2106770)
    - memblock: use numa_valid_node() helper to check for invalid node ID
    - jbd2: increase IO priority for writing revoke records
    - jbd2: flush filesystem device before updating tail sequence
    - dm array: fix unreleased btree blocks on closing a faulty array cursor
    - dm array: fix cursor index when skipping across block boundaries
    - exfat: fix the infinite loop in __exfat_free_cluster()
    - erofs: fix PSI memstall accounting
    - ASoC: rt722: add delay time to wait for the calibration procedure
    - ASoC: mediatek: disable buffer pre-allocation
    - selftests/alsa: Fix circular dependency involving global-timer
    - ieee802154: ca8210: Add missing check for kfifo_alloc() in ca8210_probe()
    - net: 802: LLC+SNAP OID:PID lookup on start of skb data
    - tcp/dccp: complete lockless accesses to sk->sk_max_ack_backlog
    - tcp/dccp: allow a connection when sk_max_ack_backlog is zero
    - net: libwx: fix firmware mailbox abnormal return
    - pds_core: limit loop over fw name list
    - bnxt_en: Fix possible memory leak when hwrm_req_replace fails
    - cxgb4: Avoid removal of uninserted tid
    - ice: fix incorrect PHY settings for 100 GB/s
    - igc: return early when failing to read EECD register
    - tls: Fix tls_sw_sendmsg error handling
    - eth: gve: use appropriate helper to set xdp_features
    - Bluetooth: hci_sync: Fix not setting Random Address when required
    - Bluetooth: MGMT: Fix Add Device to responding before completing
    - Bluetooth: btnxpuart: Fix driver sending truncated data
    - tcp: Annotate data-race around sk->sk_mark in tcp_v4_send_reset
    - riscv: Fix early ftrace nop patching
    - memblock tests: fix implicit declaration of function 'numa_valid_node'
    - iio: imu: inv_icm42600: fix timestamps after suspend if sensor is on
    - netfilter: nf_tables: imbalance in flowtable binding
    - drm/mediatek: stop selecting foreign drivers
    - [Config] updateconfigs for MTK_SMI
    - drm/mediatek: Fix YCbCr422 color format issue for DP
    - drm/mediatek: Fix mode valid issue for dp
    - drm/mediatek: Add return value check when reading DPCD
    - cpuidle: riscv-sbi: fix device node release in early exit of
      for_each_possible_cpu
    - scsi: ufs: qcom: Power off the PHY if it was already powered on in
      ufs_qcom_power_up_sequence()
    - dm-ebs: don't set the flag DM_TARGET_PASSES_INTEGRITY
    - ksmbd: Implement new SMB3 POSIX type
    - thermal: of: fix OF node leak in of_thermal_zone_find()
    - smb: client: sync the root session and superblock context passwords before
      automounting
    - ACPI: resource: Add TongFang GM...

Changed in linux (Ubuntu Noble):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 6.11.0-26.26

---------------
linux (6.11.0-26.26) oracular; urgency=medium

  * oracular/linux: 6.11.0-26.26 -proposed tracker (LP: #2107166)

  * Packaging resync (LP: #1786013)
    - [Packaging] debian.master/dkms-versions -- update from kernel-versions
      (main/2025.04.14)

  * drm/xe: prevent potential UAF in pf_provision_vf_ggtt() (LP: #2106652)
    - drm/xe: prevent potential UAF in pf_provision_vf_ggtt()

  * Oracular update: upstream stable patchset 2025-04-09 (LP: #2106703)
    - IB/mlx5: Set and get correct qp_num for a DCT QP
    - RDMA/mana_ib: Allocate PAGE aligned doorbell index
    - scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out()
    - ovl: fix UAF in ovl_dentry_update_reval by moving dput() in ovl_link_up
    - SUNRPC: convert RPC_TASK_* constants to enum
    - SUNRPC: Prevent looping due to rpc_signal_task() races
    - SUNRPC: Handle -ETIMEDOUT return from tlshd
    - RDMA/mlx5: Fix AH static rate parsing
    - scsi: core: Clear driver private data when retrying request
    - RDMA/mlx5: Fix bind QP error cleanup flow
    - sunrpc: suppress warnings for unused procfs functions
    - ALSA: usb-audio: Avoid dropping MIDI events at closing multiple ports
    - Bluetooth: L2CAP: Fix L2CAP_ECRED_CONN_RSP response
    - rxrpc: rxperf: Fix missing decoding of terminal magic cookie
    - afs: Fix the server_list to unuse a displaced server rather than putting it
    - net: loopback: Avoid sending IP packets without an Ethernet header
    - net: set the minimum for net_hotdata.netdev_budget_usecs
    - ipv4: icmp: Pass full DS field to ip_route_input()
    - ipv4: icmp: Unmask upper DSCP bits in icmp_route_lookup()
    - ipvlan: Unmask upper DSCP bits in ipvlan_process_v4_outbound()
    - ipv4: Convert icmp_route_lookup() to dscp_t.
    - ipv4: Convert ip_route_input() to dscp_t.
    - ipvlan: Prepare ipvlan_process_v4_outbound() to future .flowi4_tos
      conversion.
    - ipvlan: ensure network headers are in skb linear part
    - net: cadence: macb: Synchronize stats calculations
    - ASoC: es8328: fix route from DAC to output
    - ipvs: Always clear ipvs_property flag in skb_scrub_packet()
    - firmware: cs_dsp: Remove async regmap writes
    - ALSA: hda/realtek: Fix wrong mic setup for ASUS VivoBook 15
    - ice: add E830 HW VF mailbox message limit support
    - ice: Fix deinitializing VF in error path
    - tcp: Defer ts_recent changes until req is owned
    - net: Clear old fragment checksum value in napi_reuse_skb
    - net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination.
    - net/mlx5: IRQ, Fix null string in debug print
    - net: ipv6: fix dst ref loop on input in seg6 lwt
    - net: ipv6: fix dst ref loop on input in rpl lwt
    - net: ti: icss-iep: Reject perout generation request
    - perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list
    - uprobes: Reject the shared zeropage in uprobe_write_opcode()
    - io_uring/net: save msg_control for compat
    - x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems
    - phy: rockchip: naneng-combphy: compatible reset with old DT
    - riscv: KVM: Fix har...

Changed in linux (Ubuntu Oracular):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-ibm-gt-tdx/6.8.0-1027.28+tdx1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-ibm-gt-tdx' to 'verification-done-noble-linux-ibm-gt-tdx'. If the problem still exists, change the tag 'verification-needed-noble-linux-ibm-gt-tdx' to 'verification-failed-noble-linux-ibm-gt-tdx'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-ibm-gt-tdx-v2 verification-needed-noble-linux-ibm-gt-tdx
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-6.11/6.11.0-1010.10 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-nvidia-6.11' to 'verification-done-noble-linux-nvidia-6.11'. If the problem still exists, change the tag 'verification-needed-noble-linux-nvidia-6.11' to 'verification-failed-noble-linux-nvidia-6.11'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-nvidia-6.11-v2 verification-needed-noble-linux-nvidia-6.11
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-intel/6.11.0-1010.10 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-oracular-linux-intel' to 'verification-done-oracular-linux-intel'. If the problem still exists, change the tag 'verification-needed-oracular-linux-intel' to 'verification-failed-oracular-linux-intel'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-oracular-linux-intel-v2 verification-needed-oracular-linux-intel
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-riscv-6.8/6.8.0-62.65~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-riscv-6.8' to 'verification-done-jammy-linux-riscv-6.8'. If the problem still exists, change the tag 'verification-needed-jammy-linux-riscv-6.8' to 'verification-failed-jammy-linux-riscv-6.8'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-riscv-6.8-v2 verification-needed-jammy-linux-riscv-6.8
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra/6.8.0-1007.7 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-nvidia-tegra' to 'verification-done-noble-linux-nvidia-tegra'. If the problem still exists, change the tag 'verification-needed-noble-linux-nvidia-tegra' to 'verification-failed-noble-linux-nvidia-tegra'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-nvidia-tegra-v2 verification-needed-noble-linux-nvidia-tegra
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-fips/6.8.0-72.72+fips1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-fips' to 'verification-done-noble-linux-fips'. If the problem still exists, change the tag 'verification-needed-noble-linux-fips' to 'verification-failed-noble-linux-fips'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-fips-v2 verification-needed-noble-linux-fips
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws-fips/6.8.0-1034.36+fips1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-aws-fips' to 'verification-done-noble-linux-aws-fips'. If the problem still exists, change the tag 'verification-needed-noble-linux-aws-fips' to 'verification-failed-noble-linux-aws-fips'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-aws-fips-v2 verification-needed-noble-linux-aws-fips
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gcp-fips/6.8.0-1035.37+fips1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-gcp-fips' to 'verification-done-noble-linux-gcp-fips'. If the problem still exists, change the tag 'verification-needed-noble-linux-gcp-fips' to 'verification-failed-noble-linux-gcp-fips'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-gcp-fips-v2 verification-needed-noble-linux-gcp-fips
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-xilinx/6.8.0-1017.18 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-xilinx' to 'verification-done-noble-linux-xilinx'. If the problem still exists, change the tag 'verification-needed-noble-linux-xilinx' to 'verification-failed-noble-linux-xilinx'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-xilinx-v2 verification-needed-noble-linux-xilinx
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure-fips/6.8.0-1034.39+fips1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-azure-fips' to 'verification-done-noble-linux-azure-fips'. If the problem still exists, change the tag 'verification-needed-noble-linux-azure-fips' to 'verification-failed-noble-linux-azure-fips'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-azure-fips-v2 verification-needed-noble-linux-azure-fips