Ubuntu
linux-bluefield package

flowtable: fix TCP flow teardown

Bug #1975649 reported by Bodong Wang on 2022-05-24

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	linux-bluefield (Ubuntu)	Fix Released	Undecided	Unassigned

Bug Description

* Explain the feature

This patch addresses three possible problems:

1. ct gc may race to undo the timeout adjustment of the packet path, leaving
the conntrack entry in place with the internal offload timeout (one day).

2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE
timeout is reached before the flow offload del.

3. tcp ct is always set to ESTABLISHED with a very long timeout
   in flow offload teardown/delete even though the state might be already
   CLOSED. Also as a remark we cannot assume that the FIN or RST packet
   is hitting flow table teardown as the packet might get bumped to the
   slow path in nftables.

This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so
conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN
state transition.

Moreover, return the connection's ownership to conntrack upon teardown
by clearing the offload flag and fixing the established timeout value.
The flow table GC thread will asynchonrnously free the flow table and
hardware offload entries.

Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows on
which is also misleading since the flow is back to classic conntrack
path.

If nf_ct_delete() removes the entry from the conntrack table, then it
calls nf_ct_put() which decrements the refcnt. This is not a problem
because the flowtable holds a reference to the conntrack object from
flow_offload_alloc() path which is released via flow_offload_free().

This patch also updates nft_flow_offload to skip packets in SYN_RECV
state. Since we might miss or bump packets to slow path, we do not know
what will happen there while we are still in SYN_RECV, this patch
postpones offload up to the next packet which also aligns to the
existing behaviour in tc-ct.

flow_offload_teardown() does not reset the existing tcp state from
flow_offload_fixup_tcp() to ESTABLISHED anymore, packets bump to slow
path might have already update the state to CLOSE/FIN.

* How to test
Adding the following flows to the OVS bridge in DPU OS:
# ovs-ofctl add-flow ovsbr1 "table=0, ip,ct_state=-trk, actions=ct(table=1)"
# ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=+new, actions=ct(commit),normal"
# ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=-new, actions=normal"

Start netserver on SUT:
# netserver -p 5007

Start multiple TCP_CRR tests on peer:
# count=1;while [ $count -lt 10 ]; do screen -d -m netperf -t TCP_CRR -H 11.0.0.2 -l 360 -- -r 1 -O " MIN_LAETENCY, MAX_LATENCY, MEAN_LATENCY, P90_LATENCY, P99_LATENCY ,P999_LATENCY,P9999_LATENCY,STDDEV_LATENCY ,THROUGHPUT ,THROUGHPUT_UNITS "; count=`expr $count + 1`; done
A huge number of connections will be established and tear down. After the tests, some of them are not aged out:
# From /proc/net/nf_conntrack in DPU OS
ipv4 2 tcp 6 86354 LAST_ACK src=11.0.0.1 dst=11.0.0.2 sport=35862 dport=46797 src=11.0.0.2 dst=11.0.0.1 sport=46797 dport=35862 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 86354 LAST_ACK src=11.0.0.1 dst=11.0.0.2 sport=35862 dport=46797 src=11.0.0.2 dst=11.0.0.1 sport=46797 dport=35862 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 86354 LAST_ACK src=11.0.0.1 dst=11.0.0.2 sport=35862 dport=46797 src=11.0.0.2 dst=11.0.0.1 sport=46797 dport=35862 [ASSURED] mark=0 zone=0 use=2
The issue is usually reproduced after running the for several times.

* What it could break.
N/A

Tags:

CVE References

2022-1789

Zachary Tahenakos (ztahenakos) on 2022-07-06

Changed in linux-bluefield (Ubuntu):
status:	New → Fix Committed

Revision history for this message

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2022-07-13:

This bug is awaiting verification that the linux-bluefield/5.4.0-1042.47 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags:

added: verification-needed-focal

Revision history for this message

Launchpad Janitor (janitor) wrote on 2022-07-25:

Download full text (14.8 KiB)

This bug was fixed in the package linux-bluefield - 5.4.0-1042.47

---------------
linux-bluefield (5.4.0-1042.47) focal; urgency=medium

* focal/linux-bluefield: 5.4.0-1042.47 -proposed tracker (LP: #1979463)

* Focal update: upstream stable patchset v5.4.192 (LP: #1979014)
- [Config] bluefield: update configs for NVM

* flowtable: fix TCP flow teardown (LP: #1975649)
- netfilter: flowtable: fix TCP flow teardown

* remove offload_pickup sysctl again (LP: #1975820)
- netfilter: conntrack: remove offload_pickup sysctl again

* mlxbf_gige: remove driver-managed interrupt counts (LP: #1975749)
- mlxbf_gige: remove driver-managed interrupt counts

[ Ubuntu: 5.4.0-122.138 ]

This bug was fixed in the package linux-bluefield - 5.4.0-1042.47

---------------
linux-bluefield (5.4.0-1042.47) focal; urgency=medium

* focal/linux-bluefield: 5.4.0-1042.47 -proposed tracker (LP: #1979463)

* Focal update: upstream stable patchset v5.4.192 (LP: #1979014)
    - [Config] bluefield: update configs for NVM

* flowtable: fix TCP flow teardown (LP: #1975649)
    - netfilter: flowtable: fix TCP flow teardown

* remove offload_pickup sysctl again (LP: #1975820)
    - netfilter: conntrack: remove offload_pickup sysctl again

* mlxbf_gige: remove driver-managed interrupt counts (LP: #1975749)
    - mlxbf_gige: remove driver-managed interrupt counts

[ Ubuntu: 5.4.0-122.138 ]

* focal/linux: 5.4.0-122.138 -proposed tracker (LP: #1979489)
  * Remove SAUCE patches from test_vxlan_under_vrf.sh in net of
    ubuntu_kernel_selftests (LP: #1975691)
    - Revert "UBUNTU: SAUCE: selftests: net: Don't fail test_vxlan_under_vrf on
      xfail"
    - Revert "UBUNTU: SAUCE: selftests: net: Make test for VXLAN underlay in non-
      default VRF an expected failure"
  * Enable Asus USB-BT500 Bluetooth dongle(0b05:190e) (LP: #1976613)
    - Bluetooth: btusb: Add flag to define wideband speech capability
    - Bluetooth: btrtl: Add support for RTL8761B
    - Bluetooth: btusb: Add 0x0b05:0x190e Realtek 8761BU (ASUS BT500) device.
  * [UBUNTU 20.04] rcu stalls with many storage key guests (LP: #1975582)
    - s390/gmap: voluntarily schedule during key setting
    - s390/mm: use non-quiescing sske for KVM switch to keyed guest
  * Ubuntu 5.4.0-117.132-generic 5.4.189 has BUG: kernel NULL pointer
    dereference, address: 0000000000000034 (LP: #1978719)
    - mm: rmap: explicitly reset vma->anon_vma in unlink_anon_vmas()
  * Focal update: upstream stable patchset v5.4.192 (LP: #1979014)
    - floppy: disable FDRAWCMD by default
    - [Config] updateconfigs for BLK_DEV_FD_RAWCMD
    - hamradio: defer 6pack kfree after unregister_netdev
    - hamradio: remove needs_free_netdev to avoid UAF
    - lightnvm: disable the subsystem
    - [Config] updateconfigs for NVM, NVM_PBLK
    - usb: mtu3: fix USB 3.0 dual-role-switch from device to host
    - USB: quirks: add a Realtek card reader
    - USB: quirks: add STRING quirk for VCOM device
    - USB: serial: whiteheat: fix heap overflow in WHITEHEAT_GET_DTR_RTS
    - USB: serial: cp210x: add PIDs for Kamstrup USB Meter Reader
    - USB: serial: option: add support for Cinterion MV32-WA/MV32-WB
    - USB: serial: option: add Telit 0x1057, 0x1058, 0x1075 compositions
    - xhci: stop polling roothubs after shutdown
    - xhci: increase usb U3 -> U0 link resume timeout from 100ms to 500ms
    - iio: dac: ad5592r: Fix the missing return value.
    - iio: dac: ad5446: Fix read_raw not returning set value
    - iio: magnetometer: ak8975: Fix the error handling in ak8975_power_on()
    - usb: misc: fix improper handling of refcount in uss720_probe()
    - usb: typec: ucsi: Fix role swapping
    - usb: gadget: uvc: Fix crash when encoding data for usb request
    - usb: gadget: configfs: clear deactivation flag in
      configfs_composite_unbind()
    - usb: dwc3: core: Fix tx/rx threshold settings
    - usb: dwc3: gadget: Return proper request status
    - serial: imx: fix overrun interrupts in DMA mode
    - serial: 8250: Also set sticky MCR bits in console restoration
    - serial: 8250: Correct the clock for EndRun PTP/1588 PCIe device
    - arch_topology: Do not set llc_sibling if llc_id is invalid
    - hex2bin: make the function hex_to_bin constant-time
    - hex2bin: fix access beyond string end
    - video: fbdev: udlfb: properly check endpoint type
    - arm64: dts: meson: remove CPU opps below 1GHz for G12B boards
    - arm64: dts: meson: remove CPU opps below 1GHz for SM1 boards
    - mtd: rawnand: fix ecc parameters for mt7622
    - USB: Fix xhci event ring dequeue pointer ERDP update issue
    - ARM: dts: imx6qdl-apalis: Fix sgtl5000 detection issue
    - phy: samsung: Fix missing of_node_put() in exynos_sata_phy_probe
    - phy: samsung: exynos5250-sata: fix missing device put in probe error paths
    - ARM: OMAP2+: Fix refcount leak in omap_gic_of_init
    - phy: ti: omap-usb2: Fix error handling in omap_usb2_enable_clocks
    - ARM: dts: at91: Map MCLK for wm8731 on at91sam9g20ek
    - phy: mapphone-mdm6600: Fix PM error handling in phy_mdm6600_probe
    - phy: ti: Add missing pm_runtime_disable() in serdes_am654_probe
    - ARM: dts: Fix mmc order for omap3-gta04
    - ARM: dts: am3517-evm: Fix misc pinmuxing
    - ARM: dts: logicpd-som-lv: Fix wrong pinmuxing on OMAP35
    - ipvs: correctly print the memory size of ip_vs_conn_tab
    - mtd: rawnand: Fix return value check of wait_for_completion_timeout
    - bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt
      hook
    - tcp: md5: incorrect tcp_header_len for incoming connections
    - tcp: ensure to use the most recently sent skb when filling the rate sample
    - sctp: check asoc strreset_chunk in sctp_generate_reconf_event
    - ARM: dts: imx6ull-colibri: fix vqmmc regulator
    - arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
    - pinctrl: pistachio: fix use of irq_of_parse_and_map()
    - cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
    - net: hns3: add validity check for message data length
    - net/smc: sync err code when tcp connection was refused
    - ip_gre: Make o_seqno start from 0 in native mode
    - tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
    - bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create()
    - clk: sunxi: sun9i-mmc: check return value after calling
      platform_get_resource()
    - net: bcmgenet: hide status block before TX timestamping
    - net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
    - drm/amd/display: Fix memory leak in dcn21_clock_source_create
    - tls: Skip tls_append_frag on zero copy size
    - bnx2x: fix napi API usage sequence
    - ixgbe: ensure IPsec VF<->PF compatibility
    - tcp: fix F-RTO may not work correctly when receiving DSACK
    - ASoC: wm8731: Disable the regulator when probing fails
    - ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit()
    - x86: __memcpy_flushcache: fix wrong alignment if size > 2^32
    - cifs: destage any unwritten data to the server before calling
      copychunk_write
    - drivers: net: hippi: Fix deadlock in rr_close()
    - net: ethernet: stmmac: fix write to sgmii_adapter_base
    - x86/cpu: Load microcode during restore_processor_state()
    - tty: n_gsm: fix wrong signal octet encoding in convergence layer type 2
    - tty: n_gsm: fix malformed counter for out of frame data
    - netfilter: nft_socket: only do sk lookups when indev is available
    - tty: n_gsm: fix insufficient txframe size
    - tty: n_gsm: fix missing explicit ldisc flush
    - tty: n_gsm: fix wrong command retry handling
    - tty: n_gsm: fix wrong command frame length field encoding
    - tty: n_gsm: fix incorrect UA handling
    - hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
    - mm, hugetlb: allow for "high" userspace addresses
    - Linux 5.4.192
  * CVE-2022-1789
    - KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID
  * Focal update: v5.4.191 upstream stable release (LP: #1976116)
    - etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead
    - mm: page_alloc: fix building error on -Werror=array-compare
    - tracing: Dump stacktrace trigger to the corresponding instance
    - gfs2: assign rgrp glock before compute_bitstructs
    - tcp: fix race condition when creating child sockets from syncookies
    - tcp: Fix potential use-after-free due to double kfree()
    - ALSA: usb-audio: Clear MIDI port active flag after draining
    - ASoC: atmel: Remove system clock tree configuration for at91sam9g20ek
    - ASoC: msm8916-wcd-digital: Check failure for devm_snd_soc_register_component
    - dmaengine: imx-sdma: Fix error checking in sdma_event_remap
    - dmaengine: mediatek:Fix PM usage reference leak of
      mtk_uart_apdma_alloc_chan_resources
    - igc: Fix infinite loop in release_swfw_sync
    - igc: Fix BUG: scheduling while atomic
    - rxrpc: Restore removed timer deletion
    - net/smc: Fix sock leak when release after smc_shutdown()
    - net/packet: fix packet_sock xmit return value checking
    - net/sched: cls_u32: fix possible leak in u32_init_knode()
    - l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using
      netdev_master_upper_dev_get_rcu
    - netlink: reset network and mac headers in netlink_dump()
    - selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets
    - ARM: vexpress/spc: Avoid negative array index when !SMP
    - reset: tegra-bpmp: Restore Handle errors in BPMP response
    - platform/x86: samsung-laptop: Fix an unsigned comparison which can never be
      negative
    - ALSA: usb-audio: Fix undefined behavior due to shift overflowing the
      constant
    - vxlan: fix error return code in vxlan_fdb_append
    - cifs: Check the IOCB_DIRECT flag, not O_DIRECT
    - mt76: Fix undefined behavior due to shift overflowing the constant
    - brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant
    - dpaa_eth: Fix missing of_node_put in dpaa_get_ts_info()
    - drm/msm/mdp5: check the return of kzalloc()
    - net: macb: Restart tx only if queue pointer is lagging
    - scsi: qedi: Fix failed disconnect handling
    - stat: fix inconsistency between struct stat and struct compat_stat
    - EDAC/synopsys: Read the error count from the correct register
    - oom_kill.c: futex: delay the OOM reaper to allow time for proper futex
      cleanup
    - ata: pata_marvell: Check the 'bmdma_addr' beforing reading
    - dma: at_xdmac: fix a missing check on list iterator
    - drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised
    - drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare
    - KVM: PPC: Fix TCE handling for VFIO
    - drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
    - powerpc/perf: Fix power9 event alternatives
    - xtensa: patch_text: Fixup last cpu should be master
    - xtensa: fix a7 clobbering in coprocessor context load/store
    - openvswitch: fix OOB access in reserve_sfa_size()
    - ASoC: soc-dapm: fix two incorrect uses of list iterator
    - e1000e: Fix possible overflow in LTR decoding
    - ARC: entry: fix syscall_trace_exit argument
    - arm_pmu: Validate single/group leader events
    - ext4: fix symlink file size not match to file content
    - ext4: fix use-after-free in ext4_search_dir
    - ext4, doc: fix incorrect h_reserved size
    - ext4: fix overhead calculation to account for the reserved gdt blocks
    - ext4: force overhead calculation if the s_overhead_cluster makes no sense
    - jbd2: fix a potential race while discarding reserved buffers after an abort
    - spi: atmel-quadspi: Fix the buswidth adjustment between spi-mem and
      controller
    - staging: ion: Prevent incorrect reference counting behavour
    - block/compat_ioctl: fix range check in BLKGETSIZE
    - Linux 5.4.191
  * Focal update: v5.4.190 upstream stable release (LP: #1973085)
    - memory: atmel-ebi: Fix missing of_node_put in atmel_ebi_probe
    - net/sched: flower: fix parsing of ethertype following VLAN header
    - veth: Ensure eth header is in skb's linear part
    - gpiolib: acpi: use correct format characters
    - mlxsw: i2c: Fix initialization error flow
    - net/sched: fix initialization order when updating chain 0 head
    - net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link
    - net/sched: taprio: Check if socket flags are valid
    - cfg80211: hold bss_lock while updating nontrans_list
    - drm/msm/dsi: Use connector directly in msm_dsi_manager_connector_init()
    - net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
    - sctp: Initialize daddr on peeled off socket
    - testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set
    - nfc: nci: add flush_workqueue to prevent uaf
    - cifs: potential buffer overflow in handling symlinks
    - drm/amd: Add USBC connector ID
    - drm/amd/display: fix audio format not updated after edid updated
    - drm/amd/display: Update VTEM Infopacket definition
    - drm/amdkfd: Fix Incorrect VMIDs passed to HWS
    - drm/amdkfd: Check for potential null return of kmalloc_array()
    - Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer
    - scsi: target: tcmu: Fix possible page UAF
    - scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024
    - ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
    - gpu: ipu-v3: Fix dev_dbg frequency output
    - regulator: wm8994: Add an off-on delay for WM8994 variant
    - arm64: alternatives: mark patch_alternative() as `noinstr`
    - tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entry
    - net: usb: aqc111: Fix out-of-bounds accesses in RX fixup
    - drm/amd/display: Fix allocate_mst_payload assert on resume
    - powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit
    - scsi: mvsas: Add PCI ID of RocketRaid 2640
    - scsi: megaraid_sas: Target with invalid LUN ID is deleted during scan
    - drivers: net: slip: fix NPD bug in sl_tx_timeout()
    - perf/imx_ddr: Fix undefined behavior due to shift overflowing the constant
    - mm, page_alloc: fix build_zonerefs_node()
    - mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
    - gcc-plugins: latent_entropy: use /dev/urandom
    - ath9k: Properly clear TX status area before reporting to mac80211
    - ath9k: Fix usage of driver-private space in tx_info
    - btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
    - btrfs: mark resumed async balance as writing
    - ALSA: hda/realtek: Add quirk for Clevo PD50PNT
    - ALSA: pcm: Test for "silence" field in struct "pcm_format_data"
    - ipv6: fix panic when forwarding a pkt with no in6 dev
    - drm/amd/display: don't ignore alpha property on pre-multiplied mode
    - genirq/affinity: Consider that CPUs on nodes can be unbalanced
    - tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
    - ARM: davinci: da850-evm: Avoid NULL pointer dereference
    - dm integrity: fix memory corruption when tag_size is less than digest size
    - smp: Fix offline cpu check in flush_smp_call_function_queue()
    - i2c: pasemi: Wait for write xfers to finish
    - dma-direct: avoid redundant memory sync for swiotlb
    - ax25: add refcount in ax25_dev to avoid UAF bugs
    - ax25: fix reference count leaks of ax25_dev
    - ax25: fix UAF bugs of net_device caused by rebinding operation
    - ax25: Fix refcount leaks caused by ax25_cb_del()
    - ax25: fix UAF bug in ax25_send_control()
    - ax25: fix NPD bug in ax25_disconnect
    - ax25: Fix NULL pointer dereferences in ax25 timers
    - ax25: Fix UAF bugs in ax25 timers
    - Linux 5.4.190

-- Zachary Tahenakos <zachary.tahenakos@canonical.com>  Wed, 06 Jul 2022 14:15:25 -0400

Changed in linux-bluefield (Ubuntu):
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntulinux-bluefield package