skb_warn_bad_offload kernel splat due to CHECKSUM target not compatible with GSO skbs

Bug #1840619 reported by Matthew Ruffell on 2019-08-19
Affects                 Status         Importance  Assigned to
linux (Ubuntu)          Fix Released   Undecided   Unassigned
linux (Ubuntu Xenial)   Fix Released   Medium      Matthew Ruffell

Bug Description

BugLink: https://bugs.launchpad.net/bugs/1840619

[Impact]

In environments that have CHECKSUM iptables rules set, the kernel emits the following warning and call trace whenever a GSO skb is processed by the CHECKSUM target:

WARNING: CPU: 34 PID: 806048 at /build/linux-zdslHp/linux-4.4.0/net/core/dev.c:2456 skb_warn_bad_offload+0xcf/0x110()
qr-f78bfdf7-fe: caps=(0x000000000fdb58e9, 0x000000000fdb58e9) len=1955 data_len=479 gso_size=1448 gso_type=1 ip_summed=3
CPU: 34 PID: 806048 Comm: haproxy Tainted: G W OE 4.4.0-138-generic #164-Ubuntu
Call Trace:
 dump_stack+0x63/0x90
 warn_slowpath_common+0x82/0xc0
 warn_slowpath_fmt+0x5c/0x80
 ? ___ratelimit+0xa2/0xe0
 skb_warn_bad_offload+0xcf/0x110
 skb_checksum_help+0x185/0x1a0
 checksum_tg+0x22/0x29 [xt_CHECKSUM]
 ipt_do_table+0x301/0x730 [ip_tables]
 ? ipt_do_table+0x349/0x730 [ip_tables]
 iptable_mangle_hook+0x39/0x107 [iptable_mangle]
 nf_iterate+0x68/0x80
 nf_hook_slow+0x73/0xd0
 ip_output+0xcf/0xe0
 ? __ip_flush_pending_frames.isra.43+0x90/0x90
 ip_local_out+0x3b/0x50
 ip_queue_xmit+0x154/0x390
 __tcp_transmit_skb+0x52b/0x9b0
 tcp_write_xmit+0x1dd/0xf50
 __tcp_push_pending_frames+0x31/0xd0
 tcp_push+0xec/0x110
 tcp_sendmsg+0x749/0xba0
 inet_sendmsg+0x6b/0xa0
 sock_sendmsg+0x3e/0x50
 SYSC_sendto+0x101/0x190
 ? __sys_sendmsg+0x51/0x90
 SyS_sendto+0xe/0x10
 entry_SYSCALL_64_fastpath+0x22/0xc1

The CHECKSUM target does not support GSO skbs, and when a GSO skb is passed to skb_checksum_help(), it errors out and skb_warn_bad_offload() is called.
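
The fields on the second line of the warning already show why the check fires. As a rough aid for reading such lines, here is a small Python sketch that parses the key=value pairs. The function name decode_bad_offload is hypothetical; the ip_summed names follow the constants in include/linux/skbuff.h, and treating a non-zero gso_size as "this is a GSO skb" mirrors how the kernel's skb_is_gso() helper is commonly described.

```python
import re

# ip_summed values as defined in include/linux/skbuff.h
IP_SUMMED = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}


def decode_bad_offload(line):
    """Parse the decimal key=value fields of an skb_warn_bad_offload line.

    Hypothetical helper for illustration; the hex caps=(...) pair is skipped
    because the pattern only matches plain decimal values.
    """
    fields = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}
    fields["ip_summed_name"] = IP_SUMMED.get(fields["ip_summed"], "unknown")
    # A non-zero gso_size marks the skb as a GSO skb.
    fields["is_gso"] = fields["gso_size"] != 0
    return fields


line = (
    "qr-f78bfdf7-fe: caps=(0x000000000fdb58e9, 0x000000000fdb58e9) "
    "len=1955 data_len=479 gso_size=1448 gso_type=1 ip_summed=3"
)
info = decode_bad_offload(line)
print(info["ip_summed_name"], "GSO" if info["is_gso"] else "not GSO")
```

For the trace in this report, the skb is both CHECKSUM_PARTIAL and GSO, which is the combination skb_checksum_help() complains about.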

The above call trace was found in a customer environment running an OpenStack deployment, with iptables rules such as the following:

-A neutron-l3-agent-POSTROUTING -o qr-+ -p tcp -m tcp --sport 9697 -j CHECKSUM --checksum-fill
-A neutron-dhcp-age-POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill

This was causing haproxy running on the node to crash and restart every time a GSO skb was processed by the CHECKSUM target.

I recommend reading the netdev mailing list thread for more details:
https://www.spinics.net/lists/netdev/msg517366.html

[Fix]

This was fixed in 4.19 upstream with the below commit:

commit 10568f6c5761db24249c610c94d6e44d5505a0ba
Author: Florian Westphal <email address hidden>
Date: Wed Aug 22 11:33:27 2018 +0200
Subject: netfilter: xt_checksum: ignore gso skbs

This commit adds a check in the target for whether the current skb is a GSO skb and, if it is, skips skb_checksum_help(). It also checks, when a rule is added, whether the rule is restricted to UDP, and returns early if so. Otherwise it prints a one-time warning that CHECKSUM should be avoided and, if really needed, restricted to UDP and used only in OUTPUT.
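
To make the two checks easier to follow, here is a userspace model in Python. It is a sketch of the behaviour described above, not the kernel code: the Skb class and both function names are hypothetical, and the CHECKSUM_PARTIAL precondition is an assumption about the target's existing behaviour rather than something stated in this report.

```python
from dataclasses import dataclass

CHECKSUM_PARTIAL = 3  # the ip_summed value seen in the warning above
IPPROTO_TCP = 6
IPPROTO_UDP = 17


@dataclass
class Skb:
    """Hypothetical stand-in for the two sk_buff fields that matter here."""
    ip_summed: int
    gso_size: int  # non-zero means the skb is a GSO skb


def target_calls_checksum_help(skb):
    """Per-packet check: fill the checksum only for non-GSO skbs.

    Assumption: the target only acts on CHECKSUM_PARTIAL skbs to begin with.
    """
    return skb.ip_summed == CHECKSUM_PARTIAL and skb.gso_size == 0


def rule_triggers_warning(proto, inverted=False):
    """Rule check: warn unless the rule is restricted to UDP ("-p udp")."""
    return not (proto == IPPROTO_UDP and not inverted)


gso_skb = Skb(ip_summed=CHECKSUM_PARTIAL, gso_size=1448)
plain_skb = Skb(ip_summed=CHECKSUM_PARTIAL, gso_size=0)
print(target_calls_checksum_help(gso_skb), target_calls_checksum_help(plain_skb))
print(rule_triggers_warning(IPPROTO_TCP), rule_triggers_warning(IPPROTO_UDP))
```

In this model the skb from the trace (gso_size=1448) is left alone, while a TCP rule such as the neutron-l3-agent one would trigger the warning.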

Note, 10568f6c5761db24249c610c94d6e44d5505a0ba was included in upstream stable version 4.18.13, and was backported to bionic in 4.15.0-58.64 by LP #1836426.

This patch required minor backporting for 4.4, by slightly adjusting the context in the final patch hunk.

[Testcase]

You can reproduce this by adding the following iptables rule to the mangle table:

-t mangle -A POSTROUTING -p tcp -m tcp --sport 80 -j CHECKSUM --checksum-fill

and running traffic over port 80 with incorrect checksums in the ip header.

I built a test kernel, which is available here:

https://launchpad.net/~mruffell/+archive/ubuntu/sf216537-test

For unpatched kernels, this causes the process which was handling the socket to crash, as seen by haproxy crashing on a node in production which hits this issue.

On patched kernels you see the below warning printed to dmesg and no crashes occur.

xt_CHECKSUM: CHECKSUM should be avoided. If really needed, restrict with "-p udp" and only use in OUTPUT

[Regression Potential]

The change affects only users who have CHECKSUM rules in their iptables configuration. OpenStack commonly configures such rules on deployment even though they are not necessary: almost all packets now have their checksums calculated by the NIC, and CHECKSUM exists only to serve old DHCP clients that discard UDP packets with empty checksums.
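
For readers unfamiliar with what "filling" a checksum involves, the ones'-complement Internet checksum from RFC 1071 can be sketched in a few lines of Python. This is an illustration of the arithmetic only, under the assumption that it stands in for what the software fallback computes; a real UDP or TCP checksum also covers a pseudo-header, which is omitted here, and internet_checksum is a hypothetical name.

```python
def internet_checksum(data):
    """Ones'-complement sum of 16-bit big-endian words, per RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Worked example bytes from RFC 1071, section 3.
sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)
print(hex(checksum))

# A receiver that sums the data together with the checksum gets zero.
verified = internet_checksum(sample + checksum.to_bytes(2, "big")) == 0
print(verified)
```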

This commit was selected for upstream -stable 4.18.13, and has made its way into bionic 4.15.0-58.64 by LP #1836426. There have been no reported problems and those kernels would have had sufficient testing with OpenStack and its configured iptables rules.

If any users are affected by a regression, they can simply delete the CHECKSUM entries from their iptables configs.
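
As a rough aid for that cleanup, the Python sketch below scans iptables-save style text for rules that jump to CHECKSUM and builds the matching delete commands by turning each "-A chain ..." into "-D chain ...". The function name is hypothetical, nothing is executed, and the fallback to the mangle table for input without a "*table" header is an assumption based on CHECKSUM being a mangle-table target.

```python
import shlex


def checksum_rule_deletions(iptables_save_text, default_table="mangle"):
    """Return iptables argument lists that would delete every CHECKSUM rule.

    Hypothetical helper: it only builds the commands; review them before
    running anything.
    """
    table = default_table  # assumption for input without a "*table" header
    commands = []
    for raw in iptables_save_text.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            table = line[1:]
        elif line.startswith("-A "):
            args = shlex.split(line)
            jumps = zip(args, args[1:])
            if any(flag in ("-j", "--jump") and target == "CHECKSUM"
                   for flag, target in jumps):
                commands.append(["iptables", "-t", table, "-D"] + args[1:])
    return commands


sample_rules = """\
*mangle
:POSTROUTING ACCEPT [0:0]
-A neutron-l3-agent-POSTROUTING -o qr-+ -p tcp -m tcp --sport 9697 -j CHECKSUM --checksum-fill
-A POSTROUTING -j ACCEPT
COMMIT
"""
deletions = checksum_rule_deletions(sample_rules)
for cmd in deletions:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

On a live system the input would come from iptables-save, and the printed commands would need root to run.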

Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
tags: added: sts

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1840619

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: xenial
description: updated
Stefan Bader (smb) on 2019-08-26
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Matthew Ruffell (mruffell) wrote :

I installed 4.4.0-163-generic from xenial -proposed to a xenial VM, with the following uname -rv:

4.4.0-163-generic #191-Ubuntu SMP Wed Sep 11 17:06:27 UTC 2019

From there I enabled an iptables rule with the CHECKSUM target, for tcp port 8000:

sudo iptables -t mangle -A POSTROUTING -p tcp -m tcp --sport 8000 -j CHECKSUM --checksum-fill

After running that command, dmesg now prints the correct warning against use of the CHECKSUM target:

[ 99.606968] xt_CHECKSUM: CHECKSUM should be avoided. If really needed, restrict with "-p udp" and only use in OUTPUT

I bound netcat to port 8000 and ran traffic over it. Everything worked fine and was stable, with no crashes seen.
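
For anyone repeating this without netcat, a comparable traffic source can be sketched in Python: a listener on the port matched by --sport that pushes data to a client in large writes. All function names here are hypothetical, and the 127.0.0.1 default is an assumption for a self-contained run. Whether the stack actually builds GSO skbs depends on the interface and its offload settings, so treat this as a starting point rather than a guaranteed reproducer.

```python
import socket
import threading


def make_listener(host="127.0.0.1", port=8000):
    """Listen on the port matched by the rule's --sport (8000 above)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def send_bulk(srv, total_bytes=1 << 20, chunk=64 * 1024):
    """Accept one client and send total_bytes to it in large writes."""
    conn, _ = srv.accept()
    sent = 0
    with conn:
        while sent < total_bytes:
            n = min(chunk, total_bytes - sent)
            conn.sendall(b"x" * n)
            sent += n
    return sent


def drain(host, port):
    """Connect, read until EOF, and return the number of bytes received."""
    received = 0
    with socket.create_connection((host, port)) as client:
        while True:
            data = client.recv(65536)
            if not data:
                break
            received += len(data)
    return received


def loopback_demo(total_bytes=1 << 20, port=0):
    """Run sender and receiver in one process; port 0 picks a free port."""
    srv = make_listener(port=port)
    host, bound_port = srv.getsockname()
    result = []
    reader = threading.Thread(target=lambda: result.append(drain(host, bound_port)))
    reader.start()
    sent = send_bulk(srv, total_bytes)
    reader.join()
    srv.close()
    return sent, result[0]
```

To exercise the rule above, call send_bulk(make_listener(host, 8000)) with an address reachable from the client machine, connect to it from there, and watch dmesg on the sending side.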

This fixes the issue in this bug, and I am happy to mark it as verified.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-165.193

---------------
linux (4.4.0-165.193) xenial; urgency=medium

  * xenial/linux: 4.4.0-165.193 -proposed tracker (LP: #1844416)

  * Xenial update: 4.4.187 upstream stable release (LP: #1840081)
    - MIPS: ath79: fix ar933x uart parity mode
    - MIPS: fix build on non-linux hosts
    - dmaengine: imx-sdma: fix use-after-free on probe error path
    - ath10k: Do not send probe response template for mesh
    - ath9k: Check for errors when reading SREV register
    - ath6kl: add some bounds checking
    - ath: DFS JP domain W56 fixed pulse type 3 RADAR detection
    - batman-adv: fix for leaked TVLV handler.
    - media: dvb: usb: fix use after free in dvb_usb_device_exit
    - crypto: talitos - fix skcipher failure due to wrong output IV
    - media: marvell-ccic: fix DMA s/g desc number calculation
    - media: vpss: fix a potential NULL pointer dereference
    - net: stmmac: dwmac1000: Clear unused address entries
    - signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig
    - af_key: fix leaks in key_pol_get_resp and dump_sp.
    - xfrm: Fix xfrm sel prefix length validation
    - media: staging: media: davinci_vpfe: - Fix for memory leak if decoder
      initialization fails.
    - net: phy: Check against net_device being NULL
    - tua6100: Avoid build warnings.
    - locking/lockdep: Fix merging of hlocks with non-zero references
    - media: wl128x: Fix some error handling in fm_v4l2_init_video_device()
    - cpupower : frequency-set -r option misses the last cpu in related cpu list
    - net: fec: Do not use netdev messages too early
    - net: axienet: Fix race condition causing TX hang
    - s390/qdio: handle PENDING state for QEBSM devices
    - perf test 6: Fix missing kvm module load for s390
    - gpio: omap: fix lack of irqstatus_raw0 for OMAP4
    - gpio: omap: ensure irq is enabled before wakeup
    - regmap: fix bulk writes on paged registers
    - bpf: silence warning messages in core
    - rcu: Force inlining of rcu_read_lock()
    - xfrm: fix sa selector validation
    - perf evsel: Make perf_evsel__name() accept a NULL argument
    - vhost_net: disable zerocopy by default
    - EDAC/sysfs: Fix memory leak when creating a csrow object
    - media: i2c: fix warning same module names
    - ntp: Limit TAI-UTC offset
    - timer_list: Guard procfs specific code
    - acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
    - media: coda: fix mpeg2 sequence number handling
    - media: coda: increment sequence offset for the last returned frame
    - mt7601u: do not schedule rx_tasklet when the device has been disconnected
    - x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c
    - mt7601u: fix possible memory leak when the device is disconnected
    - ath10k: fix PCIE device wake up failed
    - rslib: Fix decoding of shortened codes
    - rslib: Fix handling of of caller provided syndrome
    - ixgbe: Check DDM existence in transceiver before access
    - EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec
    - bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush()
    - Bluetooth: hci_bcsp: Fix memory ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released