skb_warn_bad_offload kernel splat due to CHECKSUM target not compatible with GSO skbs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Xenial | Fix Released | Medium | Matthew Ruffell |
Bug Description
BugLink: https:/
[Impact]
In environments that have CHECKSUM iptables rules set, the kernel emits the following warning and call trace whenever a GSO skb is processed by the CHECKSUM target:
WARNING: CPU: 34 PID: 806048 at /build/
qr-f78bfdf7-fe: caps=(0x0000000
CPU: 34 PID: 806048 Comm: haproxy Tainted: G W OE 4.4.0-138-generic #164-Ubuntu
Call Trace:
dump_stack+
warn_slowpath_
warn_slowpath_
? ___ratelimit+
skb_warn_
skb_checksum_
checksum_
ipt_do_
? ipt_do_
iptable_
nf_iterate+
nf_hook_
ip_output+
? __ip_flush_
ip_local_
ip_queue_
__tcp_
tcp_write_
__tcp_
tcp_push+
tcp_sendmsg+
inet_sendmsg+
sock_sendmsg+
SYSC_sendto+
? __sys_sendmsg+
SyS_sendto+
entry_
The CHECKSUM target does not support GSO skbs, and when a GSO skb is passed to skb_checksum_
The above call trace was found in a customer environment running an OpenStack deployment, with iptables rules such as the following:
-A neutron-
-A neutron-
This was causing haproxy running on the node to crash and restart every time a GSO skb was processed by the CHECKSUM target.
I recommend reading the netdev mailing list thread for more details:
https:/
[Fix]
This was fixed in 4.19 upstream with the below commit:
commit 10568f6c5761db2
Author: Florian Westphal <email address hidden>
Date: Wed Aug 22 11:33:27 2018 +0200
Subject: netfilter: xt_checksum: ignore gso skbs
This commit adds a check to see if the current skb is a gso skb, and if it is, skips skb_checksum_
Note, 10568f6c5761db2
This patch required minor backporting for 4.4, by slightly adjusting the context in the final patch hunk.
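The check described above can be illustrated with a small userspace model. This is only a sketch of the decision logic, not the kernel code: the `Skb` class and the `needs_checksum_help` function are hypothetical names, and the sketch assumes that the target only acts on skbs whose checksum is still pending (`CHECKSUM_PARTIAL`) and that a GSO skb is one with a non-zero `gso_size`.

```python
from dataclasses import dataclass

# Values mirror the kernel's ip_summed constants; they are only compared here.
CHECKSUM_NONE = 0
CHECKSUM_PARTIAL = 3


@dataclass
class Skb:
    """Hypothetical stand-in for struct sk_buff with only the fields the check reads."""
    ip_summed: int
    gso_size: int = 0  # assumption: non-zero means the skb will be segmented later (GSO)


def is_gso(skb: Skb) -> bool:
    """Model of a GSO test: the skb carries a segment size."""
    return skb.gso_size != 0


def needs_checksum_help(skb: Skb) -> bool:
    """Return True if the CHECKSUM target should compute the checksum in software.

    Patched behaviour: GSO skbs are skipped, so the software checksum
    helper is never called on them and the warning is not triggered.
    """
    return skb.ip_summed == CHECKSUM_PARTIAL and not is_gso(skb)
```

In this model an ordinary skb with a pending checksum is still handled, while a GSO skb passes through the target untouched, which is the behaviour the commit description gives for patched kernels.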
[Testcase]
You can reproduce this by adding the following iptables rule to the mangle table:
-t mangle -A POSTROUTING -p tcp -m tcp --sport 80 -j CHECKSUM --checksum-fill
and running traffic over port 80 with incorrect checksums in the IP header.
I built a test kernel, which is available here:
https:/
For unpatched kernels, this causes the process which was handling the socket to crash, as seen by haproxy crashing on a node in production which hits this issue.
On patched kernels you see the below warning printed to dmesg and no crashes occur.
xt_CHECKSUM: CHECKSUM should be avoided. If really needed, restrict with "-p udp" and only use in OUTPUT
[Regression Potential]
The change only affects users who have CHECKSUM rules in their iptables configuration. OpenStack commonly configures such rules on deployment, even though they are rarely necessary: almost all packets now have their checksums calculated by the NIC, and CHECKSUM exists only to serve old DHCP clients that would discard UDP packets with empty checksums.
This commit was selected for upstream -stable 4.18.13, and has made its way into bionic 4.15.0-58.64 by LP #1836426. There have been no reported problems, and those kernels would have had sufficient testing with OpenStack and its configured iptables rules.
If any users are affected by a regression, they can simply delete any CHECKSUM entries in their iptables configuration.
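To locate such entries, one option is to scan a saved ruleset for rules that jump to the CHECKSUM target. The following is a minimal sketch, assuming the ruleset text is in the format printed by `iptables-save`; `find_checksum_rules` and `delete_command` are hypothetical helper names, not part of any existing tool.

```python
import shlex
from typing import List, Tuple


def find_checksum_rules(ruleset: str) -> List[Tuple[str, str]]:
    """Return (table, rule) pairs for every rule that jumps to CHECKSUM."""
    matches = []
    table = ""
    for raw in ruleset.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            # A line such as "*mangle" opens a new table section.
            table = line[1:]
        elif line.startswith("-A "):
            tokens = shlex.split(line)
            # Look for "-j CHECKSUM" (or "--jump CHECKSUM") as adjacent tokens.
            for flag, value in zip(tokens, tokens[1:]):
                if flag in ("-j", "--jump") and value == "CHECKSUM":
                    matches.append((table, line))
                    break
    return matches


def delete_command(table: str, rule: str) -> str:
    """Build the iptables command that deletes an appended (-A) rule."""
    return "iptables -t {} -D {}".format(table, rule[len("-A "):])
```

Feeding the output of `iptables-save` to `find_checksum_rules` lists the affected rules, and `delete_command` turns each one into the matching `-D` invocation. The commands are only printed as strings here, so they can be reviewed before anything is removed.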
Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
tags: added: sts
description: updated
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1840619
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.