kernel panic - null pointer dereference on ipset operations

Bug #1793753 reported by Laurent Sesquès
This bug affects 6 people
Affects             Status         Importance   Assigned to   Milestone
linux (Ubuntu)      Fix Released   Critical     Unassigned
  Xenial            Fix Released   Critical     Unassigned

Bug Description

== SRU Justification ==
A regression was introduced in Xenial, even prior to the v4.4 final release.
I did not test kernels older than that once I found the bug was fixed in
mainline. The bug reporter experienced crashes on machines running
iptables with ipsets. He was able to capture a trace from the console of
one of them; it is attached to this bug report.

On these machines, some ipset commands are automatically run to update the
sets, and/or to dump them (ipset restore, swap, delete ... / ipset save).

I was able to reproduce this bug, as was cking. This bug was found to be
fixed by mainline commits 596cf3fe5854 and e5173418ac59.

== Fixes ==
596cf3fe5854 ("netfilter: ipset: fix race condition in ipset save, swap and delete")
e5173418ac59 ("netfilter: ipset: Fix race between dump and swap")

== Regression Potential ==
Low. This fixes a regression and is limited to netfilter.

== Test Case ==
A test kernel was built with these patches and tested by myself and cking.

Laurent Sesquès (sajoupa) wrote :
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Joel Sing (jsing) wrote :

I've been able to reproduce this panic by running Ubuntu Xenial under qemu, with a script that effectively does ipset restore/swap/destroy in a loop, while also running ipset save in a separate loop.
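A minimal sketch of such a reproducer follows. Joel's actual script was not posted in this thread, so this is an assumption-laden illustration: it expects root, the `ipset` CLI, and a disposable VM, and the set names (`live_set`, `tmp_set`), hash sizes, 192.0.2.x test addresses, and the `ITERATIONS`/`IPSET` variables are all hypothetical. One loop builds a set with `ipset restore`, swaps it with the live set, and destroys the old one, while a second loop runs `ipset save` concurrently.

```shell
#!/usr/bin/env bash
# Hypothetical stress sketch for LP #1793753: ipset restore/swap/destroy in
# one loop while "ipset save" runs concurrently in another.
# Assumptions: root, the ipset CLI, and a throwaway VM. Set names, sizes and
# the 192.0.2.0/24 test addresses are made up for illustration.
set -u

IPSET=${IPSET:-ipset}              # override (e.g. IPSET=echo) for a dry run
ITERATIONS=${ITERATIONS:-100000}

churn_loop() {
  local i
  for ((i = 0; i < ITERATIONS; i++)); do
    # Build a fresh temporary set via "ipset restore"...
    printf 'create tmp_set hash:ip family inet hashsize 1024 maxelem 500000\nadd tmp_set 192.0.2.%d\n' \
      $((i % 254 + 1)) | "$IPSET" restore
    # ...swap it with the live set, then destroy the old contents.
    "$IPSET" swap tmp_set live_set
    "$IPSET" destroy tmp_set
  done
}

save_loop() {
  local i
  for ((i = 0; i < ITERATIONS; i++)); do
    "$IPSET" save > /dev/null      # dump all sets, racing with the swaps
  done
}

main() {
  # -exist: don't fail if an identical live_set is left over from a prior run.
  "$IPSET" -exist create live_set hash:ip family inet hashsize 1024 maxelem 500000
  save_loop &
  local saver=$!
  churn_loop
  wait "$saver"
}

# Only start when explicitly asked ("./ipset-repro.sh run"): on an affected
# kernel this is expected to panic the host.
if [[ ${1:-} == run ]]; then
  main
fi
```

Usage would be something like `sudo ./ipset-repro.sh run` (file name hypothetical). The explicit `run` argument is a deliberate guard, since merely executing or sourcing the file should not crash a machine.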

Haw Loeung (hloeung) wrote :
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.19 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19-rc5

Changed in linux (Ubuntu):
importance: Undecided → High
status: Confirmed → Incomplete
tags: added: needs-bisect
Joseph Salisbury (jsalisbury) wrote :

I'll try to reproduce this bug. @Joel Sing, do you happen to have the reproducer script available?

In parallel, I'll also build a Xenial test kernel with the commit posted by @Haw Loeung in comment 4.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
status: New → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Triaged
importance: High → Critical
Changed in linux (Ubuntu Xenial):
importance: High → Critical
tags: added: kernel-key
Joseph Salisbury (jsalisbury) wrote :

I built a Xenial test kernel with commit 596cf3fe5854fe2b1703b0466ed6bf9cfb83c91e. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1793753

Can you test this kernel and see if it resolves this bug?

If you are unable to test, could you provide the commands that you found to reproduce this bug?

Joseph Salisbury (jsalisbury) wrote :

Note, for the test kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Changed in linux (Ubuntu Xenial):
status: Triaged → In Progress
Changed in linux (Ubuntu):
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

Just to confirm: is this a regression? Did the bug not happen with 4.4.0-116, and start with 4.4.0-135?

Joseph Salisbury (jsalisbury) wrote :

I've been trying variations of this in a loop but have been unable to reproduce the bug:

ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 counters
ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 counters
ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 counters

ipset save &

ipset swap hash_ip3 hash_ip2
ipset destroy hash_ip3

@Joel Sing, is this similar to how you were able to reproduce the bug?

Joseph Salisbury (jsalisbury) wrote :

I can now reproduce this bug, as can cking. Thanks for the assistance, cking!

This bug has actually been present since before 4.4.0, but it is fixed in 4.15.0 and newer. Commit 596cf3fe5854fe does not fix this bug on its own, but it does provide a test case that reproduces it.

I will now perform a "reverse" bisect to narrow down the commit(s) needed to resolve this bug.
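A "reverse" bisect looks for the commit that fixed a bug rather than the one that introduced it. One way to run it, sketched here under the assumption of a mainline git tree and a git version that supports custom bisect terms, is to name the older state "broken" and the newer state "fixed". The v4.4 (broken) and v4.15 (fixed) endpoints come from the comment above; the helper name `reverse_bisect_start` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: "reverse" bisect to find the first commit that FIXES a bug.
# Assumption: run inside a git tree where $1 is known broken and $2 known fixed.
set -eu

# Hypothetical helper. With custom terms, the "old" state is "broken" and the
# "new" state is "fixed", so git searches for the first "fixed" commit.
reverse_bisect_start() {
  local broken=$1 fixed=$2
  git bisect start --term-new=fixed --term-old=broken "$fixed" "$broken"
}

# Example (kernel tree): reverse_bisect_start v4.4 v4.15
# Then, after building and stress-testing each checked-out commit:
#   git bisect broken   # the panic still reproduces
#   git bisect fixed    # the panic no longer reproduces
# When git reports the first "fixed" commit, restore the tree with:
#   git bisect reset
```

Renaming the terms avoids the error-prone trick of marking fixed commits as "bad" and broken ones as "good".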

Joseph Salisbury (jsalisbury) wrote :

I believe this is the fix that landed in mainline as of v4.14-rc5:
e517341 netfilter: ipset: Fix race between dump and swap

I'm testing now and will post a test kernel for others to test as well.

Joseph Salisbury (jsalisbury) wrote :

A combination of two commits seems to resolve this bug:
596cf3fe5854 ("netfilter: ipset: fix race condition in ipset save, swap and delete")
e5173418ac59 ("netfilter: ipset: Fix race between dump and swap")

I built a Xenial test kernel with these two commits. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1793753

Can you test this kernel and see if it resolves this bug?

Joseph Salisbury (jsalisbury) wrote :
description: updated
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Laurent Sesquès (sajoupa) wrote :

I could reproduce the panic by running ipset destroy/swap/restore in one loop and ipset save in another. Standard Xenial machines consistently panic within a few minutes at most.
With the same loops on the proposed kernel, the machines ran overnight without any issues.
I'll check with Joel, and if he's also OK, we'll add the verification-done-xenial tag.

Laurent

tags: added: verification-done-xenial
removed: verification-needed-xenial
Joel Sing (jsing) wrote :

I've tested the proposed kernel in the previous test environment: after two hours the host was still up, whereas it would previously have crashed within 30 minutes or so. As such, this appears to prevent the panic.

Brad Figg (brad-figg)
tags: added: cscc