Bug #843892 “Repeatable kernel oops on container delete” (linux-lts-backport-natty package, Ubuntu; Oneiric 11.10)

Revision history for this message

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-07:

#1

Oops under 2.6.38-10-server on lucid (9.8 KiB, text/plain)

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-07:

#2

Note that this bug affects Lucid (10.04 LTS) where 2.6.38-10 is a shipped kernel, as well as 11.04.

affects:

ubuntu → linux-lts-backport-natty (Ubuntu)

Serge Hallyn (serge-hallyn) wrote on 2011-09-08:

#3

I can't reproduce this. I installed a new natty server VM, and created a natty container.

Your networking setup might differ from mine. Can you describe your network setup and the lxc.conf you used (the -f argument to lxc-create)?

Serge Hallyn (serge-hallyn) on 2011-09-08

Changed in linux-lts-backport-natty (Ubuntu):
importance:	Undecided → Medium
status:	New → Incomplete

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-08:

#4

I think this is two separate problems.

The kernel bug is one thing, but we now think it may not be recreated using the method specified.

Our method of attempting to recreate the oops (which appears reliably in an application we have written) does indeed cause an immediate reboot, but we now think for a different reason. The exact commands are set out above, and the network setup was a desktop with eth0 on DHCP (i.e. nothing very exciting). We were not passing a -f argument to lxc-create, and we now guess that this runs /sbin/init (perhaps inside the container). ^C then kills the container's /sbin/init, which causes the system to reboot. It would probably be a good thing if, in the absence of a config file (which the manpage describes as optional), lxc-create did not run /sbin/init by default (if that's what it's doing) but ran /bin/sh or something similar, if only on the principle of least surprise.

From my point of view the bug is really the oops though. It looks like we need to find another way to duplicate that.

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-09:

#5

oops.txt (16.3 KiB, text/plain)

OK, we are much closer to repeating this oops. Ignore the original instructions - that seems to be a surprising lxc feature rather than the oops.

In order to repeat, you need to have
a) a container
b) which is forwarding IPv4 traffic from one interface in the container to another (2 veth interfaces in this case) - one ping packet per second will do
c) iptables with an IP conntrack rule.
d) delete the container (it doesn't matter if you delete the iptables rule first and sleep for a couple of seconds).

You then get the oops above. This also occurs on

root@node-10-157-128-101:~# uname -a
Linux node-10-157-128-101 3.0.0-10-server #16-Ubuntu SMP Fri Sep 2 18:51:05 UTC 2011 x86_64 GNU/Linux

I have attached a double oops from that kernel too.
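
Steps (a) and (d) come down to creating a network namespace and then letting it be torn down. A minimal C sketch of just that lifecycle follows, assuming a Linux host and root privileges (CAP_SYS_ADMIN). The helper name `netns_create_and_destroy` is hypothetical, and the forwarded traffic and iptables conntrack rule from steps (b) and (c) are deliberately left out, so on its own it is not expected to reproduce the oops.

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEWNET */
#include <sys/syscall.h>   /* SYS_unshare */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: enter a fresh network namespace in a child process,
 * then let the child exit so the kernel can tear the namespace down again.
 *
 * Returns 0 if the namespace was created, the errno value from unshare
 * (for example EPERM without root) if it was not, or -1 if fork/waitpid
 * failed or the child died abnormally.
 */
int netns_create_and_destroy(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the raw syscall is equivalent to unshare(CLONE_NEWNET). */
        if (syscall(SYS_unshare, (unsigned long)CLONE_NEWNET) != 0)
            _exit(errno > 0 && errno < 256 ? errno : 255);
        /* The child is the only user of the new namespace, so exiting
         * drops the last reference to it. */
        _exit(0);
    }

    /* Parent: collect the child's result. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling `netns_create_and_destroy()` in a loop, with traffic and a conntrack rule set up inside the child before it exits, would be one way to approximate the sequence above; that extension is an assumption and is not taken from the attached script.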

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10:

#6

Perl program to demonstrate crash repeatably (2.9 KiB, text/plain)

OK I can now replicate this at will with a perl script on vanilla Oneiric. This locks the machine hard. You will need a console to the virtual machine to see the Oops - it doesn't get written to disk as the lockup is immediate.

Changed in linux-lts-backport-natty (Ubuntu):
status:	Incomplete → New

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10:

#7

Patch to fix oops (938 bytes, text/plain)

The attached patch fixes the issue.

The oops is called from cleanup_net when the namespace is destroyed. conntrack iterates through outstanding events and calls death_by_timeout on each of them, which in turn produces a call to ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which oopses because net->nfnl is NULL.

I made the container through (essentially) 'unshare -n'; I didn't explicitly set up netlink, but I presume it was set up else net->nfnl would have been NULL earlier (i.e. when an earlier connection timed out). This would thus suggest that net->nfnl is made NULL during the destruction of the container, which I think is done by nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the opposite order to that in which the relevant register_pernet_subsys calls are made, and both nf_conntrack and nfnetlink_net_ops register their relevant subsystems. If nfnetlink_net_ops registered later than nf_conntrack, then its exit routine would have been called first, which would cause the oops described. I am not sure there is anything to prevent this happening in a container environment.
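
The ordering argument can be illustrated with a small user-space model. This is only a sketch: every `toy_` name is hypothetical, the structs are stand-ins rather than the kernel's `struct net` or `struct pernet_operations`, and the model records the NULL pointer instead of dereferencing it.

```c
#include <stddef.h>

#define TOY_MAX_OPS 8

/* Toy stand-in for struct net: only the nfnetlink pointer matters here. */
struct toy_order_net {
    void *nfnl;           /* non-NULL while "nfnetlink" is alive */
    int saw_null_nfnl;    /* set when "conntrack" cleanup finds nfnl == NULL */
};

struct toy_pernet_ops {
    const char *name;
    void (*exit_fn)(struct toy_order_net *net);
};

static const struct toy_pernet_ops *toy_ops[TOY_MAX_OPS];
static size_t toy_ops_count;

/* Record a subsystem in registration order. */
static int toy_register(const struct toy_pernet_ops *ops)
{
    if (toy_ops_count == TOY_MAX_OPS)
        return -1;
    toy_ops[toy_ops_count++] = ops;
    return 0;
}

/* Tear down in the opposite order to registration. */
static void toy_cleanup_net(struct toy_order_net *net)
{
    for (size_t i = toy_ops_count; i-- > 0; )
        toy_ops[i]->exit_fn(net);
}

static void toy_nfnetlink_exit(struct toy_order_net *net)
{
    net->nfnl = NULL;
}

static void toy_conntrack_exit(struct toy_order_net *net)
{
    /* The path described above would use net->nfnl here; the toy only
     * records whether it was already NULL. */
    if (net->nfnl == NULL)
        net->saw_null_nfnl = 1;
}

static const struct toy_pernet_ops toy_nfnetlink_ops = { "nfnetlink", toy_nfnetlink_exit };
static const struct toy_pernet_ops toy_conntrack_ops = { "nf_conntrack", toy_conntrack_exit };

/* Returns 1 if this registration order leaves conntrack cleanup with a NULL nfnl. */
int toy_order_hits_null(int nfnetlink_registered_first)
{
    static int fake_socket;
    struct toy_order_net net = { &fake_socket, 0 };

    toy_ops_count = 0;
    if (nfnetlink_registered_first) {
        toy_register(&toy_nfnetlink_ops);
        toy_register(&toy_conntrack_ops);
    } else {
        toy_register(&toy_conntrack_ops);
        toy_register(&toy_nfnetlink_ops);
    }
    toy_cleanup_net(&net);
    return net.saw_null_nfnl;
}
```

In this model `toy_order_hits_null(1)` reports no problem, while `toy_order_hits_null(0)` (nfnetlink registered after conntrack) leaves the conntrack cleanup looking at a NULL pointer, which matches the failure described.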

Whilst there's perhaps a more complex problem revolving around ordering of subsystem deinit, it seems to me that missing a netlink event on a container that is dying is not a disaster. An early check for net->nfnl being non-NULL in ctnetlink_conntrack_event appears to fix this. There may remain a potential race condition if it becomes NULL immediately after being checked (I am not sure any lock is held at this point or how synchronisation for subsystem deinitialization works).
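
The check-then-use concern can be sketched in user space with C11 atomics. This is an illustration under assumptions, not the kernel code: `race_net`, `race_sock` and `toy_conntrack_event` are hypothetical names, and a single atomic load only guarantees that the check and the later use see the same pointer value. It does nothing to keep the pointed-to object alive, so whatever lifetime rules apply in the kernel would still be needed there.

```c
#include <stdatomic.h>
#include <stddef.h>

struct race_sock {
    int listeners;
};

struct race_net {
    _Atomic(struct race_sock *) nfnl;
};

/* Returns 1 if an event would be delivered, 0 if it is skipped. */
int toy_conntrack_event(struct race_net *net)
{
    /* Read the pointer once so the check and the use agree. */
    struct race_sock *nfnl = atomic_load(&net->nfnl);

    if (nfnl == NULL)
        return 0;   /* namespace is going away: drop the event */
    if (nfnl->listeners == 0)
        return 0;   /* nobody is listening */
    return 1;
}
```

Checking `net->nfnl` and then reading it a second time would reopen the window in which it can become NULL between the two reads; working from the local copy avoids that particular problem in the model.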

This patch should apply on everything from 2.6.26 (if not before) onwards; it appears to be a problem on all kernels. This was taken against Ubuntu-3.0.0-11.17. I have torture-tested it with the above perl script for 15 minutes or so; the perl script hung the machine within 20 seconds without this patch.

Alex Bligh (ubuntu-alex-org) on 2011-09-10

description:

updated

Brad Figg (brad-figg) wrote on 2011-09-10: Missing required logs.

#8

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 843892

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete
tags:	added: natty

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10: AcpiTables.txt

#9

AcpiTables.txt (37.9 KiB, text/plain)

apport information

tags:	added: apport-collected oneiric
description:	updated

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10: BootDmesg.txt

#10

BootDmesg.txt (24.1 KiB, text/plain)

apport information

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10: CurrentDmesg.txt

#11

CurrentDmesg.txt (16.2 KiB, text/plain)

apport information

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10: Lspci.txt

#12

Lspci.txt (4.5 KiB, text/plain)

apport information

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10: ProcCpuinfo.txt

#13

ProcCpuinfo.txt (1.1 KiB, text/plain)

apport information

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10: ProcInterrupts.txt

#14

ProcInterrupts.txt (1.5 KiB, text/plain)

apport information

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10: ProcModules.txt

#15

ProcModules.txt (1022 bytes, text/plain)

apport information

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10: UdevDb.txt

#16

UdevDb.txt (62.3 KiB, text/plain)

apport information

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10: UdevLog.txt

#17

UdevLog.txt (139.8 KiB, text/plain)

apport information

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-10: WifiSyslog.txt

#18

WifiSyslog.txt (462.7 KiB, text/plain)

apport information

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Ubuntu Foundations Team Bug Bot (crichton) wrote on 2011-09-10:

#19

The attachment "Patch to fix oops" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch, you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of ubuntu-sponsors, please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags:

added: patch

Brad Figg (brad-figg) wrote on 2011-09-14: Test with newer development kernel (3.0.0-11.18)

#20

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel currently in the release pocket than the one you tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

Thank you for your help.

Changed in linux (Ubuntu):
status:	Confirmed → Incomplete
tags:	added: kernel-request-3.0.0-11.18

Alex Bligh (ubuntu-alex-org) on 2011-09-14

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-14:

#21

This patch has been added to the -mm tree. See the mail below.

---------- Forwarded Message ----------
Date: 14 September 2011 13:43:36 -0700
From: <email address hidden>
To: <email address hidden>
CC: <email address hidden>, <email address hidden>, <email address hidden>, <email address hidden>
Subject: + net-netfilter-nf_conntrack_netlinkc-fix-oops-on-container-destroy.patch added to -mm tree

The patch titled
net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy
has been added to the -mm tree. Its filename is
net-netfilter-nf_conntrack_netlinkc-fix-oops-on-container-destroy.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
out what to do about this

The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/

------------------------------------------------------
Subject: net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy
From: Alex Bligh <email address hidden>

Problem:

A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.

A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.

Analysis:

The oops is called from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding events and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
oopses because net->nfnl is NULL.

The perl program generates the container through fork() then
clone(CLONE_NEWNET). It does not explicitly set up netlink, but I
presume it was set up else net->nfnl would have been NULL earlier
(i.e. when an earlier connection timed out). This would thus suggest
that net->nfnl is made NULL during the destruction of the container,
which I think is done by nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the opposite
order to which the relevant register_pernet_subsys calls are called,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nf_conntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.

Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).

Patch:

The ...

Leanne,

Will do. In the absence of kernel.org, please also see the message below, suggesting this is queued for 3.2.

Alex

---------- Forwarded Message ----------
Date: 20 September 2011 13:47:55 -0700
From: akpm@google.com
To: kaber@trash.net
CC: davem@davemloft.net, akpm@linux-foundation.org, akpm@google.com, alex@alex.org.uk, stable@kernel.org
Subject: [patch for 3.2 1/1] net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy

From: Alex Bligh <alex@alex.org.uk>
Subject: net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy

Problem:

A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.

A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.

Analysis:

The oops is called from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding events and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
oopses because net->nfnl is NULL.

The perl program generates the container through fork() then
clone(CLONE_NEWNET). It does not explicitly set up netlink, but I
presume it was set up else net->nfnl would have been NULL earlier
(i.e. when an earlier connection timed out). This would thus suggest
that net->nfnl is made NULL during the destruction of the container,
which I think is done by nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the opposite
order to which the relevant register_pernet_subsys calls are called,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nf_conntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.

Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).

Patch:

The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the above perl script for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.

Applicability:

If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result. Therefore, checking is a reasonable
thing to do unless it can be proven that net->nfnl will never be NULL.

Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on
container destroy

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

net/netfilter/nf_conntrack_netlink.c |    5 +++++
 1 file changed, 5 insertions(+)

diff -puN net/netfilter/nf_conntrack_netlink.c~net-netfilter-nf_conntrack_netlinkc-fix-oops-on-container-destroy net/netfilter/nf_conntrack_netlink.c
--- a/net/netfilter/nf_conntrack_netlink.c~net-netfilter-nf_conntrack_netlinkc-fix-oops-on-container-destroy
+++ a/net/netfilter/nf_conntrack_netlink.c
@@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int e
 		return 0;
 
 	net = nf_ct_net(ct);
+
+	/* container deinit, netlink may have died before death_by_timeout */
+	if (!net->nfnl)
+		return 0;
+
 	if (!item->report && !nfnetlink_has_listeners(net, group))
 		return 0;
 
_

---------- End Forwarded Message ----------

-- 
Alex Bligh

Alex Bligh (ubuntu-alex-org) wrote on 2011-09-21:

#24

Leann,

The kernel at
http://people.canonical.com/~ogasawara/lp843892/oneiric/

seems to work fine.

Thanks for your work on this! (& apologies for spelling your name wrong last time)

Alex

Leann Ogasawara (leannogasawara) wrote on 2011-09-21:

#25

Thanks for the quick testing turn around. I'll get this submitted to the Ubuntu kernel-team mailing list.

Leann Ogasawara (leannogasawara) on 2011-09-21

description:

updated

Leann Ogasawara (leannogasawara) wrote on 2011-09-21:

#26

https://lists.ubuntu.com/archives/kernel-team/2011-September/017129.html

description:	updated
Changed in linux (Ubuntu Natty):
assignee:	nobody → Leann Ogasawara (leannogasawara)
importance:	Undecided → Medium
status:	New → In Progress

Leann Ogasawara (leannogasawara) wrote on 2011-09-22:

#27

Patch has been applied to both the Oneiric and Natty git repos. Marking this Fix Committed. Thanks.

Changed in linux (Ubuntu Natty):
status:	In Progress → Fix Committed
Changed in linux (Ubuntu Oneiric):
status:	In Progress → Fix Committed

Leann Ogasawara (leannogasawara) wrote on 2011-09-22:

#28

Hi Alex,

I just want to give you a heads up that once this is uploaded to the natty-proposed pocket, a comment will be posted here (by a bot) to request testing (instructions will be supplied). There is a strict SRU policy requiring test confirmation of all uploaded patches: if there is no test feedback from the original bug reporter (or another affected user), the patches will be dropped. Thanks.

Launchpad Janitor (janitor) wrote on 2011-09-23:

#29

This bug was fixed in the package linux - 3.0.0-12.19

---------------
linux (3.0.0-12.19) oneiric; urgency=low

[ Alex Bligh ]

  * SAUCE: (drop after v3.1) net/netfilter/nf_conntrack_netlink.c: fix Oops
    on container destroy
    - LP: #843892

[ Andy Whitcroft ]

  * [Config] standardise on HZ=250
  * SAUCE: headers_install: fix #include "..." usage for userspace
    - LP: #824377
  * make module-inclusion selection retain the left overs
  * add a new linux-image-extras package for virtual

[ edwin_rong ]

  * SAUCE: Staging: add driver for Realtek RTS5139 cardreader
    - LP: #824273

[ Greg Kroah-Hartman ]

  * SAUCE: staging: rts5139: add vmalloc.h to some files to fix the build.
    - LP: #824273

[ Jesse Sung ]

  * SAUCE: Unregister input device only if it is registered
    - LP: #839238

[ Keng-Yu Lin ]

  * [Config] Enable CONFIG_RTS5139=m on i386/amd64
    - LP: #824273

[ Leann Ogasawara ]

  * SAUCE: x86: reboot: Make Dell Optiplex 990 use reboot=pci
    - LP: #768039
  * SAUCE: x86: reboot: Make Dell Latitude E6220 use reboot=pci
    - LP: #838402

[ Ming Lei ]

  * SAUCE: ata: make DVD drive recognisable on systems with Sandybridge CPT
    chipset
    - LP: #794642

[ Paolo Pisati ]

  * [Config] Compile-in vfat support for armel
    - LP: #853783

[ Randy Dunlap ]

  * SAUCE: staging: fix rts5139 depends & build
    - LP: #824273

[ Tim Gardner ]

  * [Config] Fix binary-% build target
  * SAUCE: (drop after 3.0.0) OMAP3 and 4 hwmod I2C units only allow 16 bit
    access
    - LP: #852225

[ Upstream Kernel Changes ]

  * hfsplus: Fix kfree of wrong pointers in hfsplus_fill_super() error path
    - LP: #854987
  * rt2x00: Serialize TX operations on a queue.
    - LP: #855239
-- Leann Ogasawara <email address hidden> Wed, 14 Sep 2011 06:14:30 -0700

Changed in linux (Ubuntu Oneiric):
status:	Fix Committed → Fix Released

Herton R. Krzesinski (herton) wrote on 2011-10-04:

#30

This bug is awaiting verification that the kernel for Natty in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-natty' to 'verification-done-natty'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags:

added: verification-needed-natty

HeinMueck (cperz) wrote on 2011-10-05:

#31

What should be the version of the proposed kernel? I see linux-image-2.6.38-9-generic in natty-proposed packages, whereas in natty I see linux-image-2.6.38-10-generic. Are you sure it's available?

HeinMueck (cperz) wrote on 2011-10-05:

#32

K, packages don't seem to contain it, but at least when I upgrade I get 2.6.38-11.50. Is that the kernel that should be tested?

Leann Ogasawara (leannogasawara) wrote on 2011-10-05:

#33

It should be the 2.6.38-12.51 kernel in the natty-proposed pocket that contains the fix for this issue. Are you sure you enabled natty-proposed?

https://launchpad.net/ubuntu/+source/linux/2.6.38-12.51

HeinMueck (cperz) wrote on 2011-10-07:

#34

Found it - looked into main instead of universe. Surprise surprise :) It's working for me, although once I saw a reboot instead of a crash. But I was not able to reproduce that. Anyone else?

Launchpad Janitor (janitor) wrote on 2011-10-07:

#35

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-backport-natty (Ubuntu Natty):
status:	New → Confirmed
Changed in linux-lts-backport-natty (Ubuntu):
status:	New → Confirmed

Herton R. Krzesinski (herton) wrote on 2011-10-07:

#37

@HeinMueck: perhaps your reboot was the issue Alex described in comment #4 ? (not sure if you tested with containers or the testcase in comment #6)

Can you or Alex confirm the Natty 2.6.38-12.51 kernel fixes the oops, and the reboot is just the containers/init issue? If you are safe with it (oops is solved, testcase from comment #6), then change the tag verification-needed-natty to verification-done-natty, thanks.

Alex Bligh (ubuntu-alex-org) wrote on 2011-10-09:

#38

I have tested this on the supplied Natty kernel

root@test:/home/amb# uname -a
Linux test 2.6.38-12-server #51-Ubuntu SMP Wed Sep 28 16:07:08 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
root@test:/home/amb# dpkg --list | fgrep 2.6.38-12
ii linux-image-2.6.38-12-server 2.6.38-12.51 Linux kernel image for version 2.6.38 on x86_64

with the perl test program supplied, and it works fine.

I think I have changed the tag appropriately.

tags:

added: verification-done-natty
removed: verification-needed-natty

Chaskiel Grundman (cg2v) wrote on 2011-10-19:

#39

testns-2.6.38-12.51.log (12.6 KiB, text/plain)

When I run the perl script on 2.6.38-12.51, the system does not crash, but the script doesn't behave the same as with 2.6.38-11.51.

The pings fail, and every other run through startcontainer doesn't even set up the veth devices correctly. In addition, my lxc setup still triggers a double-oops-and-hang involving cleanup_net (though it's not the same oops, and I don't think the conntrack module is in the backtrace anymore; it certainly isn't in the second backtrace).

Alex Bligh (ubuntu-alex-org) wrote on 2011-10-20:

#40

Chaskiel,

Your logs show the pings going through in every other run. That's to be expected. My script has poor error handling, and relies on various sleep statements to set things up in order - it's meant to reliably trigger the bug, rather than reliably transmit data. Feel free to fix it up :-)

When you say "When I run the perl script on 2.6.38-12.51, the system does not crash, but the script doesn't behave the same as with 2.6.38-11.51", I would expect you to get an oops on 2.6.38-11.51, as that's without my patch. So I'm not sure what the word "but" means here, as opposed to "and".

I think that in your last sentence, starting "In addition", you are saying that even with 2.6.38-12.51 you can still trigger a container-related oops, but with a different, non-conntrack-related traceback (so probably a different oops), and triggered by a different test case. That should probably be handled separately.

However, when I was looking for this bug, I noticed there were other places (including one in the same file) where nfnetlink_has_listeners() was being called without checking net->nfnl for NULL; in many of them, no doubt, net->nfnl can't be NULL. If it's an oops at the same place (I don't think you attached the oops), you might try fixing it by putting in the obvious check.
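
One way to apply "the obvious check" at every call site at once is to route callers through a guarded wrapper. The sketch below is a user-space illustration only: the `guard_` types and functions are hypothetical stand-ins, not the kernel's nfnetlink API, and whether a wrapper is preferable to per-site checks in the real code is left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct guard_sock {
    uint32_t groups;   /* bit n set: someone listens on group n */
};

struct guard_net {
    struct guard_sock *nfnl;
};

/* Stand-in for an unguarded lookup: dereferences net->nfnl unconditionally. */
static bool guard_has_listeners_raw(const struct guard_net *net, unsigned int group)
{
    return (net->nfnl->groups & (UINT32_C(1) << group)) != 0;
}

/* Hypothetical wrapper: reports "no listeners" instead of touching a NULL nfnl. */
bool guard_has_listeners(const struct guard_net *net, unsigned int group)
{
    if (net == NULL || net->nfnl == NULL || group >= 32)
        return false;
    return guard_has_listeners_raw(net, group);
}
```

Treating a missing socket as "no listeners" follows the reasoning given for the original patch: an event lost on a namespace that is being destroyed is acceptable, whereas a NULL dereference is not.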

Chaskiel Grundman (cg2v) wrote on 2011-10-20:

#41

> the script doesn't behave the same as with 2.6.38-11
specifically, in -11, before it crashed, I didn't see the "destination host unreachable" before the successful ping responses. I thought that was strange, and that perhaps some other change in -12 changed some timing.

>That should probably be handled separately.
OK. If I can reduce my testcase, I'll open a new report. I don't have the oops, in part because there was a recursive fault which possibly hides the real issue, and in part because the stacktrace was long and scrolled important bits off the screen. I suppose I should test inside vmware with a serial console... I can tell you that the second oops did not have any modules in the backtrace. I _think_ that cleanup_net and do_error were in adjacent stack frames.

Launchpad Janitor (janitor) wrote on 2011-10-25:

#42

This bug was fixed in the package linux - 2.6.38-12.51

---------------
linux (2.6.38-12.51) natty-proposed; urgency=low

[ Herton R. Krzesinski ]

  * Release Tracking Bug
    - LP: #860832

[ Alex Bligh ]

  * SAUCE: net/netfilter/nf_conntrack_netlink.c: fix Oops on container
    destroy
    - LP: #843892

[ Jesse Sung ]

  * SAUCE: Unregister input device only if it is registered
    - LP: #839238

[ Leann Ogasawara ]

  * SAUCE: x86: reboot: Make Dell Latitude E6220 use reboot=pci
    - LP: #838402
  * SAUCE: x86: reboot: Make Dell Latitude E6520 use reboot=pci
    - LP: #833705

[ Ming Lei ]

  * SAUCE: fireware: add NO_MSI quirks for o2micro controller
    - LP: #801719

[ Stefan Bader ]

  * [Config] Include all filesystem modules for virtual
    - LP: #761809

[ Tim Gardner ]

  * [Config] kernel preparation cannot be parallelized
  * [Config] Linearize module/abi checks
  * [Config] Linearize and simplify tree preparation rules
  * [Config] Build kernel image in parallel with modules
  * [Config] Set concurrency for kmake invocations
  * [Config] Improve install-arch-headers speed
  * [Config] Fix binary-perarch dependencies
  * [Config] Removed stamp-flavours target
  * [Config] Serialize binary indep targets
  * [Config] Use build stamp directly
  * [Config] Restore prepare-% target
  * [Config] Fix binary-% build target

[ Upstream Kernel Changes ]

  * Revert "drm/i915: disable PCH ports if needed when disabling a CRTC"
    - LP: #814325, #838181
  * drm/i915: restore only the mode of this driver on lastclose (v2)
    - LP: #848687
  * cifs: fix possible memory corruption in CIFSFindNext, CVE-2011-3191
    - LP: #834135
    - CVE-2011-3191
  * befs: Validate length of long symbolic links, CVE-2011-2928
    - LP: #834124
    - CVE-2011-2928
  * gro: Only reset frag0 when skb can be pulled, CVE-2011-2723
    - LP: #844371
    - CVE-2011-2723
  * inet_diag: fix inet_diag_bc_audit(), CVE-2011-2213
    - LP: #838421
    - CVE-2011-2213
  * si4713-i2c: avoid potential buffer overflow on si4713, CVE-2011-2700
    - LP: #844370
    - CVE-2011-2700
  * Bluetooth: Prevent buffer overflow in l2cap config request,
    CVE-2011-2497
    - LP: #838423
    - CVE-2011-2497
  * crypto: Move md5_transform to lib/md5.c, CVE-2011-3188
    - LP: #834129
    - CVE-2011-3188
  * net: Compute protocol sequence numbers and fragment IDs using MD5,
    CVE-2011-3188
    - LP: #834129
    - CVE-2011-3188
  * x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS
    - LP: #760131
  * x86, intel, power: Correct the MSR_IA32_ENERGY_PERF_BIAS message
    - LP: #760131
  * rt2x00: Serialize TX operations on a queue.
    - LP: #855239
  * ext4: Fix max file size and logical block counting of extent format
    file, CVE-2011-2695
    - LP: #819574
    - CVE-2011-2695
-- Herton Ronaldo Krzesinski <email address hidden> Tue, 27 Sep 2011 16:19:57 -0300

This bug was fixed in the package linux - 2.6.38-12.51

---------------
linux (2.6.38-12.51) natty-proposed; urgency=low

[Herton R. Krzesinski]

* Release Tracking Bug
    - LP: #860832

[ Alex Bligh ]

* SAUCE: net/netfilter/nf_conntrack_netlink.c: fix Oops on container
    destroy
    - LP: #843892

[ Jesse Sung ]

* SAUCE: Unregister input device only if it is registered
    - LP: #839238

[ Leann Ogasawara ]

* SAUCE: x86: reboot: Make Dell Latitude E6220 use reboot=pci
    - LP: #838402
  * SAUCE: x86: reboot: Make Dell Latitude E6520 use reboot=pci
    - LP: #833705

[ Ming Lei ]

* SAUCE: fireware: add NO_MSI quirks for o2micro controller
    - LP: #801719

[ Stefan Bader ]

* [Config] Include all filesystem modules for virtual
    - LP: #761809

[ Tim Gardner ]

* [Config] kernel preparation cannot be parallelized
  * [Config] Linearize module/abi checks
  * [Config] Linearize and simplify tree preparation rules
  * [Config] Build kernel image in parallel with modules
  * [Config] Set concurrency for kmake invocations
  * [Config] Improve install-arch-headers speed
  * [Config] Fix binary-perarch dependencies
  * [Config] Removed stamp-flavours target
  * [Config] Serialize binary indep targets
  * [Config] Use build stamp directly
  * [Config] Restore prepare-% target
  * [Config] Fix binary-% build target

[ Upstream Kernel Changes ]

* Revert "drm/i915: disable PCH ports if needed when disabling a CRTC"
    - LP: #814325, #838181
  * drm/i915: restore only the mode of this driver on lastclose (v2)
    - LP: #848687
  * cifs: fix possible memory corruption in CIFSFindNext, CVE-2011-3191
    - LP: #834135
    - CVE-2011-3191
  * befs: Validate length of long symbolic links, CVE-2011-2928
    - LP: #834124
    - CVE-2011-2928
  * gro: Only reset frag0 when skb can be pulled, CVE-2011-2723
    - LP: #844371
    - CVE-2011-2723
  * inet_diag: fix inet_diag_bc_audit(), CVE-2011-2213
    - LP: #838421
    - CVE-2011-2213
  * si4713-i2c: avoid potential buffer overflow on si4713, CVE-2011-2700
    - LP: #844370
    - CVE-2011-2700
  * Bluetooth: Prevent buffer overflow in l2cap config request,
    CVE-2011-2497
    - LP: #838423
    - CVE-2011-2497
  * crypto: Move md5_transform to lib/md5.c, CVE-2011-3188
    - LP: #834129
    - CVE-2011-3188
  * net: Compute protocol sequence numbers and fragment IDs using MD5,
    CVE-2011-3188
    - LP: #834129
    - CVE-2011-3188
  * x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS
    - LP: #760131
  * x86, intel, power: Correct the MSR_IA32_ENERGY_PERF_BIAS message
    - LP: #760131
  * rt2x00: Serialize TX operations on a queue.
    - LP: #855239
  * ext4: Fix max file size and logical block counting of extent format
    file, CVE-2011-2695
    - LP: #819574
    - CVE-2011-2695

 -- Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>  Tue, 27 Sep 2011 16:19:57 -0300

Changed in linux (Ubuntu Natty):
status:	Fix Committed → Fix Released

Launchpad Janitor (janitor) wrote on 2011-10-25:

#43

This bug was fixed in the package linux-lts-backport-natty - 2.6.38-12.51~lucid1

---------------
linux-lts-backport-natty (2.6.38-12.51~lucid1) lucid-proposed; urgency=low

[ Herton R. Krzesinski ]

  * Release Tracking Bug
    - LP: #862556

[ Alex Bligh ]

  * SAUCE: net/netfilter/nf_conntrack_netlink.c: fix Oops on container
    destroy
    - LP: #843892

[ Jesse Sung ]

  * SAUCE: Unregister input device only if it is registered
    - LP: #839238

[ Leann Ogasawara ]

  * SAUCE: x86: reboot: Make Dell Latitude E6220 use reboot=pci
    - LP: #838402
  * SAUCE: x86: reboot: Make Dell Latitude E6520 use reboot=pci
    - LP: #833705

[ Ming Lei ]

  * SAUCE: firewire: add NO_MSI quirks for o2micro controller
    - LP: #801719

[ Stefan Bader ]

  * [Config] Include all filesystem modules for virtual
    - LP: #761809

[ Tim Gardner ]

  * [Config] kernel preparation cannot be parallelized
  * [Config] Linearize module/abi checks
  * [Config] Linearize and simplify tree preparation rules
  * [Config] Build kernel image in parallel with modules
  * [Config] Set concurrency for kmake invocations
  * [Config] Improve install-arch-headers speed
  * [Config] Fix binary-perarch dependencies
  * [Config] Removed stamp-flavours target
  * [Config] Serialize binary indep targets
  * [Config] Use build stamp directly
  * [Config] Restore prepare-% target
  * [Config] Fix binary-% build target

[ Upstream Kernel Changes ]

  * Revert "drm/i915: disable PCH ports if needed when disabling a CRTC"
    - LP: #814325, #838181
  * drm/i915: restore only the mode of this driver on lastclose (v2)
    - LP: #848687
  * cifs: fix possible memory corruption in CIFSFindNext, CVE-2011-3191
    - LP: #834135
    - CVE-2011-3191
  * befs: Validate length of long symbolic links, CVE-2011-2928
    - LP: #834124
    - CVE-2011-2928
  * gro: Only reset frag0 when skb can be pulled, CVE-2011-2723
    - LP: #844371
    - CVE-2011-2723
  * inet_diag: fix inet_diag_bc_audit(), CVE-2011-2213
    - LP: #838421
    - CVE-2011-2213
  * si4713-i2c: avoid potential buffer overflow on si4713, CVE-2011-2700
    - LP: #844370
    - CVE-2011-2700
  * Bluetooth: Prevent buffer overflow in l2cap config request,
    CVE-2011-2497
    - LP: #838423
    - CVE-2011-2497
  * crypto: Move md5_transform to lib/md5.c, CVE-2011-3188
    - LP: #834129
    - CVE-2011-3188
  * net: Compute protocol sequence numbers and fragment IDs using MD5,
    CVE-2011-3188
    - LP: #834129
    - CVE-2011-3188
  * x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS
    - LP: #760131
  * x86, intel, power: Correct the MSR_IA32_ENERGY_PERF_BIAS message
    - LP: #760131
  * rt2x00: Serialize TX operations on a queue.
    - LP: #855239
  * ext4: Fix max file size and logical block counting of extent format
    file, CVE-2011-2695
    - LP: #819574
    - CVE-2011-2695

linux (2.6.38-11.50) natty-proposed; urgency=low

[ Herton R. Krzesinski ]

  * Release Tracking Bug
    - LP: #848246

[ Upstream Kernel Changes ]

  * Revert "eCryptfs: Handle failed metadata read in lookup"
  * Revert "KVM: fix kvmclock regression due to missing clock update"
  * Revert "ath9k: use split rx buffers to get rid of...

Ubuntu
linux-lts-backport-natty package

Repeatable kernel oops on container delete

Affects	Status	Importance	Assigned to
linux (Ubuntu)	Fix Released	Medium	Leann Ogasawara
Natty	Fix Released	Medium	Leann Ogasawara
Oneiric	Fix Released	Medium	Leann Ogasawara
linux-lts-backport-natty (Ubuntu)	Fix Released	Medium	Unassigned
Natty	Fix Released	Undecided	Unassigned
Oneiric	Fix Released	Medium	Unassigned

Changed in linux-lts-backport-natty (Ubuntu):
status:	Confirmed → Fix Released
