Linux Kernel crash in Netfilter both in Natty (2.6.38-8-server) and oneiric(3.0.0-13-server/3.0.0-14-server) kernels

Bug #905219 reported by Shyam
This bug affects 3 people
Affects         Status        Importance  Assigned to
Linux           Fix Released  High
linux (Ubuntu)  Fix Released  High        Unassigned
Natty           Fix Released  Medium      Chris J Arges
Oneiric         Fix Released  Medium      Chris J Arges
Precise         Fix Released  Undecided   Unassigned

Bug Description

SRU Justification:

Impact:
When running KVM with a few VM's with physical eth bridges and TAP interfaces connected between KVM (via libvirt) and the bridge, occasionally kernel panics will occur.

Fix: Cherry-pick from a504b86e718a425ea4a34e2f95b5cf0545ddfd8d. This is present in Precise onwards.

Testcase:
0) Download attached netperf.tgz testcase.
1) Launch 3 instances (for example Ubuntu Natty/Oneiric).
2) Assign a public IP to each instance.
3) Copy all files from netperf.tgz to /root on the 3 instances.
4) Run the following command on each instance:
$ /root/netserver -p 12865
5) Create a file named 'remote_hosts' with the following content and copy it to /root on the 1st of the 3 instances:
REMOTE_HOSTS[0]=<public IP of 2nd instance>
REMOTE_HOSTS[1]=<public IP of 3rd instance>
NUM_REMOTE_HOSTS=2
6) Run the following command on the 1st instance to start the test:
$ cd /root
$ PATH=$PATH:. ./runemomniaggdemo.sh | tee overall.log
7) The machine will eventually panic.
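
Step 5 of the testcase can be scripted. The following is a minimal Python sketch that renders the 'remote_hosts' file content for a list of peers. The function name `render_remote_hosts` and the example addresses (from the 192.0.2.0/24 documentation range) are illustrative assumptions and are not part of netperf.tgz.

```python
def render_remote_hosts(peer_ips):
    """Return the text of a 'remote_hosts' file for the given peer IPs.

    The layout mirrors step 5 of the testcase: one REMOTE_HOSTS[i] line
    per peer, followed by NUM_REMOTE_HOSTS. The function name is hypothetical.
    """
    lines = [f"REMOTE_HOSTS[{i}]={ip}" for i, ip in enumerate(peer_ips)]
    lines.append(f"NUM_REMOTE_HOSTS={len(peer_ips)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses; substitute the public IPs of instances 2 and 3.
    print(render_remote_hosts(["192.0.2.10", "192.0.2.11"]), end="")
```

Redirecting the output to /root/remote_hosts on the 1st instance gives the file that step 6 expects.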

--

On multiple servers running Ubuntu, we periodically hit this kernel panic in the netfilter flow path. The console freezes, and the only way to recover is to physically reboot the servers.

We run KVM with a few VMs on the physical servers. Bridges are created on the physical Ethernet interfaces, and TAP interfaces are added to the bridges. These TAP interfaces are connected to KVM through libvirt (we primarily use OpenStack for VM management).
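
For readers who want to picture the topology, here is a rough Python sketch that builds the iproute2 command lines for one bridge with a physical uplink and a set of TAP ports. In the reported setup libvirt and OpenStack create these devices, so this is only an approximation of the resulting layout. The function name and the interface names (`br0`, `eth0`, `tap0`, `tap1`) are hypothetical, and the commands are printed rather than executed because running them requires root.

```python
def bridge_setup_commands(bridge, physical_if, tap_ifs):
    """Build iproute2 command lines for a bridge with one uplink and N TAPs.

    All names are supplied by the caller; nothing is executed here.
    """
    cmds = [
        # Create the bridge and enslave the physical interface to it.
        ["ip", "link", "add", "name", bridge, "type", "bridge"],
        ["ip", "link", "set", "dev", physical_if, "master", bridge],
    ]
    for tap in tap_ifs:
        # One TAP device per VM NIC, attached to the same bridge.
        cmds.append(["ip", "tuntap", "add", "dev", tap, "mode", "tap"])
        cmds.append(["ip", "link", "set", "dev", tap, "master", bridge])
        cmds.append(["ip", "link", "set", "dev", tap, "up"])
    cmds.append(["ip", "link", "set", "dev", bridge, "up"])
    return cmds


if __name__ == "__main__":
    for cmd in bridge_setup_commands("br0", "eth0", ["tap0", "tap1"]):
        print(" ".join(cmd))
```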

This problem first happened on Natty with the 2.6.38-8-server kernel. After reading the links below, we thought it could be specific to the 2.6.38-8-server kernel bundled with Natty, and decided to move to Oneiric, which runs a 3.0 kernel.

http://lkml.indiana.edu/hypermail/linux/kernel/1106.0/00755.html
https://lkml.org/lkml/2011/6/1/754
http://www.spinics.net/lists/netfilter-devel/msg17239.html
https://lkml.org/lkml/2011/2/3/147

However, the exact same problem occurs with the Oneiric kernels. Please look at this issue as soon as possible. I will be happy to provide any additional information you need.

Thanks

[ 5445.359446] [<ffffffffa0395200>] ? br_nf_pre_routing_finish_bridge+0x60/0xd0 [bridge]
[ 5445.372671] [<ffffffffa0396808>] br_nf_pre_routing_finish+0x328/0x360 [bridge]
[ 5445.385984] [<ffffffffa0396b96>] br_nf_pre_routing+0x356/0x370 [bridge]
[ 5445.393255] [<ffffffff815165d5>] nf_iterate+0x85/0xc0
[ 5445.400827] [<ffffffffa038ffd0>] ? br_handle_local_finish+0x50/0x50 [bridge]
[ 5445.408643] [<ffffffff81516686>] nf_hook_slow+0x76/0x130
[ 5445.416438] [<ffffffffa038ffd0>] ? br_handle_local_finish+0x50/0x50 [bridge]
[ 5445.424202] [<ffffffffa0390424>] br_handle_frame+0x1b4/0x260 [bridge]
[ 5445.431824] [<ffffffffa0390270>] ? br_handle_frame_finish+0x2a0/0x2a0 [bridge]
[ 5445.446876] [<ffffffff814e9b8d>] __netif_receive_skb+0x17d/0x540
[ 5445.454815] [<ffffffff814ea191>] process_backlog+0xb1/0x190
[ 5445.462790] [<ffffffff814eb2f4>] net_rx_action+0x134/0x290
[ 5445.470445] [<ffffffff81065ea8>] __do_softirq+0xa8/0x210
[ 5445.478364] [<ffffffff8160831c>] call_softirq+0x1c/0x30
[ 5445.486145] <EOI> [<ffffffff8100c295>] ? do_softirq+0x65/0xa0
[ 5445.494076] [<ffffffff814eb7d8>] netif_rx_ni+0x28/0x30
[ 5445.501902] [<ffffffff8144c702>] tun_chr_aio_write+0x342/0x4f0
[ 5445.509771] [<ffffffff8144c3c0>] ? tun_chr_aio_read+0xd0/0xd0
[ 5445.517864] [<ffffffff81167db3>] do_sync_readv_writev+0xd3/0x110
[ 5445.526011] [<ffffffff81281ecc>] ? security_file_permission+0x2c/0xb0
[ 5445.534177] [<ffffffff811674b1>] ? rw_verify_area+0x61/0xf0
[ 5445.542339] [<ffffffff8116807d>] do_readv_writev+0xcd/0x1d0
[ 5445.550227] [<ffffffff811681bc>] vfs_writev+0x3c/0x50
[ 5445.558337] [<ffffffff8116831a>] sys_writev+0x4a/0xb0
[ 5445.566288] [<ffffffff81607102>] system_call_fastpath+0x16/0x1b
[ 5445.574267] ---[ end trace f585024e7045bff8 ]---
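
Read bottom-up, the trace above shows a writev() on the tun device (tun_chr_aio_write) handing a packet to netif_rx_ni, which runs the softirq receive path into the bridge and its netfilter pre-routing hook. When comparing traces from several hosts it can help to reduce them to symbol and module names. The following is a small Python sketch for that; the regular expression is an assumption based only on the line format shown here, and `Frame` and `parse_frames` are hypothetical names. Frames prefixed with "?" are addresses the kernel found on the stack but could not confirm as part of the call chain.

```python
import re
from typing import NamedTuple, Optional

FRAME_RE = re.compile(
    r"\[\s*[\d.]+\]\s+"              # timestamp, e.g. "[ 5445.359446]"
    r"(?:<\w+>\s+)?"                 # optional stack marker such as <EOI>
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"  # return address
    r"(?P<guess>\?\s+)?"             # "?" marks an unconfirmed frame
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"   # module name, absent for built-in code
)


class Frame(NamedTuple):
    symbol: str
    module: Optional[str]
    reliable: bool


def parse_frames(trace):
    """Extract (symbol, module, reliable) tuples from oops trace text."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(m["symbol"], m["module"], m["guess"] is None))
    return frames


if __name__ == "__main__":
    sample = (
        "[ 5445.372671] [<ffffffffa0396808>] "
        "br_nf_pre_routing_finish+0x328/0x360 [bridge]\n"
        "[ 5445.393255] [<ffffffff815165d5>] nf_iterate+0x85/0xc0\n"
    )
    for frame in parse_frames(sample):
        print(frame.symbol, frame.module or "(built-in)")
```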

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: linux-image-3.0.0-13-server 3.0.0-13.22
ProcVersionSignature: Ubuntu 3.0.0-13.22-server 3.0.6
Uname: Linux 3.0.0-13-server x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 1.23-0ubuntu4
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
CRDA: Error: [Errno 2] No such file or directory
Date: Fri Dec 16 09:51:41 2011
HibernationDevice: RESUME=UUID=5d83c5ae-1432-4ed0-be62-0b5bb54d8989
IwConfig: Error: [Errno 2] No such file or directory
MachineType: Dell Inc. PowerEdge R710
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_US:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.0.0-13-server root=UUID=6adff529-7713-48d3-9174-d7e8f7fe5053 ro splash quiet vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.0.0-13-server N/A
 linux-backports-modules-3.0.0-13-server N/A
 linux-firmware 1.60
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
UpgradeStatus: Upgraded to oneiric on 2011-12-09 (6 days ago)
dmi.bios.date: 01/31/2011
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 3.0.0
dmi.board.name: 00NH4P
dmi.board.vendor: Dell Inc.
dmi.board.version: A12
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr3.0.0:bd01/31/2011:svnDellInc.:pnPowerEdgeR710:pvr:rvnDellInc.:rn00NH4P:rvrA12:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R710
dmi.sys.vendor: Dell Inc.

Shyam (shyam-zadarastorage) wrote :
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: natty
Shyam (shyam-zadarastorage) wrote :

Please note that the stack trace we are hitting differs slightly from the one in the reference links about the 2.6.38-8 netfilter corruption.

We consistently hit the traces shown in the attachments (PNG files).

Joseph Salisbury (jsalisbury) wrote :

Hi Shyam,

Would it be possible for you to test the latest upstream kernel? It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . If possible, please test the latest v3.2-rcN kernel (not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed by the mainline kernel, please add the following tag: 'kernel-fixed-upstream-KERNEL-VERSION'. For example, if kernel version 3.2-rc1 fixed the issue, the tag would be: 'kernel-fixed-upstream-v3.2-rc1'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
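
The three tagging rules above amount to a small lookup. The following Python sketch spells it out; the function name `upstream_test_tag` and the outcome labels ("fixed", "still-broken", "unable-to-test") are hypothetical and only serve the illustration, while the tag strings come from the instructions above.

```python
def upstream_test_tag(outcome, kernel_version=None):
    """Return the Launchpad tag to add after testing a mainline kernel.

    outcome is one of "fixed", "still-broken", "unable-to-test"
    (labels invented for this sketch).
    """
    if outcome == "fixed":
        if not kernel_version:
            raise ValueError("a kernel version such as 'v3.2-rc1' is required")
        return f"kernel-fixed-upstream-{kernel_version}"
    tags = {
        "still-broken": "kernel-bug-exists-upstream",
        "unable-to-test": "kernel-unable-to-test-upstream",
    }
    if outcome not in tags:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return tags[outcome]


if __name__ == "__main__":
    print(upstream_test_tag("fixed", "v3.2-rc1"))
```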

Thanks in advance.

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key needs-upstream-testing
Shyam (shyam-zadarastorage) wrote : RE: [Bug 905219] Re: Linux Kernel crash in Netfilter both in Natty (2.6.38-8-server) and oneiric(3.0.0-13-server) kernels

Hi Joseph,

I will try this & get back to you.

--Shyam


Kiall Mac Innes (kiall) wrote : Re: Linux Kernel crash in Netfilter both in Natty (2.6.38-8-server) and oneiric(3.0.0-13-server) kernels

I've added the OpenStack nova project to the bug, as I'm another OpenStack user who is seeing this issue. Maybe someone there can detail exactly what OpenStack sets up and expects.

I've attached the complete panic trace output I obtained via netconsole.

> Linux stack03 3.0.0-14-server #23-Ubuntu SMP Mon Nov 21 20:49:05 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

I'll be paying a visit to the datacenter today once a delivery arrives, and I'll try the latest mainline kernel while I'm there.

summary: Linux Kernel crash in Netfilter both in Natty (2.6.38-8-server) and
- oneiric(3.0.0-13-server) kernels
+ oneiric(3.0.0-13-server/3.0.0-14-server) kernels
Kiall Mac Innes (kiall) wrote :

Just to show how often this has been occurring over the last day or so, I've attached an uptime graph from one of the servers.

Kiall Mac Innes (kiall) wrote :

This appears to be solved in the mainline v3.2-precise kernel[1]. Things have been much more stable since installing this kernel, and some other kernel issues I've had on those servers have gone away as well.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-precise/

tags: added: kernel-fixed-upstream-v3.2-precise
removed: needs-upstream-testing
Brian Waldon (bcwaldon)
no longer affects: nova
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Natty):
status: New → Confirmed
Changed in linux (Ubuntu Oneiric):
status: New → Confirmed
Chris J Arges (arges)
Changed in linux (Ubuntu Natty):
assignee: nobody → Chris J Arges (christopherarges)
Changed in linux (Ubuntu Oneiric):
assignee: nobody → Chris J Arges (christopherarges)
Changed in linux (Ubuntu Natty):
importance: Undecided → Medium
Changed in linux (Ubuntu Oneiric):
importance: Undecided → Medium
Chris J Arges (arges) wrote :

Hi,
So far we know this bug affects 3.0.0-13.22, but is fixed in v3.2.
It would be great to bisect this issue further by testing a 3.1 series kernel.
Refer to https://wiki.ubuntu.com/KernelMainlineBuilds .
If this bug is fixed by the mainline v3.1 kernel, please add the following tag 'kernel-fixed-upstream-KERNEL-VERSION'.

That would narrow down the set of patches that could contain the fix.

Thanks,

Chris J Arges (arges) wrote :

Another request: can you provide more detail on how to reproduce this bug, including any special networking settings or other relevant information? I need enough to reproduce it on my end.
Thanks,

Chris J Arges (arges) wrote :

Here is the upstream netfilter bug that was filed:
https://bugzilla.netfilter.org/bugzilla3/show_bug.cgi?id=765

Chris J Arges (arges)
Changed in linux (Ubuntu Natty):
status: Confirmed → Invalid
Changed in linux (Ubuntu Oneiric):
status: Confirmed → Invalid
Changed in linux (Ubuntu):
status: Triaged → Invalid
tags: added: kernel-fixed-upstream-precise
removed: kernel-fixed-upstream-v3.2-precise
tags: added: kernel-fixed-upstream precise
removed: kernel-fixed-upstream-precise
Chris J Arges (arges)
Changed in linux (Ubuntu Natty):
status: Invalid → Confirmed
Changed in linux (Ubuntu Oneiric):
status: Invalid → Confirmed
Changed in linux (Ubuntu):
status: Invalid → Won't Fix
status: Won't Fix → New
Chris J Arges (arges)
Changed in linux (Ubuntu Precise):
status: New → Fix Released
Changed in linux (Ubuntu):
status: New → Fix Released
Chris J Arges (arges) wrote :

Thanks to Juerg for bisecting and finding the proper cherry-pick: a504b86e718a425ea4a34e2f95b5cf0545ddfd8d.

A build for the Natty kernel can be found here:
http://people.canonical.com/~arges/lp905219/

Changed in linux (Ubuntu Natty):
status: Confirmed → In Progress
Chris J Arges (arges) wrote :

Reproducer script.

description: updated
Chris J Arges (arges)
description: updated
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Natty):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Oneiric):
status: Confirmed → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel for natty in -proposed solves the problem (2.6.38-15.64). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-natty' to 'verification-done-natty'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-natty
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel for oneiric in -proposed solves the problem (3.0.0-23.38). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-oneiric' to 'verification-done-oneiric'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
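
Before testing, it is worth confirming that the installed linux-image package is at least the -proposed version named in the verification request (2.6.38-15.64 for natty, 3.0.0-23.38 for oneiric). The following Python sketch does a simplified numeric comparison. It is an assumption that this is adequate for version strings of the shape shown in this bug; it does not implement full Debian version ordering, for which `dpkg --compare-versions` is the authoritative tool. The function names are hypothetical.

```python
import re


def version_key(version):
    """Turn '3.0.0-23.38' into (3, 0, 0, 23, 38) for numeric comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def includes_fix(installed, fixed_in):
    """True if the installed package version is at or past the fixed one."""
    return version_key(installed) >= version_key(fixed_in)


if __name__ == "__main__":
    # 3.0.0-13.22 is the version from the original report.
    print(includes_fix("3.0.0-13.22", "3.0.0-23.38"))
```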

tags: added: verification-needed-oneiric
Luis Henriques (henrix) wrote :

Chris (or anyone able to reproduce the issue), any chance of verifying that the -proposed kernel fixes the issue on both Oneiric and Natty? Thanks.

Chris J Arges (arges)
tags: added: verification-done-natty
removed: verification-needed-natty
Luis Henriques (henrix) wrote :

Since the fix for this bug was a clean cherry-pick from upstream and has been tested on Natty, I'm tagging this as verified in Oneiric.

tags: added: verification-done-oneiric
removed: verification-needed-oneiric
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-15.64

---------------
linux (2.6.38-15.64) natty-proposed; urgency=low

  [ Andy Whitcroft ]

  * fix ABI directory naming

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1019992

linux (2.6.38-15.63) natty-proposed; urgency=low

  [ Andy Whitcroft ]

  * No change upload to fix .ddeb generation in the PPA.

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1019992

linux (2.6.38-15.62) natty-proposed; urgency=low

  [Luis Henriques]

  * Release Tracking Bug
    - LP: #1019992

  [ Upstream Kernel Changes ]

  * tun: reserves space for network in skb
    - LP: #905219
  * KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS
    - LP: #1018440
 -- Andy Whitcroft <email address hidden> Fri, 06 Jul 2012 15:49:42 +0100

Changed in linux (Ubuntu Natty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.0.0-23.38

---------------
linux (3.0.0-23.38) oneiric-proposed; urgency=low

  [ Andy Whitcroft ]

  * No change upload to fix .ddeb generation in the PPA.

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1020623

linux (3.0.0-23.37) oneiric-proposed; urgency=low

  [Luis Henriques]

  * Release Tracking Bug
    - LP: #1020623

  [ Luis Henriques ]

  * SAUCE: (upstreamed) [media] ene_ir: Fix driver initialisation
    - LP: #1014800

  [ Upstream Kernel Changes ]

  * Revert "net: maintain namespace isolation between vlan and real device"
    - LP: #1013748
  * hwmon: (k10temp) Add support for AMD Trinity CPUs
    - LP: #1009086
  * hwmon: (fam15h_power) Increase output resolution
    - LP: #1009086
  * x86/amd: Re-enable CPU topology extensions in case BIOS has disabled it
    - LP: #1009087
  * SCSI: fix scsi_wait_scan
    - LP: #1013748
  * SCSI: Fix dm-multipath starvation when scsi host is busy
    - LP: #1013748
  * mm: consider all swapped back pages in used-once logic
    - LP: #1013748
  * mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race
    condition
    - LP: #1013748
  * iwlwifi: update BT traffic load states correctly
    - LP: #1013748
  * cifs: fix oops while traversing open file list (try #4)
    - LP: #1013748
  * PARISC: fix boot failure on 32-bit systems caused by branch stubs
    placed before .text
    - LP: #1013748
  * PARISC: fix TLB fault path on PA2.0 narrow systems
    - LP: #1013748
  * solos-pci: Fix DMA support
    - LP: #1013748
  * mac80211: fix ADDBA declined after suspend with wowlan
    - LP: #1013748
  * NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO
    - LP: #1013748
  * drm/radeon: fix XFX quirk
    - LP: #1013748
  * drm/i915: properly handle interlaced bit for sdvo dtd conversion
    - LP: #1013748
  * drm/i915: wait for a vblank to pass after tv detect
    - LP: #1013748
  * Bluetooth: btusb: Add USB device ID "0a5c 21e8"
    - LP: #1013748
  * Bluetooth: btusb: typo in Broadcom SoftSailing id
    - LP: #1013748
  * Add Foxconn / Hon Hai IDs for btusb module
    - LP: #1013748
  * Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C
    - LP: #1013748
  * ALSA: usb-audio: fix rate_list memory leak
    - LP: #1013748
  * vfs: umount_tree() might be called on subtree that had never made it
    - LP: #1013748
  * mtd: nand: fix scan_read_raw_oob
    - LP: #1013748
  * drm/radeon: properly program gart on rv740, juniper, cypress, barts,
    hemlock
    - LP: #1013748
  * drm/radeon: fix HD6790, HD6570 backend programming
    - LP: #1013748
  * drm/ttm: Fix spinlock imbalance
    - LP: #1013748
  * ipv4: Do not use dead fib_info entries.
    - LP: #1013748
  * ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
    - LP: #1013748
  * ipv6: fix incorrect ipsec fragment
    - LP: #1013748
  * l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case
    - LP: #1013748
  * pktgen: fix crash at module unload
    - LP: #1013748
  * pktgen: fix module unload for good
    - LP: #1013748
  * sctp: check cached dst before using it
    - LP: #1013748
  * skb: avoid unnecessary reallocations...


Changed in linux (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Changed in linux:
importance: Unknown → High
status: Unknown → Fix Released