Maverick on Hyper-V causes BUG: scheduling while atomic

Bug #752064 reported by Mike Sterling
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
linux (Ubuntu)   Invalid       Undecided   Unassigned
Maverick         Fix Released  Undecided   Tim Gardner

Bug Description

Binary package hint: linux-generic

If a user wants to run Linux in a Hyper-V virtual machine (Hyper-V is Microsoft's virtualization solution, included in Windows Server 2008 and higher), Microsoft provides a set of drivers that enable the use of the high-performance synthetic devices (the equivalent of Xen's PV drivers). Maverick includes some of our drivers by default - hv_vmbus, hv_netvsc, and hv_utils. However, that version of the kernel has a bug in the hv_netvsc driver that results in multiple issues - specifically, you'll see a number of errors in /var/log/messages referring to "scheduling while atomic". The issue has been fixed upstream as part of the cleanup we've been doing to exit staging, and we have created a patch against Maverick that fixes it.

[ 807.276091] BUG: scheduling while atomic: swapper/0/0x10000100
[ 807.277476] Modules linked in: parport_pc ppdev binfmt_misc hv_utils(C) hv_netvsc(C) psmouse lp serio_raw hv_vmbus(C) i2c_piix4 parport floppy tulip
[ 807.294414] Modules linked in: parport_pc ppdev binfmt_misc hv_utils(C) hv_netvsc(C) psmouse lp serio_raw hv_vmbus(C) i2c_piix4 parport floppy tulip
[ 807.336663]
[ 807.337981] Pid: 0, comm: swapper Tainted: G D C 2.6.35-22-generic #33-Ubuntu Virtual Machine/Virtual Machine
[ 807.339352] EIP: 0060:[<c012c21a>] EFLAGS: 00000246 CPU: 0
[ 807.340699] EIP is at native_safe_halt+0xa/0x10
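
To check whether a guest is hitting this, the log can be scanned for the message. The following is a minimal sketch only: the function name `find_atomic_bugs`, the regular expression, and the embedded sample are assumptions made for illustration, and the timestamp format is inferred from the excerpt above rather than from any specification.

```python
import re

# Matches kernel log lines such as:
#   [ 807.276091] BUG: scheduling while atomic: swapper/0/0x10000100
# The pattern is inferred from the excerpt in this report (an assumption).
BUG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: scheduling while atomic: (?P<ctx>\S+)"
)


def find_atomic_bugs(log_text):
    """Return (timestamp, context) for each 'scheduling while atomic' line."""
    return [
        (float(m.group("ts")), m.group("ctx"))
        for m in BUG_RE.finditer(log_text)
    ]


# Shortened sample taken from the excerpt in this report.
SAMPLE = """\
[ 807.276091] BUG: scheduling while atomic: swapper/0/0x10000100
[ 807.337981] Pid: 0, comm: swapper Tainted: G D C 2.6.35-22-generic #33-Ubuntu
[ 807.340699] EIP is at native_safe_halt+0xa/0x10
"""

if __name__ == "__main__":
    for ts, ctx in find_atomic_bugs(SAMPLE):
        print(f"{ts:.6f} {ctx}")
```

To run it against a real guest, read the contents of /var/log/messages (the file mentioned in the description) and pass the text to `find_atomic_bugs` instead of `SAMPLE`.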

Tim Gardner (timg-tpi)
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
status: In Progress → Invalid
assignee: Tim Gardner (timg-tpi) → nobody
Changed in linux (Ubuntu Maverick):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Tim Gardner (timg-tpi) wrote :

SRU Justification

Impact: The Microsoft hv driver in staging exhibits a variety of faults under load.

Patch description: Use sync_set_bit() and GFP_ATOMIC in key places. Also fix an erroneous WARN_ON.

Changed in linux (Ubuntu Maverick):
status: In Progress → Fix Committed
Martin Pitt (pitti) wrote : Please test proposed package

Accepted linux into maverick-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Mike Sterling (mikester-linuxonhyperv) wrote :

Verified with the x86 kernel in maverick-proposed. Testing amd64 now.

Martin Pitt (pitti)
tags: added: verification-needed
tags: added: verification-done
removed: verification-needed
Martin Pitt (pitti) wrote :

Accepted linux into maverick-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Martin (martin1140) wrote :

Guys, I have a question. We have tremendous problems with Ubuntu losing network connectivity on Hyper-V.

Can you please tell us which Ubuntu kernel already includes the patches named above? We have already tried around 5 different kernel versions on natty/maverick, but the problem still occurs. Is it the kernel? Which one? Is it Natty as a new system? Do we have to update the kernel modules? Which package?

We are not really into kernel development and really tried to look up everything on our own, but found no solution.

We would really (!!!) appreciate your help.

Regards,
Martin

Tim Gardner (timg-tpi) wrote :

Martin - you'll have to work with the Linux hv developers. I'd start with the Microsoft guys. As long as hv is still a staging driver, the Ubuntu kernel team isn't gonna spend much time on it.

rtg@lochsa:~/ubuntu/ubuntu-oneiric$ scripts/get_maintainer.pl -f drivers/staging/hv/hv.c
Greg Kroah-Hartman <email address hidden> (maintainer:STAGING SUBSYSTEM,commit_signer:13/14=93%)
Hank Janssen <email address hidden> (commit_signer:8/14=57%)
Haiyang Zhang <email address hidden> (commit_signer:5/14=36%)
"K. Y. Srinivasan" <email address hidden> (commit_signer:3/14=21%)
Lucas De Marchi <email address hidden> (commit_signer:1/14=7%)
<email address hidden> (open list:STAGING SUBSYSTEM)
<email address hidden> (open list)

Martin (martin1140) wrote :

Tim,

thank you very much for your quick reply. I have to say I am not really a kernel guy, but I see your point. But as written above in this thread, there seems to be a bug fix which you have already applied and which was tested (successfully, as I understood).

Therefore it seems that I only have to install the right kernel, is this true?

Ubuntu really works great under Hyper-V, performance, etc. - everything is fine. The only thing stopping us from switching from Debian to Ubuntu with our whole platform is these sudden network losses (which are not reproducible). We did our beta tests on Ubuntu because you already include the hv drivers with a clean install and loading some drivers - this was REALLY great to hear (and in general works very well - besides this one tiny thing).

I know MS is a big company, many people, processes, etc. - perhaps the open source community is more flexible here. I would like to offer my support for testing, etc. if this helps - perhaps someone has to guide me to use the correct tools, but I like to learn, so everything is possible if anybody can offer some help.

Thank you in advance for another reply!

Sincerely,
Martin

Mike Sterling (mikester-linuxonhyperv) wrote :

In my testing, the kernel update that Martin refers to in post 2 resolves the issue. Are you seeing problems beyond that?

Martin (martin1140) wrote :

Mike,

sorry for my silly questions, but:

what do you mean by "testing"? I am running

root@web2:~# uname -a
Linux web2.puaschitz.at 2.6.39-999-generic #201104261010 SMP Tue Apr 26 10:12:41 UTC 2011 x86_64 GNU/Linux

This is the latest nightly build I have found.

What do you have?

My Ubuntu version with this kernel:

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 10.10
Release: 10.10
Codename: maverick

I upgraded the kernel this morning and lost network connection again in the afternoon...therefore my problem is still somewhere out there....

regards,
martin

Tim Gardner (timg-tpi) wrote :

Martin - 2.6.39-999-generic is kind of a bleeding edge kernel. I suggest you stick with the Maverick 2.6.35 series for now. Enable -proposed in order to get the aforementioned Maverick kernel, and be sure to remove 2.6.39-999-generic.
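
A minimal sketch of checking a kernel release string against this advice follows. It is an illustration only: the names `parse_release` and `is_suggested_kernel` are hypothetical, and treating ABI 29 as the minimum is an assumption based solely on the 2.6.35-29 build reported as working in this thread, not on a confirmed statement of which upload first carried the fix.

```python
import re

# Assumption for illustration: the Maverick series is 2.6.35 and ABI 29 is
# taken as the minimum, based only on the build reported in this thread.
SUGGESTED_SERIES = (2, 6, 35)
MIN_ABI = 29

# Matches release strings such as '2.6.35-29-generic'.
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-")


def parse_release(release):
    """Split a string like '2.6.35-29-generic' into ((2, 6, 35), 29)."""
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(g) for g in m.groups())
    return (major, minor, patch), abi


def is_suggested_kernel(release):
    """True if the release is in the 2.6.35 series at or above MIN_ABI."""
    series, abi = parse_release(release)
    return series == SUGGESTED_SERIES and abi >= MIN_ABI


if __name__ == "__main__":
    # Release strings that appear in this report.
    for rel in ("2.6.35-22-generic", "2.6.39-999-generic", "2.6.35-29-generic"):
        print(rel, is_suggested_kernel(rel))
```

On a running guest, the string to pass in is the output of `uname -r`.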

Mike Sterling (mikester-linuxonhyperv) wrote :

I'm using the following build (as Tim mentions) with no issues:

Linux mike-ubu10 2.6.35-29-generic #51-Ubuntu SMP Fri Apr 15 17:13:54 UTC 2011 i686 GNU/Linux

Martin (martin1140) wrote :

Guys,

Linux web2 2.6.35-29-generic #51-Ubuntu SMP Fri Apr 15 17:12:35 UTC 2011 x86_64 GNU/Linux

thanks a lot, I had to learn something about proposed kernels and installing them - if someone finds this thread, here are the last steps to go:

What are proposed Kernels: https://launchpad.net/~kernel-ppa/+archive/pre-proposed/+packages
How to add the proposed repository to apt-get: https://wiki.ubuntu.com/Testing/EnableProposed
What you want to install first is: http://www.ubuntuupdates.org/packages/show/307920
What you want to install second is: http://www.ubuntuupdates.org/packages/show/307925

I will now test whether the network loss occurs again, hoping for the very best.

Cheers,
Martin

Martin (martin1140) wrote :

Sorry guys, me again:

I updated my kernels, but we faced the same problem again (no network connectivity without any reason, not reproducible).

@Mike, some questions please:

* Are you using the legacy network adapter or the synthetic one?
* Do you use your network interfaces as ethX or sthX (and what is the difference)?
* What are your CPU settings (how many, which options)?

"apt-get upgrade" is up to date!

My running kernel is:

Linux web2 2.6.35-29-generic #51-Ubuntu SMP Fri Apr 15 17:12:35 UTC 2011 x86_64 GNU/Linux

When I shut down or reboot I get this message on the console:

"Linux V-Server not detected in kernel. Stopping all VM guests".

Do you know this?

Many Regards,
Martin

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.35-30.54

---------------
linux (2.6.35-30.54) maverick-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #794114

  [ Upstream Kernel Changes ]

  * Revert "xhci: Fix full speed bInterval encoding."
  * Revert "USB: xhci - also free streams when resetting devices"
  * Revert "USB: xhci - fix math in xhci_get_endpoint_interval()"
  * Revert "USB: xhci - fix unsafe macro definitions"

linux (2.6.35-30.53) maverick-proposed; urgency=low

  [ Upstream Kernel Changes ]

  * xhci: Fix full speed bInterval encoding.
    - LP: #792959

linux (2.6.35-30.52) maverick-proposed; urgency=low

  [ Herton R. Krzesinski ]

  * Release Tracking Bug
    - LP: #790653

  [ Stefan Bader ]

  * Include nls_iso8859-1 for virtual images
    - LP: #732046

  [ Thomas Schlichter ]

  * SAUCE: vesafb: mtrr module parameter is uint, not bool
    - LP: #778043

  [ Tim Gardner ]

  * [Config] Add cachefiles.ko to virtual flavour
    - LP: #770430

  [ Upstream Kernel Changes ]

  * Revert "intel_idle: PCI quirk to prevent Lenovo Ideapad s10-3 boot
    hang"
    - LP: #772560
  * Revert "TPM: Long default timeout fix"
    - LP: #772560
  * Revert "tpm_tis: Use timeouts returned from TPM"
    - LP: #772560
  * Revert "xen: set max_pfn_mapped to the last pfn mapped"
  * CAN: Use inode instead of kernel address for /proc file, CVE-2010-4565
    - LP: #765007
    - CVE-2010-4565
  * xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1,
    CVE-2011-0711
    - LP: #767740
    - CVE-2011-0711
  * Treat writes as new when holes span across page boundaries,
    CVE-2011-0463
    - LP: #770483
    - CVE-2011-0463
  * fs/partitions/ldm.c: fix oops caused by corrupted partition table,
    CVE-2011-1017
    - LP: #771382
    - CVE-2011-1017
  * qla2xxx: Make the FC port capability mutual exclusive.
    - LP: #772560
  * staging: usbip: bugfixes related to kthread conversion
    - LP: #772560
  * staging: usbip: bugfix add number of packets for isochronous frames
    - LP: #772560
  * staging: usbip: bugfix for isochronous packets and optimization
    - LP: #772560
  * staging: hv: Fix GARP not sent after Quick Migration
    - LP: #772560
  * staging: hv: use sync_bitops when interacting with the hypervisor
    - LP: #772560
  * irda: validate peer name and attribute lengths
    - LP: #772560
  * irda: prevent heap corruption on invalid nickname
    - LP: #772560
  * nilfs2: fix data loss in mmap page write for hole blocks
    - LP: #772560
  * ASoC: Explicitly say registerless widgets have no register
    - LP: #772560
  * ALSA: ens1371: fix Creative Ectiva support
    - LP: #772560
  * ROSE: prevent heap corruption with bad facilities
    - LP: #772560
  * Btrfs: Fix uninitialized root flags for subvolumes
    - LP: #772560
  * x86, mtrr, pat: Fix one cpu getting out of sync during resume
    - LP: #772560
  * UBIFS: do not read flash unnecessarily
    - LP: #772560
  * UBIFS: fix oops on error path in read_pnode
    - LP: #772560
  * UBIFS: fix debugging failure in dbg_check_space_info
    - LP: #772560
  * quota: Don't write quota info in dquot_commit()
    - LP: #772560
  * mm: avoid wrapping vm_...

Changed in linux (Ubuntu Maverick):
status: Fix Committed → Fix Released