BUG: scheduling while atomic: swapper/0/0x10010000

Bug #555261 reported by Nicolay Doytchev
This bug affects 97 people
Affects: linux (Ubuntu)
Importance: Undecided
Assigned to: Chase Douglas

Bug Description

What was expected:
    When the RF killswitch is turned on, Bluetooth is supposed to turn on.
What happened:
    When the RF killswitch is turned on, Bluetooth turns on, but I also get a crash reported in dmesg and Apport.

I will try a different kernel tomorrow and report back the results.

ProblemType: KernelOops
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-19-generic 2.6.32-19.28
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic i686
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Annotation: Your system might become unstable now and might need to be restarted.
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: lightrush 1460 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xfebfc000 irq 21'
   Mixer name : 'SigmaTel STAC9228'
   Components : 'HDA:83847616,10280227,00100201 HDA:14f12c06,14f1000f,00100000'
   Controls : 28
   Simple ctrls : 18
Date: Sun Apr 4 13:25:24 2010
Failure: oops
HibernationDevice: RESUME=UUID=51e11985-7b00-4c10-bc89-31cfc2aa7425
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Beta i386 (20100318)
MachineType: Dell Inc. Vostro 1400
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-19-generic root=UUID=98fdfac9-b194-4474-ad94-d3efdbc28303 ro quiet splash
RelatedPackageVersions: linux-firmware 1.33
SourcePackage: linux
Title: BUG: scheduling while atomic: swapper/0/0x10010000
dmi.bios.date: 07/10/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A09
dmi.board.name: 0TT361
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA09:bd07/10/2008:svnDellInc.:pnVostro1400:pvr:rvnDellInc.:rn0TT361:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Vostro 1400
dmi.sys.vendor: Dell Inc.

Nicolay Doytchev (lightrush) wrote :
description: updated

Chase Douglas (chasedouglas) wrote :

@lightrush:

The oops text normally contains data we need to debug the issue. However, for these bugs the data is useless. To get the data we need, please do the following:

$ sudo sh -c "echo function > /sys/kernel/debug/tracing/current_tracer"

Trigger the bug, then capture the trace output:

$ cat /sys/kernel/debug/tracing/trace | bzip2 -c > trace.bz2

Then upload trace.bz2 to this bug report.

Thanks

Changed in linux (Ubuntu):
status: New → Incomplete

Nicolay Doytchev (lightrush) wrote :

Here is the trace you wanted. I don't think it contains anything useful though. If there is another way to obtain useful information please let me know how and I will provide it.

Chase Douglas (chasedouglas) wrote :

My initial instructions were a little wrong. Please see the instructions at https://wiki.ubuntu.com/KernelTeam/DebuggingSchedulingWhileAtomic

Thanks

Nicolay Doytchev (lightrush) wrote :

You won't like the new attachment either.

New observation: the oops happens only after a fresh boot (off -> boot). After a suspend or hibernate, flipping the RF kill switch no longer triggers the oops.

What else can I do?

Chase Douglas (chasedouglas) wrote :

I think what's going on is the bug has already hit, which turns off the tracing. We need to turn it back on. Before executing the steps I mentioned, please do the following:

$ sudo sh -c "echo 1 > /sys/kernel/debug/tracing/tracing_enabled"
$ sudo sh -c "echo 1 > /sys/kernel/debug/tracing/tracing_on"

Then continue with the steps outlined in the wiki page.

Thanks

Nicolay Doytchev (lightrush) wrote :

Success - 438.5KB.

If you think it is relevant, I can add the extra lines that turn tracing back on to the wiki page you gave me earlier.

Chase Douglas (chasedouglas) wrote :

@lightrush:

Unfortunately, this time I don't think you hit the bug. The end of the trace even shows me that the process was bzip2 :). When you hit the bug, the end of the trace will be in the schedule_bug function. Would you mind trying again?
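
To check a capture before uploading it, something like the following could work. This is only a rough sketch: the helper name `trace_hit_bug` and the choice to look at the last five non-empty lines are assumptions for illustration, and the only thing it relies on from this thread is that a good trace ends in schedule_bug.

```python
import bz2
from collections import deque


def trace_hit_bug(path, marker="schedule_bug", tail_lines=5):
    """Return True if `marker` shows up in the last few lines of a .bz2 trace.

    `tail_lines` is an arbitrary illustrative window, not a documented value.
    """
    # Read the compressed trace as text, keeping only the final non-empty lines.
    with bz2.open(path, "rt", errors="replace") as fh:
        tail = deque((line for line in fh if line.strip()), maxlen=tail_lines)
    return any(marker in line for line in tail)
```

Calling `trace_hit_bug("trace.bz2")` on a capture that ended in some unrelated process, as the one above did, would return False.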

I'll add the enablement lines to the wiki page now that we know it works :).

Thanks

Nicolay Doytchev (lightrush) wrote :

This one is done under recovery mode and it has "schedule_bug" on its last line. I do not know if it is the right thing though since I am not familiar with what I am looking at :) .

Chase Douglas (chasedouglas) wrote :

That's perfect! I need to analyze this further, but at a glance it appears to be related to the keyboard driver somehow.

Thanks!

Changed in linux (Ubuntu):
status: Incomplete → In Progress
assignee: nobody → Chase Douglas (chasedouglas)

Chase Douglas (chasedouglas) wrote :

@lightrush:

I've uploaded a test kernel with a patch that should hopefully fix this issue. Please download and install the packages at http://people.canonical.com/~cndougla/555261/. Then test it out, especially through flipping the rf switch lots of times to ensure the bug doesn't appear. Please reply in this bug with the results.

Thanks

Nicolay Doytchev (lightrush) wrote :

Okay here are the results:

1. Settings as before, normal boot, flipping the RF switch: no oops
2. Settings as before, recovery boot, flipping the RF switch: no oops
3. Changed RF to kill WiFi (from BIOS), flipping the RF switch: no oops
4. Changed RF to kill both WiFi and BT, flipping the RF switch: no oops

Every "flipping" == gazillion on-off flips.

So I think it works well now. Can you describe how (or whether) this was connected to the keyboard, and what the problem was in general? (Use whatever terms you like; I can look them up.)
Also, will this patch make it into the official kernel, and if so, should I remove yours and wait for an update of the official kernel?

Thank you!

Chase Douglas (chasedouglas) wrote :

@lightrush:

Thanks for being so thorough! I will send the patch on to the kernel-team mailing list so it can be reviewed and hopefully inserted into the 10.04 image before release.

A new kernel went out today (2.6.32-20), and installing it will replace the test kernel with the released one. That's good if you don't want a test kernel lying around, bad if you want the fix. If this bug doesn't cause you too many issues, it's OK to upgrade. Otherwise you may want to wait until the next kernel is released, which would hopefully be sometime next week.

As for the fix itself, some code was added by Ubuntu developers to properly handle some Dell wifi switches. They did this by hooking into the Linux input event system, which handles mouse, keyboard, switches, lid switch, and other miscellaneous input doodads. The problem is that the input event system hook runs in what's called interrupt (irq) context. If you're in irq context, you absolutely cannot ever stop to wait for something else. The reasons are complicated, but essentially you may be sitting on top of another process that you need to wait for. Unfortunately, the function that handles the wifi hardware does just that, which is what produces the "scheduling while atomic" message. The fix is simple though: listen for events in irq context, and then defer the actual handling of the event to a task that runs later in user context, where waiting is allowed.
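
Purely as an illustration of that pattern, here is a user-space analogy in Python. It is not the kernel patch: the names `on_event`, `slow_update` and `run_demo` are made up for this sketch, and a queue plus a worker thread only stands in for the kernel's deferred-work mechanism.

```python
import queue
import threading
import time


def slow_update(event, handled):
    """Hypothetical stand-in for the hardware handler that may need to wait."""
    time.sleep(0.01)  # waiting is fine here: this runs on the worker thread
    handled.append(event)


def run_demo(incoming):
    """Queue each event from the 'hook' and handle it later on a worker thread."""
    pending = queue.Queue()
    handled = []

    def on_event(event):
        # Plays the role of the irq-context hook: note the event and
        # return immediately, never wait.
        pending.put_nowait(event)

    def worker():
        # Plays the role of the deferred task: free to block.
        while True:
            event = pending.get()
            if event is None:  # sentinel: no more events
                break
            slow_update(event, handled)

    thread = threading.Thread(target=worker)
    thread.start()
    for event in incoming:
        on_event(event)
    pending.put_nowait(None)
    thread.join()
    return handled
```

The point of the split is that `on_event` stays cheap no matter how slow `slow_update` is, so flipping the switch many times only lengthens the queue and never makes the hook itself wait.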

If you are interested in this sort of thing, you can follow along as I send the patch for review on the kernel-team mailing list: https://lists.ubuntu.com/mailman/listinfo/kernel-team. You can subscribe or just watch the archives for new messages.

Thanks again for testing!

Andy Whitcroft (apw)
Changed in linux (Ubuntu):
status: In Progress → Fix Committed

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-21.31

---------------
linux (2.6.32-21.31) lucid; urgency=low

  [ Andy Whitcroft ]

  * allow modules.builtin to be optional
  * d-i: add mpt2sas to the message-modules udeb
    - LP: #530361

  [ Christopher James Halse Rogers ]

  * SAUCE: Nouveau: Add quirk framework to disable acceleration
    - LP: #544088, #546393
  * SAUCE: Nouveau: Disable acceleration on MacBook Pros
    - LP: #546393
  * SAUCE: Nouveau: Disable acceleration on GeForce3 cards
    - LP: #544088
  * SAUCE: Nouveau: Disable acceleration on 6100 cards
    - LP: #542950

  [ Stefan Bader ]

  * SAUCE: dma-mapping: Remove WARN_ON in dma_free_coherent
    - LP: #458201

  [ Surbhi Palande ]

  * SAUCE: sync before umount to reduce time taken by ext4 umount
    - LP: #543617

  [ Upstream Kernel Changes ]

  * tipc: Fix oops on send prior to entering networked mode (v3)
    - CVE-2010-1187
  * KVM: x86 emulator: Add Virtual-8086 mode of emulation
    - LP: #561425
  * KVM: x86 emulator: fix memory access during x86 emulation
    - LP: #561425
  * KVM: x86 emulator: Check IOPL level during io instruction emulation
    - LP: #561425
  * KVM: x86 emulator: Fix popf emulation
    - LP: #561425
  * KVM: Fix segment descriptor loading
    - LP: #561425
  * KVM: VMX: Update instruction length on intercepted BP
    - LP: #561425
  * KVM: VMX: Use macros instead of hex value on cr0 initialization
    - LP: #561425
  * KVM: SVM: Reset cr0 properly on vcpu reset
    - LP: #561425
  * KVM: VMX: Disable unrestricted guest when EPT disabled
    - LP: #561425
  * KVM: x86: disable paravirt mmu reporting
    - LP: #561425
  * AppArmor: Fix put of unassigned ns if aa_unpack fails
  * AppArmor: Fix refcount bug when exec fails
    - LP: #562063
  * AppArmor: Take refcount on cxt->profile to ensure it remains a valid
    reference
    - LP: #367499
  * AppArmor: fix typo in scrubbing environment variable warning
    - LP: #562060
  * AppArmor: fix regression by setting default to mediate deleted files
    - LP: #562056
  * AppArmor: fix refcount order bug that can trigger during replacement
    - LP: #367499
  * AppArmor: Make sure to unmap aliases for vmalloced dfas before they are
    live
    - LP: #529288
  * AppArmor: address performance regression of replaced profile
    - LP: #549428
  * AppArmor: make the global side the correct type
    - LP: #562047
  * AppArmor: use the kernel shared workqueue to free vmalloc'ed dfas
  * sky2: add register definitions for new chips
    - LP: #537168
  * sky2: 88E8059 support
    - LP: #537168
  * net: Fix Yukon-2 Optima TCP offload setup
    - LP: #537168
  * net: Add missing TST_CFG_WRITE bits around sky2_pci_write
    - LP: #537168
  * sky2: print Optima chip name
    - LP: #537168
  * (Upstream) dell-laptop: defer dell_rfkill_update to worker thread
    - LP: #555261
  * drm/nv40: add LVDS table quirk for Dell Latitude D620
    - LP: #539730
 -- Andy Whitcroft <email address hidden> Tue, 13 Apr 2010 18:50:58 +0100

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released

Ben Schwartz (bmschwar) wrote :

This bug is also present in the Karmic build shipped by Dell on this Latitude 13n.
