natty, invalid opcode: 0000 [#1] SMP

Bug #713769 reported by Fabio Marconi
This bug affects 3 people
Affects          Status         Importance   Assigned to      Milestone
Linux            Fix Released   High
linux (Fedora)   Fix Released   Undecided
linux (Ubuntu)   Fix Released   High         Andy Whitcroft

Bug Description

While running ubuntu-bug linux, after the data had been collected, gdm crashed and switched to an unusable console (no cursor, only the mouse pointer) showing the 'cut here' trace found in the attached syslog. I was able to restore the session only by pressing Ctrl+Alt+F7 many times.
As apport is still running, I can send this report.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.38-1-generic 2.6.38-1.28
Regression: No
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.38-1.28-generic 2.6.38-rc2
Uname: Linux 2.6.38-1-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1c', '/dev/snd/pcmC0D2p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code -11:
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'ICH'/'Intel ICH with ALC850 at irq 23'
   Mixer name : 'Realtek ALC850 rev 0'
   Components : 'AC97a:414c4790'
   Controls : 42
   Simple ctrls : 27
Date: Sat Feb 5 18:54:33 2011
LiveMediaBuild: Ubuntu 11.04 "Natty Narwhal" - Alpha amd64 (20110202)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 002: ID 0718:0628 Imation Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
ProcEnviron:
 LANGUAGE=it_IT.UTF-8:it:en_GB:en
 LANG=it_IT.UTF-8
 LC_MESSAGES=it_IT.utf8
 SHELL=/bin/bash
ProcKernelCmdLine: noprompt cdrom-detect/try-usb=true file=/cdrom/preseed/hostname.seed boot=casper initrd=/casper/initrd.lz quiet splash -- debian-installer/language=it console-setup/layoutcode?=it
RelatedPackageVersions:
 linux-restricted-modules-2.6.38-1-generic N/A
 linux-backports-modules-2.6.38-1-generic N/A
 linux-firmware 1.46
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
dmi.bios.date: 11/03/2005
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P1.50
dmi.board.asset.tag: 00000000
dmi.board.name: K8NF4G-SATA2
dmi.board.version: 1.00
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP1.50:bd11/03/2005:svn:pnK8NF4G-SATA2:pvr1.00:rvn:rnK8NF4G-SATA2:rvr1.00:
dmi.product.name: K8NF4G-SATA2
dmi.product.version: 1.00

Fabio Marconi (fabiomarconi) wrote :
tags: added: iso-testing
removed: needs-upstream-testing
In , Michael (michael-redhat-bugs) wrote :

I get the following backtrace when running fuser on a local directory eg. fuser /var/cache/yum on a system with NFSv4 mounted via autofs. The system isn't usable afterwards.

kernel BUG at fs/namei.c:405!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
CPU 0
Modules linked in: fuse nfs lockd fscache nfs_acl rpcsec_gss_krb5 auth_rpcgss des_generic sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm serio_raw snd_timer snd e1000e iTCO_wdt soundcore snd_page_alloc i2c_i801 microcode iTCO_vendor_support firewire_ohci firewire_core pata_acpi crc_itu_t ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 2689, comm: fuser Not tainted 2.6.38-0.rc3.git4.1.fc15.x86_64 #1 DQ45CB/DQ45CB
RIP: 0010:[<ffffffff8112a1f8>] [<ffffffff8112a1f8>] nameidata_drop_rcu+0x2d/0xca
RSP: 0018:ffff880057e9fc78 EFLAGS: 00010246
RAX: ffff880070715c80 RBX: ffff880057e9fdd0 RCX: ffff880057cb0b40
RDX: 0000000000000000 RSI: 00000000f42f9d10 RDI: ffff880057e9fdd0
RBP: ffff880057e9fc98 R08: 0000000000000000 R09: 00000000000007ff
R10: 00000000000007ff R11: ffff880065c9f180 R12: ffff880070684300
R13: ffff880065c9f180 R14: 0000000000000000 R15: 0000000000000000
FS: 00007fe65d919720(0000) GS:ffff880079800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001ac74b8 CR3: 0000000071fc0000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process fuser (pid: 2689, threadinfo ffff880057e9e000, task ffff880070715c80)
Stack:
 00000000000007ff ffff880065c9f180 ffff880057e9fdd0 ffff880057e9fd10
 ffff880057e9fcb8 ffffffff8112ae6a ffff880057e9fdd0 0000000000000000
 ffff880057e9fcf8 ffffffff8112b004 ffff880052b47a80 ffff880057e9fdd0
Call Trace:
 [<ffffffff8112ae6a>] force_reval_path.isra.24+0x39/0x5a
 [<ffffffff8112b004>] do_follow_link+0x179/0x1e8
 [<ffffffff8112b32b>] link_path_walk+0x2b8/0x430
 [<ffffffff8112b6ce>] do_path_lookup+0x4d/0xf6
 [<ffffffff8112c4d0>] user_path_at+0x57/0x94
 [<ffffffff8146f37c>] ? _cond_resched+0xe/0x22
 [<ffffffff81124a2f>] ? might_fault+0x21/0x23
 [<ffffffff81124b28>] ? cp_new_stat+0xf7/0x10d
 [<ffffffff81124d16>] vfs_fstatat+0x39/0x63
 [<ffffffff81124d7b>] vfs_stat+0x1b/0x1d
 [<ffffffff81124e7a>] sys_newstat+0x1a/0x33
 [<ffffffff8112985f>] ? path_put+0x1f/0x23
 [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
Code: 89 e5 41 55 41 54 53 41 52 0f 1f 44 00 00 65 48 8b 04 25 c0 cc 00 00 f6 47 40 40 48 89 fb 4c 8b a0 20 05 00 00 4c 8b 6f 08 75 02 <0f> 0b 48 83 7f 20 00 74 20 49 8d 7c 24 04 e8 a5 66 34 00 49 8b
RIP [<ffffffff8112a1f8>] nameidata_drop_rcu+0x2d/0xca
 RSP <ffff880057e9fc78>
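
For triage, the useful parts of an oops like the one above are the BUG location, the faulting function, and the call-trace frames not marked with '?' (frames marked '?' are unreliable stack leftovers). The marked bytes <0f> 0b in the Code: line are the x86 ud2 instruction that BUG() executes, which is why the report reads "invalid opcode". A minimal Python sketch of pulling these out is given here; the name parse_oops and the regular expressions are illustrative assumptions and not part of apport or any kernel tool, and the sample is an abridged copy of the trace above.

```python
import re

# Abridged copy of the oops above (illustrative sample, not the full trace).
OOPS = """\
kernel BUG at fs/namei.c:405!
RIP: 0010:[<ffffffff8112a1f8>] [<ffffffff8112a1f8>] nameidata_drop_rcu+0x2d/0xca
Call Trace:
 [<ffffffff8112ae6a>] force_reval_path.isra.24+0x39/0x5a
 [<ffffffff8112b004>] do_follow_link+0x179/0x1e8
 [<ffffffff8112b32b>] link_path_walk+0x2b8/0x430
 [<ffffffff8146f37c>] ? _cond_resched+0xe/0x22
 [<ffffffff81124d16>] vfs_fstatat+0x39/0x63
Code: 40 48 89 fb 4c 8b 6f 08 75 02 <0f> 0b 48 83 7f 20 00
"""

BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
RIP_RE = re.compile(r"RIP.*\] (\S+)\+0x[0-9a-f]+/0x[0-9a-f]+")
FRAME_RE = re.compile(
    r"^[ \t]*\[<[0-9a-f]+>\] (\? )?(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+", re.M
)
CODE_RE = re.compile(r"Code: .*<([0-9a-f]{2})> ([0-9a-f]{2})")


def parse_oops(text):
    """Hypothetical helper: summarise the key fields of an x86 oops."""
    bug = BUG_RE.search(text)
    rip = RIP_RE.search(text)
    code = CODE_RE.search(text)
    # Keep only frames without the '?' marker.
    frames = [name for unreliable, name in FRAME_RE.findall(text) if not unreliable]
    return {
        "location": (bug.group(1), int(bug.group(2))) if bug else None,
        "faulting_function": rip.group(1) if rip else None,
        "frames": frames,
        # 0f 0b is ud2, the instruction BUG() uses on x86.
        "is_ud2": bool(code) and code.groups() == ("0f", "0b"),
    }


result = parse_oops(OOPS)
print(result)
```

The sketch only handles the bare oops format shown here; a syslog-prefixed trace would need the timestamp and host prefix stripped first.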

In , Chuck (chuck-redhat-bugs) wrote :

(In reply to comment #1)
> I get the following backtrace when running fuser on a local directory eg. fuser
> /var/cache/yum on a system with NFSv4 mounted via autofs.

What does that mean exactly? Does some remote system have that directory mounted, or is autofs being used to mount some unrelated directory? Can you list steps needed to reproduce this, including what autofs config to use?

In , Michael (michael-redhat-bugs) wrote :

The crash happens when I run fuser on an unrelated ordinary directory on the local disk. I am still trying to narrow down the circumstances to work out when it occurs and when it doesn't. So far I have found that if I am logged in (via a graphical desktop) in a home directory that is NFS mounted via autofs then running fuser (as root from a text console) gives the backtrace. If I start again with a clean boot, log in graphically and log out (so NFS will still be mounted), and then run fuser I don't get a crash. I am not sure whether autofs or NFS mounting is necessary for the crash or just a consequence of the environment I am testing it in.

In , Michael (michael-redhat-bugs) wrote :

autofs is unnecessary; the crash is reproducible in a hand-mounted NFS directory with no automount or other NFS mounts.

In , Michael (michael-redhat-bugs) wrote :

So, in summary, to reproduce this problem you need a graphical session in an NFSv4-mounted home directory when you run fuser. It doesn't happen with a user home directory on the local disk.

Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Triaged
importance: Undecided → High
Herton R. Krzesinski (herton) wrote :

The BUG_ON trace on this is quite interesting, unless I missed something:

* do_path_lookup calls path_init_rcu, which sets the LOOKUP_RCU in nd->flags

* path_walk_rcu doesn't appear in the trace because it's inlined; only link_path_walk appears. So link_path_walk runs under path_walk_rcu, called from do_path_lookup

* now things get interesting: link_path_walk actually checks "if (nd->flags & LOOKUP_RCU)" and calls nameidata_drop_rcu only when that is true, so the "BUG_ON(!(nd->flags & LOOKUP_RCU))" inside nameidata_drop_rcu shouldn't trigger.

Probably the flag is reset at some point under exec_permission, which goes into aufs code, or by something else (other concurrent code).

Herton R. Krzesinski (herton) wrote :

That is, something in the new RCU-based path lookup in 2.6.38 seems to trigger this with aufs, as it's a live CD. Maybe aufs is at fault here.

Herton R. Krzesinski (herton) wrote :

Most likely the flag reset comes from a double call of nameidata_drop_rcu on the same nd, and the BUG_ON catches it properly. exec_permission in aufs is fine; it returns -EPERM.

Andy Whitcroft (apw) wrote :

I see that this was on the 2.6.38-1.28 kernel. We have rebased to a much later upstream snapshot and also pulled in a number of aufs directory handling fixes since then. Perhaps you could test tomorrow's daily, as that should have the 2.6.38-3.30 kernel. Please report any testing results here. Thanks!

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Herton R. Krzesinski (herton) wrote :

It seems this doesn't happen only with aufs; there is now a report of it happening with NFS too: https://lkml.org/lkml/2011/2/11/1

Fabio Marconi (fabiomarconi) wrote :

present in 2.6.38 too

Fabio Marconi (fabiomarconi) wrote :

Oops, I meant it is present in 2.6.38-3.30

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
summary: - natty 20110202, invalid opcode: 0000 [#1] SMP
+ natty, invalid opcode: 0000 [#1] SMP
Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
Fabio Marconi (fabiomarconi) wrote :

It also pops up when running Update Manager; pressing Ctrl+Alt+F1 and then Ctrl+Alt+F7 restores the live session as if nothing had occurred.

tags: added: kernel-bug
Fabio Marconi (fabiomarconi) wrote :

Also, this happens only on this system; on my other one, which runs Natty regularly, it doesn't happen.

Herton R. Krzesinski (herton) wrote :

Please disregard what I posted in comments #3-#5; I was blind and didn't see force_reval_path in the trace.

I reproduced the issue here with NFSv4 and debugged it properly; I may have a fix to propose soon. The problem still happens on today's Linus tree.

Herton R. Krzesinski (herton) wrote :

From what I saw, force_reval_path is never called with the LOOKUP_RCU flag set. force_reval_path is called by __do_follow_link, and __do_follow_link is called from do_filp_open and do_follow_link. When do_filp_open calls it, LOOKUP_RCU has already been dropped (if the RCU path lookup didn't work). From what I can see, do_follow_link also calls it without LOOKUP_RCU set, since nameidata_dentry_drop_rcu_maybe is called before each do_follow_link call.

So simply removing nameidata_drop_rcu from force_reval_path should fix it. I'm now running with the attached patch and haven't experienced the same bug or any other problems.
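
The reasoning above can be illustrated with a small toy model in Python. This is not kernel code: it keeps only the flag handling described in this comment, drops the dentry argument and all locking, and always calls force_reval_path, which the real __do_follow_link may not do. The bit value chosen for LOOKUP_RCU and the two force_reval_path_* variants are assumptions made for the illustration.

```python
LOOKUP_RCU = 0x40  # illustrative bit value for this model only


class Nameidata:
    def __init__(self):
        # Models path_init_rcu: the walk starts in RCU mode.
        self.flags = LOOKUP_RCU


def nameidata_drop_rcu(nd):
    # Models BUG_ON(!(nd->flags & LOOKUP_RCU)): dropping twice is a bug.
    if not nd.flags & LOOKUP_RCU:
        raise AssertionError("BUG_ON: LOOKUP_RCU not set")
    nd.flags &= ~LOOKUP_RCU


def nameidata_dentry_drop_rcu_maybe(nd):
    # Leaves RCU mode only if the walk is still in it.
    if nd.flags & LOOKUP_RCU:
        nameidata_drop_rcu(nd)


def force_reval_path_buggy(nd):
    # Unconditional drop: the call the proposed patch removes.
    nameidata_drop_rcu(nd)


def force_reval_path_fixed(nd):
    # Nothing to drop: the caller has already left RCU mode.
    pass


def do_follow_link(nd, force_reval_path):
    # Stands in for do_follow_link -> __do_follow_link -> force_reval_path.
    force_reval_path(nd)


def link_path_walk(nd, force_reval_path):
    # RCU mode is dropped before each do_follow_link call.
    nameidata_dentry_drop_rcu_maybe(nd)
    do_follow_link(nd, force_reval_path)


def walk(force_reval_path):
    nd = Nameidata()
    try:
        link_path_walk(nd, force_reval_path)
    except AssertionError as exc:
        return str(exc)
    return "ok"


buggy_result = walk(force_reval_path_buggy)
fixed_result = walk(force_reval_path_fixed)
print(buggy_result)
print(fixed_result)
```

In this model the buggy variant trips the assertion because the flag was already cleared by the caller, while the variant without the extra drop completes the walk.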

In , Chuck (chuck-redhat-bugs) wrote :

Looks like this is fixed in today's updates (commit 844a391799c25d9ba85cbce33e4697db06083ec6)

Herton R. Krzesinski (herton) wrote :

Patch posted for comments: https://lkml.org/lkml/2011/2/15/292

Herton R. Krzesinski (herton) wrote :

Never mind, it was also fixed upstream today in commit 844a391 "nothing in do_follow_link() is going to see RCU"

Changed in linux:
status: Confirmed → Fix Released
Andy Whitcroft (apw) wrote :

It looks very much like this fix has already been released into Linus' tree, in v2.6.38-rc5. Natty was rebased onto this last night, and I have pushed some preliminary builds which contain the upstream fix mentioned for testing. Could those affected test the kernel below and report back here:

    http://people.canonical.com/~apw/lp713769-natty/

Thanks.

tags: added: kernel-key
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
assignee: nobody → Andy Whitcroft (apw)
Herton R. Krzesinski (herton) wrote :

@apw -- Fixed here as expected, confirmed by testing this kernel, can't reproduce the issue anymore with nfs4

Changed in linux (Ubuntu):
status: Incomplete → Triaged
In , Michael (michael-redhat-bugs) wrote :

Yes, kernel-2.6.38-0.rc5.git0.1.fc15.x86_64 has stopped throwing up kernel bugs (fuser throws up warnings about Stale NFS file handle for open but deleted files but I doubt that is a kernel bug).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-4.31

---------------
linux (2.6.38-4.31) natty; urgency=low

  [ Andy Whitcroft ]

  * add in bugs closed by upstream patches pulled in by rebases
  * rebase to 795abaf1e4e188c4171e3cd3dbb11a9fcacaf505
  * [Config] enable CONFIG_VSX to allow use of vector instructions
  * resync with maverick 98defa1c5773a3d7e4c524967eb01d5bae035816
  * rebase to mainline v2.6.38-rc5
  * SAUCE: ecryptfs: read on a directory should return EISDIR if not
    supported
    - LP: #719691

  [ Colin Ian King ]

  * SAUCE: Dell All-In-One: Remove need for Dell module alias

  [ Manoj Iyer ]

  * SAUCE: (drop after 2.6.38) add ricoh 0xe823 pci id.
    - LP: #717435

  [ Tim Gardner ]

  * [Config] CONFIG_CRYPTO_CRC32C_INTEL=y

  [ Upstream Kernel Changes ]

  * Quirk to fix suspend/resume on Lenovo Edge 11,13,14,15
    - LP: #702434
  * vfs: fix BUG_ON() in fs/namei.c:1461

  [ Vladislav P ]

  * SAUCE: Release BTM while sleeping to avoid deadlock.
    - LP: #713837

  [ Major Kernel Changes ]

  * rebase from v2.6.38-rc4 to v2.6.38-rc5
    - LP: #579276
    - LP: #715877
    - LP: #713769
  * resync with Maverick Ubuntu-2.6.35-27.47
 -- Andy Whitcroft <email address hidden> Fri, 11 Feb 2011 17:24:09 +0000

Changed in linux (Ubuntu):
status: Triaged → Fix Released
In , Fedora (fedora-redhat-bugs) wrote :

kernel-2.6.38-0.rc5.git1.1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.38-0.rc5.git1.1.fc15

In , Fedora (fedora-redhat-bugs) wrote :

kernel-2.6.38-0.rc5.git1.1.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.

Lasse Vinther Tyron (lasse) wrote :

This happens frequently:

Aug 12 13:42:41 Kontor kernel: [17097.148266] ------------[ cut here ]------------
Aug 12 13:42:41 Kontor kernel: [17097.148308] kernel BUG at /build/buildd/linux-2.6.38/fs/btrfs/extent-tree.c:5512!
Aug 12 13:42:41 Kontor kernel: [17097.148354] invalid opcode: 0000 [#1] SMP
Aug 12 13:42:41 Kontor kernel: [17097.148404] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map
Aug 12 13:42:41 Kontor kernel: [17097.148454] Modules linked in: snd_seq_dummy nls_utf8 isofs binfmt_misc parport_pc ppdev snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_usb_audio snd_pcm snd_hwdep snd_usbmidi_lib snd_seq_midi rndis_wlan snd_rawmidi snd_seq_midi_event cfg80211 rndis_host snd_seq i915 cdc_ether gspca_sn9c20x psmouse gspca_main usbnet videodev drm_kms_helper snd_timer asus_atk0110 snd_seq_device drm serio_raw snd i2c_algo_bit video soundcore snd_page_alloc lp parport usbhid hid usb_storage uas btrfs zlib_deflate libcrc32c
Aug 12 13:42:41 Kontor kernel: [17097.148816]
Aug 12 13:42:41 Kontor kernel: [17097.148829] Pid: 1652, comm: btrfs-transacti Not tainted 2.6.38-10-generic #46-Ubuntu System manufacturer System Product Name/P5L8L-SE
Aug 12 13:42:41 Kontor kernel: [17097.148913] EIP: 0060:[<f82765d0>] EFLAGS: 00010282 CPU: 1
Aug 12 13:42:41 Kontor kernel: [17097.148961] EIP is at alloc_reserved_tree_block.clone.61+0x1c0/0x1f0 [btrfs]
Aug 12 13:42:41 Kontor kernel: [17097.149004] EAX: fffffffb EBX: f4bf4930 ECX: fffffffb EDX: f4bf4930
Aug 12 13:42:41 Kontor kernel: [17097.149041] ESI: e744c000 EDI: d959de6f EBP: d959de18 ESP: d959ddd4
Aug 12 13:42:41 Kontor kernel: [17097.149078] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Aug 12 13:42:41 Kontor kernel: [17097.149111] Process btrfs-transacti (pid: 1652, ti=d959c000 task=e91c9940 task.ti=d959c000)
Aug 12 13:42:41 Kontor kernel: [17097.149159] Stack:
Aug 12 13:42:41 Kontor kernel: [17097.149172] d959de6f d959de08 00000001 00000001 00000400 00000000 00000000 00000000
Aug 12 13:42:41 Kontor kernel: [17097.149234] 00000000 e923b114 f37fc750 d959de84 f5902fe0 00000033 00000000 f37fc750
Aug 12 13:42:41 Kontor kernel: [17097.149297] e923b000 d959de8c f8276ddb 00000000 00000000 00000005 00000000 00000000
Aug 12 13:42:41 Kontor kernel: [17097.149360] Call Trace:
Aug 12 13:42:41 Kontor kernel: [17097.149388] [<f8276ddb>] run_delayed_tree_ref+0xdb/0x230 [btrfs]
Aug 12 13:42:41 Kontor kernel: [17097.149436] [<f827ac74>] run_one_delayed_ref+0xc4/0x120 [btrfs]
Aug 12 13:42:41 Kontor kernel: [17097.149478] [<c1276ae4>] ? rb_erase+0xb4/0x120
Aug 12 13:42:41 Kontor kernel: [17097.149516] [<f827ad7a>] run_clustered_refs+0xaa/0x1f0 [btrfs]
Aug 12 13:42:41 Kontor kernel: [17097.149563] [<f827af6a>] btrfs_run_delayed_refs+0xaa/0x1d0 [btrfs]
Aug 12 13:42:41 Kontor kernel: [17097.149616] [<f82a8cec>] ? btrfs_run_ordered_operations+0x1ac/0x1c0 [btrfs]
Aug 12 13:42:41 Kontor kernel: [17097.149672] [<f82893c9>] btrfs_commit_transaction+0x69/0x700 [btrfs]
Aug 12 13:42:41 Kontor kernel: [17097.149714] [<c15088e9>] ? mutex_lock+0x19/0x40
Aug 12 13:42:41 Kontor kernel: [17097.149745] [<c106d4d0>] ? autoremove_wake_function+0x0/0x50
Aug 12 13:42:41 Kontor kernel: [17097.14979...


Fabio Marconi (fabiomarconi) wrote :

Hello Lasse
This bug is already closed as Fix Released, so please open a new bug report by typing in a terminal:
ubuntu-bug linux
I suggest doing it right after the bug has been reproduced, so that some trace of it probably remains in the log files that will be attached.
Also attach to the new report the file /var/log/kern.log or /kern.log.old containing the backtrace of what happened.
Thanks
Fabio
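
To check that a log file actually contains the backtrace before attaching it, the lines following each 'cut here' marker can be pulled out. A minimal Python sketch follows; the function name extract_oops_blocks and the max_lines cap are assumptions made for illustration, and the sample reuses the first lines of the log quoted above plus one placeholder line.

```python
def extract_oops_blocks(lines, max_lines=200):
    """Hypothetical helper: collect up to max_lines lines per 'cut here' marker."""
    blocks = []
    current = None
    for line in lines:
        if "[ cut here ]" in line:
            # A new marker closes any block still being collected.
            if current:
                blocks.append(current)
            current = [line]
        elif current is not None:
            if len(current) < max_lines:
                current.append(line)
            else:
                blocks.append(current)
                current = None
    if current:
        blocks.append(current)
    return blocks


# First line is a placeholder; the rest are taken from the log quoted above.
SAMPLE = """\
(unrelated log line before the oops)
Aug 12 13:42:41 Kontor kernel: [17097.148266] ------------[ cut here ]------------
Aug 12 13:42:41 Kontor kernel: [17097.148308] kernel BUG at /build/buildd/linux-2.6.38/fs/btrfs/extent-tree.c:5512!
Aug 12 13:42:41 Kontor kernel: [17097.148354] invalid opcode: 0000 [#1] SMP
"""

blocks = extract_oops_blocks(SAMPLE.splitlines())
for block in blocks:
    print("\n".join(block))
```

To run it against a real file, pass the open file object instead of the sample, for example extract_oops_blocks(open("/var/log/kern.log")). The cap is a crude stand-in for detecting the end of the trace, so a block may include unrelated lines that follow the oops.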

NoBugs! (luke32j) wrote :

I noticed this error on 10.04 with 2.6.38 backport also.

Fabio Marconi (fabiomarconi) wrote :

Hello NoBugs
The same symptoms don't mean the same cause, so please file a new report by typing in a terminal:
ubuntu-bug linux
Thanks
Fabio
---
Ubuntu Bug Squad volunteer triager
http://wiki.ubuntu.com/BugSquad

Changed in linux (Fedora):
importance: Unknown → Undecided
status: Unknown → Fix Released