kernel oops during boot on a nfs root diskless system, SRU TEST CASE

Bug #103044 reported by darthvader
Affects                                Status        Importance  Assigned to  Milestone
linux-source-2.6.20 (Baltix)           Invalid       Undecided   Unassigned
linux-source-2.6.20 (Ubuntu)           Invalid       High        Unassigned
  Gutsy                                Invalid       Undecided   Unassigned
linux-ubuntu-modules-2.6.22 (Ubuntu)   Fix Released  High        Unassigned
  Gutsy                                Fix Released  Undecided   Unassigned

Bug Description

Binary package hint: linux-image-2.6.20-13-generic

The diskless box uses multiple NFS shares stacked with unionfs as its root file system (see http://www.evolware.org/chri/hopeless.html). After upgrading to Feisty I got the kernel oops below. When booting the same system with linux-image-2.6.17-11 (Edgy), it comes up just fine. The oops happens shortly after the initramfs finishes, during the regular system init phase.

[ 62.149865] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 62.227384] iTCO_vendor_support: vendor-support=0
[ 62.268982] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.01 (11-Nov-2006)
[ 62.309766] iTCO_wdt: Found a ICH7 or ICH7R TCO device (Version=2, TCOBASE=0x1060)
[ 62.369157] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
[ 62.477414] parport: PnPBIOS parport detected.
[ 62.504174] parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
[ 62.570071] input: PC Speaker as /class/input/input2
[ 62.608493] intel_rng: FWH not detected
[ 63.093293] Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
[ 63.126035] [<ffffffff8819bc97>] :nfs:nfs_lookup+0x147/0x2d0
[ 63.174974] PGD 7ecae067 PUD 25c8067 PMD 0
[ 63.200064] Oops: 0000 [1] SMP
[ 63.218884] CPU 0
[ 63.230910] Modules linked in: pcspkr parport_pc psmouse parport iTCO_wdt iTCO_vendor_support shpchp pci_hotplug evdev tsdev af_packet nfs lockd sunrpc sg sr_mod cdrom generic ata_generic floppy ehci_hcd ata_piix libata uhci_hcd scsi_mod e1000 usbcore thermal processor fan fbcon tileblit font bitblit softcursor vesafb cfbcopyarea cfbimgblt cfbfillrect capability commoncap bonding unionfs
[ 63.438017] Pid: 3319, comm: mount Not tainted 2.6.20-13-generic #2
[ 63.475395] RIP: 0010:[<ffffffff8819bc97>] [<ffffffff8819bc97>] :nfs:nfs_lookup+0x147/0x2d0
[ 63.525784] RSP: 0018:ffff81007f4eba48 EFLAGS: 00010246
[ 63.557459] RAX: ffff81007da66000 RBX: ffff81007e900be0 RCX: 000000000000fd02
[ 63.600021] RDX: ffff81007f64e250 RSI: 0000000000000000 RDI: ffff81007f64e1e0
[ 63.642581] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 63.685144] R10: ffff81007d82e250 R11: 0000000000000128 R12: 0000000000000000
[ 63.727707] R13: ffff81007f64e1e0 R14: ffff81007f4ebc28 R15: ffff81007f4ebb18
[ 63.770268] FS: 00002b45b46676f0(0000) GS:ffffffff8054d000(0000) knlGS:0000000000000000
[ 63.818532] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 63.852798] CR2: 0000000000000008 CR3: 00000000025e6000 CR4: 00000000000006e0
[ 63.895361] Process mount (pid: 3319, threadinfo ffff81007f4ea000, task ffff81007e47c100)
[ 63.944142] Stack: 0000000000000006 ffff81007f4ebd48 ffff81007e93c980 ffff81007f4ebc58
[ 63.992147] 00000000fffeef6b ffffffff8819bdc3 000081ed00000001 0000000000000001
[ 64.036472] 0000000000000000 0000000000000000 0000000000000000 ffffffff00000000
[ 64.079707] Call Trace:
[ 64.095376] [<ffffffff8819bdc3>] :nfs:nfs_lookup+0x273/0x2d0
[ 64.129700] [<ffffffff8819c83d>] :nfs:nfs_permission+0x1dd/0x200
[ 64.166085] [<ffffffff8819c544>] :nfs:nfs_access_get_cached+0x24/0x140
[ 64.205589] [<ffffffff8023861c>] __lookup_hash+0x11c/0x160
[ 64.238866] [<ffffffff802dc2e8>] lookup_one_len_nd+0x78/0x90
[ 64.273134] [<ffffffff8800bdee>] :unionfs:unionfs_lookup_backend+0x2ce/0x930
[ 64.315703] [<ffffffff8819c83d>] :nfs:nfs_permission+0x1dd/0x200
[ 64.352090] [<ffffffff880031f3>] :unionfs:unionfs_lookup+0x53/0x70
[ 64.389464] [<ffffffff802123d8>] __fput+0x168/0x1d0
[ 64.419063] [<ffffffff8022e0e4>] mntput_no_expire+0x24/0xb0
[ 64.452813] [<ffffffff8023861c>] __lookup_hash+0x11c/0x160
[ 64.486044] [<ffffffff8025a8ef>] lookup_create+0x4f/0xa0
[ 64.518236] [<ffffffff80256d75>] sys_linkat+0xd5/0x180
[ 64.549392] [<ffffffff8020ceaf>] dput+0x2f/0x170
[ 64.577442] [<ffffffff802123d8>] __fput+0x168/0x1d0
[ 64.607040] [<ffffffff8022e0e4>] mntput_no_expire+0x24/0xb0
[ 64.640788] [<ffffffffffff81007f4eba48>
[ 64.821659] CR2: 0000000000000008
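The trace above interleaves frames from the nfs and unionfs modules with core kernel frames. The following is a minimal Python sketch, assuming the `[<address>] :module:symbol+0xoffset/0xsize` layout seen in this log, that extracts those frames so two oops reports can be compared side by side. The names `FRAME_RE`, `parse_frames`, and `modules_in_trace` are hypothetical helpers made up for illustration; they are not part of any kernel or Launchpad tooling.

```python
import re

# Matches entries such as
#   [<ffffffff8819bdc3>] :nfs:nfs_lookup+0x273/0x2d0
#   [<ffffffff8023861c>] __lookup_hash+0x11c/0x160
# The ":module:" prefix is optional; frames without it are core kernel code.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return one dict per symbolised frame found in log_text, in log order."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frames.append({
            "module": match.group("module") or "kernel",
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
        })
    return frames


def modules_in_trace(frames):
    """Return the module names that appear in the trace, in first-seen order."""
    seen = []
    for frame in frames:
        if frame["module"] not in seen:
            seen.append(frame["module"])
    return seen


# Three lines copied from the log above, used as sample input.
SAMPLE = """\
[ 63.126035] [<ffffffff8819bc97>] :nfs:nfs_lookup+0x147/0x2d0
[ 64.205589] [<ffffffff8023861c>] __lookup_hash+0x11c/0x160
[ 64.273134] [<ffffffff8800bdee>] :unionfs:unionfs_lookup_backend+0x2ce/0x930
"""
```

Calling `parse_frames(SAMPLE)` yields three frames, and `modules_in_trace` on that result lists `nfs`, `kernel`, and `unionfs`. Entries that are truncated in the log, such as the last call-trace line above, simply do not match and are skipped.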

Changed in linux-source-2.6.20:
assignee: nobody → phillip-lougher
importance: Undecided → High
status: Unconfirmed → Confirmed
Dan O'Huiginn (daniel-ohuiginn) wrote :

Thanks for taking the time to report this bug. Unfortunately we can't fix it, because your description doesn't yet have enough information.

Please include the following additional information, if you have not already done so (please pay attention to lspci's additional options), as required by the Ubuntu Kernel Team:
1. Please include the output of the command "uname -a" in your next response. It should be one, long line of text which includes the exact kernel version you're running, as well as the CPU architecture.
2. Please run the command "dmesg > dmesg.log" and attach the resulting file "dmesg.log" to this bug report.
3. Please run the command "lspci -vvnn > lspci-vvnn.log" and attach the resulting file "lspci-vvnn.log" to this bug report.

For your reference, the full description of procedures for kernel-related bug reports is available at http://wiki.ubuntu.com/KernelTeamBugPolicies. Thanks in advance!

Dan O'Huiginn (daniel-ohuiginn) wrote :

oops, ignore that; just pasted into the wrong bug report!

Cassus (cassus) wrote :

The problem still exists in 2.6.22, gutsy.

Changed in linux-source-2.6.22:
importance: Undecided → High
status: New → Triaged
Sjors Robroek (sjors) wrote : Re: feisty and gutsy, amd64: kernel oops during boot on a nfs root diskless system

This problem still existing in gutsy is a showstopper for us at the moment. We want to upgrade from Dapper to Gutsy through FAI on about 100 desktops, but this bug is preventing us.

Please consider raising the priority of this bug to critical.

Reinhard Tartler (siretart) wrote :

Assigning to the Ubuntu kernel team, as was done for bug #137765 (now marked as a duplicate of this one).

Changed in linux-source-2.6.22:
assignee: nobody → ubuntu-kernel-team
Phillip Lougher (phillip-lougher) wrote : Re: kernel oops during boot on a nfs root diskless system

Fix for unionfs 1.4 will be in the next gutsy-lum update coming soon.

Changed in linux-source-2.6.22:
status: Triaged → Fix Committed
assignee: ubuntu-kernel-team → phillip-lougher
Tim Gardner (timg-tpi) wrote :

linux-ubuntu-modules-2.6.22_2.6.22-14.38 is in the unapproved queue

Changed in linux-source-2.6.22:
milestone: none → gutsy-updates
Martin Pitt (pitti) wrote :

Accepted into gutsy-proposed, please test and give feedback here.

 linux-ubuntu-modules-2.6.22 (2.6.22-14.38) gutsy-proposed; urgency=low
 .
   [Amit Kucheria]
 .
   * Poulsbo: Update DRM driver to sync with moblin tree
 .
   [Phillip Lougher]
 .
   * Backport lookup_one_len_nd NFS changes from Unionfs 2.0
     - LP: #137765, 103044
   * Backport unionfs_statfs from Unionfs 2.0
     - LP: #137765, 103044
 .
   [Tim Gardner]
 .
   * postinst does not run depmod correctly.
     - LP: #134193
   * depmod uses incorrect options in postinst and postrm
     - LP: #134193
   * Add STAC9228 DMIC support.
     - LP: #153963
   * l-u-m ships with OLD cx2341x mpeg encoder firmware
     - LP: #99107
   * Prevent hard system locks when lirc_pvr150 is loaded
     - LP: #156747
   * Fix version ipw3945 string for Centrino Mobile Test (CMT).
     - LP: #128360

Changed in linux-ubuntu-modules-2.6.22:
status: New → Fix Committed
Changed in linux-source-2.6.20:
status: Confirmed → Invalid
status: New → Invalid
Tim Gardner (timg-tpi) wrote :

The related report https://bugs.edge.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/137765 has a straightforward test case.

Changed in linux-source-2.6.20:
status: New → Invalid
Changed in linux-ubuntu-modules-2.6.22:
status: Fix Committed → Fix Released
status: Fix Committed → Fix Released
laga (laga) wrote :

I have installed the latest linux-ubuntu-modules from gutsy-proposed and this is still not fixed. I'm mounting an NFS share on top of squashfs as the root file system. The log is attached, and I'm re-opening this bug.

Changed in linux-ubuntu-modules-2.6.22:
status: Fix Released → Confirmed
laga (laga) wrote :

Follow-up: this does not seem to be exactly the same issue. I've just tried the test case found in bug #137765 and it does not OOPS. However, bonnie++ doesn't seem to work correctly. I'm not sure if that's intended or not so I'll leave the interpretation to you.

Unionfs layout:
laga@laga-desktop:/unionfs$ mount
10.0.10.1:/opt/ltsp/i386 on /mnt type nfs (rw,addr=10.0.10.1)
tmpfs on /mnt2 type tmpfs (rw)
/home/laga/unionfs on /unionfs type unionfs (rw,dirs=/mnt2/=rw:/mnt=nfsro)

With 10.0.10.1 actually being localhost. Here's bonnie's output:

laga@laga-desktop:/unionfs/home/laga$ bonnie++
Writing with putc()...Can't putc() - disk full?

However, the disk is not full:

laga@laga-desktop:/unionfs/home/laga$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 7,5G 5,1G 2,1G 71% /
varrun 197M 216K 197M 1% /var/run
varlock 197M 0 197M 0% /var/lock
udev 197M 44K 197M 1% /dev
devshm 197M 0 197M 0% /dev/shm
lrm 197M 34M 163M 18% /lib/modules/2.6.22-14-generic/volatile
10.0.10.1:/opt/ltsp/i386
                      7,5G 5,1G 2,1G 71% /mnt
tmpfs 197M 2,8M 194M 2% /mnt2
/home/laga/unionfs 197M 2,8M 194M 2% /unionfs
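The `dirs=` option in the mount listing above encodes the union's branches as colon-separated `path=mode` entries. The following is a minimal Python sketch, assuming exactly that syntax, that splits such an option string into (path, mode) pairs so layouts from different reports can be compared. `parse_unionfs_branches` is a hypothetical helper name, and the mode strings (`rw`, `ro`, `nfsro`) are passed through exactly as they appear in this thread without being interpreted.

```python
def parse_unionfs_branches(mount_options):
    """Split a unionfs mount option string into (path, mode) pairs.

    mount_options is the text inside the parentheses of a `mount` output
    line, e.g. "rw,dirs=/mnt2/=rw:/mnt=nfsro". Branches are returned in the
    order they are listed. Paths containing ',' or ':' are not handled.
    """
    spec = None
    for option in mount_options.split(","):
        if option.startswith("dirs="):
            spec = option[len("dirs="):]
            break
    if spec is None:
        return []  # not a unionfs option string

    branches = []
    for entry in spec.split(":"):
        if "=" in entry:
            # The mode follows the last '=' of the entry.
            path, mode = entry.rsplit("=", 1)
        else:
            # No explicit mode given; leave it unknown rather than guess.
            path, mode = entry, None
        # Normalise "/mnt2/" to "/mnt2" but keep a bare "/" intact.
        branches.append((path.rstrip("/") or "/", mode))
    return branches
```

For the listing above, `parse_unionfs_branches("rw,dirs=/mnt2/=rw:/mnt=nfsro")` returns the tmpfs branch `/mnt2` with mode `rw` followed by the NFS branch `/mnt` with mode `nfsro`.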

I have tried to reproduce the OOPS I posted above with a union layout similar to how my diskless clients use unionfs:
/home/laga/unionfs on /unionfs type unionfs (rw,dirs=/mnt2/=rw:/mnt=nfsro)
10.0.10.1:/opt/ltsp/i386/ on /test/nfs type nfs (rw,addr=10.0.10.1)
/opt/ltsp/images/i386.img on /test/squashfs type squashfs (ro,loop=/dev/loop0)
unionfs on /test/union type unionfs (rw,dirs=/test/nfs/=rw:/test/squashfs/=ro)

I've just completed a full run of bonnie on /test/union/ and it hasn't crashed. Also, sudo find /test/union/ works without a problem. It's noteworthy that the squashfs contains (more or less) the same files as the nfs mount; maybe that wasn't a smart move :)

To make up for my silliness, I created another union with a different nfs share on top of the squashfs. This was the same nfs share which caused the OOPS when booting my diskless client (which now contains everything written by the client during booting). Immediately when running find . /test/union/, the kernel OOPSes again. I'm attaching the log.

I'd like to create a proper test case, but I'm running out of time for tonight unfortunately. I hope the backtrace will provide enough insight. If not, let me know and I'll come up with a sane way to reproduce the problem.
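Until a proper test case exists, the `find` run described above can be approximated by a small script that looks up every entry under a mount point. This is a minimal Python sketch under that assumption; it is not the test case from bug #137765, and `walk_and_stat` is a hypothetical helper name. It only drives the directory lookups: a kernel oops would not surface as a Python exception, so the kernel log still has to be checked afterwards.

```python
import os


def walk_and_stat(root):
    """Visit every entry below root and lstat it.

    Returns (visited, errors), where visited is the number of entries that
    were stat'ed successfully and errors is a list of (path, message) pairs
    for entries or directories that could not be read.
    """
    visited = 0
    errors = []

    def on_error(exc):
        # os.walk reports unreadable directories through this callback.
        errors.append((exc.filename, exc.strerror))

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)  # forces a lookup of this name
                visited += 1
            except OSError as exc:
                errors.append((path, exc.strerror))
    return visited, errors
```

Running `walk_and_stat("/test/union")` on the layout described above would exercise the same lookups as `find`, and the returned error list gives a quick view of any entries that failed without crashing the machine.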

laga (laga) wrote :

aufs (another union file system) will be included in Hardy, and it will work for my use case (see my last two comments).

It's still a valid problem in unionfs (and for gutsy) but aufs can be a good alternative. Not sure how you want to handle this bug now, but I'll just use aufs.

Ted Cabeen (ted-cabeen) wrote :

I've had success with linux-ubuntu-modules-2.6.22_2.6.22-14.38 in gutsy-proposed.

Martin Pitt (pitti) wrote :

(Hopefully) fixed in hardy, closing hardy task.

Changed in linux-ubuntu-modules-2.6.22:
status: Confirmed → Fix Released
Martin Pitt (pitti) wrote :

Gutsy version is still in -proposed, reopening gutsy task.

Changed in linux-ubuntu-modules-2.6.22:
status: Fix Released → Fix Committed
Martin Pitt (pitti) wrote :

l-u-m copied from gutsy-proposed to gutsy-updates.

Changed in linux-ubuntu-modules-2.6.22:
status: Fix Committed → Fix Released