[Geode LX] [ION603] kernels >= 2.6.31 fail to boot [initramfs]

Bug #396286 reported by Martin-Éric Racine
This bug affects 2 people
Affects         Status     Importance  Assigned to  Milestone
Linux           Confirmed  Medium
linux (Ubuntu)  Won't Fix  High        Unassigned
Nominated for Karmic by Martin-Éric Racine
Nominated for Lucid by Martin-Éric Racine
Nominated for Maverick by Martin-Éric Racine

Bug Description

linux-image-2.6.31-2-generic oopses on this FIC ION 603 (Geode LX800), right near the end of executing the content of the initramfs.

Reverting to linux-image-2.6.30-10-generic works; the system boots all the way to GDM as expected.

ProblemType: Bug
Architecture: i386
Date: Tue Jul 7 01:12:41 2009
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=5ffade8f-b837-49eb-bb44-225617349ca3
Lsusb:
 Bus 001 Device 004: ID 0ace:1215 ZyDAS WLA-54L WiFi
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 003: ID 03f9:0100 KeyTronic Corp. Keyboard
 Bus 002 Device 002: ID 046d:c00e Logitech, Inc. M-BJ58/M-BJ69 Optical Wheel Mouse
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: First International Computer, Inc. ION603
Package: linux-image-2.6.31-2-generic 2.6.31-2.15
ProcCmdLine: root=UUID=97b2628b-28a5-49f2-85f7-495728b3bef8 ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=fi_FI.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.30-10.12-generic
RelatedPackageVersions: linux-backports-modules-2.6.30-10-generic N/A
SourcePackage: linux
Uname: Linux 2.6.30-10-generic i586
dmi.bios.date: 11/08/2007
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: ION603
dmi.board.vendor: First International Computer, Inc.
dmi.board.version: PCB 2.X
dmi.chassis.type: 3
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd11/08/2007:svnFirstInternationalComputer,Inc.:pnION603:pvrVER2.X:rvnFirstInternationalComputer,Inc.:rnION603:rvrPCB2.X:cvn:ct3:cvr:
dmi.product.name: ION603
dmi.product.version: VER 2.X
dmi.sys.vendor: First International Computer, Inc.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote : apport-collect data

Architecture: i386
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=5ffade8f-b837-49eb-bb44-225617349ca3
Lsusb:
 Bus 002 Device 003: ID 03f9:0100 KeyTronic Corp. Keyboard
 Bus 002 Device 002: ID 046d:c00e Logitech, Inc. M-BJ58/M-BJ69 Optical Wheel Mouse
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 004: ID 0ace:1215 ZyDAS WLA-54L WiFi
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: First International Computer, Inc. ION603
Package: linux-image-2.6.31-2-generic 2.6.31-2.16
PackageArchitecture: i386
ProcCmdLine: root=UUID=97b2628b-28a5-49f2-85f7-495728b3bef8 ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=fi_FI.UTF-8
ProcVersionSignature: Ubuntu 2.6.30-10.12-generic
RelatedPackageVersions: linux-backports-modules-2.6.30-10-generic N/A
Uname: Linux 2.6.30-10-generic i586
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare sudo
dmi.bios.date: 11/08/2007
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: ION603
dmi.board.vendor: First International Computer, Inc.
dmi.board.version: PCB 2.X
dmi.chassis.type: 3
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd11/08/2007:svnFirstInternationalComputer,Inc.:pnION603:pvrVER2.X:rvnFirstInternationalComputer,Inc.:rnION603:rvrPCB2.X:cvn:ct3:cvr:
dmi.product.name: ION603
dmi.product.version: VER 2.X
dmi.sys.vendor: First International Computer, Inc.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote : Re: kernel 2.6.31-generic oops after loading initramfs

It appears that using "apport-collect -p linux-image-2.6.31-2-generic 396286" provided the logs from booting with the last good kernel (2.6.30) rather than the logs from the failed boot.

Is there any way to dump the log for the kernel that fails during the initramfs stage instead?

Revision history for this message
Andy Whitcroft (apw) wrote :

If you are getting an oops in the initramfs and the system does not boot, then no, you won't reach a point where you can easily run apport-collect. You will normally see the panic on the screen, or can get it there with the dmesg command. If so, a digital photo is often an effective solution here.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Here's a screenshot of what I get on an 80x60 console.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Still not fixed as of linux-image-2.6.31-4-generic. Is there any missing information that I can attach to this bug?

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Someone on the LKML reported successful booting on fairly similar hardware, when running a vanilla kernel compiled with the following .config options.

I would have loved to compare this with Ubuntu's kernel config to help track down the source of this issue, except that /boot/config-2.6.31-4-generic is only a partial config, because Ubuntu uses a config splitter to prepare its build targets, and /proc/config.gz is not enabled on Ubuntu kernels. :(

I still hope that the above config can be of use to the Ubuntu kernel team to try and track the source of the issue. :)

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

As requested by Leann Ogasawara:

I tested linux-image-2.6.31-020631rc5-generic (2.6.31-020631rc5)
from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.31-rc5/

I get the same kernel panic as above.

Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Triaged
tags: added: regression-potential
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Martin noted he's also using EXT3.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

I'm working with Martin to do a rough bisect right now.

Changed in linux (Ubuntu):
assignee: nobody → Leann Ogasawara (leannogasawara)
Changed in linux (Ubuntu):
status: Triaged → In Progress
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

To continue the series of mainline kernel tests Leann suggested:

2.6.30-020630-generic: works fine.
2.6.31-020631rc1gc0d1117-generic: kernel panic.

summary: - kernel 2.6.31-generic oops after loading initramfs
+ 2.6.31-generic: kernel panic near the end of initramfs execution
summary: - 2.6.31-generic: kernel panic near the end of initramfs execution
+ 2.6.31-generic: kernel panic near the end of initramfs run
summary: - 2.6.31-generic: kernel panic near the end of initramfs run
+ 2.6.31-generic: kernel panic near the end of initramfs
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Martin-Éric,

Thanks for testing and the feedback. We're going to try to put together some additional test kernels for you to try to continue bisecting between 2.6.30 and 2.6.31-rc1. We'll let you know when they're ready.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote : Re: 2.6.31-generic: kernel panic near the end of initramfs

Linux geode 2.6.30-999-generic #200908041153 SMP Tue Aug 4 12:48:19 UTC 2009 i586

This one booted successfully. Hurray!

I'm curious, what was the change that enabled it? Could someone attach a unified diff?

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

FYI, this is the kernel module set that is pulled in by udev. I thought that it might be useful to add it here.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.30-999-generic #200908041829 SMP Wed Aug 5 08:58:04 UTC 2009 i586

Boots successfully.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.30-999-generic #200908051216 SMP Wed Aug 5 11:59:01 UTC 2009 i586

Boots successfully.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Thanks, I'll queue up the next one. Will post when we have an image.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.30-999-generic #200908061755 SMP Thu Aug 6 17:39:31 UTC 2009 i586

Boots successfully.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Thanks for the quick testing and feedback. Queuing next build.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.30-999-generic #200908071146 SMP Fri Aug 7 11:29:56 UTC 2009 i586

Boots successfully.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

While waiting for the final test build, might not hurt to verify this remains with the latest 2.6.31-5 kernel. Thanks.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

2.6.31-5 has already been tested, as have all the other 2.6.31 kernels that get pulled in by linux-generic. Kernel panic.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.30-999-generic #200908071658 SMP Fri Aug 7 16:40:56 UTC 2009 i586

Boots successfully.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

2.6.30-999.200908110142 does NOT boot.

It also seems to fail at an earlier stage than 2.6.31-5 does. See enclosed snapshot.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.30-999-generic #200908112132 SMP Tue Aug 11 21:12:50 UTC 2009 i586

Boots successfully.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.30-999-generic #200908121741 SMP Wed Aug 12 17:22:08 UTC 2009 i586

Boots successfully.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Changed in linux:
status: Unknown → Confirmed
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

2.6.30-999.200908122359 does NOT boot. Snapshot attached.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Ingo Molnar pointed out that the only Geode-specific commit he can spot is this one:

d6c585a: x86: geode: Mark mfgpt irq IRQF_TIMER to prevent resume failure

Could this be our suspect?

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Here's a larger snapshot of what I get with linux-image-2.6.31-5-generic version 2.6.31-5.24 (based on upstream 2.6.31-rc5), thanks to vesafb and a 1280x1024 framebuffer. The main advantage over the initial snapshot is that it fits more lines of the crash into the visible area.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Next image:

http://kernel.ubuntu.com/~ogasawara/mainline/daily/lp396286/bisectf21f622/linux-image-2.6.30-999-generic_2.6.30-999.200908131643_i386.deb

I unfortunately don't see the commit Ingo pointed out in the remaining list of commits we're bisecting.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Someone else said on the LKML:

> > http://launchpadlibrarian.net/30267494/2.6.31-5.24.jpg
>
> Hmm. This looks like a sysfs oops to my untrained eye.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.30-999-generic #200908131643 SMP Thu Aug 13 16:25:22 UTC 2009 i586

Boots successfully.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

2.6.30-999.200908132259 does NOT boot. Snapshot attached.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

PS: would it be possible to include vesafb as a module in all test kernels? Thank you!

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hrm, that's odd. vesafb should be getting built as a module for the test kernels, but apparently it isn't happening as you've noted. We'll investigate.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

So I'm told we're using the Jaunty config within our mainline build scripts. Mainly because if we used Karmic's config it would enable KMS if someone happened to install it within Jaunty. So there is definitely some discrepancy. Seeing as we have 1-2 more test builds to go I'd like to finish isolating the patch and then I'll build you a final mainline and Karmic test kernel with the patch reverted for you to confirm it is indeed the offending patch regardless of the different configs that we've been using to build.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Noted and understood.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.30-999-generic #200908141609 SMP Fri Aug 14 15:53:21 UTC 2009 i586

Boots successfully.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Ok, the bisect has narrowed down the following:

f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0 is first bad commit
commit f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0
Author: Al Viro <email address hidden>
Date: Mon Jun 8 19:50:45 2009 -0400

    add caching of ACLs in struct inode

    No helpers, no conversions yet.

    Signed-off-by: Al Viro <email address hidden>
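
For readers unfamiliar with bisection, the search that produced the result above can be sketched as a binary search over an ordered commit range. This is only an illustration, assuming the range has a single transition from booting to non-booting kernels; the names `first_bad_commit`, `boots_fn` and `boots_before` are hypothetical and not part of any tool used in this bug.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: nonzero if the kernel built at commit index i boots. */
typedef int (*boots_fn)(size_t i, void *ctx);

/*
 * Return the index of the first commit that fails to boot.
 * Preconditions (assumed, as in any bisect): commit `good` boots,
 * commit `bad` does not, and there is one good-to-bad transition between them.
 */
size_t first_bad_commit(size_t good, size_t bad, boots_fn boots, void *ctx)
{
    assert(good < bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;  /* next test build */
        if (boots(mid, ctx))
            good = mid;   /* regression was introduced after mid */
        else
            bad = mid;    /* regression is at mid or earlier */
    }
    return bad;
}

/* Stand-in for a real boot test: every commit before *ctx boots. */
int boots_before(size_t i, void *ctx)
{
    return i < *(const size_t *)ctx;
}
```

Each loop iteration corresponds to one test kernel in the comments above, which is why narrowing a merge window of thousands of commits took roughly a dozen builds.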

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

What if we revert that specific commit against 2.6.31-rc6, as a test (reverse-apply the change as a patch)?

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

New snapshot, showing the current kernel panic on 2.6.31-6.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Still not fixed as of 2.6.31-7. Panic output is the same as in 2.6.31-6.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Still not fixed as of 2.6.31-9. Panic output similar to 2.6.31-6.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi,

Can you try the following kernel build?

http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2009-06-19a/

It's a build of f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0 (the first bad commit). I'm expecting it to crash. However it'll confirm http://lkml.org/lkml/2009/8/16/252:

  f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0 crashes
  f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0~1 boots fine

f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0~1 was 3e63cbb1efca7dd3137de1bb475e2e068e38ef23 which you tested and confirmed was booting fine in comment #59.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.30-999-generic #200909032144 SMP Thu Sep 3 21:35:39 UTC 2009 i586

Boots fine.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Bah, seeing you say it booted fine when I was expecting it to fail I looked and realized I had the wrong patch queued, ugh :( So you just tested f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0~1 and indeed re-confirmed what we already knew, that it boots fine. I'm sooo sorry I wasted your time on that one. I'm going to requeue f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0 for reals this time . . .

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Indeed, doesn't boot.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Still not fixed as of 2.6.31-10.31 a.k.a. upstream 2.6.31 final.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Just to recap, this is on a host where / is an ext3 file system.

# /etc/fstab: static file system information.
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults 0 0
# /dev/sda1
UUID=97b2628b-28a5-49f2-85f7-495728b3bef8 / ext3 relatime,errors=remount-ro 0 1
# /dev/sda5
UUID=5ffade8f-b837-49eb-bb44-225617349ca3 none swap sw 0 0

At Stefan Bader's request, here is what /proc/cpuinfo says:

processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 10
model name : Geode(TM) Integrated Processor by AMD PCS
stepping : 2
cpu MHz : 497.996
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow up
bogomips : 995.99
clflush size : 32
power management:

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

BUG: unable to handle kernel paging request at ffffb4ff
IP: [<c01f716b>] __destroy_inode+0x4b/0x80
*pde = 00810067 *pte = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/power/resume
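
As an aid to reading the excerpt above: the "IP:" line gives the faulting address, the enclosing symbol, the offset into that symbol and the symbol's size. A small sketch of decoding it follows; the struct and function names are made up for illustration, and the format string only assumes the line shape shown in this comment.

```c
#include <assert.h>
#include <stdio.h>

/* Decoded form of an oops "IP:" line (field names are illustrative). */
struct oops_ip {
    unsigned int addr;    /* faulting instruction address */
    char symbol[64];      /* enclosing function name */
    unsigned int offset;  /* byte offset of addr inside the function */
    unsigned int size;    /* total size of the function in bytes */
};

/*
 * Parse a line such as "IP: [<c01f716b>] __destroy_inode+0x4b/0x80".
 * Returns 1 on success, 0 if the line does not have that shape.
 */
int parse_oops_ip(const char *line, struct oops_ip *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "IP: [<%x>] %63[^+]+%x/%x",
                   &out->addr, out->symbol, &out->offset, &out->size);
    return n == 4;
}

/* Start address of the function: faulting address minus the offset. */
unsigned int function_start(const struct oops_ip *ip)
{
    return ip->addr - ip->offset;
}
```

The offset and size are what allow the fault to be matched against a disassembly or source listing of the function.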

Revision history for this message
Stefan Bader (smb) wrote :

Thanks Martin-Éric,

checking against the code this confirms that the bug occurs in __destroy_inode at the following position:

    void __destroy_inode(struct inode *inode)
    {
            BUG_ON(inode_has_buffers(inode));
            ima_inode_free(inode);
            security_inode_free(inode);
            fsnotify_inode_delete(inode);
    #ifdef CONFIG_FS_POSIX_ACL
            if (inode->i_acl && inode->i_acl != ACL_NOT_CACHED)
                    posix_acl_release(inode->i_acl); /* here */
            if (inode->i_default_acl && inode->i_default_acl != ACL_NOT_CACHED)
                    posix_acl_release(inode->i_default_acl);
    #endif
    }

EAX holds the pointer stored in i_acl, so it looks like it is (repeatably) 0xffffb4ff. In theory i_acl is either a pointer to an acl structure, 0xffffffff (ACL_NOT_CACHED) or 0x0 (uninitialized). The address causing the bug seems a bit high to be a valid pointer. But just to be completely sure, I put a kernel at http://people.canonical.com/~smb/bug396286/ which tries to catch a double-free case.
On the other hand, 0xffffb4ff might be caused by something writing either 0xb4ff into the first word (little endian) or 0xb4 at offset 1 into the area that holds the pointer. Before the change that added i_acl and i_default_acl, the last field was a private pointer. Could something (this would need to be an externally built module) still be using the wrong header file?...
One thing to try next would be to check whether the other pointer is corrupted too. I will try to get something sensible up and then post here.
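
The arithmetic behind the two corruption hypotheses can be checked with a few lines of C. This sketch assumes the field initially holds the 32-bit ACL_NOT_CACHED value 0xffffffff, as on i386; the helper names are invented for illustration and nothing here is kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define ACL_NOT_CACHED_32 0xffffffffu  /* sentinel value as seen on i386 */

/*
 * Overwrite one byte of a 32-bit word. Bytes are numbered as they sit in
 * little-endian memory: offset 0 is the least significant byte.
 */
uint32_t overwrite_byte_le(uint32_t word, unsigned offset, uint8_t value)
{
    assert(offset < 4);
    unsigned shift = offset * 8;
    return (word & ~(0xffu << shift)) | ((uint32_t)value << shift);
}

/* Overwrite the low 16-bit word (the first two bytes in little-endian memory). */
uint32_t overwrite_low_word(uint32_t word, uint16_t value)
{
    return (word & 0xffff0000u) | value;
}
```

Both `overwrite_byte_le(ACL_NOT_CACHED_32, 1, 0xb4)` and `overwrite_low_word(ACL_NOT_CACHED_32, 0xb4ff)` yield 0xffffb4ff, so the observed value alone cannot distinguish a stray one-byte store from a stray 16-bit store.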

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

With that kernel, the result is:

BUG: unable to handle kernel paging request at ffffb4ff
IP: [<c01f5902>] __destroy_inode+0x72/0x110
*pde = 00817067 *pte = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/power/resume

Revision history for this message
Stefan Bader (smb) wrote :

Just as a memo, so that I remember it after the weekend: the second kernel was uploaded and did boot further, but then seemed to have other problems with apparmor. Not sure whether this is related or not.
The only change was to move i_acl and i_default_acl to the end of the inode structure, so that the i_private pointer ends up at the same relative offset as before. So the corruption of that memory location might still happen, but as the location is used differently, it does not lead to the immediate panic.
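
The effect of that reordering can be illustrated with toy structures and offsetof. These are not the real struct inode; the field order is an assumption inferred from the description above (ACL pointers inserted ahead of i_private), and every struct and function name in the sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct posix_acl;  /* opaque here; only pointers to it are stored */

/* Toy tail of the inode before the ACL caching commit. */
struct inode_before {
    unsigned long i_other;   /* placeholder for all earlier fields */
    void *i_private;
};

/* Assumed layout after the commit: ACL pointers ahead of i_private. */
struct inode_after {
    unsigned long i_other;
    struct posix_acl *i_acl;
    struct posix_acl *i_default_acl;
    void *i_private;
};

/* Workaround layout: ACL pointers moved to the very end. */
struct inode_workaround {
    unsigned long i_other;
    void *i_private;
    struct posix_acl *i_acl;
    struct posix_acl *i_default_acl;
};

/* How far i_private moved when the ACL pointers were inserted before it. */
size_t private_shift(void)
{
    size_t before = offsetof(struct inode_before, i_private);
    size_t after = offsetof(struct inode_after, i_private);
    assert(after >= before);
    return after - before;
}

/* Nonzero if code built against the old layout would store its
 * "i_private" on top of i_acl in the new layout. */
int stale_private_write_hits_acl(void)
{
    return offsetof(struct inode_before, i_private) ==
           offsetof(struct inode_after, i_acl);
}

/* Nonzero if the workaround keeps i_private where old code expects it. */
int workaround_preserves_private(void)
{
    return offsetof(struct inode_before, i_private) ==
           offsetof(struct inode_workaround, i_private);
}
```

In this toy model a writer using the old offset lands on an ACL pointer in the new layout and on i_private in the workaround layout, which is consistent with the workaround hiding the corruption rather than fixing it.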

Revision history for this message
Martin-Éric Racine (q-funk) wrote : Re: [Bug 396286] Re: 2.6.31-generic: kernel panic near the end of initramfs

Stefan, could you submit those changes to the LKML, referring to the
above kernel.org bug number, but emphasizing that this might only be a
partial fix that only masks the real problem? I'm sure that Ingo
Molnar and Al Viro would have some constructive feedback.

Revision history for this message
Stefan Bader (smb) wrote : Re: [Bug 396286] Re: 2.6.31-generic: kernel panic near the end of initramfs

Martin-Éric, I added some info to the upstream bug. I believe the relevant
people are subscribed there as well and I don't think that change really is
something near a solution as this only prevents the immediate visibility of the
corruption. It still might happen but go unnoticed as the private pointer might
be used differently (at other times). So I would not say it is a fix.

Revision history for this message
Stefan Bader (smb) wrote : Re: 2.6.31-generic: kernel panic near the end of initramfs

In order to hopefully find out more about this, I created a new kernel that will (if things go as intended) catch the corruption cases without crashing and also try to gather more info. It replaces the other kernels at http://people.canonical.com/~smb/bug396286/. Can you try that and, if it boots, post the dmesg that gets produced? Or even if not, whatever can be seen on the screen. Thanks
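
The debug patch itself is not shown in this thread, so the following is only a guess at the kind of check such a kernel might perform before releasing a cached ACL pointer. It models pointers as 32-bit values, as on i386; the enum, the function names and the heuristic (upper 16 bits still matching the sentinel) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ACL_NOT_CACHED_32 0xffffffffu  /* sentinel value as seen on i386 */

enum acl_ptr_state {
    ACL_PTR_NULL,        /* 0x0: nothing cached, nothing to release */
    ACL_PTR_NOT_CACHED,  /* sentinel: nothing to release */
    ACL_PTR_SUSPECT,     /* looks like a partially overwritten sentinel */
    ACL_PTR_PLAUSIBLE    /* may be a real pointer */
};

/* Classify a cached ACL pointer value without dereferencing it. */
enum acl_ptr_state classify_acl_ptr(uint32_t p)
{
    if (p == 0)
        return ACL_PTR_NULL;
    if (p == ACL_NOT_CACHED_32)
        return ACL_PTR_NOT_CACHED;
    /* Heuristic: upper half still equals the sentinel's upper half. */
    if ((p & 0xffff0000u) == (ACL_NOT_CACHED_32 & 0xffff0000u))
        return ACL_PTR_SUSPECT;
    return ACL_PTR_PLAUSIBLE;
}

/* Nonzero only when releasing the pointer would be attempted at all. */
int acl_release_allowed(uint32_t p)
{
    return classify_acl_ptr(p) == ACL_PTR_PLAUSIBLE;
}
```

With a check of this kind, a suspect value such as 0xffffb4ff could be logged and skipped instead of being passed on and dereferenced.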

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Stefan, sorry for not replying to this any sooner.

This one crashes in much the same way as before. However, there's one interesting development: fsck of the root filesystem succeeds in launching and it fixes errors. Then, the kernel crashes as follows:

BUG: unable to handle kernel paging request at ffffb4ff
IP: [<c01f595f>] __destroy_inode+0x6f/0x110
*pde = 00819067 *pte = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/power/resume

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Stefan, could you please attach your 2.6.31-10.32bug396286v2 diff to this bug?

Revision history for this message
Stefan Bader (smb) wrote : Re: [Bug 396286] Re: 2.6.31-generic: kernel panic near the end of initramfs

Martin-Éric Racine wrote:
> Stefan, could you please attach your 2.6.31-10.32bug396286v2 diff to

I am traveling this week and unfortunately seem to have the patch on another
box. Getting that crash means it is not really working as I intended, and I
will have to take another look at it. But I won't get back until next Monday.

Revision history for this message
Martin-Éric Racine (q-funk) wrote : Re: 2.6.31-generic: kernel panic near the end of initramfs

Stefan, we're already aware that simply shuffling the structure as you did for 2.6.31-10.32bug396286v2 probably only masks the real issue rather than fixes it, but it would already be a good start to attach this as a patch and to send it upstream for comments.

Revision history for this message
Stefan Bader (smb) wrote :

This is the patch used to avoid the crash by moving the new pointers to the end of the inode structure. Though I would think this won't give new reactions from upstream. They pretty much should have guessed this.

Revision history for this message
Stefan Bader (smb) wrote :

As for the last debug kernel: it unfortunately contained a copy-and-paste error, which meant it failed to check the i_default_acl pointer. Interestingly, I would have expected this to be no problem, as the previous tests seemed to indicate that only the first of the two got corrupted. I am currently uploading a revised kernel build which hopefully runs without crashing. It will be at the known location in a few minutes.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Linux geode 2.6.31-11-generic #37bug396286v1 SMP Tue Sep 29 13:43:37 UTC 2009 i586

Boots fine.

Revision history for this message
Martin-Éric Racine (q-funk) wrote : apport-collect data

AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Audio [CS5535 Audio], device 0: CS5535 Audio [CS5535 Audio]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Audio [CS5535 Audio], device 0: CS5535 Audio [CS5535 Audio]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D0c', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Audio'/'CS5535 Audio cs5535audio at 0xfe00, irq 11'
   Mixer name : 'Realtek ALC203 rev 0'
   Components : 'AC97a:414c4770'
   Controls : 33
   Simple ctrls : 21
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=5ffade8f-b837-49eb-bb44-225617349ca3
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 003: ID 046d:c00e Logitech, Inc. M-BJ58/M-BJ69 Optical Wheel Mouse
 Bus 002 Device 002: ID 03f9:0100 KeyTronic Corp. Keyboard
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: First International Computer, Inc. ION603
Package: linux 2.6.31.11.22
PackageArchitecture: i386
ProcCmdLine: root=UUID=97b2628b-28a5-49f2-85f7-495728b3bef8 ro vga=795 quiet splash crashkernel=384M-2G:64M,2G-:128M
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=fi_FI.UTF-8
 LANGUAGE=fi_FI:fi:en_US:en
ProcVersionSignature: Ubuntu 2.6.31-11.37bug396286v1-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-11-generic N/A
 linux-firmware 1.19
RfKill:

Uname: Linux 2.6.31-11-generic i586
UserGroups: adm admin audio cdrom dialout lpadmin operator plugdev pulse pulse-access sambashare staff sudo
WpaSupplicantLog:

dmi.bios.date: 11/08/2007
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: ION603
dmi.board.vendor: First International Computer, Inc.
dmi.board.version: PCB 2.X
dmi.chassis.type: 3
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd11/08/2007:svnFirstInternationalComputer,Inc.:pnION603:pvrVER2.X:rvnFirstInternationalComputer,Inc.:rnION603:rvrPCB2.X:cvn:ct3:cvr:
dmi.product.name: ION603
dmi.product.version: VER 2.X
dmi.sys.vendor: First International Computer, Inc.

Revision history for this message
Martin-Éric Racine (q-funk) wrote : .etc.asound.conf.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : AlsaDevices.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : BootDmesg.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : Card0.Amixer.values.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : Card0.Codecs.codec97.0.ac97.0.0.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : Card0.Codecs.codec97.0.ac97.0.0.regs.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : CurrentDmesg.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : Dependencies.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : Lspci.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : PciMultimedia.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : ProcCpuinfo.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : ProcInterrupts.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : ProcModules.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : UdevDb.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : UdevLog.txt
Revision history for this message
Martin-Éric Racine (q-funk) wrote : WifiSyslog.txt
tags: added: apport-collected
Revision history for this message
Martin-Éric Racine (q-funk) wrote : Re: 2.6.31-generic: kernel panic near the end of initramfs

PS: I just added dmesg.boot and dmesg.current using apport-collect, based on output from 37bug396286v1. I hope this provides useful information.

Revision history for this message
Stefan Bader (smb) wrote :

Hm, unfortunately my filename printing was not very successful. Though somehow it looks related to apparmor. Could you try to boot with "apparmor=0" and check dmesg for those bad pointer messages?

Revision history for this message
Akdo (menoft) wrote :

Hi, I reported the duplicate bug after this one (although I did some research beforehand) and I think I have more information on this issue: Bug #406484

We just have the same Wireless device ! Evil device !

Bus 001 Device 002: ID 0ace:1215 ZyDAS WLA-54L 802.11bg

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Akdo, your issue is completely unrelated to this one.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Stefan, adding "apparmor=0" to cmdline did not produce any noticeable change:

[ 6.627144] EXT3-fs: mounted filesystem with writeback data mode.
[ 8.308111] bad i_default_acl pointer = ffffb4ff
[ 8.308133] on
[ 8.308689] bad i_default_acl pointer = ffffb4ff
[ 8.308705] on
[ 8.317678] bad i_default_acl pointer = ffffb4ff
[ 8.317697] on

Revision history for this message
Stefan Bader (smb) wrote : Re: [Bug 396286] Re: 2.6.31-generic: kernel panic near the end of initramfs

Martin-Éric Racine wrote:
> Stefan, adding "apparmor=0" to cmdline did not produce any noticable
> change:

Alright, it was probably unlikely as this would have affected more people. And
the same code seems to run well on other systems. But it was worth a try, just
as it happens so close to those messages.
I need to rework the part that tries to find and print the associated file.
Maybe that gives a better indication. Not sure how quickly I can get that done, though.

Revision history for this message
Stefan Bader (smb) wrote : Re: 2.6.31-generic: kernel panic near the end of initramfs

Hi Martin-Éric, looks like at the time of destroy_inode there is no path information left anymore. So I created a new version which checks on every inode access. Maybe this gives a little more insight. Could you try to run the v2 version of the kernel for me? Thanks

Revision history for this message
Stefan Bader (smb) wrote :

I might as well assign that to me by now.

Changed in linux (Ubuntu):
assignee: Leann Ogasawara (leannogasawara) → Stefan Bader (stefan-bader-canonical)
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Stefan, thanks for the updated kernel. I'll test this shortly and report here on the results. Meanwhile, could this please be re-based against kernel 2.6.31-12-generic while we're at it?

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Occurrences spotted with 37v2:

<4>[ 3.180243] sda1 sda2 < sda5 >
<5>[ 3.209382] sd 0:0:0:0: [sda] Attached SCSI disk
<6>[ 3.209462] Freeing unused kernel memory: 540k freed
<6>[ 3.210813] Write protecting the kernel text: 4548k
<6>[ 3.211102] Write protecting the kernel read-only data: 1840k
<6>[ 3.588117] usb 2-3: new low speed USB device using ohci_hcd and address 2
<6>[ 3.798731] usb 2-3: configuration #1 chosen from 1 choice
<3>[ 3.917274] bad i_default_acl pointer = ffffb4ff
<3>[ 3.917314] ipath /lib/udev/rules.d
<6>[ 4.124184] usb 2-4: new low speed USB device using ohci_hcd and address 3
<6>[ 4.349780] usb 2-4: configuration #1 chosen from 1 choice

...

<6>[ 6.809857] kjournald starting. Commit interval 5 seconds
<6>[ 6.809915] EXT3-fs: mounted filesystem with writeback data mode.
<5>[ 8.264349] type=1505 audit(1254918705.672:2): operation="profile_load" pid=321 name=/sbin/dhclient3
<5>[ 8.265847] type=1505 audit(1254918705.672:3): operation="profile_load" pid=321 name=/usr/lib/NetworkManager/nm-dhcp-client.action
<5>[ 8.266749] type=1505 audit(1254918705.672:4): operation="profile_load" pid=321 name=/usr/lib/connman/scripts/dhclient-script
<5>[ 8.409711] type=1505 audit(1254918705.816:5): operation="profile_load" pid=322 name=/usr/bin/evince
<5>[ 8.440386] type=1505 audit(1254918705.848:6): operation="profile_load" pid=322 name=/usr/bin/evince-previewer
<5>[ 8.454846] type=1505 audit(1254918705.860:7): operation="profile_load" pid=322 name=/usr/bin/evince-thumbnailer
<5>[ 8.495981] type=1505 audit(1254918705.900:8): operation="profile_load" pid=323 name=/usr/lib/cups/backend/cups-pdf
<5>[ 8.498157] type=1505 audit(1254918705.904:9): operation="profile_load" pid=323 name=/usr/sbin/cupsd
<5>[ 8.515403] type=1505 audit(1254918705.920:10): operation="profile_load" pid=324 name=/usr/sbin/tcpdump
<3>[ 8.621156] bad i_default_acl pointer = ffffb4ff
<3>[ 8.621904] bad i_default_acl pointer = ffffb4ff
<3>[ 8.623094] bad i_default_acl pointer = ffffb4ff
<3>[ 8.636382] bad i_default_acl pointer = ffffb4ff
<3>[ 8.637708] bad i_default_acl pointer = ffffb4ff
<3>[ 8.638109] bad i_default_acl pointer = ffffb4ff
<3>[ 8.640120] bad i_default_acl pointer = ffffb4ff
<3>[ 8.641532] bad i_default_acl pointer = ffffb4ff
<3>[ 8.642390] bad i_default_acl pointer = ffffb4ff
<3>[ 8.652979] bad i_default_acl pointer = ffffb4ff
<6>[ 11.047595] udev: starting version 147
<7>[ 12.395906] cs5535_gpio: base=0x6100 mask=0xb003c66 major=251
<6>[ 12.403445] AMD Geode RNG detected
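
One way to read the logged value ffffb4ff is to compare it bit by bit with an all-ones word. The following is only a user-space sketch. It assumes, without confirmation from this report, that the field originally held a 32-bit all-ones "not cached" sentinel. ASSUMED_SENTINEL, changed_bits() and changed_byte_count() are hypothetical names used purely for illustration.

```c
#include <stdint.h>

/* Assumption (not confirmed in this report): the field started out as a
 * 32-bit all-ones sentinel before it was corrupted. */
#define ASSUMED_SENTINEL 0xffffffffu

/* Bits that differ between the observed value and the assumed sentinel. */
uint32_t changed_bits(uint32_t observed)
{
    return observed ^ ASSUMED_SENTINEL;
}

/* Number of bytes (0..4) in which the observed value differs. */
int changed_byte_count(uint32_t observed)
{
    uint32_t diff = changed_bits(observed);
    int count = 0;

    for (int i = 0; i < 4; i++) {
        if ((diff >> (8 * i)) & 0xffu)
            count++;
    }
    return count;
}
```

Under that assumption, changed_bits(0xffffb4ff) yields 0x00004b00, meaning a single byte of the word differs from the sentinel while the other three are intact.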

Revision history for this message
Martin-Éric Racine (q-funk) wrote :
Revision history for this message
Stefan Bader (smb) wrote : Re: [Bug 396286] Re: 2.6.31-generic: kernel panic near the end of initramfs

Thanks for the full log. Unfortunately won't be able to look at it today. But
updated the kernel at least.

summary: - 2.6.31-generic: kernel panic near the end of initramfs
+ [Geode LX] [OLPC] 2.6.31-generic: kernel panic near the end of initramfs
Revision history for this message
Alan Bell (alanbell) wrote : Re: [Geode LX] [OLPC] 2.6.31-generic: kernel panic near the end of initramfs

I think this issue is preventing boot on the OLPC XO-1 hardware, however as the screen never unfreezes it is hard to be certain.

Revision history for this message
Stefan Bader (smb) wrote :

New kernel same place. Please try to boot and if it comes up add the dmesg as usual. Thanks.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Nice, except that Geode really is an i386 architecture, so an amd64 kernel won't be of any use here. :)

Revision history for this message
Stefan Bader (smb) wrote :

Grr, wrong build chroot. OK, on the other hand, I'd have been surprised if nothing had gone wrong when doing things quickly in the evening. Correct architecture uploaded.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

dmesg attached.

Revision history for this message
Stefan Bader (smb) wrote :

That looks somewhat crazy. There is not a single error message in this dmesg. This is completely unexpected and actually weird. Just because this is so unbelievable, could you please also try the -14.46 v2 which I uploaded (and attach the dmesg here)? And, for the sake of completeness, please verify and confirm in this report that an unmodified Ubuntu 2.6.31-14.46 kernel crashes on boot.

For explanation: the old debug used the current struct inode which looks like this:

struct inode {
  ...
  struct posix_acl *i_acl;
  struct posix_acl *i_default_acl;
  void *i_private;
}

With that we saw a corrupted value in i_default_acl. For the latest debug I added two dummy pointers before and after the acl pointers. So the structure looks like this:

struct inode {
  ...
  void *i_dbg1;
  struct posix_acl *i_acl;
  struct posix_acl *i_default_acl;
  void *i_dbg2;
  void *i_private;
}

The expected behavior would have been that, with those pointers added, i_acl, i_dbg2 or i_default_acl would see the corruption (depending, respectively, on whether the corruption is relative to the start of the structure, relative to its end, or aimed directly at i_default_acl). But certainly not that nothing gets triggered.
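
The reasoning behind the padding pointers can be sketched in user-space C. The structs below are stand-ins, not the kernel's struct inode: other_fields is a hypothetical placeholder for the elided members, and padded_member_at(), hit_if_relative_to_start() and hit_if_relative_to_end() are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

struct fake_acl;                      /* opaque stand-in for struct posix_acl */

struct inode_plain {                  /* layout without the debug pointers */
    void *other_fields[4];            /* hypothetical placeholder for "..." */
    struct fake_acl *i_acl;
    struct fake_acl *i_default_acl;
    void *i_private;
};

struct inode_padded {                 /* layout with i_dbg1 / i_dbg2 added */
    void *other_fields[4];
    void *i_dbg1;
    struct fake_acl *i_acl;
    struct fake_acl *i_default_acl;
    void *i_dbg2;
    void *i_private;
};

enum member { M_OTHER, M_DBG1, M_ACL, M_DEFAULT_ACL, M_DBG2, M_PRIVATE };

/* Which member of the padded layout covers byte offset `off`? */
enum member padded_member_at(size_t off)
{
    assert(off < sizeof(struct inode_padded));
    if (off >= offsetof(struct inode_padded, i_private))     return M_PRIVATE;
    if (off >= offsetof(struct inode_padded, i_dbg2))        return M_DBG2;
    if (off >= offsetof(struct inode_padded, i_default_acl)) return M_DEFAULT_ACL;
    if (off >= offsetof(struct inode_padded, i_acl))         return M_ACL;
    if (off >= offsetof(struct inode_padded, i_dbg1))        return M_DBG1;
    return M_OTHER;
}

/* A stray write at a fixed distance from the START of the structure. */
enum member hit_if_relative_to_start(void)
{
    return padded_member_at(offsetof(struct inode_plain, i_default_acl));
}

/* A stray write at a fixed distance from the END of the structure. */
enum member hit_if_relative_to_end(void)
{
    size_t from_end = sizeof(struct inode_plain)
                    - offsetof(struct inode_plain, i_default_acl);
    return padded_member_at(sizeof(struct inode_padded) - from_end);
}
```

In this model a start-relative write lands on i_acl and an end-relative write lands on i_dbg2, which is why a dmesg with no hit at all is so surprising.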

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Here's dmesg for 46 v2.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

To compare, I tried booting with the stock 47 (46 is no longer available) and, sure enough, it freezes during boot as the other unpatched releases before.

tags: added: regression-karmic
Revision history for this message
Stefan Bader (smb) wrote :

As this issue seems to vanish when we prod around with the size of the inode structure, this needs a bit more prodding around. I added two more test kernels to my peoples page:

2.6.31-14.38*v1: This one is the stock kernel, but with SMP disabled in config (which removes some code replacement magic)
2.6.31-14.38*v2: This has the bad pointer catcher without the padding pointers, but with additional information printed about the bad pointers.

Could you boot both of these and add the dmesg or the info that it crashes on boot for both? Thanks.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

2.6.31-14-generic_48+bug396286v1.txt attached.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

2.6.31-14-generic_48+bug396286v2.txt attached.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Sorry, please disregard the previous 48v2 attachment. See this one instead.

Revision history for this message
Stefan Bader (smb) wrote :

So the setting of SMP has no effect here and, looking at the addresses of the bad pointers, there seems to be no obvious pattern in those. Nor do the offsets within the inode structure show any suspicious placement. So one step further, I added some code to check immediately after the values are supposed to be set (a v3 in the usual place). Could you check that for me and post the resulting dmesg? Thanks.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :
Revision history for this message
Stefan Bader (smb) wrote :

This actually looks interesting. Maybe some light somewhere? :) Again the corruption seems not to have happened at all. And this time the structure was not modified. I only moved the init statements somewhat. But before getting too excited, could you take a go at v4 (and again post me the dmesg of that)? I hope this sheds a bit more light on it.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :
Revision history for this message
Stefan Bader (smb) wrote :

Somehow those results do not really make sense. In v4, the init code is back in the place it was before, and the only difference between the code that results in corrupted pointers and this one (which does not show any corruption at all) is that there were a few more calls for validating the pointers after the init function supposedly set the right values.
Now this really brings up the question of what the heck is going on there. It happens without SMP, so this would rule out the option of a race condition. The same code works on different hardware (I run it without any problems). And running the same procedure on the Geode seems to produce different results by changing the codepath a bit, even without really changing the effective way things are done. I wonder whether the v5 I added (which does only a limited number of checks and no function call to do the check) brings back the failure warnings or still runs without any output. Could you do yet another run and post the dmesg?

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

As a point of information, I recently changed the fstab entries to state ext4, to benefit from the slightly faster performance that new features common to both ext3 and ext4 make possible. In principle, this should not change anything, since both ext3 and ext4 call the same common fsattr functions but, in case there was a cut&paste error that only affected ext3 and not ext4, this could have some impacts.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :
Revision history for this message
Stefan Bader (smb) wrote : Re: [Bug 396286] Re: [Geode LX] [OLPC] 2.6.31-generic: kernel panic near the end of initramfs

To make sure the change of the fstab did not have an impact, go back to the v2 version
(which previously showed the corruption) and check whether you find bad acl messages
now. If not, then it would be most interesting to get back to the old state.

Revision history for this message
Martin-Éric Racine (q-funk) wrote : Re: [Geode LX] [OLPC] 2.6.31-generic: kernel panic near the end of initramfs
Revision history for this message
Martin-Éric Racine (q-funk) wrote :
Revision history for this message
Stefan Bader (smb) wrote :

So both (ext3 and ext4) show the corruption messages when running with v2, but v5 never hits them. For better understanding I am attaching the diff between v2 and v5. As one can see there is no real change in that. In __iget there is just a printk added for the case that inode_check_acl() finds something. And all the changes to init_inode() just query i_acl and i_default_acl, which usually are set in inode_init_always(). There are just two cases where the acl values are not initialized and in both the test in init_inode() should then return NULL. And of course either way I would expect either a debug message here or at least hitting the problem in destroy_inode(). But as soon as this code is added, all problems go away. This is something I have a hard time explaining.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Just to confirm, this issue still applies to 2.6.32-2-generic.

Stefan, how about attaching all of your diffs to the upstream bug and asking the LKML for advice? I think that you and Leann have already done a fine job of narrowing down the issue and, at this point, the authors of the upstream code really need to step in and contribute their share in fixing this regression.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

I'll also add that a Debian developer (dilinger) who is also a member of the OLPC kernel team would be willing to help, but he cannot do much until you've attached your diffs to the upstream bug.

Revision history for this message
Stefan Bader (smb) wrote :

I added the two patches with some comments to the upstream bug report. At this point I guess it would be interesting to have a second confirmation with a different (but same model) Geode to rule out a single misbehaving piece of hardware. I heard of others saying they have problems, but was it exactly the same crash, with the acl pointers corrupted in that particular way?

tags: added: regression-release
removed: regression-potential
tags: added: karmic
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Added tags for Lucid, since this issue is still unresolved and will blow up in people's faces when they upgrade from Hardy.

tags: added: lucid regression-lucid
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

I'm really wondering what to do about this one since LKML has been rather uncooperative and yet it already affects those upgrading from Jaunty to Karmic. However, the real concern is for LTS->LTS+1 upgrades. Geode support in Hardy is rock-solid, whereas this show-stopper affects Lucid.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Peter Anvin suggested in the upstream report that enforcing -march=i386 as compiler options might be all that's required to fix this. Could new test packages be built using the following patch?

http://git.kernel.org/tip/17a2a9b57a9a7d2fd8f97df951b5e63e0bd56ef5

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Repeatedly trying to rebuild 2.6.32-9-generic with Peter Anvin's patch following instructions at https://help.ubuntu.com/community/Kernel/Compile consistently fails this way:

  CC arch/x86/kernel/alternative.o
  CC arch/x86/kernel/i8253.o
  CC arch/x86/kernel/pci-nommu.o
  CC arch/x86/kernel/tsc.o
  CC arch/x86/kernel/io_delay.o
  CC arch/x86/kernel/rtc.o
  CC arch/x86/kernel/trampoline.o
  CC arch/x86/kernel/process.o
arch/x86/kernel/process.o: final close failed: File truncated
make[5]: *** [arch/x86/kernel/process.o] Error 1
make[4]: *** [arch/x86/kernel] Error 2
make[3]: *** [arch/x86] Error 2
make[2]: *** [sub-make] Error 2
make[1]: *** [/home/q-funk/Projektit/linux-2.6.32/debian/stamps/stamp-build-generic] Error 2
make: *** [binary-generic] Error 2

This is on a system with 1 GB of RAM, so I'm really not sure how this "file truncated" keeps on showing up.

Revision history for this message
Stefan Bader (smb) wrote :

Placed test kernel (2.6.31-17.54 + patch mentioned by hpa in the upstream bug) to http://people.canonical.com/~smb/bug396286/

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Tested. Crashes as before.

Could we apply this and your extra debug message patch to something 2.6.32 as well and build test packages with that? It seems that 2.6.31 cannot work with Plymouth and some other novelties found in Lucid, plus upstream probably wants us to try against the latest and greatest.

Revision history for this message
Stefan Bader (smb) wrote :

Uploaded linux-image-2.6.32-10-generic_2.6.32-10.14+bug396286v2_i386.deb to http://people.canonical.com/~smb/bug396286/

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

2.6.32-10.14+bug396286v2 boots. dmesg attached.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

I'm wondering if the patch that was used to produce 2.6.32-10.14+bug396286v2 could be added to the Lucid -generic kernel?

While I realize that it's not a proper fix, let's keep in mind that Lucid is the next LTS and, as such, the last thing we want is a massive wave of complaints from users of thin clients (most of which are based on some Geode variant) upgrading from Hardy that their whole classroom of LX800-based thin client devices can no longer boot since the upgrade from Hardy to Lucid.

This of course doesn't dispense us from finding the real cause of the issues and fixing it properly but, if anyone asks me, a piece of gaffer tape that somehow prevents a hardware management disaster from taking place is better than no solution at all.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Leann? Stefan?

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

It seems that we have some progress.

In an attempt to debug this issue, I compared notes with someone on Fedora for whom the same hardware works. As a test, I used their kernel 2.6.31 config (with a couple of small modifications to build specific drivers as built-in) as attached to build my own kernel using make-kpkg. Much to my amazement, this kernel boots fine, as long as I specify root=/dev/sda1 on the GRUB cmdline.

However, for some reason, kernel-package no longer creates an initrd.img, even when the --initrd option was specified to build the kernel-image target. Yet, as soon as I created one using "sudo update-initramfs -k 2.6.31.12-geodelx -c" and rebooted, the kernel failed to boot as before.

Just to be safe, I deleted the initrd and rebooted again, letting udev perform its work after /sbin/init has been launched by the kernel. Lo and behold, it worked again!

As such, it seems that something that gets included in the initramfs image is what messes with the ACL code and destroys some inodes and makes the kernel crash in a non-recoverable way.

Interestingly enough, we still get the previous error messages when booting with this barebone kernel, without an initrd.img, but the error is non-fatal. The output of dmesg -r is attached next.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Performing the same test as above (removal of initrd.img and modification of GRUB's menu.lst) using 2.6.32-14-generic further confirms that the issue is with something that gets included in the initramfs image.

The relevant GRUB menu excerpt:

title Ubuntu lucid (development branch), kernel 2.6.32-14-generic
kernel /boot/vmlinuz-2.6.32-14-generic rootfstype=ext4 root=/dev/sda1 ro vga=795 quiet splash
quiet

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

The resulting raw dmesg output.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Apparently, in all cases, something that touches sysfs does something nasty, but what?

Instructions on how to locate exactly which part of the initramfs image payload causes this are welcome.
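
One generic approach would be to bisect the initramfs contents: build images containing only the first N components and halve the search range depending on whether each one boots. The sketch below is only an illustration. It assumes a single culprit, a fixed component order, and a reproducible failure, and the last of these may not hold given how this bug has behaved so far. boots_fn, first_bad_component() and simulated_boots() are hypothetical names; in practice the callback would be a person rebuilding the image and rebooting.

```c
#include <stddef.h>

/* Hypothetical callback: nonzero if an initramfs built from the first
 * `count` components still boots. */
typedef int (*boots_fn)(size_t count, void *ctx);

/* Returns the index of the first component whose inclusion breaks the
 * boot, or n if the full image boots. Assumes an image with zero
 * components boots. */
size_t first_bad_component(size_t n, boots_fn boots, void *ctx)
{
    size_t lo = 0;      /* a prefix of `lo` components is known to boot */
    size_t hi = n;      /* a prefix of `hi` components is known to fail */

    if (boots(n, ctx))
        return n;

    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (boots(mid, ctx))
            lo = mid;
        else
            hi = mid;
    }
    return hi - 1;
}

/* Simulated stand-in for real boot tests: ctx points at the index of the
 * one component that breaks the boot. */
int simulated_boots(size_t count, void *ctx)
{
    size_t bad_index = *(size_t *)ctx;
    return count <= bad_index;
}
```

With eight components, this needs at most four test boots instead of eight.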

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Stefan suggested trying mem=nopentium but this did not have any apparent effect. However raid=noautodetect did. When booting without any initramfs image, the kernel no longer shows any paging error or destroyed inode at all.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Testing the mainline 2.6.32 kernel on other LX800-based hardware (Artec ThinCan DBE61C-USB), I notice that everything boots as normal.

It thus appears that some recent changes in the kernel might have succeeded in exposing BIOS issues on some specific hardware.

I suppose that we could choose to ignore this bug and move on, but doing so would pose a problem: an awful lot of Geode-based hardware sold by different hardware vendors are branded versions of this same FIC ION603. This includes the Linutop-2, Inveneo desktop, Koolu ... and many others that came with Ubuntu pre-installed.

Personally, I think that a more positive outcome would involve a combination of Canonical's technical support (who is known to have certified some of the above hardware for Ubuntu) and of some of the above vendors contacting First International Computers to work at finding a common solution together, which could possibly involve releasing an updated BIOS along with Ubuntu tools to flash the EPROM from command line.

summary: - [Geode LX] [OLPC] 2.6.31-generic: kernel panic near the end of initramfs
+ [Geode LX] [ION603] 2.6.31-generic: kernel panic near the end of
+ initramfs
Revision history for this message
Martin-Éric Racine (q-funk) wrote : Re: [Geode LX] [ION603] 2.6.31-generic: kernel panic near the end of initramfs

FYI it appears that AMD decided to jump in and contact the FIC engineering team themselves. I'll keep everyone informed on any progress via this bug and the upstream one.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Here's the info provided by dmidecode, in case it can help someone figure out what is going on.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Here's a snapshot of the BIOS splash, as requested.

summary: - [Geode LX] [ION603] 2.6.31-generic: kernel panic near the end of
- initramfs
+ [Geode LX] [ION603] kernels >= 2.6.31 fail to boot [initramfs]
tags: added: kernel-core kernel-reviewed
Revision history for this message
Stefan Bader (smb) wrote :

Setting this bug back to triaged as I don't know how to progress here and don't want to block anybody else to pick up if there is someone with better ideas.

Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
status: In Progress → Triaged
Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Changed status to WontFix per the dropping of support for Geode in Maverick. This is per discussion with the Ubuntu Kernel Team as to the ongoing status of this issue.

~JFo

Changed in linux (Ubuntu):
status: Triaged → Won't Fix
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

It would be a very good idea for the kernel team to discuss this sort of far-reaching decision with the user community BEFORE making it. It is also worth noting that it directly contradicts the existing decision to accommodate the Geode in the libc6 compilation options.

Revision history for this message
Martin-Éric Racine (q-funk) wrote :

I'd like to point out that the linux-image-2.6.36-0-generic that popped up into Natty suddenly resolves this issue.

Meanwhile, the 2.6.35-22.34 that is currently pulled by linux-generic does not.

This seems to indicate that this has indeed been a kernel issue all along and NOT a BIOS issue!

It would be highly desirable for the kernel team to find out exactly what fixed it and to backport the patch into Lucid.

Changed in linux (Ubuntu):
status: Won't Fix → Triaged
Changed in linux (Ubuntu):
status: Triaged → Won't Fix
Changed in linux:
status: Confirmed → Invalid
Changed in linux:
importance: Unknown → Medium
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Fixed by 2.6.36-1-generic and broken again as of 2.6.38-1-generic, which makes 2.6.37-12 the last good kernel on this hardware.

Changed in linux:
status: Invalid → Confirmed
Revision history for this message
Martin-Éric Racine (q-funk) wrote :

Working again as of 2.6.38-7-generic.

Changed in linux:
status: Confirmed → Invalid
Changed in linux:
status: Invalid → Confirmed