[Geode SC] [DBE60] kernels >= 2.6.22 fail to boot [A20 interrupt]

Bug #241307 reported by Martin-Éric Racine
This bug affects 5 people
Affects          Status   Importance  Assigned to  Milestone
linux (Ubuntu)   Expired  High        Unassigned
  Hardy          Invalid  High        Unassigned
  Lucid          Invalid  High        Unassigned
ltsp (Ubuntu)    Invalid  Undecided   Unassigned
  Hardy          Invalid  Undecided   Unassigned
  Lucid          Invalid  Undecided   Unassigned
mknbi (Ubuntu)   Invalid  Undecided   Unassigned
  Hardy          Invalid  Undecided   Unassigned
  Lucid          Invalid  Undecided   Unassigned

Bug Description

Since Hardy, booting in LTSP fails with a kernel oops on this Geode SC2200-based thin client. The same hardware booted fine on previous Ubuntu releases. The attached screenshot shows what the kernel oops reports.

Martin-Éric Racine (q-funk) wrote :
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Martin-Éric Racine (q-funk) wrote :

This is on Hardy/LTS using 2.6.24 and that is the version that is expected to work for the whole duration of the LTS cycle. Please investigate that one. Thanks!

Martin-Éric Racine (q-funk) wrote :

Just to be more explicit, the 2.6.22 kernel that was standard on Gutsy worked with the same hardware, so this is a regression introduced by Hardy.

Martin-Éric Racine (q-funk) wrote :

And to answer the question, no, linux-image-2.6.27-2-generic did not restore operation on this Geode SC2200. Again, the last known good kernel with this chipset is the 2.6.22 from Gutsy.

Leann Ogasawara (leannogasawara) wrote :

Hi Martin-Eric,

Thanks for testing 2.6.27. Since we just rebased with the upstream kernel, I suspect this is likely an issue which exists upstream as well. I'd like to try and narrow down the patch which is causing this regression from 2.6.22 to 2.6.24. The only issue is that we don't have the same hardware here to replicate the bug, so we would appreciate your help. We'll need to perform a git bisect to narrow down the offending patch. Is this something you'd be willing to help with? This is obviously not something we expect bug reporters to know how to do, so this would be completely voluntary on your part. I've got some documents that should hopefully help if you are interested. The following explains the git bisect process:

http://www.kernel.org/doc/local/git-quick.html#bisect

Information regarding git and where to pull the ubuntu-hardy.git tree from can be found at:

https://wiki.ubuntu.com/KernelGitGuide

Part of doing the git bisection requires building the kernel. Again, this is not something we expect you to know how to do. Information on building the Ubuntu kernel can be found at:

https://help.ubuntu.com/community/Kernel/Compile

I can understand that this may be quite a bit of information to digest but we'd really appreciate your help on this one. Thanks.

Changed in linux:
status: New → Incomplete
Martin-Éric Racine (q-funk) wrote :

Vanilla 2.6.24 (self-made kernel package) works fine, so this would have to be some Debian or Ubuntu-specific patch.

Leann Ogasawara (leannogasawara) wrote :

Hi Martin,

Just curious: when you tested the upstream vanilla 2.6.24 kernel, did you use an Ubuntu-generated kernel config, for example /boot/config-2.6.24-19-generic, or did you generate your own custom config? Thanks.

Changed in linux:
status: Incomplete → New
Martin-Éric Racine (q-funk) wrote :

I started with config-2.6.24-19-generic and removed support for non-Geode platforms.

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
status: New → Triaged
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Andy Whitcroft (apw)
Changed in linux:
assignee: nobody → apw
status: Triaged → In Progress
Andy Whitcroft (apw)
Changed in linux:
assignee: nobody → apw
importance: Undecided → High
status: New → In Progress
Andy Whitcroft (apw) wrote : Re: kernel oops during bootup in LTSP

@Martin-Eric -- you had success with a vanilla v2.6.24 kernel, using a cut-down ubuntu configuration. Now that we have vanilla builds available for testing we can use those to try and identify the trigger for this regression.

The first logical test would be to try a v2.6.24 vanilla kernel with a full ubuntu config from the URL below:

    http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.24/

If that works then it would be worth stepping through the v2.6.24.x mainline releases to see if any fail. Those can be found here:

    http://kernel.ubuntu.com/~kernel-ppa/mainline/

If you could test those and report back here, that would help us identify the source of the regression. Thanks in advance.

Changed in linux:
status: In Progress → Incomplete
Martin-Éric Racine (q-funk) wrote :

Btw, it seems that I found someone familiar with git bisecting to take over investigation of this issue. Please stay tuned.

Andy Whitcroft (apw) wrote :

@Martin-Eric -- sounds most excellent. If you are not able to get bisect results, testing the mainline builds as above may also help narrow this down.

Martin-Éric Racine (q-funk) wrote :

Someone pointed out today on IRC that:

(16.21.03) wwx: Looks like it might be an A20 issue.
(16.21.36) wwx: In the transition from 2.6.22 to 2.6.23, A20 handling was rewritten.
(16.24.15) wwx: arch/i386/boot/a20.c appeared in 2.6.23 kernel. Previously this part written in ASM and found elsewhere.

Martin-Éric Racine (q-funk) wrote :

(16.27.24) wwx: When you comment out some initialization parts in a20.c, then 2.6.23 boots.
(16.27.58) wwx: In later kernels, a20.c changed some more, so simply commenting out some lines won't help anymore.
(16.30.02) wwx: Look at the a20.c file, in function int enable_a20(void).

int enable_a20(void)
{
    int loops = A20_ENABLE_LOOPS;

#if defined(CONFIG_X86_ELAN)
    /* Elan croaks if we try to touch the KBC */
    enable_a20_fast();
    while (!a20_test_long())
        ;
    return 0;
#elif defined(CONFIG_X86_VOYAGER)
    /* On Voyager, a20_test() is unsafe? */
    enable_a20_kbc();
    return 0;
#else
    while (loops--) {
        /* First, check to see if A20 is already enabled
           (legacy free, etc.) */
//      if (a20_test_short())
//          return 0;

        /* Next, try the BIOS (INT 0x15, AX=0x2401) */
        enable_a20_bios();
        if (a20_test_short())
            return 0;

        /* Try enabling A20 through the keyboard controller */
//      empty_8042();
//OK//  if (a20_test_short())
//OK//      return 0; /* BIOS worked, but with delayed reaction */

//      enable_a20_kbc();
//      if (a20_test_long())
//          return 0;
    }

    return -1;
#endif
}
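
For readers unfamiliar with what the a20_test_*() calls above are checking, here is a minimal user-space sketch of the general idea behind an A20 wraparound probe and the "probe, try to enable, probe again" loop. It is not the kernel's a20.c. The simulated memory, `struct machine`, `PROBE_LOW`, and the functions `machine_init`, `a20_probe`, `try_enable`, and `enable_with_retries` are all hypothetical names made up for this illustration. It assumes that with A20 disabled, address bit 20 is forced to zero, so an address just above 1 MiB aliases the same offset in low memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE   (2u * 1024u * 1024u)  /* 2 MiB of simulated physical memory */
#define A20_BIT    (1u << 20)            /* address line 20 */
#define PROBE_LOW  0x0200u               /* hypothetical scratch address below 1 MiB */

/* Hypothetical model of a machine whose A20 gate may be open or closed. */
struct machine {
    uint8_t ram[MEM_SIZE];
    int a20_enabled;   /* current state of the gate */
    int enable_works;  /* does the simulated enable method have any effect? */
};

void machine_init(struct machine *m, int a20_enabled, int enable_works)
{
    memset(m, 0, sizeof *m);
    m->a20_enabled = a20_enabled;
    m->enable_works = enable_works;
}

/* With the gate closed, bit 20 of every address is forced to zero. */
static uint32_t translate(const struct machine *m, uint32_t addr)
{
    assert(addr < MEM_SIZE);
    return m->a20_enabled ? addr : (addr & ~A20_BIT);
}

static uint8_t mem_read(const struct machine *m, uint32_t addr)
{
    return m->ram[translate(m, addr)];
}

static void mem_write(struct machine *m, uint32_t addr, uint8_t val)
{
    m->ram[translate(m, addr)] = val;
}

/* Returns 1 if A20 looks enabled, 0 if the high address aliases the low one. */
int a20_probe(struct machine *m)
{
    uint8_t saved = mem_read(m, PROBE_LOW);
    uint8_t high = mem_read(m, PROBE_LOW + A20_BIT);
    int enabled;

    /* Make the low byte differ from what the high address held ... */
    mem_write(m, PROBE_LOW, (uint8_t)~high);
    /* ... and see whether the high address changed along with it. */
    enabled = (mem_read(m, PROBE_LOW + A20_BIT) == high);

    mem_write(m, PROBE_LOW, saved);  /* restore the scratch byte */
    return enabled;
}

/* Stand-in for one enable method (BIOS call, keyboard controller, ...). */
static void try_enable(struct machine *m)
{
    if (m->enable_works)
        m->a20_enabled = 1;
}

/* Probe first, then try to enable and re-probe, up to max_loops times.
 * Returns 0 on success and -1 if A20 never appears enabled. */
int enable_with_retries(struct machine *m, int max_loops)
{
    while (max_loops--) {
        if (a20_probe(m))
            return 0;
        try_enable(m);
        if (a20_probe(m))
            return 0;
    }
    return -1;
}
```

In this model, a machine set up with `machine_init(&m, 0, 1)` fails the first probe and then succeeds after one enable attempt, while `machine_init(&m, 0, 0)` exhausts the loop and returns -1. Real hardware adds delays and several enable methods, which this sketch deliberately leaves out.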

Martin-Éric Racine (q-funk) wrote :

(16.36.32) wwx: in 2.6.22 kernels, this part is written in asm.
(16.37.00) wwx: from changelog: "A20 handling code for the new x86 setup code. This implements the same algorithms as the assembly version"
(16.38.52) wwx: before 2.6.23 same A20 line enable code (in assembler) is in arch/i386/boot/setup.S

Martin-Éric Racine (q-funk) wrote :

I'm sorry, but why is this bug still marked as incomplete? Additional data we provided has narrowed down the cause of the regression. What else is missing?

Martin-Éric Racine (q-funk) wrote :

(11.56.35) Mart Raudsepp: I've found two problems
(11.56.42) Mart Raudsepp: the first problem is in not being able to copy the kernel commandline to high memory
(11.57.00) Mart Raudsepp: so kernel launches without parameters
(11.57.07) Mart Raudsepp: that complicated debugging a bit too
(11.57.37) Mart Raudsepp: the cmdline stuff broke with the change from asm to C
(11.57.48) Mart Raudsepp: A20 a bit later I think

Andy Whitcroft (apw) wrote :

Ok. The piece you are commenting out there includes the kbd controller frobbing. That has been fixed up a lot in karmic kernels specifically to handle the case where it is missing. It also fixes another enable method, so it might be worth trying the latest karmic kernel to see if the code there works on this machine. Let us know here. Thanks.

tags: added: hardy
Changed in linux (Ubuntu Hardy):
status: Incomplete → New
Martin-Éric Racine (q-funk) wrote :
summary: - kernel oops during bootup in LTSP
+ [Geode SC] [ThinCan] kernels newer than 2.6.22 fail to boot
summary: - [Geode SC] [ThinCan] kernels newer than 2.6.22 fail to boot
+ [Geode SC] [DBE60] kernels newer than 2.6.22 fail to boot
summary: - [Geode SC] [DBE60] kernels newer than 2.6.22 fail to boot
+ [Geode SC] [DBE60] kernels >= 2.6.22 fail to boot
summary: - [Geode SC] [DBE60] kernels >= 2.6.22 fail to boot
+ [Geode SC] [DBE60] kernels >= 2.6.22 fail to boot [A20 interrupt]
Andy Whitcroft (apw)
Changed in linux (Ubuntu):
assignee: Andy Whitcroft (apw) → nobody
Changed in linux (Ubuntu Hardy):
assignee: Andy Whitcroft (apw) → nobody
tags: added: kernel-needs-review kernel-uncat
Changed in linux (Ubuntu):
status: In Progress → Triaged
Changed in linux (Ubuntu Hardy):
status: New → Triaged
tags: added: kernel-core kernel-reviewed
removed: kernel-needs-review kernel-uncat
Mart Raudsepp (leio) wrote :

If the boot failure occurs in combination with Etherboot netboot, as found in the DBE60 BIOS, then the failure seen since 2.6.23 (perhaps including 2.6.22 in some cases?) is probably related to low-level loader work in 2.6.23 (which also involved the A20 code) interacting with the loader code created by mknbi, or something along those lines.
If the TFTP payload is created with WrapLinux instead of mknbi, we have seen a vanilla 2.6.34 kernel boot fine on the DBE60.

A user found out that:
"Not 100% sure about the cause, but at least mkelf-linux of OpenSUSE mknbi package seemed to hardcode the (empty) initrd option to the image, that stopped the 2.6.22 kernel to start. Workaround was to give empty initramfs cpio file upon image creation, but I found WrapLinux not to have this "effect" and used it successfully instead..." - http://wiki.thincan.org/Talk:DBE60

Perhaps WrapLinux can be tested, and/or the mentioned workaround.

Martin-Éric Racine (q-funk) wrote :

As found by comment #21, it appears that mknbi might be the true cause of this bug.

As the comment suggests, possible remedies include:

1) Fix mknbi and backport the fix to Hardy and more recent Ubuntu releases.

2) Replace mknbi with wraplinux in LTSP. This would require packaging wraplinux and extensive testing before it and the updated LTSP boot image creation scripts can be pushed into Hardy and more recent releases.

Martin-Éric Racine (q-funk) wrote :

According to http://ftp-master.debian.org/new.html and http://ftp-master.debian.org/new/wraplinux_1.6-1.html wraplinux was packaged in August 2009, but it still hasn't moved out of NEW.

Oliver Grawert (ogra) wrote :

mknbi hasn't been used in ltsp for a long time ...

Changed in mknbi (Ubuntu):
status: New → Invalid
Changed in mknbi (Ubuntu Hardy):
status: New → Invalid
Martin-Éric Racine (q-funk) wrote :

Oliver appears to be referring to https://bugs.launchpad.net/ubuntu/+source/mknbi/+bug/487826 as to why mknbi hasn't been used in a long time.

Oliver Grawert (ogra) wrote :

Why would any info in that bug explain why we switched away from mknbi? It's totally unrelated.
We switched to mkelfimage because artecgroup pushed for it to enable coreboot-based products they sell.
There is no bug about it; the decision was based on IRC discussions ...
If there is any breakage, please file a bug against the right package (which should be mkelfimage).

Daniel martinez trujillo (elpandehitler12345) wrote :

hardphones plis

Changed in linux (Ubuntu):
assignee: nobody → Daniel martinez trujillo (elpandehitler12345)
Martin-Éric Racine (q-funk) wrote :

Daniel, thank you for taking on this bug. What course of action do you have in mind to fix it?

Changed in linux (Ubuntu Lucid):
status: New → Triaged
importance: Undecided → High
Martin-Éric Racine (q-funk) wrote :

It appears that "wraplinux" has finally entered Debian upstream. Updating LTSP to use it to generate old-fashioned Etherboot images, instead of mkelfimage or mknbi, would be highly desirable.

Changed in ltsp (Ubuntu):
status: New → Invalid
Changed in ltsp (Ubuntu Hardy):
status: New → Invalid
Changed in ltsp (Ubuntu Lucid):
status: New → Invalid
Changed in mknbi (Ubuntu Lucid):
status: New → Invalid
Julian Wiedmann (jwiedmann) wrote :

This release has reached end-of-life [0].

[0] https://wiki.ubuntu.com/Releases

Changed in linux (Ubuntu Hardy):
status: Triaged → Invalid
penalvch (penalvch) wrote :

Martin-Éric Racine, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
assignee: Daniel martinez trujillo (elpandehitler12345) → nobody
status: Triaged → Incomplete
penalvch (penalvch) wrote :

Declined for Lucid as EoL.

Changed in linux (Ubuntu Lucid):
status: Triaged → Invalid
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired