Every second number disappears in (tiny) dmesg output (in initrd.img)

Bug #235282 reported by Anthony Fok on 2008-05-27
6
Affects Status Importance Assigned to Milestone
klibc (Ubuntu)
Medium
Tim Gardner
Hardy
Undecided
Unassigned

Bug Description

Note: An equivalent bug report is filed as Debian Bug#483186 at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=483186

Every now and then, we come across a machine which is unable to mount the root filesystem for whatever reasons, and get stuck at the busybox initrd environment, from which we can run dmesg to diagnostic what went wrong.

To our dismay, in recent months (or years?), dmesg result come out like this, with lots of missing numbers. For example, from a test machine booting Ubuntu 8.04 hardy (with an upgraded kernel):

    [ 0.000] Linux version 2.6.2-1-generic (buildd@iridium) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP Thu May 2 0:0:4 UTC 20 (Ubuntu 2.6.2-1.2ubuntu6-generic)
    [ 0.000] BIOS-provided physical RAM map:
    [ 0.000] BIOS-e80: 00000000 - 000000e00 (usable)
    [ 0.000] BIOS-e80: 000000e00 - 000000a00 (reserved)

But it is supposed to look like this:

    [ 0.000000] Linux version 2.6.25-1-generic (buildd@iridium) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP Thu May 22 05:01:49 UTC 2008 (Ubuntu 2.6.25-1.2ubuntu6-generic)
    [ 0.000000] BIOS-provided physical RAM map:
    [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009e000 (usable)
    [ 0.000000] BIOS-e820: 000000000009e000 - 00000000000a0000 (reserved)

This caused quite a bit of problem when we trying to diagnose kernel oops or panics since the addresses are all wrong.

Initially, we thought it had something to do with memory corruption from the kernel Oops. But later, we noticed this phenomenon happens even for cases without a kernel oops, say, perhaps we just got root=/dev/sda7 written wrong.

So, we decided to investigate, and eventually came to the realization that the dmesg in initrd.img in Ubuntu (and Debian) nowadays come not from busybox but klibc-utils, and running /usr/lib/klibc/bin/dmesg on a fully booted system exhibit the same bug.

Checking the source code, we found the code used to strip out <[0-7]> that prefixes every kernel message (See klogd(8)) is somewhat incorrect. So, with a bit of hacking, we got that fixed. :-) A patch is attached. Just drop it in debian/patches/20_dmesg_dropped-digits.patch and repackage! :-)

We have verified the output of this fixed dmesg identical to that of util-linux dmesg.

Further thoughts:

We checked out klibc source using:
    git clone git://git.kernel.org/pub/scm/libs/klibc/klibc.git

And noticed it is an upstream bug since dmesg.c was first added on Mon Aug 20 19:57:50 2007 +0200 as commit 9c5a7acda064daa7482148b5a45ee3b7ed39356c

As to why this bug wasn't discovered sooner... I don't know. Perhaps very few people use the tiny dmesg in klibc-utils for diagnostic purposes? And before that, Ubuntu (and Debian) uses the dmesg module in busybox, which exhibits no such bug?

Cheers,

Anthony Fok <anthony dot fok at thizgroup dot com>
ThizLinux Software Co., Ltd. - A member of Thiz Technology Group
Debian GNU/Linux Developer

TESTCASE
- ensure klibc-utils is installed
$ /usr/lib/klibc/bin/dmesg > /tmp/klibc.dmesg
$ /bin/dmesg > /tmp/bin.dmesg
"diff -u /tmp/{klibc,bin}.dmesg" should show no differences (except for any additional messages that were emitted between the dmesg invocations)

Anthony Fok (foka) wrote :
description: updated
Tim Gardner (timg-tpi) on 2008-05-27
Changed in klibc:
assignee: nobody → timg-tpi
status: New → In Progress
Tim Gardner (timg-tpi) wrote :

Anthony - it looks like upstream has committed a fix for this issue 'dmesg: fix the stripping of priority codes'.

Tim Gardner (timg-tpi) wrote :

SRU Justification:

Impact: The klibc-utils dmesg used by initramfs incorrectly strips priority codes, mangling the output such that every other decimal digit is missing.

Patch Description: Sync with upstream git://git.kernel.org/pub/scm/libs/klibc/klibc.git commit 15c8d22886f4355539f7f557b7184ffedc0002b9

Patch: attached

TEST CASE: See bug description

Changed in klibc:
importance: Undecided → Medium
milestone: none → ubuntu-8.04.1
status: In Progress → Fix Committed
Steve Langasek (vorlon) wrote :

Accepted into -proposed, please test and give feedback here

Changed in klibc:
milestone: ubuntu-8.04.1 → none
status: New → Fix Committed

Hello Tim,

Yes, upstream author "hpa" (H. Peter Anvin) happened to be on
#etherboot, and he asked me to post the patch on klibc mailing list:

    http://www.zytor.com/pipermail/klibc/2008-May/002288.html

However, someone dislikes the way I wrote it, saying it is "if
speghetti code". (That's kind of how dmesg.c in util-linux does it
though...)

    http://www.zytor.com/pipermail/klibc/2008-May/002290.html

And the marvellous hpa promptly came up with something better, an
easier-to-follow "state-machine-like approach":

    http://www.zytor.com/pipermail/klibc/2008-May/002289.html

So yes, I'm very happy that hpa is so responsive, and I'm also very
glad that you've already dealt with this issue in Ubuntu too. Many
thanks, Tim and Steve! :-)

Anthony

Anthony Fok (foka) wrote :

> However, someone dislikes the way I wrote it, saying it is "if
> speghetti code". (That's kind of how dmesg.c in util-linux does it
> though...)
>
> http://www.zytor.com/pipermail/klibc/2008-May/002290.html

My bad. That "someone" is Maximilian Attems, and I just realized he
is the Debian maintainer of the klibc package, and he also promptly
uploaded a fix too a few days ago. :-) Thanks to Maximilian's
comment, we got hpa's own fix, which is much better. :-)

Cheers,

Anthony

Steve Beattie (sbeattie) wrote :

I've verified that this behavior exists in klibc-util's dmesg in version 1.5.7-4ubuntu3 released in Hardy, and that it is fixed in version 1.5.7-4ubuntu4 in hardy-proposed. I've not otherwise exercised klibc to verify that there are no regressions.

description: updated
Martin Pitt (pitti) wrote :

Copied to hardy-updates.

Changed in klibc:
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

Needs to be fixed in Intrepid ASAP.

Changed in klibc:
milestone: none → intrepid-alpha-2
Colin Watson (cjwatson) wrote :

This was fixed in Intrepid a while back (before hardy-updates, in fact):

klibc (1.5.10-1ubuntu1) intrepid; urgency=low

  * Merge from debian unstable, remaining changes:
    - Add lpia to the architecture list for klibc-utils-floppy-udeb.

 -- Kees Cook <email address hidden> Mon, 09 Jun 2008 10:40:21 -0700

klibc (1.5.10-1) unstable; urgency=low

  * New upstream release (dmesg, nfsmount)
    - dmesg: fix the stripping of priority codes (closes: #483186)
    - ubuntu sync rework 11-klibc-run-init-drop-executable-stack.patch
    - add git fix 12-klibc-avoid-.gitignore-in-the-include-directory.patch

 -- maximilian attems <email address hidden> Thu, 05 Jun 2008 22:21:44 +0200

Changed in klibc:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers