Comment 0 for bug 235282

Anthony Fok (foka) wrote :

Note: An equivalent bug report is filed as Debian Bug#483186 at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=483186

Every now and then, we come across a machine which is unable to mount the root filesystem for whatever reason and gets stuck in the busybox initrd environment, from which we can run dmesg to diagnose what went wrong.

To our dismay, in recent months (or years?), the dmesg output comes out like this, with lots of missing digits. For example, from a test machine booting Ubuntu 8.04 hardy (with an upgraded kernel):

    [ 0.000] Linux version 2.6.2-1-generic (buildd@iridium) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP Thu May 2 0:0:4 UTC 20 (Ubuntu 2.6.2-1.2ubuntu6-generic)
    [ 0.000] BIOS-provided physical RAM map:
    [ 0.000] BIOS-e80: 00000000 - 000000e00 (usable)
    [ 0.000] BIOS-e80: 000000e00 - 000000a00 (reserved)

But it is supposed to look like this:

    [ 0.000000] Linux version 2.6.25-1-generic (buildd@iridium) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP Thu May 22 05:01:49 UTC 2008 (Ubuntu 2.6.25-1.2ubuntu6-generic)
    [ 0.000000] BIOS-provided physical RAM map:
    [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009e000 (usable)
    [ 0.000000] BIOS-e820: 000000000009e000 - 00000000000a0000 (reserved)

This caused quite a few problems when we were trying to diagnose kernel oopses or panics, since the addresses are all wrong.

Initially, we thought it had something to do with memory corruption from the kernel oops. But later, we noticed that this phenomenon happens even in cases without a kernel oops, for example when root=/dev/sda7 was simply written wrong.

So, we decided to investigate, and eventually came to the realization that the dmesg in initrd.img in Ubuntu (and Debian) nowadays comes not from busybox but from klibc-utils, and that running /usr/lib/klibc/bin/dmesg on a fully booted system exhibits the same bug.

Checking the source code, we found that the code used to strip out the <[0-7]> prefix on every kernel message (see klogd(8)) is incorrect and drops digits from the message text as well. So, with a bit of hacking, we got that fixed. :-) A patch is attached. Just drop it in as debian/patches/20_dmesg_dropped-digits.patch and repackage! :-)
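
For illustration only, here is a minimal sketch of the intended behaviour: remove a <[0-7]> marker only when it appears at the very start of a line, and copy every other character (including digits) unchanged. This is not the attached patch and not the klibc code; the function name strip_log_levels and its interface are hypothetical, chosen just for this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from klibc): copy `len` bytes of kernel log
 * text from `in` to `out`, dropping a "<N>" priority marker (N = 0..7)
 * only when it is the first thing on a line. `out` must have room for
 * at least `len` bytes. Returns the number of bytes written.
 */
size_t strip_log_levels(const char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;
    int at_line_start = 1;

    while (i < len) {
        if (at_line_start && len - i >= 3 &&
            in[i] == '<' &&
            in[i + 1] >= '0' && in[i + 1] <= '7' &&
            in[i + 2] == '>') {
            i += 3;             /* skip the whole "<N>" marker */
            at_line_start = 0;  /* at most one marker per line */
            continue;
        }
        /* Anything else is message text and is copied verbatim. */
        at_line_start = (in[i] == '\n');
        out[o++] = in[i++];
    }
    return o;
}
```

The point of tracking at_line_start explicitly is that digits and angle brackets inside the message body are never treated as part of a prefix, so timestamps and addresses survive intact.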

We have verified that the output of this fixed dmesg is identical to that of
util-linux dmesg.

Further thoughts:

We checked out klibc source using:
    git clone git://git.kernel.org/pub/scm/libs/klibc/klibc.git

And noticed that this is an upstream bug which has been present ever since dmesg.c was first added in commit 9c5a7acda064daa7482148b5a45ee3b7ed39356c (Mon Aug 20 19:57:50 2007 +0200).

As to why this bug wasn't discovered sooner... I don't know. Perhaps very few people use the tiny dmesg in klibc-utils for diagnostic
purposes? And before that, Ubuntu (and Debian) used the dmesg applet in busybox, which exhibits no such bug?

Cheers,

Anthony Fok <anthony dot fok at thizgroup dot com>
ThizLinux Software Co., Ltd. - A member of Thiz Technology Group
Debian GNU/Linux Developer