lilo needs to warn if initrd is too large

Bug #260059 reported by Soren Hansen on 2008-08-21
This bug affects 1 person
Affects: lilo (Ubuntu)
Importance: Critical
Assigned to: Unassigned
Nominated for Karmic by r12056
Nominated for Lucid by r12056

Bug Description

I just spent the better part of a day trying to find out why one of my
servers refused to boot any kernels newer than 2.6.24-17-server. After
countless hours of debugging, it turns out that the size of my
initrd.img's had grown ever so slightly, but it was just enough to push
it over a critical threshold that made lilo fail to boot in rather
mysterious ways. I've attached a screenshot of the boot failure to
demonstrate how non-obvious the cause is.

These are the relevant sizes:

-rw-r--r-- 1 root root 8216636 2008-05-13 13:10 /boot/initrd.img-2.6.24-17-server
-rw-r--r-- 1 root root 8255405 2008-08-20 14:56 /boot/initrd.img-2.6.24-19-server

The former boots just fine, the latter.. not so much. So the limit is
somewhere in between those two. The system has both -updates and
-security enabled, but even with just -security, it's quite conceivable
that someone might pass the threshold, and suddenly find themselves with
systems that fail to boot. The fix is simple: Add the "large-memory"
option in lilo.conf and rerun lilo.

I propose that we put large-memory in the default lilo.conf from now on,
and add a check to lilo that will tell the user that their initrd.img is
over a certain size and that they might want to add the "large-memory"
option to lilo.conf. This *definitely* needs to go into an SRU, IMNSHO.

Soren Hansen (soren) wrote :
Soren Hansen (soren) on 2008-08-21
description: updated
Mads Chr. Olesen (shiyee) wrote :

Hmm, the issue doesn't seem as straightforward after all.

In lilo there is already a check for a too large initrd (boot.c, search for FLAG_TOOBIG). I wonder why it wasn't triggered.

Could you attach your lilo.conf? And, if possible, the output of running lilo with a couple of "-v" flags (and without the "large-memory" option)?

Soren Hansen (soren) wrote :
Soren Hansen (soren) wrote :
robvarga (robert-varga) wrote :

I second this request; this bit me as well. My fresh install of Ubuntu Server 8.04.1 amd64 did not install because the large-memory option was not in lilo.conf.

Either put it there from the start or, even better, provide a chance to edit the generated lilo.conf during install, before the boot loader is installed.

robvarga (robert-varga) wrote :

More correctly, it did not boot because large-memory wasn't in lilo.conf.

Mads Chr. Olesen (shiyee) wrote :

Could you provide the size of the initrd image, gunzipped?

On Sat, Aug 23, 2008 at 08:10:51PM -0000, Mads Chr. Olesen wrote:
> Could you provide the size of the initrd image, gunzipped?

Oh, you think uncompressed size matters? Hm.. Perhaps.

soren@amdi:~$ gzip -l /boot/initrd.img-2.6.24-{17,21}-server
         compressed uncompressed ratio uncompressed_name
            8216636 24412672 66.3% /boot/initrd.img-2.6.24-17-server
            8257993 24574976 66.4% /boot/initrd.img-2.6.24-21-server
           16474629 48987648 66.4% (totals)
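
A note on reading these numbers: the `ratio` column printed by `gzip -l` is roughly the fraction of space saved, not a multiplier. The sketch below converts the sizes above into both forms using integer math. The function names are made up for illustration and are not part of gzip or LILO.

```c
#include <assert.h>

/* Expansion factor (uncompressed / compressed) in thousandths,
 * rounded to nearest. Hypothetical helper, not part of gzip or LILO. */
static unsigned long expansion_milli(unsigned long uncompressed,
                                     unsigned long compressed)
{
    assert(compressed > 0);
    return (unsigned long)(((unsigned long long)uncompressed * 1000ULL
                            + compressed / 2) / compressed);
}

/* Space saved ((uncompressed - compressed) / uncompressed) in thousandths,
 * rounded to nearest. This approximates the "ratio" column of gzip -l. */
static unsigned long saved_milli(unsigned long uncompressed,
                                 unsigned long compressed)
{
    assert(uncompressed > 0 && compressed <= uncompressed);
    return (unsigned long)(((unsigned long long)(uncompressed - compressed)
                            * 1000ULL + uncompressed / 2) / uncompressed);
}
```

For the two images listed, `expansion_milli` gives 2971 and 2976 (factors of about 2.971 and 2.976), while `saved_milli` gives 663 and 664, matching the 66.3% and 66.4% that gzip reports.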

Hi John!

I hope you can answer a question I have about the LILO bootloader
(version 22.8), raised by a bug-report in Ubuntu
( https://bugs.launchpad.net/ubuntu/+source/lilo/+bug/260059 ). I have
CC'ed the bug report for reference.

In boot.c there are the following two lines, for calculating the number
of high sectors needed for loading the initrd (IIUC):
hi_sectors = sectors - setup; /* number of sectors loaded high */
hi_sectors *= 3; /* account for decompression */

My question is about the "account for decompression" part. How precise
is the number 3? Is it heuristic, or is it completely precise?

The reporter in the bug has an initrd image of size 24574976 (gunzipped)
and 8257993 (gzipped). The ratio of compression, 2.976, is therefore
very close to 3.

My second question, if my speculations are correct, is: Would it hurt to
raise the number to e.g. 4? My understanding is that this would just
make lilo behave as if "large-memory" was specified, in many more cases,
potentially avoiding this gray area.

Thank you very much for taking the time to respond :-)

--
Mads Chr. Olesen <email address hidden>
shiyee.dk

Mads Chr. Olesen (shiyee) wrote :

Oh bugger. I tried emailing John Coffman, the author of LILO, to get his take on this (the previous comment), but the mail bounced.

It pretty much sums up my _suspicions_:
LILO has a heuristic to determine when the "large-memory" option is needed, but it underestimates, leading to this edge case. We are very close to the estimated compression ratio that LILO has.

My proposed solution:
Make LILO be more pessimistic about its estimate. This (untested) patch should do it:
--- boot.c.orig 2008-08-30 12:44:51.000000000 +0200
+++ boot.c 2008-08-30 14:36:51.000000000 +0200
@@ -84,7 +84,7 @@
      die("Can't load kernel at mis-aligned address 0x%08lx\n",hdr.start);
  descr->flags |= FLAG_LOADHI; /* load kernel high */
  hi_sectors = sectors - setup; /* number of sectors loaded high */
- hi_sectors *= 3; /* account for decompression */
+ hi_sectors *= 4; /* account for decompression */
  if (hi_sectors < HIGH_4M) hi_sectors = HIGH_4M;
     }
     geo_close(&geo);

It basically makes LILO assume that the initrd has a compression ratio of 4, therefore much sooner warning the user that "large-memory" is required (and actually behaving as if "large-memory" was specified).
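
To make the effect of the multiplier concrete, here is a minimal sketch that mirrors the three quoted lines from boot.c with the factor pulled out as a parameter. It rests on several assumptions: sectors are 512 bytes, `HIGH_4M` corresponds to 4 MiB expressed in sectors (the real constant is defined in LILO's headers and may differ), and the helper names are hypothetical. Applying it to the initrd sizes follows the reading of the code given in this thread, which has not been confirmed.

```c
#include <assert.h>

/* Assumption: 512-byte sectors, so 4 MiB is 8192 sectors. The real
 * HIGH_4M constant in LILO may be defined differently. */
#define SECTOR_SIZE     512UL
#define HIGH_4M_SECTORS (4UL * 1024UL * 1024UL / SECTOR_SIZE)

/* Round a byte count up to whole sectors. */
static unsigned long bytes_to_sectors(unsigned long bytes)
{
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Mirrors the quoted boot.c lines: scale the sectors loaded high by a
 * decompression factor, then apply the 4 MiB floor. */
static unsigned long estimate_high_sectors(unsigned long sectors,
                                           unsigned long setup,
                                           unsigned long factor)
{
    unsigned long hi_sectors;

    assert(sectors >= setup);
    hi_sectors = sectors - setup;   /* number of sectors loaded high */
    hi_sectors *= factor;           /* account for decompression */
    if (hi_sectors < HIGH_4M_SECTORS)
        hi_sectors = HIGH_4M_SECTORS;
    return hi_sectors;
}
```

Taking the 8257993-byte image as 16129 sectors, with `setup` set to 0 purely for illustration, a factor of 3 yields 48387 sectors. The gunzipped size of 24574976 bytes is 47998 sectors, so the margin is only 389 sectors, under 1%. A factor of 4 yields 64516 sectors, which leaves considerably more headroom.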

Soren: Could you test that this patch correctly gives a warning if you run LILO (without the large-memory option)?

Changed in lilo:
assignee: nobody → sebastiancobaleda
status: New → Confirmed

This is real, but it is not an Ubuntu bug. This is a kernel problem.

Changed in lilo:
status: Confirmed → Fix Released
Mads Chr. Olesen (shiyee) wrote :

I fail to understand your argumentation. If the supposed fix ("large-memory") is in Lilo, how can this be a kernel problem?

"add a check to lilo that will tell the user that their initrd.img is
over a certain size and that they might want to add the "large-memory"
option to lilo.conf. This *definitely* needs to go into an SRU, IMNSHO."

There _is already_ such a check in Lilo. It just seems to be a heuristic that is slightly off. Instead of changing properties of this bug, could you please test the patch I attached?

The user Juan Sebastian Cobaleda Cano is a troll, or something. We, in the Ubuntu-Co Team are checking all his changes in Launchpad. Sorry for the problems.

Changed in lilo:
assignee: sebastiancobaleda → nobody
status: Fix Released → New
Soren Hansen (soren) on 2009-03-02
Changed in lilo:
status: New → Confirmed
Brian Murray (brian-murray) wrote :

Soren - it seems that there is a proposed fix in this bug report. Could you possibly test it out?

Changed in lilo:
status: Confirmed → Triaged
John Brondum (johnbrondum) wrote :

Brian - do you know the status of this defect ? thnx

Joachim Wiedorn (ad-debian) wrote :

With the new upstream version 23.1 this problem should be solved.

Have a nice day.

Joachim (Germany)

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lilo - 1:23.2-1

---------------
lilo (1:23.2-1) unstable; urgency=medium

  [ Joachim Wiedorn ]
  * New upstream release:
    - Fix for larger kernel setup code. (Closes: #625266)
    - Update of manpages (mkrescue.8, lilo.conf.5).
  * Update of some patches; remove of some patches (now in upstream).
  * debconf scripts:
    - Update of French translation (fr.po). (Closes: #621845)
    - Update of Portuguese translation (pt.po). (Closes: #622642)
    - Update of Russian translation (ru.po). (Closes: #623794)
    - Update of Japanese translation (ja.po). (Closes: #624629)
  * debian/control:
    - Bump to Standards-Version 3.9.2 (without changes).
    - Remove mbr dependency for package lilo (no more needed).
    - Replace perl-modules dependency with variable ${perl:Depends}.

  [ Niels Thykier ]
  * Added DMUA flag.

lilo (1:23.1-2) unstable; urgency=medium

  [ Joachim Wiedorn ]
  * debian/control:
    - Remove dependency to lilo in package lilo-doc. (Closes: #613753)
  * Fix: save errno for second command (device.c).
  * Fix: save file permissions for converted lilo.conf (Closes: #615103)
      and fix some typos in script lilo-uuid-diskid.
  * Fix: missleading error message in geometry.c. (Closes: #445264)
  * Reformatting of mkrescue manpage (thanks to M.E. Schauer).
      (Closes: #617282)
  * debconf scripts:
    - Fix typos in some debconf translation files. (Closes: #504733)
    - Use better style in debconf translations. (Closes: #312451, #504733)
    - Remove debconf code for managing old boot/boot.b and similar files.
    - Remove no more needed debian/lilo.lintian-overrides file.
    - Remove script liloconfig and all appropriate debconf code.
    - Update of German translation (de.po).
    - Update of French translation (fr.po). (Closes: #615936)
    - Update of Russian translation (ru.po). (Closes: #616691)
    - Update of Galician translation (gl.po).
    - Update of Danish translation (da.po). (Closes: #618004)
    - Update of Basque translation (eu.po). (Closes: #618253)
    - Update of Czech translation (cs.po). (Closes: #618711)
    - Update of Spanish translation (es.po). (Closes: #618813)
    - Update of Finnish translation (fi.po). (Closes: #618886)
    - Update of Italian translation (it.po). (Closes: #618801)
    - Update of Brazilian Portuguese translation. (Closes: #618738)
    - Update of Swedish translation (sv.po). (Closes: #618620)
  * Add new script liloconfig, using template with comments,
      works with UUID, LABEL and disk-id for root and boot options.
  * Add new manpage for liloconfig, update of other manpages.
  * Fix typos and phrases in manpage of lilo.conf. (Closes: #258472)

  [ Niels Thykier ]
  * Added Depends on perl-modules, since liloconfig needs it.

lilo (1:23.1-1) unstable; urgency=low

  * New upstream release. (Closes: #339778)
  * Upstream bugfixes:
    - Option 'append' works with acpi=off. (Closes: #428390)
    - Working on degraded RAID-1 device. (Closes: #278373, #522283)
    - Working with MD v1.0 RAID-1 boot devices. (Closes: #598035)
    - Working with spaces in labels. (Closes: #287257)
    - Using new hook scripts for kernel and initrd only one...

Changed in lilo (Ubuntu):
status: Triaged → Fix Released