Mistake in /etc/network/interfaces keeps the system from booting

Bug #512253 reported by Fredrik Staxäng on 2010-01-25
This bug affects 22 people
Affects            Importance  Assigned to
ifupdown (Ubuntu)  Medium      Colin Watson
Karmic             Medium      Colin Watson
Lucid              Medium      Colin Watson

Bug Description

SRU justification:

IMPACT: If /etc/network/interfaces contains a typo (e.g. 'iface eth0 inet statuc') then the system will fail to reach runlevel 2, including recovery mode (although gdm will start fine, since it's started by an Upstart job).

DEVELOPMENT BRANCH: Fixed in ifupdown 0.6.10ubuntu2 by manually bringing up lo before ifup -a. I consulted with Scott James Remnant on the correct implementation here.

PATCH: http://launchpadlibrarian.net/52315062/512253.patch (karmic), http://launchpadlibrarian.net/52315085/512253.patch (lucid)

TEST CASE: Make a deliberate syntax error in /etc/network/interfaces (see example above) and try to boot to recovery mode.

REGRESSION POTENTIAL: Take care to ensure that runlevel switching continues to work normally both with valid and invalid /etc/network/interfaces.
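
The ordering described under DEVELOPMENT BRANCH can be pictured roughly as follows. This is only a sketch of the idea (configure lo directly, then run ifup -a and tolerate its failure), not the contents of 512253.patch. The function names `run` and `bring_up_network` and the `DRY_RUN` switch are hypothetical and exist only for illustration.

```shell
# Sketch only: illustrates "bring up lo manually before ifup -a".
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

bring_up_network() {
    # Configure loopback directly, without reading /etc/network/interfaces.
    run ifconfig lo 127.0.0.1 up || true
    # Let ifupdown handle everything else. A parse error is reported,
    # but lo is already configured and the caller can carry on.
    run ifup -a || echo "ifup -a failed; continuing with lo only" >&2
}
```

Setting DRY_RUN=1 before calling `bring_up_network` prints the two commands in order, which is a safe way to read the sketch without touching a live system.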

Original report follows:

Binary package hint: ifupdown

I apparently made some mistake in /etc/network/interfaces. When I rebooted, the computer hung with a blank black screen
with just a blinking cursor in the upper left corner. When pressing Ctrl-Alt-Del, shutdown messages appear. After trying a few times,
there seem to be a couple of lines of output that are cleared off the screen. Perhaps it has something to do with the
following line in /var/log/syslog:

 syslog:Jan 25 10:07:36 emmaline4 init: networking main process (762) terminated with status 1

ProblemType: Bug
Architecture: amd64
Date: Mon Jan 25 10:41:48 2010
DistroRelease: Ubuntu 9.10
Package: ifupdown 0.6.8ubuntu21
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-17.54-server
SourcePackage: ifupdown
Uname: Linux 2.6.31-17-server x86_64

bdot (bdot) wrote :

Same here. Old i386 hardware, Ubuntu 9.10 Server (!)

Linux ubuntu910server 2.6.31-17-generic-pae #54-Ubuntu SMP Thu Dec 10 17:23:29 UTC 2009 i686 GNU/Linux

If you screw up your /etc/network/interfaces file (such as I did, very late at night), the system will NOT boot anymore, under any GRUB option (normal or "recovery mode").

The only way to recover is to boot from some sort of live CD and edit the offending file.

It hangs like so:
/dev/sdb1: clean, 50464/490560 files, 301233/1961930 blocks
init: network-interface (eth1) pre-start process (440) terminated with status 1
init: network-interface (eth0) pre-start process (470) terminated with status 1
init: network-interface (lo) pre-start process (482) terminated with status 1
init: network-interface (eth1) post-stop process (485) terminated with status 1
init: network-interface (eth0) post-stop process (486) terminated with status 1
init: network-interface (lo) post-stop process (487) terminated with status 1

The offending /etc/network/interfaces had a big mistake like this:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet 192.168.0.222

However, I think it is ridiculous for a mistake on the config file to PREVENT a server from booting! Adding insult to injury, there is no way to recover from this problem, except booting off a live CD. Please take corrective action!
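
A mistake like the one above (an address where the method keyword belongs) can be caught before rebooting. The following is a minimal sketch, not a full parser: the function name `check_iface_methods` is hypothetical, only `inet` stanzas are checked, and the list of accepted methods is an assumption taken from interfaces(5) that may need adjusting for your release.

```shell
check_iface_methods() {
    # $1: path to an interfaces file.
    # Prints one line per suspicious "iface ... inet ..." stanza and
    # returns non-zero if any were found.
    awk -v methods="loopback static manual dhcp bootp ppp wvdial" '
        BEGIN {
            n = split(methods, m, " ")
            for (i = 1; i <= n; i++) ok[m[i]] = 1
        }
        $1 == "iface" && $3 == "inet" {
            if (NF < 4 || !($4 in ok)) {
                printf "%s:%d: unknown inet method: %s\n", FILENAME, NR, (NF < 4 ? "(missing)" : $4)
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Run as `check_iface_methods /etc/network/interfaces` before rebooting. It would flag the `iface eth0 inet 192.168.0.222` line quoted above, as well as a typo such as `statuc`.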

oedstero (ubuntu-edster) wrote :

I agree that's annoying... had a couple scripted echo -e '\tup ip route add ...\n\tup ip route' lines that should've been /bin/echo statements -- broke when interpreted by dash ...

Kai Jauch (kaijauch) on 2010-02-21
Changed in ifupdown (Ubuntu):
status: New → Confirmed
Jim Salter (jrssnet) wrote :

This one bit me too - accidentally typoed "static" to "statuc" and I was screwed; server would not boot until I'd started from a live USB drive, mounted my RAID array, edited my /etc/network/interfaces and corrected the typo, then restarted.

BEYOND annoying. This needs fixing, please!

Alecz20 (alexguzu) wrote :

Same problem as pretty much everyone above. However, the machine is a server with no CD-ROM drive, and for some reason the USB flash drive does not boot: I get to the logo, but then it hangs with a black screen.

Did anyone solve this problem on an LVM-based server?

bdot (bdot) wrote :

Please bump in importance/priority. This is a showstopper!
Thanks

Alecz20 (alexguzu) wrote :

Here is a workaround for this problem:

If this happened to you and you cannot boot (and maybe you have a server without a CD-ROM drive), here is what you have to do to recover your system:

- When the GRUB menu appears (press ESC to stop it from booting automatically), select any entry and press 'e' to edit it.
- Then select the second line (the one that begins with 'kernel') and make the following changes:
    - replace 'ro' with 'rw' so that the file system is mounted read-write
    - add 'init=/bin/bash' at the end of the line
- Press Enter (so that the line is saved).
- Press 'b' to boot.

You will now be able to run 'nano /etc/network/interfaces' and fix the file.
- After you fix the file, press "Ctrl + Alt + Del" to reboot.

The changes you made in GRUB will NOT be saved for further boots, so you don't need to do anything to GRUB after this.

Hope this helps.
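
If it is not obvious what is wrong with the file, one option from the init=/bin/bash shell is to set the broken file aside and write a minimal one that only defines lo. The sketch below assumes that approach. The function name `reset_interfaces` and the `.broken` backup suffix are hypothetical, and the real interface settings still have to be restored after the system is back up.

```shell
reset_interfaces() {
    # $1: interfaces file to reset (normally /etc/network/interfaces).
    target=$1
    # Keep the broken file for later inspection.
    cp -p "$target" "$target.broken" || return 1
    cat > "$target" <<'EOF'
# Minimal fallback written during recovery; restore real settings later.
auto lo
iface lo inet loopback
EOF
    # Flush to disk before rebooting: with bash running as init,
    # Ctrl+Alt+Del may reset the machine without a clean shutdown.
    sync
}
```

For example, `reset_interfaces /etc/network/interfaces` leaves the original content in /etc/network/interfaces.broken.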

Prashant_N (massoo-gmail) wrote :

Hi,

A silly mistake in network config file. Cannot BOOT / go to rescue mode as I cannot even see the GRUB Screen.

I have a server without CDROM and the live USB fails with detection of CDROM.

This is pretty annoying ... Please fix

A script had duplicated the "lo" line, and I ended up completely stuck.
The syntax was fine, just a duplicate line!!

This was a KVM instance and the error messages were not helpful at all.
I had to add "--verbose" to the boot prompt (not well documented), and sift through the upstart log messages.

Just not what you would expect from a server OS.
I should at least be able to get a root console to be able to fix things!
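
A duplicated line of this kind is easy to miss by eye, especially when a script generates the file. The following is a rough sketch of a check for repeated `auto` names and repeated `iface NAME FAMILY` stanzas. The function name `find_duplicate_stanzas` is hypothetical, and whether a given ifupdown version rejects exactly these duplicates is an assumption, so treat the output as a hint rather than a verdict.

```shell
find_duplicate_stanzas() {
    # $1: path to an interfaces file.
    # Reports repeated "auto NAME" entries and repeated "iface NAME FAMILY"
    # stanzas; returns non-zero if any duplicate was found.
    awk '
        function note(key) {
            if (key in seen) {
                printf "%s:%d: duplicate of line %d: %s\n", FILENAME, NR, seen[key], key
                dup = 1
            } else {
                seen[key] = NR
            }
        }
        $1 == "iface" && NF >= 3 { note("iface " $2 " " $3) }
        $1 == "auto" { for (i = 2; i <= NF; i++) note("auto " $i) }
        END { exit dup + 0 }
    ' "$1"
}
```

A provisioning script could call `find_duplicate_stanzas /etc/network/interfaces` after writing the file and refuse to reboot if it returns non-zero.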

Fawaad Moied (4waad) wrote :

Same error and description as mentioned by bdot (#2).
The direct cause of the error in my case was that the interfaces file contained the keyword 'broadcast' with no value set. The settings were changed using webmin and the missing value was overlooked. Had to boot off a live CD to correct it.

Kevin (klstringer2012) wrote :

I have the same issue. I edited the interfaces file to change from DHCP addressing to a static address, made a typo and misspelt broadcast, fixed the typo, and still have the same issue. I went back in and changed the file back to DHCP and still have the same issue. Since I don't have time to troubleshoot and it's a clean install on a new server that hasn't been populated with anything yet, I'm currently doing a reinstall. Hopefully it will solve the issue.

Dell r610
2x Xeon quad-core 2.0 GHz
12GB RAM
SAS 6 i/R RAID Controller
10.04 x64

Kevin (klstringer2012) wrote :

Reinstalling solved the issue, have not tried any further editing of the interfaces file.

wayne (plettpc) wrote :

Confirmed. Same issue as Fawaad; editing /etc/network/interfaces solved it.
Two issues to look at. Firstly, on boot there is no GRUB menu, which makes it difficult to press "e" and get to a login.
Secondly, I also edited the ethX settings via webmin. Is there a way within Ubuntu to refuse to save unless well-formed values are entered (not necessarily correct values)?

On 10.04 Grub2 installs:

 Hold down SHIFT to force Grub2 menu to be shown.

Source: https://help.ubuntu.com/community/Grub2#Hidden

Robbie Williamson (robbiew) wrote :

I'm a bit confused. I understand the frustration, but what is it exactly that we need to "fix"? Mistakes in plenty of system files can lead to an unbootable system, which is why they are owned by root... it is assumed that anyone with root access who edits these files knows the consequences of making mistakes.

Changed in ifupdown (Ubuntu):
status: Confirmed → Invalid

Robbie, it's a regression. In previous releases, you could screw up your interfaces file and the system would still boot, only without networking. Mistakes do happen, and failing gracefully is the right thing to do when possible.

What I do not understand is: why is a problem with networking preventing getty from being spawned on the console? It is completely counter-intuitive, it just does not make any sense.

I understand it's how Upstart behaves, but if services fail because a completely unrelated service did not start, that's pretty brittle. Can we make it more robust please?

Tom Ellis (tellis) wrote :

Before Lucid, this caused the networking service to fail but did not make the system hang; you still got to log in and fix any mistakes. In Lucid the networking service is a native upstart script, whereas previous releases carried the older /etc/init.d/networking script, afaik.

Is there a way to make an upstart script time out after a certain number of failed attempts?

Tom Ellis (tellis) wrote :

Actually, please disregard my last comment, it's a load of rubbish: the old networking script still exists... something else must be causing this.

Robbie Williamson (robbiew) wrote :

I will re-open the bug, but because there is no solution yet, it will not make 10.04.1.

Changed in ifupdown (Ubuntu):
status: Invalid → Confirmed
importance: Undecided → Low
Robbie Williamson (robbiew) wrote :

Please note that this change in behavior occurred because of upstart-related changes we had to make for boot performance efforts. I've spoken to the upstart maintainer and there are plans to address situations like this in a new version of upstart, but that will not be ready until 11.04 at the earliest. As the proper "fix" means making changes to upstart, which is a critical component of the boot process, a fix for 9.10 or 10.04 will not occur due to SRU guidelines, and I doubt anything will change for 10.10 given our point in the release cycle.

Ricky Sheaves (ricky-sheaves) wrote :

Robbie, thanks for reopening the ticket for this "bug." To give some more context underscoring the end-user's perspective that this is defective behavior, imagine that an Ubuntu 10.04 server is racked at a co-lo with no bootable ISO in the drive. Now think about what happens when someone fat-fingers something in /etc/network/interfaces and reboots the machine.

Even with ILO access to the box, the most convenient thing s/he can do to recover from this is to configure a PXE boot from some image somewhere, if that's possible. If that's not a possibility, or it's more trouble than it's worth, it's off to the car to drive to the co-lo with a Live CD in hand. This is exactly the situation I was in just a few hours ago.

Certainly one could come up with end-around schemes to resolve this issue remotely, but that's not the point. The point is that if this server were running Debian (for example), it would simply complain, boot, and allow me to fix the mistake via ILO without having to muck with booting from a live network image or driving to the datacenter.

I hope that helps. A "fix" would be much appreciated. Thanks!

Robbie Williamson (robbiew) wrote :

Ricky,

A group of us are getting together next week. We'll see if we can come up with something for a 10.04 SRU or 10.04.2....unfortunately it's too late to get something into 10.04.1. Will update this bug with whatever we come up with.

Colin Watson (cjwatson) on 2010-07-22
Changed in ifupdown (Ubuntu):
status: Confirmed → Triaged
importance: Low → Medium
assignee: nobody → Colin Watson (cjwatson)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.6.10ubuntu2

---------------
ifupdown (0.6.10ubuntu2) maverick; urgency=low

  * debian/ifupdown.network-interface.conf: Bring up lo manually, so that it
    comes up even if /etc/network/interfaces is broken (LP: #512253).
  * debian/ifupdown.upstart.if-up: Don't emit a duplicate net-device-up
    event for lo here, as network-interface.conf will have taken care of it.
 -- Colin Watson <email address hidden> Thu, 22 Jul 2010 13:52:49 +0100

Changed in ifupdown (Ubuntu):
status: Triaged → Fix Released
Colin Watson (cjwatson) on 2010-07-22
Changed in ifupdown (Ubuntu Lucid):
assignee: nobody → Colin Watson (cjwatson)
Changed in ifupdown (Ubuntu Karmic):
status: New → In Progress
Changed in ifupdown (Ubuntu Lucid):
status: New → In Progress
importance: Undecided → Medium
Changed in ifupdown (Ubuntu Karmic):
assignee: nobody → Colin Watson (cjwatson)
importance: Undecided → Medium
Colin Watson (cjwatson) wrote :
description: updated
Colin Watson (cjwatson) wrote :
description: updated

Accepted ifupdown into karmic-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in ifupdown (Ubuntu Karmic):
status: In Progress → Fix Committed
tags: added: verification-needed
Jonathan Riddell (jr) wrote :

lucid upload in lucid-proposed unapproved queue, awaiting SRU freeze ending

John Dong (jdong) wrote :

ACK from SRU team for Lucid.

mr. Ed (mred) wrote :

This bug affects Ubuntu Server 10.04.1.

I have the proof: my server is dead.

Robbie Williamson (robbiew) wrote :

@mr. Ed

Have you enabled -proposed in Software Sources? This will allow you to install and test the proposed fix for Lucid. Then we can verify it and possibly get it into 10.04.1. FTR, your server says 10.04.1 because we mistakenly updated the lsb_release data in the base-files package too early, so that it says 10.04.1.

Tom Ellis (tellis) wrote :

Tried to find ifupdown in -proposed, didn't seem to be in the indexes.

Grabbed http://gb.archive.ubuntu.com/ubuntu/pool/main/i/ifupdown/ifupdown_0.6.10ubuntu2_amd64.deb

Installed it, typo'ed my /etc/network/interfaces file and rebooted. All was well: the system booted normally and allowed me to log in and correct the mistake. This issue is fixed.

Thanks Colin.

Robbie Williamson (robbiew) wrote :

FYI, the fix has been uploaded to Lucid -proposed:
  http://launchpadlibrarian.net/52314068/ifupdown_0.6.8ubuntu29.1_source.changes
and is waiting for approval.

SRU verification for Karmic:
I have reproduced the problem with ifupdown 0.6.8ubuntu21.2 in karmic-updates and have verified that the version of ifupdown 0.6.8ubuntu21.3 in -proposed fixes the issue.

With the version in -proposed, the same error messages are displayed on the console, but the boot sequence continues. Once logged in, I've verified that lo is up.
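
One way to script the "lo is up" check when repeating this verification is to read the interface flags from sysfs. This is a minimal sketch and not necessarily how the verification above was done. The function name `iface_is_up` is hypothetical, and the optional second argument (an alternative sysfs root) exists only so the function can be exercised without a real interface.

```shell
iface_is_up() {
    # $1: interface name, e.g. lo
    # $2: sysfs root (optional, defaults to /sys)
    # Returns 0 if the interface has IFF_UP set, 1 if not,
    # and 2 if the flags file cannot be read.
    flags_file="${2:-/sys}/class/net/$1/flags"
    [ -r "$flags_file" ] || return 2
    flags=$(cat "$flags_file")
    # IFF_UP is bit 0x1 of the interface flags.
    [ $(( $flags & 1 )) -eq 1 ]
}
```

After logging in on a system with a deliberately broken interfaces file, `iface_is_up lo && echo "lo is up"` should print the message if the fix is in place.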

Marking as verification-done

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.6.8ubuntu21.3

---------------
ifupdown (0.6.8ubuntu21.3) karmic-proposed; urgency=low

  * debian/ifupdown.network-interface.conf: Bring up lo manually, so that it
    comes up even if /etc/network/interfaces is broken (LP: #512253).
  * debian/ifupdown.upstart.if-up: Don't emit a duplicate net-device-up
    event for lo here, as network-interface.conf will have taken care of it.
 -- Colin Watson <email address hidden> Thu, 22 Jul 2010 14:06:36 +0100

Changed in ifupdown (Ubuntu Karmic):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

Accepted ifupdown into lucid-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in ifupdown (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Paul Elliott (omahn) wrote :

SRU verification for Lucid:
I have reproduced the problem with ifupdown 0.6.8ubuntu29 in lucid and have verified that ifupdown 0.6.8ubuntu29.1 in lucid-proposed fixes the issue.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.6.8ubuntu29.1

---------------
ifupdown (0.6.8ubuntu29.1) lucid-proposed; urgency=low

  * debian/ifupdown.network-interface.conf: Bring up lo manually, so that it
    comes up even if /etc/network/interfaces is broken (LP: #512253).
  * debian/ifupdown.upstart.if-up: Don't emit a duplicate net-device-up
    event for lo here, as network-interface.conf will have taken care of it.
 -- Colin Watson <email address hidden> Thu, 22 Jul 2010 14:00:43 +0100

Changed in ifupdown (Ubuntu Lucid):
status: Fix Committed → Fix Released
affects: ifupdown (Ubuntu Lucid) → plymouth (Ubuntu Lucid)
Steve Langasek (vorlon) on 2011-02-09
affects: plymouth (Ubuntu) → ifupdown (Ubuntu)