Sudden reboot during server ISO install

Reported by C de-Avillez on 2010-12-27
This bug report is a duplicate of: Bug #348455: init: transfer state across re-exec.
This bug affects 6 people
Affects               Status        Importance  Assigned to                 Milestone
Ubuntu Studio         Fix Released  Critical    Unassigned
upstart               New           Undecided   Unassigned
eglibc (Debian)       New           Unknown
eglibc (Ubuntu)       Fix Released  Critical    Canonical Foundations Team
  Lucid               Fix Released  Critical    Unassigned
  Maverick            Fix Released  Critical    Unassigned
upstart (Ubuntu)      Confirmed     Undecided   James Hunt
  Lucid               Confirmed     Undecided   Unassigned
  Maverick            Confirmed     Undecided   Unassigned

Bug Description

Binary package hint: debian-installer

I am not sure which package is the affected one, but since it affects the install of the server, I am opening against d-i. I started experiencing this last week at around Dec 23/24th.

Repeated with ISOs from Dec 24th to Dec 27th.

I experienced it under both a bare-metal install and a libvirt install (under Natty), both under preseeded and manual installs.

Installation proceeds nicely until we get to the point of installing the base system. Then, suddenly, I see a "rebooting" message flash on the screen, and the system does indeed reboot. Since no kernel is installed at this point, the system hangs.

I am marking this as critical -- no tests can be performed.

I am unsure how to collect data here, since the installer logs are lost.

LAST ISO TESTED: 20101230. reboot happens on 'extracting libpcreg...'

C de-Avillez (hggdh2) on 2010-12-27
Changed in debian-installer (Ubuntu):
importance: Undecided → Critical
milestone: none → natty-alpha-2
tags: added: iso-testing

I can confirm this issue using both the Ubuntu and Xubuntu Alternate i386 and amd64 images for Natty. On those, the machine goes through a full reboot, leaving me at the startup screen, "choose a language".

tags: added: debian-installer natty
Changed in debian-installer (Ubuntu):
status: New → Triaged
C de-Avillez (hggdh2) on 2010-12-27
description: updated
Brad Soto (bradskins) wrote :

Same issue here with the PPC netboot image from Dec 22.

C de-Avillez (hggdh2) on 2010-12-30
description: updated

Screenshot just before the machine rebooted itself. This was the server image dated 2010-12-30, showing messages as printed on Alt+F4

'telinit u' makes init send SIGTERM to all processes, and 'telinit u' is executed from the libc6.postinst script. I think there is a similar report that discusses this: bug 695220.

Andrzej: Is that bug 695220 filed by you a duplicate of this?

Charlie: Yep. Based on your log, I think we are dealing with the same problem.

Colin Watson (cjwatson) wrote :

sysvinit's telinit only ever sends SIGHUP, no matter what the argument. Upstart's 'telinit u' sends SIGTERM. busybox init interprets that as a reboot request.

This regression happened because libc6.postinst was simplified in line with Debian. We need to be more careful somehow, to avoid sending SIGTERM to non-Upstart instances of init.
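
One way to be "more careful" is sketched below. It is an assumption for illustration, not the fix that was uploaded for this bug: ask `initctl` whether it can reach Upstart before running `telinit u`. `initctl version` is expected to succeed only when it can talk to a running Upstart, so under busybox init or sysvinit the guard fails and no signal is sent. The function names and the `INITCTL` and `TELINIT` override variables are hypothetical, added only so the logic can be exercised without touching the real init.

```shell
# Sketch: only run 'telinit u' when PID 1 appears to be Upstart.
# INITCTL and TELINIT are hypothetical overrides (not part of any package)
# so that the guard can be tried out without signalling the real init.
INITCTL="${INITCTL:-/sbin/initctl}"
TELINIT="${TELINIT:-/sbin/telinit}"

init_is_upstart() {
    # No initctl at all: certainly not an Upstart system.
    command -v "$INITCTL" >/dev/null 2>&1 || return 1
    # 'initctl version' prints a line containing "upstart" when it can
    # reach a running Upstart, and fails otherwise.
    "$INITCTL" version 2>/dev/null | grep -q upstart
}

reexec_init_if_upstart() {
    if init_is_upstart; then
        "$TELINIT" u
    else
        echo "init is not Upstart; skipping telinit u"
    fi
}
```

With this guard, a maintainer script running inside the installer would print the "skipping" message instead of sending SIGTERM to busybox init.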

affects: debian-installer (Ubuntu) → eglibc (Ubuntu)
Loïc Minier (lool) wrote :

I don't think it's a good idea to call "telinit u" if we're installing libc in a chroot, so I am considering this debdiff to revert eglibc to its older behavior of checking for a chroot before running telinit. I will send it to Debian as well, although I suspect the Linux-specific test was a reason for the removal of this snippet in the first place.
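
A minimal sketch of such a chroot check, assuming a Linux system with /proc mounted and a `stat` that supports `-L` and `-c` (GNU coreutils does): compare the device/inode pair of `/` with that of `/proc/1/root`, which is init's view of the root directory. The function names are illustrative, and this is not a copy of the debdiff attached to this bug.

```shell
# Sketch: succeed only if two paths resolve to the same directory
# (same device number and same inode).
same_dir() {
    a=$(stat -Lc '%d/%i' "$1" 2>/dev/null) || return 1
    b=$(stat -Lc '%d/%i' "$2" 2>/dev/null) || return 1
    [ "$a" = "$b" ]
}

# We are probably not in a chroot when our "/" is also PID 1's root.
# Reading /proc/1/root needs root privileges; if /proc is not mounted the
# comparison fails and we conservatively report "chrooted".
not_chrooted() {
    same_dir / /proc/1/root
}

if not_chrooted; then
    echo "not in a chroot: restarting init could be considered here"
else
    echo "chroot (or /proc unavailable): leaving init alone"
fi
```

The Linux-specific part is the dependency on /proc, which is the portability concern raised above.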

I like Colin's idea of testing whether init really is upstart before signalling it with SIGTERM; I wonder how this will play when people replace sysvinit with upstart, so maybe we need to test for busybox or sysvinit specifically.

Loïc Minier (lool) wrote :

This is the proposed debdiff for upstart; I'm sure style could be improved, and maybe it would be preferable to not exit(1) when /proc isn't mounted; happy to get early comments though.

(Untested yet; apparently upstart FTBFS under my sbuild for some reason.)

Colin Watson (cjwatson) wrote :

I'm fine with the eglibc diff; thanks! I'm much less convinced about the Upstart diff and I think that needs specialist review. Perhaps 'telinit u' needs to use a D-Bus-ish way to ask Upstart to restart, rather than a signal, and silently do nothing if it can't talk to Upstart. However, I'm not sure whether upgrades from sysvinit to Upstart need to exec Upstart before rebooting. I'm sure Scott would remember.

Loïc Minier (lool) wrote :

Hey Scott!

We'd love your input on the best way for upstart's telinit to not SIGTERM what could possibly be a non-upstart init (this is particularly an issue with busybox as it reboots on SIGTERM). I attached a patch doing basically an is_chrooted test; Colin proposes using the DBus interface instead; we're not quite sure what should happen in the case of sysvinit -> upstart transitions or other weirdnesses though.

Thanks a lot for your input

Clint Byrum (clint-fewbar) wrote :

I'm also working on a change to libc6's postinst in bug #672177 to not call telinit u at all, because upstart will lose all of its state if it is restarted. Instead, we need to touch /var/run/init.upgraded to force the restart of upstart just before unmounting the root filesystem.

So, in the case of a chroot/busybox init, this would do nothing. If one is upgrading from sysvinit to upstart (dapper -> hardy upgrades must still be supported IIRC) then this has to be handled delicately, and maybe already is; I haven't looked at whether update manager or one of the maintainer scripts does something clever here.
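
The flag-file approach can be sketched as a postinst fragment. This is an illustration under assumptions, not the text of the eglibc debdiff: the function name and its optional path argument are hypothetical, and only the path `/var/run/init.upgraded` comes from this report.

```shell
# Sketch: instead of 'telinit u', leave a marker that the shutdown scripts
# can act on. Nothing is signalled, so a busybox or sysvinit PID 1 is
# unaffected.
request_init_reexec() {
    flag="${1:-/var/run/init.upgraded}"
    # Skip quietly if the directory is missing (for example, a minimal chroot).
    [ -d "$(dirname "$flag")" ] || return 0
    touch "$flag"
}

# In a postinst this would typically run only on upgrade, for example:
#   if [ "$1" = configure ] && [ -n "$2" ]; then request_init_reexec; fi
```

Because the marker is only read at shutdown, creating it inside an installer chroot has no immediate effect on whatever init is running outside.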

Changed in eglibc (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Changed in upstart (Ubuntu):
assignee: nobody → James Hunt (jamesodhunt)
Loïc Minier (lool) wrote :

I find the touch /var/run/init.upgraded approach much less intrusive; I discussed various approaches with Debian, and none is entirely satisfactory. Apparently some filesystems won't be able to restore the new versions of the files if they aren't closed properly, and an unclean shutdown would cause these systems to keep the old libs.

Another issue is that upgraded systems continue running the old libc6 until reboot (at least their init does).

However, since bug #672177 explains that upstart basically stopped honoring SIGTERM some time ago due to a broken merge, any short-term solution that avoids using telinit will be fine.

Loïc Minier (lool) wrote :

I've tested the eglibc debdiff I have attached here, and it didn't cause any upgrade issue on my system; I didn't try it in a d-i context though.

I'll test Clint's proposal and upload that soon if that doesn't cause any regression on upgrade here.

A D-Bus interface to re-exec is the plan here.

Scott

On Tue, Jan 4, 2011 at 2:37 PM, Loïc Minier <email address hidden> wrote:
> Hey Scott!
>
> We'd love your input on the best way for upstart's telinit to not
> SIGTERM what could possibly be a non-upstart init (this is particularly
> an issue with busybox as it reboots on SIGTERM).  I attached a patch
> doing basically an is_chrooted test; Colin proposes using the DBus
> interface instead; we're not quite sure what should happen in the case
> of sysvinit -> upstart transitions or other weirdnesses though
>
> Thanks a lot for your input
>
> ** Also affects: upstart
>   Importance: Undecided
>       Status: New

tags: added: patch
Clint Byrum (clint-fewbar) wrote :

Here is a debdiff that comes from the eglibc portion of bug #672177

This would solve this bug as well.

Loïc Minier (lool) wrote :

I tested the no-telinit version by rebooting a couple of times after installing an updated libc6, and didn't see any umount or fsck errors. I didn't see any sudden reboot either, but that was not expected on a regular install anyway.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.12.1-0ubuntu12

---------------
eglibc (2.12.1-0ubuntu12) natty; urgency=low

  * do not run 'telinit u' on upgrade, as this will break upstart.
    touch /var/run/init.upgraded instead, which will force a re-exec just
    before remounting root read-only. LP: #672177, LP: #694772.
 -- Clint Byrum <email address hidden> Mon, 03 Jan 2011 10:17:18 -0800

Changed in eglibc (Ubuntu):
status: Triaged → Fix Released
C de-Avillez (hggdh2) wrote :

Confirming -- I no longer experience a reboot during ISO install.

Changed in eglibc (Debian):
status: Unknown → New
Changed in ubuntustudio:
status: New → Confirmed
importance: Undecided → Critical

Accepted eglibc into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Colin Watson (cjwatson) wrote :

Accepted eglibc into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in eglibc (Ubuntu Lucid):
importance: Undecided → Critical
milestone: none → ubuntu-10.04.2
status: New → Fix Committed
Changed in eglibc (Ubuntu Maverick):
status: New → Fix Committed
importance: Undecided → Critical
James Hunt (jamesodhunt) wrote :

Attached is a patch to sysvinit (from which the initscripts binary package is generated) which appears to fix the problem for maverick. The patch waits for up to 5 seconds for init to re-exec. If after this time init has not re-execed, we continue to unmount forcibly. This isn't ideal, but we cannot wait forever for init to re-exec. However, in testing, init re-spawns in under 1 second anyway.
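
The bounded wait described here can be sketched as a polling loop. How the actual patch detects that init has re-executed is not stated in this comment, so the condition is passed in as a command; `init_has_reexeced` in the usage line is a hypothetical placeholder, and `wait_for` is an illustrative name.

```shell
# Sketch: poll a condition once per second, giving up after a timeout.
# Returns 0 as soon as the condition command succeeds, 1 on timeout.
wait_for() {
    timeout=$1
    shift
    waited=0
    while ! "$@"; do
        [ "$waited" -lt "$timeout" ] || return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage (placeholder condition): wait at most 5 seconds, then carry on
# with the unmount either way.
#   wait_for 5 init_has_reexeced || echo "init did not re-exec in time; continuing"
```

Returning a status rather than exiting lets the caller decide what "continue anyway" means, which matches the behavior described above of proceeding to the forced unmount after the timeout.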

Tested in combination with upstart package built from lp:~clint-fewbar/ubuntu/natty/upstart/restore-re-exec-code.

This would also require the updated eglibc (which creates /var/run/init.upgraded rather than calling "telinit u"). If the updated eglibc is not available, running "sudo touch /var/run/init.upgraded" after a libc update should work.

Currently working on lucid + natty debdiffs...

James Hunt (jamesodhunt) wrote :

Oops! Ignore comment #22 as it applies to bug 672177.

Verification for Lucid.

I've verified that the package upgrades correctly from a default Lucid installation, and that after the upgrade the system reboots and X and the network are working. If there are specific verifications to do, let me know.

Marking as verification-done.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.11.1-0ubuntu7.8

---------------
eglibc (2.11.1-0ubuntu7.8) lucid-proposed; urgency=low

  [ Matthias Klose ]
  * Fix issue #12077, __strncmp_ssse3 can segfault when it over-reads
    its buffer. LP: #702190.

  [ Clint Byrum ]
  * do not run 'telinit u' on upgrade, as this will break upstart.
    touch /var/run/init.upgraded instead, which will force a re-exec just
    before remounting root read-only. LP: #672177, LP: #694772.
 -- Matthias Klose <email address hidden> Wed, 19 Jan 2011 03:06:52 +0100

Changed in eglibc (Ubuntu Lucid):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.12.1-0ubuntu10.2

---------------
eglibc (2.12.1-0ubuntu10.2) maverick-proposed; urgency=low

  [ Clint Byrum ]
  * do not run 'telinit u' on upgrade, as this will break upstart.
    touch /var/run/init.upgraded instead, which will force a re-exec just
    before remounting root read-only. LP: #672177, LP: #694772.

  [ Matthias Klose ]
  * Call locale-gen --purge when updating from eglibc-2.11.x. LP: #504198.
 -- Matthias Klose <email address hidden> Wed, 19 Jan 2011 03:12:11 +0100

Changed in eglibc (Ubuntu Maverick):
status: Fix Committed → Fix Released
Clint Byrum (clint-fewbar) wrote :

I'm failing to see where the "bug" in upstart is for this. telinit u sends SIGTERM to pid 1 blindly, which probably isn't the safest thing, but we can avoid doing that fairly easily.

So, since the way forward to allow upstart to re-execute itself without losing state is not clear, I think this one should probably be closed as Won't Fix.

Opinions?

I believe there's a bug open for the lack of state transfer? If not,
we could repurpose this one?

That being said, we should remember that state transfer should *not*
be invoked by SIGTERM - so maybe it's worth keeping this one open to
remind us of that.

Scott

On Fri, Feb 4, 2011 at 4:42 PM, Clint Byrum <email address hidden> wrote:
> I'm failing to see where the "bug" in upstart is for this. telinit u
> sends SIGTERM to pid 1 blindly, which probably isn't the safest thing,
> but we can avoid doing that fairly easily.
>
> So, since the way forward to allow upstart to re-execute itself without
> losing state is not clear, I think this one should probably be closed as
> Won't Fix.
>
> Opinions?

Changed in ubuntustudio:
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in upstart (Ubuntu Lucid):
status: New → Confirmed
Changed in upstart (Ubuntu Maverick):
status: New → Confirmed
Changed in upstart (Ubuntu):
status: New → Confirmed