libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

Bug #672177 reported by Paul van Berlo
This bug affects 30 people
Affects             Status         Importance   Assigned to          Milestone
upstart             Invalid        Undecided    Unassigned
eglibc (Ubuntu)     Fix Released   Critical     Unassigned
  Lucid             Fix Released   Undecided    Bobby A. Callender
  Maverick          Fix Released   Undecided    Unassigned
  Natty             Fix Released   Critical     Unassigned
sysvinit (Ubuntu)   Fix Released   High         James Hunt
  Lucid             Fix Released   Critical     James Hunt
  Maverick          Fix Released   Undecided    James Hunt
  Natty             Fix Released   High         James Hunt
upstart (Ubuntu)    Fix Released   Critical     Clint Byrum
  Lucid             Fix Released   Undecided    Unassigned
  Maverick          Fix Released   Undecided    Unassigned
  Natty             Fix Released   Critical     Clint Byrum

Bug Description

On a clean install of Ubuntu 10.04.1, after installing the offered libc6 upgrade, the root fs can't be properly unmounted on the next reboot (mount: / is busy). This causes fsck to run on boot and, of course, some minor issues with the filesystem. This might not be a problem with libc6 itself, but a side effect of upgrading in combination with some other package (I suspect the init process, so I guess upstart).

The fsck run and the orphaned inodes it finds are holding me back from installing this on a new server - especially since this already happens on a clean install of 10.04.1!

paul@ubuntu:~$ lsb_release -rd
Description: Ubuntu 10.04.1 LTS
Release: 10.04

ii libc6 2.11.1-0ubuntu7.2 Embedded GNU C Library: Shared libraries

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: libc6 2.11.1-0ubuntu7.2
ProcVersionSignature: Ubuntu 2.6.32-24.39-server 2.6.32.15+drm33.5
Uname: Linux 2.6.32-24-server x86_64
Architecture: amd64
Date: Sun Nov 7 16:17:07 2010
InstallationMedia: Ubuntu-Server 10.04.1 LTS "Lucid Lynx" - Release amd64 (20100816.2)
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: eglibc

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

Also - this only happens once after upgrading, after that, reboots work just fine without causing mount to fail on trying to unmount a busy root filesystem.

Revision history for this message
Victor Vargas (kamus) wrote :

I have reassigned this issue to libc6 (eglibc) package for you

affects: ubuntu → eglibc (Ubuntu)
Revision history for this message
Paul van Berlo (pvanberlo) wrote :

Thank you. It appears that this is somewhat similar to what happens in a very old bug (188925). Not sure if this is some kind of regression or if this is 'how things are supposed to be'. On reboot lsof only shows init being in use, with some of the libraries (incl. libc6). So right now I can only believe that the update of libc6 is causing this issue. I also tried restarting upstart (telinit u), but that apparently either doesn't work or it does work but doesn't solve the issue.

Revision history for this message
nerdistmonk (nerdistmonk) wrote :

It's affecting me on Ubuntu 10.10 i386. I installed a command-line system, ran a routine update, and behold, it doesn't umount the root. I have seen this bug in action since version 7.04; I don't know why it hasn't been fixed.

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

If this is 'how it should be'/accepted behavior, then something is obviously wrong. Having filesystem issues right after a clean install due to some libc upgrade is not acceptable.

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

It appears Bug #616287 is related to this.

Revision history for this message
ingo (ingo-steiner) wrote :

I just reproduced it and can confirm, see here: https://bugs.launchpad.net/ubuntu/+source/mountall/+bug/616287/comments/50

If further data collection is required, please advise.

Changed in eglibc (Ubuntu):
status: New → Confirmed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Setting Importance to Critical, as this causes FS corruption, and potentially could affect all users.

Changed in eglibc (Ubuntu):
importance: Undecided → High
importance: High → Critical
Revision history for this message
ingo (ingo-steiner) wrote :

I'll preserve the snapshot of my Lucid-VM with update of libc pending. So I am prepared to provide additional information if required and to test the fix - which hopefully will come soon.

Revision history for this message
ingo (ingo-steiner) wrote :

Everybody can reproduce the bug any time in Lucid, Maverick, no matter whether i386 or amd64 by just doing:

apt-get install --reinstall libc6

and reboot. This causes the "orphaned inodes" upon shutdown (verified by examining the filesystem after shutdown).
This bug is Ubuntu specific; the very same steps on Debian Squeeze never lead to a corrupted filesystem.

Revision history for this message
ingo (ingo-steiner) wrote :

This even happens in the recovery mode (root shell without network) by just executing:

apt-get install --reinstall libc6 && shutdown -r now

It appears that we got a sports car with utmost acceleration, but the designer has forgotten to equip it with proper brakes to stop it without damage. (Just redirecting the fsck error message to /dev/null won't be a fix.)

Revision history for this message
ingo (ingo-steiner) wrote :

just execute on any version of Lucid or Maverick to corrupt filesystem:

apt-get install --reinstall libc6 && shutdown -r now

Changed in upstart (Ubuntu):
status: New → Confirmed
Revision history for this message
ingo (ingo-steiner) wrote :

I double-checked with Debian-Squeeze: all ok there (libc6 2.11.2-7)!

So it is definitely Ubuntu specific and happens in LTS-Lucid and Maverick.

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

Debian doesn't do upstart yet I believe. Since upstart is the only process left which keeps references to the old libc, that's probably causing the issue. There is no apparent and proper way to restart upstart after a libc upgrade.

Revision history for this message
ingo (ingo-steiner) wrote :

If that proves true, Ubuntu revives the historical UNIX slogan (slightly adapted):

"Sure It Corrupts Your Files, But Look How Fast It Boots!"

Maybe that's also the reason why Scott refused to fix https://bugs.launchpad.net/bugs/568594, which causes problems if the /-filesystem is located on a remote drive?

Revision history for this message
Alf Gaida (agaida) wrote :

Along with other issues in a clean install of 10.10, I ran into this libc6-sh**. It's not amusing. When this happens in front of a customer, what will you say about this behavior? "OK, I warned you, but it was your decision not to take Debian"? Add this to the bug ingo filed months ago - a clean solution at the moment is to sell only Debian servers and aptosid workstations, because they are stable.

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

Bug 348346 also talks about how a libc upgrade should restart init to unload old libc to avoid remount root to read only issues. It appears to not work though, and seeing the non-response from the package maintainers (even after being marked critical), I have my doubts this will be fixed anytime soon. Too bad.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Paul, and others, this is being actively looked into. There are a number of bugs in the shutdown process, some of which are mentioned above.

Please also, temper the "corrupts your files" language. Orphaned inodes from already deleted files are not exactly corrupted files. Yes this issue is critical because it is preventing one of the critical steps of shutdown from completing correctly, and forcing a filesystem check on reboot. But as of yet, nobody has shown actual corruption due to this bug (though other similar bugs which prevent unmounting filesystems may in fact cause that).

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

Clint,

I agree up to a certain point:

1) There is no obvious proof that anyone is doing anything with this, as some of the other bugs mentioning this issue are over a year old. I believe you, but some more active overview on what is being done would be appreciated.

2) Although it is not necessarily 'corruption', the shutdown still went ahead while the fs was busy. Whatever we call it - it's not what is supposed to happen and I'm sure many people get scared away when that happens, especially on a clean install.

Just my two cents...

Revision history for this message
ingo (ingo-steiner) wrote :

I just did another probably dirty experiment, so don't blame me. I just want to report the results:

1. in Maverick I installed 'libc-bin' and 'libc6' (2.11.2-7) from Squeeze and
after reboot I got a corrupted filesystem (8 orphaned inodes) besides other complaints.

2. in Debian-Squeeze I installed 'libc-bin' and 'libc6' (2.11.1-0ubuntu7.6) from Lucid-updates and
after reboot I got a clean filesystem (no orphaned inodes) besides other complaints.

Disregarding the dirty way of using unmatched libc's and the resulting complaints of apt, ...
I conclude that the libc packages of both Ubuntu and Debian are themselves correct.
So the root of the evil has to be something else - I suppose upstart.

My personal opinion is that this basic and critical bug must be fixed first.
As long as this filesystem corruption during shutdown persists, all the work on the other related problems and bugs only cures symptoms and wastes a lot of effort.

Maybe I am wrong, but this bug though critical is not assigned to anybody yet. I wonder when somebody will feel responsible.

Revision history for this message
ingo (ingo-steiner) wrote :

And here another oddity of Ubuntu:

Squeeze logs fs-checks correctly in /var/log/fsck/*

Lucid (and Maverick): nothing logged there despite the libc6 updates:
ls -l /var/log/fsck
total 8
-rw-r----- 1 root adm 31 2010-08-20 12:55 checkfs
-rw-r----- 1 root adm 31 2010-08-20 12:55 checkroot

why?

Revision history for this message
praseodym (oliver-ehlert) wrote :

I confirm this bug in Maverick 64 bit in a VirtualBox on a Lucid 64 bit host. It doesn't occur on the same system in Maverick 32 bit in VBox and doesn't occur on the host system.

Revision history for this message
ingo (ingo-steiner) wrote :

@Element #59 (rare earth):

did you try 'apt-get install --reinstall libc6 && shutdown -r now' ?

Revision history for this message
ingo (ingo-steiner) wrote :

I wonder what Canonical tells its customers who purchased commercial support for Ubuntu in case they discover this bug?

Something like:

"sorry, we have done our best to hide such frightening messages from our customers. Unfortunately the nasty community has retrieved and disclosed it."

We have set up plymouth to hide the messages behind a purple splash screen with walking dots and taken provisions, so a normal user won't be able to uninstall plymouth (see Bug #556372). We have switched off the usual logging from fsck runs in /var/log/fsck/checkroot and /var/log/fsck/checkfs (see Bug #568594). We have set up the boot process to perform a filesystem check for / at every boot, regardless of what the user specifies in /etc/fstab column #6 (see Bug #568594).

Ubuntu firmly relies on 'fsck' as a proven and reliable tool to iron out those minor glitches when your filesystem gets corrupted or marked dirty on shutdown. Moreover we always point out: Ubuntu is not Debian!

???

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

So this does in fact seem to be caused completely by init holding libraries open. I patched /etc/init.d/umountfs to save the output of lsof just before unmounting root:

clint@natty-alpha1:~$ grep DEL /lastlsof
init 1 root DEL REG 251,0 399654 /lib/libnss_files-2.12.2.so
init 1 root DEL REG 251,0 399653 /lib/libnss_nis-2.12.2.so
init 1 root DEL REG 251,0 399639 /lib/libnsl-2.12.2.so
init 1 root DEL REG 251,0 399659 /lib/libnss_compat-2.12.2.so.dpkg-new
init 1 root DEL REG 251,0 399660 /lib/libc-2.12.2.so
init 1 root DEL REG 251,0 399629 /lib/librt-2.12.2.so
init 1 root DEL REG 251,0 399662 /lib/libpthread-2.12.2.so
init 1 root DEL REG 251,0 399663 /lib/ld-2.12.2.so

This is odd, because upstart claims to support 'telinit u' in its man page, but it actually doesn't do anything thanks to this revision in upstart's codebase:

------------------------------------------------------------
revno: 977
committer: Scott James Remnant <email address hidden>
branch nick: upstart
timestamp: Thu 2008-06-05 01:26:10 +0100
message:
  * init/main.c: Also remove SIGTERM handling, we don't re-exec
  properly and this is a dangerous signal to use anyway.
  (term_handler): Drop function.

So, I think glibc is doing its job calling 'telinit u' in the postinst. This is upstart's bug. Setting to Critical in upstart now, and Invalid in eglibc.

This is also a regression of bug #188925 , which was present in hardy, and fixed in intrepid. Tagging regression-release.
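
The check described in this comment can be repeated on any saved lsof output with a small filter. This is a minimal sketch, assuming the output uses the same column layout as the listing above (PID in field 2, FD in field 4, file name last); the function name deleted_init_maps is made up for illustration and is not part of any package.

```sh
# deleted_init_maps: read lsof output on stdin and print the files that
# PID 1 still has mapped although they were deleted or replaced on disk.
# lsof marks such mappings with "DEL" in the FD column (field 4).
deleted_init_maps() {
    awk '$2 == "1" && $4 == "DEL" { print $NF }'
}
```

For example, `deleted_init_maps < /lastlsof` would print one library path per line for the capture shown above; an empty result would mean init holds no stale mappings.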

Changed in upstart (Ubuntu):
importance: Undecided → Critical
Changed in eglibc (Ubuntu):
status: Confirmed → Invalid
tags: added: regression-release
summary: - libc6 upgrade causes umount to fail on shutdown
+ libc6 upgrade causes umount to fail on shutdown because init cannot be
+ restarted
Changed in upstart (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
Scott James Remnant (scott) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown

On Mon, Dec 27, 2010 at 10:17 PM, Clint Byrum <email address hidden> wrote:

> So this does in fact seem to be caused completely by init holding
> libraries open. I patched /etc/init.d/umountfs to save the output of
> lsof just before unmounting root:
>
Yes, now go read /etc/init.d/umountroot

Changed in upstart:
status: New → Invalid
Changed in upstart (Ubuntu):
status: Triaged → Confirmed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Oops, I'm still learning my way around the shutdown process and, indeed, I missed umountroot.

I went ahead and added another lsof right after the call to telinit u, right before MOUNT_FORCE_OPT=, and one *after* the remount of /.

It doesn't seem to make any difference:

clint@natty-alpha1:~$ grep DEL /lastlsof-umountroot-post
init 1 root DEL REG 251,0 399637 /lib/libnss_files-2.12.2.so
init 1 root DEL REG 251,0 399636 /lib/libnss_nis-2.12.2.so
init 1 root DEL REG 251,0 399630 /lib/libnsl-2.12.2.so
init 1 root DEL REG 251,0 399643 /lib/libnss_compat-2.12.2.so.dpkg-new
init 1 root DEL REG 251,0 399644 /lib/libc-2.12.2.so
init 1 root DEL REG 251,0 390256 /lib/librt-2.12.2.so
init 1 root DEL REG 251,0 399646 /lib/libpthread-2.12.2.so
init 1 root DEL REG 251,0 399647 /lib/ld-2.12.2.so
clint@natty-alpha1:~$ grep DEL /lastlsof-umountroot
init 1 root DEL REG 251,0 399637 /lib/libnss_files-2.12.2.so
init 1 root DEL REG 251,0 399636 /lib/libnss_nis-2.12.2.so
init 1 root DEL REG 251,0 399630 /lib/libnsl-2.12.2.so
init 1 root DEL REG 251,0 399643 /lib/libnss_compat-2.12.2.so.dpkg-new
init 1 root DEL REG 251,0 399644 /lib/libc-2.12.2.so
init 1 root DEL REG 251,0 390256 /lib/librt-2.12.2.so
init 1 root DEL REG 251,0 399646 /lib/libpthread-2.12.2.so
init 1 root DEL REG 251,0 399647 /lib/ld-2.12.2.so

I even tried forcing the telinit u by doing 'touch /var/run/init.upgraded', and verified that it ran by adding -x and a sleep at the end of umountroot.

This makes sense, because telinit u just sends SIGTERM to upstart, which has no handler, as it was removed by revision 977, and doesn't seem to have been added back. Since SIG_DFL signals are not delivered to init, I'm not sure how umountroot's call to 'telinit u' can help in this case.

Unless I'm missing something in upstart's code (quite likely) I think there may still potentially need to be a change in upstart to support re-executing.

Another option if this can't be pushed into upstart is to mount a special readonly filesystem at boot just for init's libraries. However, it seems that re-execcing would be simpler than trying to get that right.
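
Whether PID 1 has a SIGTERM handler installed at all can be checked from userspace. On Linux, the SigCgt field of /proc/<pid>/status is a hexadecimal mask of caught signals, where signal N is bit N-1, so SIGTERM (15) is 0x4000. The following is a minimal sketch of that check; the function name catches_sigterm is hypothetical, and nothing here is taken from upstart or sysvinit.

```sh
# catches_sigterm MASK
# MASK is the hex value of the SigCgt field from /proc/<pid>/status.
# Succeeds if the SIGTERM bit (signal 15 -> bit 14 -> 0x4000) is set.
catches_sigterm() {
    # Signals 1-32 live in the low 32 bits, so the last 8 hex digits suffice.
    low=$(printf '%s' "$1" | tail -c 8)
    [ $(( 0x$low & 0x4000 )) -ne 0 ]
}
```

A possible use is `catches_sigterm "$(awk '/^SigCgt:/ { print $2 }' /proc/1/status)" && echo "init handles SIGTERM"`. If the bit is clear, a SIGTERM sent by 'telinit u' would simply be dropped, which matches the behaviour described above.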

Revision history for this message
Scott James Remnant (scott) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

On Tue, Dec 28, 2010 at 6:47 AM, Clint Byrum <email address hidden> wrote:

> This makes sense, because telinit u just sends SIGTERM to upstart, which
> has no handler, as it was removed by revision 977, and doesn't seem to
> have been added back. Since SIG_DFL signals are not delivered to init,
> I'm not sure how umountroot's call to 'telinit u' can help in this case.
>
> Unless I'm missing something in upstart's code (quite likely) I think
> there may still potentially need to be a change in upstart to support
> re-executing.
>
Hmm, while that handler was removed Upstream, the code should have
been retained in the Ubuntu package as part of the "telinit u" patch.

The idea is that rather than doing a full state transfer, Upstart just
re-exec's itself and loses all state. That's why we do it as the last
thing before unmounting the root on shutdown, because it then doesn't
matter about the state - there shouldn't be any.

Maybe that part of the patch has been lost?

Scott

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Tue, 2010-12-28 at 14:22 +0000, Scott James Remnant wrote:
> Hmm, while that handler was removed Upstream, the code should have
> been retained in the Ubuntu package as part of the "telinit u" patch.
>
> The idea is that rather than doing a full state transfer, Upstart just
> re-exec's itself and loses all state. That's why we do it as the last
> thing before unmounting the root on shutdown, because it then doesn't
> matter about the state - there shouldn't be any.
>
> Maybe that part of the patch has been lost?
>

Ahh, as I suspected I was missing something in upstart's code.

That code is cleverly hidden within the package diff.

It registers nih_main_term_signal() which calls nih_main_loop_exit(),
which tells init to exit after any currently running operation is over.

So, it would seem init exits on SIGTERM. I did see it actually exit and
end up panicking the kernel once, but only when I sent multiple kills.

init should never exit, so shouldn't we instead be execcing ourselves
again? Will keep digging in, maybe neither are happening as we expect
them to, causing this issue.

Revision history for this message
Scott James Remnant (scott) wrote :

Are you sure? I can't find that code at all - it looks like it's been
lost somehow

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Wed, 2010-12-29 at 00:06 +0000, Scott James Remnant wrote:
> Are you sure? I can't find that code at all - it looks like it's been
> lost somehow

Quite certain. It's a bit tricky to find it in bzr, but this will give
it to you for, say, maverick:

$ bzr branch lp:ubuntu/maverick/upstart maverick
$ cd maverick
$ bzr diff -r tag:upstream-0.6.6..tag:0.6.6-3

apt-get source will wrap it up in the .diff.gz for you as well.

Revision history for this message
Scott James Remnant (scott) wrote :

Yes, I know how to drive bzr and dpkg ;-)

My point is that if you do that diff, the only occurrence of SIGTERM
is in the code for the upstart/udev bridge, that patch doesn't add any
handling to upstart itself.

Scott

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Some day, I'll learn to pay attention to the details, I promise. Sorry for the confusion.

Ok, so I reverse merged r977 back in, which mostly applied cleanly.

I then tested this on a natty VM, and, shock, it worked flawlessly.

I think we may also need to have eglibc's postinst skip the call to telinit u, and instead touch /var/run/init.upgraded. It's probably open to debate whether we need to flag users to reboot, though I'd prefer that we do.

One major problem though, is it seems upstart is FTBFS on natty right now. Without any changes, 0.6.7-3 cannot build, as init/test_conf fails, and utils/test_utmp also fails. Will open a bug report for that post-holiday.

I've pushed up a branch, it would be nice if somebody else could test it (I manually disabled test_conf and test_utmp in their respective Makefile.am's and then re-ran automake before building). I will propose the merge after returning from holiday next week.

Revision history for this message
Scott James Remnant (scott) wrote :

I've no idea why that revision got dropped from the Ubuntu package, I
did a bit of investigation and it seems to vanish about the point we
switched to the auto-importer based packages. My only guess is that
bzr undid the cherry-pick as part of a merge.

Right, eglibc's postinst should definitely not call telinit u and
should touch that upgraded file instead. I thought the code to do that
was definitely in glibc's postinst but again I can't find it in the
history. It does look very much like we lost a chunk of patches
somehow, the bug log shows that they were definitely uploaded!

Definitely open bug reports for test failures - these could be signs
of bigger problems and shouldn't be overridden to make an upload go
through.

Scott

Changed in eglibc (Ubuntu):
status: Invalid → Confirmed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

The test failures that I saw were caused by running the build step as root. I didn't think sbuild did that, but apparently it does. When I build inside a clean chroot as non-root the build seems to work fine.

I will file the merge proposal then and also do the glibc change.

Changed in upstart (Ubuntu):
assignee: nobody → Clint Byrum (clint-fewbar)
Changed in eglibc (Ubuntu):
status: Confirmed → In Progress
Changed in upstart (Ubuntu):
status: Confirmed → In Progress
Changed in eglibc (Ubuntu):
assignee: nobody → Clint Byrum (clint-fewbar)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.12.1-0ubuntu12

---------------
eglibc (2.12.1-0ubuntu12) natty; urgency=low

  * do not run 'telinit u' on upgrade, as this will break upstart.
    touch /var/run/init.upgraded instead, which will force a re-exec just
    before remounting root read-only. LP: #672177, LP: #694772.
 -- Clint Byrum <email address hidden> Mon, 03 Jan 2011 10:17:18 -0800
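
The flag-file handshake described in this changelog entry can be sketched in two halves: the package side only records that init's libraries changed, and the shutdown side acts on the flag. This is a minimal illustration, not the actual postinst or umountroot code. Both function names are hypothetical, the re-exec command is passed in as an argument so the sketch can be tried without touching PID 1, and removing the flag after use is an assumption made here.

```sh
# Path named in the changelog; overridable for experiments.
INIT_FLAG=${INIT_FLAG:-/var/run/init.upgraded}

# Package side: record that init needs a re-exec instead of signalling it.
mark_init_upgraded() {
    touch "$INIT_FLAG"
}

# Shutdown side: if the flag exists, clear it and run the given re-exec
# command (for example "telinit u"). Returns 1 if no re-exec was pending.
reexec_if_upgraded() {
    [ -f "$INIT_FLAG" ] || return 1
    rm -f "$INIT_FLAG"
    "$@"
}
```

In this sketch the shutdown script would call something like `reexec_if_upgraded telinit u` just before remounting root read-only, at a point where losing init's state no longer matters.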

Changed in eglibc (Ubuntu):
status: In Progress → Fix Released
Changed in eglibc (Ubuntu):
assignee: Clint Byrum (clint-fewbar) → nobody
Revision history for this message
ingo (ingo-steiner) wrote :

Hi Clint,

thanks for fast fix - but unfortunately only in Natty which is still in development. This bug was reported for *Lucid*, which in terms of service will even survive Natty. When will that be fixed and who cares?

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Wed, 2011-01-05 at 21:31 +0000, ingo wrote:
> Hi Clint,
>
> thanks for fast fix - but unfortunately only in Natty which is still in
> development. This bug was reported for *Lucid*, which in terms of
> service will even survive Natty. When will that be fixed and who cares?
>

Hi Ingo. This is the normal procedure for stable release updates.

If we don't fix it first in the development release, then it will
persist into subsequent releases.

We need to fix it on the dev version first, and then the fix will be
backported to maverick and lucid.

I know it's hard to remain patient. I'm hoping to get this fix into lucid
before 10.04.2 is released in February.

Revision history for this message
nerdistmonk (nerdistmonk) wrote :

This is funny, because I reported this bug in the Ubuntu forums in 2007, but I was ignored and it was blamed on the user (not surprised).

http://ohioloco.ubuntuforums.org/showthread.php?t=645429
(this is my original post from December 2007 same bug)

Revision history for this message
ingo (ingo-steiner) wrote :

Today another libc6 upgrade (security) came in. This is Lucid's file system just after shutdown:

# fsck -f /dev/sda6
fsck 1.40.8 (13-Mar-2008)
e2fsck 1.40.8 (13-Mar-2008)
Lucid: recovering journal
Clearing orphaned inode 310667 (uid=0, gid=0, mode=0100644, size=51712)
Clearing orphaned inode 310663 (uid=0, gid=0, mode=0100755, size=1572232)
Clearing orphaned inode 310661 (uid=0, gid=0, mode=0100755, size=135745)
Clearing orphaned inode 310651 (uid=0, gid=0, mode=0100644, size=43552)
Clearing orphaned inode 310650 (uid=0, gid=0, mode=0100644, size=35712)
Clearing orphaned inode 310644 (uid=0, gid=0, mode=0100644, size=31744)
Clearing orphaned inode 310640 (uid=0, gid=0, mode=0100755, size=136936)
Clearing orphaned inode 310639 (uid=0, gid=0, mode=0100644, size=97256)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

Lucid: ***** FILE SYSTEM WAS MODIFIED *****
Lucid: 221800/1028160 files (1.2% non-contiguous), 1258926/4112632 blocks

Revision history for this message
DeJe (djenett) wrote :

I can confirm that on Ubuntu Lucid 64bit.

Bug still in place...

Matthias Klose (doko)
Changed in eglibc (Ubuntu Lucid):
status: New → In Progress
Changed in eglibc (Ubuntu Maverick):
status: New → In Progress
Changed in upstart (Ubuntu Lucid):
status: New → Triaged
Changed in upstart (Ubuntu Maverick):
status: New → Triaged
Revision history for this message
ingo (ingo-steiner) wrote :

Today again after security update for 'udev' and reboot: 1 orphaned inode.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Added sysvinit, as umountroot needs to wait for the upstart re-exec to actually happen before moving forward. (This could be done in telinit as well, but given that the design uses SIGTERM and has no way to signal back to telinit that the re-exec is done, it's much simpler to have umountroot just watch for a new version of init based on /proc.)

Revision history for this message
Colin Watson (cjwatson) wrote : Please test proposed package

Accepted eglibc into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in eglibc (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in eglibc (Ubuntu Maverick):
status: In Progress → Fix Committed
Revision history for this message
Colin Watson (cjwatson) wrote :

Accepted eglibc into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Revision history for this message
Colin Watson (cjwatson) wrote :

Accepted upstart into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in upstart (Ubuntu Lucid):
status: Triaged → Fix Committed
Changed in upstart (Ubuntu Maverick):
status: Triaged → Fix Committed
Revision history for this message
Colin Watson (cjwatson) wrote :

Accepted upstart into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Revision history for this message
ingo (ingo-steiner) wrote :

Thanks for the Lucid fix. I immediately tested with the new 'libc6' and 'upstart' from lucid-proposed:

1. just the new 'libc6' -> does not change anything (8 orphaned inodes after reboot).

2. + the new 'upstart' -> gives 1 orphaned inode after reboot.

3. new ('libc6' + 'upstart'), 'apt-get install --reinstall libc6 && shutdown -r now' -> 4 orphaned inodes.

So it definitely improves the situation, but does not solve it completely. Did I miss something that needs updating?

--------------
$ dpkg --status libc6
Package: libc6
Status: install ok installed
Priority: required
Section: libs
Installed-Size: 10212
Maintainer: Ubuntu Core developers <email address hidden>
Architecture: amd64
Source: eglibc
Version: 2.11.1-0ubuntu7.8
Replaces: belocs-locales-bin
Provides: glibc-2.10-1
Depends: libc-bin (= 2.11.1-0ubuntu7.8), debconf (>= 0.5) | debconf-2.0, libgcc1, tzdata, findutils (>= 4.4.0-2ubuntu2)
Suggests: glibc-doc, locales
Breaks: nscd (<< 2.9)
Conflicts: belocs-locales-bin, tzdata (<< 2007k-1), tzdata-etch
....

---------------
$ dpkg --status upstart
Package: upstart
Status: install ok installed
Priority: required
Section: admin
Installed-Size: 860
Maintainer: Scott James Remnant <email address hidden>
Architecture: amd64
Version: 0.6.5-8
Replaces: startup-tasks, system-services, sysvinit, upstart-compat-sysv, upstart-job
Provides: startup-tasks, system-services, upstart-compat-sysv, upstart-job
Depends: libc6 (>= 2.4), libdbus-1-3 (>= 1.2.16), libnih-dbus1 (>= 1.0.0), libnih1 (>= 1.0.0), libudev0 (>= 151-5), sysvinit-utils, sysv-rc, initscripts, mountall, ifupdown (>= 0.6.8ubuntu29)
Breaks: libc6 (<< 2.11.1-0ubuntu7.8)
Conflicts: startup-tasks, system-services, sysvinit, upstart-compat-sysv, upstart-job
Conffiles: ....

---------
$ apt-get install --reinstall libc6 && shutdown -r now

->
$ dmesg | grep orphan
[ 2.898032] EXT3-fs: sda1: orphan cleanup on readonly fs
[ 2.898051] ext3_orphan_cleanup: deleting unreferenced inode 185613
[ 2.902163] ext3_orphan_cleanup: deleting unreferenced inode 185601
[ 2.902430] ext3_orphan_cleanup: deleting unreferenced inode 185574
[ 2.903374] ext3_orphan_cleanup: deleting unreferenced inode 185553
[ 2.903771] EXT3-fs: sda1: 4 orphan inodes deleted

Revision history for this message
Clint Byrum (clint-fewbar) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

On Sat, 2011-01-22 at 11:50 +0000, ingo wrote:
> Thanks for Lucid-fix. I did immediately test with new 'libc6' and
> 'upstart' from Lucid-proposed:
>
> 1. just new 'libc6' -> does not change anything (8 orphaned inodes after
> reboot).
>
> 2. + new 'upstart' -> gives 1 orphaned inode after reboot.
>
> 3. new ('libc6' + 'upstart'), 'apt-get install --reinstall libc6 &&
> shutdown -r now' -> 4 orphaned inodes.
>
> So it improves the situation definitely, but does not solve it
> completely. Did I miss something to update?
>

No Ingo, you did everything right, I see similar effects.

That is why sysvinit was added to the bug report. The issue is that
while we've restored the ability to tell init to re-exec itself, we have
also introduced a race condition. telinit u exits as soon as it has sent
SIGTERM to pid 1. Almost the very next line in /etc/init.d/umountroot
attempts the remount of / as readonly. If the signal handler doesn't
complete the re-exec before this, then the remount will still fail.

I've been testing a few ways to make telinit or umountroot block until
init re-execs. The important thing is that this way 10.04.2 will ship
with an init that *can* re-exec.

If you add 'sleep 1' after the telinit u line in /etc/init.d/umountroot,
that should give the remount a much better chance of succeeding.
Hopefully there will be a proper fix available for that early next week,
but I don't know if it will be accepted for 10.04.2 or have to come as
an update to 10.04.2.

Revision history for this message
ingo (ingo-steiner) wrote :

Clint, seems you are doing an excellent job and I am confident it gets fixed finally!

Your proposal works, however only if I modify /etc/init.d/umountroot this way:

 [ -f /var/run/init.upgraded ] && telinit u && sleep 1 || :

It does *not* work if I place the sleep on a separate line like this:

 [ -f /var/run/init.upgraded ] && telinit u || :
        sleep 1

Are you sure that at the end of the line a ":" (colon) is correct, or should it be a ";" (semicolon)?
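
On the colon question: in POSIX shell, ':' is the null command, which does nothing and always exits with status 0, whereas ';' only separates commands. The sketch below illustrates the idiom; 'run_if_flag' is a hypothetical helper name used only for illustration and is not taken from umountroot. The trailing '|| :' makes the whole line succeed even when the flag file is missing or the command fails, which matters if the script runs under 'set -e'.

```shell
#!/bin/sh
# run_if_flag FLAG CMD...: hypothetical helper illustrating "test && cmd || :".
# Runs CMD only when FLAG exists, and always returns 0, because ':' is the
# shell's null command and its exit status is always 0.
run_if_flag() {
    flag=$1
    shift
    [ -f "$flag" ] && "$@" || :
}
```

With 'sleep 1' on its own line the sleep runs unconditionally; inside the '&&' chain it runs only when the flag file exists and 'telinit u' succeeded.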

Revision history for this message
ingo (ingo-steiner) wrote :

Sorry, but I can't reproduce it now; I am still getting 4 orphaned inodes. Can I insert some lines to log which libs are still in use?

Revision history for this message
ingo (ingo-steiner) wrote :

Observed just another oddity, probably a separate bug in mountall?

I tried to mount the / filesystem (ext3) in journal mode, to see if this improves the situation, by adding these options to /etc/fstab:
data=journal,errors=remount-ro
But that results in the boot process stalling with the / filesystem mounted ro and just a console. dmesg | grep -i ext3 tells me:

EXT3-fs: mounted filesystem with ordered data mode
EXT3-fs (device sda1): Cannot change data mode on remount. The filesystem is mounted in data=ordered mode and you try to remount it in data=journal mode.

The only way out and to continue boot-up is to manually remount the fs rw by:
mount -o remount,rw /dev/sda1 /
and remove the data=journal option from fstab.

Revision history for this message
Zippo (peter-henninger) wrote :

I don't understand anything about the shutdown procedure. I found this bug report because I noticed orphaned inodes in my /home partition after shutdown (Ubuntu 10.04 on 32-bit and 64-bit hardware).
My question is:
Once this bug (orphaned inodes) is fixed for the / partition, will that also prevent orphaned inodes after shutdown in the /home partition?

Revision history for this message
ingo (ingo-steiner) wrote :

@Zippo,

it is really a sad story with LTS Lucid. I really don't understand how such a buggy release could pass QC (if there is any?). For example, an update of OpenSSH came in today for Lucid, and the old bug in the upstart script '/etc/init/ssh.conf' (stop on runlevel S) is still not fixed. Set it to (stop on runlevel [!2345]) - a hint from Clint.

In fact Scott has left behind a half-done upstart - not only in Lucid. And Clint has taken on the ungrateful challenge of cleaning up the biggest issues. The shutdown process had been neglected to a great extent. Unfortunately Canonical is mainly focusing on Natty while Lucid just gets fixes for the worst problems.
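
For readers wondering why the two 'stop on' lines behave differently: a halt or reboot switches to runlevel 0 or 6, so a job that stops only on runlevel S is never told to stop at shutdown. The bracket expression [!2345] is a glob that matches any single character other than 2, 3, 4 or 5. The sketch below reproduces that matching with a shell 'case' pattern, which uses the same bracket syntax. 'stops_on' is a hypothetical function written for illustration, and treating upstart's matching as equivalent to a shell glob is an assumption.

```shell
#!/bin/sh
# stops_on PATTERN RUNLEVEL: hypothetical helper. Returns 0 if RUNLEVEL matches
# the glob PATTERN, the way a "stop on runlevel PATTERN" line is assumed to.
stops_on() {
    pattern=$1
    level=$2
    case $level in
        # Unquoted on purpose, so the variable is interpreted as a glob pattern.
        $pattern) return 0 ;;
    esac
    return 1
}
```

For example, 'stops_on "[!2345]" 6' succeeds while 'stops_on S 6' fails, which is consistent with sshd being left running at reboot under the original line.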

Revision history for this message
Scott James Remnant (scott) wrote :

ingo: that's the way the Ubuntu release process works, I'm sorry that
this is a surprise to you

On Mon, Jan 24, 2011 at 1:00 PM, ingo <email address hidden> wrote:
> @Zippo,
>
> it is really a sad story with LTS-Lucid. I really don't understand how
> such a buggy release could pass QC (if there is any?). I.e. today there
> came in an update of openSSH in Lucid, and still the old bug in the
> upstart-script '/etc/init/ssh.conf' (stop on runlevel S) is not fixed.
> Set it to (stop on runlevel [!2345]) - a hint from Clint.
>
> In fact Scott has left behind a half done upstart - not only in Lucid.
> And Clint has taken the ungrateful challenge to clean up the biggest
> issues. Shutdown process had been neglected to a great part.
> Unfortunately Canonical is mainly focussing on Natty while Lucid just
> gets the fixes for the worst.

Revision history for this message
Alf Gaida (agaida) wrote :

@56: I don't think so. The release process is a Canonical "special feature". These processes are not written in stone and can be changed by the people working for Canonical. Maybe I should write: Must be redefined by the people working for Canonical.

James Hunt (jamesodhunt)
Changed in sysvinit (Ubuntu Maverick):
assignee: nobody → James Hunt (jamesodhunt)
status: New → In Progress
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

Verification for Lucid for eglibc and upstart.

I've verified that the package upgrades correctly from a default Lucid installation and that after the installation the system reboots, that X and the network are working. If there are specific verifications to do, let me know.

Marking as verification-done.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Sun, 2011-01-23 at 12:39 +0000, ingo wrote:
> Observed just another oddity, probably a separate bug in mountall?
>
> I tried to mount the / filesystem (ext3) in journal mode to see if this improves the situation by adding the option to /etc/fstab:
> data=journal,errors=remount-ro
> But that results in boot process stalling with / filesystem mounted ro and just a console. dmesg | grep -i ext3 tells me:
>
> EXT3-fs: mounted filesystem with ordered data mode
> EXT3-fs (device sda1): Cannot change data mode on remount. The filesystem is mounted in data=ordered mode and you try to remount it in data=journal mode.
>
> The only way out and to continue boot-up is to manually remount the fs rw by:
> mount -o remount,rw /dev/sda1 /
> and remove the data=journal option from fstab.
>

Ingo, I believe changing the journalling of the root fs in this way is a
long standing issue with ext3/ext4, and has nothing to do with the libc6
issue. IIRC, it is only changeable by unmounting the fs, setting it, and
then mounting/unmounting again.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Sun, 2011-01-23 at 11:50 +0000, ingo wrote:
> Sorry, but I can't reproduce it now, getting still 4 orphaned inodes.
> Can I insert some lines to log which libs are still in use?
>

Yes, you can put this just after the remounts:

lsof -n | grep DEL
sleep 10

Which will give you 10 seconds to view anything that still has deleted
files open.
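
To make that output easier to scan, the listing can be reduced to one line per process. This is only a sketch: 'deleted_holders' is a hypothetical name, and it assumes the usual lsof column order, in which the fourth field is the file descriptor and reads DEL for a mapped file that has been deleted.

```shell
#!/bin/sh
# deleted_holders: hypothetical filter. Reads 'lsof -n' output on stdin and
# prints one "COMMAND PID" line per process that still maps a deleted file.
# Assumes field 4 is the FD column, which lsof sets to DEL for such mappings.
deleted_holders() {
    awk '$4 == "DEL" { print $1, $2 }' | sort -u
}
```

It would be used as 'lsof -n | deleted_holders' in place of the plain grep.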

I recently found that sshd also may interfere w/ the remounting of root
in lucid (not as likely in maverick as it is stopped on runlevel
[!2345]) because it is not properly restarted on libc6 upgrade.

See bug #531912 for more info on that.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.11.1-0ubuntu7.8

---------------
eglibc (2.11.1-0ubuntu7.8) lucid-proposed; urgency=low

  [ Matthias Klose ]
  * Fix issue #12077, __strncmp_ssse3 can segfault when it over-reads
    its buffer. LP: #702190.

  [ Clint Byrum ]
  * do not run 'telinit u' on upgrade, as this will break upstart.
    touch /var/run/init.upgraded instead, which will force a re-exec just
    before remounting root read-only. LP: #672177, LP: #694772.
 -- Matthias Klose <email address hidden> Wed, 19 Jan 2011 03:06:52 +0100

Changed in eglibc (Ubuntu Lucid):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 0.6.5-8

---------------
upstart (0.6.5-8) lucid-proposed; urgency=low

  * Re-add upstream r977 to allow proper re-exec on shutdown (LP: #672177)
  * debian/control: adding Breaks on eglibc version that disables
    telinit u to avoid accidentally installing a version of libc6 that
    will cause upstart to re-exec and lose its state.
 -- Clint Byrum <email address hidden> Fri, 21 Jan 2011 08:21:18 -0800

Changed in upstart (Ubuntu Lucid):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 0.6.6-4

---------------
upstart (0.6.6-4) maverick-proposed; urgency=low

  * Re-add upstream r977 to allow proper re-exec on shutdown (LP: #672177)
  * debian/control: adding Breaks on eglibc version that disables
    telinit u to avoid accidentally installing a version of libc6 that
    will cause upstart to re-exec and lose its state.
 -- Clint Byrum <email address hidden> Wed, 29 Dec 2010 12:08:36 -0800

Changed in upstart (Ubuntu Maverick):
status: Fix Committed → Fix Released
Revision history for this message
ingo (ingo-steiner) wrote :

> you can put this just after the remounts:
> lsof -n | grep DEL
> sleep 10

*Lucid-amd64*
I did so (with new libc6 and upstart from Lucid-proposed installed) and found something new:
it's the NFS which makes trouble. Portmap and statd are still running!

(I just list the essential information here, as I took a screenshot, but could upload the *.png as well.)

init: Re-executing /sbin/init
mount: / is busy
portmap 445 daemon DEL REG 8,1 /lib/libnsl-2.11.1.so
       " /lib/libc-2.11.1.so
       " /lib/ld-2.11.1.so
rpc.statd 602 statd DEL REG 8,1 /lib/libnss_files-2.11.1.so
       " /lib/libnsl-2.11.1.so
       " /lib/libc-2.11.1.so
       " /lib/ld-2.11.1.so

Revision history for this message
ingo (ingo-steiner) wrote :

BTW: just some minutes ago the new 'libc6' and 'upstart' were offered as an official update for Lucid.

Revision history for this message
ingo (ingo-steiner) wrote :

I now stopped portmap by '/etc/init.d/portmap stop' prior to executing
'apt-get install --reinstall libc6 && shutdown -r now'
and got just the messages
  init: Re-executing /sbin/init
  mount: / is busy

but no more output from 'lsof'

Changed in eglibc (Ubuntu Lucid):
assignee: nobody → Bobby A. Callender (bcallender)
Revision history for this message
ingo (ingo-steiner) wrote :

and no more orphaned inodes at all (with portmap stopped beforehand)!

Congratulations, you are approaching the solution!

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Tue, 2011-02-01 at 16:17 +0000, ingo wrote:
> > you can put this just after the remounts:
> > lsof -n | grep DEL
> > sleep 10
>
> *Lucid-amd64*
> I did so (with new libc6 and upstart from Lucid-proposed installed) and found something new:
> it's the NFS which makes trouble. Portmap and statd are still running!
>
> (I just list the essential information here, as I took a screenshot, but
> could upload the *.png as well.)
>
> init: Re-executing /sbin/init
> mount: / is busy
> portmap 445 daemon DEL REG 8,1 /lib/libnsl-2.11.1.so
> " /lib/libc-2.11.1.so
> " /lib/ld-2.11.1.so
> rpc.statd 602 statd DEL REG 8,1 /lib/libnss_files-2.11.1.so
> " /lib/libnsl-2.11.1.so
> " /lib/libc-2.11.1.so
> " /lib/ld-2.11.1.so
>

Ingo, thanks for the rapid feedback!

I think this is actually a new bug against portmap caused by its
migration to upstart.

/etc/init.d/umountnfs.sh defines

# Should-Stop: $network $portmap nfs-common

And portmap actually does tell dh_installinit to stop after umountnfs:

dh_installinit --name portmap -- start 43 S 2 3 4 5 . start 32 0 6 . stop 81 1 .

But there is no way to codify that point in the shutdown into the
upstart job, which I suspect is why it has no stop on. I believe the
proper way to handle this is to have an event that mirrors
remote-filesystems, say unmounted-remote-filesystems, on which to stop,
and then have umountnfs.sh emit that. A more succinct method would
be to simply add

stop portmap 2>/dev/null || :

to umountnfs.sh

Either way, portmap should be responsible for shutting down at the right
moment, and so I've opened bug #711425 against portmap.

Revision history for this message
ingo (ingo-steiner) wrote :

> But there is no way to codify that point in the shutdown into the
> upstart job, which I suspect is why it has no stop on. I believe the
> proper way to handle this is to have a matching event to
> remote-filesystems , unmounted-remote-filesystems, with which to stop
> on, and then have umountnfs.sh emit that.

Thanks for opening the new bug, Clint.
On my machine portmap is started and running anyhow, even if no NFS share has ever been mounted. In Lucid I have just installed 'nfs-common' (not the kernel server!). It is only used when I temporarily mount some nfs3 exports from guests in VBox, or nfs4 exports from my NAS (running Lenny-armel).

You need to install nfs-common even if you only want the client - and that brings in portmap and rpc.statd.
Amazingly, stopping portmap also stops rpc.statd?

Another observation:
sshd cannot be stopped manually:

# /etc/init.d/ssh stop
 * Stopping OpenBSD Secure Shell server sshd [ OK ]

but 'daemon.log' tells me it is restarting immediately (and appears with new PID in process list):

Feb 1 21:20:20 localhost init: ssh main process (1882) terminated with status 255
Feb 1 21:20:20 localhost init: ssh main process ended, respawning

Revision history for this message
ingo (ingo-steiner) wrote :

I just checked:

portmap is not included in nfs-common, it's a separate package. But it is referenced by nfs-common as "required"!

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Tue, 2011-02-01 at 20:23 +0000, ingo wrote:
> > But there is no way to codify that point in the shutdown into the
> > upstart job, which I suspect is why it has no stop on. I believe the
> > proper way to handle this is to have a matching event to
> > remote-filesystems , unmounted-remote-filesystems, with which to stop
> > on, and then have umountnfs.sh emit that.
>
> Thanks for opening the new bug, Clint.
> On my machine portmap is started/running anyhow, also if no nfs share has been mounted ever. In Lucid I have just installed 'nfs-common' (not the kernel server!). It is only used when I temporarily mount some nfs3-exports from guests in VBox, or nfs4-exports from my nas (running Lenny-armel).
>
> You need to install nfs-common also if you only want just the client - and that brings portmap and rpc.statd.
> Amazingly, stopping portmap also stops rpc.statd?
>

Yes, statd stops and starts with portmap...

from statd.conf:

start on (started portmap ON_BOOT=
          or (local-filesystems and started portmap ON_BOOT=y))
stop on stopping portmap

> Another observation:
> sshd cannot be stopped manually:
>
> # /etc/init.d/ssh stop
> * Stopping OpenBSD Secure Shell server sshd [ OK ]
>

This is bug #531912, which is awaiting review. The init.d script is only
retained for chroots. Use 'service ssh stop' until that bug is fixed.

Revision history for this message
ingo (ingo-steiner) wrote :

Phew, bit by bit I understand why Ubuntu is criticized for use on servers.
Do you see any chance of getting that mess cleaned up in Lucid?
(Squeeze is being released soon)

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Tue, 2011-02-01 at 21:32 +0000, ingo wrote:
> Puh, by and by I understand why Ubuntu is blamed for use on servers.
> Do you see any chance to get that mess cleaned up in Lucid?
> (Squeeze is being released soon)
>

I don't know if I'd call it a mess at this point. The shutdown will
certainly be better in 10.04.2 (not quite released but quite frozen)
than it was in 10.04 and 10.04.1. I expect it will continue to improve
over the next couple of months. It's just an area that didn't receive
much testing during the development cycle.

The platform as a whole wouldn't be nearly as stable without testers
like yourself Ingo, so for that, I thank you. I hope you'll continue to
be patient with us and provide feedback as we move forward with more
fixes and some re-factoring.

Revision history for this message
Jimmy Merrild Krag (beruic) wrote :

Now (after a while) I've gotten through reading this bug.

I have just updated my server, it now runs 10.04.2. Does this mean it has no issues? It's a non-critical server I access remotely. Should I care at all, or just wait for things to get "even better"?

On my laptop I see upstart as upgradeable, but the required version of libc6 for that package is not available. Again, should I just wait and see, or do you recommend that I upgrade from proposed?

Revision history for this message
Jimmy Merrild Krag (beruic) wrote :

Forgot: My laptop runs Maverick

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Tue, 2011-02-01 at 23:06 +0000, Jimmy Merrild Krag wrote:
> Now (after a while) I've gotten through reading this bug.
>
> I have just updated my server, it now runs 10.04.2. Does this mean it
> has no issues? It's a non-critical server I access remotely. should I
> care at all, or just wait for things to get "even better"?
>

So the good news about 10.04.2's shutdown/reboot is that now that
upstart can re-exec itself again, further issues should be solvable
without rebooting.

The bad news is there are quite a few updates that need to be done to
protect filesystems on shutdown/reboot:

* umountroot needs to wait for init to re-exec itself - addressed in
this bug itself, James Hunt is working on that at the moment.
* portmap needs to stop after umountnfs - bug #711425
* ssh needs to be restarted on libc6 upgrades - bug #531912
* umountfs needs to wait for all stopping services to stop. bug #616287

> On my laptop I see upstart as upgradeable, but the required version of
> libc6 for that package is not available. Again, should I just wait and
> see, or do you recommend that I upgrade from proposed?
>

Looks like eglibc went into maverick-proposed on 1/21 but has not been
verified yet. See comment #46 from Colin Watson. Looks like the Breaks:
field that was added is actually working to keep peoples' systems from..
well.. breaking. :)

Jimmy it would be great if you could enable proposed for your laptop and
verify that the upgrade works.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.12.1-0ubuntu10.2

---------------
eglibc (2.12.1-0ubuntu10.2) maverick-proposed; urgency=low

  [ Clint Byrum ]
  * do not run 'telinit u' on upgrade, as this will break upstart.
    touch /var/run/init.upgraded instead, which will force a re-exec just
    before remounting root read-only. LP: #672177, LP: #694772.

  [ Matthias Klose ]
  * Call locale-gen --purge when updating from eglibc-2.11.x. LP: #504198.
 -- Matthias Klose <email address hidden> Wed, 19 Jan 2011 03:12:11 +0100

Changed in eglibc (Ubuntu Maverick):
status: Fix Committed → Fix Released
Revision history for this message
Tero Tilus (tero-tilus) wrote :

Would bug #711601 be caused by the Breaks: field mentioned by Clint?

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

I'm glad to see this is moving forward! Thanks Clint, Ingo and the rest of course.

Revision history for this message
ingo (ingo-steiner) wrote :

Hi Clint,

I just came up with an idea for how we could make this bug hunting more systematic and thorough. Up to now we have just found some conditions in which the shutdown process fails (sshd and portmap). Who can tell us that there aren't more daemons which get killed the hard way on shutdown/reboot?

So, for tracing, we could get a patched '/etc/init.d/umountroot' which does the following:

On shutdown, when the messages
    init: Re-executing /sbin/init
    mount: / is busy
are issued, it indicates that / is not yet unmounted. So one could collect the open files (lsof) and write the information to disk. Then wait 5 seconds or so until the filesystem has committed the journal to disk, try the umount again and, if the regular umount fails again, do *not* hard reset, but keep the system in that state (the user can do a cold reboot/shutdown manually).

Upon the next boot-up, the saved process information could be read out and attached to this bug report.

This way it should be possible to also detect other, less commonly used daemons which need their start/stop scripts fixed. Re-installing libc6 appears to be a very sensitive trigger for making these oddities visible.

Revision history for this message
James Hunt (jamesodhunt) wrote :

Attached is a patch to sysvinit (from which the initscripts binary package is generated) which appears to fix the immediate problem.

The patch waits for up to 5 seconds for init (upstart) to re-exec. If after this time init has not re-execed, we continue to unmount as before. (In testing, init re-spawns in under 1 second).

Tested on lucid, maverick and natty in combination with upstart package built from lp:~clint-fewbar/ubuntu/natty/upstart/restore-re-exec-code. This patch also requires the updated eglibc (which creates /var/run/init.upgraded rather than calling "telinit u"), however for testing the patches, simply running "sudo touch /var/run/init.upgraded" will trigger the new code.

As noted by Clint, you may still observe problems if you have certain other services running (such as sshd; see bug #531912).
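
For readers without the attachment, the waiting logic described above can be sketched roughly as follows. This is not the code from the patch: 'wait_for_init_reexec' is a hypothetical name, and detecting the re-exec by checking whether /proc/1/maps still lists '(deleted)' mappings is an assumption about how one might watch for the new init.

```shell
#!/bin/sh
# wait_for_init_reexec [MAPS_FILE] [TIMEOUT]: hypothetical helper, not the patch.
# Polls once per second until MAPS_FILE (default /proc/1/maps) no longer lists
# any "(deleted)" mapping, i.e. init no longer maps replaced libraries.
# Returns 0 when that happens, or 1 after TIMEOUT seconds (default 5).
# A file that cannot be read is treated as "nothing to wait for".
wait_for_init_reexec() {
    maps_file=${1:-/proc/1/maps}
    timeout=${2:-5}
    waited=0
    while grep -qF '(deleted)' "$maps_file" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A caller would ignore a non-zero return and carry on with the remount, matching the "continue to unmount as before" behaviour described above.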

Changed in sysvinit (Ubuntu Lucid):
assignee: nobody → James Hunt (jamesodhunt)
status: New → In Progress
Changed in sysvinit (Ubuntu Natty):
assignee: nobody → James Hunt (jamesodhunt)
status: New → In Progress
Revision history for this message
James Hunt (jamesodhunt) wrote :

Equivalent sysvinit patch for Maverick.

Revision history for this message
James Hunt (jamesodhunt) wrote :

Equivalent sysvinit patch for Natty.

Revision history for this message
Jimmy Merrild Krag (beruic) wrote :

First: Many great thanks to you all for your work.

Now, the updates just got released to Maverick.
How do I verify that it works?

Revision history for this message
ingo (ingo-steiner) wrote :

James,

did you already upload to proposed (so we'll get binaries tomorrow)?
If not, could you be so kind and compile a 'Lucid-amd64 binary' for testing here?

Revision history for this message
ingo (ingo-steiner) wrote :

I just saw it is an addition to 'umountroot', so I could copy and paste from your diff - right?

Revision history for this message
ingo (ingo-steiner) wrote :

@ Jimmy

> How do I verify that it works?

When the system is up and running, issue the following command as root and confirm the install when requested:
   'apt-get install --reinstall libc6 && shutdown -r now'
After the system has rebooted, check for orphaned inodes with this command (works as a normal user):
  'dmesg | grep orphan'
It tells you whether the shutdown was clean (no output) or forced (several orphaned inodes).
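
The check can also be scripted. The sketch below counts the kernel's per-inode cleanup messages. 'count_orphans' is a hypothetical name, and the message text it matches ('deleting unreferenced inode') is taken from the ext3 dmesg output quoted earlier in this report, so other filesystems or kernel versions may word it differently.

```shell
#!/bin/sh
# count_orphans: hypothetical helper. Reads kernel log text on stdin and prints
# how many "deleting unreferenced inode" lines it contains (0 = clean shutdown).
# grep -c exits non-zero when the count is 0, so '|| :' keeps the status at 0.
count_orphans() {
    grep -c 'deleting unreferenced inode' || :
}
```

It would be run as 'dmesg | count_orphans' after the reboot.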

Revision history for this message
ingo (ingo-steiner) wrote :

@ James,

I verified your patch in Lucid-amd64 (by manually inserting your lines of code) and can confirm: *it works*!
(apart from the portmap and rpc.statd issue, which still persists of course).

I used Clint's proposal, inserting these 2 lines after the mount commands:
    /usr/bin/lsof -n | grep DEL
    sleep 15
(remark: lsof must be given with full path to be found).

Now, when portmap is stopped manually beforehand, there is no more "mount: / is busy" (as there was without your patch) and the shutdown is clean! I did many tests with and without your patch, and it appears to be absolutely reliable.

Congratulations!

Hope this patch will be included in 10.04.2.

What is still missing: checking all the daemons which might have a buggy start/stop script.

Revision history for this message
ingo (ingo-steiner) wrote :

Additional remark: I did not observe any noticeable delay of shutdown with the patch.

Revision history for this message
ingo (ingo-steiner) wrote :

Addition (works for me):

    /usr/bin/lsof -n | grep DEL > /lsof.out 2>&1
    sleep 15

greps all the open files/libs - if there are any - and writes the information to disk.
If there are none one just gets an error message that "could not write to ro filesystem".

Revision history for this message
ingo (ingo-steiner) wrote :

I now de-installed portmap+nfs-common and did install openssh-server:

*Only when* I replace the line in /etc/init/ssh.conf by:
   stop on runlevel [!2345]
all is fine with James' patch included!

With the original line
   stop on runlevel S
I get a lot of open files, like this:

sshd 1385 root DEL REG 8,1 185304 /lib/libnss_files-2.11.1.so
sshd 1385 root DEL REG 8,1 185306 /lib/libnss_nis-2.11.1.so
sshd 1385 root DEL REG 8,1 185162 /lib/libnss_compat-2.11.1.so.dpkg-new
sshd 1385 root DEL REG 8,1 185310 /lib/libpthread-2.11.1.so
sshd 1385 root DEL REG 8,1 185369 /lib/libresolv-2.11.1.so
sshd 1385 root DEL REG 8,1 185138 /lib/libdl-2.11.1.so
sshd 1385 root DEL REG 8,1 185156 /lib/libnsl-2.11.1.so
sshd 1385 root DEL REG 8,1 185083 /lib/libc-2.11.1.so
sshd 1385 root DEL REG 8,1 185087 /lib/libcrypt-2.11.1.so
sshd 1385 root DEL REG 8,1 185373 /lib/libutil-2.11.1.so
sshd 1385 root DEL REG 8,1 184782 /lib/ld-2.11.1.so
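
To see at a glance which processes hold deleted files, a listing like the one above can be summarised per command. This is a rough sketch: the helper name count_del_by_command is hypothetical, and it assumes the FD column is the fourth whitespace-separated field, as in the lines shown here.

```shell
# Hypothetical helper: read `lsof -n` output on stdin and print, for each
# command name, how many entries have DEL in the FD column (field 4).
count_del_by_command() {
    awk '$4 == "DEL" { n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}
```

In umountroot it would be fed the same way as the grep above, for example "/usr/bin/lsof -n | count_del_by_command".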

Isn't that a fast interim fix which could be done by "100 papercuts" so that it lands in 10.04.2?
(sshd is one of the most widely used daemons.) That would be great!

Revision history for this message
Colin Watson (cjwatson) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

Bad news there: I'm afraid that we've already had to entirely freeze
updates for 10.04.2 in order that we can get certification done in time;
unfortunately that's a rather time-consuming process and needs a couple
of weeks of clearance. I expect that 10.04.3 won't be a problem,
though.

Revision history for this message
ingo (ingo-steiner) wrote :

> ... freeze updates for 10.04.2 in order that we can get certification done

That means we get certified BUGS - and that 10 months after release!

Revision history for this message
Colin Watson (cjwatson) wrote :

I'm afraid there's no point railing about it here - we're already
committed to the date and it would be a colossal rearrangement of many
people's schedules to change it at this point. As I say, sorry, and we
should be able to get this nailed down for .3.

Revision history for this message
Scott James Remnant (scott) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

At least certified bugs can be documented, with certified workarounds.

On Wed, Feb 2, 2011 at 2:05 PM, ingo <email address hidden> wrote:
>> ... freeze updates for 10.04.2 in order that we can get certification done
>
> That means we get certified BUGS - and that 10 months after release!

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 0.6.7-7

---------------
upstart (0.6.7-7) natty; urgency=low

  * Re-add upstream r977 to allow proper re-exec on shutdown (LP: #672177)
  * debian/control: adding Breaks on eglibc version that disables
    telinit u to avoid accidentally installing a version of libc6 that
    will cause upstart to re-exec and lose its state.
 -- Clint Byrum <email address hidden> Fri, 21 Jan 2011 08:39:13 -0800

Changed in upstart (Ubuntu Natty):
status: In Progress → Fix Released
Revision history for this message
Zippo (peter-henninger) wrote :

Will this bug also be fixed for lucid the LTS version?

Revision history for this message
Paul Crawford (psc-sat) wrote :

Well it is not yet fixed for 10.04 with the 'proposed' updates. Tonight I just rebooted after updates to kernel 2.6.32-29 and guess what? Yes, my syslog contained the following sort of message:

"Feb 15 21:45:24 paul-ubuntu kernel: [ 2.341704] EXT4-fs (sda5): 4 orphan inodes deleted"

So is 'proposed' still part of the 10.04.2 to be released, or will the fix come soon as mentioned for the 10.04.3 CD?

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Tue, 2011-02-15 at 21:51 +0000, Paul Crawford wrote:
> Well it is not yet fixed for 10.04 with the 'proposed' updates. Tonight
> just rebooted after updates to kernel 2.6.32-29 and guess what? Yes, my
> syslog contained the following sort of message:
>
> "Feb 15 21:45:24 paul-ubuntu kernel: [ 2.341704] EXT4-fs (sda5): 4
> orphan inodes deleted"
>
> So is 'proposed' still part of the 10.04.2 to be released, or will the
> fix come soon as mentioned for the 10.04.3 CD?
>

Paul, I'm sorry you're still having issues.

This is all covered in the previous comments, but to summarize:

There is still a pending change to sysvinit to make sure the umounts
wait for upstart to re-exec itself. There are also a couple more bugs
covering daemons that need to be shut down, namely, sshd and portmap.

10.04.3 should have all of these fixes, and they should be available
soon as updates to 10.04.* as well.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

I'm targeting the sysvinit bug task for lucid to 10.04.3, but it should be done well before then.

Changed in sysvinit (Ubuntu Lucid):
importance: Undecided → Critical
milestone: none → ubuntu-10.04.3
Revision history for this message
ingo (ingo-steiner) wrote :

On 16.02.2011 00:25, Clint Byrum wrote:
> There are also a couple more bugs
> covering daemons that need to be shut down, namely, sshd and portmap.

Clint,
the portmap Bug #711425 you filed has not got any attention yet, it's
still undecided and unassigned - does nobody care?

Changed in sysvinit (Ubuntu Natty):
importance: Undecided → High
milestone: none → natty-alpha-3
Revision history for this message
Martin Pitt (pitti) wrote :

This is a bug fix, can happen after FF, and isn't an a3 release blocker.

Changed in sysvinit (Ubuntu Natty):
milestone: natty-alpha-3 → none
Revision history for this message
Colin Watson (cjwatson) wrote :

James' patch appears to have been applied in natty now, so I'm closing this bug task. Please reopen if this was incorrect.

sysvinit (2.87dsf-4ubuntu20) natty; urgency=low

  [ Michael Vogt ]
  * debian/patches/100_fix_ftbfs_enoioctlcmd.patch:
    - cherry pick upstream fix for missing ENOIOCTLCMD, this
      fixes a FTBFS

  [ James Hunt ]
  * debian/initscripts/etc/init.d/umountroot: Improve handling of
    respawn of init: we now wait for inits map file to change. If this
    doesn't happen within 5 seconds, we unmount forcibly.

 -- Michael Vogt <email address hidden> Fri, 04 Mar 2011 10:38:34 +0100

Changed in sysvinit (Ubuntu Natty):
status: In Progress → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

Some nitpicks about the patch:

 - patch ordering in series
 - target is {lucid,maverick}-proposed
 - changelog missing bug references.
 - version 2.87dsf-4ubuntu18 already exists in maverick, can't be used for lucid-updates
 - maverick debdiff doesn't apply, as it wasn't done against -updates.

I corrected these and uploaded to l/m-proposed. Unsubscribing sponsors.

Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Accepted sysvinit into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in sysvinit (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Revision history for this message
Martin Pitt (pitti) wrote :

Accepted sysvinit into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in sysvinit (Ubuntu Maverick):
status: In Progress → Fix Committed
Revision history for this message
Martin Pitt (pitti) wrote :

Any testers? This is blocking another SRU right now.

Revision history for this message
NoOp (glgxg) wrote :

Maverick:
$ dmesg | grep orphan
[ 7.716966] EXT4-fs (sda5): orphan cleanup on readonly fs
[ 7.716978] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899558
[ 7.717070] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899556
[ 7.717087] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899555
[ 7.717104] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899553
[ 7.717119] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899551
[ 7.717135] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899550
[ 7.717153] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899549
[ 7.747616] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899548
[ 7.747639] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899545
[ 7.747655] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899544
[ 7.747673] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899543
[ 7.747687] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899542
[ 7.756193] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899540
[ 7.756209] EXT4-fs (sda5): 13 orphan inodes deleted

$ apt-cache policy sysvinit-utils
sysvinit-utils:
  Installed: 2.87dsf-4ubuntu19.1
  Candidate: 2.87dsf-4ubuntu19.1
  Version table:
 *** 2.87dsf-4ubuntu19.1 0
        500 http://archive.ubuntu.com/ubuntu/ maverick-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     2.87dsf-4ubuntu19 0
        500 http://archive.ubuntu.com/ubuntu/ maverick-updates/main amd64 Packages
     2.87dsf-4ubuntu18 0
        500 http://archive.ubuntu.com/ubuntu/ maverick/main amd64 Packages

I'll test natty next. It will be an hour or so before I can get to lucid.
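
When comparing results across several machines, it can help to record only the installed version from the policy output. A minimal sketch follows; the helper name installed_version is made up for illustration, and it assumes the "Installed:" line format shown above.

```shell
# Hypothetical helper: read `apt-cache policy <package>` output on stdin
# and print only the installed version.
installed_version() {
    awk '$1 == "Installed:" { print $2 }'
}
```

It would be used as "apt-cache policy sysvinit-utils | installed_version".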

Revision history for this message
NoOp (glgxg) wrote :

natty:
$ dmesg | grep orphan
$ apt-cache policy sysvinit-utils
sysvinit-utils:
  Installed: 2.87dsf-4ubuntu23
  Candidate: 2.87dsf-4ubuntu23
  Version table:
 *** 2.87dsf-4ubuntu23 0
        500 http://archive.ubuntu.com/ubuntu/ natty/main amd64 Packages
        100 /var/lib/dpkg/status

Revision history for this message
NoOp (glgxg) wrote :

lucid:
$ dmesg | grep orphan
$ apt-cache policy sysvinit-utils
sysvinit-utils:
  Installed: 2.87dsf-4ubuntu17.1
  Candidate: 2.87dsf-4ubuntu17.1
  Version table:
 *** 2.87dsf-4ubuntu17.1 0
        500 http://archive.ubuntu.com/ubuntu/ lucid-proposed/main Packages
        100 /var/lib/dpkg/status
     2.87dsf-4ubuntu17 0
        500 http://archive.ubuntu.com/ubuntu/ lucid/main Packages

Reran the test on maverick:
$ sudo -i
# apt-get install --reinstall libc6 && shutdown -r now
  on reboot:
$ dmesg | grep orphan
Same results as in comment #108 (EXT4-fs (sda5): 13 orphan inodes deleted). Note: also on the switch from natty (/dev/sda7) back to maverick, maverick (/dev/sda5) ran an fsck on boot before starting gdm.

Revision history for this message
NoOp (glgxg) wrote :

@Martin: checking maverick on another system:

$ dmesg | grep orphan
[ 13.879321] EXT4-fs (sda1): orphan cleanup on readonly fs
[ 13.879356] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 263685
[ 13.879661] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 263665
[ 13.879850] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 263631
[ 13.926435] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 263583
[ 13.926653] EXT4-fs (sda1): 4 orphan inodes deleted

$ apt-cache policy sysvinit-utils
sysvinit-utils:
  Installed: 2.87dsf-4ubuntu19.1
  Candidate: 2.87dsf-4ubuntu19.1
  Version table:
 *** 2.87dsf-4ubuntu19.1 0
        500 http://archive.ubuntu.com/ubuntu/ maverick-proposed/main i386 Packages
        100 /var/lib/dpkg/status
     2.87dsf-4ubuntu19 0
        500 http://archive.ubuntu.com/ubuntu/ maverick-updates/main i386 Packages
     2.87dsf-4ubuntu18 0
        500 http://archive.ubuntu.com/ubuntu/ maverick/main i386 Packages

Let me know if you want anything else checked.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

NoOp, thanks so much for doing these tests!

There are a couple of other bugs that cause this to happen if you have some other services
installed. Some of these are fixed in uploads waiting on this SRU, and some in yet to be
SRU'd fixes (which is why natty seems to always work).

Specifically, if you have portmap installed, that will cause issues. If you do have portmap,
try stopping it before the reboot.

Also it's clear that the shutdown isn't broken by this, so I am marking it verification-done.

tags: added: verification-done
removed: verification-needed
Revision history for this message
NoOp (glgxg) wrote :

Clint, I did have portmap running & stopped prior on this test:
$ sudo service portmap stop
$ sudo service portmap status
portmap stop/waiting
$ sudo -i
# service portmap status
portmap stop/waiting
# apt-get install --reinstall libc6 && shutdown -r now
 on reboot
$ dmesg | grep orphan
again "EXT4-fs (sda5): 13 orphan inodes deleted"

So on the test maverick system (that's the one with EXT4-fs (sda1): 4 orphan inodes deleted from comment #111):
$ sudo apt-get purge portmap
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libnfsidmap2 librpcsecgss3 libgssglue1
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  nfs-common* nfs-kernel-server* portmap*
0 upgraded, 0 newly installed, 3 to remove and 0 not upgraded.
After this operation, 1,245kB disk space will be freed.
Do you want to continue [Y/n]? y
(Reading database ... 159103 files and directories currently installed.)
Removing nfs-kernel-server ...
 * Stopping NFS kernel daemon [ OK ]
 * Unexporting directories for NFS kernel daemon... [ OK ]
Purging configuration files for nfs-kernel-server ...
Removing nfs-common ...
stop: Unknown instance:
stop: Unknown instance:
statd stop/waiting
Purging configuration files for nfs-common ...
dpkg-statoverride: warning: No override present.
dpkg: warning: while removing nfs-common, directory '/var/lib/nfs' not empty so not removed.
Removing portmap ...
portmap stop/waiting
Purging configuration files for portmap ...
Processing triggers for man-db ...
Processing triggers for ureadahead ...
ureadahead will be reprofiled on next reboot

Rebooted & then repeated the test, only this time I also checked "$ dmesg | grep orphan" before running the test:
$ dmesg | grep orphan
$
Now run the test:
$ dmesg | grep orphan
$

Would you like me to reinstall nfs-common* nfs-kernel-server* portmap* and test again?

Revision history for this message
NoOp (glgxg) wrote :

Booted to lucid (the one from comment #110 that showed no errors). Installed portmap & repeated the test on that machine:

$ dmesg | grep orphan
[ 3.676128] EXT4-fs (sdb1): orphan cleanup on readonly fs
[ 3.676159] EXT4-fs (sdb1): ext4_orphan_cleanup: deleting unreferenced inode 193222
[ 3.676396] EXT4-fs (sdb1): 1 orphan inode deleted

So that machine is now showing an orphan.

lucid@lucid-desktop:~$ apt-cache policy portmap
portmap:
  Installed: 6.0.0-1ubuntu2.1
  Candidate: 6.0.0-1ubuntu2.1
  Version table:
 *** 6.0.0-1ubuntu2.1 0
        500 http://archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
        100 /var/lib/dpkg/status
     6.0.0-1ubuntu2 0
        500 http://archive.ubuntu.com/ubuntu/ lucid/main Packages

Revision history for this message
Clint Byrum (clint-fewbar) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

Excerpts from NoOp's message of Thu Apr 14 21:43:19 UTC 2011:
> Rebooted & then repeated the test, only this time I also check "$ dmesg | grep orphan" before running the test:
> $ dmesg | grep orphan
> $
> Now run the test:
> $ dmesg | grep orphan
> $
>
> Would you like me to reinstall nfs-common* nfs-kernel-server* portmap*
> and test again?

No, thanks though. I think we have the info we need. There are already
open bug reports about other things being left running as of shutdown,
this one is specifically addressing init still running, which I think
we can say you've shown it is not.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.87dsf-4ubuntu17.1

---------------
sysvinit (2.87dsf-4ubuntu17.1) lucid-proposed; urgency=low

  * debian/initscripts/etc/init.d/umountroot: Improve handling of
    respawn of init: we now wait for inits map file to change. If this
    doesn't happen within 5 seconds, we unmount forcibly. (LP: #672177)
 -- James Hunt <email address hidden> Fri, 28 Jan 2011 15:33:50 +0000

Changed in sysvinit (Ubuntu Lucid):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.87dsf-4ubuntu19.1

---------------
sysvinit (2.87dsf-4ubuntu19.1) maverick-proposed; urgency=low

  * debian/initscripts/etc/init.d/umountroot: Improve handling of
    respawn of init: we now wait for inits map file to change. If this doesn't
    happen within 5 seconds, we unmount forcibly. (LP: #672177)
 -- James Hunt <email address hidden> Fri, 28 Jan 2011 11:45:35 +0000

Changed in sysvinit (Ubuntu Maverick):
status: Fix Committed → Fix Released
Revision history for this message
blitzter47 (blitzter47) wrote :

I'm new to Ubuntu (10.10) and to Launchpad, and I have this bug. I don't understand how to fix it, given that a fix has been released. Can someone explain it to me, please?

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Excerpts from chtnh's message of Sat Jul 23 04:41:01 UTC 2011:
> I'm new in Ubuntu (10.10) and in Launchpad, and I have this bug. I don't
> understand how you fix it, as there is fix released. Can someone explain
> me, please?
>

There are other bugs that sometimes cause an unclean shutdown, this one
is pretty well understood and fixed. Maybe you have some other services,
like mysql, or portmap, that are causing this problem.

Revision history for this message
NoOp (glgxg) wrote :

@Clint: it's back. On Natty 11.04 I am getting unclean inodes again. I notice a single inode in dmesg after every boot:
$ cat /var/log/dmesg.0 | grep orphan
[ 4.470372] EXT4-fs (sda1): orphan cleanup on readonly fs
[ 4.470443] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 2228229
[ 4.470538] EXT4-fs (sda1): 1 orphan inode deleted

So I ran the test:

$ sudo -i
# apt-get install --reinstall libc6 && shutdown -r now

 on reboot:
$ dmesg | grep orphan
[ 13.408854] EXT4-fs (sda1): orphan cleanup on readonly fs
[ 13.408928] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541072
[ 13.409078] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541058
[ 13.409108] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541055
[ 13.409138] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541048
[ 13.409161] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541045
[ 13.409187] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541038
[ 13.409210] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541032
[ 13.409233] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541029
[ 13.409255] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541028
[ 13.409279] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541012
[ 13.409312] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541007
[ 13.409336] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 540992
[ 13.409361] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 540791
[ 13.438394] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 540772
[ 13.438427] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 2228229
[ 13.438491] EXT4-fs (sda1): 15 orphan inodes deleted

Note: no portmap installed.

$ apt-cache policy sysvinit-utils
sysvinit-utils:
  Installed: 2.87dsf-4ubuntu23.1
  Candidate: 2.87dsf-4ubuntu23.1
  Version table:
 *** 2.87dsf-4ubuntu23.1 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty-updates/main i386 Packages
        100 /var/lib/dpkg/status
     2.87dsf-4ubuntu23 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/main i386 Packages

Linux <> 2.6.38-13-generic #53-Ubuntu SMP Mon Nov 28 19:23:39 UTC 2011 i686 i686 i386 GNU/Linux

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

The best way to try and figure out what is causing this is to modify /etc/init.d/umountroot and right before the umount's, add

/usr/bin/lsof -n > /saved.root.lsof
sync

This will save a listing of all opened files and which processes have them open. Things marked as 'deleted' in this list are generally the problem. If you see libc6.so opened by upstart, then this bug has regressed. Otherwise, it is probably something else.
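
Once /saved.root.lsof exists, the check for a regression of this particular bug can be narrowed to PID 1. The following is only a sketch: the helper name init_deleted_files is hypothetical, and it assumes deleted entries show up either with DEL in the FD column or with a name ending in "(deleted)".

```shell
# Hypothetical helper: given a saved `lsof -n` listing, print the lines
# where PID 1 (init) holds a deleted file. The exit status is 0 if at
# least one such line was found and non-zero otherwise.
init_deleted_files() {
    awk '$2 == 1 && ($4 == "DEL" || $NF == "(deleted)")' "$1" | grep .
}
```

Running "init_deleted_files /saved.root.lsof" after the next boot prints nothing if init held no deleted files; any output points back at this bug rather than at another daemon.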

Revision history for this message
NoOp (glgxg) wrote :

On 12/20/2011 09:47 AM, Clint Byrum wrote:
> The best way to try and figure out what is causing this is to modify
> /etc/init.d/umountroot and right before the umount's, add
>
> /usr/bin/lsof -n > /saved.root.lsof
> sync
>
> This will save a listing of all opened files and which processes have
> them open. Things marked as 'deleted' in this list are generally the
> problem. If you see libc6.so opened by upstart, then this bug has
> regressed. Otherwise, it is probably something else.
>

Thanks. I'll test it later today.

Revision history for this message
NoOp (glgxg) wrote :

Clint, can you give me the exact location for
/usr/bin/lsof -n > /saved.root.lsof
sync
in /etc/init.d/umountroot? So far I have not been successful in creating '/saved.root.lsof'.
Thanks.

Revision history for this message
NoOp (glgxg) wrote :

Nevermind. Got it working.
