libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

Bug #672177 reported by Paul van Berlo
This bug affects 30 people
Affects              Status        Importance  Assigned to         Milestone
upstart              Invalid       Undecided   Unassigned
eglibc (Ubuntu)      Fix Released  Critical    Unassigned
  Lucid              Fix Released  Undecided   Bobby A. Callender
  Maverick           Fix Released  Undecided   Unassigned
  Natty              Fix Released  Critical    Unassigned
sysvinit (Ubuntu)    Fix Released  High        James Hunt
  Lucid              Fix Released  Critical    James Hunt
  Maverick           Fix Released  Undecided   James Hunt
  Natty              Fix Released  High        James Hunt
upstart (Ubuntu)     Fix Released  Critical    Clint Byrum
  Lucid              Fix Released  Undecided   Unassigned
  Maverick           Fix Released  Undecided   Unassigned
  Natty              Fix Released  Critical    Clint Byrum

Bug Description

On a clean install of Ubuntu 10.04.1, after installing the offered libc6 upgrade, the root fs can't be properly unmounted on the next reboot (mount: / is busy). This causes fsck to run on boot and, of course, some minor issues with the filesystem. This might not be a problem with libc6 itself, but a side effect of upgrading in combination with some other package (I suspect the init process, so I guess upstart).

The fsck run, and the orphaned inodes it finds are holding me back from installing this on a new server - especially since this already happens on a clean install of 10.04.1!

paul@ubuntu:~$ lsb_release -rd
Description: Ubuntu 10.04.1 LTS
Release: 10.04

ii libc6 2.11.1-0ubuntu7.2 Embedded GNU C Library: Shared libraries

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: libc6 2.11.1-0ubuntu7.2
ProcVersionSignature: Ubuntu 2.6.32-24.39-server 2.6.32.15+drm33.5
Uname: Linux 2.6.32-24-server x86_64
Architecture: amd64
Date: Sun Nov 7 16:17:07 2010
InstallationMedia: Ubuntu-Server 10.04.1 LTS "Lucid Lynx" - Release amd64 (20100816.2)
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: eglibc

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

Also - this only happens once after upgrading, after that, reboots work just fine without causing mount to fail on trying to unmount a busy root filesystem.

Revision history for this message
Victor Vargas (kamus) wrote :

I have reassigned this issue to libc6 (eglibc) package for you

affects: ubuntu → eglibc (Ubuntu)
Revision history for this message
Paul van Berlo (pvanberlo) wrote :

Thank you. It appears that this is somewhat similar to what happens in a very old bug (188925). Not sure if this is some kind of regression or if this is 'how things are supposed to be'. On reboot lsof only shows init being in use, with some of the libraries (incl. libc6). So right now I can only believe that the update of libc6 is causing this issue. I also tried restarting upstart (telinit u), but that apparently either doesn't work or it does work but doesn't solve the issue.

Revision history for this message
nerdistmonk (nerdistmonk) wrote :

It's affecting me on Ubuntu 10.10 i386. I installed a command-line system, ran a routine update, and behold, it doesn't umount the root. I have seen this bug in action since version 7.04; I don't know why it hasn't been fixed.

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

If this is 'how it should be'/accepted behavior, then something is obviously wrong. Having filesystem issues right after a clean install due to some libc upgrade is not acceptable.

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

It appears Bug #616287 is related to this.

Revision history for this message
ingo (ingo-steiner) wrote :

I just reproduced it and can confirm, see here: https://bugs.launchpad.net/ubuntu/+source/mountall/+bug/616287/comments/50

If further data collection is required, please advise.

Changed in eglibc (Ubuntu):
status: New → Confirmed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Setting Importance to Critical, as this causes FS corruption, and potentially could affect all users.

Changed in eglibc (Ubuntu):
importance: Undecided → High
importance: High → Critical
Revision history for this message
ingo (ingo-steiner) wrote :

I'll preserve the snapshot of my Lucid-VM with update of libc pending. So I am prepared to provide additional information if required and to test the fix - which hopefully will come soon.

Revision history for this message
ingo (ingo-steiner) wrote :

Everybody can reproduce the bug any time in Lucid, Maverick, no matter whether i386 or amd64 by just doing:

apt-get install --reinstall libc6

and reboot. This causes the "orphaned inodes" upon shutdown (verified by examining the filesystem after shutdown).
This bug is Ubuntu-specific; the very same procedure on Debian Squeeze never leads to a corrupted filesystem.

Revision history for this message
ingo (ingo-steiner) wrote :

This even happens in the recovery mode (root shell without network) by just executing:

apt-get install --reinstall libc6 && shutdown -r now

It appears that we got a sports car with the utmost acceleration, but the designer forgot to equip it with proper brakes to stop it without damage. (Just redirecting the fsck error message to /dev/null won't be a fix.)

Revision history for this message
ingo (ingo-steiner) wrote :

just execute on any version of Lucid or Maverick to corrupt filesystem:

apt-get install --reinstall libc6 && shutdown -r now

Changed in upstart (Ubuntu):
status: New → Confirmed
Revision history for this message
ingo (ingo-steiner) wrote :

I double-checked with Debian-Squeeze: all ok there (libc6 2.11.2-7)!

So it is definitely Ubuntu specific and happens in LTS-Lucid and Maverick.

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

Debian doesn't do upstart yet I believe. Since upstart is the only process left which keeps references to the old libc, that's probably causing the issue. There is no apparent and proper way to restart upstart after a libc upgrade.

Revision history for this message
ingo (ingo-steiner) wrote :

If that proves true, Ubuntu revives the historical UNIX slogan (slightly adapted):

"Sure It Corrupts Your Files, But Look How Fast It Boots!"

Maybe that's also the reason why Scott refused to fix https://bugs.launchpad.net/bugs/568594, which causes problems if the /-filesystem is located on a remote drive?

Revision history for this message
Alf Gaida (agaida) wrote :

Along with other issues in a clean install of 10.10, I ran into this libc6-sh**. It's not amusing. When this happens in front of a customer, what will you say about this behavior? "OK, I warned you, but it was your decision not to take Debian"? Add this to the bug ingo filed months ago. A clean solution at the moment is to sell only Debian servers and aptosid workstations, because they are stable.

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

Bug 348346 also talks about how a libc upgrade should restart init to unload old libc to avoid remount root to read only issues. It appears to not work though, and seeing the non-response from the package maintainers (even after being marked critical), I have my doubts this will be fixed anytime soon. Too bad.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Paul, and others, this is being actively looked into. There are a number of bugs in the shutdown process, some of which are mentioned above.

Please also, temper the "corrupts your files" language. Orphaned inodes from already deleted files are not exactly corrupted files. Yes this issue is critical because it is preventing one of the critical steps of shutdown from completing correctly, and forcing a filesystem check on reboot. But as of yet, nobody has shown actual corruption due to this bug (though other similar bugs which prevent unmounting filesystems may in fact cause that).

Revision history for this message
Paul van Berlo (pvanberlo) wrote :

Clint,

I agree up to a certain point:

1) There is no obvious proof that anyone is doing anything with this, as some of the other bugs mentioning this issue are over a year old. I believe you, but some more active overview on what is being done would be appreciated.

2) Although it is not necessarily 'corruption', the fs was still unmounted while busy. Whatever we call it - it's not what is supposed to happen and I'm sure many people get scared away when that happens, especially on a clean install.

Just my two cents...

Revision history for this message
ingo (ingo-steiner) wrote :

I just did another probably dirty experiment, so don't blame me. I just want to report the results:

1. in Maverick I installed 'libc-bin' and 'libc6' (2.11.2-7) from Squeeze and
after reboot I got a corrupted filesystem (8 orphaned inodes) besides other complaints.

2. in Debian-Squeeze I installed 'libc-bin' and 'libc6' (2.11.1-0ubuntu7.6) from Lucid-updates and
after reboot I got a clean filesystem (no orphaned inodes) besides other complaints.

Disregarding the dirty way of using unmatched libc's and the resulting complaints of apt, ...
I conclude that the libc packages of both Ubuntu and Debian are themselves correct.
So the root of the evil has to be something else - I suppose upstart.

My personal opinion is that this basic and critical bug must be fixed first.
As long as this filesystem corruption during shutdown persists, all the work on the other related problems and bugs just tries to cure the symptoms and wastes a lot of effort.

Maybe I am wrong, but this bug though critical is not assigned to anybody yet. I wonder when somebody will feel responsible.

Revision history for this message
ingo (ingo-steiner) wrote :

And here another oddity of Ubuntu:

Squeeze logs fs-checks correctly in /var/log/fsck/*

Lucid (and Maverick): nothing logged there despite the libc6 updates:
ls -l /var/log/fsck
total 8
-rw-r----- 1 root adm 31 2010-08-20 12:55 checkfs
-rw-r----- 1 root adm 31 2010-08-20 12:55 checkroot

why?

Revision history for this message
praseodym (oliver-ehlert) wrote :

I confirm this bug in Maverick 64-bit in a VirtualBox on a Lucid 64-bit host. It doesn't occur on the same system in Maverick 32-bit in VirtualBox, and it doesn't occur on the host system.

Revision history for this message
ingo (ingo-steiner) wrote :

@Element #59 (rare earth):

did you try 'apt-get install --reinstall libc6 && shutdown -r now'?

Revision history for this message
ingo (ingo-steiner) wrote :

I wonder what Canonical tells its customers who purchased commercial support for Ubuntu in case they discover this bug?

Something like:

"sorry, we have done our best to hide such frightening messages from our customers. Unfortunately the nasty community has retrieved and disclosed it."

We have set up plymouth to hide the messages behind a purple splash screen with walking dots and taken provisions, so a normal user won't be able to uninstall plymouth (see Bug #556372). We have switched off the usual logging from fsck runs in /var/log/fsck/checkroot and /var/log/fsck/checkfs (see Bug #568594). We have set up the boot process to perform a filesystem check for / at every boot, regardless of what the user specifies in /etc/fstab column #6 (see Bug #568594).

Ubuntu firmly relies on 'fsck' as a proven and reliable tool to iron out those minor glitches when your filesystem gets corrupted or marked dirty on shutdown. Moreover we always point out: Ubuntu is not Debian!

???

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

So this does in fact seem to be caused completely by init holding libraries open. I patched /etc/init.d/umountfs to save the output of lsof just before unmounting root:

clint@natty-alpha1:~$ grep DEL /lastlsof
init 1 root DEL REG 251,0 399654 /lib/libnss_files-2.12.2.so
init 1 root DEL REG 251,0 399653 /lib/libnss_nis-2.12.2.so
init 1 root DEL REG 251,0 399639 /lib/libnsl-2.12.2.so
init 1 root DEL REG 251,0 399659 /lib/libnss_compat-2.12.2.so.dpkg-new
init 1 root DEL REG 251,0 399660 /lib/libc-2.12.2.so
init 1 root DEL REG 251,0 399629 /lib/librt-2.12.2.so
init 1 root DEL REG 251,0 399662 /lib/libpthread-2.12.2.so
init 1 root DEL REG 251,0 399663 /lib/ld-2.12.2.so
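
A capture like the one above can also be filtered programmatically. The following is only a minimal sketch, assuming the whitespace-separated column layout seen in this output (COMMAND PID USER FD TYPE DEVICE NODE NAME). The function name `deleted_mappings` and the non-init sample line are illustrative assumptions and do not come from lsof or upstart.

```python
def deleted_mappings(lsof_text, pid=1):
    """Return the paths of deleted-but-still-mapped files held by `pid`.

    Assumes the column order seen in the thread for DEL entries:
    COMMAND PID USER FD TYPE DEVICE NODE NAME.
    """
    paths = []
    for line in lsof_text.splitlines():
        # Split into at most 8 fields so a NAME containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        _command, line_pid, _user, fd, _ftype, _device, _node, name = fields
        # "DEL" in the FD column marks a mapped file that has been deleted.
        if fd == "DEL" and line_pid == str(pid):
            paths.append(name)
    return paths


# Two lines taken from the capture above plus one made-up non-init line.
SAMPLE = """\
init 1 root DEL REG 251,0 399660 /lib/libc-2.12.2.so
init 1 root DEL REG 251,0 399663 /lib/ld-2.12.2.so
bash 812 root cwd DIR 251,0 4096 2 /
"""

held_by_init = deleted_mappings(SAMPLE)
```

If the returned list is non-empty for PID 1 just before root is remounted, init is presumably still pinning the old library inodes, which matches the "/ is busy" symptom reported here.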

This is odd, because upstart claims to support 'telinit u' in its man page, but it actually doesn't do anything thanks to this revision in upstart's codebase:

------------------------------------------------------------
revno: 977
committer: Scott James Remnant <email address hidden>
branch nick: upstart
timestamp: Thu 2008-06-05 01:26:10 +0100
message:
  * init/main.c: Also remove SIGTERM handling, we don't re-exec
  properly and this is a dangerous signal to use anyway.
  (term_handler): Drop function.

So, I think glibc is doing its job calling 'telinit u' in the postinst. This is upstart's bug. Setting to Critical in upstart now, and Invalid in eglibc.

This is also a regression of bug #188925 , which was present in hardy, and fixed in intrepid. Tagging regression-release.

Changed in upstart (Ubuntu):
importance: Undecided → Critical
Changed in eglibc (Ubuntu):
status: Confirmed → Invalid
tags: added: regression-release
summary: - libc6 upgrade causes umount to fail on shutdown
+ libc6 upgrade causes umount to fail on shutdown because init cannot be
+ restarted
Changed in upstart (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
Scott James Remnant (scott) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown

On Mon, Dec 27, 2010 at 10:17 PM, Clint Byrum <email address hidden> wrote:

> So this does in fact seem to be caused completely by init holding
> libraries open. I patched /etc/init.d/umountfs to save the output of
> lsof just before unmounting root:
>
Yes, now go read /etc/init.d/umountroot

Changed in upstart:
status: New → Invalid
Changed in upstart (Ubuntu):
status: Triaged → Confirmed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Oops, I'm still learning my way around the shutdown process and, indeed, I missed umountroot.

I went ahead and added another lsof right after the call to telinit u, right before MOUNT_FORCE_OPT=, and one *after* the remount of /.

It doesn't seem to make any difference:

clint@natty-alpha1:~$ grep DEL /lastlsof-umountroot-post
init 1 root DEL REG 251,0 399637 /lib/libnss_files-2.12.2.so
init 1 root DEL REG 251,0 399636 /lib/libnss_nis-2.12.2.so
init 1 root DEL REG 251,0 399630 /lib/libnsl-2.12.2.so
init 1 root DEL REG 251,0 399643 /lib/libnss_compat-2.12.2.so.dpkg-new
init 1 root DEL REG 251,0 399644 /lib/libc-2.12.2.so
init 1 root DEL REG 251,0 390256 /lib/librt-2.12.2.so
init 1 root DEL REG 251,0 399646 /lib/libpthread-2.12.2.so
init 1 root DEL REG 251,0 399647 /lib/ld-2.12.2.so
clint@natty-alpha1:~$ grep DEL /lastlsof-umountroot
init 1 root DEL REG 251,0 399637 /lib/libnss_files-2.12.2.so
init 1 root DEL REG 251,0 399636 /lib/libnss_nis-2.12.2.so
init 1 root DEL REG 251,0 399630 /lib/libnsl-2.12.2.so
init 1 root DEL REG 251,0 399643 /lib/libnss_compat-2.12.2.so.dpkg-new
init 1 root DEL REG 251,0 399644 /lib/libc-2.12.2.so
init 1 root DEL REG 251,0 390256 /lib/librt-2.12.2.so
init 1 root DEL REG 251,0 399646 /lib/libpthread-2.12.2.so
init 1 root DEL REG 251,0 399647 /lib/ld-2.12.2.so

I even tried forcing the telinit u by doing 'touch /var/run/init.upgraded', and verified that it ran by adding -x and a sleep at the end of umountroot.

This makes sense, because telinit u just sends SIGTERM to upstart, which has no handler, as it was removed by revision 977, and doesn't seem to have been added back. Since SIG_DFL signals are not delivered to init, I'm not sure how umountroot's call to 'telinit u' can help in this case.
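
Whether a given init actually has a SIGTERM handler installed can be checked from the `SigCgt` bitmask in `/proc/1/status`, where bit (n - 1) is set when signal n is caught. The following is a minimal sketch under that assumption; the helper names and the two sample status snippets are made up for illustration and are not taken from a real system.

```python
import signal


def caught_signals_mask(status_text):
    """Extract the SigCgt bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("SigCgt:"):
            # The value is a hexadecimal bitmask without a 0x prefix.
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no SigCgt line found")


def catches(status_text, signum):
    """Return True if a handler is installed for signal number `signum`."""
    return bool(caught_signals_mask(status_text) & (1 << (signum - 1)))


# Hypothetical status excerpts: one process catching SIGTERM (15), one not.
WITH_HANDLER = "Name:\tinit\nSigCgt:\t0000000000004000\n"
WITHOUT_HANDLER = "Name:\tinit\nSigCgt:\t0000000000000000\n"
```

On a live system, something like `catches(open("/proc/1/status").read(), signal.SIGTERM)` would report whether the SIGTERM sent by 'telinit u' can have any effect on PID 1.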

Unless I'm missing something in upstart's code (quite likely) I think there may still potentially need to be a change in upstart to support re-executing.

Another option if this can't be pushed into upstart is to mount a special readonly filesystem at boot just for init's libraries. However, it seems that re-execcing would be simpler than trying to get that right.

Revision history for this message
Scott James Remnant (scott) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

On Tue, Dec 28, 2010 at 6:47 AM, Clint Byrum <email address hidden> wrote:

> This makes sense, because telinit u just sends SIGTERM to upstart, which
> has no handler, as it was removed by revision 977, and doesn't seem to
> have been added back. Since SIG_DFL signals are not delivered to init,
> I'm not sure how umountroot's call to 'telinit u' can help in this case.
>
> Unless I'm missing something in upstart's code (quite likely) I think
> there may still potentially need to be a change in upstart to support
> re-executing.
>
Hmm, while that handler was removed Upstream, the code should have
been retained in the Ubuntu package as part of the "telinit u" patch.

The idea is that rather than doing a full state transfer, Upstart just
re-exec's itself and loses all state. That's why we do it as the last
thing before unmounting the root on shutdown, because it then doesn't
matter about the state - there shouldn't be any.

Maybe that part of the patch has been lost?

Scott

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Tue, 2010-12-28 at 14:22 +0000, Scott James Remnant wrote:
> On Tue, Dec 28, 2010 at 6:47 AM, Clint Byrum <email address hidden> wrote:
>
> > This makes sense, because telinit u just sends SIGTERM to upstart, which
> > has no handler, as it was removed by revision 977, and doesn't seem to
> > have been added back. Since SIG_DFL signals are not delivered to init,
> > I'm not sure how umountroot's call to 'telinit u' can help in this case.
> >
> > Unless I'm missing something in upstart's code (quite likely) I think
> > there may still potentially need to be a change in upstart to support
> > re-executing.
> >
> Hmm, while that handler was removed Upstream, the code should have
> been retained in the Ubuntu package as part of the "telinit u" patch.
>
> The idea is that rather than doing a full state transfer, Upstart just
> re-exec's itself and loses all state. That's why we do it as the last
> thing before unmounting the root on shutdown, because it then doesn't
> matter about the state - there shouldn't be any.
>
> Maybe that part of the patch has been lost?
>

Ahh, as I suspected I was missing something in upstart's code.

That code is cleverly hidden within the package diff.

It registers nih_main_term_signal() which calls nih_main_loop_exit(),
which tells init to exit after any currently running operation is over.

So, it would seem init exits on SIGTERM. I did see it actually exit and
end up panicking the kernel once, but only when I sent multiple kills.

init should never exit, so shouldn't we instead be execcing ourselves
again? Will keep digging in, maybe neither are happening as we expect
them to, causing this issue.
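
The difference between exiting and re-executing on SIGTERM can be sketched as follows. This is a Python illustration of the general idea only, not upstart's C code; `make_reexec_handler` and its injectable `exec_fn` parameter are hypothetical names introduced here so the handler can be exercised without really replacing the process.

```python
import os
import sys


def make_reexec_handler(exec_fn=os.execv, path=None, argv=None):
    """Build a signal handler that re-executes the current program.

    `exec_fn` defaults to os.execv but can be swapped out for testing.
    """
    path = path or sys.executable
    argv = list(argv) if argv is not None else [path] + sys.argv

    def handler(signum, frame):
        # execv keeps the PID but loads a fresh process image, so mappings
        # of deleted library files are dropped. All in-memory state is lost.
        exec_fn(path, argv)

    return handler
```

Installing it would look like `signal.signal(signal.SIGTERM, make_reexec_handler())`. Because the state is discarded, this approach only seems safe at a point where no state matters any more, such as the last step before unmounting root.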

Revision history for this message
Scott James Remnant (scott) wrote :

Are you sure? I can't find that code at all - it looks like it's been
lost somehow

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Wed, 2010-12-29 at 00:06 +0000, Scott James Remnant wrote:
> Are you sure? I can't find that code at all - it looks like it's been
> lost somehow

Quite certain. It's a bit tricky to find it in bzr, but this will give
it to you for, say, maverick:

$ bzr branch lp:ubuntu/maverick/upstart maverick
$ cd maverick
$ bzr diff -r tag:upstream-0.6.6..tag:0.6.6-3

apt-get source will wrap it up in the .diff.gz for you as well.

Revision history for this message
Scott James Remnant (scott) wrote :

Yes, I know how to drive bzr and dpkg ;-)

My point is that if you do that diff, the only occurrence of SIGTERM
is in the code for the upstart/udev bridge, that patch doesn't add any
handling to upstart itself.

Scott

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Some day, I'll learn to pay attention to the details, I promise. Sorry for the confusion.

Ok, so I reverse merged r977 back in, which mostly applied cleanly.

I then tested this on a natty VM, and, shock, it worked flawlessly.

I think we may also need to have eglibc's postinst skip the call to telinit u, and instead touch /var/run/init.upgraded. It's probably open to debate whether we need to flag users to reboot, though I'd prefer that we do.
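
The flag-file handshake proposed here can be sketched as below: the package side only records that init's libraries changed, and the shutdown side triggers the re-exec if the flag is present. This is a rough illustration under stated assumptions, not the actual postinst or umountroot code (those are shell scripts); the function names and the injectable `reexec` callback are hypothetical.

```python
import os
import subprocess

FLAG = "/var/run/init.upgraded"  # path named in the thread


def mark_init_upgraded(flag_path=FLAG):
    """Package side: create or update the flag file, like `touch`."""
    with open(flag_path, "a"):
        os.utime(flag_path, None)


def reexec_if_upgraded(flag_path=FLAG, reexec=None):
    """Shutdown side: ask init to re-exec only when the flag exists."""
    if not os.path.exists(flag_path):
        return False
    if reexec is None:
        # Default action mirrors the 'telinit u' call discussed above.
        def reexec():
            subprocess.call(["telinit", "u"])
    reexec()
    return True
```

Deferring the re-exec to shutdown means a running system never loses init's state during a package upgrade; the cost is that the old libraries stay mapped until the next reboot.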

One major problem though, is it seems upstart is FTBFS on natty right now. Without any changes, 0.6.7-3 cannot build, as init/test_conf fails, and utils/test_utmp also fails. Will open a bug report for that post-holiday.

I've pushed up a branch, it would be nice if somebody else could test it (I manually disabled test_conf and test_utmp in their respective Makefile.am's and then re-ran automake before building). I will propose the merge after returning from holiday next week.

Revision history for this message
Scott James Remnant (scott) wrote :

I've no idea why that revision got dropped from the Ubuntu package, I
did a bit of investigation and it seems to vanish about the point we
switched to the auto-importer based packages. My only guess is that
bzr undid the cherry-pick as part of a merge.

Right, eglibc's postinst should definitely not call telinit u and
should touch that upgraded file instead. I thought the code to do that
was definitely in glibc's postinst but again I can't find it in the
history. It does look very much like we lost a chunk of patches
somehow, the bug log shows that they were definitely uploaded!

Definitely open bug reports for test failures - these could be signs
of bigger problems and shouldn't be overridden to make an upload go
through.

Scott

Changed in eglibc (Ubuntu):
status: Invalid → Confirmed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

The test failures that I saw were caused by running the build step as root. I didn't think sbuild did that, but apparently it does. When I build inside a clean chroot as non-root the build seems to work fine.

I will file the merge proposal then and also do the glibc change.

Changed in upstart (Ubuntu):
assignee: nobody → Clint Byrum (clint-fewbar)
Changed in eglibc (Ubuntu):
status: Confirmed → In Progress
Changed in upstart (Ubuntu):
status: Confirmed → In Progress
Changed in eglibc (Ubuntu):
assignee: nobody → Clint Byrum (clint-fewbar)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.12.1-0ubuntu12

---------------
eglibc (2.12.1-0ubuntu12) natty; urgency=low

  * do not run 'telinit u' on upgrade, as this will break upstart.
    touch /var/run/init.upgraded instead, which will force a re-exec just
    before remounting root read-only. LP: #672177, LP: #694772.
 -- Clint Byrum <email address hidden> Mon, 03 Jan 2011 10:17:18 -0800

Changed in eglibc (Ubuntu):
status: In Progress → Fix Released
Changed in eglibc (Ubuntu):
assignee: Clint Byrum (clint-fewbar) → nobody
Revision history for this message
ingo (ingo-steiner) wrote :

Hi Clint,

Thanks for the fast fix, but unfortunately it is only in Natty, which is still in development. This bug was reported for *Lucid*, whose support period will even outlast Natty's. When will that be fixed, and who takes care of it?

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Wed, 2011-01-05 at 21:31 +0000, ingo wrote:
> Hi Clint,
>
> thanks for fast fix - but unfortunately only in Natty which is still in
> development. This bug was reported for *Lucid*, which in terms of
> service will even survive Natty. When will that be fixed and who cares?
>

Hi Ingo. This is the normal procedure for stable release updates.

If we don't fix it first in the development release, then it will
persist into subsequent releases.

We need to fix it on the dev version first, and then the fix will be
backported to maverick and lucid.

I know it's hard to remain patient. I'm hoping to get this fix into lucid
before 10.04.2 is released in February.

Revision history for this message
nerdistmonk (nerdistmonk) wrote :

This is funny because I reported this bug in the Ubuntu forums in 2007, but I was ignored and it was blamed on the user (not surprised).

http://ohioloco.ubuntuforums.org/showthread.php?t=645429
(this is my original post from December 2007 same bug)

Matthias Klose (doko)
Changed in eglibc (Ubuntu Lucid):
status: New → In Progress
Changed in eglibc (Ubuntu Maverick):
status: New → In Progress
Changed in upstart (Ubuntu Lucid):
status: New → Triaged
Changed in upstart (Ubuntu Maverick):
status: New → Triaged
Colin Watson (cjwatson)
Changed in eglibc (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in eglibc (Ubuntu Maverick):
status: In Progress → Fix Committed
Colin Watson (cjwatson)
Changed in upstart (Ubuntu Lucid):
status: Triaged → Fix Committed
Changed in upstart (Ubuntu Maverick):
status: Triaged → Fix Committed
James Hunt (jamesodhunt)
Changed in sysvinit (Ubuntu Maverick):
assignee: nobody → James Hunt (jamesodhunt)
status: New → In Progress
tags: added: verification-done
removed: verification-needed
Changed in eglibc (Ubuntu Lucid):
status: Fix Committed → Fix Released
Changed in upstart (Ubuntu Lucid):
status: Fix Committed → Fix Released
Changed in upstart (Ubuntu Maverick):
status: Fix Committed → Fix Released
Changed in eglibc (Ubuntu Lucid):
assignee: nobody → Bobby A. Callender (bcallender)
Changed in eglibc (Ubuntu Maverick):
status: Fix Committed → Fix Released
James Hunt (jamesodhunt)
Changed in sysvinit (Ubuntu Lucid):
assignee: nobody → James Hunt (jamesodhunt)
status: New → In Progress
Changed in sysvinit (Ubuntu Natty):
assignee: nobody → James Hunt (jamesodhunt)
status: New → In Progress
Revision history for this message
ingo (ingo-steiner) wrote :

James,

did you already upload to proposed (so we'll get binaries tomorrow)?
If not, could you be so kind and compile a 'Lucid-amd64 binary' for testing here?

Revision history for this message
ingo (ingo-steiner) wrote :

I just saw it is an addition to 'umountroot', so I could copy and paste from your diff - right?

Revision history for this message
ingo (ingo-steiner) wrote :

@ Jimmy

> How do I verify that it works?

When the system is up and running, issue the following command as root and confirm the install when requested:
   'apt-get install --reinstall libc6 && shutdown -r now'
After the system has rebooted, check for orphaned inodes with this command (works as a regular user):
  'dmesg | grep orphan'
The output tells you whether the shutdown was clean (no output) or forced (several orphaned inodes).
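
The check above can be wrapped in a small helper. This is only a sketch: `shutdown_state` is a hypothetical name, not something shipped in any package, and it assumes the kernel log mentions "orphan" only when the previous shutdown left orphaned inodes behind.

```shell
# Hypothetical helper: classify the previous shutdown from kernel log text on stdin.
shutdown_state() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, hence the "|| true".
    count=$(grep -c 'orphan' || true)
    if [ "$count" -eq 0 ]; then
        echo clean
    else
        echo unclean
    fi
}

# Usage on a real system (after the reboot):
#   dmesg | shutdown_state
```

It reads from stdin so that it can also be run against a saved log such as /var/log/dmesg.0.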

Revision history for this message
ingo (ingo-steiner) wrote :

@ James,

I verified your patch in Lucid-amd64 (by manually inserting your lines of code) and can confirm: *it works*!
(apart from the portmap and rpc.statd issue, which still persists of course).

I used Clint's proposal, inserting these 2 lines after the mount commands:
    /usr/bin/lsof -n | grep DEL
    sleep 15
(remark: lsof must be given with full path to be found).

Now, when portmap is stopped manually before, there is absolutely no more "mount: / is busy" as without your patch and shutdown is clean! I did many tests with and without your patch - appears to be absolutely reliable.

Congratulations!

Hope this patch will be included in 10.04.2.

What now is missing: checking all the daemons which might have a buggy start/stop script.

Revision history for this message
ingo (ingo-steiner) wrote :

Additional remark: I did not observe any noticeable delay of shutdown with the patch.

Revision history for this message
ingo (ingo-steiner) wrote :

Addition (works for me):

    /usr/bin/lsof -n | grep DEL > /lsof.out 2>&1
    sleep 15

greps all the open files/libs - if there are any - and writes the information to disk.
If there are none one just gets an error message that "could not write to ro filesystem".

Revision history for this message
ingo (ingo-steiner) wrote :

I have now uninstalled portmap+nfs-common and installed openssh-server:

*Only when* I replace the line in /etc/init/ssh.conf by:
   stop on runlevel [!2345]
all is fine with James' patch included!

With the original line
   stop on runlevel S
I get a lot of open files, like this:

sshd 1385 root DEL REG 8,1 185304 /lib/libnss_files-2.11.1.so
sshd 1385 root DEL REG 8,1 185306 /lib/libnss_nis-2.11.1.so
sshd 1385 root DEL REG 8,1 185162 /lib/libnss_compat-2.11.1.so.dpkg-new
sshd 1385 root DEL REG 8,1 185310 /lib/libpthread-2.11.1.so
sshd 1385 root DEL REG 8,1 185369 /lib/libresolv-2.11.1.so
sshd 1385 root DEL REG 8,1 185138 /lib/libdl-2.11.1.so
sshd 1385 root DEL REG 8,1 185156 /lib/libnsl-2.11.1.so
sshd 1385 root DEL REG 8,1 185083 /lib/libc-2.11.1.so
sshd 1385 root DEL REG 8,1 185087 /lib/libcrypt-2.11.1.so
sshd 1385 root DEL REG 8,1 185373 /lib/libutil-2.11.1.so
sshd 1385 root DEL REG 8,1 184782 /lib/ld-2.11.1.so

Isn't that a fast interim fix which can be done by "100 papercuts" so it comes in 10.04.2?
(sshd is one of the most widely used daemons) would be great!

Revision history for this message
Colin Watson (cjwatson) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

Bad news there: I'm afraid that we've already had to entirely freeze
updates for 10.04.2 in order that we can get certification done in time;
unfortunately that's a rather time-consuming process and needs a couple
of weeks of clearance. I expect that 10.04.3 won't be a problem,
though.

Revision history for this message
ingo (ingo-steiner) wrote :

> ... freeze updates for 10.04.2 in order that we can get certification done

That means we get certified BUGS - and that 10 months after release!

Revision history for this message
Colin Watson (cjwatson) wrote :

I'm afraid there's no point railing about it here - we're already
committed to the date and it would be a colossal rearrangement of many
people's schedules to change it at this point. As I say, sorry, and we
should be able to get this nailed down for .3.

Revision history for this message
Scott James Remnant (scott) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

At least certified bugs can be documented, with certified workarounds.

On Wed, Feb 2, 2011 at 2:05 PM, ingo <email address hidden> wrote:
>> ... freeze updates for 10.04.2 in order that we can get certification done
>
> That means we get certified BUGS - and that 10 months after release!

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 0.6.7-7

---------------
upstart (0.6.7-7) natty; urgency=low

  * Re-add upstream r977 to allow proper re-exec on shutdown (LP: #672177)
  * debian/control: adding Breaks on eglibc version that disables
    telinit u to avoid accidentally installing a version of libc6 that
    will cause upstart to re-exec and lose its state.
 -- Clint Byrum <email address hidden> Fri, 21 Jan 2011 08:39:13 -0800

Changed in upstart (Ubuntu Natty):
status: In Progress → Fix Released
Revision history for this message
Zippo (peter-henninger) wrote :

Will this bug also be fixed for lucid the LTS version?

Revision history for this message
Paul Crawford (psc-sat) wrote :

Well, it is not yet fixed for 10.04 with the 'proposed' updates. Tonight I rebooted after updates to kernel 2.6.32-29 and guess what? Yes, my syslog contained the following sort of message:

"Feb 15 21:45:24 paul-ubuntu kernel: [ 2.341704] EXT4-fs (sda5): 4 orphan inodes deleted"

So is 'proposed' still part of the 10.04.2 to be released, or will the fix come soon as mentioned for the 10.04.3 CD?

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

On Tue, 2011-02-15 at 21:51 +0000, Paul Crawford wrote:
> Well it is not yet fixed for 10.04 with the 'proposed' updates. Tonight
> just rebooted after updates to kernel 2.6.32-29 and guess what? Yes, my
> syslog contained the following sort of message:
>
> "Feb 15 21:45:24 paul-ubuntu kernel: [ 2.341704] EXT4-fs (sda5): 4
> orphan inodes deleted"
>
> So is 'proposed' still part of the 10.04.2 to be released, or will the
> fix come soon as mentioned for the 10.04.3 CD?
>

Paul, I'm sorry you're still having issues.

This is all covered in the previous comments, but to summarize:

There is still a pending change to sysvinit to make sure the umounts
wait for upstart to re-exec itself. There are also a couple more bugs
covering daemons that need to be shut down, namely sshd and portmap.

10.04.3 should have all of these fixes, and they should be available
soon as updates to 10.04.* as well.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

I'm targeting the sysvinit bug task for lucid to 10.04.3, but it should be done well before then.

Changed in sysvinit (Ubuntu Lucid):
importance: Undecided → Critical
milestone: none → ubuntu-10.04.3
Revision history for this message
ingo (ingo-steiner) wrote :

On 16.02.2011 00:25, Clint Byrum wrote:
> There are also a couple more bugs
> covering daemons that need to be shut down, namely sshd and portmap.

Clint,
the portmap Bug #711425 you filed has not got any attention yet, it's
still undecided and unassigned - does nobody care?

Changed in sysvinit (Ubuntu Natty):
importance: Undecided → High
milestone: none → natty-alpha-3
Revision history for this message
Martin Pitt (pitti) wrote :

This is a bug fix, can happen after FF, and isn't an a3 release blocker.

Changed in sysvinit (Ubuntu Natty):
milestone: natty-alpha-3 → none
Revision history for this message
Colin Watson (cjwatson) wrote :

James' patch appears to have been applied in natty now, so I'm closing this bug task. Please reopen if this was incorrect.

sysvinit (2.87dsf-4ubuntu20) natty; urgency=low

  [ Michael Vogt ]
  * debian/patches/100_fix_ftbfs_enoioctlcmd.patch:
    - cherry pick upstream fix for missing ENOIOCTLCMD, this
      fixes a FTBFS

  [ James Hunt ]
  * debian/initscripts/etc/init.d/umountroot: Improve handling of
    respawn of init: we now wait for inits map file to change. If this
    doesn't happen within 5 seconds, we unmount forcibly.

 -- Michael Vogt <email address hidden> Fri, 04 Mar 2011 10:38:34 +0100
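
A rough sketch of the wait-then-give-up logic the changelog entry describes. This is not the actual umountroot patch: `wait_for_change` is a hypothetical helper, and the idea that init's map file is /proc/1/maps is an assumption that this thread does not spell out.

```shell
# Hypothetical helper: wait until the content of file $1 changes,
# giving up after $2 seconds.
# Returns 0 if a change was seen, 1 on timeout.
wait_for_change() {
    file=$1
    timeout=$2
    before=$(cksum < "$file")
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Compare the current checksum with the one taken at the start.
        if [ "$(cksum < "$file")" != "$before" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Possible use in a shutdown script (path is an assumption, see above):
#   wait_for_change /proc/1/maps 5 || echo "init did not re-exec in time"
```

Polling once per second keeps the sketch portable to a plain POSIX shell; the 5 second limit is the one named in the changelog.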

Changed in sysvinit (Ubuntu Natty):
status: In Progress → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

Some nitpicks about the patch:

 - patch ordering in series
 - target is {lucid,maverick}-proposed
 - changelog missing bug references.
 - version 2.87dsf-4ubuntu18 already exists in maverick, can't be used for lucid-updates
 - maverick debdiff doesn't apply, as it wasn't done against -updates.

I corrected these and uploaded to l/m-proposed. Unsubscribing sponsors.

Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Accepted sysvinit into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in sysvinit (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Revision history for this message
Martin Pitt (pitti) wrote :

Accepted sysvinit into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in sysvinit (Ubuntu Maverick):
status: In Progress → Fix Committed
Revision history for this message
Martin Pitt (pitti) wrote :

Any testers? This is blocking another SRU right now.

Revision history for this message
NoOp (glgxg) wrote :

Maverick:
$ dmesg | grep orphan
[ 7.716966] EXT4-fs (sda5): orphan cleanup on readonly fs
[ 7.716978] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899558
[ 7.717070] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899556
[ 7.717087] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899555
[ 7.717104] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899553
[ 7.717119] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899551
[ 7.717135] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899550
[ 7.717153] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899549
[ 7.747616] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899548
[ 7.747639] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899545
[ 7.747655] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899544
[ 7.747673] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899543
[ 7.747687] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899542
[ 7.756193] EXT4-fs (sda5): ext4_orphan_cleanup: deleting unreferenced inode 5899540
[ 7.756209] EXT4-fs (sda5): 13 orphan inodes deleted

$ apt-cache policy sysvinit-utils
sysvinit-utils:
  Installed: 2.87dsf-4ubuntu19.1
  Candidate: 2.87dsf-4ubuntu19.1
  Version table:
 *** 2.87dsf-4ubuntu19.1 0
        500 http://archive.ubuntu.com/ubuntu/ maverick-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     2.87dsf-4ubuntu19 0
        500 http://archive.ubuntu.com/ubuntu/ maverick-updates/main amd64 Packages
     2.87dsf-4ubuntu18 0
        500 http://archive.ubuntu.com/ubuntu/ maverick/main amd64 Packages

I'll test natty next. It will be an hour or so before I can get to lucid.

Revision history for this message
NoOp (glgxg) wrote :

natty:
$ dmesg | grep orphan
$ apt-cache policy sysvinit-utils
sysvinit-utils:
  Installed: 2.87dsf-4ubuntu23
  Candidate: 2.87dsf-4ubuntu23
  Version table:
 *** 2.87dsf-4ubuntu23 0
        500 http://archive.ubuntu.com/ubuntu/ natty/main amd64 Packages
        100 /var/lib/dpkg/status

Revision history for this message
NoOp (glgxg) wrote :

lucid:
$ dmesg | grep orphan
$ apt-cache policy sysvinit-utils
sysvinit-utils:
  Installed: 2.87dsf-4ubuntu17.1
  Candidate: 2.87dsf-4ubuntu17.1
  Version table:
 *** 2.87dsf-4ubuntu17.1 0
        500 http://archive.ubuntu.com/ubuntu/ lucid-proposed/main Packages
        100 /var/lib/dpkg/status
     2.87dsf-4ubuntu17 0
        500 http://archive.ubuntu.com/ubuntu/ lucid/main Packages

Reran the test on maverick:
$ sudo -i
# apt-get install --reinstall libc6 && shutdown -r now
  on reboot:
$ dmesg | grep orphan
Same results as in comment #108 (EXT4-fs (sda5): 13 orphan inodes deleted). Note: also on the switch from natty (/dev/sda7) back to maverick, maverick (/dev/sda5) ran an fsck on boot before starting gdm.

Revision history for this message
NoOp (glgxg) wrote :

@Martin: checking maverick on another system:

$ dmesg | grep orphan
[ 13.879321] EXT4-fs (sda1): orphan cleanup on readonly fs
[ 13.879356] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 263685
[ 13.879661] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 263665
[ 13.879850] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 263631
[ 13.926435] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 263583
[ 13.926653] EXT4-fs (sda1): 4 orphan inodes deleted

$ apt-cache policy sysvinit-utils
sysvinit-utils:
  Installed: 2.87dsf-4ubuntu19.1
  Candidate: 2.87dsf-4ubuntu19.1
  Version table:
 *** 2.87dsf-4ubuntu19.1 0
        500 http://archive.ubuntu.com/ubuntu/ maverick-proposed/main i386 Packages
        100 /var/lib/dpkg/status
     2.87dsf-4ubuntu19 0
        500 http://archive.ubuntu.com/ubuntu/ maverick-updates/main i386 Packages
     2.87dsf-4ubuntu18 0
        500 http://archive.ubuntu.com/ubuntu/ maverick/main i386 Packages

Let me know if you want anything else checked.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

NoOp, thanks so much for doing these tests!

There are a couple of other bugs that cause this to happen if you have some other services
installed. Some of these are fixed in uploads waiting on this SRU, and some in yet to be
SRU'd fixes (which is why natty seems to always work).

Specifically, if you have portmap installed, that will cause issues. If you do have portmap,
try stopping it before the reboot.

Also it's clear that the shutdown isn't broken by this, so I am marking it verification-done.

tags: added: verification-done
removed: verification-needed
Revision history for this message
NoOp (glgxg) wrote :

Clint, I did have portmap running & stopped prior on this test:
$ sudo service portmap stop
$ sudo service portmap status
portmap stop/waiting
$ sudo -i
# service portmap status
portmap stop/waiting
# apt-get install --reinstall libc6 && shutdown -r now
 on reboot
$ dmesg | grep orphan
again "EXT4-fs (sda5): 13 orphan inodes deleted"

So on the test maverick system (that's the one with EXT4-fs (sda1): 4 orphan inodes deleted from comment #111):
$ sudo apt-get purge portmap
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libnfsidmap2 librpcsecgss3 libgssglue1
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  nfs-common* nfs-kernel-server* portmap*
0 upgraded, 0 newly installed, 3 to remove and 0 not upgraded.
After this operation, 1,245kB disk space will be freed.
Do you want to continue [Y/n]? y
(Reading database ... 159103 files and directories currently installed.)
Removing nfs-kernel-server ...
 * Stopping NFS kernel daemon [ OK ]
 * Unexporting directories for NFS kernel daemon... [ OK ]
Purging configuration files for nfs-kernel-server ...
Removing nfs-common ...
stop: Unknown instance:
stop: Unknown instance:
statd stop/waiting
Purging configuration files for nfs-common ...
dpkg-statoverride: warning: No override present.
dpkg: warning: while removing nfs-common, directory '/var/lib/nfs' not empty so not removed.
Removing portmap ...
portmap stop/waiting
Purging configuration files for portmap ...
Processing triggers for man-db ...
Processing triggers for ureadahead ...
ureadahead will be reprofiled on next reboot

Rebooted & then repeated the test, only this time I also checked "$ dmesg | grep orphan" before running the test:
$ dmesg | grep orphan
$
Now run the test:
$ dmesg | grep orphan
$

Would you like me to reinstall nfs-common* nfs-kernel-server* portmap* and test again?

Revision history for this message
NoOp (glgxg) wrote :

Booted to lucid (the one from comment #110 that showed no errors). Installed portmap & repeated the test on that machine:

$ dmesg | grep orphan
[ 3.676128] EXT4-fs (sdb1): orphan cleanup on readonly fs
[ 3.676159] EXT4-fs (sdb1): ext4_orphan_cleanup: deleting unreferenced inode 193222
[ 3.676396] EXT4-fs (sdb1): 1 orphan inode deleted

So that machine is now showing an orphan.

lucid@lucid-desktop:~$ apt-cache policy portmap
portmap:
  Installed: 6.0.0-1ubuntu2.1
  Candidate: 6.0.0-1ubuntu2.1
  Version table:
 *** 6.0.0-1ubuntu2.1 0
        500 http://archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
        100 /var/lib/dpkg/status
     6.0.0-1ubuntu2 0
        500 http://archive.ubuntu.com/ubuntu/ lucid/main Packages

Revision history for this message
Clint Byrum (clint-fewbar) wrote : Re: [Bug 672177] Re: libc6 upgrade causes umount to fail on shutdown because init cannot be restarted

Excerpts from NoOp's message of Thu Apr 14 21:43:19 UTC 2011:
> Rebooted & then repeated the test, only this time I also check "$ dmesg | grep orphan" before running the test:
> $ dmesg | grep orphan
> $
> Now run the test:
> $ dmesg | grep orphan
> $
>
> Would you like me to reinstall nfs-common* nfs-kernel-server* portmap*
> and test again?

No, thanks though. I think we have the info we need. There are already
open bug reports about other things being left running as of shutdown,
this one is specifically addressing init still running, which I think
we can say you've shown it's not.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.87dsf-4ubuntu17.1

---------------
sysvinit (2.87dsf-4ubuntu17.1) lucid-proposed; urgency=low

  * debian/initscripts/etc/init.d/umountroot: Improve handling of
    respawn of init: we now wait for inits map file to change. If this
    doesn't happen within 5 seconds, we unmount forcibly. (LP: #672177)
 -- James Hunt <email address hidden> Fri, 28 Jan 2011 15:33:50 +0000

Changed in sysvinit (Ubuntu Lucid):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.87dsf-4ubuntu19.1

---------------
sysvinit (2.87dsf-4ubuntu19.1) maverick-proposed; urgency=low

  * debian/initscripts/etc/init.d/umountroot: Improve handling of
    respawn of init: we now wait for inits map file to change. If this doesn't
    happen within 5 seconds, we unmount forcibly. (LP: #672177)
 -- James Hunt <email address hidden> Fri, 28 Jan 2011 11:45:35 +0000

Changed in sysvinit (Ubuntu Maverick):
status: Fix Committed → Fix Released
Revision history for this message
blitzter47 (blitzter47) wrote :

I'm new to Ubuntu (10.10) and to Launchpad, and I have this bug. I don't understand how to fix it, given that it says a fix was released. Can someone explain it to me, please?

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Excerpts from chtnh's message of Sat Jul 23 04:41:01 UTC 2011:
> I'm new in Ubuntu (10.10) and in Launchpad, and I have this bug. I don't
> understand how you fix it, as there is fix released. Can someone explain
> me, please?
>

There are other bugs that sometimes cause an unclean shutdown, this one
is pretty well understood and fixed. Maybe you have some other services,
like mysql, or portmap, that are causing this problem.

Revision history for this message
NoOp (glgxg) wrote :

@Clint: it's back. On Natty 11.04 I am getting unclean inodes again. I notice a single inode in dmesg after every boot:
$ cat /var/log/dmesg.0 | grep orphan
[ 4.470372] EXT4-fs (sda1): orphan cleanup on readonly fs
[ 4.470443] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 2228229
[ 4.470538] EXT4-fs (sda1): 1 orphan inode deleted

So I ran the test:

$ sudo -i
# apt-get install --reinstall libc6 && shutdown -r now

 on reboot:
$ dmesg | grep orphan
[ 13.408854] EXT4-fs (sda1): orphan cleanup on readonly fs
[ 13.408928] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541072
[ 13.409078] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541058
[ 13.409108] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541055
[ 13.409138] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541048
[ 13.409161] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541045
[ 13.409187] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541038
[ 13.409210] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541032
[ 13.409233] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541029
[ 13.409255] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541028
[ 13.409279] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541012
[ 13.409312] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 541007
[ 13.409336] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 540992
[ 13.409361] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 540791
[ 13.438394] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 540772
[ 13.438427] EXT4-fs (sda1): ext4_orphan_cleanup: deleting unreferenced inode 2228229
[ 13.438491] EXT4-fs (sda1): 15 orphan inodes deleted

Note: no portmap installed.

$ apt-cache policy sysvinit-utils
sysvinit-utils:
  Installed: 2.87dsf-4ubuntu23.1
  Candidate: 2.87dsf-4ubuntu23.1
  Version table:
 *** 2.87dsf-4ubuntu23.1 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty-updates/main i386 Packages
        100 /var/lib/dpkg/status
     2.87dsf-4ubuntu23 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/main i386 Packages

Linux <> 2.6.38-13-generic #53-Ubuntu SMP Mon Nov 28 19:23:39 UTC 2011 i686 i686 i386 GNU/Linux

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

The best way to try to figure out what is causing this is to modify /etc/init.d/umountroot and, right before the umount commands, add

/usr/bin/lsof -n > /saved.root.lsof
sync

This will save a listing of all opened files and which processes have them open. Things marked as 'deleted' in this list are generally the problem. If you see libc6.so opened by upstart, then this bug has regressed. Otherwise, it is probably something else.
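
To make the saved listing easier to read, something like the following could be used. It is a sketch only: `deleted_holders` is a hypothetical name, and it assumes the column layout seen in the sshd listing earlier in this thread (command in column 1, FD in column 4, where DEL marks a mapped file that has been deleted).

```shell
# Hypothetical helper: read `lsof -n` output on stdin and print each command
# that still maps or holds a deleted file, once per command.
deleted_holders() {
    # Column 4 is the FD column; lsof may also append "(deleted)" to the
    # name of an open file that has been unlinked.
    awk '$4 == "DEL" || /\(deleted\)/ { print $1 }' | sort -u
}

# Usage after the reboot, against the file written by umountroot:
#   deleted_holders < /saved.root.lsof
```

If init shows up in the result, that points at this bug; any other name points at a daemon that was not stopped before the unmount.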

Revision history for this message
NoOp (glgxg) wrote :

On 12/20/2011 09:47 AM, Clint Byrum wrote:
> The best way to try and figure out what is causing this is to modify
> /etc/init.d/umountroot and, right before the umount commands, add
>
> /usr/bin/lsof -n > /saved.root.lsof
> sync
>
> This will save a listing of all opened files and which processes have
> them open. Things marked as 'deleted' in this list are generally the
> problem. If you see libc6.so opened by upstart, then this bug has
> regressed. Otherwise, it is probably something else.
>

Thanks. I'll test it later today.

Revision history for this message
NoOp (glgxg) wrote :

Clint, can you give me the exact location for
/usr/bin/lsof -n > /saved.root.lsof
sync
in /etc/init.d/umountroot? So far I have not been successful in creating '/saved.root.lsof'.
Thanks.

Revision history for this message
NoOp (glgxg) wrote :

Never mind. Got it working.
