In Quantal, the root filesystem is not cleanly unmounted at shutdown or reboot

Bug #1058987 reported by the-unconventional
This bug affects 18 people
Affects: network-manager (Ubuntu)
Status: Incomplete
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Ever since some update in the Quantal pre-releases, having dnsmasq-base installed will cause the root filesystem to not be properly unmounted on shutdown and reboot. In case the Plymouth splash screen is disabled, the message 'mount: / is busy' will be shown, but otherwise the user will not even be aware of this problem.

After rebooting, the root filesystem needs recovery, as shown in dmesg:

kevin@vbox-xubuntu-quantal:~$ dmesg | grep EXT4
[ 1.022746] EXT4-fs (sda2): INFO: recovery required on readonly filesystem
[ 1.022750] EXT4-fs (sda2): write access will be enabled during recovery
[ 1.248294] EXT4-fs (sda2): recovery complete
[ 1.248661] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[ 1.456315] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
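
Since the symptom is otherwise invisible, a small check over the kernel log can flag it after each boot. The following is only a sketch: the function name `ext4_recovered_devices` is made up for illustration, and it assumes the kernel message has the same wording as in the dmesg output above.

```shell
#!/bin/sh
# Sketch: print each ext4 device that needed journal recovery, given
# kernel log text on stdin. The function name is hypothetical.
ext4_recovered_devices() {
    # Matches lines such as:
    #   [ 1.022746] EXT4-fs (sda2): INFO: recovery required on readonly filesystem
    # and prints only the device name found inside the parentheses.
    sed -n 's/.*EXT4-fs (\([^)]*\)): INFO: recovery required.*/\1/p' | sort -u
}
```

Usage would be `dmesg | ext4_recovered_devices`; empty output means no recovery message was found in the current kernel log.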

Yet again, the user will not be aware of this until it is too late. The only way I've found to prevent this is running 'sudo apt-get purge dnsmasq-base'. Sadly, this also removes network-manager and network-manager-gnome, so it isn't really a viable solution.

Another problem that might be related: having an active Network Manager connection when shutting down or rebooting causes the process to hang for a few seconds, after which the message about / being busy is shown. Stopping the network service (sudo service networking stop) solves the hang, but not the unclean unmount. So far, only purging dnsmasq-base fixes the unmount, which obviously also solves the other problem, as Network Manager is then removed as well.

Although I haven't experienced it yet, this could cause data loss, especially for users without a separate /home partition.

ProblemType: Bug
ApportVersion: 2.5.3-0ubuntu1
Architecture: amd64
Date: Sun Sep 30 12:49:24 2012
DistroRelease: Ubuntu 12.10
Package: dnsmasq-base 2.63-1ubuntu1
PackageArchitecture: amd64
ProcVersionSignature: Ubuntu 3.5.0-16.25-generic 3.5.4
SourcePackage: dnsmasq
Tags: quantal
Uname: Linux 3.5.0-16-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
the-unconventional (the-unconventional-deactivatedaccount-deactivatedaccount) wrote :

I've tested some more tonight, and I can reproduce this on everything I used. Both Ubuntu 12.10 and Xubuntu 12.10 stock, unmodified, clean installs show this behaviour; both in VirtualBox and on my testing machine. Installing all the updates makes no difference.
Also, doing a fully up-to-date netinstall works fine until I install network-manager-gnome (which pulls in dnsmasq-base). That's when this bug starts to show up.

dnsmasq-base causes the 'mount: / is busy' message, which indicates an unclean unmount of the root filesystem (confirmed by fsck on the next boot), and network-manager seems to cause the long delay before actually shutting down. Obviously, purging dnsmasq-base solves both, as this will also remove network-manager.

tags: added: busy is mount root
summary: - dnsmasq-base causes the root filesystem to not cleanly unmount at
- shutdown or reboot
+ In Quantal, the root filesystem is not cleanly unmounted at shutdown or
+ reboot
Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

I didn't try to reproduce yet, but I definitely believe there might be a problem; so I'll look into it.

It's probably my fault too ;)

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Needs looking at nm-dns-dnsmasq.c to make sure the daemon is properly shut down when NM stops, which might have been broken by the dbus patch.

Changed in dnsmasq (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
affects: dnsmasq (Ubuntu) → network-manager (Ubuntu)
Changed in network-manager (Ubuntu):
assignee: Mathieu Trudel-Lapierre (mathieu-tl) → nobody
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
milestone: none → ubuntu-12.10
Revision history for this message
Scott Moser (smoser) wrote :

Possibly dupe, probably related https://bugs.launchpad.net/ubuntu/+source/dbus/+bug/1058517 .
There, my experience in a cloud-image was dbus update caused the issue.

Revision history for this message
the-unconventional (the-unconventional-deactivatedaccount-deactivatedaccount) wrote :

Just a side note: when I only purge network-manager and network-manager-gnome, but leave dnsmasq-base installed, the slow shutdown bug is solved, but the unmount problem persists. So I'm not completely sure if it's entirely caused by network-manager, as it also seems to happen when network-manager is not installed at all. If it's actually caused by dbus, it still makes sense though.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Bug 740390 was fixed (which should cover the issues with unmounting). Can you confirm whether that properly corrects most of the issue?

Changed in network-manager (Ubuntu):
assignee: Mathieu Trudel-Lapierre (mathieu-tl) → nobody
status: In Progress → Incomplete
importance: High → Medium
Revision history for this message
the-unconventional (the-unconventional-deactivatedaccount-deactivatedaccount) wrote :

I've just installed the updates; and it doesn't make any difference for me. I also tried reinstalling network-manager, network-manager-gnome and dnsmasq-base, but still nothing. I'll try a clean install in VirtualBox tonight.

Revision history for this message
the-unconventional (the-unconventional-deactivatedaccount-deactivatedaccount) wrote :

Tested this in VirtualBox:

Custom Xubuntu 12.10 installation (netinst with Xfce packages) - fully updated: not fixed
Stock Ubuntu 12.10 installation (installed with ubiquity, unmodified) - fully updated: not fixed
Stock Xubuntu 12.10 installation (installed with ubiquity, unmodified) - fully updated: not fixed
Fresh custom Xubuntu 12.10 installation (latest netinst with Xfce packages) - fully updated (while installing): not fixed

Tested this on my testing machine:

Custom Xubuntu 12.10 installation (netinst with Xfce packages) - fully updated: not fixed
Stock Xubuntu 12.10 installation (installed with ubiquity, unmodified) - fully updated: not fixed
Fresh custom Xubuntu 12.10 installation (latest netinst with Xfce packages) - fully updated (while installing): not fixed

In each and every case, purging dnsmasq-base solves all issues. Purging only network-manager only solves the long shutdown time, but not the unmounting issue.

Am I really the only one noticing the huge increase in shutdown time compared to 12.04, and hasn't anyone else seen fsck show up in every dmesg log?

Revision history for this message
the-unconventional (the-unconventional-deactivatedaccount-deactivatedaccount) wrote :

After tonight's dbus updates, still no change.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Right; turns out this is more likely all caused by bug 1061639. Network-manager stops on stopping dbus; so of course it will never get to stopping if dbus doesn't get its own stop condition.

Marking this bug as a duplicate of 1061639.

Revision history for this message
Christian Niemeyer (christian-niemeyer) wrote :

I'm not sure whether this isn't perhaps a bigger issue. This is definitely NOT FIXED. But most users won't notice it (until it's too late and the filesystem becomes more and more inconsistent). Comment posted on: https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1061639

Sorry for the additional comment here. But I really want to raise awareness of this issue, because, at least for my installations, it is so critically bad.

It has something to do with services (init scripts), /etc/init.d/, and networking. But it's not only dnsmasq, because I added an additional "killall -9 dnsmasqd; sync; sync; sleep 3" in /etc/init.d/umountfs, and it says "no process: dnsmasq", and it makes no difference. The filesystem never gets unmounted cleanly.
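
Rather than killing one suspect by name, it may help to list every process that still has a file open for writing on the root filesystem just before it is remounted read-only. The sketch below filters `lsof /` output; the function name `writers_on_root` is hypothetical, and it assumes the default lsof column layout, where the fourth column (FD) is a descriptor number followed by `w` (write) or `u` (read and write).

```shell
#!/bin/sh
# Sketch: from `lsof /` output on stdin, print "COMMAND PID" for every
# process holding a descriptor open for writing. Hypothetical helper name.
writers_on_root() {
    # NR > 1 skips the header line. Entries such as cwd, txt and mem do not
    # start with a digit, so only numbered descriptors are considered.
    awk 'NR > 1 && $4 ~ /^[0-9]+[wu]/ { print $1, $2 }' | sort -u
}
```

Usage would be `sudo lsof / | writers_on_root`, for example from a line added temporarily to /etc/init.d/umountfs.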

Thanks and happy bug fixing!

Revision history for this message
the-unconventional (the-unconventional-deactivatedaccount-deactivatedaccount) wrote :

I agree with Christian. Even though the unmount issues don't seem to happen on my testing machine anymore, shutting down still "feels" weird. Just shut down a 12.04 machine and compare it to a 12.10 machine. You'll just know that something is wrong. It hangs, it takes a long time, and for some people, it still causes harmful issues. And it's almost release time.

Add this to the Amazon mess, and Ubuntu will have the worst publicity in years; if not ever.

Revision history for this message
Christian Niemeyer (christian-niemeyer) wrote :

FINALLY worked! :) (for the first time ever with quantal!)

I spent about 2 hours shutting down manually: stopping and starting init scripts, networking, etc. This is the only way I found that works, and it had to be done in exactly this order. The downside: I have not yet figured out exactly what the problem is. I think many of the commands are not mandatory, and the next step would be to sort out which ones are.

I wrote a pseudo log file. Out of my bash history. I'm not an expert, but I really hope you get the idea:

(NOTE: DONE BEFORE: sudo apt-get remove --purge modemmanager
COMMENT: "TESTED IT WITH REBOOT, DIDN'T DO ANYTHING GOOD OR BAD ITSELF")

BOOT, FSCK ERROR MESSAGE (EXIT 1, ERRORS RESOLVED), BOOT CONTINUING...

LIGHTDM

CTRL+ALT+F2

LOGIN TERMINAL

sudo service lightdm stop

sudo service dbus stop
COMMENT: "THAT DID HANG FOR A WHILE, APPROX 5 SECONDS / THE SAME IF YOU STOP NETWORKING BEFORE DBUS, IT HANGS, KILLS DBUS, -- THIS ORDER IS BETTER"

sudo service dbus start

sudo service dbus stop
COMMENT: "NOW BOTH STARTS AND STOPS VERY FAST AND CLEAN"

sudo service networking stop
COMMENT: "NETWORKING STOPPED FAST AND CLEANLY"

sudo service networking start
COMMENT: "THEORY: FOR A CLEAN SHUTDOWN WE HAVE TO START/STOP YET AGAIN"

sudo service networking stop

sudo service rsyslog stop

sudo modprobe -r forcedeth bnep rfcomm bluetooth
COMMENT: "SOMETIMES THE FORCEDETH MODULE CAUSES TROUBLE FOR EVERYONE. WHAT ABOUT FORCEDETH? I ALSO GOT SOME MESSAGES ABOUT THE BLUETOOTH MODULE (HARDWARE I DON'T HAVE, BTW), SO I'M UNLOADING IT FOR A CLEAN STATE. DOING ONLY THIS DIDN'T HELP, THOUGH"

sync

sudo init 1
COMMENT: "TAKES A WHILE, AND KILLING ALL REMAINING PROCESSES SAYS IT FAILED"

COMMENT: "NOW WE'RE ROOT, SO NO SUDO. BUT I ADD IT FOR NO CONFUSION"

sudo service udev stop
COMMENT: "UDEV IS STILL THERE AND STILL RUNNING (I THINK THAT'S CORRECT). IT STOPS FAST AND CLEAN."

sudo /etc/init.d/networking start
COMMENT: "AGAIN STARTING, BUT ONLY INIT.D, NOT USING "SERVICE". IT SEEMS THAT STARTING/STOPPING THOSE SCRIPTS SOLVES OUR PROBLEM"

COMMENT: "...AND NOW STOPPING POSSIBLY ALL OF THEM"

sudo /etc/init.d/networking stop

sudo /etc/init.d/network-manager stop

sudo /etc/init.d/network-interface-security stop

sudo /etc/init.d/network-interface-container stop

sudo /etc/init.d/network-interface stop

COMMENT: "ALL STOPS FAST AND CLEAN, PROMPTS"

COMMENT: "NEARLY DONE! NOW WE MAKE SURE THAT FSCK IS FORCED TO RUN. IF THERE ARE ERRORS IT WILL SAY 'WAS NOT CLEANLY UNMOUNTED', WITH RETURN/EXIT VALUE [1] PRINTED ON SCREEN. IF THERE IS NO ERROR IT WILL SAY 'WAS MOUNTED X TIMES, CHECK FORCED', WITH RETURN VALUE 0, NO EXIT/ERROR MESSAGE PRINTED, CLEAN, CONTINUING BOOT"

sudo tune2fs -c 1 /dev/sda1 #max mount counts to 1

sudo tune2fs -C 100 /dev/sda1 #make believe, it was mounted 100 times already, will trigger fsck on reboot

COMMENT: "DOING TESTING"

sudo lsof /

sudo fuser /

COMMENT: "AHA! LSOF STILL SHOW 2 (TWO!) PROCESSES NTPD RUNNING. BETTER KILL THEM. HOWEVER, I DON'T THINK THEY ARE THE PROBLEM. BECAUSE IF I ONLY KILL THEM, WITHOUT ALL THE PROCEDURE EXACTLY(!) IN ORDER ABOVE, THEN IT DOESN'T HELP"

sudo killall -15 ntpd

COMMENT: "CHECKING"

sudo ps -e

sudo lsmod

COMMENT: "LOOKS GOOD!"

sudo sync; sync; sync; sudo init 6

***REBOOOT***

....AND...

...
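
The forced-check setup in the log above (`tune2fs -c 1` and `tune2fs -C 100`) can be verified before rebooting by reading the counters back. This is a minimal sketch: the function name `mount_counts` is hypothetical, and it assumes `tune2fs -l` prints the "Mount count:" and "Maximum mount count:" fields in its usual form.

```shell
#!/bin/sh
# Sketch: from `tune2fs -l <device>` output on stdin, print the current and
# maximum mount counts as "count max". Hypothetical helper name.
mount_counts() {
    awk -F: '
        /^Mount count:/         { gsub(/[ \t]/, "", $2); count = $2 }
        /^Maximum mount count:/ { gsub(/[ \t]/, "", $2); max = $2 }
        END { print count, max }
    '
}
```

Usage would be `sudo tune2fs -l /dev/sda1 | mount_counts`. When the first number is greater than or equal to the second (and the second is positive), the next boot should run a full check, which makes it possible to tell "check forced by mount count" apart from "was not cleanly unmounted".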

Revision history for this message
Marius B. Kotsbak (mariusko) wrote :

I have also seen the strange behaviour that modemmanager seems to initialize at shutdown. Not sure if it is fixed in the latest updates.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

This is nice, but all you've done here is run through manually stopping a bunch of stuff, not necessarily in order. What we need to know is absolutely and only whether the shutdown completes properly and cleanly with the automated procedure, via upstart.

In other words:
- Does upstart correctly run through rc scripts at shutdown, running sendsigs somewhere along the way to kill all processes that are *not* spawned by upstart and that are *not* listed in /run/sendsigs.omit.d;
- Does upstart correctly continue through its events, and properly stop dbus, which will trigger stopping network-manager and modemmanager.
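
To answer the first question, it can help to see which processes sendsigs is told to spare. The sketch below assumes each file in /run/sendsigs.omit.d is a pidfile containing one or more PIDs; the function name `list_omitted_pids` is made up for illustration, and the directory can be overridden with an argument.

```shell
#!/bin/sh
# Sketch: print "PID NAME PIDFILE" for every PID listed in the omit
# directory. Assumes each file there is a pidfile. Hypothetical helper name.
list_omitted_pids() {
    omitdir=${1:-/run/sendsigs.omit.d}
    [ -d "$omitdir" ] || return 0
    for pidfile in "$omitdir"/*; do
        [ -f "$pidfile" ] || continue
        for pid in $(cat "$pidfile"); do
            # /proc/<pid>/comm holds the process name on Linux; print "?"
            # if the process is already gone or /proc is unavailable.
            name=$(cat "/proc/$pid/comm" 2>/dev/null || echo '?')
            printf '%s %s %s\n' "$pid" "$name" "${pidfile##*/}"
        done
    done
}
```

Running `list_omitted_pids` shortly before shutdown and comparing the result with `ps -e` shows which remaining processes are expected to survive sendsigs and which ones should have been killed.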

On other instances you're also removing a whole bunch of packages. It's impossible to know which is the real cause of the issue if too many variables change at once -- one package should be removed at a time to figure out what blocks shutdown, or one upstart service should be manually stopped before shutdown to see whether it affects the shutdown procedure.

Regardless, please file a separate bug report if it hasn't already been done, with the very specific details about the shutdown procedure as it is *now*, after release, after the ifupdown fix, etc.
