mountall 2.36.3 hangs in recovery-menu (lvm; mountall 2.36 has no issue)

Bug #1099349 reported by Michael Wisheu on 2013-01-14
This bug affects 3 people
Affects             Importance   Assigned to
mountall (Ubuntu)   High         Unassigned
  Precise           High         Unassigned
  Quantal           High         Unassigned
  Raring            High         Unassigned

Bug Description

[Impact]
Following the mountall SRUs for bug #643289, mountall --no-events does not work correctly; the lack of events causes mountall to not finish. This breaks the recovery mode, which invokes mountall --no-events.

[Test case]
1. Boot Ubuntu using the 'recovery mode' option.
2. Choose the 'Enable networking' option.
3. Observe that mountall hangs.
4. Reboot and install mountall from -proposed.
5. Boot Ubuntu using the 'recovery mode' option.
6. Choose the 'Enable networking' option.
7. Confirm that mountall no longer hangs.

[Regression potential]
Minimal; the code changes should only have any effect when mountall is called with --no-events, which is only done in the recovery menu, and that's currently broken.

Fresh Ubuntu 12.04.1 amd64 install with the following partition layout / lvm setup:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
sda 8:0 0 20G 0 disk
├─sda1 8:1 0 243M 0 part /boot
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 19.8G 0 part
  ├─u1204--64-swap_1 (dm-0) 252:0 0 1020M 0 lvm [SWAP]
  ├─u1204--64-root (dm-1) 252:1 0 7.6G 0 lvm /
  └─u1204--64-usr--local (dm-2) 252:2 0 11.1G 0 lvm /usr/local

The installation works as expected and there are no apparent issues during normal usage.
The issue is with the recovery menu. If someone boots into a rescue entry in the boot menu, the recovery menu appears as expected, but every entry that remounts the root volume rw hangs on mountall.

Michael Wisheu (wisheu) wrote :

Screenshot 1 of 4

Michael Wisheu (wisheu) wrote :

Screenshot 2 of 4

Michael Wisheu (wisheu) wrote :

Screenshot 3 of 4

Michael Wisheu (wisheu) wrote :

Screenshot 4 of 4

Michael Wisheu (wisheu) wrote :

The issue lies with the mountall run in /usr/share/recovery-mode/recovery-menu line 79. (mountall can be interrupted by pressing Ctrl+C)

I've attached the debug output of this mountall run.
(code: mountall $force_fsck $fsck_fix --no-events --debug 2>&1 | tee -a /run/mountall.debug)

I've attached the strace output of this mountall run.
(code: strace -f -ttt -s 256 -o/run/mountall.strace mountall $force_fsck $fsck_fix --no-events --debug)

Michael Wisheu (wisheu) wrote :

Please note that the mountall.debug and mountall.strace runs hang with different output. I suspect that this issue lies within the change introduced in mountall 2.36.1: "Allow 'mounting' and 'mounted' signals from unrelated mounts to be processed in parallel. Thanks to Alexander Achenbach for the initial patch. LP: #643289."

If I downgrade mountall to 2.36 the recovery menu works as expected. mountall 2.36.1, 2.36.2 and 2.36.3 are showing this issue.
(command: sudo apt-get install mountall=2.36)

Furthermore I have a VMware VM that has exactly this issue. If needed I can supply a developer with a copy.

Margarita Manterola (marga-9) wrote :

Just to make it clear, this bug is similar to #1078926, but it's not fixed by the fix provided in 2.36.2. Something is still wrong with the patch in 2.36.1 that makes mountall hang under certain conditions. Likely lvm related, although not all lvm setups hang.

Margarita Manterola (marga-9) wrote :

I was also able to reproduce this in a fresh precise machine, installing it under kvm.

Similar lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 15.6G 0 disk
├─vda1 253:1 0 237M 0 part /boot
├─vda2 253:2 0 1K 0 part
└─vda5 253:5 0 15.4G 0 part
  ├─braavos-root (dm-0) 252:0 0 1.9G 0 lvm /
  ├─braavos-local (dm-1) 252:1 0 12.6G 0 lvm /usr/local
  └─braavos-swap (dm-2) 252:2 0 956M 0 lvm [SWAP]

Running mountall --verbose on the failing machine image:

/ is local
/proc is virtual
/sys is virtual
/sys/fs/fuse/connections is virtual
/sys/kernel/debug is virtual
/sys/kernel/security is virtual
/dev is virtual
/dev/pts is virtual
/run is virtual
/run/lock is virtual
/run/shm is virtual
/boot is local
/usr/local is local
/dev/mapper/braavos-swap is swap
mounting event sent for /sys/fs/fuse/connections
mounting event sent for /sys/kernel/debug
mounting event sent for /sys/kernel/security
mounting event sent for /run/lock
mounting event sent for /run/shm
checking /boot
fsck from util-linux 2.20.1
checking /
checking /usr/local
fsck from util-linux 2.20.1
mounting event sent for swap /dev/mapper/braavos-swap
fsck from util-linux 2.20.1
/dev/mapper/braavos-root: clean, 51505/121920 files, 233327/487424 blocks
fsck / [779] exited normally
mounting event sent for /
/dev/vda1: clean, 230/121440 files, 40274/242688 blocks
fsck /boot [778] exited normally
mounting event sent for /
/dev/mapper/braavos-local: clean, 36/825776 files, 92681/3301376 blocks
fsck /usr/local [780] exited normally

And then it hangs... This happens reliably, every time I select 'network'. Pressing ctrl-c continues with the normal boot and everything is mounted:

/dev/mapper/braavos-root on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
/dev/mapper/braavos-local on /usr/local type ext4 (rw)
/dev/vda1 on /boot type ext2 (rw)

I noticed that if I select the 'root' option, which doesn't run mountall, and then manually run mountall --verbose, it works fine, but if I run mountall --no-events --verbose, then it hangs. The first event to get "handled" when not using --no-events is the /sys event.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mountall (Ubuntu):
status: New → Confirmed
Margarita Manterola (marga-9) wrote :

After recompiling mountall and libnih (which was quite painful, since most tests were not passing), I got this very small backtrace:

#0 0x00007ffff6c5d003 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff7bcd814 in nih_main_loop () at main.c:581
#2 0x0000555555565acd in main (argc=2, argv=0x7fffffffe678) at mountall.c:4098

The nih code is:

                /* Now we hang around until either a signal comes in (and
                 * calls nih_main_loop_interrupt), a file descriptor we're
                 * watching changes in some way or it's time to run a timer.
                 */
                ret = select (nfds, &readfds, &writefds, &exceptfds,
                              (next_timer ? &timeout : NULL));

I don't know if this is too useful, but at least we know where it's hanging.

What I find interesting is that this portion of the code in mountall did not change from 2.36 to 2.36.3. The changes are earlier in the file, yet they cause the call to nih_main_loop to hang.
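
To illustrate why the backtrace ends in select(), here is a minimal sketch of the blocking behaviour, assuming nothing about libnih beyond the snippet quoted above. The helper name wait_readable is hypothetical. With a NULL timeout, select() returns only when a watched descriptor becomes ready or a signal arrives, so if nothing ever makes the descriptor ready (and no timer is pending), the loop waits forever.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not part of libnih or mountall.
 * Waits until fd becomes readable. Passing timeout == NULL means
 * "block until something happens", which is what the quoted nih code
 * does when no timer is pending.
 * Returns select()'s result: 1 if fd is readable, 0 on timeout,
 * -1 on error (including interruption by a signal). */
int wait_readable(int fd, struct timeval *timeout)
{
    fd_set readfds;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    return select(fd + 1, &readfds, NULL, NULL, timeout);
}
```

Calling wait_readable() on the read end of an empty pipe with a short timeout returns 0; with a NULL timeout the same call would not return until data is written or a signal (such as the one sent by Ctrl+C) interrupts it.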

Margarita Manterola (marga-9) wrote :

Some extra info:
 I tested recompiling the original mountall 2.36, it still worked.

 I tested this with quantal's 2.42 and 2.42ubuntu0.3. Failed with both.
 I also tested it with raring's 2.46, and Debian's 2.46, also failed.

[I installed those versions in the precise box, I didn't do a full quantal/raring install]

I'm going to test a full quantal install, but it looks like not only precise is affected.

Margarita Manterola (marga-9) wrote :

Tried it: fresh quantal install, with mountall=2.42, same LVM partitioning: it hangs when using recovery mode.

Steve Langasek (vorlon) wrote :

Confirmed; the issue here is that the --no-events flag is causing the event emitting to be skipped (as expected), but as a side effect ensures that the callback is never called, since the callback setup is handled via the event.
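
The failure mode described above can be sketched as follows. This is an illustration only, not mountall's source: every name here (struct mount, emit_event, notify_buggy, notify_fixed, mark_done) is hypothetical, and the event emission is reduced to an immediate call of the callback. The point is the shape of the bug: if the completion callback is only ever registered as part of emitting the event, skipping the event also skips the callback, and the caller waits forever for a completion that never comes.

```c
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; this is not mountall's code. */
typedef void (*mount_cb)(void *data);

struct mount {
    const char *mountpoint;
    bool        done;   /* set once the completion callback has run */
};

/* Completion callback: records that handling of this mount finished. */
void mark_done(void *data)
{
    ((struct mount *)data)->done = true;
}

/* Stand-in for emitting an event whose reply triggers the callback.
 * In this sketch the "reply" arrives immediately. */
void emit_event(struct mount *mnt, mount_cb cb)
{
    cb(mnt);
}

/* Buggy shape: with no_events set, the event is skipped and, as a
 * side effect, the callback is never invoked. */
void notify_buggy(struct mount *mnt, bool no_events, mount_cb cb)
{
    if (no_events)
        return;
    emit_event(mnt, cb);
}

/* Fixed shape: with no_events set, call the callback directly. */
void notify_fixed(struct mount *mnt, bool no_events, mount_cb cb)
{
    if (no_events) {
        cb(mnt);
        return;
    }
    emit_event(mnt, cb);
}
```

With no_events set, notify_buggy() leaves mnt->done false while notify_fixed() sets it; without no_events both behave the same.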

Changed in mountall (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
Steve Langasek (vorlon) on 2013-01-14
Changed in mountall (Ubuntu Quantal):
status: New → Triaged
importance: Undecided → High
Changed in mountall (Ubuntu Precise):
status: New → Triaged
importance: Undecided → High
Steve Langasek (vorlon) wrote :

I think the attached patch should fix the issue with --no-events, but I haven't tested it. Could you test?

Bryan Quigley (bryanquigley) wrote :

That patch worked for me, patching mountall from precise. Tested with recover broken packages.

tags: added: patch
Margarita Manterola (marga-9) wrote :

Thanks Steve for figuring it out so quickly! The patch worked fine both in my virtual test machine and in my local desktop.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mountall - 2.47

---------------
mountall (2.47) unstable; urgency=low

  [ Dave Chiluk ]
  * Adjust parsing of options so mountall doesn't strip options that are
    substrings of these strings (showthrough, optional, bootwait, nobootwait
    or timeout). This fixes the issue where timeo was getting stripped from
    nfs mounts. LP: #1041377.

  [ Steve Langasek ]
  * Ensure callbacks are called directly when running with --no-events,
    otherwise the "event" handling of the non-events never finishes and
    mountall hangs. LP: #1099349.

 -- Steve Langasek <email address hidden> Mon, 14 Jan 2013 16:37:36 -0800
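
The first entry in the changelog above concerns option matching. As a rough sketch of that class of bug, and not a reproduction of the actual mountall code, the functions below compare an option name against the mountall-specific options listed in the entry. The function names and the loose/strict split are assumptions made for illustration. A comparison limited to the length of the candidate name accepts any prefix, so "timeo" is taken for "timeout"; also requiring equal lengths avoids that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The mountall-specific options named in the changelog entry. */
static const char *const special_opts[] = {
    "showthrough", "optional", "bootwait", "nobootwait", "timeout"
};

#define N_SPECIAL (sizeof special_opts / sizeof special_opts[0])

/* Length of the option name, i.e. everything before '=' or ','. */
size_t opt_name_len(const char *opt)
{
    return strcspn(opt, "=,");
}

/* Loose match (illustrative of the bug): compares only the first
 * len characters, so a prefix such as "timeo" matches "timeout". */
bool is_special_loose(const char *name, size_t len)
{
    for (size_t i = 0; i < N_SPECIAL; i++)
        if (strncmp(name, special_opts[i], len) == 0)
            return true;
    return false;
}

/* Strict match: the lengths must be equal as well. */
bool is_special_strict(const char *name, size_t len)
{
    for (size_t i = 0; i < N_SPECIAL; i++)
        if (strlen(special_opts[i]) == len &&
            strncmp(name, special_opts[i], len) == 0)
            return true;
    return false;
}
```

For the NFS option "timeo=14", the loose check reports a match and the strict check does not, while "timeout=30" matches under both.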

Changed in mountall (Ubuntu Raring):
status: Triaged → Fix Released
Michael Wisheu (wisheu) wrote :

Can someone please give a quick update on when we can expect to see a fixed package in Precise? Thanks.

Steve Langasek (vorlon) on 2013-01-17
tags: added: regression-update
Steve Langasek (vorlon) on 2013-01-17
description: updated
Changed in mountall (Ubuntu Quantal):
status: Triaged → In Progress
Changed in mountall (Ubuntu Precise):
status: Triaged → In Progress
Michael Wisheu (wisheu) wrote :

Is there any update?

Hello Michael, or anyone else affected,

Accepted mountall into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/mountall/2.36.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in mountall (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in mountall (Ubuntu Quantal):
status: In Progress → Fix Committed
Colin Watson (cjwatson) wrote :

Hello Michael, or anyone else affected,

Accepted mountall into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/mountall/2.42ubuntu0.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Michael Wisheu (wisheu) wrote :

Hi Colin,

I've tested mountall 2.36.4 on Precise and it resolved the issue. Downgrading the package back to 2.36.3 brought the issue back.
From my point of view, mountall 2.36.4 looks good for Precise.

Best,

Michael

Steve Langasek (vorlon) wrote :

Michael, thanks very much for the verification. Is there any chance you'd be able to do the verification for the quantal update too? (No problem if not, but doesn't hurt to ask!)

tags: added: verification-done-precise
Michael Wisheu (wisheu) wrote :

Let me upgrade my VM with the issue to Quantal...

Michael Wisheu (wisheu) wrote :

Hi Steve,

I've upgraded my Precise VM with the mountall issue to Quantal and confirmed that the issue is also present under Quantal for this VM. I then installed mountall 2.42ubuntu0.4 and the issue is resolved.
From my point of view, mountall 2.42ubuntu0.4 looks good for Quantal.

Best,

Michael

Steve Langasek (vorlon) on 2013-01-29
tags: added: verification-done
removed: verification-done-precise verification-needed
Michael Wisheu (wisheu) wrote :

Hi everyone,

Is there any ETA for when the fix will be released for Precise?
I'm considering pushing the proposed package to our fleet, but would hold back if the release is going to happen in the next few days anyway.

Best,

Michael

Steve Langasek (vorlon) wrote :

We normally age SRUs in -proposed for 7 days before publishing to -updates; so it should be ready to go in by Tuesday.

Michael Wisheu (wisheu) wrote :

Thank you Steve for the fast reply. Tuesday sounds great. ^^

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mountall - 2.42ubuntu0.4

---------------
mountall (2.42ubuntu0.4) quantal-proposed; urgency=low

  * Ensure callbacks are called directly when running with --no-events,
    otherwise the "event" handling of the non-events never finishes and
    mountall hangs. LP: #1099349.
 -- Steve Langasek <email address hidden> Wed, 16 Jan 2013 17:12:54 -0800

Changed in mountall (Ubuntu Quantal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mountall - 2.36.4

---------------
mountall (2.36.4) precise-proposed; urgency=low

  * Ensure callbacks are called directly when running with --no-events,
    otherwise the "event" handling of the non-events never finishes and
    mountall hangs. LP: #1099349.
 -- Steve Langasek <email address hidden> Wed, 16 Jan 2013 17:13:23 -0800

Changed in mountall (Ubuntu Precise):
status: Fix Committed → Fix Released
Michael Wisheu (wisheu) wrote :

Thank you to everyone who was involved in fixing this bug so quickly.
