installer hangs detecting existing partitions

Bug #725408 reported by Phillip Susi
This bug affects 4 people
Affects: partman-auto (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Evan
Milestone: (none)
Nominated for Natty by Phillip Susi

Bug Description

Binary package hint: partman-auto

Today's natty daily build hangs detecting partitions. I have traced the cause to this change:

partman-auto (93ubuntu5) natty; urgency=low

  * Set the executable bit on reuse/*.
 -- Evan Dandrea <email address hidden> Tue, 15 Feb 2011 12:15:40 +0000

Phillip Susi (psusi)
Changed in partman-auto (Ubuntu):
assignee: nobody → Evan Dandrea (ev)
Revision history for this message
Evan (ev) wrote :

Does 93ubuntu6 fix the bug for you?

partman-auto (93ubuntu6) natty; urgency=low

  * Don't explode on unpartitioned space in the reuse option (LP:
    #722198).

 -- Evan Dandrea <email address hidden> Thu, 24 Feb 2011 15:29:10 +0000

Revision history for this message
Phillip Susi (psusi) wrote :

Nope, still happens with Ubiquity 2.5.18.

Revision history for this message
Phillip Susi (psusi) wrote :

By the way, this seems to only happen when dmraid and/or lvm is active.

Revision history for this message
Colin Watson (cjwatson) wrote :

Does http://bazaar.launchpad.net/~ubuntu-core-dev/partman-auto/ubuntu/revision/595 fix this? It might be worth applying locally, just to see.

Revision history for this message
Phillip Susi (psusi) wrote :

It looks like this went into ubiquity 2.5.19, which seems to be even worse. With 2.5.18, I can at least deactivate dmraid and lvm volumes, and then the installer will get past that screen without hanging. With 2.5.19, it seems to reactivate them and then hang. It also now seems to hang while one of the partitions is mounted in /tmp, and when I quit the installer it leaves that partition mounted and various partman processes running.

Revision history for this message
Colin Watson (cjwatson) wrote :

On IRC, I asked Phillip to attach syslog and partman logs from the installation attempt. It should be possible to debug this from those.

Changed in partman-auto (Ubuntu):
status: New → Incomplete
Phillip Susi (psusi)
Changed in partman-auto (Ubuntu):
importance: Undecided → High
status: Incomplete → Triaged
tags: added: regression-potential
Revision history for this message
Colin Watson (cjwatson) wrote :

Hm. It's still not absolutely clear what's going on here. Would you mind trying one more test?

With a current daily build, boot into a live session (using "Try Ubuntu" if you get the try-or-install screen), edit /lib/partman/lib/base.sh and put 'set -x' on the line immediately after '. /usr/share/debconf/confmodule', then run the installer with 'ubiquity -d'. After it hangs, quit everything you can (killing installer processes from a terminal if need be), then attach /var/log/syslog, /var/log/partman, and /var/log/installer/debug to this bug report.
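
The edit to base.sh described above could be sketched as follows. This is only an illustration: add_trace is a hypothetical helper name, and the demo runs on a throwaway file rather than the real /lib/partman/lib/base.sh.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a script with 'set -x' added on the
# line immediately after the one that sources debconf's confmodule.
add_trace() {
    # $1: path of the script to read (e.g. /lib/partman/lib/base.sh)
    awk '
        { print }
        # Match the sourcing line, allowing leading/trailing blanks.
        /^[ \t]*\. \/usr\/share\/debconf\/confmodule[ \t]*$/ { print "set -x" }
    ' "$1"
}

# Demo on a temporary file so nothing on the system is modified.
demo=$(mktemp)
printf '%s\n' '#!/bin/sh' '. /usr/share/debconf/confmodule' 'echo hello' > "$demo"
add_trace "$demo"
rm -f "$demo"
```

In a live session, the output of add_trace would be written to a temporary file and copied over base.sh (as root) before starting 'ubiquity -d'.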

Thanks in advance.

Revision history for this message
Phillip Susi (psusi) wrote :

After closing the installer, I notice the following processes still running:

sh -c /usr/share/ubiquity/activate-dmraid && /bin/partman
/bin/sh /bin/partman
parted_server

Revision history for this message
Colin Watson (cjwatson) wrote :

Hmm. The 'set -x' output is truncated, confusingly, but I think I have a guess. My suspicion is that 'blockdev --setro' fails when the block device is in use due to being part of a DM-RAID array or LVM physical volume.

Revision history for this message
Colin Watson (cjwatson) wrote :

... except that doesn't make sense because those devices are excluded from consideration by partman.

What are the large "unknown" partitions on /dev/sdc1 and /dev/sdd1?

Revision history for this message
Phillip Susi (psusi) wrote :

One of them is an SSD, the other is a 1.5 TB WD Green drive, both used as LVM PVs.

Revision history for this message
Phillip Susi (psusi) wrote :

I noticed that unmounting the first partition failed. Could this be the problem?

Mar 20 22:48:25 ubuntu ubiquity: umount: /dev/mapper/nvidia_ahhfbbffp1: not mounted

Revision history for this message
Evan (ev) wrote :

While that alone wouldn't cause the error (the umount is guarded with '|| true'), it is nonetheless quite strange:

Mar 20 22:48:25 ubuntu ubiquity: + mount -o ro /dev/mapper/nvidia_ahhfbbffp1 /tmp/tmp.FuQfIqOMOD
Mar 20 22:48:25 ubuntu ntfs-3g[6264]: Version 2010.8.8 external FUSE 28
Mar 20 22:48:25 ubuntu ntfs-3g[6264]: Mounted /dev/dm-2 (Read-Only, label "", NTFS 3.1)
Mar 20 22:48:25 ubuntu ntfs-3g[6264]: Cmdline options: ro
Mar 20 22:48:25 ubuntu ntfs-3g[6264]: Mount options: ro,allow_other,nonempty,relatime,fsname=/dev/dm-2,blkdev,blksize=4096
Mar 20 22:48:25 ubuntu ntfs-3g[6264]: Ownership and permissions disabled, configuration type 1
Mar 20 22:48:25 ubuntu ubiquity: +
Mar 20 22:48:25 ubuntu ubiquity: grep
Mar 20 22:48:25 ubuntu ubiquity: -s
Mar 20 22:48:25 ubuntu ubiquity: DISTRIB_ID
Mar 20 22:48:25 ubuntu ubiquity: /tmp/tmp.FuQfIqOMOD/etc/lsb-release
Mar 20 22:48:25 ubuntu ubiquity:
Mar 20 22:48:25 ubuntu ubiquity: + release=
Mar 20 22:48:25 ubuntu ubiquity: + true
Mar 20 22:48:25 ubuntu ubiquity: + umount /dev/mapper/nvidia_ahhfbbffp1
Mar 20 22:48:25 ubuntu ubiquity: umount: /dev/mapper/nvidia_ahhfbbffp1: not mounted
Mar 20 22:48:25 ubuntu ubiquity: + true

So it successfully mounted, but somewhere between that and the call to umount, when the only thing between them is a grep (that is also guarded with a true), it went away.

Revision history for this message
Evan (ev) wrote :

Phillip,

Could you please try commenting out the calls to blockdev in /lib/partman/automatically_partition/15reuse/choices before the partitioning step, and see if that gets you any further along?

Thanks

Revision history for this message
Phillip Susi (psusi) wrote :

I don't think it went away. I think that mount canonicalized the device name, so it cannot be unmounted using the non-canonicalized name, and since it is left mounted, something further down the line hangs.

Since this happens with the ntfs partition but not the ext4 partition, maybe the problem is in mount-ntfs3g?

Revision history for this message
Phillip Susi (psusi) wrote :

Yep, that's what is going on. The ntfs partition is listed as /dev/dm-2 by mount, and remains mounted. Attempting to umount it by the /dev/mapper name, OR /dev/dm-2 fails saying it is not mounted. To unmount it, you have to umount the directory where it was mounted. If I remove that ntfs partition, then the installer proceeds normally, so somehow that stuck mount is making the whole partman process hang up.
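
A minimal sketch of how the mount point for such a device could be looked up, so that the directory rather than the device is passed to umount. mountpoint_of is a hypothetical helper, the sample table is made up for the demo, and the sketch assumes mount points without spaces (which /proc/mounts would escape).

```shell
#!/bin/sh
# Hypothetical helper: find where a device is mounted by scanning a mounts
# table in /proc/mounts format (device, mount point, type, options, ...).
mountpoint_of() {
    # $1: device name as listed in the table, e.g. /dev/dm-2
    # $2: table to read; defaults to /proc/mounts
    awk -v dev="$1" '$1 == dev { print $2 }' "${2:-/proc/mounts}"
}

# Demo against a sample table (contents invented for illustration).
table=$(mktemp)
cat > "$table" <<'EOF'
/dev/sda1 / ext4 rw,relatime 0 0
/dev/dm-2 /tmp/tmp.FuQfIqOMOD fuseblk ro,relatime 0 0
EOF
mountpoint_of /dev/dm-2 "$table"
rm -f "$table"
```

On the affected system, something like umount "$(mountpoint_of /dev/dm-2)" would then unmount by directory.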

Revision history for this message
Phillip Susi (psusi) wrote :

This mount brokenness seems to be on Maverick as well. Running 'mount -t ntfs -o ro /dev/mapper/nvidia_ahhfbbff1 /mnt' then shows /dev/dm-7 as the mounted device, and both 'umount /dev/mapper/nvidia_ahhfbbff1' and 'umount /dev/dm-7' fail, saying it is not mounted.

I wonder if this should be a separate bug filed against fuse?

Revision history for this message
Phillip Susi (psusi) wrote :

15reuse/choices, 15reuse/do_option, and 25replace/choices all umount $path in normal execution, but umount $mountpoint in the cleanup error handler. Changing the normal-execution calls to umount $mountpoint as well fixes the problem.

It seems that there are two other bugs that feed into this one:

1) ntfs-3g canonicalizes the name, so if you mount with /dev/mapper/foo, the device name listed in mtab is /dev/dm-n, and you cannot umount /dev/mapper/foo.

2) umount won't even recognize and translate /dev/dm-n to the mount point, apparently because of the '-' involved. Editing mtab and removing the '-' allows you to umount /dev/dmn successfully.

Even so, there seems to be another underlying bug somewhere in the installer because it probably should not hang just because a partition was already mounted.
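
The mount / inspect / unmount pattern with the unmount done by directory might look roughly like this. It is a sketch, not the actual partman-auto code: probe_release is a hypothetical function name, and MOUNT and UMOUNT are hypothetical overrides added only so the flow can be dry-run without root.

```shell
#!/bin/sh
# Sketch of mounting a partition read-only, checking for an lsb-release
# file, and unmounting by directory instead of by device name.
# MOUNT and UMOUNT are hypothetical overrides; they default to the real
# commands.
: "${MOUNT:=mount}" "${UMOUNT:=umount}"

probe_release() {
    # $1: device path, e.g. a /dev/mapper/... name
    path=$1
    mountpoint=$(mktemp -d) || return 1
    release=
    if $MOUNT -o ro "$path" "$mountpoint"; then
        # Pick up DISTRIB_ID if the filesystem holds an lsb-release file.
        release=$(grep -s DISTRIB_ID "$mountpoint/etc/lsb-release") || true
        # Unmount by directory, not by "$path": the device name recorded
        # in the mount table may differ from the one passed to mount.
        $UMOUNT "$mountpoint" || true
    fi
    rmdir "$mountpoint" 2>/dev/null || true
    printf '%s\n' "$release"
}
```

Because the directory comes from mktemp and is known to the script, it does not matter what device name ends up recorded in the mount table.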

Revision history for this message
Colin Watson (cjwatson) wrote :

umount(8) says that unmounting by device is obsolete anyway, so I don't know if ntfs-3g/fuse needs to be fixed separately - we should just not do obsolete things.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package partman-auto - 93ubuntu14

---------------
partman-auto (93ubuntu14) natty; urgency=low

  * Fix reuse and replace options to umount using the path name
    instead of the device name to work around a bug where ntfs-3g
    on dmraid can not be unmounted with the device name, which
    then caused the installer to hang (LP: #725408).
 -- Phillip Susi <email address hidden> Sun, 10 Apr 2011 12:27:57 -0400

Changed in partman-auto (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Evan (ev) wrote :

Brilliant catch, Phillip.

Revision history for this message
Phillip Susi (psusi) wrote :

I still wonder though, why the process hangs just because the fs is left mounted. Seems to still be an underlying bug that this fix just avoids triggering. I have a feeling that this will come back to bite us in the ass at some point when something else triggers it.

Revision history for this message
Jean-Pierre (jean-pierre-andre) wrote :

Phillip, in post #23 you say "1) ntfs-3g canonicalizes the name so that if you mount with /dev/mapper/foo, the device name listed in mtab is /dev/dm-n, so you can not umount /dev/mapper/foo"

A similar remark has just been posted to the ntfs-3g forum: http://tuxera.com/forum/viewtopic.php?f=3&t=27391&sid=cbf10a9a3e0fce875eebb4605ef24028 What ntfs-3g does to the device name is limited to removing occurrences of ".." and "." in the device path, and resolving symbolic links for security reasons. Is "/dev/mapper/foo" a symbolic link to "/dev/dm-n"? If so, the proper fix is to have it accepted by umount, instead of working around it in all filesystem drivers.
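
One way to answer the symlink question on a given system is sketched here. describe_node is a hypothetical helper, the demo uses throwaway files standing in for device nodes, and 'readlink -f' is assumed to be available (it is in GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: report whether a path is a symbolic link and, if
# so, what it resolves to.
describe_node() {
    if [ -L "$1" ]; then
        printf '%s is a symlink to %s\n' "$1" "$(readlink -f "$1")"
    else
        printf '%s is not a symlink\n' "$1"
    fi
}

# Demo with temporary files instead of real device nodes.
dir=$(mktemp -d)
: > "$dir/dm-0"
ln -s "$dir/dm-0" "$dir/alias"
describe_node "$dir/alias"
describe_node "$dir/dm-0"
rm -rf "$dir"
```

Running describe_node on the /dev/mapper entry in question would show which case applies.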

Revision history for this message
Phillip Susi (psusi) wrote :

The comment seems to be wrong then since /dev/mapper/* are not symbolic links.
