Ubiquity Erase Disk and Install Fails to create Swap Space

Bug #1552539 reported by Gene Soo
This bug affects 8 people
Affects            Status        Importance  Assigned to              Milestone
casper (Ubuntu)    Fix Released  Critical    Martin Pitt
ubiquity (Ubuntu)  Invalid      Critical    Mathieu Trudel-Lapierre

Bug Description

A daily build of Ubuntu GNOME 16.04, using the installation option to Erase Disk and Install, fails with the following message:
"The creation of swap space in partition #5 of SCSI3 (0,0,0) (sda) failed."

This is a VirtualBox-based install using virtual disks that had a prior installation of a daily build.
The installation process does not fail when I delete and recreate the virtual machine.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: ubiquity 2.21.47
ProcVersionSignature: Ubuntu 4.4.0-9.24-generic 4.4.3
Uname: Linux 4.4.0-9-generic x86_64
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
CasperVersion: 1.367
Date: Thu Mar 3 04:26:59 2016
ExecutablePath: /usr/lib/ubiquity/bin/ubiquity
InstallCmdLine: file=/cdrom/preseed/ubuntu-gnome.seed boot=casper initrd=/casper/initrd.lz quiet splash --- maybe-ubiquity
InterpreterPath: /usr/bin/python3.5
LiveMediaBuild: Ubuntu-GNOME 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160302)
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: ubiquity
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Gene Soo (genesoo77072) wrote :

Additional information: if I select Try Ubuntu and use GParted to delete the existing partitions, the installation works when I restart it.
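
For reference, a command-line equivalent of this workaround from the Try Ubuntu session might look like the following (an assumption rather than part of the report, and destructive to everything on the target disk):

  # turn off any swap the live session has enabled, then wipe the
  # partition-table signatures so the installer sees a blank disk
  sudo swapoff -a
  sudo wipefs -a /dev/sda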

Revision history for this message
Phillip Susi (psusi) wrote :

It looks like systemd is reactivating the swap partition before it can be reformatted.
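
Should anyone want to confirm that from the live session, something like the following would show it (a sketch; /dev/sda5 is an assumption based on the "partition #5 ... (sda)" error above):

  # watch whether the swap partition comes back after ubiquity turns it off
  watch -n1 cat /proc/swaps
  # and, in a second terminal, follow systemd's view of the matching swap unit
  journalctl -f -u dev-sda5.swap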

Phillip Susi (psusi)
Changed in ubiquity (Ubuntu):
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1552539

tags: added: iso-testing
Revision history for this message
Erick Brunzell (lbsolost) wrote :

I had been reporting repetitive failures of Xenial images as bug #990744. Is this a duplicate or a new issue?

Revision history for this message
Dave Morley (davmor2) wrote :

This could be new, so it's best to keep them separate; the developers can then mark them as duplicates if they turn out to be the same. This is the one that apport reports against, as the wording is identical, so it needs to be the focus bug since it will get more hits.

Revision history for this message
Dave Morley (davmor2) wrote :

This happens with any automatic selection (I've not tried side-by-side). The only way I have found around it is to manually partition the system; then it honours the unmounting of the swap partition so it can be formatted.

Reproducible on a MacBook Pro 2011 and a Lenovo IdeaPad, but not happening in KVM; I assume the wipe triggers faster there because it is a VM.

Changed in partman (Ubuntu):
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
Changed in ubiquity (Ubuntu):
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
Revision history for this message
Stéphane Gourichon (stephane-gourichon-lpad) wrote :

Hello. Link to this bug is broken in https://wiki.ubuntu.com/XenialXerus/ReleaseNotes

The sentence:

Please see this [[bug|https://bugs.launchpad.net/ubuntu/+source/ubiquity/+bug/1552539]]

should really be:

Please see this [[https://bugs.launchpad.net/ubuntu/+source/ubiquity/+bug/1552539|bug]]

I can't fix it myself as the page is marked immutable (at least from my account's perspective).

Thank you for your attention.

Revision history for this message
Erick Brunzell (lbsolost) wrote :

This is not limited to entire-disk installs. I've also been able to reproduce it when performing auto-resize and manual partitioning installs.

Revision history for this message
Phillip Susi (psusi) wrote :

I am guessing that partman is not inhibiting systemd's auto-mounting feature, and it needs to.
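
One way an installer could inhibit it for the duration of the reformat, sketched here as an assumption (and not what the eventual fix did), is to mask the generated swap unit at runtime:

  # assumes /dev/sda5 is the swap partition being recreated
  systemctl --runtime mask dev-sda5.swap    # systemd can no longer start it
  swapoff /dev/sda5
  mkswap /dev/sda5
  systemctl --runtime unmask dev-sda5.swap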

Revision history for this message
john fisher (john-jpfisher) wrote :

I get variations when installing to a GPT disk (USB stick).
1) Start with the USB stick: using gdisk and GParted from 16.04 beta 2, create a new GPT table. Create a 400M FAT32 partition and mark it boot. Create an ext4 partition and a swap partition.
2) Boot the 16.04 beta 2 server live USB stick and select install.

a. The partitioner will not allow you to make part1 (FAT32) bootable, but if you made it bootable in step 1, it will accept the setting.
b. Running the partitioner produces different error messages, but it will not succeed:
~ can't erase existing files on part1
~ can't format part1
~ can't create the ext4 partition
~ failed to remove conflicting files

I tried the swapoff workaround and it didn't help.

This procedure was tricky but possible in 14.04; I can't find any way to make it work in 16.04 beta 2.

Revision history for this message
Lyn Perrine (walterorlin) wrote :

This does not affect entire-disk installs with the Lubuntu alternate installer images; they give a prompt to unmount the partitions in use and then continue with the install. Ubiquity has no prompt for this. This was on an MBR system.

Mathew Hodson (mhodson)
Changed in ubuntu-release-notes:
status: New → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

I think the cleanest way would be for "erase disk" to make sure that all partitions get unmounted/swapoff'ed before erasing.

Otherwise, it's fairly easy to disable systemd-gpt-auto-generator (the thing which discovers and enables GPT partitions of type swap) in casper (essentially just rm /root/lib/systemd/system-generators/systemd-gpt-auto-generator). That will work for this particular case, but would of course still not be sufficient if the user mounts anything in the live session and then starts ubiquity to reformat.
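
A minimal sketch of such a casper-bottom hook, assuming (as in the other casper-bottom scripts) that the live system's root is mounted at /root; the script that was eventually shipped may differ:

  # Sketch only: keep systemd-gpt-auto-generator from enabling GPT swap/root
  # partitions found on the hard disk in the live session.
  rm -f /root/lib/systemd/system-generators/systemd-gpt-auto-generator
  # or mask it instead of deleting it:
  # ln -sf /dev/null /root/etc/systemd/system-generators/systemd-gpt-auto-generator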

Revision history for this message
Martin Pitt (pitti) wrote :

Discussed with Mathieu. I think it's best to completely disable systemd's GPT auto generator for the live session. There is no good use case for ever automatically mounting/swapon'ing existing partitions on the hard disk in the live session, as the live session is not supposed to change your system. This should also fix this bug.

affects: partman (Ubuntu) → casper (Ubuntu)
Changed in casper (Ubuntu):
assignee: Mathieu Trudel-Lapierre (cyphermox) → Martin Pitt (pitti)
status: Triaged → In Progress
Changed in ubiquity (Ubuntu):
status: Triaged → Invalid
Changed in casper (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package casper - 1.375

---------------
casper (1.375) xenial; urgency=medium

  * scripts/casper-bottom/16disable_gpt_auto_mount: Disable systemd's GPT auto
    generator. We don't want this on a live system that is supposed to not
    touch your hard disk by default, and particularly we don't want to
    automatically enable discovered swap partitions. (LP: #1552539)
  * debian/control: Drop Vcs-Bzr:, which does not exist any more.

 -- Martin Pitt <email address hidden> Tue, 19 Apr 2016 15:53:13 +0200

Changed in casper (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Martin Wimpress  (flexiondotorg) wrote :

I am still encountering this issue on the April 19th images. As requested in IRC, I'm attaching syslog and partman logs here.

Revision history for this message
Martin Pitt (pitti) wrote :

Reopening then, thanks. So now we know that this is not due to GPT partition auto-detection (disabling that was still the right thing to do for the live system, but apparently it was not the cause of this, sorry).

Changed in ubiquity (Ubuntu):
status: Invalid → Confirmed
Revision history for this message
Erick Brunzell (lbsolost) wrote :

I just got this error while performing a manual partitioning install with Ubuntu GNOME Xenial 20160419 amd64, where two other Ubuntu GNOME installs exist along with the existing swap. Obviously I don't want to just delete and recreate swap, because I don't want to have to putz around correcting UUIDs afterwards.

Revision history for this message
Martin Pitt (pitti) wrote :

I tried to reproduce this from Martin Wimpress' recipe:

 * Do an install with "use full disk" (which creates a swap partition on /dev/vda5)
 * Reinstall in the same mode
 * Before starting ubiquity, /dev/vda5 is active in the live session (that's done by a casper script that auto-detects and enables swap partitions)
 * I see in syslog that mkswap succeeded and was done, and wiped the old swap signature, so partman did its thing.

I tried that three times, no luck so far :-/

Revision history for this message
Martin Pitt (pitti) wrote :

As per comment #19 I tried to use manual partitioning. The "Format" box for the swap partition is grayed out and disabled, but the installer always formats swap anyway, so this isn't very relevant. Either way, mkswap again succeeded and vda5 got reactivated as expected. Looks like this is some nasty timing problem somewhere.

Revision history for this message
Martin Pitt (pitti) wrote :

Dave has a way to reproduce this in a loop on his system. I logged in through ssh and attached strace -fvvs1024 to all the running processes with "ubiquity" or "partman" in the name. Attaching the raw data, I'll stare at it in a bit.

I think the next step is to find out which precise commands ubiquity and/or partman are running in order to set up the swap partition, and then try to reproduce these on the command line to study what's up with those. I don't see any commands logged in the partman log, hence I was trying strace.

Sorry, I know absolutely nothing about partman or the partitioning/fs-creation part of the installer at large, so I'm just poking in the dark so far.

Revision history for this message
Martin Pitt (pitti) wrote :

So according to the traces, these are the commands that ubiquity calls:

  swapoff /dev/sda3
  grep "^/dev/sda3 " /proc/swaps
  log-output -t partman --pass-stdout mkswap /dev/sda3

and the latter fails with

  open("/dev/sda3", O_RDONLY|O_EXCL|O_CLOEXEC) = -1 EBUSY (Device or resource busy

I tried to reproduce this with just

  swapoff /dev/sda3; grep "^/dev/sda3 " /proc/swaps; sleep 0.5; mkswap /dev/sda3; swapon /dev/sda3

for varying values of the sleep (or no sleep at all), so far without success. I also don't believe that this is generally broken, as it is quite hard to reproduce. Slower machines/hard disks do seem to help, though.

Another theory is that something else has the swap device open; ubiquity (or some programs it calls) opens it quite a lot (things like blkid), but mkswap works fine while the device is opened read-only (tail -f /dev/sda3) or even write-only (dd of=/dev/sda3), and I also let blkid race against mkswap:

    while blkid -p /dev/sda3; do true; done # in one shell
    while mkswap /dev/sda3; do true; done # in another

on Dave's machine, and all of these work.

I also put systemd into debug mode (systemd-analyze set-log-level debug) and watched journalctl -f to verify that there is no automatic activation of the new swap partition after mkswap. It just tracks swapoff and swapon via the uevents.

Thus so far I'm none the wiser yet.

Revision history for this message
Martin Pitt (pitti) wrote :

Another observation from the strace: when mkswap runs, /dev/sda3 is already enabled again:

21626 open("/proc/swaps", O_RDONLY|O_CLOEXEC) = 4
21626 read(4, "Filename\t\t\t\tType\t\tSize\tUsed\tPriority\n/dev/sda3 partition\t8251388\t0\t-1\n", 1024) = 100
[...]
21626 open("/dev/sda3", O_RDONLY|O_EXCL|O_CLOEXEC) = -1 EBUSY (Device or resource busy)

There is a "dd if=/dev/sda3 of=991754715136-1000204140543/old_uuid" running which has the device open (pid 21620), but it finishes well before mkswap starts. I don't see anything else accessing /dev/sda3 in the straces in between, so something from outside interferes here (triggered by udev or similar).

Revision history for this message
Martin Pitt (pitti) wrote :

I managed to grab a journal with debugging enabled. This shows that a change event for sda3 gets picked up which reactivates the dev-sda3.swap unit and then calls swapon:

Apr 20 13:11:21 ubuntu systemd[1]: dev-sda3.device: Changed dead -> plugged
Apr 20 13:11:21 ubuntu systemd[1]: dev-sda3.swap: Trying to enqueue job dev-sda3.swap/start/fail
Apr 20 13:11:21 ubuntu systemd[1]: dev-sda3.swap: Installed new job dev-sda3.swap/start as 2745
Apr 20 13:11:21 ubuntu systemd[1]: dev-sda3.swap: Enqueued job dev-sda3.swap/start as 2745
Apr 20 13:11:21 ubuntu systemd[1]: dev-sda3.swap: About to execute: /sbin/swapon /dev/sda3

That swapon actually fails in this log because it's busy, but it eventually succeeds.

So this answers *what* is calling swapon. It does not yet answer what exactly happens between the swapoff and mkswap calls, as doing just those from a shell doesn't trigger this behaviour. I suppose some uevent is generated in between which triggers the re-activation.
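
One way to look for such a uevent while reproducing, as an assumption to test (again using /dev/sda3 from the traces above):

  # print kernel and udev events for block devices around the reformat
  udevadm monitor --kernel --udev --subsystem-match=block &
  swapoff /dev/sda3
  mkswap /dev/sda3
  kill %1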

Revision history for this message
Martin Pitt (pitti) wrote :

I'm still unable to synthetically trigger an event that apparently happens after swapoff and before mkswap that would trigger dev-sda3.device/dev-sda3.swap like that. I tried reading and writing /dev/sda3 and even udevadm trigger'ing it.

Dave shut down his machine to try something (removing /scripts/casper-bottom/13swap to prevent this existing swap partition from going into the live system's /etc/fstab in the first place), which might be a good enough workaround for the release.

Some notes:

 * My experiments above have never touched/changed dev-sda3.device, only the data on that partition. "dev-sda3.device: Changed dead -> plugged" sounds like the entire partition got removed and re-added, which I didn't try yet.

 * If we don't understand/cannot fix the real issue in time, then changing /scripts/casper-bottom/13swap to merely swapon existing swap partitions (or leave them alone completely), instead of writing them to the live system's /etc/fstab, might be good enough at this point.

Revision history for this message
Martin Pitt (pitti) wrote :

I'm able to replicate this behaviour with the following in QEMU:

  swapoff /dev/vda5
  udevadm trigger --action=remove --sysname-match=vda5
  udevadm trigger --action=add --sysname-match=vda5

This makes dev-vda5.device go down and back up, so it looks like it got hotplugged, and then the fstab entry for /dev/vda5 gets triggered, which auto-enables the swap partition again.

This remove/add event can happen if partman removes the existing partition and recreates it, which is fairly plausible.
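
This also points at where the re-activation comes from: the fstab entry that casper writes for the swap partition is turned into a dev-vda5.swap unit by systemd-fstab-generator, and that unit is started again whenever its device reappears. A couple of commands that could be used to confirm this (a sketch; the exact output depends on the generator):

  # show that the swap unit is generated from the live session's /etc/fstab
  systemctl cat dev-vda5.swap
  # show that the device unit pulls the swap unit in when it (re)appears
  systemctl show -p Wants dev-vda5.device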

Revision history for this message
Marius Gedminas (mgedmin) wrote :

I wonder if telling the kernel to re-read the partition table (with e.g. sfdisk -R /dev/vda, or kpartx) would trigger this bug.

Revision history for this message
Martin Pitt (pitti) wrote :

This was discussed quickly in #ubuntu-release, and at this point we need a relatively unintrusive bandaid, so we'll do this in casper. We avoid creating the /etc/fstab entry in the live system for swap partitions and just enable them with swapon directly in casper's 13fstab hook.
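
A rough sketch of that bandaid, under the assumption that the hook previously appended an fstab line per detected swap partition (the shipped script may well differ):

  # enable detected swap partitions directly instead of writing them to the
  # live /etc/fstab, so no dev-*.swap unit exists for systemd to restart later
  for swapdev in $(blkid -t TYPE=swap -o device); do
      swapon "$swapdev" || true
      # previously something like:
      # echo "$swapdev none swap sw 0 0" >> /root/etc/fstab
  done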

Changed in casper (Ubuntu):
status: Fix Released → In Progress
Changed in ubiquity (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Martin Pitt (pitti) wrote :

> I wonder if telling the kernel to re-read the partition table would trigger this bug.

Indeed, "swapoff /dev/vda5" followed by "partprobe" does that.

Martin Pitt (pitti)
Changed in casper (Ubuntu):
status: In Progress → Fix Committed
milestone: none → ubuntu-16.04
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package casper - 1.376

---------------
casper (1.376) xenial; urgency=medium

  * scripts/casper-bottom/13swap: Don't add detected swap partitions to the
    live system's fstab, just swapon it. This avoids creating *.swap units for
    those which automatically trigger whenever partman re-creates swap
    partitions, as this races with partman's own swapon (while it's
    indistinguishable from a hotplug/early boot event in systemd). By
    disabling the fstab entries we disable that automatic start of swap
    partitions, as a relatively unintrusive bandaid for the final release.
    (LP: #1552539)
  * hooks/casper: Copy swapon into the casper initrd for the above.

 -- Martin Pitt <email address hidden> Wed, 20 Apr 2016 16:26:55 +0200

Changed in casper (Ubuntu):
status: Fix Committed → Fix Released
Mathew Hodson (mhodson)
no longer affects: ubuntu-release-notes
Revision history for this message
Phillip Susi (psusi) wrote :

Martin, we have always mounted the swap partition from the live cd to make sure we have enough memory to operate. Why would we want to disable that now, and *only* for GPT partitioned disks?

Revision history for this message
Martin Pitt (pitti) wrote :

> Martin, we have always mounted the swap partition from the live cd to make sure we have enough memory to operate.

That's not a good justification, as installation also has to work if there are no swap partitions on the hard disk.

> Why would we want to disable that now

The change above didn't disable that -- we just enable swap in the initrd now, via a direct "swapon" instead of adding them to fstab. (See the above notes for the reason).

> and *only* for GPT partitioned disks?

This isn't related to GPT vs. MBR at all, both the bug and the fix apply to both.

Revision history for this message
Phillip Susi (psusi) wrote :

Ahh, I see... the mention of systemd's "GPT auto generator" made it sound like it was specific to GPT.
