lxc 'delete' fails to destroy ZFS filesystem 'dataset is busy'

Bug #1779156 reported by Scott Moser
This bug affects 3 people
Affects              Status         Importance   Assigned to      Milestone
linux (Ubuntu)       Invalid        Medium       Colin Ian King
lxc (Ubuntu)         Fix Released   Undecided    Unassigned
lxc (Ubuntu Cosmic)  Fix Released   Undecided    Unassigned
lxc (Ubuntu Disco)   Fix Released   Undecided    Unassigned
lxc (Ubuntu Eoan)    Fix Released   Undecided    Unassigned

Bug Description

I'm not sure exactly what got me into this state, but I have several lxc containers that cannot be deleted.

$ lxc info
<snip>
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    <snip>
    -----END CERTIFICATE-----
  certificate_fingerprint: 3af6f8b8233c5d9e898590a9486ded5c0bec045488384f30ea921afce51f75cb
  driver: lxc
  driver_version: 3.0.1
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.15.0-23-generic
  server: lxd
  server_pid: 15123
  server_version: "3.2"
  storage: zfs
  storage_version: 0.7.5-1ubuntu15
  server_clustered: false
  server_name: milhouse

$ lxc delete --force b1
Error: Failed to destroy ZFS filesystem: cannot destroy 'default/containers/b1': dataset is busy

Talking in #lxc-dev, stgraber and sforeshee provided a diagnosis:

 | short version is that something unshared a mount namespace causing
 | them to get a copy of the mount table at the time that dataset was
 | mounted, which then prevents zfs from being able to destroy it)

The workaround provided was:

 | you can unstick this particular issue by doing:
 | grep default/containers/b1 /proc/*/mountinfo
 | then for any of the hits, do:
 | nsenter -t PID -m -- umount /var/snap/lxd/common/lxd/storage-pools/default/containers/b1
 | then try the delete again
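
Those steps lend themselves to a small script. Below is a rough Go sketch of the same procedure (it is not the script referenced later in the comments); the dataset and mount point paths are the ones from this report's b1 example, so adjust them for your container:

package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
    "strings"
)

func main() {
    // Dataset and mount point from the b1 example above (snap-based LXD paths).
    dataset := "default/containers/b1"
    mountpoint := "/var/snap/lxd/common/lxd/storage-pools/default/containers/b1"

    // Equivalent of: grep default/containers/b1 /proc/*/mountinfo
    paths, _ := filepath.Glob("/proc/[0-9]*/mountinfo")
    for _, p := range paths {
        data, err := os.ReadFile(p)
        if err != nil {
            continue // the process may have exited in the meantime
        }
        if !strings.Contains(string(data), dataset) {
            continue
        }
        pid := filepath.Base(filepath.Dir(p)) // /proc/<pid>/mountinfo

        // Equivalent of: nsenter -t PID -m -- umount <mountpoint>
        out, err := exec.Command("nsenter", "-t", pid, "-m", "--", "umount", mountpoint).CombinedOutput()
        if err != nil {
            fmt.Fprintf(os.Stderr, "pid %s: %v: %s\n", pid, err, out)
        }
    }
    // After this, retry: lxc delete --force b1
}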

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: linux-image-4.15.0-23-generic 4.15.0-23.25
ProcVersionSignature: Ubuntu 4.15.0-23.25-generic 4.15.18
Uname: Linux 4.15.0-23-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.10-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: smoser 31412 F.... pulseaudio
 /dev/snd/controlC2: smoser 31412 F.... pulseaudio
 /dev/snd/controlC0: smoser 31412 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Thu Jun 28 10:42:45 2018
EcryptfsInUse: Yes
InstallationDate: Installed on 2015-07-23 (1071 days ago)
InstallationMedia: Ubuntu 15.10 "Wily Werewolf" - Alpha amd64 (20150722.1)
MachineType: (unreadable, non-printable DMI strings)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-23-generic root=UUID=f897b32a-eacf-4191-9717-844918947069 ro quiet splash vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-23-generic N/A
 linux-backports-modules-4.15.0-23-generic N/A
 linux-firmware 1.174
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/09/2015
dmi.bios.vendor: Intel Corporation
dmi.bios.version: RYBDWi35.86A.0246.2015.0309.1355
dmi.board.asset.tag: (not set)
dmi.board.name: NUC5i5RYB
dmi.board.vendor: Intel Corporation
dmi.board.version: H40999-503
dmi.chassis.asset.tag: (not set)
dmi.chassis.type: 3
dmi.chassis.vendor: (not set)
dmi.chassis.version: (not set)
dmi.modalias: dmi:bvnIntelCorporation:bvrRYBDWi35.86A.0246.2015.0309.1355:bd03/09/2015:svn:pn:pvr:rvnIntelCorporation:rnNUC5i5RYB:rvrH40999-503:cvn:ct3:cvr:
dmi.product.family: (not set)
dmi.product.name: (not set)
dmi.product.version: (not set)
dmi.sys.vendor: (not set)

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
tags: added: kernel-da-key
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lxc (Ubuntu Cosmic):
status: New → Confirmed
Changed in lxc (Ubuntu):
status: New → Confirmed
Ryan Harper (raharper) wrote :

This is still around. Scott wrote up a script to handle cleaning this up.

https://gist.github.com/smoser/2c78cf54a1e22b6f05270bd3fead8a5c

Brad Figg (brad-figg)
tags: added: cscc
Colin Ian King (colin-king) wrote :

Cosmic is now end-of-life. Does this still occur on Disco?

Changed in lxc (Ubuntu Cosmic):
status: Confirmed → Won't Fix
status: Won't Fix → Confirmed
Changed in linux (Ubuntu):
importance: High → Medium
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

Do we have any hunches on how to reproduce this issue?

Paride Legovini (paride) wrote :

Hi,

I hit this issue on Bionic, Disco and Eoan. Our (server-team) Jenkins nodes are often filled with stale LXD containers, left behind because of "fails to destroy ZFS filesystem" errors.

Some thoughts and qualitative observations:

0. This is not a corner case; I see the problem all the time.

1. There is probably more than one issue involved here, even though we get similar error messages when trying to delete a container.

2. One issue is about mount namespaces: stray mounts that prevent the container from being deleted. This can be worked around by entering the namespace and unmounting; the container can then be deleted. When this happens, retrying `lxc delete` doesn't help. This is described in [0]. I think newer versions of LXD are much less prone to ending up in this state.

3. In other cases `lxc delete --force` fails with the "ZFS dataset is busy" error, but the deletion succeeds if retried immediately afterwards. In my case I don't even need to wait a single second: the second delete in `lxc delete --force <x> ; lxc delete <x>` already works. Stopping and deleting the container as separate operations also works.

4. It has been suggested in [0] that LXD could retry the "delete" operation if it fails. stgraber wrote that LXD *already* retries the operation 20 times over 10 seconds, but the outcome is still a failure. It is not clear to me why retrying manually works while LXD's automatic retries do not.

5. Some time ago (weeks) the error message changed from "Failed to destroy ZFS filesystem: dataset is busy" to "Failed to destroy ZFS filesystem:" with no other detail. I can't tell which specific upgrade triggered this change.

6. I see this problem in both file-backed and device-backed zpools.

7. I'm not sure system load plays a role: I often hit the problem on my lightly loaded laptop.

8. I don't have steps that reproduce the problem with 100% probability, but I personally see it happening more often than not. See the next point.

9. In my experience a system can be in a "bad state" (the problem always happens) or in a "good state" (the problem never happens). When a system is in a "good state" we can `lxc delete` hundreds of containers with no errors. I can't tell what makes a system switch from a good to a bad state, and I'm almost certain I have also seen systems switch from a bad to a good state.

10. The lxcfs package is not installed on the systems where I hit this issue.

That's it for the moment. Thanks for looking into this!

Paride

[0] https://github.com/lxc/lxd/issues/4656

Paride Legovini (paride) wrote :

To be clear: point (3) above is what I see happening most of the time at the moment.

Colin Ian King (colin-king) wrote :

Reproducer is as follows:

lxc launch ubuntu:bionic zfs-bug-test
Creating zfs-bug-test
Starting zfs-bug-test
lxc delete zfs-bug-test --force
Error: Failed to destroy ZFS filesystem:

I can reproduce this on Eoan with the latest 5.2 and 5.3 kernels.

Colin Ian King (colin-king) wrote :

The ZFS destroy path checks the reference count on the dataset (zfs_refcount_count(&ds->ds_longholds) != expected_holds) and returns EBUSY from dsl_destroy_head_check_impl.

Colin Ian King (colin-king) wrote :

I've been digging into this a bit further with lxc 3.17 on Eoan.

lxc launch ubuntu:bionic zfs-bug-test
Creating zfs-bug-test
Starting zfs-bug-test
lxc delete zfs-bug-test --force
Error: Failed to destroy ZFS filesystem: Failed to run: zfs destroy -r default/containers/z1: cannot destroy 'default/containers/z1': dataset is busy

However, re-running the delete works fine:
lxd.lxc delete z1 --force

Looking at the system calls, it appears that the first failing delete --force attempts to destroy the ZFS filesystem multiple times and then gives up; in the process it unmounts the filesystem. Hence the second delete works, because the dataset is now unmounted. So it appears that the ordering of operations in the delete is not what it should be.

It seems to do:
zfs destroy (x 10 or so, then gives up with errno 16, -EBUSY)
zfs umount

It should be doing:
zfs umount
zfs destroy

This matches the observed reference counting: the ref count is only dropped once the umount is complete, and attempts to destroy the dataset before that fail with -EBUSY.
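
As a concrete illustration of that ordering, here is a minimal Go sketch that shells out to the zfs tools the same way the error message above shows LXD doing ("Failed to run: zfs destroy -r ..."). The helper name is hypothetical and this is not LXD's actual code, just the umount-before-destroy order described above:

package main

import (
    "fmt"
    "os/exec"
)

// destroyDataset is a hypothetical helper showing the order that avoids EBUSY:
// unmount first so the dataset's hold is released, then destroy.
func destroyDataset(dataset string) error {
    // zfs umount <dataset>: drops the mount, and with it the reference that
    // makes dsl_destroy_head_check_impl return EBUSY.
    if out, err := exec.Command("zfs", "umount", dataset).CombinedOutput(); err != nil {
        return fmt.Errorf("zfs umount %s: %v: %s", dataset, err, out)
    }
    // zfs destroy -r <dataset>: should now succeed on the first attempt.
    if out, err := exec.Command("zfs", "destroy", "-r", dataset).CombinedOutput(); err != nil {
        return fmt.Errorf("zfs destroy -r %s: %v: %s", dataset, err, out)
    }
    return nil
}

func main() {
    if err := destroyDataset("default/containers/z1"); err != nil {
        fmt.Println(err)
    }
}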

Colin Ian King (colin-king) wrote :

See: https://github.com/lxc/lxd/issues/4656#issuecomment-535531229

In https://github.com/lxc/lxd/blob/master/lxd/storage_zfs_utils.go#L255 the umount is done by

    err := unix.Unmount(mountpoint, unix.MNT_DETACH)

The umount2(2) manpage says this about MNT_DETACH:

    Perform a lazy unmount: make the mount point unavailable for new accesses, immediately disconnect the filesystem and all filesystems mounted below it from each other and from the mount table, and actually perform the unmount when the mount point ceases to be busy.

Could this be it? The MNT_DETACH umount looks partially asynchronous. All the subsequent destroy commands may fail because the mount point is still busy. Finally the retry loop ends, the umount happens for real, and the following destroy succeeds.
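
For illustration only (this is not the fix that eventually landed in LXD), the difference between the lazy unmount quoted above and a plain, synchronous unmount can be seen with golang.org/x/sys/unix; a synchronous unmount either completes before returning or fails with EBUSY, so a following zfs destroy would not race against a still-pending lazy unmount:

package main

import "golang.org/x/sys/unix"

const mountpoint = "/var/snap/lxd/common/lxd/storage-pools/default/containers/z1"

func main() {
    // Lazy unmount, as in storage_zfs_utils.go: the mount point is detached
    // from the mount table immediately, but the real unmount (and the drop of
    // the dataset's hold) only happens once the mount stops being busy.
    _ = unix.Unmount(mountpoint, unix.MNT_DETACH)

    // Synchronous unmount: returns only once the filesystem is actually
    // unmounted, or fails with EBUSY if it cannot be unmounted yet.
    _ = unix.Unmount(mountpoint, 0)
}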

Colin Ian King (colin-king) wrote :

A fix has landed in lxd; I refer you to the following comment:

https://github.com/lxc/lxd/issues/4656#issuecomment-541266681

Please check if this addresses the issues.

Paride Legovini (paride) wrote :

The released fix does not appear to fully address the problem:

https://github.com/lxc/lxd/issues/4656#issuecomment-542630903

Paride Legovini (paride) wrote :

The current stable snap seems to fully address the issue. See:

https://github.com/lxc/lxd/issues/4656#issuecomment-542886330

and following.

Scott Moser (smoser) wrote :

best. fix. ever.

working after a reboot.

$ snap list lxd
Name Version Rev Tracking Publisher Notes
lxd 3.18 12211 stable canonical✓ -

Paride Legovini (paride)
Changed in lxc (Ubuntu):
status: Confirmed → Fix Released
Changed in lxc (Ubuntu Cosmic):
status: Confirmed → Fix Released
Changed in lxc (Ubuntu Disco):
status: New → Fix Released
Changed in lxc (Ubuntu Eoan):
status: Confirmed → Fix Released
no longer affects: linux (Ubuntu Eoan)
no longer affects: linux (Ubuntu Disco)
no longer affects: linux (Ubuntu Cosmic)
Changed in linux (Ubuntu):
status: Triaged → Invalid
Paride Legovini (paride) wrote :

Bug status updated accordingly. I set the linux task to Invalid as the kernel was not involved after all. Thanks again Stéphane and Colin!
