[Focal] zsys still offers auto snapshotting when reach near full disk space

Bug #1876334 reported by Chris Newcomer on 2020-05-01
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
zsys (Ubuntu) | Fix Released | High | Jean-Baptiste Lallement |
Focal | Fix Released | High | Jean-Baptiste Lallement | ubuntu-20.04.1

Bug Description

[Impact]
 * ZSys still takes automatic snapshots and allows state saves when the disk is nearly full, and these then error out.
 * We now limit saves to 80% of disk capacity (on either bpool or rpool; the limit can be tweaked in the ZSys configuration file), so that ZSys itself is not responsible for the ZFS performance degradation that is expected once a disk reaches 85% capacity.
 * This is covered by an extensive test suite, for bpool, rpool, or both, and for both automated and manual tests.

[Test Case]
 1. Ensure the disk has reached more than 80% capacity
 2. Run zsysctl save -> it should fail due to max capacity
 3. Run sudo zsysctl save --system -> it should fail due to max capacity
 4. Run sudo zsysctl save <manual_name> -> it should fail due to max capacity
extra:
 1. Try the above on a disk which hasn’t reached this capacity -> the save should work.
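
Step 1 of the test case can be checked from a shell before running zsysctl. The following is only a minimal sketch: the function names are hypothetical, the 80% default is taken from the description above, and whether ZSys refuses a save at exactly 80% is not stated here, so the comparison below assumes "strictly more than".

```shell
#!/bin/sh
# Default limit from the bug description; ZSys lets you change it in its config.
MAX_CAPACITY=80

# Hypothetical helper: succeeds when a capacity such as "85%" or "85"
# is strictly above the limit (second argument, default MAX_CAPACITY).
over_capacity() {
    pct=${1%\%}                      # strip a trailing "%" if present
    [ "$pct" -gt "${2:-$MAX_CAPACITY}" ]
}

# Hypothetical helper: report what the test case expects for each pool.
# Requires the zpool command; call it only on a ZFS system.
check_pools() {
    for pool in bpool rpool; do
        cap=$(zpool list -H -o capacity "$pool") || return 1
        if over_capacity "$cap"; then
            echo "$pool: $cap used, zsysctl save is expected to fail"
        else
            echo "$pool: $cap used, zsysctl save is expected to work"
        fi
    done
}
```

Running check_pools before and after filling a pool gives a quick indication of which branch of the test case applies.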

[Regression Potential]
 * This code impacts automated and manual saves, and existing tests haven’t been impacted.
 * New tests with mock functionality for simulating disk capacity have been added.

------

I have been running Focal since March 4th, installed from the daily ISO of that time. I opted for ZFS root during the installation process; it was the main reason I went with Focal in the first place.

I ran my frequent, almost daily, apt upgrades to get the latest packages and was happy to see a lot of fixes going in during that period until the release date.

This frequent apt upgrade cadence created a snapshot in rpool/bpool every time. This feature proved useful for me since I was able to restore to a usable state using this once in the past. I was aware of the rpool snapshots, so I was cleaning them out as they built up, but I did not consider the bpool snapshots.

Eventually the small 2GB bpool fills up with /boot changes, and this happened to me on 30-APR-2020 when I was attempting to upgrade to the latest kernel release. It gave me a cryptic error when trying to run update-initramfs, so I did a Google search on the error and it pointed to a disk full situation. When I checked, I saw that only 14MB was free on bpool. I then removed all snapshots that didn't have a corresponding snapshot in rpool and that freed up about 1.2GB in bpool.
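
The manual cleanup described above (finding bpool snapshots with no counterpart in rpool) could be approximated as follows. This is a sketch under an assumption: that snapshots belonging to the same saved state share the same name after the "@" in both pools. The function name and the temporary file paths are hypothetical, and the function only prints candidates; it deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print bpool snapshots whose "@suffix" does not
# appear among the rpool snapshots.
# $1: file listing bpool snapshot names, $2: file listing rpool snapshot names.
orphan_bpool_snapshots() {
    awk -F@ '
        FILENAME == ARGV[1] { seen[$2] = 1; next }   # first file: rpool suffixes
        !($2 in seen)                                # second file: unmatched bpool
    ' "$2" "$1"
}

# Usage on a ZFS system (paths are placeholders):
#   zfs list -H -t snapshot -o name -r bpool > /tmp/bpool-snaps.txt
#   zfs list -H -t snapshot -o name -r rpool > /tmp/rpool-snaps.txt
#   orphan_bpool_snapshots /tmp/bpool-snaps.txt /tmp/rpool-snaps.txt
```

Review the printed list by hand before destroying anything, since the naming assumption may not hold for manually created snapshots.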

I will upload the /var/log/apt/term.log file for the time that I attempted the kernel upgrade and it failed. I will also upload a sosreport from my laptop (which has the same issue at this time without me doing any intervention). It didn't get as many apt updates, so it has free space in bpool still.

I will leave my laptop in the current state in case you need more information. I will probably send snapshot lists from both bpool and rpool on that laptop. (Once again, I did clean up rpool snapshots manually on the laptop).

Let me know if you need more information, log files, command outputs.

Great work on this package so far!
Chris

Chris Newcomer (cnewcomer) wrote :
Jean-Baptiste Lallement (jibel) wrote :

Thanks for your report. It's something we discussed yesterday with Didier, and we have to revisit the way we handle bpool (for example, would a bigger pool be enough, or would it just push the wall further?).

Could you please run

apport-collect 1876334

It'll collect useful information to help us sort out this issue.

Thanks.

Changed in zsys (Ubuntu):
importance: Undecided → High
status: New → Triaged
Changed in zsys (Ubuntu Focal):
status: New → Triaged
importance: Undecided → High
milestone: none → ubuntu-20.04.1
Changed in zsys (Ubuntu):
assignee: nobody → Jean-Baptiste Lallement (jibel)
Changed in zsys (Ubuntu Focal):
assignee: nobody → Jean-Baptiste Lallement (jibel)
satmandu (satadru-umich) wrote :

I'm seeing the same issue and can't update my kernel either. Does anyone have a good set of commands to run to clear out the oldest snapshots from bpool (and rpool as well, ideally)?

update-initramfs: Generating /boot/initrd.img-5.4.0-31-lowlatency
Error 24 : Write error : cannot write compressed block
E: mkinitramfs failure cpio 141 lz4 -9 -l 24
update-initramfs: failed for /boot/initrd.img-5.4.0-31-lowlatency with 1.
run-parts: /etc/kernel/postinst.d/initramfs-tools exited with return code 1
dpkg: error processing package linux-image-5.4.0-31-lowlatency (--configure):
 installed linux-image-5.4.0-31-lowlatency package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 linux-image-5.4.0-31-lowlatency
ERROR rpc error: code = DeadlineExceeded desc = context deadline exceeded
E: Sub-process /usr/bin/dpkg returned an error code (1)

bpool/BOOT/ubuntu_pzg8hb 186M 113M 74M 61% /boot
/dev/nvme0n1p1 511M 7.8M 504M 2% /boot/efi
/dev/nvme0n1p2 45M 8.1M 33M 20% /boot/grub

satmandu (satadru-umich) wrote :

Here's what I did:

sudo zpool set listsnapshots=on bpool
zfs list -o space -r bpool

That gave me a list of snapshots on bpool and told me how big they were, and then I was able to go in and delete the right snapshot to get space back.
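
A variation on the commands above that avoids changing the listsnapshots property is to ask zfs list for snapshots directly and rank them by size. This is only a sketch; the helper name is hypothetical. Note that the "used" value of a snapshot counts only space unique to that snapshot, so the space actually freed by a deletion can differ.

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>used_bytes" lines on stdin and print
# the N largest (default 5) as "MiB<TAB>name", biggest first.
largest_snapshots() {
    sort -t "$(printf '\t')" -k2,2nr |
        head -n "${1:-5}" |
        awk -F '\t' '{ printf "%d\t%s\n", $2 / 1048576, $1 }'
}

# Usage on a ZFS system:
#   zfs list -H -p -t snapshot -o name,used -r bpool | largest_snapshots 10
```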

Hopefully garbage collection will realize this happened and fix everything for me? :)

satmandu (satadru-umich) wrote :

I ask because, after deleting some bpool snapshots, I'm definitely seeing errors when I run update-grub complaining about missing snapshots, e.g.
cannot open 'bpool/BOOT/ubuntu_pzg8hb@autozsys_r0tzci': dataset does not exist
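
To see which snapshots update-grub still expects, the error lines can be filtered for dataset names. A small sketch, assuming the message format shown above; the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the dataset names from
# "cannot open '<name>': dataset does not exist" lines on stdin.
missing_datasets() {
    sed -n "s/^cannot open '\(.*\)': dataset does not exist\$/\1/p" | sort -u
}

# Usage:
#   sudo update-grub 2>&1 | missing_datasets
```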

Didier Roche (didrocks) wrote :

An instance of the issue: https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1867542 (not marking it as a duplicate because GRUB needs to be fixed there)

Didier Roche (didrocks) on 2020-05-21
summary: - [Focal] zsys does not clear out old snapshots, potentially filling up
- bpool
+ [Focal] zsys still offer auto snapshotting when reach near full disk
+ space

This bug was fixed in the package zsys - 0.5.0

---------------
zsys (0.5.0) groovy; urgency=medium

  [ Jean-Baptiste Lallement ]
  [ Didier Roche ]
  * Fix infinite GC loop (LP: #1870461)
  * Enhance timeout handling to avoid error rpc error: code = DeadlineExceeded
    desc = context deadline exceeded while the daemon is doing work
    (LP: #1875564)
  * Stop taking automated or manual snapshot when there is less than 20% of
    free disk space (LP: #1876334)
  * Enable trim support for upgrading users (LP: #1881540)
  * Only clean up previously linked user datasets when unlinked under USERDATA
    (LP: #1881538)
  * Strategy for deleted user datasets via a new hidden command called by
    userdel (LP: #1870058)
  * Get better auto snapshots message when integrated to apt (LP: #1875420)
  * Update LastUsed on shutdown via a new hidden command service call
    (LP: #1881536)
  * Prevent segfault immediately after install when zfs kernel module isn't
    loaded (LP: #1881541)
  * Don’t try to autosave gdm user (and in general non system user), even if
    systemd --user is started for them. (LP: #1881539)
  * Prevent apt printing errors when zsys is removed without purge
    (LP: #1881535)
  * Some tests enhancements:
    - new tests for all the above
    - allow setting a different local socket for debugging/tests purposes only
    - ascii order datasets in golden files
  * Typos and messages fixes. Direct prints are not prefixed with INFO
    anymore.
  * Refreshed po and readme with the above.

 -- Didier Roche <email address hidden> Mon, 01 Jun 2020 09:26:52 +0200

Changed in zsys (Ubuntu):
status: Triaged → Fix Released
Didier Roche (didrocks) on 2020-06-09
description: updated

Hello Chris, or anyone else affected,

Accepted zsys into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zsys/0.4.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in zsys (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-focal
summary: - [Focal] zsys still offer auto snapshotting when reach near full disk
+ [Focal] zsys still offers auto snapshotting when reach near full disk
space

SRU verification for Focal:
I have reproduced the problem with zsys 0.4.5 in focal and have verified that the version of zsys 0.4.6 in -proposed fixes the issue.

Marking as verification-done
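
When repeating this verification, it helps to confirm that the installed zsys is at least the version from -proposed before running the test case. The sketch below uses a hypothetical helper built on GNU sort -V; that is an approximation of Debian version ordering (it ignores epochs and "~" suffixes), and dpkg --compare-versions remains the authoritative comparison on an Ubuntu system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Approximation only: relies on GNU "sort -V", not on dpkg's own rules.
version_at_least() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on an Ubuntu system:
#   installed=$(dpkg-query -W -f='${Version}' zsys)
#   version_at_least "$installed" 0.4.6 && echo "fixed version installed"
```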

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zsys - 0.4.6

---------------
zsys (0.4.6) focal; urgency=medium

  [ Jean-Baptiste Lallement ]
  [ Didier Roche ]
  * Fix infinite GC loop (LP: #1870461)
  * Enhance timeout handling to avoid error rpc error: code = DeadlineExceeded
    desc = context deadline exceeded while the daemon is doing work
    (LP: #1875564)
  * Stop taking automated or manual snapshot when there is less than 20% of
    free disk space (LP: #1876334)
  * Enable trim support for upgrading users (LP: #1881540)
  * Only clean up previously linked user datasets when unlinked under USERDATA
    (LP: #1881538)
  * Strategy for deleted user datasets via a new hidden command called by
    userdel (LP: #1870058)
  * Get better auto snapshots message when integrated to apt (LP: #1875420)
  * Update LastUsed on shutdown via a new hidden command service call
    (LP: #1881536)
  * Prevent segfault immediately after install when zfs kernel module isn't
    loaded (LP: #1881541)
  * Don’t try to autosave gdm user (and in general non system user), even if
    systemd --user is started for them. (LP: #1881539)
  * Prevent apt printing errors when zsys is removed without purge
    (LP: #1881535)
  * Some tests enhancements:
    - new tests for all the above
    - allow setting a different local socket for debugging/tests purposes only
    - ascii order datasets in golden files
  * Typos and messages fixes. Direct prints are not prefixed with INFO
    anymore.
  * Refreshed po and readme with the above.

 -- Didier Roche <email address hidden> Mon, 01 Jun 2020 09:26:52 +0200

Changed in zsys (Ubuntu Focal):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for zsys has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
