zpool scrub malfunction after kernel upgrade
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
zfs-linux (Ubuntu) | Fix Released | Critical | Colin Ian King | | |
Bionic | Fix Released | Undecided | Unassigned | | |
Bug Description
== SRU Request [BIONIC] ==
The HWE kernel on Bionic provides the zfs 0.8.1 driver, which includes an improved scrub. However, the progress stats reported by this kernel driver are incompatible with what the 0.7.x zfs userspace tools expect.
== Fix ==
Use the new zfs 0.8.x pool_scan_stat_t extra fields to calculate
the scan progress when using zfs 0.8.x kernel drivers. Add detection of the kernel module version and use an approximation of the zfs 0.8.0 progress and rate reporting for newer kernels.
For 0.7.5 we can pass the larger 0.8.x pool_scan_stat_t to 0.7.5
zfs without problems, ignore these new fields, and continue
to use the 0.7.5 rate calculations.
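The version check described above could look roughly like the following sketch. It is not the code from the actual patch (which lives in the C zpool utility); it only illustrates the decision. The sysfs path, the function names, and the idea that the extra 0.8.x field is an "issued bytes" counter are assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: the loaded zfs module exposes its version string here on Linux.
ZFS_VERSION_PATH = Path("/sys/module/zfs/version")


def parse_zfs_version(text):
    """Return (major, minor, patch) from a string such as '0.8.1-1ubuntu14'."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised zfs version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def loaded_zfs_version(path=ZFS_VERSION_PATH):
    """Read and parse the version of the currently loaded kernel module."""
    return parse_zfs_version(path.read_text())


def uses_two_phase_scrub(version):
    """True for 0.8.0 and newer, where the scrub scans first and issues I/O later."""
    return version >= (0, 8, 0)


def progress_bytes(version, examined, issued):
    """Pick the byte counter used for progress reporting.

    Hypothetical parameters: 'examined' stands for the counter the 0.7.x tools
    already use, 'issued' for an extra counter only filled in by 0.8.x drivers.
    """
    return issued if uses_two_phase_scrub(version) else examined
```

On a machine with the module loaded, `loaded_zfs_version()` would return a tuple such as `(0, 8, 1)`, which can then be passed to `progress_bytes` together with the two counters.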
== Test ==
Install the HWE kernel on Bionic, create some large ZFS pools and populate them with a lot of data. Issue:
sudo zpool scrub poolname
and then look at the progress using
sudo zpool status
Without the fix, the progress stats are incorrect. With the fix, the duration and rate stats are a fairly good approximation of the progress. Because the newer 0.8.x zfs now scans in two phases, the older zfs tools only report accurate stats for phase #2 of the scan, which keeps the output roughly compatible with that of the 0.7.x zfs utils.
== Regression Potential ==
This is a userspace reporting fix that only affects the zpool status output while a scrub is running, so its impact is very small and limited.
-------
I ran a zpool scrub prior to upgrading my 18.04 to the latest HWE kernel (5.3.0-26-generic #28~18.04.1-Ubuntu) and it ran properly:
eric@eric-8700K:~$ zpool status
pool: storagepool1
state: ONLINE
scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020
config:
NAME STATE READ WRITE CKSUM
storagepool1 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-
ata-
mirror-1 ONLINE 0 0 0
ata-
ata-
I ran zpool scrub after upgrading the kernel and rebooting, and now it fails to work properly. It appeared to finish in about 5 minutes but did not, and zpool status says the scan is slow:
eric@eric-8700K:~$ sudo zpool status
pool: storagepool1
state: ONLINE
scan: scrub in progress since Fri Jan 17 15:32:07 2020
1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time)
0B repaired, 100.00% done
config:
NAME STATE READ WRITE CKSUM
storagepool1 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-
ata-
mirror-1 ONLINE 0 0 0
ata-
ata-
errors: No known data errors
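The symptom in the output above (a scrub that is still "in progress" yet shows 100% done and no time estimate) can be spotted mechanically. The sketch below parses the scan lines in the format printed by the 0.7.5 tools as quoted in this report; other zfs versions format these lines differently, and the function names are hypothetical.

```python
import re

# Binary unit suffixes as they appear in the quoted zpool status output.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

SCAN_RE = re.compile(
    r"(?P<done>[\d.]+)(?P<done_unit>[BKMGTP]) scanned out of "
    r"(?P<total>[\d.]+)(?P<total_unit>[BKMGTP]) at "
    r"(?P<rate>[\d.]+)(?P<rate_unit>[BKMGTP])/s"
)
PERCENT_RE = re.compile(r"(?P<pct>[\d.]+)% done")


def to_bytes(number, unit):
    """Convert a value such as ('1.89', 'T') to a byte count (float)."""
    return float(number) * UNITS[unit]


def parse_scrub_status(text):
    """Return progress figures as a dict, or None if no scrub is in progress."""
    if "scrub in progress" not in text:
        return None
    scan = SCAN_RE.search(text)
    pct = PERCENT_RE.search(text)
    if scan is None or pct is None:
        return None
    return {
        "scanned": to_bytes(scan["done"], scan["done_unit"]),
        "total": to_bytes(scan["total"], scan["total_unit"]),
        "rate": to_bytes(scan["rate"], scan["rate_unit"]),
        "percent": float(pct["pct"]),
        "has_eta": "no estimated time" not in text,
    }


def looks_inconsistent(status):
    """Flag an in-progress scrub that claims 100% done but has no estimate."""
    return status is not None and status["percent"] >= 100.0 and not status["has_eta"]


# Scan lines copied from the report above.
SAMPLE = """\
scan: scrub in progress since Fri Jan 17 15:32:07 2020
1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time)
0B repaired, 100.00% done
"""

if __name__ == "__main__":
    print(looks_inconsistent(parse_scrub_status(SAMPLE)))
```

In practice the text would come from running `zpool status poolname` and passing its output to `parse_scrub_status`; a completed scrub (such as the earlier 4h21m run) yields `None` and is not flagged.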
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: zfsutils-linux 0.7.5-1ubuntu16.7
ProcVersionSign
Uname: Linux 5.3.0-26-generic x86_64
NonfreeKernelMo
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Fri Jan 17 16:22:01 2020
InstallationDate: Installed on 2018-03-07 (681 days ago)
InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
SourcePackage: zfs-linux
UpgradeStatus: Upgraded to bionic on 2018-08-02 (533 days ago)
modified.
Changed in zfs-linux (Ubuntu): | |
assignee: | nobody → Colin Ian King (colin-king) |
importance: | Undecided → High |
status: | New → Triaged |
Changed in zfs-linux (Ubuntu): | |
importance: | High → Medium |
description: | updated |
Changed in zfs-linux (Ubuntu Bionic): | |
status: | New → Fix Committed |
tags: | added: verification-done-bionic |
Changed in zfs-linux (Ubuntu): | |
status: | In Progress → Fix Released |
Note: rebooting and running the older kernel (4.15.0-74-generic #84-Ubuntu) results in the scrub proceeding at the expected rate.