ZFS pools should be automatically scrubbed

Bug #1548009 reported by Richard Laager on 2016-02-21
44
This bug affects 6 people
Affects Status Importance Assigned to Milestone
zfs-linux (Ubuntu)
Medium
Colin Ian King
Xenial
Medium
Unassigned

Bug Description

[Impact]

Xenial shipped with a cron job to automatically scrub ZFS pools, as
desired by many users and as implemented by mdadm for traditional Linux
software RAID. Unfortunately, this cron job does not work, because it needs a PATH line for /sbin, where the zpool utility lives.

Given the existence of the cron job and various discussions on IRC, etc.,
users expect that scrubs are happening, when they are not. This means ZFS
is not pre-emptively checking for (and correcting) corruption. The odds of
disk corruption are admittedly very low, but violating users' expectations
of data safety, especially when they've gone out of their way to use a
filesystem which touts data safety, is bad.

[Test Case]

$ truncate -s 1G test.img
$ sudo zpool create test `pwd`/test.img
$ sudo zpool status test

$ sudo vi /etc/cron.d/zfsutils-linux
Modify /etc/cron.d/zfsutils-linux to run the cron job in a few minutes
(modifying the date range if it's not currently the 8th through the 14th
and the "-eq 0" check if it's not currently a Sunday).

$ grep zfs /var/log/cron.log
Verify in /var/log/cron.log that the job ran.

$ sudo zpool status test

Expected results:
  scan: scrub repaired 0 in ... on <shortly after the cron job ran>

Actual results:
  scan: none requested

Then, add the PATH line, update the time rules in the cron job, and repeat
the test. Now it will work.

- OR -

The best test case is to leave the cron job file untouched, install the
patched package, wait for the second Sunday of the month, and verify with
zpool status that a scrub ran. I did this, on Xenial, with the package I
built. The debdiff is in comment #11 and was accepted to Yakkety.

If someone can get this in -proposed before the 14th, I'll gladly install
the actual package from -proposed and make sure it runs correctly on the
14th.

[Regression Potential]

The patch only touches the cron.d file, which has only one cron job in it.
This cron job is completely broken (inoperative) at the moment, so the
regression potential is very low.

ORIGINAL, PRE-SRU, DESCRIPTION:

mdadm automatically checks MD arrays. ZFS should automatically scrub pools too. Scrubbing a pool allows ZFS to detect on-disk corruption and (when the pool has redundancy) correct it. Note that ZFS does not blindly assume the other copy is correct; it will only overwrite bad data with data that is known to be good (i.e. it passes the checksum).

I've attached a debdiff which accomplishes this. It builds and installs cleanly.

The meat of it is the scrub script I've been using on production systems, both servers and laptops, and recommending in my Ubuntu root-on-ZFS HOWTO, for years, which scrubs all *healthy* pools. If a pool is not healthy, scrubbing it is bad for two reasons: 1) It adds a lot of disk load which could theoretically lead to another failure. We should save that disk load for resilvering. 2) Performance is already less on a degraded pool and scrubbing can make that worse, even though scrubs are throttled. Arguably, I might be being too conservative here, but the marginal benefit of scrubbing a *degraded* pool is pretty minimal as pools should not be left degraded for very long.

The cron.d in this patch scrubs on the second Sunday of the month. mdadm scrubs on the first Sunday of the month. This way, if a system has both MD and ZFS pools, the load doesn't all happen at the same time. If the system doesn't have both types, it shouldn't really matter which week. If you'd rather make it the same week as MD, I see no problem with that.

Richard Laager (rlaager) wrote :

The attachment "zfs-linux.scrub.debdiff.2" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Changed in zfs-linux (Ubuntu):
importance: Undecided → Wishlist
Changed in zfs-linux (Ubuntu):
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

Thanks Richard,

Just a few things:

"The meat of it is the scrub script I've been using (and recommending in my HOWTO) for years, which scrubs all *healthy* pools. If a pool is not healthy, scrubbing it is bad for two reasons:

1) It adds a lot of disk load which could lead to another failure. We should save that disk load for resilvering.
2) Performance is already less on a degraded pool and scrubbing will make that worse."

Yep, this sounds perfectly reasonable to me.

1) So, how much extra wear will scrubbing involve? I'm concerned that extra write cycles may add extra wear to users using SSDs.
2) How much extra load and performance loss do we get with scrubbing? It would be good to get some idea so we have an informed idea of the kind of impact this adds.

Secondly, this is actually a feature, so if we go with this I need to fix this issue with a feature freeze exception, which requires some extra work.

Richard Laager (rlaager) wrote :

The disk load added by scrubbing is almost entirely reading. There is a tiny amount of metadata writes to track of the scrub's progress (so it can resume after a reboot). The only time a scrub would write significant data is if it found a bad block (checksum error) and needed to rewrite a good copy of it (from another disk or another part of the disk).

Since it walks the ZFS tree, rather than proceeding in logical block order, scrubbing adds a lot of seeks. That's the load I want to avoid if you're already degraded, out of an abundance of caution, both for performance and safety. (I can't remember actually seeing or hearing of a scrub killing another disk, but the ZFS approach is to be conservative about data safety.) Under normal circumstances, it's definitely not a problem. Scrubbing and resilvering is throttled, and the #1 complaint I've seen is that it is too conservative/slow. (Google "ZFS resilver performance" to see everyone talking about how their resilvers are slow and they want to speed them up.)

At my day job, we have 100% of our storage on ZFS (not on Linux). Scrubs take 1.5 days on one cluster and 4 days on the other, on the exact same hardware, for the exact same data. (The data is the same because they each replicate their data to the other site for disaster recovery purposes.) The difference is that one system has more normal activity than the other, so scrubs have to yield to that. We've never noticed any sort of performance problem while scrubs are running.

Our storage is from Nexenta (whose system is Illumos, not Linux, but still OpenZFS-based). Providing ZFS storage to enterprises is essentially their entire business and they recommend scrubs. Oracle, creator of ZFS, has traditionally (and recently) recommended "at least quarterly" scrubs as a general rule of thumb if you don't know where to start.

In comparing this to the precedent of mdadm... a ZFS scrub would read the same or less data (since MD has to read the entire disk, but ZFS only reads the blocks it has in use), but it would be far more random reads.

I asked in #zfsonlinux and this also seemed worth noting:
(09:17:03) kash: rlaager: i do weekly scrubs everywhere
(09:25:25) kash: rlaager: my scrub zones have included AWS, i don't consider it an expensive operation

Colin Ian King (colin-king) wrote :

What about the issue of scrubbing on systems that don't have memory correction (non-ECC). Is there a possibility that a scrub with silently corrupted memory could actually introduce more harm than good?

Richard Laager (rlaager) wrote :

A false negative, where an objectively corrupt block is treated as valid, is not ideal, but not harmful. The scrub would fail to correct the error, but it wouldn't make it worse. It would be detected as bad on the next read (scrub or otherwise).

There's also a case of a bad block being overwritten and the data gets corrupted on the way to disk. In that case, again, it's less than ideal, but the scrub hasn't made anything worse. It was bad before and is still bad. Since block pointers can't be overwritten in place, the block pointer hasn't changed, and thus the checksum hasn't changed. So the block will be detected as bad on the next read/scrub.

A false positive, where an objectively valid block is treated as corrupt, is where things have the potential to go bad. If the scrub overwrites that block and the good data makes it to disk, it's a no-op, which isn't harmful. If the scrub overwrites that and the good data is corrupted on its way to disk, that's the only case where a scrub makes things worse. As before, the checksum hasn't changed, so we'll know it's bad. And we obviously had a good copy from which to overwrite, so there's still the potential to repair this on the next read (scrub or otherwise).

So it is technically possible that a scrub could make something worse, but it requires the following:
1) Good data fails checksum validation due to a memory fault.
2) Another copy exists. That copy passes checksum validation.
3) The write is corrupted due to a memory fault on its way to disk.

If we're seeing enough memory faults to trigger this, the filesystem is already in serious danger from normal writes.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu):
status: New → Confirmed
Richard Laager (rlaager) wrote :

Is this going to make the Xenial release? What needs to be done still?

Richard Laager (rlaager) on 2016-03-14
description: updated
Seth Arnold (seth-arnold) wrote :

BTW, the parameters that are necessary to tun the relative priority of scrubs and resilvers against normal IO is described in detail in the zfs-module-parameters(5) manpage, especially the "ZFS I/O SCHEDULER" section:

https://github.com/zfsonlinux/zfs/blob/master/man/man5/zfs-module-parameters.5#L1768

The debdiff and cronjob look good to me.

Thanks

Richard Laager (rlaager) on 2016-03-14
description: updated
summary: - ZFS pools should be automatically scrubbed
+ [FFe] ZFS pools should be automatically scrubbed
description: updated

This bug was fixed in the package zfs-linux - 0.6.5.6-0ubuntu5

---------------
zfs-linux (0.6.5.6-0ubuntu5) xenial; urgency=medium

  * Only enable testing on 64 bit arches as these are the ones that ZFS
    officially supports.

 -- Colin Ian King <email address hidden> Tue, 5 Apr 2016 09:37:37 +0100

Changed in zfs-linux (Ubuntu):
status: Confirmed → Fix Released
Richard Laager (rlaager) on 2016-07-07
Changed in zfs-linux (Ubuntu):
status: Fix Released → Confirmed
Richard Laager (rlaager) wrote :

It turns out this isn't actually working. The problem is the PATH in cron, which does not include /sbin, where the zpool command lives. I've attached a debdiff to fix this. I've tested to confirm that the PATH thing fixes it, but on Sunday, I'll know absolutely 100% by verifying that it works "in the wild".

summary: - [FFe] ZFS pools should be automatically scrubbed
+ ZFS pools should be automatically scrubbed
Richard Laager (rlaager) wrote :

I can confirm that the .debdiff I attached in comment #11 fixes the problem. My system ran the scrub today.

Changed in zfs-linux (Ubuntu):
importance: Wishlist → Medium
status: Confirmed → In Progress
Michael Terry (mterry) wrote :

Thanks for the patch! I've uploaded to yakkety, this bug should autoclose.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.6.5.7-0ubuntu4

---------------
zfs-linux (0.6.5.7-0ubuntu4) yakkety; urgency=medium

  * Set PATH in cron.d job to fix monthly scrubs. (LP: #1548009)

 -- Richard Laager <email address hidden> Wed, 06 Jul 2016 22:37:47 -0500

Changed in zfs-linux (Ubuntu):
status: In Progress → Fix Released
Richard Laager (rlaager) on 2016-07-28
description: updated
Michael Terry (mterry) wrote :

I've uploaded the patch to xenial-proposed too. Thanks!

Andy Whitcroft (apw) wrote :

This patch adds /usr/local/bin and /usr/local/sbin to the path. Given the utilities in use are all official binaries this seems unexpected and a potential security risk.

Changed in zfs-linux (Ubuntu Xenial):
status: New → Incomplete
Colin Ian King (colin-king) wrote :

/usr/local/bin and /usr/local/sbin are not required for anything zfs specific, so given that they are redundant they should not be added to the PATH. Plus the security issue as mentioned above.

Colin Ian King (colin-king) wrote :

This should be rejected for Xenial and reverted for Yakkety and a more minimal path used.

Mark Wilkinson (mhw) wrote :

Would it not make more sense to have /usr/lib/zfs-linux/scrub set the PATH that it requires in order to find the zpool command?

Michael Terry (mterry) wrote :

I get not wanting unnecessary PATHs... But is /usr/local really considered a security issue? It's in /usr...

It's part of the standard PATH for the root user. And if an admin has installed tools in /usr/local, wouldn't they expect them to be used?

Andy Whitcroft (apw) wrote :

Ok, comparing this to other users of /etc/cron.d it does seem to be a common idiom used there. To update the path to exactly the above form. So this is not making things any worse. I am inclined to let this go on that basis given it is already in yakkety.

/etc/cron.d/anacron:PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
/etc/cron.d/certbot:PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
/etc/cron.d/popularity-contest:PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

Changed in zfs-linux (Ubuntu Xenial):
status: Incomplete → In Progress

Hello Richard, or anyone else affected,

Accepted zfs-linux into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.6-0ubuntu11 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in zfs-linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Tim Bishop (tdb) wrote :

Could /usr/lib/zfs-linux/scrub not just give the full path to zpool instead? Then PATH wouldn't need modifying at all.

Changed in zfs-linux (Ubuntu Xenial):
importance: Undecided → Medium
Richard Laager (rlaager) wrote :

Also note that this is the exact PATH from /etc/crontab, which is thus also the PATH under which /etc/cron.{hourly,daily,weekly,monthly} run.

Andy Whitcroft (apw) wrote :

Hello Richard, or anyone else affected,

Accepted zfs-linux into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.6-0ubuntu12 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Richard Laager (rlaager) wrote :

This works for me, using the first test case. If this feels good enough, feel free to change the tag yourself. Otherwise, I'll do so after verifying the unmodified cron configuration works on the 14th.

Richard Laager (rlaager) wrote :

This works for me "in the wild" (i.e. waiting until today, which was the second Sunday of the month). I had two servers, one with 0.6.5.6-0ubuntu11 and one with 0.6.5.6-0ubuntu12.

tags: added: verification-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.6.5.6-0ubuntu12

---------------
zfs-linux (0.6.5.6-0ubuntu12) xenial; urgency=medium

  * Change to include some important tools in zfsutils-linux build.
  - Add tools : arcstat.py, arc_summary.py & dbufstat.py in /usr/sbin.
  - Change utilities path (bindir) to /usr/sbin per
    https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1574342/comments/6
  - Add python related dependencies (LP: #1574342)

zfs-linux (0.6.5.6-0ubuntu11) xenial; urgency=medium

  [ Richard Laager ]
  * Set PATH in cron.d job to fix monthly scrubs. (LP: #1548009)

 -- Eric Desrochers <email address hidden> Thu, 28 Jul 2016 16:18:16 -0400

Changed in zfs-linux (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for zfs-linux has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers