ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

Bug #1587686 reported by LeetMiniWheat on 2016-06-01
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Native ZFS for Linux
linux (Ubuntu)
Colin Ian King
zfs-linux (Ubuntu)
Colin Ian King

Bug Description

[SRU Justification][XENIAL]

Problem: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"


Upstream commit https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51

Without the fix, ztest fails after hours of soak testing. With the fix, the issue can't be reproduced.


This fix is an upstream fix and has therefore passed the ZFS integration tests. I have also tested this thoroughly with the kernel team ZFS regression tests and not found any issues, so the regression potential is slim to zero.


Problem: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

This bug affects the xenial kernel built-in ZFS as well as the package zfs-dkms. I don't believe ZFS 0.6.3-stable or 0.6.4-release are affected; 0.6.5-release seems to have included the offending commit. Sorry for excessive "Affects" tagging, I'm still new to this and unsure of the proper packages to report this against and/or how to properly add the upstream issues/commits.

Upstream bug report: https://github.com/zfsonlinux/zfs/issues/4129
"ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused by an empty cache file."

How to reproduce: run ztest repeatedly with a command like this and it will eventually fail:
ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*
(I have /tmp mounted on tmpfs with a 10G limit but I don't believe this is related in any way, and I've confirmed it's not running out of space)
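
The chained command above can also be written as a loop. This is only a sketch of the same idea, not something from the upstream report: soak_loop, SOAK_DIR and SOAK_PAUSE are hypothetical names, and it uses rm -f so that a run which leaves no /tmp/z* files behind does not abort the loop (the plain rm in the original chain would).

```shell
# soak_loop RUNS CMD [ARGS...]
# Runs CMD up to RUNS times and stops at the first failing run,
# like the && chain in the original command.
# SOAK_DIR and SOAK_PAUSE are hypothetical knobs; they default to the
# values used in the original command (/tmp and 3 seconds).
soak_loop() {
    runs=$1
    shift
    dir=${SOAK_DIR:-/tmp}
    pause=${SOAK_PAUSE:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i of $runs"
        if ! "$@"; then
            echo "run $i failed" >&2
            return 1
        fi
        # Clear the files left by the previous run, then pause briefly.
        rm -f "$dir"/z*
        sleep "$pause"
        i=$((i + 1))
    done
}
```

A call such as `soak_loop 12 ztest -T 3600` would then run up to twelve one-hour passes and stop at the first failure.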

Upstream fix: https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51
Description: Fix ztest truncated cache file
"Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
truncate and overwrite rather than rename the cache file. This is
the correct fix but it should have only been applied for the kernel
build. In user space rename(2) is needed because ztest depends on
the cache file."
Associated pull request for above commit: https://github.com/zfsonlinux/zfs/pull/4130
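
To illustrate the difference the commit message describes, here is a rough shell sketch of the two update strategies. It is not the ZFS code (spa_config_write() is C); write_truncate and write_rename are hypothetical helper names, and the only assumption is that the temporary file sits on the same filesystem as the target so that mv ends up as a rename(2).

```shell
# write_truncate FILE DATA: truncate, then write in place.
# Between the truncation and the write the file exists but is empty,
# so a concurrent reader can see an empty cache file.
write_truncate() {
    : > "$1"                      # truncate to zero length
    printf '%s\n' "$2" >> "$1"    # then write the new content
}

# write_rename FILE DATA: write a temporary file next to FILE, then
# move it over FILE. On a single filesystem mv uses rename(2), which
# replaces the target atomically: a reader sees either the old or the
# new content, never an empty file.
write_rename() {
    tmp="$1.tmp.$$"
    printf '%s\n' "$2" > "$tmp" && mv -f "$tmp" "$1"
}
```

With the first strategy, a reader such as zdb that opens the file at the wrong moment finds nothing in it, which matches the "empty cache file" symptom quoted from the upstream report.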

I'm not sure why this wasn't backported to release, but it's in zfs master. I've reproduced this bug on xenial kernels 4.4.0-22-generic, 4.4.0-23-generic, 4.4.0-22-lowlatency, and 4.4.0-23-lowlatency as well as various xenial master-next builds. After applying the above commit patch to the kernel and building/installing the kernel manually, ztest runs fine. I've also separately tested the commit patch on the zfs-dkms package, which also appears to fix the issue. Note however, there may still be some other outstanding ztest related issues upstream - especially when preempt and hires timers are used. I'm currently testing more heavily against lowlatency builds and master-next.

(I'm unsure how to associate this bug with multiple packages but zfs-dkms and linux-image-* packages both are affected).

P.S. Also of note is https://github.com/zfsonlinux/zfs/commit/60a4ea3f948f1596b92b666fc7dd21202544edbb "Fix inverted logic on none elevator comparison" - which interestingly was signed-off-by canonical but curiously not included in the xenial kernel or zfs-dkms packages. It was however, backported to 0.6.5-release upstream.

summary: - Running ztest repeatedly for long periods of time eventually results in
- "zdb: can't open 'ztest': No such file or directory"
+ ZFS: Running ztest repeatedly for long periods of time eventually
+ results in "zdb: can't open 'ztest': No such file or directory"
description: updated

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1587686

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Changed in zfs-linux (Ubuntu):
status: New → In Progress
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → Medium
LeetMiniWheat (white-phoenix) wrote :

Interestingly, this fixes the issue in 4.4.0-23-generic but not 4.4.0-23-lowlatency, and my local builds of master-next on both 4.4.0-24-generic and 4.4.0-24-lowlatency still fail. Perhaps something in the 4.4.10 sublevel patches changed something. I've also merged 4.4.11 and 4.4.12 into master-next for my local build to test against those, and so far my generic build hasn't failed, but on master-next plus 4.4.11 merged it was failing. Something is fishy and I'm not sure where, but there are multiple bug reports upstream regarding ztest and some patches in master. Regardless, this truncated cache file patch should have been in 0.6.5-release anyway.

Also in case this matters, I've been testing on a 2 node NUMA machine (2x xeon) with ECC memory with no reported memory errors.

LeetMiniWheat (white-phoenix) wrote :

Sorry, correction to above: I meant perhaps something changed since 4.4.11 since the 4.4.10 based xenial kernels seem fine (except the curious issue with lowlatency build). I've run into some stack traces regarding __pthread and some occasional failures in the spare tests with "returned 0, expected 75" on newer builds.

There are a few interesting upstream patches that potentially fix these new errors, though this should probably be brought to the attention of upstream if ztest has issues on 0.6.5-release:
"Skip ctldir znode in zfs_rezget to fix snapdir issues"
"OpenZFS 6739 - assumption in cv_timedwait_hires"
"Fix do_div() types in condvar:timeout"

Either way I think xenial should probably sync with upstream 0.6.5.x, though I understand this is a sensitive matter. I've reported the missing "Fix ztest truncated cache file" patch upstream against 0.6.5-release; hopefully it will be queued for the next point release, and maybe these other issues will be discovered and fixed as well.

Colin Ian King (colin-king) wrote :

I've applied upstream fix https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51 and built some test kernels. I'm currently testing these, but it appears the reproducer takes a while to run to completion.


If you can test this, I'll get this fix applied as a Stable Release Update.

LeetMiniWheat (white-phoenix) wrote :

Thanks for looking into this, I'll test that build tonight but I assume I'll see similar results.

In my previous tests with this commit applied I still occasionally ran into the error (albeit far less often, sometimes not at all, or rather quickly) and also some other traces regarding pthreads (current 0.6.5-release sort of incorrectly uses pthreads and ASSERTs and other things at the moment from what I understand, and there's a lot of upstream work being done on it in master). lowlatency kernels seem to fail faster on it too, which is a bit confusing. I still think there are a lot of corner case bugs in ZFS and ztest. Fully fixing ztest/ZFS/SPL for 0.6.5-release would likely be way too invasive to backport, and it looks like bandaids such as this only prolong the inevitable failure.

After much cherry picking, trial & error, and commenting on some upstream commits, I don't believe ztest was intended for end users or as a reliable long-term stress tool - nor does it get as much developer attention for releases since it's not a real-world test. One upstream developer/maintainer even commented that ztest is intended for ZFS developers (implying end users shouldn't be using it?) - which makes me question why it's even included in zfsutils-linux if it's fundamentally broken on release versions. If it's this unreliable then it will create many more false positives for others looking to test the stability of Ubuntu's ZFS, resulting in people thinking ZFS or Ubuntu's ZFS implementation is broken when in fact it may be perfectly fine under real world workloads.

ztest still works as a short term test for ZFS functions though and this commit probably did belong in release (they've marked it for milestone) but as mentioned above there's many other outstanding issues this tool brings to light (whether falsely positive or not).

On a side note, I'd be interested in seeing ZFS run under AFL (AFL filesystem fuzzing, a tool which recently discovered many upstream bugs in existing kernel filesystems). Many corner case bugs were found in current filesystems, with fixes incoming for backport to 4.4.13, 4.5.7, and 4.6.2; however, the Linux Foundation's Oracle AFL event/presentation only included the most commonly used in-kernel filesystems.

Sorry if this is noise, but hopefully this will bring more awareness to this issue, which may not even be an issue. The correct fix may be to move ztest to another (dev or debug?) package.

Colin Ian King (colin-king) wrote :

I've now run these tests for 24 hours on the original kernel and on my 'fixed' kernel with no issues in either, so I'm unable to reproduce this issue. Are there any specific configuration options on your H/W that I need to try and duplicate? Maybe my configuration is not able to trip the issue.

LeetMiniWheat (white-phoenix) wrote :

Appreciate you looking into this, I was only able to test your builds for about 5 hours on generic kernel version so far (doing some hardware upgrades at the moment, but my current test system is torture-test stable).

My test hardware was 2x (westmere) Intel Xeon E5620's (2 NUMA nodes) with 12GB (2GBx6) ECC RDIMMs on each CPU (24GB total) on ubuntu-server 16.04. ztest was run on the default /tmp; however, I had /tmp mounted on tmpfs with a 10G limit, but from what I could tell it was not exceeding that limit.

I believe this issue becomes more apparent in 4.4.11 and 4.4.12 (and possibly 4.4.13 now) for some reason since those were failing for me within a few hours with this "fix" applied, whereas latest stable I compiled with fix seemed okay. I think there's some race conditions of some sort with newer kernels, especially since I saw different results on the lowlatency kernel awhile back (on the same stable release).

I'll do some more testing if I have some time, and I want to test this on some other distros as well but I think the fix might not work on future kernel releases that integrate 4.4.11, 4.4.12, and 4.4.13 since some of the patches may have changed some core functions which uncovered ZFS bugs again.

It's still possible this somehow only affects my hardware/OS. Unless I was compiling the kernel strangely: I was doing a git clone from master-next, checking out latest stable (detached head) and applying/committing the patch. My 4.4.11 and 4.4.12 builds were manually applied cleanly from upstream on top of xenial master-next (neither was merged into master-next at the time), so that could also have been a possible issue. There were a few redundant patches I skipped that were already in master-next, though.

However, the bug still stands on stock stable xenial kernel - and this patch seems to fix it (at least on generic, still unsure about lowlatency).

Compiling Debian/Ubuntu kernels from git is pretty complicated, though, with conflicting documentation. I was using these commands after checking out and applying the patch:
fakeroot debian/rules clean
fakeroot debian/rules updateconfigs
fakeroot debian/rules binary-headers binary-generic binary-perarch
(or binary-lowlatency for lowlatency builds)
I'm not using cloud-tools packages.

Anyway, I guess you can close this and it can be reopened if I have time to attempt to reproduce the bug. It's not a critical patch, but it's queued for 0.6.5-release upstream so there's probably no harm including it in the Ubuntu kernel.


description: updated

Hello LeetMiniWheat, or anyone else affected,

Accepted zfs-linux into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/ in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in zfs-linux (Ubuntu Xenial):
status: New → Fix Committed
Colin Ian King (colin-king) wrote :

I'm running some tests on this overnight. Will report back on this later over the weekend.

Colin Ian King (colin-king) wrote :

Tests completed without any issues. I'm marking this as verified.

tags: added: verification-done
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Colin Ian King (colin-king) wrote :

Currently soak testing this. Will be complete in ~8 hours time.

Colin Ian King (colin-king) wrote :

I've completed 14 hours of testing and cannot reproduce the issue with the -proposed kernel. Also the -proposed kernel passes all the ZFS regression tests, so it looks good to me.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Martin Pitt (pitti) wrote :

Colin, please fix this in yakkety so that the SRU can be released.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-30.49

linux (4.4.0-30.49) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1597897

  * FCP devices are not detected correctly nor deterministically (LP: #1567602)
    - scsi_dh_alua: Disable ALUA handling for non-disk devices
    - scsi_dh_alua: Use vpd_pg83 information
    - scsi_dh_alua: improved logging
    - scsi_dh_alua: sanitze sense code handling
    - scsi_dh_alua: use standard logging functions
    - scsi_dh_alua: return standard SCSI return codes in submit_rtpg
    - scsi_dh_alua: fixup description of stpg_endio()
    - scsi_dh_alua: use flag for RTPG extended header
    - scsi_dh_alua: use unaligned access macros
    - scsi_dh_alua: rework alua_check_tpgs() to return the tpgs mode
    - scsi_dh_alua: simplify sense code handling
    - scsi: Add scsi_vpd_lun_id()
    - scsi: Add scsi_vpd_tpg_id()
    - scsi_dh_alua: use scsi_vpd_tpg_id()
    - scsi_dh_alua: Remove stale variables
    - scsi_dh_alua: Pass buffer as function argument
    - scsi_dh_alua: separate out alua_stpg()
    - scsi_dh_alua: Make stpg synchronous
    - scsi_dh_alua: call alua_rtpg() if stpg fails
    - scsi_dh_alua: switch to scsi_execute_req_flags()
    - scsi_dh_alua: allocate RTPG buffer separately
    - scsi_dh_alua: Use separate alua_port_group structure
    - scsi_dh_alua: use unique device id
    - scsi_dh_alua: simplify alua_initialize()
    - revert commit a8e5a2d593cb ("[SCSI] scsi_dh_alua: ALUA handler attach should
      succeed while TPG is transitioning")
    - scsi_dh_alua: move optimize_stpg evaluation
    - scsi_dh_alua: remove 'rel_port' from alua_dh_data structure
    - scsi_dh_alua: Use workqueue for RTPG
    - scsi_dh_alua: Allow workqueue to run synchronously
    - scsi_dh_alua: Add new blacklist flag 'BLIST_SYNC_ALUA'
    - scsi_dh_alua: Recheck state on unit attention
    - scsi_dh_alua: update all port states
    - scsi_dh_alua: Send TEST UNIT READY to poll for transitioning
    - scsi_dh_alua: do not fail for unknown VPD identification

linux (4.4.0-29.48) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1597015

  * Wireless hotkey fails on Dell XPS 15 9550 (LP: #1589886)
    - intel-hid: new hid event driver for hotkeys
    - intel-hid: fix incorrect entries in intel_hid_keymap
    - intel-hid: allocate correct amount of memory for private struct
    - intel-hid: add a workaround to ignore an event after waking up from S4.

  * cgroupfs mounts can hang (LP: #1588056)
    - Revert "UBUNTU: SAUCE: (namespace) mqueue: Super blocks must be owned by the
      user ns which owns the ipc ns"
    - Revert "UBUNTU: SAUCE: kernfs: Do not match superblock in another user
      namespace when mounting"
    - Revert "UBUNTU: SAUCE: cgroup: Use a new super block when mounting in a
      cgroup namespace"
    - (namespace) bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
    - (namespace) bpf, inode: disallow userns mounts
    - (namespace) ipc: Initialize ipc_namespace->user_ns early.
    - (namespace) vfs: Pass data, ns, and ns->userns to mount_ns
    - SAUCE: (namespace) S...


Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux -

zfs-linux ( xenial; urgency=medium

  * Sync with relevant upstream fixes (LP: #1594871)
   - Fix user namespaces uid/gid mapping
     As described in torvalds/linux@5f3a4a2 the &init_user_ns, and
     not the current user_ns, should be passed to posix_acl_from_xattr()
     and posix_acl_to_xattr(). Conveniently the init_user_ns is
     available through the init credential (kcred).
     (upstream commit 874bd959f4f15b3d4b007160ee7ad3f4111dd341)
     ZFS #4177
   - Fix ZPL miswrite of default POSIX ACL
     Commit 4967a3e introduced a typo that caused the ZPL to store the
     intended default ACL as an access ACL. Due to caching this problem
     may not become visible until the filesystem is remounted or the inode
     is evicted from the cache. Fix the typo.
     (upstream commit 98f03691a4c08f38ca4538c468e9523f8e6b24be)
     ZFS #4520
   - Create unique partition labels
     When partitioning a device a name may be specified for each partition.
     Internally zfs doesn't use this partition name for anything so it
     has always just been set to "zfs".
     However this isn't optimal because udev will create symlinks using
     this name in /dev/disk/by-partlabel/. If the name isn't unique
     then all the links cannot be created.
     Therefore a random 64-bit value has been added to the partition
     label, i.e "zfs-1234567890abcdef". Additional information could
     be encoded here but since partitions may be reused that might
     result in confusion and it was decided against.
     (upstream commit fbffa53a5cdb9b796de5afc9be8c1f79619253d4)
     ZFS #4517
   - Fix inverted logic on none elevator comparison
     Commit d1d7e2689db9e03f1 ("cstyle: Resolve C style issues") inverted
     the logic on the none elevator comparison. Fix this and make it
     cstyle warning clean.
     (upstream commit 60a4ea3f948f1596b92b666fc7dd21202544edbb)
     ZFS #4507
   - Remove wrong ASSERT in annotate_ecksum
     When using large blocks like 1M, there will be more than UINT16_MAX
     qwords in one block, so this ASSERT would go off. Also, it is possible
     for the histogram to overflow. We cap them to UINT16_MAX to prevent this.
     (upstream commit 21ea9460fa880bb072a9ca9d845aef740f9d3af6)
     ZFS #4257
   - Fix 'zpool import' blkid device names
     When importing a pool using the blkid cache only the device
     node path was added to the list of known paths for a device.
     This results in 'zpool import' always using the sdX names
     in preference to the 'path' name stored in the label.
     To fix the issue the blkid import path has been updated to
     add both the 'path', 'devid', and 'devname' names from the
     label to the known paths. A sanity check is done to ensure
     these paths do refer to the same device identified by blkid.
     (upstream commit c9ca152fd1de1b0fd959e772b9a25d14a891952b)
     ZFS #4523, #3043
   - Use udev for partition detection
     When ZFS partitions a block device it must wait for udev to create
     both a device node and all the device symlinks. This process takes
     a variable length of t...


Changed in zfs-linux (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for zfs-linux has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Changed in zfs-linux (Ubuntu Xenial):
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
no longer affects: zfs-linux (Ubuntu)