ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Native ZFS for Linux | New | Undecided | Unassigned |
linux (Ubuntu) | Fix Released | Medium | Colin Ian King |
zfs-linux (Ubuntu) | | | |
Xenial | Fix Released | Medium | Colin Ian King |
Bug Description
[SRU Justification]
Problem: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"
[FIX]
Upstream commit https:/
[TEST CASE]
Without the fix, ztest fails after hours of soak testing. With the fix, the issue cannot be reproduced.
[REGRESSION POTENTIAL]
This is an upstream fix and has therefore passed the ZFS integration tests. I have also tested it thoroughly with the kernel team's ZFS regression tests and found no issues, so the regression potential is slim to none.
-------
Problem: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"
This bug affects the xenial kernel's built-in ZFS as well as the zfs-dkms package. I don't believe ZFS 0.6.3-stable or 0.6.4-release are affected; 0.6.5-release seems to have included the offending commit. Sorry for the excessive "Affects" tagging; I'm still new to this and unsure which packages to report this against and how to properly add the upstream issues/commits.
Upstream bug report: https:/
"ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused by an empty cache file."
How to reproduce: run ztest repeatedly, for example with a command like the following, and it will eventually fail:
ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*
(I have /tmp mounted on tmpfs with a 10G limit but I don't believe this is related in any way, and I've confirmed it's not running out of space)
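The chained one-liner above can also be written as a loop, which makes the run count easy to change. This is only a sketch: the function name soak_runs and the SOAK_DIR and SOAK_PAUSE variables are hypothetical names introduced here, not part of ztest. It differs from the original in two small ways: it uses rm -f, so a missing file does not stop the loop, and it also sleeps after the final run.

```shell
# soak_runs N CMD [ARGS...]
# Run CMD up to N times, removing leftover z* files between runs,
# and stop at the first failing run (like the && chain above).
# SOAK_DIR and SOAK_PAUSE are hypothetical knobs for this sketch.
soak_runs() {
    runs=$1
    shift
    dir=${SOAK_DIR:-/tmp}      # where ztest leaves its files
    pause=${SOAK_PAUSE:-3}     # seconds to wait between runs
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i of $runs"
        # "$@" is the test command, e.g. ztest -T 3600
        "$@" || { echo "run $i failed" >&2; return 1; }
        rm -f "$dir"/z*
        sleep "$pause"
        i=$((i + 1))
    done
}

# Example matching the 12 runs of the original command:
#   soak_runs 12 ztest -T 3600
```
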
Upstream fix: https:/
Description: Fix ztest truncated cache file
"Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
truncate and overwrite rather than rename the cache file. This is
the correct fix but it should have only been applied for the kernel
build. In user space rename(2) is needed because ztest depends on
the cache file."
Associated pull request for above commit: https:/
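To illustrate why the choice between rename and truncate-and-overwrite matters for a reader of the cache file, here is a minimal shell sketch of the two patterns. It is not the ZFS code (spa_config_write() is C); write_atomic and write_truncate are hypothetical helper names. Presumably zdb hits the problem when it opens the cache file during the window in which the truncating writer has emptied it but not yet rewritten it.

```shell
# write_atomic PATH: read the new contents from stdin into a temporary
# file next to PATH, then rename it over PATH.
write_atomic() {
    target=$1
    tmp=$(mktemp "$target.XXXXXX") || return 1
    # Fill the temporary file first; the target is untouched meanwhile.
    cat > "$tmp" || { rm -f "$tmp"; return 1; }
    # Within one filesystem mv uses rename(2), so a reader sees either
    # the complete old file or the complete new one, never an empty one.
    mv -f "$tmp" "$target"
}

# write_truncate PATH: the contrasting pattern. The file is empty from
# the moment the shell opens it with '>' until cat has written the data.
write_truncate() {
    cat > "$1"
}
```

Note that mktemp creates the temporary file with restrictive permissions, so a real implementation would also need to set the mode it wants before renaming.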
I'm not sure why this wasn't backported to release, but it's in zfs master. I've reproduced this bug on xenial kernels 4.4.0-22-generic, 4.4.0-23-generic, 4.4.0-22-
(I'm unsure how to associate this bug with multiple packages but zfs-dkms and linux-image-* packages both are affected).
P.S. Also of note is https:/
summary: |
- Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"
+ ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"
description: | updated |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
assignee: | nobody → Colin Ian King (colin-king) |
Changed in zfs-linux (Ubuntu): | |
status: | New → In Progress |
assignee: | nobody → Colin Ian King (colin-king) |
importance: | Undecided → Medium |
description: | updated |
Changed in zfs-linux (Ubuntu Xenial): | |
importance: | Undecided → Medium |
assignee: | nobody → Colin Ian King (colin-king) |
no longer affects: | zfs-linux (Ubuntu) |
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1587686
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.