PANIC at zfs_vfsops.c:585:zfs_space_delta_cb / VERIFY3(sa.sa_magic == SA_MAGIC) failed

Bug #1914287 reported by John Stumbles
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Native ZFS for Linux
New
Unknown
zfs-linux (Ubuntu)
New
Undecided
Unassigned

Bug Description

When running rsync or tree (haven't tried much else) on much of the filesystem it will pretty routinely throw a panic report in syslog/kern.log, and the rsync/tree operation will later hang solidly such that even ^C doesn't break back to the shell, and any further file operations will likely hang and lock up their shells. Often requires power cycle rather than just shutdown -r to reboot.

syslog/kern.log:

Feb 2 16:52:43 donny kernel: [ 517.355790] VERIFY3(sa.sa_magic == SA_MAGIC) failed (*value* == 3100762) <- *value* varies e.g. 1612270437, 1612270437, 1612283815
Feb 2 16:52:43 donny kernel: [ 517.362904] PANIC at zfs_vfsops.c:585:zfs_space_delta_cb()
Feb 2 16:52:43 donny kernel: [ 517.368512] Showing stack for process *pid* <- *pid* varies
Feb 2 16:52:43 donny kernel: [ 517.368522] CPU: 1 PID: *pid* Comm: dp_sync_taskq Tainted: P C OE 5.4.0-1028-raspi #31-Ubuntu <- *pid* as above
Feb 2 16:52:43 donny kernel: [ 517.368525] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
Feb 2 16:52:43 donny kernel: [ 517.368527] Call trace:
Feb 2 16:52:43 donny kernel: [ 517.368540] dump_backtrace+0x0/0x198
Feb 2 16:52:43 donny kernel: [ 517.368544] show_stack+0x28/0x38
Feb 2 16:52:43 donny kernel: [ 517.368549] dump_stack+0xd8/0x134
Feb 2 16:52:43 donny kernel: [ 517.368583] spl_dumpstack+0x2c/0x34 [spl]
Feb 2 16:52:43 donny kernel: [ 517.368606] spl_panic+0xe0/0xf8 [spl]
Feb 2 16:52:43 donny kernel: [ 517.368837] zfs_space_delta_cb+0x180/0x278 [zfs]
Feb 2 16:52:43 donny kernel: [ 517.368968] dmu_objset_userquota_get_ids+0x15c/0x3b8 [zfs]
Feb 2 16:52:43 donny kernel: [ 517.369094] dnode_sync+0x11c/0x568 [zfs]
Feb 2 16:52:43 donny kernel: [ 517.369216] dmu_objset_sync_dnodes+0x48/0xf0 [zfs]
Feb 2 16:52:43 donny kernel: [ 517.369333] sync_dnodes_task+0x34/0x58 [zfs]
Feb 2 16:52:43 donny kernel: [ 517.369353] taskq_thread+0x20c/0x3b0 [spl]
Feb 2 16:52:43 donny kernel: [ 517.369363] kthread+0x150/0x170
Feb 2 16:52:43 donny kernel: [ 517.369368] ret_from_fork+0x10/0x18

rsync/tree command carries on running (for a while), then later log file shows:

Feb 2 16:56:11 donny kernel: [ 725.993423] INFO: task dp_sync_taskq:*pid* blocked for more than 120 seconds.
Feb 2 16:56:11 donny kernel: [ 726.000633] Tainted: P C OE 5.4.0-1028-raspi #31-Ubuntu
Feb 2 16:56:11 donny kernel: [ 726.007610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 2 16:56:11 donny kernel: [ 726.015586] dp_sync_taskq D 0 *pid* 2 0x00000028
Feb 2 16:56:11 donny kernel: [ 726.015595] Call trace:
Feb 2 16:56:11 donny kernel: [ 726.015608] __switch_to+0x108/0x1c0
Feb 2 16:56:11 donny kernel: [ 726.015615] __schedule+0x328/0x828
Feb 2 16:56:11 donny kernel: [ 726.015621] schedule+0x4c/0xe8
Feb 2 16:56:11 donny kernel: [ 726.015650] spl_panic+0xf4/0xf8 [spl]
Feb 2 16:56:11 donny kernel: [ 726.015829] zfs_space_delta_cb+0x180/0x278 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.015939] dmu_objset_userquota_get_ids+0x15c/0x3b8 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.016045] dnode_sync+0x11c/0x568 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.016151] dmu_objset_sync_dnodes+0x48/0xf0 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.016256] sync_dnodes_task+0x34/0x58 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.016276] taskq_thread+0x20c/0x3b0 [spl]
Feb 2 16:56:11 donny kernel: [ 726.016283] kthread+0x150/0x170
Feb 2 16:56:11 donny kernel: [ 726.016288] ret_from_fork+0x10/0x18
Feb 2 16:56:11 donny kernel: [ 726.016298] INFO: task txg_sync:*num* blocked for more than 120 seconds. <- *num* varies
Feb 2 16:56:11 donny kernel: [ 726.023051] Tainted: P C OE 5.4.0-1028-raspi #31-Ubuntu
Feb 2 16:56:11 donny kernel: [ 726.029957] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 2 16:56:11 donny kernel: [ 726.037919] txg_sync D 0 *num* 2 0x00000028
Feb 2 16:56:11 donny kernel: [ 726.037929] Call trace:
Feb 2 16:56:11 donny kernel: [ 726.037939] __switch_to+0x108/0x1c0
Feb 2 16:56:11 donny kernel: [ 726.037945] __schedule+0x328/0x828
Feb 2 16:56:11 donny kernel: [ 726.037951] schedule+0x4c/0xe8
Feb 2 16:56:11 donny kernel: [ 726.037973] taskq_wait+0x78/0xb0 [spl]
Feb 2 16:56:11 donny kernel: [ 726.038142] dmu_objset_sync+0x25c/0x3e0 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.038263] dsl_dataset_sync+0x74/0x240 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.038391] dsl_pool_sync+0xb0/0x3b8 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.038509] spa_sync_iterate_to_convergence+0xf8/0x1f0 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.038617] spa_sync+0x2ec/0x520 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.038736] txg_sync_thread+0x244/0x2a0 [zfs]
Feb 2 16:56:11 donny kernel: [ 726.038756] thread_generic_wrapper+0x78/0xa8 [spl]
Feb 2 16:56:11 donny kernel: [ 726.038768] kthread+0x150/0x170
Feb 2 16:56:11 donny kernel: [ 726.038774] ret_from_fork+0x10/0x18

I don't know if this is related to bug #1906476 https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476

System Information:
ubuntu 20.04.2 (5.4.0-1028-raspi) server on RPi4/8G with 8TB external Seagate formatted zfs

# lsb_release -rd
Description: Ubuntu 20.04.2 LTS
Release: 20.04

# apt-cache policy zfsutils-linux
zfsutils-linux:
  Installed: 0.8.3-1ubuntu12.6
  Candidate: 0.8.3-1ubuntu12.6
  Version table:
 *** 0.8.3-1ubuntu12.6 500
        500 http://ports.ubuntu.com/ubuntu-ports focal-updates/main arm64 Packages
        100 /var/lib/dpkg/status
     0.8.3-1ubuntu12 500
        500 http://ports.ubuntu.com/ubuntu-ports focal/main arm64 Packages

# zpool version
zfs-0.8.3-1ubuntu12.6
zfs-kmod-0.8.3-1ubuntu12.5

Revision history for this message
John Stumbles (jstumbles) wrote :

Going back through logfiles the first occurrence is
Jan 29 12:27:52 donny kernel: [72122.085286] PANIC at zfs_vfsops.c:585:zfs_space_delta_cb()

So it seems to have started after I upgraded from Linux version 5.4.0-1026-raspi to 1028

Jan 27 04:27:18 donny kernel: [ 0.000000] Linux version 5.4.0-1026-raspi (buildd@bos02-arm64-062) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #29-Ubuntu SMP PREEMPT Mon Dec 14 17:01:16 UTC 2020 (Ubuntu 5.4.0-1026.29-raspi 5.4.78)

Jan 28 16:25:31 donny kernel: [ 0.000000] Linux version 5.4.0-1028-raspi (buildd@bos02-arm64-034) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #31-Ubuntu SMP PREEMPT Wed Jan 20 11:30:45 UTC 2021 (Ubuntu 5.4.0-1028.31-raspi 5.4.78)

Revision history for this message
John Stumbles (jstumbles) wrote :

The date (Jan 29) of the first occurrence of the panic suggests that it was not caused by changes in the packages

 libnvpair1linux 0.8.3-1ubuntu12.6
 libuutil1linux 0.8.3-1ubuntu12.6
 libzfs2linux 0.8.3-1ubuntu12.6
 libzpool2linux 0.8.3-1ubuntu12.6
 zfs-zed 0.8.3-1ubuntu12.6
 zfsutils-linux 0.8.3-1ubuntu12.6

It was not until Feb 01 that I got a message from apticron that they needed upgrading.

(I don't know what versions I was running before, and I don't know how to find out. my /var/cache/apt/archives has only latest versions.)

Revision history for this message
John Stumbles (jstumbles) wrote :

I've just realised (newbie here :-)) that /var/log/apt/history.* has information on package versions which may be useful.

Linux version before/after problem:

Start-Date: 2021-01-28 15:42:31
Commandline: apt-get dist-upgrade
Requested-By: ubuntu (1000)
Install: linux-image-5.4.0-1028-raspi:arm64 (5.4.0-1028.31, automatic), linux-headers-5.4.0-1028-raspi:arm64 (5.4.0-1028.31, automatic), linux-modules-5.4.0-1028-raspi:arm64 (5.4.0-1028.31, automatic), linux-raspi-headers-5.4.0-1028:arm64 (5.4.0-1028.31, automatic)
Upgrade: lshw:arm64 (02.18.85-0.3ubuntu2, 02.18.85-0.3ubuntu2.20.04.1), libc6:arm64 (2.31-0ubuntu9.1, 2.31-0ubuntu9.2), locales:arm64 (2.31-0ubuntu9.1, 2.31-0ubuntu9.2), libc-bin:arm64 (2.31-0ubuntu9.1, 2.31-0ubuntu9.2), linux-raspi:arm64 (5.4.0.1026.61, 5.4.0.1028.63), linux-image-raspi:arm64 (5.4.0.1026.61, 5.4.0.1028.63), linux-headers-raspi:arm64 (5.4.0.1026.61, 5.4.0.1028.63)
End-Date: 2021-01-28 15:52:33

zfs versions (earlier version was running before and after problem started):

Start-Date: 2021-02-01 14:07:23
Commandline: apt-get dist-upgrade
Requested-By: ubuntu (1000)
Upgrade: zfs-zed:arm64 (0.8.3-1ubuntu12.5, 0.8.3-1ubuntu12.6), zfsutils-linux:arm64 (0.8.3-1ubuntu12.5, 0.8.3-1ubuntu12.6), libzfs2linux:arm64 (0.8.3-1ubuntu12.5, 0.8.3-1ubuntu12.6), libzpool2linux:arm64 (0.8.3-1
ubuntu12.5, 0.8.3-1ubuntu12.6), libnvpair1linux:arm64 (0.8.3-1ubuntu12.5, 0.8.3-1ubuntu12.6), libuutil1linux:arm64 (0.8.3-1ubuntu12.5, 0.8.3-1ubuntu12.6)
End-Date: 2021-02-01 14:08:43

Revision history for this message
John Stumbles (jstumbles) wrote :

Seems that this may be a long-standing bug in openzfs v5
https://github.com/openzfs/zfs/issues/6332

Changed in zfs:
status: Unknown → New
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.