The first panic happens right after I log in with my OS user. The login triggers decryption of the encrypted ZFS subvolumes via PAM, and then this shows up:
VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed
PANIC at zfs_znode.c:339:zfs_znode_sa_init()
Showing stack for process 8821
CPU: 6 PID: 8821 Comm: Cache2 I/O Tainted: P O 5.13.0-20-generic #20-Ubuntu
Hardware name: ASUS System Product Name/PRO H410T, BIOS 1401 07/27/2020
Call Trace:
show_stack+0x52/0x58
dump_stack+0x7d/0x9c
spl_dumpstack+0x29/0x2b [spl]
spl_panic+0xd4/0xfc [spl]
? queued_spin_unlock+0x9/0x10 [zfs]
? do_raw_spin_unlock+0x9/0x10 [zfs]
? __raw_spin_unlock+0x9/0x10 [zfs]
? dmu_buf_replace_user+0x65/0x80 [zfs]
? dmu_buf_set_user+0x13/0x20 [zfs]
? dmu_buf_set_user_ie+0x15/0x20 [zfs]
zfs_znode_sa_init+0xd9/0xe0 [zfs]
…
The system itself remains usable but becomes unresponsive at irregular intervals. After that, dmesg keeps filling up with messages like this:
INFO: task Cache2 I/O:8821 blocked for more than 1208 seconds.
Tainted: P O 5.13.0-20-generic #20-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:Cache2 I/O state:D stack: 0 pid: 8821 ppid: 4247 flags:0x00000000
Call Trace:
__schedule+0x268/0x680
schedule+0x4f/0xc0
spl_panic+0xfa/0xfc [spl]
? queued_spin_unlock+0x9/0x10 [zfs]
? do_raw_spin_unlock+0x9/0x10 [zfs]
? __raw_spin_unlock+0x9/0x10 [zfs]
? dmu_buf_replace_user+0x65/0x80 [zfs]
? dmu_buf_set_user+0x13/0x20 [zfs]
? dmu_buf_set_user_ie+0x15/0x20 [zfs]
zfs_znode_sa_init+0xd9/0xe0 [zfs]
…
Processes that end up stuck forever in 'D' state (second column of the ps output) are usually firefox and gsd-housekeeping, sometimes gnome-shell and, as in the case above, find. I believe the stuck gnome-shell is what causes nautilus to misbehave. Worse, this seems to prevent my laptop from entering sleep mode: the suspend attempt aborts with a "resource busy" error (there is a message in syslog) and is then retried over and over until the battery is completely drained.
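For anyone who wants to capture which tasks are stuck at a given moment, here is a minimal sketch that walks /proc and lists every thread currently in 'D' state. It is not part of my setup: the function names `parse_stat` and `blocked_tasks` are made up for illustration, and it assumes a Linux /proc layout with per-thread entries under /proc/<pid>/task/<tid>/stat.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the text of a /proc/.../stat file.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses (e.g. "Cache2 I/O"), so we split on the LAST ')'.
    """
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx].strip())
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (tid, comm) for every thread in uninterruptible sleep ('D')."""
    blocked = []
    try:
        pids = [p for p in os.listdir(proc_root) if p.isdigit()]
    except OSError:
        return blocked  # no readable /proc (not Linux, or restricted)
    for pid in pids:
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as fh:
                    task_id, comm, state = parse_stat(fh.read())
            except (OSError, ValueError, IndexError):
                continue  # thread vanished or stat was unreadable
            if state == "D":
                blocked.append((task_id, comm))
    return blocked


if __name__ == "__main__":
    for task_id, comm in blocked_tasks():
        print(task_id, comm)
```

Threads are scanned individually because a name like "Cache2 I/O" belongs to a thread rather than to the main process, so looking only at /proc/<pid>/stat could miss it.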
Adding `zfs.zfs_recover=1` to the kernel boot parameters may help (thank you https://launchpad.net/~jawn-smith). At least it prevented the first zfs_znode panic message from appearing in dmesg after login, but this needs longer and more detailed observation under different loads. It also remains an open question whether running with this kernel parameter in regular use is appropriate.
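To double-check that the parameter was actually picked up after a reboot, something like the following sketch could be used. It makes two assumptions that I have not verified on every setup: that the boot parameters are visible in /proc/cmdline, and that the ZFS module exposes the live value at /sys/module/zfs/parameters/zfs_recover. The helper names are hypothetical.

```python
def cmdline_has_param(cmdline, name, value):
    """True if the space-separated kernel command line contains name=value."""
    return f"{name}={value}" in cmdline.split()


def read_text(path):
    """Return the stripped file contents, or None if the file is unreadable."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed locations; adjust if your system differs.
    cmdline = read_text("/proc/cmdline") or ""
    live = read_text("/sys/module/zfs/parameters/zfs_recover")
    print("on kernel command line:",
          cmdline_has_param(cmdline, "zfs.zfs_recover", "1"))
    print("live module value:", live)
```

If the command line contains the parameter but the live value is not 1, the module was probably loaded before the parameter could take effect, which would be worth mentioning in any follow-up.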
I have the same problem on two machines with identical setups.
ZFS on root and data (two pools), compression enabled, some subvolumes are encrypted.
Ubuntu 21.10
Kernel 5.13.0-20-generic
$ zfs --version
zfs-2.0.6-1ubuntu2
zfs-kmod-2.0.6-1ubuntu2