PANIC at zfs_znode.c:339:zfs_znode_sa_init()

Bug #1931660 reported by Chris Halse Rogers
This bug affects 5 people

Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm seeing a non-deterministic panic in the ZFS code. Last boot, this occurred in systemd-udevd, resulting in a failed boot. The boot before that, firefox hit the same thing, and this boot it looks like it's hit an Evolution component.

ProblemType: Bug
DistroRelease: Ubuntu 21.10
Package: linux-image-5.11.0-18-generic 5.11.0-18.19+21.10.1
ProcVersionSignature: Ubuntu 5.11.0-18.19+21.10.1-generic 5.11.17
Uname: Linux 5.11.0-18-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu67
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: chris 4988 F.... pulseaudio
 /dev/snd/controlC1: chris 4988 F.... pulseaudio
CasperMD5CheckResult: unknown
CurrentDesktop: ubuntu:GNOME
Date: Fri Jun 11 11:49:47 2021
InstallationDate: Installed on 2021-05-13 (28 days ago)
InstallationMedia: Ubuntu 20.04.2.0 LTS "Focal Fossa" - Release amd64 (20210209.1)
MachineType: Dell Inc. XPS 15 9575
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_z8j7yc@/vmlinuz-5.11.0-18-generic root=ZFS=rpool/ROOT/ubuntu_z8j7yc ro quiet splash
RelatedPackageVersions:
 linux-restricted-modules-5.11.0-18-generic N/A
 linux-backports-modules-5.11.0-18-generic N/A
 linux-firmware 1.198
SourcePackage: linux
UpgradeStatus: Upgraded to impish on 2021-05-13 (28 days ago)
dmi.bios.date: 07/07/2019
dmi.bios.release: 1.7
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.7.1
dmi.board.name: 0C32VW
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 31
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.7.1:bd07/07/2019:br1.7:svnDellInc.:pnXPS159575:pvr:rvnDellInc.:rn0C32VW:rvrA00:cvnDellInc.:ct31:cvr:
dmi.product.family: XPS
dmi.product.name: XPS 15 9575
dmi.product.sku: 080D
dmi.sys.vendor: Dell Inc.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Robert Bordelon (riselstrom) wrote :

I encountered this problem on kernel 5.11.0-25 with version 2.0.2-1ubuntu5.1 of the ZFS libraries (I have the following ZFS-related packages installed: libnvpair3linux, libuutil3linux, libzfs4linux, libzpool4linux, zfs-initramfs, zfs-zed and zfsutils-linux), but it went away after downgrading those packages to version 2.0.2-1ubuntu5.
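
For anyone who wants to try the same downgrade, something like the following should work; the package list and version string come from the comment above, so adjust them to match what apt reports on your own system:

   # See which ZFS packages and versions are currently installed:
   apt list --installed 2>/dev/null | grep -E 'zfs|nvpair|uutil|zpool'

   # Downgrade the userspace packages (the older version must still
   # be available in your configured apt sources):
   sudo apt install \
       libnvpair3linux=2.0.2-1ubuntu5 \
       libuutil3linux=2.0.2-1ubuntu5 \
       libzfs4linux=2.0.2-1ubuntu5 \
       libzpool4linux=2.0.2-1ubuntu5 \
       zfs-initramfs=2.0.2-1ubuntu5 \
       zfs-zed=2.0.2-1ubuntu5 \
       zfsutils-linux=2.0.2-1ubuntu5

   # Optionally hold them so a routine upgrade doesn't bump them back:
   sudo apt-mark hold zfsutils-linux zfs-initramfs zfs-zed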

Revision history for this message
mhosken (martin-hosken) wrote :

Downgrading these libraries did not fix it for me.

One canary is the state of the updatedb.mlocate process, which goes into an uninterruptible ("live") hang when it hits the currently faulty file. For example:

   2728 ? DNs 0:01 /usr/bin/updatedb.mlocate

Using lsof one can find out which file is the faulty one, but if you try to ls the directory containing it, your terminal hangs. And if you try to delete the directory containing the faulty file, that silently fails, leaving the directory in place and the hang waiting for anyone who next tries to read it.
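
A quick way to find such stuck processes and see what they were touching is sketched below; the PID is illustrative:

   # Tasks in uninterruptible sleep (state D) are the canaries:
   ps -eo pid,stat,comm | awk '$2 ~ /^D/'

   # List the files a stuck task has open (PID from the output above):
   sudo lsof -p 2728

   # Its kernel-side stack usually points straight into the ZFS code:
   sudo cat /proc/2728/stack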

This bug reportedly relates to bug 1906476 and, via that, to https://github.com/openzfs/zfs/issues/11474, but I think I have that one worked around with zfs_recover=1 (see the sketch after the trace below for how to set it). The syslog message is very similar:

Aug 11 10:13:45 silmh9 kernel: [ 242.531205] INFO: task updatedb.mlocat:2728 blocked for more than 120 seconds.
Aug 11 10:13:45 silmh9 kernel: [ 242.531211] Tainted: P U O 5.13.0-13-generic #13-Ubuntu
Aug 11 10:13:45 silmh9 kernel: [ 242.531212] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 11 10:13:45 silmh9 kernel: [ 242.531213] task:updatedb.mlocat state:D stack: 0 pid: 2728 ppid: 1 flags:0x00004220
Aug 11 10:13:45 silmh9 kernel: [ 242.531216] Call Trace:
Aug 11 10:13:45 silmh9 kernel: [ 242.531221] __schedule+0x268/0x680
Aug 11 10:13:45 silmh9 kernel: [ 242.531225] ? arch_local_irq_enable+0xb/0xd
Aug 11 10:13:45 silmh9 kernel: [ 242.531228] schedule+0x4f/0xc0
Aug 11 10:13:45 silmh9 kernel: [ 242.531231] spl_panic+0xfa/0xfc [spl]
Aug 11 10:13:45 silmh9 kernel: [ 242.531238] ? queued_spin_unlock+0x9/0x10 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531287] ? do_raw_spin_unlock+0x9/0x10 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531313] ? __raw_spin_unlock+0x9/0x10 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531339] ? dmu_buf_replace_user+0x65/0x80 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531367] ? dmu_buf_set_user+0x13/0x20 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531393] ? dmu_buf_set_user_ie+0x15/0x20 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531420] zfs_znode_sa_init+0xd9/0xe0 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531479] zfs_znode_alloc+0x101/0x560 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531536] ? dmu_buf_unlock_parent+0x5d/0x90 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531564] ? do_raw_spin_unlock+0x9/0x10 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531590] ? dbuf_read_impl.constprop.0+0x316/0x3e0 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531616] ? __cond_resched+0x1a/0x50
Aug 11 10:13:45 silmh9 kernel: [ 242.531618] ? __raw_callee_save___native_queued_spin_unlock+0x15/0x23
Aug 11 10:13:45 silmh9 kernel: [ 242.531620] ? queued_spin_unlock+0x9/0x10 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531642] ? __cond_resched+0x1a/0x50
Aug 11 10:13:45 silmh9 kernel: [ 242.531644] ? down_read+0x13/0x90
Aug 11 10:13:45 silmh9 kernel: [ 242.531645] ? __raw_callee_save___native_queued_spin_unlock+0x15/0x23
Aug 11 10:13:45 silmh9 kernel: [ 242.531646] ? queued_spin_unlock+0x9/0x10 [zfs]
Aug 11 10:13:45 silmh9 kernel: [ 242.531680] ? do_raw_spin_unlock+0x9/0x10 [zfs]
Aug 11 10:13:45 silmh...
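
For reference, zfs_recover mentioned above is an OpenZFS module parameter; a minimal sketch for flipping it at runtime and persisting it across reboots, using the standard sysfs and modprobe.d paths:

   # Enable it on the running system:
   echo 1 | sudo tee /sys/module/zfs/parameters/zfs_recover

   # Persist across reboots, then refresh the initramfs so early
   # boot (e.g. a ZFS root pool) picks it up as well:
   echo 'options zfs zfs_recover=1' | sudo tee /etc/modprobe.d/zfs.conf
   sudo update-initramfs -u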


Revision history for this message
mhosken (martin-hosken) wrote :

Does the fact that this is an encrypted pool have any impact on this?
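
For anyone comparing notes, encryption status can be read straight off the dataset properties (the pool name here is an example):

   # Shows 'off' for unencrypted datasets, otherwise the cipher in use:
   zfs get -r encryption rpool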

Revision history for this message
Doki (lkishalmi) wrote :

It started happening recently on my encrypted pool. A filesystem scrub reported no errors.
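
For comparison on your own machine, running a scrub and checking the result looks like this (pool name is an example):

   sudo zpool scrub rpool     # runs in the background
   zpool status -v rpool      # progress, plus any errors found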

Revision history for this message
Robert Bordelon (riselstrom) wrote :

I've hit this bug on a non-encrypted pool, and a scrub also reported no errors for me. I'm currently on kernel 5.11.0-37-generic with ZFS libraries 2.0.2-1ubuntu5.2 (Hirsute), though the corruption may have occurred under a previous kernel or library version. The directory triggering the hang for me was created on Jun 7 of this year, so it appears to be a recent problem. I've identified the corrupted directory and moved it out of the way (fortunately it contains nothing important), but I'm unable to delete it from the filesystem: any attempt to read or delete it hangs. Setting zfs_recover=1 does not help, and the only way to recover from the hang is to reboot. Is there any way to get rid of the corrupted directory other than creating a new filesystem and copying over everything except the problem directory?
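
Since the broken directory can neither be read nor unlinked, the usual escape hatch is exactly what this comment describes: recreate the dataset and copy everything else across, excluding the bad path. A rough sketch, assuming the affected dataset is rpool/data mounted at /rpool/data and the corrupted directory is baddir at its root (all of these names are placeholders):

   # Create a fresh dataset alongside the damaged one:
   sudo zfs create rpool/data_new

   # Copy everything except the corrupted directory; rsync skips it
   # entirely, so nothing ever tries to read it:
   sudo rsync -aHAX --exclude='/baddir' /rpool/data/ /rpool/data_new/

   # Once satisfied the copy is complete, swap the datasets over:
   sudo zfs rename rpool/data rpool/data_broken
   sudo zfs rename rpool/data_new rpool/data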
