error -lock in zfs send

Bug #1723948 reported by alberto fiaschi
This bug affects 1 person
Affects                 Status         Importance   Assigned to   Milestone
Native ZFS for Linux    Fix Released   Unknown
zfs-linux (Ubuntu)      Won't Fix      Medium       Unassigned

Bug Description

zfs send stops working: the process produces no output (checked via the mbuffer log).
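
For context, a typical pipeline of the kind described here looks roughly like the following (the exact command is not given in the report; the pool/snapshot names and mbuffer options are illustrative only):

  zfs send tank/data@snap-2017-10-15 \
    | mbuffer -s 128k -m 1G -l /var/log/mbuffer.log \
    | ssh backuphost zfs receive backup/data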

in kernel.log:
Oct 15 07:25:53 zfs-cis kernel: [479439.151281] INFO: task zfs:8708 blocked for more than 120 seconds.
Oct 15 07:25:53 zfs-cis kernel: [479439.156980] Tainted: P OE 4.4.0-96-generic #119-Ubuntu
Oct 15 07:25:53 zfs-cis kernel: [479439.162688] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 15 07:25:53 zfs-cis kernel: [479439.173974] zfs D ffff88197bd77318 0 8708 8141 0x00000000
Oct 15 07:25:53 zfs-cis kernel: [479439.173981] ffff88197bd77318 ffffffff810c3dc2 ffff8820374cf000 ffff881b645ff000
Oct 15 07:25:53 zfs-cis kernel: [479439.173985] ffff88197bd78000 0000000000792c6d ffff882030aa4ac8 0000000000000000
Oct 15 07:25:53 zfs-cis kernel: [479439.173989] ffff88101dac1840 ffff88197bd77330 ffffffff8183f165 ffff882030aa4a00
Oct 15 07:25:53 zfs-cis kernel: [479439.173993] Call Trace:
Oct 15 07:25:53 zfs-cis kernel: [479439.174006] [<ffffffff810c3dc2>] ? __wake_up_common+0x52/0x90
Oct 15 07:25:53 zfs-cis kernel: [479439.174023] [<ffffffff8183f165>] schedule+0x35/0x80
Oct 15 07:25:53 zfs-cis kernel: [479439.174045] [<ffffffffc03d7cb0>] taskq_wait_id+0x60/0xb0 [spl]
Oct 15 07:25:53 zfs-cis kernel: [479439.174051] [<ffffffff810c4480>] ? wake_atomic_t_function+0x60/0x60
Oct 15 07:25:53 zfs-cis kernel: [479439.174115] [<ffffffffc05036b0>] ? dump_write+0x230/0x230 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174178] [<ffffffffc0542022>] spa_taskq_dispatch_sync+0x92/0xd0 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174223] [<ffffffffc0502b81>] dump_bytes+0x51/0x70 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174267] [<ffffffffc050359e>] dump_write+0x11e/0x230 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174311] [<ffffffffc0505523>] backup_cb+0x633/0x850 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174356] [<ffffffffc050777a>] traverse_visitbp+0x47a/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174365] [<ffffffffc03d497f>] ? spl_kmem_alloc+0xaf/0x190 [spl]
Oct 15 07:25:53 zfs-cis kernel: [479439.174409] [<ffffffffc05078c0>] traverse_visitbp+0x5c0/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174462] [<ffffffffc05078c0>] traverse_visitbp+0x5c0/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174503] [<ffffffffc050848f>] traverse_dnode+0x7f/0xe0 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174542] [<ffffffffc05079cc>] traverse_visitbp+0x6cc/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174579] [<ffffffffc05078c0>] traverse_visitbp+0x5c0/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174616] [<ffffffffc05078c0>] traverse_visitbp+0x5c0/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174653] [<ffffffffc05078c0>] traverse_visitbp+0x5c0/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174690] [<ffffffffc05078c0>] traverse_visitbp+0x5c0/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174726] [<ffffffffc05078c0>] traverse_visitbp+0x5c0/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174762] [<ffffffffc05078c0>] traverse_visitbp+0x5c0/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174799] [<ffffffffc050848f>] traverse_dnode+0x7f/0xe0 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174835] [<ffffffffc0507b65>] traverse_visitbp+0x865/0x960 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174871] [<ffffffffc0507e0e>] traverse_impl+0x1ae/0x410 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174908] [<ffffffffc0504ef0>] ? dmu_recv_end_check+0x210/0x210 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174944] [<ffffffffc05080c2>] traverse_dataset+0x52/0x60 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.174981] [<ffffffffc0504ef0>] ? dmu_recv_end_check+0x210/0x210 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.175017] [<ffffffffc0502fa9>] dmu_send_impl+0x409/0x560 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.175060] [<ffffffffc0505a22>] dmu_send_obj+0x172/0x1e0 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.175129] [<ffffffffc0572389>] zfs_ioc_send+0xe9/0x2c0 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.175143] [<ffffffffc03d489b>] ? strdup+0x3b/0x60 [spl]
Oct 15 07:25:53 zfs-cis kernel: [479439.175207] [<ffffffffc057670b>] zfsdev_ioctl+0x44b/0x4e0 [zfs]
Oct 15 07:25:53 zfs-cis kernel: [479439.175218] [<ffffffff81223faf>] do_vfs_ioctl+0x29f/0x490
Oct 15 07:25:53 zfs-cis kernel: [479439.175225] [<ffffffff8108061c>] ? _do_fork+0xec/0x360
Oct 15 07:25:53 zfs-cis kernel: [479439.175232] [<ffffffff81224219>] SyS_ioctl+0x79/0x90
Oct 15 07:25:53 zfs-cis kernel: [479439.175242] [<ffffffff81843272>] entry_SYSCALL_64_fastpath+0x16/0x71

Revision history for this message
Colin Ian King (colin-king) wrote :

This stack trace was also reported in bug https://github.com/zfsonlinux/zfs/issues/6917 in log https://github.com/zfsonlinux/zfs/files/1686101/kern.log.2.gz

Is this issue still occurring?

1. What is your ZFS configuration (number of drives)?
2. What is the size of your system (total memory, free memory)?
3. Do your drive(s) suffer from large I/O latencies?
   - What is the output from iostat -mx when you are loading the system and issues like this occur?
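
For reference, this information can typically be gathered with commands along these lines (illustrative; adjust the iostat interval as needed):

  zpool status -v     # pool layout and number of drives
  free -m             # total, used and free memory
  iostat -mx 5        # per-device throughput and latency while the send is running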

Changed in zfs-linux (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Changed in zfs:
status: Unknown → New
Revision history for this message
alberto fiaschi (alberto-fiaschi) wrote : Re: [Bug 1723948] Re: error -lock in zfs send

2018-04-12 18:22 GMT+02:00 Colin Ian King <email address hidden>:

> This stack trace was also reported in bug
> https://github.com/zfsonlinux/zfs/issues/6917 in log
> https://github.com/zfsonlinux/zfs/files/1686101/kern.log.2.gz
>
> Is this issue still occurring?
>
yes

>
> 1. what is your ZFS configuration (number of drives?)
>
pool: 2 vdevs of 6 x 8 TB HDDs, plus 2 mirrored SSDs for the ZIL and 2 SSDs for cache
(models: HDD = ST8000NM0075, ZIL = INTEL SSDSC2BA20, cache = SSDSC2BB480G7R)

Dedup: on
compression: lz4
number of filesystems: 184
used: 48T
free: 24T
(see command output at the end)

> 2. size of your system (total memory, free memory).
>
free
              total        used        free      shared  buff/cache   available
Mem:      131784684    41365676    10998624       65028    79420384    21031972
Swap:      62467068       33036    62434032

> 3. Do your drive(s) suffer from large I/O latencies?
> - what is the output from iostat -mx when you are loading the system
> when issues like this occur?
>

Yes. At the moment I can only provide output taken under normal conditions:
iostat -xmd 20
Device:   rrqm/s  wrqm/s    r/s    w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda         0,00   20,75   0,00  78,35    0,00    0,92     24,16      4,81  61,40     0,00    61,40   1,15   9,04
sdb         0,00   20,75   0,00  78,25    0,00    0,92     24,18      4,70  60,05     0,00    60,05   1,18   9,24
sdc         0,00    0,00   0,00   0,00    0,00    0,00      0,00      0,00   0,00     0,00     0,00   0,00   0,00
md2         0,00    0,00   0,00  98,40    0,00    0,92     19,19      0,00   0,00     0,00     0,00   0,00   0,00
md1         0,00    0,00   0,00   0,00    0,00    0,00      0,00      0,00   0,00     0,00     0,00   0,00   0,00
md0         0,00    0,00   0,00   0,00    0,00    0,00      0,00      0,00   0,00     0,00     0,00   0,00   0,00
sdd         0,00    0,00   0,00   0,00    0,00    0,00      0,00      0,00   0,00     0,00     0,00   0,00   0,00
sde         0,00    0,00  25,70   7,75    0,06    0,07      8,43      0,01   0,28     0,20     0,52   0,17   0,56
sdf         0,00    0,00  24,25   7,75    0,07    0,07      9,33      0,00   0,13     0,12     0,15   0,10   0,32
sdg         0,00    0,00  10,60  56,45    0,11    0,44     16,58      0,27   3,98    15,57     1,80   0,80   5,38
sdh         0,00    0,00   2,60  60,25    0,04    0,46     16,26      0,13   1,99    18,00     1,30   0,44   2,78
sdi         0,00    0,00   1,85  55,30    0,04    0,43     16,85      0,12   2,03    17,62     1,50   0,41   2,34
sdj         0,00    0,00  11,10  53,85    0,10    0,44     17,21      0,29   4,40    16,54     1,90   0,91   5,94
sdk         0,00    0,00   3,50  55,90    0,05    0,45     17,49      0,13   2,26    15,03     1,46   0,57   3,38
sdl         0,00    0,00   2,40  53,80    0,05    0,43     17,49      0,13   2,28    18,92     1,54   0,50   2,80
sdn         0,00    0,00  11,90  64,80    0,08    0,47     14,83      0,39   5,11    14,62     3,36   0,95   7,28
sdm         0,00    0,00   2,70  66,40    0,04    0,47     15,11      0,22   3,23  ...

Revision history for this message
Colin Ian King (colin-king) wrote :

Not sure why there is little activity on the upstream bug report.

If you are using a ZFS intent log, it may be worth trying to set it to a fairly large size, such as 128 or 256 MB, using the zil_slog_limit setting.
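
For reference, a module parameter like this can be set at runtime and made persistent roughly as follows (assuming zil_slog_limit is exposed by the installed zfs module; 134217728 bytes = 128 MB is just an example value):

  # apply immediately on the running module
  echo 134217728 | sudo tee /sys/module/zfs/parameters/zil_slog_limit

  # keep the setting across reboots / module reloads
  echo "options zfs zil_slog_limit=134217728" | sudo tee -a /etc/modprobe.d/zfs.conf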

Revision history for this message
Colin Ian King (colin-king) wrote :

Did the advice in comment #3 help? I've not heard any response on this for a couple of months, so I'm closing this bug as Won't Fix. If this is still an issue, please feel free to reopen the bug.

Changed in zfs-linux (Ubuntu):
status: Incomplete → Won't Fix
Changed in zfs-linux (Ubuntu):
assignee: Colin Ian King (colin-king) → nobody
Changed in zfs:
status: New → Fix Released