Comment 34 for bug 1654517

Rafael David Tinoco (rafaeldtinoco) wrote :

Found one in my Debian box logs:

Jun 11 10:02:37 workstation kernel: [238276.640617] INFO: task txg_sync:5936 blocked for more than 120 seconds.
Jun 11 10:02:37 workstation kernel: [238276.643806] Tainted: P O 4.16.0-2-amd64 #1 Debian 4.16.12-
Jun 11 10:02:37 workstation kernel: [238276.646803] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this m
Jun 11 10:02:37 workstation kernel: [238276.649819] txg_sync D 0 5936 2 0x80000000
Jun 11 10:02:37 workstation kernel: [238276.652850] Call Trace:
Jun 11 10:02:37 workstation kernel: [238276.655776] ? __schedule+0x291/0x870
Jun 11 10:02:37 workstation kernel: [238276.658671] ? zio_taskq_dispatch+0x6f/0x90 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.661196] ? zio_nowait+0xa3/0x140 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.664199] schedule+0x28/0x80
Jun 11 10:02:37 workstation kernel: [238276.667206] io_schedule+0x12/0x40
Jun 11 10:02:37 workstation kernel: [238276.670226] cv_wait_common+0xac/0x130 [spl]
Jun 11 10:02:37 workstation kernel: [238276.673206] ? finish_wait+0x80/0x80
Jun 11 10:02:37 workstation kernel: [238276.676522] zio_wait+0xe6/0x1a0 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.679588] dsl_pool_sync+0xe6/0x440 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.682620] spa_sync+0x424/0xcf0 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.685657] txg_sync_thread+0x2ce/0x490 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.688661] ? txg_delay+0x1b0/0x1b0 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.691641] ? __thread_exit+0x20/0x20 [spl]
Jun 11 10:02:37 workstation kernel: [238276.694604] thread_generic_wrapper+0x6f/0x80 [spl]
Jun 11 10:02:37 workstation kernel: [238276.697591] kthread+0x113/0x130
Jun 11 10:02:37 workstation kernel: [238276.700633] ? kthread_create_worker_on_cpu+0x70/0x70
Jun 11 10:02:37 workstation kernel: [238276.703594] ret_from_fork+0x22/0x40

Running zfs/spl 0.7.9-3.

I think this still happens from time to time with mainline and the latest zfs/spl (as I tested on another box). So, if it's related to the zfs cache dirty ratio, then lowering zfs_dirty_data_sync could make the cache smaller, and the sync wouldn't take so long. If changing it doesn't solve the problem, then it's likely an intermittent locking issue in the zio_wait logic.
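
For anyone who wants to try that experiment, here is a rough sketch of how the before/after comparison could be done. It is not an official tool. It assumes the parameter is exposed under /sys/module/zfs/parameters (as it should be on zfs 0.7.x) and that the hung-task lines look like the one in the log above. The helper names read_zfs_param and count_hung_tasks are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: ZFS on Linux 0.7.x exposes its module parameters in this directory.
PARAM_DIR = Path("/sys/module/zfs/parameters")

# Matches the hung-task report format seen in the kernel log above.
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def read_zfs_param(name, param_dir=PARAM_DIR):
    """Return a ZFS module parameter as an int, or None if it cannot be read."""
    try:
        return int((Path(param_dir) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def count_hung_tasks(lines, task="txg_sync"):
    """Count hung-task reports for the given task name in kernel log lines."""
    count = 0
    for line in lines:
        match = HUNG_RE.search(line)
        if match and match.group(1) == task:
            count += 1
    return count
```

The idea would be to note read_zfs_param("zfs_dirty_data_sync"), write a smaller value to the same sysfs file as root, and then compare count_hung_tasks() over the kernel log for similar periods before and after the change. If the count doesn't drop, that points more towards the locking theory.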

Hope it helps!