kcompactd0 and btrfs-transaction keep deadlocking with each other

Bug #1891773 reported by Simon Arlott
80
This bug affects 15 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Confirmed
Undecided
Unassigned
postgresql-11 (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

Example 1:
[346911.187920] INFO: task kcompactd0:53 blocked for more than 120 seconds.
[346911.187938] Not tainted 4.15.0-112-generic #113-Ubuntu
[346911.187951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[346911.187968] kcompactd0 D 0 53 2 0x80000000
[346911.187969] Call Trace:
[346911.187973] __schedule+0x24e/0x880
[346911.187986] ? btree_releasepage+0x42/0x50 [btrfs]
[346911.187987] schedule+0x2c/0x80
[346911.187988] io_schedule+0x16/0x40
[346911.187989] __lock_page+0xff/0x140
[346911.187990] ? page_cache_tree_insert+0xe0/0xe0
[346911.187992] migrate_pages+0x91f/0xb80
[346911.187993] ? __ClearPageMovable+0x10/0x10
[346911.187994] ? isolate_freepages_block+0x3b0/0x3b0
[346911.187995] compact_zone+0x681/0x950
[346911.187995] kcompactd_do_work+0xfe/0x2a0
[346911.187996] ? __switch_to_asm+0x35/0x70
[346911.187997] ? __switch_to_asm+0x41/0x70
[346911.187998] kcompactd+0x86/0x1c0
[346911.187999] ? kcompactd+0x86/0x1c0
[346911.188001] ? wait_woken+0x80/0x80
[346911.188002] kthread+0x121/0x140
[346911.188003] ? kcompactd_do_work+0x2a0/0x2a0
[346911.188003] ? kthread_create_worker_on_cpu+0x70/0x70
[346911.188004] ret_from_fork+0x35/0x40

[346911.188015] INFO: task btrfs-transacti:858 blocked for more than 120 seconds.
[346911.188031] Not tainted 4.15.0-112-generic #113-Ubuntu
[346911.188043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[346911.188060] btrfs-transacti D 0 858 2 0x80000000
[346911.188061] Call Trace:
[346911.188062] __schedule+0x24e/0x880
[346911.188063] ? bit_wait+0x60/0x60
[346911.188063] schedule+0x2c/0x80
[346911.188064] io_schedule+0x16/0x40
[346911.188065] bit_wait_io+0x11/0x60
[346911.188066] __wait_on_bit+0x4c/0x90
[346911.188066] ? bit_wait+0x60/0x60
[346911.188067] out_of_line_wait_on_bit+0x90/0xb0
[346911.188068] ? bit_waitqueue+0x40/0x40
[346911.188080] lock_extent_buffer_for_io+0x100/0x2a0 [btrfs]
[346911.188090] btree_write_cache_pages+0x1b8/0x420 [btrfs]
[346911.188092] ? native_sched_clock_from_tsc+0x30/0x70
[346911.188092] ? update_load_avg+0x423/0x780
[346911.188101] btree_writepages+0x5d/0x70 [btrfs]
[346911.188102] do_writepages+0x4b/0xe0
[346911.188102] ? enqueue_task_fair+0xb6/0x300
[346911.188111] ? merge_state.part.47+0x44/0x130 [btrfs]
[346911.188112] __filemap_fdatawrite_range+0xcf/0x100
[346911.188113] ? __filemap_fdatawrite_range+0xcf/0x100
[346911.188114] filemap_fdatawrite_range+0x13/0x20
[346911.188122] btrfs_write_marked_extents+0x68/0x140 [btrfs]
[346911.188129] btrfs_write_and_wait_marked_extents.constprop.20+0x4f/0x90 [btrfs]
[346911.188136] btrfs_commit_transaction+0x696/0x910 [btrfs]
[346911.188143] ? btrfs_commit_transaction+0x696/0x910 [btrfs]
[346911.188144] ? wait_woken+0x80/0x80
[346911.188151] transaction_kthread+0x18d/0x1b0 [btrfs]
[346911.188152] kthread+0x121/0x140
[346911.188158] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[346911.188159] ? kthread_create_worker_on_cpu+0x70/0x70
[346911.188160] ret_from_fork+0x35/0x40

Example 2:
2020-08-11T20:23:03.533+01:00 <err> kernel: INFO: task kcompactd0:53 blocked for more than 120 seconds.
2020-08-11T20:23:03.533+01:00 <err> kernel: Not tainted 4.15.0-112-generic #113-Ubuntu
2020-08-11T20:23:03.533+01:00 <err> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-08-11T20:23:03.533+01:00 <info> kernel: kcompactd0 D 0 53 2 0x80000000
2020-08-11T20:23:03.533+01:00 <warning> kernel: Call Trace:
2020-08-11T20:23:03.533+01:00 <warning> kernel: __schedule+0x24e/0x880
2020-08-11T20:23:03.533+01:00 <warning> kernel: ? btree_releasepage+0x42/0x50 [btrfs]
2020-08-11T20:23:03.533+01:00 <warning> kernel: schedule+0x2c/0x80
2020-08-11T20:23:03.533+01:00 <warning> kernel: io_schedule+0x16/0x40
2020-08-11T20:23:03.533+01:00 <warning> kernel: __lock_page+0xff/0x140
2020-08-11T20:23:03.533+01:00 <warning> kernel: ? page_cache_tree_insert+0xe0/0xe0
2020-08-11T20:23:03.533+01:00 <warning> kernel: migrate_pages+0x91f/0xb80
2020-08-11T20:23:03.533+01:00 <warning> kernel: ? __ClearPageMovable+0x10/0x10
2020-08-11T20:23:03.533+01:00 <warning> kernel: ? isolate_freepages_block+0x3b0/0x3b0
2020-08-11T20:23:03.533+01:00 <warning> kernel: compact_zone+0x681/0x950
2020-08-11T20:23:03.533+01:00 <warning> kernel: kcompactd_do_work+0xfe/0x2a0
2020-08-11T20:23:03.533+01:00 <warning> kernel: ? __switch_to_asm+0x35/0x70
2020-08-11T20:23:03.533+01:00 <warning> kernel: ? __switch_to_asm+0x41/0x70
2020-08-11T20:23:03.533+01:00 <warning> kernel: ? __switch_to_asm+0x35/0x70
2020-08-11T20:23:03.533+01:00 <warning> kernel: kcompactd+0x86/0x1c0
2020-08-11T20:23:03.533+01:00 <warning> kernel: ? kcompactd+0x86/0x1c0
2020-08-11T20:23:03.533+01:00 <warning> kernel: ? wait_woken+0x80/0x80
2020-08-11T20:23:03.533+01:00 <warning> kernel: kthread+0x121/0x140
2020-08-11T20:23:03.533+01:00 <warning> kernel: ? kcompactd_do_work+0x2a0/0x2a0
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? kthread_create_worker_on_cpu+0x70/0x70
2020-08-11T20:23:03.534+01:00 <warning> kernel: ret_from_fork+0x35/0x40
2020-08-11T20:23:03.534+01:00 <err> kernel: INFO: task btrfs-transacti:864 blocked for more than 120 seconds.
2020-08-11T20:23:03.534+01:00 <err> kernel: Not tainted 4.15.0-112-generic #113-Ubuntu
2020-08-11T20:23:03.534+01:00 <err> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-08-11T20:23:03.534+01:00 <info> kernel: btrfs-transacti D 0 864 2 0x80000000
2020-08-11T20:23:03.534+01:00 <warning> kernel: Call Trace:
2020-08-11T20:23:03.534+01:00 <warning> kernel: __schedule+0x24e/0x880
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? bit_wait+0x60/0x60
2020-08-11T20:23:03.534+01:00 <warning> kernel: schedule+0x2c/0x80
2020-08-11T20:23:03.534+01:00 <warning> kernel: io_schedule+0x16/0x40
2020-08-11T20:23:03.534+01:00 <warning> kernel: bit_wait_io+0x11/0x60
2020-08-11T20:23:03.534+01:00 <warning> kernel: __wait_on_bit+0x4c/0x90
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? bit_wait+0x60/0x60
2020-08-11T20:23:03.534+01:00 <warning> kernel: out_of_line_wait_on_bit+0x90/0xb0
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? bit_waitqueue+0x40/0x40
2020-08-11T20:23:03.534+01:00 <warning> kernel: lock_extent_buffer_for_io+0x100/0x2a0 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: btree_write_cache_pages+0x1b8/0x420 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: btree_writepages+0x5d/0x70 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: do_writepages+0x4b/0xe0
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? enqueue_task_fair+0xb6/0x300
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? merge_state.part.47+0x44/0x130 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: __filemap_fdatawrite_range+0xcf/0x100
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? __filemap_fdatawrite_range+0xcf/0x100
2020-08-11T20:23:03.534+01:00 <warning> kernel: filemap_fdatawrite_range+0x13/0x20
2020-08-11T20:23:03.534+01:00 <warning> kernel: btrfs_write_marked_extents+0x68/0x140 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: btrfs_write_and_wait_marked_extents.constprop.20+0x4f/0x90 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: btrfs_commit_transaction+0x696/0x910 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? btrfs_commit_transaction+0x696/0x910 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? start_transaction+0x191/0x430 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: transaction_kthread+0x18d/0x1b0 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: kthread+0x121/0x140
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? kthread_create_worker_on_cpu+0x70/0x70
2020-08-11T20:23:03.534+01:00 <warning> kernel: ret_from_fork+0x35/0x40
2020-08-11T20:23:03.534+01:00 <err> kernel: INFO: task snapperd:5196 blocked for more than 120 seconds.
2020-08-11T20:23:03.534+01:00 <err> kernel: Not tainted 4.15.0-112-generic #113-Ubuntu
2020-08-11T20:23:03.534+01:00 <err> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2020-08-11T20:23:03.534+01:00 <info> kernel: snapperd D 0 5196 1 0x00000000
2020-08-11T20:23:03.534+01:00 <warning> kernel: Call Trace:
2020-08-11T20:23:03.534+01:00 <warning> kernel: __schedule+0x24e/0x880
2020-08-11T20:23:03.534+01:00 <warning> kernel: schedule+0x2c/0x80
2020-08-11T20:23:03.534+01:00 <warning> kernel: wait_for_commit+0x46/0x90 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? wait_woken+0x80/0x80
2020-08-11T20:23:03.534+01:00 <warning> kernel: btrfs_commit_transaction+0x287/0x910 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? start_transaction+0x191/0x430 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: btrfs_mksubvol+0x592/0x5a0 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: btrfs_ioctl_snap_create_transid+0x18c/0x1a0 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: btrfs_ioctl_snap_create_v2+0x11e/0x180 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: btrfs_ioctl+0x84f/0x2490 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? do_filp_open+0xaf/0x110
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? _copy_to_user+0x26/0x30
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? cp_new_stat+0x152/0x180
2020-08-11T20:23:03.534+01:00 <warning> kernel: do_vfs_ioctl+0xa8/0x630
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? do_vfs_ioctl+0xa8/0x630
2020-08-11T20:23:03.534+01:00 <warning> kernel: ? __fput+0x193/0x220
2020-08-11T20:23:03.534+01:00 <warning> kernel: SyS_ioctl+0x79/0x90
2020-08-11T20:23:03.534+01:00 <warning> kernel: do_syscall_64+0x73/0x130
2020-08-11T20:23:03.534+01:00 <warning> kernel: entry_SYSCALL_64_after_hwframe+0x41/0xa6
2020-08-11T20:23:03.534+01:00 <warning> kernel: RIP: 0033:0x7f6dec22d6d7
2020-08-11T20:23:03.534+01:00 <warning> kernel: RSP: 002b:00007f6de6c782c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
2020-08-11T20:23:03.534+01:00 <warning> kernel: RAX: ffffffffffffffda RBX: 00007f6de6c782d0 RCX: 00007f6dec22d6d7
2020-08-11T20:23:03.534+01:00 <warning> kernel: RDX: 00007f6de6c782d0 RSI: 0000000050009417 RDI: 0000000000000007
2020-08-11T20:23:03.534+01:00 <warning> kernel: RBP: 0000000000000006 R08: 000000000000000f R09: 00007f6de6c79f83
2020-08-11T20:23:03.534+01:00 <warning> kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000007
2020-08-11T20:23:03.534+01:00 <warning> kernel: R13: 00007f6de6c7a450 R14: 0000000000000000 R15: 0000000000000001

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-generic 4.15.0.112.100
ProcVersionSignature: Ubuntu 4.15.0-112.113-generic 4.15.18
Uname: Linux 4.15.0-112-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.15.0-112-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.16
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/hwC0D2', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D10p', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D0c', '/dev/snd/by-path', '/dev/snd/pcmC0D0p', '/dev/snd/controlC0', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
Date: Sat Aug 15 22:39:32 2020
InstallationDate: Installed on 2012-02-01 (3117 days ago)
InstallationMedia: Ubuntu-Server 11.10 "Oneiric Ocelot" - Release amd64 (20111011)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 002: ID 05e3:0608 Genesys Logic, Inc. Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Gigabyte Technology Co., Ltd. Q370M D3H GSM PLUS
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-4.15.0-112-generic root=/dev/mapper/lvm_loko0-root ro rootflags=subvol=@ sysrq_always_enabled panic=60
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-112-generic N/A
 linux-backports-modules-4.15.0-112-generic N/A
 linux-firmware 1.173.19
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2020-08-07 (8 days ago)
dmi.bios.date: 05/11/2020
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F15c C2
dmi.board.asset.tag: Default string
dmi.board.name: Q370M D3H GSM PLUS
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF15cC2:bd05/11/2020:svnGigabyteTechnologyCo.,Ltd.:pnQ370MD3HGSMPLUS:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnQ370MD3HGSMPLUS:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: Default string
dmi.product.name: Q370M D3H GSM PLUS
dmi.product.version: Default string
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Revision history for this message
Simon Arlott (sa.me.uk) wrote :
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Simon Arlott (sa.me.uk) wrote :

I'm also getting kcompactd0 hanging on its own on another host, which presumably blocks btrfs-transaction later on:
[606364.285752] INFO: task kcompactd0:27 blocked for more than 120 seconds.
[606364.285994] Not tainted 4.15.0-112-generic #113-Ubuntu
[606364.286129] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[606364.286268] kcompactd0 D 0 27 2 0x80000000
[606364.286270] Call Trace:
[606364.286284] __schedule+0x24e/0x880
[606364.286341] ? btree_releasepage+0x42/0x50 [btrfs]
[606364.286343] schedule+0x2c/0x80
[606364.286344] io_schedule+0x16/0x40
[606364.286345] __lock_page+0xff/0x140
[606364.286347] ? page_cache_tree_insert+0xe0/0xe0
[606364.286348] migrate_pages+0x91f/0xb80
[606364.286349] ? __ClearPageMovable+0x10/0x10
[606364.286350] ? isolate_freepages_block+0x3b0/0x3b0
[606364.286351] compact_zone+0x681/0x950
[606364.286352] kcompactd_do_work+0xfe/0x2a0
[606364.286354] ? __switch_to_asm+0x35/0x70
[606364.286355] ? __switch_to_asm+0x41/0x70
[606364.286356] kcompactd+0x86/0x1c0
[606364.286357] ? kcompactd+0x86/0x1c0
[606364.286358] ? wait_woken+0x80/0x80
[606364.286360] kthread+0x121/0x140
[606364.286361] ? kcompactd_do_work+0x2a0/0x2a0
[606364.286361] ? kthread_create_worker_on_cpu+0x70/0x70
[606364.286363] ret_from_fork+0x35/0x40

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Is this a regression that only starts to happen recently?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
OR (olafr) wrote :
Download full text (6.9 KiB)

I have the same issue: kcompactd0 regularity (about every 10 days) blocks (iowait?) and never recovers. After a few more hours, the system becomes more or less dead.

I never had this issue until, I would say, 2 month ago. I was running Ubuntu 18.04 fine until then. No heavy load on the server.

# ps aux|grep compac
root 28 0.0 0.0 0 0 ? D Aug27 0:06 [kcompactd0]

btrfs-transaction seems OK though:

# ps aux|grep trans
root 471 0.0 0.0 0 0 ? S Aug27 7:01 [btrfs-transacti]

From syslog:

Sep 8 16:36:04 nc4 kernel: [1037582.945988] INFO: task kcompactd0:28 blocked for more than 120 seconds.
Sep 8 16:36:04 nc4 kernel: [1037582.946216] Not tainted 4.15.0-112-generic #113-Ubuntu
Sep 8 16:36:04 nc4 kernel: [1037582.946375] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 8 16:36:04 nc4 kernel: [1037582.946557] kcompactd0 D 0 28 2 0x80000000
Sep 8 16:36:04 nc4 kernel: [1037582.946562] Call Trace:
Sep 8 16:36:04 nc4 kernel: [1037582.946642] __schedule+0x24e/0x880
Sep 8 16:36:04 nc4 kernel: [1037582.946826] ? btree_releasepage+0x42/0x50 [btrfs]
Sep 8 16:36:04 nc4 kernel: [1037582.946831] schedule+0x2c/0x80
Sep 8 16:36:04 nc4 kernel: [1037582.946833] io_schedule+0x16/0x40
Sep 8 16:36:04 nc4 kernel: [1037582.946853] __lock_page+0xff/0x140
Sep 8 16:36:04 nc4 kernel: [1037582.946861] ? page_cache_tree_insert+0xe0/0xe0
Sep 8 16:36:04 nc4 kernel: [1037582.946869] migrate_pages+0x91f/0xb80
Sep 8 16:36:04 nc4 kernel: [1037582.946875] ? __ClearPageMovable+0x10/0x10
Sep 8 16:36:04 nc4 kernel: [1037582.946881] ? isolate_freepages_block+0x3b0/0x3b0
Sep 8 16:36:04 nc4 kernel: [1037582.946884] compact_zone+0x681/0x950
Sep 8 16:36:04 nc4 kernel: [1037582.946887] kcompactd_do_work+0xfe/0x2a0
Sep 8 16:36:04 nc4 kernel: [1037582.946890] ? __switch_to_asm+0x35/0x70
Sep 8 16:36:04 nc4 kernel: [1037582.946892] ? __switch_to_asm+0x41/0x70
Sep 8 16:36:04 nc4 kernel: [1037582.946895] kcompactd+0x86/0x1c0
Sep 8 16:36:04 nc4 kernel: [1037582.946897] ? kcompactd+0x86/0x1c0
Sep 8 16:36:04 nc4 kernel: [1037582.946912] ? wait_woken+0x80/0x80
Sep 8 16:36:04 nc4 kernel: [1037582.946922] kthread+0x121/0x140
Sep 8 16:36:04 nc4 kernel: [1037582.946924] ? kcompactd_do_work+0x2a0/0x2a0
Sep 8 16:36:04 nc4 kernel: [1037582.946926] ? kthread_create_worker_on_cpu+0x70/0x70
Sep 8 16:36:04 nc4 kernel: [1037582.946929] ret_from_fork+0x35/0x40

Some more infos (all while kcompactd is still blocked):

# uptime
 17:20:25 up 12 days, 57 min, 1 user, load average: 1.48, 1.19, 1.08

# free -h
              total used free shared buff/cache available
Mem: 5.8G 476M 1.6G 72M 3.8G 5.0G
Swap: 6.0G 568M 5.4G

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic

The root FS is BTRF on a LUKS encrypted volume.

# cryptsetup status sda5_crypt
/dev/mapper/sda5_crypt is active and is in use.
  type: LUKS1
  cipher: aes-xts-plain64
  keysize: 512 bits
  key location: ...

Read more...

Revision history for this message
OR (olafr) wrote :

Added requested info: yes, this started to happen recently and could be a regression.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Simon Arlott (sa.me.uk) wrote :

It's a regression from xenial and took a few weeks to appear on bionic.

I tried using lock_page_killable() in __unmap_and_move() but as far as I can tell kcompactd0 hasn't got stuck in the same way. Instead I now have every btrfs_start_transaction suddenly blocking for all processes, which is impossible to debug because I can't even login.

I've installed the 5.4 HWE kernel instead because 4.15 is just unusable on two hosts.

Revision history for this message
Kevin Buhr (buhr) wrote :

As another data point, I was running 4.15.0-106-generic on a BTRFS file server with an uptime since June. After rebooting Sept 4 to 4.15-0-115-generic, I've experienced this sort of deadlock every few days (on Sept 5, 8, and 10) under both the -115 and -117 kernels. So, given the above report on -112, it looks like a regression between -106 and -112. Hope that helps! (I'm going to give the HWE a spin, too.)

Revision history for this message
Simon Arlott (sa.me.uk) wrote :

The HWE kernel still has some instances of btrfs-transaction hanging for 2+ minutes but it recovers. Remember to regularly reset the sysctl for reporting hung tasks or it'll just stop reporting new instances.

The first kernel used on the affected systems was -112 so I don't have experience of the earlier versions.

Revision history for this message
Miroslav Goldgamer (rusonov) wrote :

For some time now I thought the issue is related to the storage for few months now.
I currently have a system with -113 kernel and btrfs running for over 30 days but other systems with this kernel had issues. In other hand - previous kernels had issues (since 106 or so)

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Can you please do a kernel bisection?

https://wiki.ubuntu.com/Kernel/KernelBisection

Revision history for this message
Tomaž Peršin (montazek) wrote :

I have same problem on two computers
One old i5-2500 running Ubuntu 18.04 - 4.15.0-115-generic with btrfs mounted on /home
One new Ryzen 9 3900X (was) running 18.04 - 4.15.117-generic with btrfs on / and /home
New one I upgraded yesterday to 20.04 and it is now on 5.4.0-47-generic. I hope issue is gone since this machine is important.
Both are running also VMs. New one is running Virtualbox VMs and Old one is running one KVM VM.
On the old one I still have 112 and 66 kernel versions. But prior to 115 I didn't use KVM on this computer. Not sure if virtualization is also trigger for the issue.

Revision history for this message
Miroslav Goldgamer (rusonov) wrote :

I have this issues on VMs running on ESXI hosts. Not a KVM issue.
Currently moved to HWE kernel as suggested and looks like the issue didn't return.
Hopefully the issue will be resolved soon.

Revision history for this message
Druggo Yang (druggo) wrote :

4.15.0-118 still have the issue:

Oct 11 02:20:36 localhost kernel: [1043143.644035] INFO: task kcompactd0:39 blocked for more than 120 seconds.
Oct 11 02:20:36 localhost kernel: [1043143.645095] Not tainted 4.15.0-118-generic #119-Ubuntu
Oct 11 02:20:36 localhost kernel: [1043143.645969] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 02:20:36 localhost kernel: [1043143.647159] kcompactd0 D 0 39 2 0x80000000
Oct 11 02:20:36 localhost kernel: [1043143.647161] Call Trace:
Oct 11 02:20:36 localhost kernel: [1043143.647169] __schedule+0x24e/0x880
Oct 11 02:20:36 localhost kernel: [1043143.647172] schedule+0x2c/0x80
Oct 11 02:20:36 localhost kernel: [1043143.647173] io_schedule+0x16/0x40
Oct 11 02:20:36 localhost kernel: [1043143.647176] __lock_page+0xff/0x140
Oct 11 02:20:36 localhost kernel: [1043143.647179] ? page_cache_tree_insert+0xe0/0xe0
Oct 11 02:20:36 localhost kernel: [1043143.647182] migrate_pages+0x91f/0xb80
Oct 11 02:20:36 localhost kernel: [1043143.647184] ? __ClearPageMovable+0x10/0x10
Oct 11 02:20:36 localhost kernel: [1043143.647186] ? isolate_freepages_block+0x3b0/0x3b0
Oct 11 02:20:36 localhost kernel: [1043143.647187] compact_zone+0x681/0x950
Oct 11 02:20:36 localhost kernel: [1043143.647189] kcompactd_do_work+0xfe/0x2a0
Oct 11 02:20:36 localhost kernel: [1043143.647191] ? __switch_to_asm+0x35/0x70
Oct 11 02:20:36 localhost kernel: [1043143.647193] ? __switch_to_asm+0x41/0x70
Oct 11 02:20:36 localhost kernel: [1043143.647195] kcompactd+0x86/0x1c0
Oct 11 02:20:36 localhost kernel: [1043143.647196] ? kcompactd+0x86/0x1c0
Oct 11 02:20:36 localhost kernel: [1043143.647199] ? wait_woken+0x80/0x80
Oct 11 02:20:36 localhost kernel: [1043143.647202] kthread+0x121/0x140
Oct 11 02:20:36 localhost kernel: [1043143.647203] ? kcompactd_do_work+0x2a0/0x2a0
Oct 11 02:20:36 localhost kernel: [1043143.647204] ? kthread_create_worker_on_cpu+0x70/0x70
Oct 11 02:20:36 localhost kernel: [1043143.647206] ret_from_fork+0x35/0x40

Revision history for this message
Bogdan Mart (winnie-ua) wrote :
Download full text (5.0 KiB)

Have same issue:

[ 2538.464378] INFO: task kcompactd0:27 blocked for more than 120 seconds.
[ 2538.466729] Not tainted 4.15.0-122-generic #124-Ubuntu
[ 2538.468652] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2538.471451] kcompactd0 D 0 27 2 0x80000000
[ 2538.471453] Call Trace:
[ 2538.471462] __schedule+0x24e/0x880
[ 2538.471464] schedule+0x2c/0x80
[ 2538.471466] io_schedule+0x16/0x40
[ 2538.471468] __lock_page+0xff/0x140
[ 2538.471472] ? page_cache_tree_insert+0xe0/0xe0
[ 2538.471476] migrate_pages+0x91f/0xb80
[ 2538.471479] ? __ClearPageMovable+0x10/0x10
[ 2538.471482] ? isolate_freepages_block+0x3b0/0x3b0
[ 2538.471484] compact_zone+0x681/0x950
[ 2538.471486] kcompactd_do_work+0xfe/0x2a0
[ 2538.471489] ? __switch_to_asm+0x35/0x70
[ 2538.471491] ? __switch_to_asm+0x41/0x70
[ 2538.471494] kcompactd+0x86/0x1c0
[ 2538.471495] ? kcompactd+0x86/0x1c0
[ 2538.471499] ? wait_woken+0x80/0x80
[ 2538.471502] kthread+0x121/0x140
[ 2538.471504] ? kcompactd_do_work+0x2a0/0x2a0
[ 2538.471506] ? kthread_create_worker_on_cpu+0x70/0x70
[ 2538.471508] ret_from_fork+0x35/0x40

Kernel: 4.15.120

few days before it was hanging inside btrfs:
[12688.280246] INFO: task btrfs-transacti:597 blocked for more than 120 seconds.
[12688.285589] Not tainted 4.15.0-122-generic #124-Ubuntu
[12688.292217] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12688.305038] btrfs-transacti D 0 597 2 0x80000000
[12688.305042] Call Trace:
[12688.305054] __schedule+0x24e/0x880
[12688.305058] ? bit_wait+0x60/0x60
[12688.305059] schedule+0x2c/0x80
[12688.305061] io_schedule+0x16/0x40
[12688.305063] bit_wait_io+0x11/0x60
[12688.305065] __wait_on_bit+0x4c/0x90
[12688.305067] ? bit_wait+0x60/0x60
[12688.305068] out_of_line_wait_on_bit+0x90/0xb0
[12688.305075] ? bit_waitqueue+0x40/0x40
[12688.305116] lock_extent_buffer_for_io+0x100/0x2a0 [btrfs]
[12688.305136] btree_write_cache_pages+0x1b8/0x420 [btrfs]
[12688.305153] btree_writepages+0x5d/0x70 [btrfs]
[12688.305156] do_writepages+0x4b/0xe0
[12688.305174] ? free_extent_state+0x48/0xc0 [btrfs]
[12688.305188] ? merge_state.part.47+0x44/0x130 [btrfs]
[12688.305191] __filemap_fdatawrite_range+0xcf/0x100
[12688.305193] ? __filemap_fdatawrite_range+0xcf/0x100
[12688.305195] filemap_fdatawrite_range+0x13/0x20
[12688.305211] btrfs_write_marked_extents+0x68/0x140 [btrfs]
[12688.305226] btrfs_write_and_wait_marked_extents.constprop.20+0x4f/0x90 [btrfs]
[12688.305241] btrfs_commit_transaction+0x696/0x910 [btrfs]
[12688.305255] ? btrfs_commit_transaction+0x696/0x910 [btrfs]
[12688.305268] ? start_transaction+0x191/0x430 [btrfs]
[12688.305283] transaction_kthread+0x18d/0x1b0 [btrfs]
[12688.305286] kthread+0x121/0x140
[12688.305300] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[12688.305302] ? kthread_create_worker_on_cpu+0x70/0x70
[12688.305305] ret_from_fork+0x35/0x40
[12688.305320] INFO: task postgres:1121 blocked for more than 120 seconds.
[12688.311393] Not tainted 4.15.0-122-generic #124-Ubuntu
[12688.314307] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12688.326485] p...

Read more...

Revision history for this message
Bogdan Mart (winnie-ua) wrote :

attch dmesg

Revision history for this message
Bogdan Mart (winnie-ua) wrote :

I confirm that downgrading to 4.15.0-106 fixed the issue for me.

Revision history for this message
robert packard (bumbabumba) wrote :

Looks like the same issue kcompactd and btrfs-transaction locking up. Can not immediately reproduce, seemingly takes ~2weeks to appear. More problematic systems generally also have higher disk i/o, all btrfs.

%Cpu0  :  3.8 us,  1.3 sy,  0.0 ni,  0.0 id, 94.9 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  1.3 us,  2.6 sy,  0.0 ni, 96.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

%CPU    PID    TID     TIME STAT   RSS    SZ F COMMAND
 0.0     27     27 00:00:03 D        0     0 1 [kcompactd0]
 0.0    618    618 00:03:04 D        0     0 1 [btrfs-transacti]

cat /proc/27/stack
[<0>] __lock_page+0xff/0x140
[<0>] migrate_pages+0x91f/0xb80
[<0>] compact_zone+0x681/0x950
[<0>] kcompactd_do_work+0xfe/0x2a0
[<0>] kcompactd+0x86/0x1c0
[<0>] kthread+0x121/0x140
[<0>] ret_from_fork+0x35/0x40
[<0>] 0xffffffffffffffff

cat /proc/618/stack
[<0>] lock_extent_buffer_for_io+0x100/0x2a0 [btrfs]
[<0>] btree_write_cache_pages+0x1b8/0x420 [btrfs]
[<0>] btree_writepages+0x5d/0x70 [btrfs]
[<0>] do_writepages+0x4b/0xe0
[<0>] __filemap_fdatawrite_range+0xcf/0x100
[<0>] filemap_fdatawrite_range+0x13/0x20
[<0>] btrfs_write_marked_extents+0x68/0x140 [btrfs]
[<0>] btrfs_write_and_wait_marked_extents.constprop.20+0x4f/0x90 [btrfs]
[<0>] btrfs_commit_transaction+0x696/0x910 [btrfs]
[<0>] transaction_kthread+0x18d/0x1b0 [btrfs]
[<0>] kthread+0x121/0x140
[<0>] ret_from_fork+0x35/0x40
[<0>] 0xffffffffffffffff

INFO: task kcompactd0:27 blocked for more than 120 seconds.
      Tainted: P E 4.15.0-122-generic #124-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kcompactd0 D 0 27 2 0x90000000
Call Trace:
 __schedule+0x24e/0x880
 schedule+0x2c/0x80
 io_schedule+0x16/0x40
 __lock_page+0xff/0x140
 ? page_cache_tree_insert+0xe0/0xe0
 migrate_pages+0x91f/0xb80
 ? __ClearPageMovable+0x10/0x10
 ? isolate_freepages_block+0x3b0/0x3b0
 compact_zone+0x681/0x950
 kcompactd_do_work+0xfe/0x2a0
 ? __switch_to_asm+0x35/0x70
 ? __switch_to_asm+0x41/0x70
 kcompactd+0x86/0x1c0
 ? kcompactd+0x86/0x1c0
 ? wait_woken+0x80/0x80
 kthread+0x121/0x140
 ? kcompactd_do_work+0x2a0/0x2a0
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x35/0x40
~

Revision history for this message
K van Murfi (christophe-courtois) wrote :

I had the problem with Ubuntu 18.04 for months. It didn't occur after that I switched to HWE kernels.

No problem since the migration to Ubuntu 20.04.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in postgresql-11 (Ubuntu):
status: New → Confirmed
Revision history for this message
Pierre-Olivier Vallès (povalles) wrote :

Just happened for the second time in 46 hours (and it never happened before):

kernel: [167836.884337] INFO: task kcompactd0:63 blocked for more than 120 seconds.
kernel: [167836.887341] Not tainted 4.15.0-128-generic #131-Ubuntu
kernel: [167836.889880] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [167836.893754] kcompactd0 D 0 63 2 0x80000000
kernel: [167836.893760] Call Trace:
kernel: [167836.894017] __schedule+0x24e/0x880
kernel: [167836.894031] schedule+0x2c/0x80
kernel: [167836.894034] io_schedule+0x16/0x40
kernel: [167836.894160] __lock_page+0xff/0x140
kernel: [167836.894197] ? page_cache_tree_insert+0xe0/0xe0
kernel: [167836.894246] migrate_pages+0x91f/0xb80
kernel: [167836.894269] ? __ClearPageMovable+0x10/0x10
kernel: [167836.894272] ? isolate_freepages_block+0x3b0/0x3b0
kernel: [167836.894276] compact_zone+0x681/0x950
kernel: [167836.894279] kcompactd_do_work+0xfe/0x2a0
kernel: [167836.894282] ? __switch_to_asm+0x35/0x70
kernel: [167836.894284] ? __switch_to_asm+0x41/0x70
kernel: [167836.894288] kcompactd+0x86/0x1c0
kernel: [167836.894292] ? kcompactd+0x86/0x1c0
kernel: [167836.894395] ? wait_woken+0x80/0x80
kernel: [167836.894445] kthread+0x121/0x140
kernel: [167836.894449] ? kcompactd_do_work+0x2a0/0x2a0
kernel: [167836.894452] ? kthread_create_worker_on_cpu+0x70/0x70
kernel: [167836.894454] ret_from_fork+0x35/0x40

Message appears every 2 minutes for 18 minutes (then I guess it stops reporting) ; after that the disk are blocked, then NFS fails, then the whole service I'm running fails.

It happened twice with kernel 4.15.0-128-generic.

I updated to 4.15.0-137-generic while service was down, let's see how long it will work.
HWE kernel is the next step.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.