I've got a way to rapidly reproduce the error now. I can do it reliably with a turnaround time of 5-10 minutes.
I've reproduced the crash on the new Kernel, so it has now been observed on both 3.13.0-32-generic and 3.15.7-031507-generic. I'll try 3.16 next.
I've also discovered this new stack trace at the point of the crash (setup_cluster_bitmap) which happens every time:
[97056.916053] NMI backtrace for cpu 1 [97056.916053] CPU: 1 PID: 1107 Comm: kworker/u30:1 Not tainted 3.13.0-32-generic #57-Ubuntu [97056.916053] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/23/2014 [97056.916053] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1) [97056.916053] task: ffff8800e8018000 ti: ffff8800da1ac000 task.ti: ffff8800da1ac000 [97056.916053] RIP: 0010:[<ffffffff81044a8c>] [<ffffffff81044a8c>] default_send_IPI_mask_sequence_phys+0xbc/0x100 [97056.916053] RSP: 0018:ffff8801efc23c60 EFLAGS: 00000046 [97056.916053] RAX: 0000000000000400 RBX: 000000000000b06a RCX: 0000000000000001 [97056.916053] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000300 [97056.916053] RBP: ffff8801efc23c98 R08: ffffffff81d13780 R09: 0000000000000d2e [97056.916053] R10: 0000000000000000 R11: ffff8801efc239e6 R12: ffffffff81d13780 [97056.916053] R13: 0000000000000400 R14: 0000000000000086 R15: 0000000000000002 [97056.916053] FS: 0000000000000000(0000) GS:ffff8801efc20000(0000) knlGS:0000000000000000 [97056.916053] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [97056.916053] CR2: 00007f803f12e000 CR3: 0000000101cc2000 CR4: 00000000001406e0 [97056.916053] Stack: [97056.916053] 0000000100000008 0000000000000001 0000000000002710 ffffffff81c4e1c0 [97056.916053] ffffffff81d137a0 ffffffff81c4e1c0 0000000000000001 ffff8801efc23ca8 [97056.916053] ffffffff81049217 ffff8801efc23cc0 ffffffff81044c60 ffff8801efc2e800 [97056.916053] Call Trace: [97056.916053] <IRQ> [97056.916053] [<ffffffff81049217>] physflat_send_IPI_all+0x17/0x20 [97056.916053] [<ffffffff81044c60>] arch_trigger_all_cpu_backtrace+0x70/0xb0 [97056.916053] [<ffffffff810ca9ee>] rcu_check_callbacks+0x3fe/0x650 [97056.916053] [<ffffffff81076227>] update_process_times+0x47/0x70 [97056.916053] [<ffffffff810d5cf5>] tick_sched_handle.isra.17+0x25/0x60 [97056.916053] [<ffffffff810d5d71>] tick_sched_timer+0x41/0x60 [97056.916053] [<ffffffff8108e5e7>] __run_hrtimer+0x77/0x1d0 [97056.916053] [<ffffffff810d5d30>] ? tick_sched_handle.isra.17+0x60/0x60 [97056.916053] [<ffffffff8108edaf>] hrtimer_interrupt+0xef/0x230 [97056.916053] [<ffffffff81009fef>] xen_timer_interrupt+0x2f/0x150 [97056.916053] [<ffffffff81009ef0>] ? xen_clocksource_read+0x20/0x30 [97056.916053] [<ffffffff8101b7e9>] ? sched_clock+0x9/0x10 [97056.916053] [<ffffffff8109d1ad>] ? sched_clock_local+0x1d/0x80 [97056.916053] [<ffffffff810bf78e>] handle_irq_event_percpu+0x3e/0x1d0 [97056.916053] [<ffffffff810c2bbe>] handle_percpu_irq+0x3e/0x60 [97056.916053] [<ffffffff8142dba7>] __xen_evtchn_do_upcall+0x317/0x320 [97056.916053] [<ffffffff8109d98e>] ? __vtime_account_system+0x2e/0x40 [97056.916053] [<ffffffff8109dd2c>] ? vtime_account_system+0x3c/0x50 [97056.916053] [<ffffffff8142fc8b>] xen_evtchn_do_upcall+0x2b/0x50 [97056.916053] [<ffffffff8172e22d>] xen_hvm_callback_vector+0x6d/0x80 [97056.916053] <EOI> [97056.916053] [<ffffffff813732df>] ? find_next_zero_bit+0x8f/0xf0 [97056.916053] [<ffffffffa0164f6a>] setup_cluster_bitmap+0x15a/0x280 [btrfs] [97056.916053] [<ffffffffa0168174>] btrfs_find_space_cluster+0x244/0x290 [btrfs] [97056.916053] [<ffffffffa0115a32>] find_free_extent+0x4d2/0xc30 [btrfs] [97056.916053] [<ffffffffa01162b8>] btrfs_reserve_extent+0xa8/0x1a0 [btrfs] [97056.916053] [<ffffffffa012f165>] cow_file_range+0x135/0x430 [btrfs] [97056.916053] [<ffffffffa0130142>] run_delalloc_range+0x312/0x350 [btrfs] [97056.916053] [<ffffffffa0143f69>] ? find_lock_delalloc_range.constprop.43+0x1b9/0x1f0 [btrfs] [97056.916053] [<ffffffffa0145134>] __extent_writepage+0x2f4/0x750 [btrfs] [97056.916053] [<ffffffffa0144010>] ? end_extent_writepage+0x70/0x70 [btrfs] [97056.916053] [<ffffffffa0145802>] extent_write_cache_pages.isra.30.constprop.50+0x272/0x3d0 [btrfs] [97056.916053] [<ffffffffa0146acd>] extent_writepages+0x4d/0x70 [btrfs] [97056.916053] [<ffffffffa012caf0>] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs] [97056.916053] [<ffffffffa012ac88>] btrfs_writepages+0x28/0x30 [btrfs] [97056.916053] [<ffffffff8115a5ee>] do_writepages+0x1e/0x40 [97056.916053] [<ffffffff811e5f40>] __writeback_single_inode+0x40/0x210 [97056.916053] [<ffffffff811e6cf7>] writeback_sb_inodes+0x247/0x3e0 [97056.916053] [<ffffffff811e6f2f>] __writeback_inodes_wb+0x9f/0xd0 [97056.916053] [<ffffffff811e71a3>] wb_writeback+0x243/0x2c0 [97056.916053] [<ffffffff811e8a79>] bdi_writeback_workfn+0x1b9/0x430 [97056.916053] [<ffffffff810838f2>] process_one_work+0x182/0x450 [97056.916053] [<ffffffff810846e1>] worker_thread+0x121/0x410 [97056.916053] [<ffffffff810845c0>] ? rescuer_thread+0x430/0x430 [97056.916053] [<ffffffff8108b3d2>] kthread+0xd2/0xf0 [97056.916053] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0 [97056.916053] [<ffffffff8172c5bc>] ret_from_fork+0x7c/0xb0 [97056.916053] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0 [97056.916053] Code: 90 8b 0c 25 00 73 5f ff 80 e5 10 75 f2 89 f0 c1 e0 18 89 04 25 10 73 5f ff 41 83 ff 02 44 89 e8 41 0f 45 c7 89 04 25 00 73 5f ff <eb> 8a 4c 89 f7 57 9d 0f 1f 44 00 00 48 83 c4 10 5b 41 5c 41 5d
I've got a way to rapidly reproduce the error now. I can do it reliably with a turnaround time of 5-10 minutes.
I've reproduced the crash on the new Kernel, so it has now been observed on both 3.13.0-32-generic and 3.15.7- 031507- generic. I'll try 3.16 next.
I've also discovered this new stack trace at the point of the crash (setup_ cluster_ bitmap) which happens every time:
[97056.916053] NMI backtrace for cpu 1 workfn (flush-btrfs-1) ffffffff81044a8 c>] [<ffffffff81044 a8c>] default_ send_IPI_ mask_sequence_ phys+0xbc/ 0x100 c23c60 EFLAGS: 00000046 0(0000) GS:ffff8801efc2 0000(0000) knlGS:000000000 0000000 217>] physflat_ send_IPI_ all+0x17/ 0x20 c60>] arch_trigger_ all_cpu_ backtrace+ 0x70/0xb0 9ee>] rcu_check_ callbacks+ 0x3fe/0x650 227>] update_ process_ times+0x47/ 0x70 cf5>] tick_sched_ handle. isra.17+ 0x25/0x60 d71>] tick_sched_ timer+0x41/ 0x60 5e7>] __run_hrtimer+ 0x77/0x1d0 d30>] ? tick_sched_ handle. isra.17+ 0x60/0x60 daf>] hrtimer_ interrupt+ 0xef/0x230 fef>] xen_timer_ interrupt+ 0x2f/0x150 ef0>] ? xen_clocksource _read+0x20/ 0x30 7e9>] ? sched_clock+ 0x9/0x10 1ad>] ? sched_clock_ local+0x1d/ 0x80 78e>] handle_ irq_event_ percpu+ 0x3e/0x1d0 bbe>] handle_ percpu_ irq+0x3e/ 0x60 ba7>] __xen_evtchn_ do_upcall+ 0x317/0x320 98e>] ? __vtime_ account_ system+ 0x2e/0x40 d2c>] ? vtime_account_ system+ 0x3c/0x50 c8b>] xen_evtchn_ do_upcall+ 0x2b/0x50 22d>] xen_hvm_ callback_ vector+ 0x6d/0x80 2df>] ? find_next_ zero_bit+ 0x8f/0xf0 f6a>] setup_cluster_ bitmap+ 0x15a/0x280 [btrfs] 174>] btrfs_find_ space_cluster+ 0x244/0x290 [btrfs] a32>] find_free_ extent+ 0x4d2/0xc30 [btrfs] 2b8>] btrfs_reserve_ extent+ 0xa8/0x1a0 [btrfs] 165>] cow_file_ range+0x135/ 0x430 [btrfs] 142>] run_delalloc_ range+0x312/ 0x350 [btrfs] f69>] ? find_lock_ delalloc_ range.constprop .43+0x1b9/ 0x1f0 [btrfs] 134>] __extent_ writepage+ 0x2f4/0x750 [btrfs] 010>] ? end_extent_ writepage+ 0x70/0x70 [btrfs] 802>] extent_ write_cache_ pages.isra. 30.constprop. 50+0x272/ 0x3d0 [btrfs] acd>] extent_ writepages+ 0x4d/0x70 [btrfs] af0>] ? btrfs_real_ readdir+ 0x5b0/0x5b0 [btrfs] c88>] btrfs_writepage s+0x28/ 0x30 [btrfs] 5ee>] do_writepages+ 0x1e/0x40 f40>] __writeback_ single_ inode+0x40/ 0x210 cf7>] writeback_ sb_inodes+ 0x247/0x3e0 f2f>] __writeback_ inodes_ wb+0x9f/ 0xd0 1a3>] wb_writeback+ 0x243/0x2c0 a79>] bdi_writeback_ workfn+ 0x1b9/0x430 8f2>] process_ one_work+ 0x182/0x450 6e1>] worker_ thread+ 0x121/0x410 5c0>] ? rescuer_ thread+ 0x430/0x430 3d2>] kthread+0xd2/0xf0 300>] ? kthread_ create_ on_node+ 0x1d0/0x1d0 5bc>] ret_from_ fork+0x7c/ 0xb0 300>] ? kthread_ create_ on_node+ 0x1d0/0x1d0
[97056.916053] CPU: 1 PID: 1107 Comm: kworker/u30:1 Not tainted 3.13.0-32-generic #57-Ubuntu
[97056.916053] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/23/2014
[97056.916053] Workqueue: writeback bdi_writeback_
[97056.916053] task: ffff8800e8018000 ti: ffff8800da1ac000 task.ti: ffff8800da1ac000
[97056.916053] RIP: 0010:[<
[97056.916053] RSP: 0018:ffff8801ef
[97056.916053] RAX: 0000000000000400 RBX: 000000000000b06a RCX: 0000000000000001
[97056.916053] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000300
[97056.916053] RBP: ffff8801efc23c98 R08: ffffffff81d13780 R09: 0000000000000d2e
[97056.916053] R10: 0000000000000000 R11: ffff8801efc239e6 R12: ffffffff81d13780
[97056.916053] R13: 0000000000000400 R14: 0000000000000086 R15: 0000000000000002
[97056.916053] FS: 000000000000000
[97056.916053] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[97056.916053] CR2: 00007f803f12e000 CR3: 0000000101cc2000 CR4: 00000000001406e0
[97056.916053] Stack:
[97056.916053] 0000000100000008 0000000000000001 0000000000002710 ffffffff81c4e1c0
[97056.916053] ffffffff81d137a0 ffffffff81c4e1c0 0000000000000001 ffff8801efc23ca8
[97056.916053] ffffffff81049217 ffff8801efc23cc0 ffffffff81044c60 ffff8801efc2e800
[97056.916053] Call Trace:
[97056.916053] <IRQ>
[97056.916053] [<ffffffff81049
[97056.916053] [<ffffffff81044
[97056.916053] [<ffffffff810ca
[97056.916053] [<ffffffff81076
[97056.916053] [<ffffffff810d5
[97056.916053] [<ffffffff810d5
[97056.916053] [<ffffffff8108e
[97056.916053] [<ffffffff810d5
[97056.916053] [<ffffffff8108e
[97056.916053] [<ffffffff81009
[97056.916053] [<ffffffff81009
[97056.916053] [<ffffffff8101b
[97056.916053] [<ffffffff8109d
[97056.916053] [<ffffffff810bf
[97056.916053] [<ffffffff810c2
[97056.916053] [<ffffffff8142d
[97056.916053] [<ffffffff8109d
[97056.916053] [<ffffffff8109d
[97056.916053] [<ffffffff8142f
[97056.916053] [<ffffffff8172e
[97056.916053] <EOI>
[97056.916053] [<ffffffff81373
[97056.916053] [<ffffffffa0164
[97056.916053] [<ffffffffa0168
[97056.916053] [<ffffffffa0115
[97056.916053] [<ffffffffa0116
[97056.916053] [<ffffffffa012f
[97056.916053] [<ffffffffa0130
[97056.916053] [<ffffffffa0143
[97056.916053] [<ffffffffa0145
[97056.916053] [<ffffffffa0144
[97056.916053] [<ffffffffa0145
[97056.916053] [<ffffffffa0146
[97056.916053] [<ffffffffa012c
[97056.916053] [<ffffffffa012a
[97056.916053] [<ffffffff8115a
[97056.916053] [<ffffffff811e5
[97056.916053] [<ffffffff811e6
[97056.916053] [<ffffffff811e6
[97056.916053] [<ffffffff811e7
[97056.916053] [<ffffffff811e8
[97056.916053] [<ffffffff81083
[97056.916053] [<ffffffff81084
[97056.916053] [<ffffffff81084
[97056.916053] [<ffffffff8108b
[97056.916053] [<ffffffff8108b
[97056.916053] [<ffffffff8172c
[97056.916053] [<ffffffff8108b
[97056.916053] Code: 90 8b 0c 25 00 73 5f ff 80 e5 10 75 f2 89 f0 c1 e0 18 89 04 25 10 73 5f ff 41 83 ff 02 44 89 e8 41 0f 45 c7 89 04 25 00 73 5f ff <eb> 8a 4c 89 f7 57 9d 0f 1f 44 00 00 48 83 c4 10 5b 41 5c 41 5d