UbuntuKVM guest crashed while running I/O stress test with Ubuntu kernel 4.4.0-47-generic

Bug #1659111 reported by bugproxy
Affects          Status         Importance   Assigned to                 Milestone
linux (Ubuntu)   In Progress    High         Unassigned
Xenial           Fix Released   Undecided    Kleber Sacilotto de Souza
Yakkety          Won't Fix      Undecided    Kleber Sacilotto de Souza
Zesty            Incomplete     High         Kleber Sacilotto de Souza

Bug Description

Attn. Canonical: For your awareness only at this time.

== Comment: #0 - LEKSHMI C. PILLAI - 2016-11-22 03:49:38 ==

Machine INFO

KVM HOST: luckyv1

Guest: lucky05

lucky05 crashed while running the I/O stress test for SAN disks.

Installed lucky05 and enabled xmon on it. Then started the RAW disk test on around 50 disks. After 6-7 hours of running, the machine dropped into xmon.

Logs:
[25023.224182] Unable to handle kernel paging request for data at address 0x00000000
[25023.224257] Faulting instruction address: 0xc000000000324c60
cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3620]
    pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000032831c: writeback_sb_inodes+0x30c/0x590
    sp: c0000000fffc38a0
   msr: 8000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc0000000ff99e470
  paca = 0xc00000000fb41c80 softe: 0 irq_happened: 0x01
    pid = 14736, comm = kworker/u16:8
enter ? for help
[c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
[c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
[c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
3:mon> f
3:mon> th
[c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
[c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
[c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
3:mon> sh
[27384.651055] INFO: rcu_sched detected stalls on CPUs/tasks:
[27384.651220] (detected by 4, t=40598 jiffies, g=2849830, c=2849829, q=992)
[27384.651286] All QSes seen, last rcu_sched kthread activity 40596 (4301188714-4301148118), jiffies_till_next_fqs=1, root ->qsmask 0x0
[27384.651501] rcu_sched kthread starved for 40596 jiffies! g2849830 c2849829 f0x2 s3 ->state=0x0
[27384.651747] INFO: rcu_sched detected stalls on CPUs/tasks:
[27384.651905] (detected by 4, t=590354 jiffies, g=2849830, c=2849829, q=1285)
[27384.652012] All QSes seen, last rcu_sched kthread activity 590352 (4301738470-4301148118), jiffies_till_next_fqs=1, root ->qsmask 0x0
[27384.652191] rcu_sched kthread starved for 590352 jiffies! g2849830 c2849829 f0x2 s3 ->state=0x0
[27384.730645] Unable to handle kernel paging request for data at address 0xffffffffffffffd8
[27384.730781] Faulting instruction address: 0xc0000000000e7258
cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3000]
    pc: c0000000000e7258: kthread_data+0x28/0x40
    lr: c0000000000de940: wq_worker_sleeping+0x30/0x110
    sp: c0000000fffc3280
   msr: 8000000100009033
   dar: ffffffffffffffd8
 dsisr: 40000000
  current = 0xc0000000ff99e470
  paca = 0xc00000000fb41c80 softe: 0 irq_happened: 0x01
    pid = 14736, comm = kworker/u16:8
enter ? for help
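
The second fault address above, 0xffffffffffffffd8, is easier to read as a signed 64-bit value. The sketch below does only that arithmetic. Reading the result as a small negative offset applied to a NULL base pointer is an assumption made here for illustration; the log itself does not say so.

```python
def as_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit value as two's-complement signed."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


# dar value copied from the second xmon entry (kthread_data+0x28/0x40)
dar = int("ffffffffffffffd8", 16)
offset = as_signed64(dar)

print(offset, hex(-offset))  # -40 0x28
```

A value this close to zero on the negative side usually points at a field access relative to a bad base pointer rather than at a random wild address, which is consistent with this being a secondary fault taken while the worker was already dying.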

== Comment: #1 - LEKSHMI C. PILLAI - 2016-11-22 04:05:41 ==
3:mon> th
[c0000000fffc32b0] c0000000000de940 wq_worker_sleeping+0x30/0x110
[c0000000fffc32f0] c000000000af31bc __schedule+0x6ec/0x990
[c0000000fffc33c0] c000000000af34a8 schedule+0x48/0xc0
[c0000000fffc33f0] c0000000000bd3d0 do_exit+0x760/0xc30
[c0000000fffc34b0] c000000000020bf4 die+0x314/0x470
[c0000000fffc3540] c000000000050d98 bad_page_fault+0xd8/0x150
[c0000000fffc35b0] c000000000008680 handle_page_fault+0x2c/0x30
--- Exception: 300 (Data Access) at c000000000324c60 locked_inode_to_wb_and_lock_list+0x50/0x290
[c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
[c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
[c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
3:mon>

== Comment: #6 - Laurent Dufour - 2016-11-23 03:00:16 ==
Logged in to luckyv1 and found a lot of ipr issues on this node:
[525973.896624] qla2xxx 0005:09:00.0: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
[525973.956619] qla2xxx 0005:09:00.1: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
[529433.834853] ipr 0001:04:00.0: FFFE: Soft device bus error recovered by the IOA
[529433.834867] ipr: -----Failing Device Information-----
[529433.834870] ipr: World Wide Unique ID: 500507605EC10C000000000000000000
[529433.834873] ipr: Device Resource Path: FF
[529433.834875] ipr: Primary Problem Description: Command Timeout
[529433.834878] ipr: Secondary Problem Description: Command timeout expired
[529433.834880] ipr: SCSI Sense Data:
[529433.834882] ipr: 00000000: 00000000 00000000 00000000 00000000
[529433.834884] ipr: 00000010: 00000000 00000000 00000000 00000000
[529433.834886] ipr: SCSI Command Descriptor Block:
[529433.834889] ipr: 00000000: 9E120004 0F000000 00000000 0020AD00
[529433.834891] ipr: Additional IOA Data:
[529433.834893] ipr: 00000000: 4646001C 44010007 00050000 04700002
[529433.834895] ipr: 00000010: 3B894A49 1EE620CC 04700002 49574631
[529433.834897] ipr: 00000020: 455300CC 06B00027 00000020 84000000
[529433.834899] ipr: 00000030: 00000000 05801000 0B29A7C0 00000000
[529433.834901] ipr: 00000040: 00000000 00000000 00000000 00000000
[529433.834904] ipr: 00000050: 00000000 00000000 00000000 00000000
[529433.834906] ipr: 00000060: 00000000 00000000 00000000 00000000
[529433.834908] ipr: 00000070: 00000000 00000000 00000000 00000000
[529433.834910] ipr: 00000080: 00000000 00000000 00000000 00000000
[529433.834912] ipr: 00000090: 00000000 00000000 00000000 00000000
[529433.834914] ipr: 000000A0: 00000000 D4000018 80000000 FFFFFFFF
[529433.834917] ipr: 000000B0: FFFFFFFF 00000000 0980EC21 00000000
[529433.834919] ipr: 000000C0: 00000000 00000000 01769A24 00000000
[529433.834921] ipr: 000000D0: 01D3C300 E0050000 FFFFFFFE 0B5A0000
[529433.834923] ipr: 000000E0: 00000000 9E120004 0F000000 00000000
[529433.834926] ipr: 000000F0: 43440010 9E120004 0F000000 00000000
[529433.834928] ipr: 00000100: 0020AD00 45480010 0100E038 9E12FFFF
[529433.834930] ipr: 00000110: 01080002 00000000 45540004 00001463

In addition, some NFS issues are reported:
[563034.817901] nfs: server 10.33.11.31 not responding, timed out
[563405.504308] nfs: server 10.33.11.31 not responding, timed out

That said, chig5 entered xmon due to a bad pointer in the kernel:
3:mon> e
cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3000]
    pc: c0000000000e7258: kthread_data+0x28/0x40
    lr: c0000000000de940: wq_worker_sleeping+0x30/0x110
    sp: c0000000fffc3280
   msr: 8000000100009033
   dar: ffffffffffffffd8
 dsisr: 40000000
  current = 0xc0000000ff99e470
  paca = 0xc00000000fb41c80 softe: 0 irq_happened: 0x01
    pid = 14736, comm = kworker/u16:8
3:mon> th
[c0000000fffc32b0] c0000000000de940 wq_worker_sleeping+0x30/0x110
[c0000000fffc32f0] c000000000af31bc __schedule+0x6ec/0x990
[c0000000fffc33c0] c000000000af34a8 schedule+0x48/0xc0
[c0000000fffc33f0] c0000000000bd3d0 do_exit+0x760/0xc30
[c0000000fffc34b0] c000000000020bf4 die+0x314/0x470
[c0000000fffc3540] c000000000050d98 bad_page_fault+0xd8/0x150
[c0000000fffc35b0] c000000000008680 handle_page_fault+0x2c/0x30
--- Exception: 300 (Data Access) at c000000000324c60 locked_inode_to_wb_and_lock_list+0x50/0x290
[c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
[c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
[c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4

Looking at the other guests, since Lekshmi mentioned that all the guests are crashing.

== Comment: #7 - Laurent Dufour - 2016-11-23 03:24:34 ==
The guest lucky01 (4.4.0-47-generic) is fine :
root@lucky01:/Blast# date
Wed Nov 23 03:04:23 CST 2016

The guest lucky02 (4.4.0-47-generic) has entered xmon due to the same issue as lucky05:
7:mon> e
cpu 0x7: Vector: 300 (Data Access) at [c0000001f265b620]
    pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000032831c: writeback_sb_inodes+0x30c/0x590
    sp: c0000001f265b8a0
   msr: 8000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc0000001f222fcc0
  paca = 0xc00000000fb44280 softe: 0 irq_happened: 0x01
    pid = 12062, comm = kworker/u16:3
7:mon> t
[c0000001f265b900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000001f265ba10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000001f265ba70] c000000000328aec wb_writeback+0x30c/0x450
[c0000001f265bb40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000001f265bc50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000001f265bce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000001f265bd80] c0000000000e6980 kthread+0x110/0x130
[c0000001f265be30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
--- Exception: 0 at 0000000000000000

The guest lucky03 didn't enter xmon but is not responding any more. Unfortunately sysrq is not enabled on this guest. There is still some activity on this guest.
root@luckyv1:~# virsh qemu-monitor-command --hmp lucky03 'info cpus'
* CPU #0: nip=0xc0000000001035e0 thread_id=76434
  CPU #1: nip=0xc0000000000863dc thread_id=76435
  CPU #2: nip=0xc0000000000863dc thread_id=76436
  CPU #3: nip=0xc0000000000863dc thread_id=76437
  CPU #4: nip=0xc0000000000863dc thread_id=76439
  CPU #5: nip=0xc0000000000863dc thread_id=76440
  CPU #6: nip=0x0000000010072f68 thread_id=76441
  CPU #7: nip=0xc0000000000863dc thread_id=76442
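
To compare 'info cpus' snapshots across guests, the output can be reduced to a tally of instruction pointers. This is a minimal sketch using the lucky03 output above as sample text. The regular expression is written against the format shown here, and treating addresses below 0xc000000000000000 as userspace is an assumption about the ppc64 kernel address layout, not something stated in the report.

```python
import re
from collections import Counter

CPU_LINE = re.compile(r"CPU #(\d+): nip=(0x[0-9a-fA-F]+) thread_id=(\d+)")
KERNEL_BASE = 0xC000000000000000  # assumed start of kernel addresses on ppc64


def parse_info_cpus(text):
    """Return {cpu_number: nip} parsed from HMP 'info cpus' output."""
    return {int(m.group(1)): int(m.group(2), 16) for m in CPU_LINE.finditer(text)}


sample = """\
* CPU #0: nip=0xc0000000001035e0 thread_id=76434
  CPU #1: nip=0xc0000000000863dc thread_id=76435
  CPU #2: nip=0xc0000000000863dc thread_id=76436
  CPU #3: nip=0xc0000000000863dc thread_id=76437
  CPU #4: nip=0xc0000000000863dc thread_id=76439
  CPU #5: nip=0xc0000000000863dc thread_id=76440
  CPU #6: nip=0x0000000010072f68 thread_id=76441
  CPU #7: nip=0xc0000000000863dc thread_id=76442
"""

nips = parse_info_cpus(sample)
common_nip, count = Counter(nips.values()).most_common(1)[0]
outliers = sorted(cpu for cpu, nip in nips.items() if nip != common_nip)
in_userspace = sorted(cpu for cpu, nip in nips.items() if nip < KERNEL_BASE)

print(hex(common_nip), count, outliers, in_userspace)
# 0xc0000000000863dc 6 [0, 6] [6]
```

Six of the eight vCPUs sit at the same address, while CPU 0 and CPU 6 are elsewhere. Taking several snapshots and checking whether the outliers move would be one way to back up the observation that the guest is still active.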

The guest lucky04 is not responding either and did not enter xmon, and sysrq is not enabled on this node.
But the node still seems to be active:
root@luckyv1:~# virsh qemu-monitor-command --hmp lucky04 'info cpus'
* CPU #0: nip=0xc000000000af8834 thread_id=68201
  CPU #1: nip=0xc0000000000863dc thread_id=68202
  CPU #2: nip=0xc0000000000645ac thread_id=68203
  CPU #3: nip=0xc0000000000863dc thread_id=68204
  CPU #4: nip=0xc0000000000863dc thread_id=68205
  CPU #5: nip=0xc0000000000863dc thread_id=68206
  CPU #6: nip=0xc000000000064590 thread_id=68207
  CPU #7: nip=0xc000000000af8904 thread_id=68208

The guest lucky06 is alive:
root@lucky06:/# cat /proc/version; date
Linux version 4.4.0-47-generic (buildd@bos01-ppc64el-008) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #68-Ubuntu SMP Wed Oct 26 19:38:24 UTC 2016
Wed Nov 23 03:20:19 CST 2016

To summarize:
lucky01 good
lucky02 panic in locked_inode_to_wb_and_lock_list()
lucky03 not responding but still active
lucky04 not responding but still active
lucky05 panic in locked_inode_to_wb_and_lock_list()
lucky06 good
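
The grouping in this summary can be reproduced mechanically by pulling the faulting symbol out of each guest's xmon 'e' output. The sketch below uses two-line excerpts copied from the dumps above. The regular expression and the helper name faulting_symbol are illustrative and not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches lines such as:
#     pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290
PC_LINE = re.compile(
    r"^\s*pc:\s*([0-9a-f]+):\s*([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", re.M
)


def faulting_symbol(xmon_text):
    """Return the symbol name from the 'pc:' line, or None if there is none."""
    match = PC_LINE.search(xmon_text)
    return match.group(2) if match else None


dumps = {
    "lucky02": (
        "cpu 0x7: Vector: 300 (Data Access) at [c0000001f265b620]\n"
        "    pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290\n"
    ),
    "lucky05": (
        "cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3620]\n"
        "    pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290\n"
    ),
}

by_symbol = defaultdict(list)
for guest in sorted(dumps):
    by_symbol[faulting_symbol(dumps[guest])].append(guest)

groups = dict(by_symbol)
print(groups)
# {'locked_inode_to_wb_and_lock_list': ['lucky02', 'lucky05']}
```

Guests that never dropped into xmon (lucky03, lucky04) have no 'pc:' line to parse, so they would need a different signal, such as the 'info cpus' output.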

== Comment: #10 - Laurent Dufour - 2016-11-24 10:27:52 ==
Here is the data I captured on lucky02, which panicked the same way lucky05 did.

CPU 7 panicked due to a data access error:
 7:mon> e
cpu 0x7: Vector: 300 (Data Access) at [c0000001f265b620]
    pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000032831c: writeback_sb_inodes+0x30c/0x590
    sp: c0000001f265b8a0
   msr: 8000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc0000001f222fcc0
  paca = 0xc00000000fb44280 softe: 0 irq_happened: 0x01
    pid = 12062, comm = kworker/u16:3
7:mon> r
R00 = c00000000032831c R16 = c0000001fc972ef8
R01 = c0000001f265b8a0 R17 = c0000001fc972e70
R02 = c0000000015c6a00 R18 = c0000001fc972f60
R03 = c0000001fc972e70 R19 = 0000000000000000
R04 = c0000001f2230700 R20 = 0000000000000000
R05 = 0000000000000000 R21 = c0000001f2658000
R06 = 00000001fef30000 R22 = c0000001f35d5c88
R07 = 000108f684c40713 R23 = c0000001f35d5c68
R08 = 0000000000000000 R24 = 0000000000000000
R09 = 0000000000000000 R25 = c0000001fc972ef8
R10 = 0000000080000007 R26 = 0000000000000000
R11 = 00000000030883ec R27 = 0000000000000000
R12 = 0000000000000000 R28 = 0000000000000001
R13 = c00000000fb44280 R29 = c0000001fc972e70
R14 = c0000000000e6878 R30 = c0000001f265bba0
R15 = 0000000000000000 R31 = 0000000000000000
pc = c000000000324c60 locked_inode_to_wb_and_lock_list+0x50/0x290
cfar= 00003fff9647a5a8
lr = c00000000032831c writeback_sb_inodes+0x30c/0x590
msr = 8000000100009033 cr = 24652882
ctr = c000000000110b50 xer = 0000000020000000 trap = 300
dar = 0000000000000000 dsisr = 40000000
7:mon> t
[c0000001f265b900] c00000000032831c writeback_sb_inodes+0x30c/0x590
[c0000001f265ba10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
[c0000001f265ba70] c000000000328aec wb_writeback+0x30c/0x450
[c0000001f265bb40] c0000000003296b4 wb_workfn+0x264/0x570
[c0000001f265bc50] c0000000000dd930 process_one_work+0x1e0/0x5a0
[c0000001f265bce0] c0000000000dde84 worker_thread+0x194/0x680
[c0000001f265bd80] c0000000000e6980 kthread+0x110/0x130
[c0000001f265be30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4

The system tried to access data pointed to by r31, which contains data retrieved from the inode whose address is stored in r29.
The panic happened during the inlined call to wb_get(), when inode->i_wb is used.
So here inode->i_wb is NULL, which is not expected to happen.
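
The register reasoning above can be checked directly against the xmon 'r' output. This is a small sketch using four lines copied from the lucky02 dump. The parsing patterns are written for the layout shown in this report, and the comparison of r3 with r29 relies on r3 carrying the first function argument on PowerPC, which is added context rather than something the comment states.

```python
import re

REG = re.compile(r"\bR(\d{2}) = ([0-9a-f]{16})")
DAR = re.compile(r"\bdar\s*=\s*([0-9a-f]+)")


def parse_regs(text):
    """Return {register_number: value} from xmon 'r' output."""
    return {int(num): int(val, 16) for num, val in REG.findall(text)}


# Lines copied from the lucky02 register dump
excerpt = """\
R03 = c0000001fc972e70 R19 = 0000000000000000
R13 = c00000000fb44280 R29 = c0000001fc972e70
R15 = 0000000000000000 R31 = 0000000000000000
dar = 0000000000000000 dsisr = 40000000
"""

regs = parse_regs(excerpt)
dar = int(DAR.search(excerpt).group(1), 16)

# r31 holds the pointer that was dereferenced; it matches the fault address.
# r29 (the inode) is a non-NULL kernel pointer and equals r3.
print(regs[31] == dar, hex(regs[29]), regs[29] == regs[3])
# True 0xc0000001fc972e70 True
```

So the inode pointer itself looks valid, and only the value loaded from it is zero, which matches the conclusion that inode->i_wb was NULL at the time of the access.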

At this time, CPU 6 is waiting for the same inode's spinlock inode->i_lock to be released here:
6:mon> t
[link register ] c000000000064624 __spin_yield+0xb4/0xc0
[c0000000fdb93900] c0000000fdb93940 (unreliable)
[c0000000fdb93970] c000000000af8968 _raw_spin_lock+0xd8/0xe0
[c0000000fdb939a0] c000000000327330 __mark_inode_dirty+0xd0/0x4a0
[c0000000fdb93a20] c0000000003326f0 mark_buffer_dirty+0x1f0/0x210
[c0000000fdb93a60] c000000000334ff0 __block_commit_write.isra.7+0xf0/0x170
[c0000000fdb93ad0] c00000000033513c block_write_end+0x7c/0x100
[c0000000fdb93b20] c00000000033a340 blkdev_write_end+0x60/0xa0
[c0000000fdb93b80] c00000000022d340 generic_perform_write+0x180/0x280
[c0000000fdb93c20] c00000000022f568 __generic_file_write_iter+0x208/0x250
[c0000000fdb93c80] c00000000033b498 blkdev_write_iter+0x98/0x160
[c0000000fdb93cf0] c0000000002e24a4 new_sync_write+0xc4/0x120
[c0000000fdb93d90] c0000000002e32a0 vfs_write+0xc0/0x230
[c0000000fdb93de0] c0000000002e42dc SyS_write+0x6c/0x110
[c0000000fdb93e30] c000000000009204 system_call+0x38/0xb4
--- Exception: c01 (System Call) at 00003fff944c6728
SP (3ffef9ffe0c0) is in userspace

CPU 6 holds the inode->i_lock in the call to inode_to_wb_and_lock_list().
Why is inode->i_wb NULL?

== Comment: #11 - Laurent Dufour - 2016-11-25 11:57:50 ==
I found that lucky03 hit the panic as well.
I took a closer look and it seems that there is a locking / memory barrier issue between the code run in locked_inode_to_wb_and_lock_list() and another CPU. I found that CPU 5 was running 'latest_blast' at the time CPU 0 hit the panic. The same applied on lucky02.

== Comment: #13 - Laurent Dufour - 2016-12-05 07:32:30 ==
I did some tests on luckyv05 and was able to recreate it on a vanilla 4.8 kernel:
[113031.075540] Unable to handle kernel paging request for data at address 0x00000000
[113031.075614] Faulting instruction address: 0xc0000000003692e0
0:mon> t
[c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590
[c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150
[c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450
[c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580
[c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590
[c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660
[c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130
[c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
--- Exception: 0 at 0000000000000000
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c0000000fb65f620]
    pc: c0000000003692e0: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000036cb6c: writeback_sb_inodes+0x30c/0x590
    sp: c0000000fb65f8a0
   msr: 800000010280b033
   dar: 0
 dsisr: 40000000
  current = 0xc0000001d69be400
  paca = 0xc000000003480000 softe: 0 irq_happened: 0x01
    pid = 18689, comm = kworker/u16:10
Linux version 4.8.0 (laurent@lucky05) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #1 SMP Thu Dec 1 09:25:13 CST 2016

So this is not an Ubuntu-specific issue but a more general one, and it is not fixed by the patch
https://patchwork.kernel.org/patch/9247955/
as had been expected while investigating bug 142781.

== Comment: #17 - Laurent Dufour - 2016-12-07 03:22:05 ==
For the record, I also hit the bug with 4.9-rc8:
4:mon> t
[c000000012a7f900] c0000000003787cc writeback_sb_inodes+0x30c/0x590
[c000000012a7fa10] c000000000378b34 __writeback_inodes_wb+0xe4/0x150
[c000000012a7fa70] c000000000378f9c wb_writeback+0x30c/0x450
[c000000012a7fb40] c000000000379df8 wb_workfn+0x268/0x580
[c000000012a7fc50] c0000000000f8c20 process_one_work+0x1e0/0x590
[c000000012a7fce0] c0000000000f9078 worker_thread+0xa8/0x650
[c000000012a7fd80] c000000000101a30 kthread+0x110/0x130
[c000000012a7fe30] c00000000000c0e8 ret_from_kernel_thread+0x5c/0x74
4:mon> e
cpu 0x4: Vector: 300 (Data Access) at [c000000012a7f620]
    pc: c000000000374f40: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c0000000003787cc: writeback_sb_inodes+0x30c/0x590
    sp: c000000012a7f8a0
   msr: 800000010280b033
   dar: 0
 dsisr: 40000000
  current = 0xc000000011540000
  paca = 0xc000000003482400 softe: 0 irq_happened: 0x01
    pid = 8357, comm = kworker/u16:3
Linux version 4.9.0-rc8 (root@lucky05) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Tue Dec 6 05:17:47 CST 2016

== Comment: #24 - Thiago Jung Bauermann - 2017-01-11 16:09:45 ==
Dan Williams posted a patch series on 01/06 which aims to solve this bug:

https://www.spinics.net/lists/linux-fsdevel/msg106092.html

Unfortunately, the kernel test robot found problems with it:

http://lkml.iu.edu/hypermail/linux/kernel/1701.1/00239.html

Still, I think it's useful to perform tests to confirm that:

1. v4.10 is still affected by the problem and
2. Dan's patches fix this bug.

Therefore, could you please reproduce this bug on the unmodified v4.10-rc3 build below?

http://kernel.stglabs.ibm.com/~bauermann/bug149014/v4.10-rc3/

This will allow us to confirm point 1.

Then, can you please try to reproduce it with the build below?

http://kernel.stglabs.ibm.com/~bauermann/bug149014/fix-backing_dev_info-lifetime-v2/

This one is v4.10-rc3 plus Dan Williams's two patches from my link above applied to it.

== Comment: #28 - Lata Kuntal - 2017-01-16 01:34:05 ==
I am seeing the same crash on one of the UbuntuKVM 16.04.02 guests, gusg8.
Pasting the console logs below:

root@guskvm:~# virsh console gusg8 --force
Connected to domain gusg8
Escape character is ^]

0:mon>
0:mon>
0:mon> t
[c00000023d1ab900] c00000000036a41c writeback_sb_inodes+0x30c/0x590
[c00000023d1aba10] c00000000036a784 __writeback_inodes_wb+0xe4/0x150
[c00000023d1aba70] c00000000036abfc wb_writeback+0x30c/0x450
[c00000023d1abb40] c00000000036ba38 wb_workfn+0x268/0x580
[c00000023d1abc50] c0000000000ef5e8 process_one_work+0x1e8/0x5b0
[c00000023d1abce0] c0000000000efa58 worker_thread+0xa8/0x650
[c00000023d1abd80] c0000000000f8224 kthread+0x114/0x140
[c00000023d1abe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
--- Exception: 0 at 0000000000000000
0:mon>
0:mon>
0:mon> d
0000000000000000 **************** **************** | |
0:mon> r
R00 = c00000000036a41c R16 = c00000027ca0e868
R01 = c00000023d1ab8a0 R17 = c00000027ca0e7e0
R02 = c0000000014a6600 R18 = c00000027ca0e8d0
R03 = c00000027ca0e7e0 R19 = 0000000000000000
R04 = c0000001b092e710 R20 = 0000000000000000
R05 = 0000000000000000 R21 = c00000023d1a8000
R06 = 000000027ee30000 R22 = c000000273aace50
R07 = 00001d0c11165f1a R23 = c000000273aace30
R08 = 0000000000000000 R24 = 0000000000000000
R09 = 0000000000000000 R25 = 0000000000000000
R10 = 0000000080000000 R26 = c00000027ca0e868
R11 = c0000000014daae0 R27 = 0000000000000000
R12 = 0000000000005500 R28 = 0000000000000001
R13 = c00000000fb80000 R29 = c00000027ca0e7e0
R14 = c0000000000f8118 R30 = c00000023d1abba0
R15 = 0000000000000000 R31 = 0000000000000000
pc = c000000000366be4 locked_inode_to_wb_and_lock_list+0x54/0x290
cfar= d000000004bbf2e4 xfs_buf_delwri_submit_buffers+0x1e4/0x2b0 [xfs]
lr = c00000000036a41c writeback_sb_inodes+0x30c/0x590
msr = 800000010280b033 cr = 24aa2882
ctr = c000000000122210 xer = 0000000020000000 trap = 300
dar = 0000000000000000 dsisr = 40000000
0:mon> c
cpus stopped: 0x0-0x3
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c00000023d1ab620]
    pc: c000000000366be4: locked_inode_to_wb_and_lock_list+0x54/0x290
    lr: c00000000036a41c: writeback_sb_inodes+0x30c/0x590
    sp: c00000023d1ab8a0
   msr: 800000010280b033
   dar: 0
 dsisr: 40000000
  current = 0xc0000001b092dc00
  paca = 0xc00000000fb80000 softe: 0 irq_happened: 0x01
    pid = 774, comm = kworker/u8:3
Linux version 4.8.0-34-generic (buildd@bos01-ppc64el-026) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 (Ubuntu 4.8.0-34.36~16.04.1-generic 4.8.11)
0:mon>

== Comment: #33 - Thiago Jung Bauermann - 2017-01-23 15:31:24 ==
Lekshmi mentioned that she wasn't able to reproduce this bug with kernel 4.10.0-rc3fixlifetime+, so I replied to Dan's patch series mentioning that it fixes this bug:

https://www.spinics.net/lists/linux-fsdevel/msg106830.html

Let's see if he answers back with a status or thoughts regarding the patch series.

== Comment: #34 - LEKSHMI C. PILLAI - 2017-01-24 00:26:22 ==
Hi

The fix worked with the 4.10.0-rc3fixlifetime+ kernel. We need to know which kernel release the fix will land in, and whether a workaround is available for 16.04.02, i.e. kernel 4.8.

Thanks
Lekshmi

Revision history for this message
bugproxy (bugproxy) wrote : lucky05 dmesg buffer

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-149014 severity-critical targetmilestone-inin---
Revision history for this message
bugproxy (bugproxy) wrote : gusg8 dl, host logs

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-01-24 18:04 EDT-------
Hello Canonical,

This is a crash we see on Ubuntu 16.04.02 guests when stress testing I/O.
Comment #24 summarizes the current state of this bug.

Basically, it was reported upstream already and it is present even in v4.10-rc3.

Dan Williams from Intel posted a patch series which fixes it. Unfortunately there are still issues with the patch series and he didn't post a newer version yet, therefore at this point this bug is for your awareness.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-01-26 09:45 EDT-------
Dan Williams responded mentioning that he is planning to address the issues found by the test robot:

https://www.spinics.net/lists/linux-fsdevel/msg106907.html

Then Jan Kara and Christoph Hellwig followed up with their thoughts on what is going wrong and what could be done. I suspect a fix is afoot.

Revision history for this message
Michael Hohnbaum (hohnbaum) wrote :

Kernel team is aware of this bug, removing Tasty Taco

Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → nobody
Manoj Iyer (manjo)
Changed in linux (Ubuntu):
assignee: nobody → Taco Screen team (taco-screen-team)
Changed in linux (Ubuntu):
status: New → Incomplete
bugproxy (bugproxy)
tags: removed: bugnameltc-149014 severity-critical
bugproxy (bugproxy)
tags: added: bugnameltc-149014 severity-critical
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-02-13 09:09 EDT-------
Hi

The fix is working fine. The machine is up and tests are running fine. So far it hasn't crashed.

Thanks
Lekshmi

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-03-02 05:34 EDT-------
*** Bug 146487 has been marked as a duplicate of this bug. ***

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-03-13 14:27 EDT-------
Hello Canonical,

Let me provide a status update:

To fix this bug it's necessary to fix several issues with lifetime of structures in the block subsystem, as well as a number of related race conditions. A number of patches need to be backported to fix these problems:

The "BDI lifetime fix" v3 series [1], which was already merged into Linus' tree:

f44f1ab5a2dc block: Unhash block device inodes on gendisk destruction
dc3b17cc8bf2 block: Use pointer to backing_dev_info from request_queue
d03f6cdc1fc4 block: Dynamically allocate and refcount backing_dev_info
b1d2dc5659b4 block: Make blk_get_backing_dev_info() safe without open bdev
efa7c9f97e3e block: Get rid of blk_get_backing_dev_info()

A couple of follow up fixes were merged for commits dc3b17cc8bf and b1d2dc5659b4:

e17354961bb5 zram_drv: update for backing dev info changes
a5a79d00017c block: Initialize bd_bdi on inode initialization

The first 4 patches from the "block: Fix block device shutdown related races" v2 series [2] were also merged:

4b8c861a7c79 block: Move bdev_unhash_inode() after invalidate_partition()
d06e05c026ab block: Unhash also block device inode for the whole device
cccd9fb9ec96 block: Revalidate i_bdev reference in bd_aquire()
165a5e22fafb block: Move bdi_unregister() to del_gendisk()

A couple of follow up fixes were also merged for commit 165a5e22fafb:

b6f8fec4448a block: Allow bdi re-registration
df23de55615f bdi: Fix use-after-free in wb_congested_put()

Commit df23de55615f depends on the following commit:

5f478e4ea5c5 block: fix double-free in the failure path of cgwb_bdi_init()

In addition to the commits mentioned above, we will also need the patches from v4 of the same series, which was posted today:

https://marc.info/?l=linux-block&m=148941806220331&w=2

I have a branch with the patches that were already merged upstream backported on top of tag Ubuntu-4.4.0-65.86. Most of the commits didn't have any conflicts, and the ones that did were simple to resolve.

I will attach the backported patches to this bug report when all of the patches that we need are accepted upstream.

--
[1] https://marc.info/?l=linux-block&m=148604743126356&w=2
[2] https://marc.info/?l=linux-block&m=148769705431982&w=2
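
When backporting a list like the one in this comment, a dependency such as "df23de55615f depends on 5f478e4ea5c5" has to be applied before the commit that needs it. The sketch below shows one way to order a pick list accordingly and print the matching git cherry-pick -x commands. The helper name with_dependencies is hypothetical, only the two follow-up fixes and their stated dependency are used as input, and nothing here claims to be the procedure actually used for the Ubuntu backport.

```python
def with_dependencies(commits, depends_on):
    """Return commits in order, inserting each dependency before its dependent.

    Assumes the dependency mapping has no cycles.
    """
    ordered = []

    def add(commit):
        if commit in ordered:
            return
        for dep in depends_on.get(commit, []):
            add(dep)
        ordered.append(commit)

    for commit in commits:
        add(commit)
    return ordered


# Follow-up fixes for commit 165a5e22fafb, as listed in the comment above
follow_ups = ["b6f8fec4448a", "df23de55615f"]
# Dependency stated in the comment above
depends_on = {"df23de55615f": ["5f478e4ea5c5"]}

ordered = with_dependencies(follow_ups, depends_on)
commands = [f"git cherry-pick -x {commit}" for commit in ordered]

for command in commands:
    print(command)
# git cherry-pick -x b6f8fec4448a
# git cherry-pick -x 5f478e4ea5c5
# git cherry-pick -x df23de55615f
```

The -x option makes git record the upstream commit hash in each backported commit message, which keeps the origin of every patch traceable.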

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-03-13 15:22 EDT-------
Small update:

The following commit is not actually necessary. It fixes code which is not present in v4.4:

e17354961bb5 zram_drv: update for backing dev info changes

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-03-20 03:23 EDT-------
Hi

Tested the 4.4 kernel fix on lucky05, and the fix is working fine.

Waiting for the new 4.11 kernel fix.

Thanks
Lekshmi

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-03-28 05:43 EDT-------
Hi

I applied the fix on lucky04 and started the tests.
root@lucky04:/Blast# uname -r
4.11.0-rc4blockshutdownracesv5+
root@lucky04:/Blast#

Will update after 72 hours of running.

Thanks
Lekshmi

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-04-03 05:39 EDT-------
*** Bug 153091 has been marked as a duplicate of this bug. ***

Revision history for this message
bugproxy (bugproxy) wrote : backported patches

------- Comment on attachment From <email address hidden> 2017-04-05 12:17 EDT-------

Hello Canonical,

Here is an mbox file that can be used with git-am, containing all patches needed to fix this bug backported on top of tag Ubuntu-4.4.0-67.88. Most of the commits didn't have any conflicts, and the ones that did were simple to resolve. Each patch mentions the hash of the upstream commit it was cherry-picked from.

Lekshmi ran an I/O stress-test on them for more than 72h and didn't find any problem.

To fix this bug it's necessary to fix several issues with lifetime of structures in the block subsystem, as well as a number of related race conditions. Therefore, a number of patches needed to be backported:

The "BDI lifetime fix" v3 series [1], which was already merged into Linus' tree:

f44f1ab5a2dc block: Unhash block device inodes on gendisk destruction
dc3b17cc8bf2 block: Use pointer to backing_dev_info from request_queue
d03f6cdc1fc4 block: Dynamically allocate and refcount backing_dev_info
b1d2dc5659b4 block: Make blk_get_backing_dev_info() safe without open bdev
efa7c9f97e3e block: Get rid of blk_get_backing_dev_info()

A follow up fix was merged for commits dc3b17cc8bf and b1d2dc5659b4:

a5a79d00017c block: Initialize bd_bdi on inode initialization

The first 4 patches from the "block: Fix block device shutdown related races" v2 series [2] were also merged:

4b8c861a7c79 block: Move bdev_unhash_inode() after invalidate_partition()
d06e05c026ab block: Unhash also block device inode for the whole device
cccd9fb9ec96 block: Revalidate i_bdev reference in bd_aquire()
165a5e22fafb block: Move bdi_unregister() to del_gendisk()

A few follow up fixes [3] were also merged for commit 165a5e22fafb:

b6f8fec4448a block: Allow bdi re-registration
df23de55615f bdi: Fix use-after-free in wb_congested_put()
90f16fddcc28 block: Make del_gendisk() safer for disks without queues

Commit df23de55615f depends on the following commit:

5f478e4ea5c5 block: fix double-free in the failure path of cgwb_bdi_init()

In addition to the commits mentioned above, we also need the patches from the "block: Fix block device shutdown related races" v5, which are queued for v4.12 in linux-block/for-4.12/block [4]:

03e262798884 block: Fix bdi assignment to bdev inode when racing with disk delete
b7d680d7bf58 bdi: Mark congested->bdi as internal
810df54a64fb bdi: Make wb->bdi a proper reference
e8cb72b322cf bdi: Unify bdi->wb_list handling for root wb_writeback
5318ce7d4686 bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()
4514451e79ae bdi: Do not wait for cgwbs release in bdi_unregister()
b1c51afc00f1 bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister()
f759741d9d91 block: Fix oops in locked_inode_to_wb_and_lock_list()
c70c176ff8c3 kobject: Export kobject_get_unless_zero()
d01b2dcb441b block: Fix oops scsi_disk_get()

--
[1] https://marc.info/?l=linux-block&m=148604743126356&w=2
[2] https://marc.info/?l=linux-block&m=148769705431982&w=2
[3] https://marc.info/?l=linux-block&m=148905459526037&w=2
[4] https://marc.info/?l=linux-block&m=149023527505174&w=2
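
Before feeding an mbox like the one attached here to git-am, it can be useful to list what it contains and compare that with the commit list above. This is a minimal sketch using Python's standard mailbox module. The attachment's real file name is not given in this report, so the demo builds a throwaway mbox with two placeholder messages whose subjects are borrowed from the list above.

```python
import mailbox
import os
import tempfile
from email.message import Message


def patch_subjects(mbox_path):
    """Return the Subject header of each message in an mbox file, in order."""
    box = mailbox.mbox(mbox_path)
    try:
        return [msg["subject"] for msg in box]
    finally:
        box.close()


# Demo with a temporary mbox; the messages are placeholders, not real patches.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.mbox")
    box = mailbox.mbox(path)
    for title in (
        "bdi: Make wb->bdi a proper reference",
        "kobject: Export kobject_get_unless_zero()",
    ):
        msg = Message()
        msg["From"] = "author@example.com"
        msg["Subject"] = title
        msg.set_payload("placeholder body\n")
        box.add(msg)
    box.flush()
    box.close()

    subjects = patch_subjects(path)

print(subjects)
```

Run against the real attachment, the same function would give the patch order that git-am will apply, which can then be checked against the series listed in this comment.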

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-04-05 17:43 EDT-------
Hello Canonical,

This bug also affects Zesty Zapus. It was reproduced with kernel 4.10.0-15-generic.

Should we open a separate bug for that version or will this one track both?

Revision history for this message
bugproxy (bugproxy) wrote : gusg8 dl, host logs

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : backported patches

------- Comment on attachment From <email address hidden> 2017-04-05 12:17 EDT-------

Hello Canonical,

Here is an mbox file that can be used with git-am, containing all patches needed to fix this bug backported on top of tag Ubuntu-4.4.0-67.88. Most of the commits didn't have any conflicts, and the ones that did were simple to resolve. Each patch mentions the hash of the upstream commit it was cherry-picked from.
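As a rough illustration of how an mbox like this could be audited before running git-am, the sketch below lists each patch subject next to the upstream hash found in its description. It assumes the hash appears in the style written by `git cherry-pick -x` ("(cherry picked from commit ...") or in a "(backported from commit ..." line; the exact wording used in the attached mbox is not shown in this report, and the function names are hypothetical.

```python
import mailbox
import re

# Assumption: each patch description references its upstream commit in one of
# these two forms. The wording in the attached mbox may differ.
UPSTREAM_RE = re.compile(
    r"\((?:cherry picked|backported) from commit ([0-9a-f]{12,40})\b"
)


def upstream_hashes(text):
    """Return every upstream commit hash referenced in a patch description."""
    return UPSTREAM_RE.findall(text)


def list_patches(mbox_path):
    """Return (subject, [upstream hashes]) for each message in the mbox."""
    rows = []
    # create=False: fail on a missing file instead of creating an empty one.
    for msg in mailbox.mbox(mbox_path, create=False):
        payload = msg.get_payload(decode=True) or b""  # None for multipart
        body = payload.decode("utf-8", errors="replace")
        rows.append((msg.get("Subject", ""), upstream_hashes(body)))
    return rows
```

Calling `list_patches()` on the downloaded attachment and printing each row would show which patches, if any, lack an upstream reference. A patch with an empty hash list is worth a manual look before it is applied.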

Lekshmi ran an I/O stress-test on them for more than 72h and didn't find any problem.

To fix this bug it's necessary to fix several issues with lifetime of structures in the block subsystem, as well as a number of related race conditions. Therefore, a number of patches needed to be backported:

The "BDI lifetime fix" v3 series [1], which was already merged into Linus' tree:

f44f1ab5a2dc block: Unhash block device inodes on gendisk destruction
dc3b17cc8bf2 block: Use pointer to backing_dev_info from request_queue
d03f6cdc1fc4 block: Dynamically allocate and refcount backing_dev_info
b1d2dc5659b4 block: Make blk_get_backing_dev_info() safe without open bdev
efa7c9f97e3e block: Get rid of blk_get_backing_dev_info()

A follow-up fix was merged for commits dc3b17cc8bf2 and b1d2dc5659b4:

a5a79d00017c block: Initialize bd_bdi on inode initialization

The first 4 patches from the "block: Fix block device shutdown related races" v2 series [2] were also merged:

4b8c861a7c79 block: Move bdev_unhash_inode() after invalidate_partition()
d06e05c026ab block: Unhash also block device inode for the whole device
cccd9fb9ec96 block: Revalidate i_bdev reference in bd_aquire()
165a5e22fafb block: Move bdi_unregister() to del_gendisk()

A few follow-up fixes [3] were also merged for commit 165a5e22fafb:

b6f8fec4448a block: Allow bdi re-registration
df23de55615f bdi: Fix use-after-free in wb_congested_put()
90f16fddcc28 block: Make del_gendisk() safer for disks without queues

Commit df23de55615f depends on the following commit:

5f478e4ea5c5 block: fix double-free in the failure path of cgwb_bdi_init()

In addition to the commits mentioned above, we also need the patches from the "block: Fix block device shutdown related races" v5 series, which are queued for v4.12 in linux-block/for-4.12/block [4]:

03e262798884 block: Fix bdi assignment to bdev inode when racing with disk delete
b7d680d7bf58 bdi: Mark congested->bdi as internal
810df54a64fb bdi: Make wb->bdi a proper reference
e8cb72b322cf bdi: Unify bdi->wb_list handling for root wb_writeback
5318ce7d4686 bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()
4514451e79ae bdi: Do not wait for cgwbs release in bdi_unregister()
b1c51afc00f1 bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister()
f759741d9d91 block: Fix oops in locked_inode_to_wb_and_lock_list()
c70c176ff8c3 kobject: Export kobject_get_unless_zero()
d01b2dcb441b block: Fix oops scsi_disk_get()

--
[1] https://marc.info/?l=linux-block&m=148604743126356&w=2
[2] https://marc.info/?l=linux-block&m=148769705431982&w=2
[3] https://marc.info/?l=linux-block&m=148905459526037&w=2
[4] https://marc.info/?l=linux-block&m=149023527505174&w=2

Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: Incomplete → Triaged
Revision history for this message
Tim Gardner (timg-tpi) wrote :

Thiago - can you note which patches are backports and which files had conflicts?

Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Yakkety):
status: New → In Progress
assignee: nobody → Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Zesty):
assignee: Canonical Kernel Team (canonical-kernel-team) → Tim Gardner (timg-tpi)
status: Triaged → In Progress
Revision history for this message
bugproxy (bugproxy) wrote : backported patches, mentioning conflicts

------- Comment on attachment From <email address hidden> 2017-04-06 14:11 EDT-------

I'm not sure I understood the question. All of the patches are backports.
These are the files that had conflicts:

block/blk-cgroup.c
block/blk-core.c
block/blk-settings.c
block/blk-sysfs.c
block/blk-wbt.c
block/compat_ioctl.c
block/ioctl.c
drivers/block/aoe/aoeblk.c
drivers/block/drbd/drbd_nl.c
drivers/md/dm.c
drivers/md/md.c
fs/block_dev.c
fs/btrfs/volumes.c
include/linux/backing-dev-defs.h
include/linux/blkdev.h

I'm also attaching a new mbox file with exactly the same patches, with the only difference being that the description of each patch mentions which files had conflicts when cherry-picking.

Revision history for this message
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
bugproxy (bugproxy)
tags: added: targetmilestone-inin16043
removed: targetmilestone-inin---
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
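The tag change requested above can be sketched as a small function. This is only an illustration of the rule as stated in this comment (replace 'verification-needed-<series>' with either the 'done' or the 'failed' tag); the function name is hypothetical and it does not talk to Launchpad.

```python
def update_verification_tags(tags, series, fixed):
    """Return the tag set after recording a verification result.

    tags:   iterable of current bug tags
    series: release name used in the tag, e.g. "xenial"
    fixed:  True if the -proposed kernel solved the problem
    """
    needed = f"verification-needed-{series}"
    if needed not in tags:
        raise ValueError(f"bug is not awaiting verification for {series}")
    outcome = "done" if fixed else "failed"
    # Drop the 'needed' tag and add the tag that records the outcome.
    return (set(tags) - {needed}) | {f"verification-{outcome}-{series}"}
```

Other tags on the bug, such as the target milestone tag, pass through unchanged.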

tags: added: verification-needed-xenial
Changed in linux (Ubuntu Xenial):
assignee: Tim Gardner (timg-tpi) → Kleber Sacilotto de Souza (kleber-souza)
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

Hello IBM,

The fix for this issue has been included in Linux kernel 4.4.0-78.99 for Xenial, which is about to be promoted from -proposed to -updates. Could you please verify whether this kernel version fixes the issue?

Thank you.
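Before starting a verification run, it may help to confirm that the guest actually booted the fixed kernel. The sketch below compares a `uname -r` style release string against ABI 78 of the 4.4.0 series named in this comment. It deliberately answers only for 4.4.0 kernels, since other series (for example 4.10.0-15, where the bug was also reproduced) carry different ABI numbering. The function names are hypothetical.

```python
import platform
import re


def parse_release(release):
    """Split e.g. '4.4.0-78-generic' into ((4, 4, 0), 78)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(x) for x in m.groups())
    return (major, minor, patch), abi


def is_fixed_xenial_kernel(release=None):
    """True only for a 4.4.0 kernel with ABI number 78 or later."""
    if release is None:
        release = platform.release()  # same string as `uname -r` on Linux
    version, abi = parse_release(release)
    return version == (4, 4, 0) and abi >= 78
```

A False result for a non-4.4.0 kernel means only that this check does not apply, not that the kernel is known to be affected.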

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-78.99

---------------
linux (4.4.0-78.99) xenial; urgency=low

  * linux: 4.4.0-78.99 -proposed tracker (LP: #1686645)

  * Please backport fix to reference leak in cgroup blkio throttle
    (LP: #1683976)
    - block: fix module reference leak on put_disk() call for cgroups throttle

  * UbuntuKVM guest crashed while running I/O stress test with Ubuntu kernel
    4.4.0-47-generic (LP: #1659111)
    - block: Unhash block device inodes on gendisk destruction
    - block: Use pointer to backing_dev_info from request_queue
    - block: Dynamically allocate and refcount backing_dev_info
    - block: Make blk_get_backing_dev_info() safe without open bdev
    - block: Get rid of blk_get_backing_dev_info()
    - block: Move bdev_unhash_inode() after invalidate_partition()
    - block: Unhash also block device inode for the whole device
    - block: Revalidate i_bdev reference in bd_aquire()
    - block: Initialize bd_bdi on inode initialization
    - block: Move bdi_unregister() to del_gendisk()
    - block: Allow bdi re-registration
    - bdi: Fix use-after-free in wb_congested_put()
    - block: Make del_gendisk() safer for disks without queues
    - block: Fix bdi assignment to bdev inode when racing with disk delete
    - bdi: Mark congested->bdi as internal
    - bdi: Make wb->bdi a proper reference
    - bdi: Unify bdi->wb_list handling for root wb_writeback
    - bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()
    - bdi: Do not wait for cgwbs release in bdi_unregister()
    - bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister()
    - block: Fix oops in locked_inode_to_wb_and_lock_list()
    - kobject: Export kobject_get_unless_zero()
    - block: Fix oops scsi_disk_get()

  * Touchpad not working correctly after kernel upgrade (LP: #1662589)
    - Input: ALPS - fix V8+ protocol handling (73 03 28)

  * Xenial update to v4.4.62 stable release (LP: #1683728)
    - drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3
    - drm/i915: Stop using RP_DOWN_EI on Baytrail
    - usb: dwc3: gadget: delay unmap of bounced requests
    - mtd: bcm47xxpart: fix parsing first block after aligned TRX
    - MIPS: Introduce irq_stack
    - MIPS: Stack unwinding while on IRQ stack
    - MIPS: Only change $28 to thread_info if coming from user mode
    - MIPS: Switch to the irq_stack in interrupts
    - MIPS: Select HAVE_IRQ_EXIT_ON_IRQ_STACK
    - MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch
    - crypto: caam - fix RNG deinstantiation error checking
    - Linux 4.4.62

  * ifup service of network device stay active after driver stop (LP: #1672144)
    - net: use net->count to check whether a netns is alive or not

  * [Hyper-V] mkfs regression in kernel 4.4+ (LP: #1682215)
    - block: relax check on sg gap

  * [Feature] KBL: intel_powerclamp driver support (LP: #1591641)
    - thermal/powerclamp: remove cpu whitelist
    - thermal/powerclamp: correct cpu support check
    - thermal/powerclamp: add back module device table

  * sysfs channel reads of lps22hb pressure sensor are stale (LP: #1682103)
    - iio: st_pressure: initialize lps22hb bootime

  * Backlight control does no...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Yakkety):
assignee: Tim Gardner (timg-tpi) → Kleber Sacilotto de Souza (kleber-souza)
Changed in linux (Ubuntu Zesty):
assignee: Tim Gardner (timg-tpi) → Kleber Sacilotto de Souza (kleber-souza)
anuj sachan (anuj72)
Changed in linux (Ubuntu Zesty):
status: In Progress → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-05-31 23:49 EDT-------
Hi

I have been running tests for 10 days. No issues with the kernel.
root@lucky05:/Blast# uname -r
4.4.0-78-generic
root@lucky05:/Blast#

It's working fine.

Thanks
Lekshmi

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-06-09 16:08 EDT-------
Hello Lekshmi,

Do you need this fix in Ubuntu kernels 4.8 and 4.10 too, or just 4.4?

Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

Hi Lekshmi,

Thank you for testing the fix on 4.4.0-78.99.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-07-07 17:15 EDT-------
(In reply to comment #97)
> Hello Lekshmi,
>
> Do you need this fix in Ubuntu kernels 4.8 and 4.10 too, or just 4.4?

Yes, this bug also exists with later kernels.
Please see LTC bug 152231 / LP 1702998 running 4.10 kernel.

Revision history for this message
bugproxy (bugproxy) wrote : lucky05 dmesg buffer

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : gusg8 dl, host logs

Default Comment by Bridge

Changed in linux (Ubuntu Yakkety):
status: In Progress → Won't Fix
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu):
assignee: Tim Gardner (timg-tpi) → nobody