ISST-LTE:pKVM311:lotg5:Ubutu16041:lotg5 crashed @ writeback_sb_inodes+0x30c/0x590
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Xenial | Fix Released | Undecided | Tim Gardner |
Yakkety | Fix Released | Undecided | Unassigned |
Bug Description
== Comment: #0 - PRIYA M. A <email address hidden> - 2016-06-17 10:01:28 ==
Problem Description:
================
- lotg5 crashed at writeback_
Steps to re-create:
==============
- Install lotg5 with Ubuntu16041(
- Start the regression tests in lotg5
Logs:
====
root@lotg5:~# show.report.py
HOSTNAME KERNEL VERSION DISTRO INFO
-------- ---------------- -----------
lotg5 4.4.0-24-generic Ubuntu 16.04 LTS \n \l
######## Current Time: Fri Jun 17 01:10:46 2016 ########
Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 BASE 20160614-05:50:19 67.0 hr(s) 20.0 min(s) Test
2 IO 20160614-05:50:26 67.0 hr(s) 20.0 min(s) IO_Focus
3 NFS 20160614-06:24:35 66.0 hr(s) 46.0 min(s) DistributeFS_
4 TCP 20160614-06:32:03 66.0 hr(s) 38.0 min(s) networkTest2lotg3
FOCUS BASE IO NFS TCP SUM
TOTAL 48647 1825 517 82690 133679
FAIL 5028 0 0 24 5052
PASS 43619 1825 517 82666 128627
(%) (89%) (100%) (100%) (99%) (96%)
DLPAR is not tested!
root@lotg5:~#
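Note: the summary figures in the report above can be cross-checked with a few lines of Python. This is only a sketch; the counts are copied from the report, while the names `report`, `truncated_pct` and `check` are made up for illustration, and the assumption that show.report.py rounds percentages down is inferred from the printed values, not from the tool's source.

```python
# Counts copied from the report: (TOTAL, FAIL, PASS, reported %).
report = {
    "BASE": (48647, 5028, 43619, 89),
    "IO": (1825, 0, 1825, 100),
    "NFS": (517, 0, 517, 100),
    "TCP": (82690, 24, 82666, 99),
}
reported_sum = (133679, 5052, 128627, 96)


def truncated_pct(passed, total):
    """Integer pass rate, rounded down (assumed to match the report)."""
    return passed * 100 // total


def check(report, reported_sum):
    """Return a list of inconsistencies; an empty list means none found."""
    problems = []
    for focus, (total, fail, passed, pct) in report.items():
        if fail + passed != total:
            problems.append(f"{focus}: FAIL + PASS != TOTAL")
        if truncated_pct(passed, total) != pct:
            problems.append(f"{focus}: percentage mismatch")
    # Column sums over the four focus areas: TOTAL, FAIL, PASS.
    sums = tuple(sum(row[i] for row in report.values()) for i in range(3))
    if sums != reported_sum[:3]:
        problems.append("SUM column does not match the per-focus counts")
    if truncated_pct(sums[2], sums[0]) != reported_sum[3]:
        problems.append("SUM: percentage mismatch")
    return problems


if __name__ == "__main__":
    print(check(report, reported_sum) or "report is internally consistent")
```

All of the failures recorded before the crash come from the BASE and TCP focus areas; IO and NFS show none.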
- After 65+ hr of execution lotg5 crashed with the following call traces
Logs:
====
[root@lotkvm ~]# virsh console lotg5
Connected to domain lotg5
Escape character is ^]
0:mon> c
cpus stopped: 0x0 0x4 0x8 0xc
0:mon> d
0000000000000000 **************** **************** | |
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c0000000c4f4b620]
pc: c000000000323720: locked_
lr: c000000000326dbc: writeback_
sp: c0000000c4f4b8a0
msr: 8000000100009033
dar: 0
dsisr: 40000000
current = 0xc00000017191cf60
paca = 0xc000000007b40000 softe: 0 irq_happened: 0x01
pid = 5792, comm = kworker/u32:5
0:mon> t
[c0000000c4f4b900] c000000000326dbc writeback_
[c0000000c4f4ba10] c000000000327124 __writeback_
[c0000000c4f4ba70] c00000000032758c wb_writeback+
[c0000000c4f4bb40] c00000000032803c wb_workfn+
[c0000000c4f4bc50] c0000000000dd1d0 process_
[c0000000c4f4bce0] c0000000000dd724 worker_
[c0000000c4f4bd80] c0000000000e61e0 kthread+0x110/0x130
[c0000000c4f4be30] c000000000009538 ret_from_
--- Exception: 0 at 0000000000000000
0:mon>
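Note: the `msr` and `dsisr` values printed by xmon above can be decoded into flag names with a short sketch. The bit positions below are the ones I recall from the Linux powerpc headers (arch/powerpc/include/asm/reg.h) and should be treated as an assumption to verify against the kernel source; the helper names `decode_msr` and `decode_dsisr` are hypothetical.

```python
# MSR bit positions (assumed, from memory of the powerpc reg.h definitions).
MSR_BITS = {
    63: "SF",  # 64-bit mode
    32: "TM",  # transactional memory available
    15: "EE",  # external interrupts enabled
    14: "PR",  # problem state (user mode)
    13: "FP",  # floating point available
    12: "ME",  # machine check enabled
    5: "IR",   # instruction relocation
    4: "DR",   # data relocation
    1: "RI",   # recoverable interrupt
    0: "LE",   # little endian
}

# DSISR flags (assumed, same source).
DSISR_FLAGS = {
    0x40000000: "NOHPTE",     # no translation found for the address
    0x08000000: "PROTFAULT",  # protection fault
    0x02000000: "ISSTORE",    # access was a store
}


def decode_msr(msr):
    """List the names of the known MSR bits that are set, highest bit first."""
    return [name for bit, name in sorted(MSR_BITS.items(), reverse=True)
            if (msr >> bit) & 1]


def decode_dsisr(dsisr):
    """List the names of the known DSISR flags that are set."""
    return [name for mask, name in DSISR_FLAGS.items() if dsisr & mask]


if __name__ == "__main__":
    print(decode_msr(0x8000000100009033))
    print(decode_dsisr(0x40000000))
```

Under these assumptions, PR is clear (the fault happened in kernel mode), ISSTORE is clear (the access was a load), and NOHPTE together with `dar: 0` points to a read through a NULL pointer.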
== Comment: #4 - Chandan Kumar <email address hidden> - 2016-06-20 06:23:33 ==
dmesg log:
-------------
[251403.003999] EXT4-fs (loop0): mounted filesystem without journal. Opts: (null)
[251403.471118] Unable to handle kernel paging request for data at address 0x00000000
[251403.473391] Faulting instruction address: 0xc000000000323720 << ---- PC
-------------
0:mon> di c000000000323720
c000000000323720 e93f0000 ld r9,0(r31)
// [R31 = 0000000000000000, trying to de-reference null address]
c000000000323724 39290050 addi r9,r9,80
c000000000323728 7fbf4840 cmpld cr7,r31,r9
====
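Note: the first two instruction words in the disassembly above can be decoded by hand to confirm which register supplies the faulting address. The sketch below handles only the two encodings involved (`ld` in DS-form, primary opcode 58, and `addi` in D-form, primary opcode 14); the function name `decode` is made up, and this is not a general Power ISA disassembler.

```python
def sign_extend_16(value):
    """Interpret a 16-bit field as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def decode(word):
    """Decode a 32-bit Power instruction word (ld and addi only)."""
    opcode = word >> 26        # primary opcode, top 6 bits
    rt = (word >> 21) & 0x1F   # target register
    ra = (word >> 16) & 0x1F   # base / source register
    if opcode == 58 and (word & 0x3) == 0:
        # DS-form ld: displacement is the low 16 bits with the two
        # low-order (extended opcode) bits masked off.
        disp = sign_extend_16(word & 0xFFFC)
        return f"ld r{rt},{disp}(r{ra})"
    if opcode == 14 and ra != 0:
        # D-form addi (RA = 0 would be the li form, not handled here).
        return f"addi r{rt},r{ra},{sign_extend_16(word & 0xFFFF)}"
    raise ValueError(f"unsupported instruction word {word:#010x}")


if __name__ == "__main__":
    for word in (0xE93F0000, 0x39290050):
        print(f"{word:08x}  {decode(word)}")
```

The load's effective address is the contents of r31 plus a displacement of 0, so with R31 = 0 the access targets address 0, which matches `dar: 0` in the xmon output and the "data at address 0x00000000" line in dmesg.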
Dominic,
Can you please take a look and assign this to suitable developer.
Thanks,
Chandan
== Comment: #6 - Laurent Dufour <email address hidden> - 2016-06-20 13:03:15 ==
It sounds like inode->i_wb has been cleared while waiting for IO to be dropped in writeback_
That needs to be double-checked...
== Comment: #10 - Laurent Dufour <email address hidden> - 2016-06-21 05:11:35 ==
That seems to be an already known issue raised by commit 43d1c0eb7e11 "block: detach bdev inode from its wb in __blkdev_put()".
A patch has been posted to the lkml, but the discussion about it is still ongoing:
https:/
https:/
== Comment: #13 - Laurent Dufour <email address hidden> - 2016-06-22 03:29:00 ==
It appears that the right way to fix that would be https:/
I can build a patched Ubuntu kernel on your node so that you can restart the test.
Do you agree?
== Comment: #14 - PRIYA M. A <email address hidden> - 2016-06-22 03:44:00 ==
Sure, Laurent. lotg5 is being installed. I will update this bug once the installation is complete so that you can apply the patch on lotg5, and then I will start the tests on it.
== Comment: #16 - Laurent Dufour <email address hidden> - 2016-06-22 06:21:05 ==
root@lotg5:~# uname -a
Linux lotg5 4.4.0-24-generic #43+ldu SMP Wed Jun 22 03:24:05 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux
The patched kernel (#43+ldu) is installed in place of the Ubuntu one and is running on lotg5.
Please give it a try...
== Comment: #19 - PRIYA M. A <email address hidden> - 2016-06-29 02:33:54 ==
- Issue is not seen at lotg5
== Comment: #21 - Laurent Dufour <email address hidden> - 2016-07-12 12:01:00 ==
(In reply to comment #20)
> (In reply to comment #19)
> > - Issue is not seen at lotg5
>
> Can we close this bug then?
I would prefer waiting for the patch mentioned in comment #13 to be accepted upstream.
I'll update this bug once this is done.
== Comment: #22 - Laurent Dufour <email address hidden> - 2016-07-25 08:00:20 ==
I asked on the mailing list why the patch mentioned in comment #13 is not yet upstream.
I'll update the bug once I get a reply.
== Comment: #23 - Laurent Dufour <email address hidden> - 2016-07-26 10:27:34 ==
The patch has been applied to the linux-fsdevel tree and is on its way to being merged in 4.8.
I think this can now be closed.
== Comment: #24 - Laurent Dufour <email address hidden> - 2016-07-26 10:30:14 ==
For the record: https:/
== Comment: #29 - Laurent Dufour <email address hidden> - 2016-08-18 09:14:41 ==
The patch is now part of the kernel 4.8-rc1.
It would have to be backported to 16.04.
== Comment: #31 - Laurent Dufour <email address hidden> - 2016-08-18 09:16:25 ==
Requesting mirroring to get the kernel commit dc5ff2b1d66f21c
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: targetmilestone-inin16041 removed: targetmilestone-inin---
tags: added: verification-done-xenial removed: verification-needed-xenial