ISST-LTE:KVM:Ubuntu1804:BostonLC:boslcp3g1: Migration guest running with IO stress crashed@security_file_permission+0xf4/0x160.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Ubuntu-power-systems project | Invalid | Medium | Canonical Kernel Team |
linux (Ubuntu) | Invalid | Medium | Canonical Kernel Team |
Bionic | Invalid | Medium | Canonical Kernel Team |
Bug Description
Problem Description: Migration Guest running with IO stress crashed@
Steps to re-create:
Source host - boslcp3
Destination host - boslcp4
1. boslcp3 and boslcp4 installed with the latest kernel
root@boslcp3:~# uname -a
Linux boslcp3 4.15.0-20-generic #21+bug166588 SMP Thu Apr 26 15:05:59 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux
root@boslcp3:~#
root@boslcp4:~# uname -a
Linux boslcp4 4.15.0-20-generic #21+bug166588 SMP Thu Apr 26 15:05:59 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux
root@boslcp3:~#
2. Installed guest boslcp3g1 with the kernel shown below and started an LTP run from the boslcp3 host
root@boslcp3g1:~# uname -a
Linux boslcp3g1 4.15.0-15-generic #16+bug166877 SMP Wed Apr 18 14:47:30 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux
3. Started migrating the boslcp3g1 guest from source to destination and vice versa.
4. After a couple of migrations the guest crashed on boslcp4 and entered xmon
8:mon> t
[c0000004f8a23d20] c0000000005a7674 security_
[c0000004f8a23d60] c0000000003d1d30 rw_verify_
[c0000004f8a23d90] c0000000003d375c vfs_read+0x8c/0x1b0
[c0000004f8a23de0] c0000000003d3d88 SyS_read+0x68/0x110
[c0000004f8a23e30] c00000000000b184 system_
--- Exception: c01 (System Call) at 000071f1779fe280
SP (7fffe99ece50) is in userspace
8:mon> S
msr = 8000000000001033 sprg0 = 0000000000000000
pvr = 00000000004e1202 sprg1 = c000000007a85800
dec = 00000000591e3e03 sprg2 = c000000007a85800
sp = c0000004f8a234a0 sprg3 = 0000000000010008
toc = c0000000016eae00 dar = 000000000000023c
srr0 = c0000000000c355c srr1 = 8000000000001033 dsisr = 40000000
dscr = 0000000000000000 ppr = 0010000000000000 pir = 00000011
amr = 0000000000000000 uamor = 0000000000000000
dpdes = 0000000000000000 tir = 0000000000000000 cir = 00000000
fscr = 0500000000000180 tar = 0000000000000000 pspb = 00000000
mmcr0 = 0000000080000000 mmcr1 = 0000000000000000 mmcr2 = 0000000000000000
pmc1 = 00000000 pmc2 = 00000000 pmc3 = 00000000 pmc4 = 00000000
mmcra = 0000000000000000 siar = 0000000000000000 pmc5 = 0000026c
sdar = 0000000000000000 sier = 0000000000000000 pmc6 = 00000861
ebbhr = 0000000000000000 ebbrr = 0000000000000000 bescr = 0000000000000000
iamr = 4000000000000000
pidr = 0000000000000034 tidr = 0000000000000000
cpu 0x8: Vector: 700 (Program Check) at [c0000004f8a23220]
pc: c0000000000e4854: xmon_core+
lr: c0000000000e4850: xmon_core+
sp: c0000004f8a234a0
msr: 8000000000041033
current = 0xc0000004f89faf00
paca = 0xc000000007a85800 softe: 0 irq_happened: 0x01
pid = 24028, comm = top
Linux version 4.15.0-20-generic (buildd@
cpu 0x8: Exception 700 (Program Check) in xmon, returning to main loop
[c0000004f8a23d20] c0000000005a7674 security_
[c0000004f8a23d60] c0000000003d1d30 rw_verify_
[c0000004f8a23d90] c0000000003d375c vfs_read+0x8c/0x1b0
[c0000004f8a23de0] c0000000003d3d88 SyS_read+0x68/0x110
[c0000004f8a23e30] c00000000000b184 system_
--- Exception: c01 (System Call) at 000071f1779fe280
SP (7fffe99ece50) is in userspace
8:mon> r
R00 = c00000000043b7fc R16 = 0000000000000000
R01 = c0000004f8a23c90 R17 = ffffffffffffff70
R02 = c0000000016eae00 R18 = 00000a51b4bebfc8
R03 = c000000279557200 R19 = 00007fffe99edbb0
R04 = c0000003242499c0 R20 = 00000a51b4c04db0
R05 = 0000000000020000 R21 = 00000a51b4c20e90
R06 = 0000000000000004 R22 = 0000000000040f00
R07 = ffffff8100000000 R23 = 00000a51b4c06560
R08 = ffffff8000000000 R24 = ffffffffffffff80
R09 = 0000000000000000 R25 = 00000a51b4bec2b8
R10 = 0000000000000000 R26 = 000071f177bb0b20
R11 = 0000000000000000 R27 = 0000000000000000
R12 = 0000000000002000 R28 = c000000279557200
R13 = c000000007a85800 R29 = c0000004c7734210
R14 = 0000000000000000 R30 = 0000000000000000
R15 = 0000000000000000 R31 = c0000003242499c0
pc = c00000000043b808 __fsnotify_
cfar= c0000000003f9e78 dget_parent+
lr = c00000000043b7fc __fsnotify_
msr = 8000000000009033 cr = 28002222
ctr = c0000000006252b0 xer = 0000000000000000 trap = 300
dar = 000000000000023c dsisr = 40000000
8:mon> e
cpu 0x8: Vector: 300 (Data Access) at [c0000004f8a23a10]
pc: c00000000043b808: __fsnotify_
lr: c00000000043b7fc: __fsnotify_
sp: c0000004f8a23c90
msr: 8000000000009033
dar: 23c
dsisr: 40000000
current = 0xc0000004f89faf00
paca = 0xc000000007a85800 softe: 0 irq_happened: 0x01
pid = 24028, comm = top
Linux version 4.15.0-20-generic (buildd@
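For readers less familiar with xmon output, the fault registers above can be decoded mechanically. The following is a minimal Python sketch, not part of the original report: the helper names `parse_fault` and `describe_fault` are hypothetical, the DSISR masks are the ones I believe the Linux powerpc headers define, and the "first page" threshold of 4K is a deliberately conservative assumption. It only interprets the `trap`, `dar` and `dsisr` values that xmon already printed.

```python
import re

# DSISR bit masks as defined in the Linux powerpc headers (asm/reg.h).
DSISR_NOHPTE = 0x40000000     # no translation found for the address
DSISR_PROTFAULT = 0x08000000  # protection fault
DSISR_ISSTORE = 0x02000000    # faulting access was a store

# Assumption: treat anything below 4K as "first page", whatever the real page size.
FIRST_PAGE = 0x1000


def parse_fault(xmon_text):
    """Extract trap, dar and dsisr (all hexadecimal) from xmon 'r' output."""
    fields = {}
    for name in ("trap", "dar", "dsisr"):
        match = re.search(r"\b%s\s*=\s*([0-9a-fA-F]+)" % name, xmon_text)
        if match:
            fields[name] = int(match.group(1), 16)
    return fields


def describe_fault(fields):
    """Return short human-readable notes about the decoded fault."""
    notes = []
    if fields.get("trap") == 0x300:
        notes.append("data storage interrupt")
    dsisr = fields.get("dsisr", 0)
    if dsisr & DSISR_NOHPTE:
        notes.append("no translation for address")
    if dsisr & DSISR_PROTFAULT:
        notes.append("protection fault")
    notes.append("store" if dsisr & DSISR_ISSTORE else "load")
    dar = fields.get("dar")
    if dar is not None and dar < FIRST_PAGE:
        notes.append("address in first page: likely NULL base + offset 0x%x" % dar)
    return notes


# The two lines below are copied from the 'r' output in this report.
sample = """\
ctr = c0000000006252b0 xer = 0000000000000000 trap = 300
dar = 000000000000023c dsisr = 40000000
"""

for note in describe_fault(parse_fault(sample)):
    print(note)
```

Read this way, the dump is consistent with a load through a NULL pointer at offset 0x23c in the function xmon reports as `__fsnotify_` (symbol truncated in this report). That is an interpretation of the registers, not a confirmed root cause.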
6. The guest enters xmon after migrating from boslcp3 to boslcp4.
> BUG_ON in jbd2_journal_
I've included xmon crash data from a more recent crash, this time a BUG_ON in fs/jbd2/
int jbd2_journal_
{
	int need_copy_out = 0;
	int done_copy_out = 0;
	int do_escape = 0;
	char *mapped_data;
	struct buffer_head *new_bh;
	struct page *new_page;
	unsigned int new_offset;
	struct buffer_head *bh_in = jh2bh(jh_in);
	journal_t *journal = transaction-

	/*
	 * The buffer really shouldn't be locked: only the current committing
	 * transaction is allowed to write it, so nobody else is allowed
	 * to do any IO.
	 *
	 * akpm: except if we're journalling data, and write() output is
	 * also part of a shared mapping, and another thread has
	 * decided to launch a writepage() against this buffer.
	 */
This is not the same failure as the original bug, but I suspect both belong to a class of issues that we hit only under very particular circumstances, which trigger corner cases not generally seen during normal operation. As such, I think it makes sense to group them under this bug for the time being.
The general setup is IO-heavy disk workloads running on large guests (20GB memory, 16 vcpus) with SAN-based storage, with migrations performed while the workload runs. During migration we begin to see a high occurrence of rcu_sched stall warnings, and after 1-3 hours of operation we hit filesystem-related crashes like the ones posted. We've seen this with 2 separate FC cards, Emulex and QLogic, where we invoke QEMU through libvirt as:
LC_ALL=C PATH=/usr/
I will attach the libvirt XML separately.
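The back-and-forth migration described in the reproduction steps could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure actually used for this report: it assumes migrations are driven with `virsh migrate --live` over `qemu+ssh://<host>/system` URIs, and the helper names `migrate_command` and `ping_pong` are hypothetical. The host and guest names are the ones given in the report.

```python
import subprocess

GUEST = "boslcp3g1"
HOSTS = ("boslcp3", "boslcp4")  # source first, then destination


def migrate_command(guest, src, dst):
    """Build a virsh live-migration command issued against the source host.

    The qemu+ssh URIs are an assumption; the report does not say which
    transport was used.
    """
    return [
        "virsh", "-c", "qemu+ssh://%s/system" % src,
        "migrate", "--live", guest, "qemu+ssh://%s/system" % dst,
    ]


def ping_pong(guest, hosts, rounds, run=subprocess.run):
    """Migrate the guest back and forth and return the completed count.

    Stops at the first migration that exits non-zero. `run` is injectable
    so the loop can be exercised without real hosts.
    """
    completed = 0
    src, dst = hosts
    for _ in range(rounds):
        result = run(migrate_command(guest, src, dst))
        if result.returncode != 0:
            break
        completed += 1
        src, dst = dst, src  # next round goes the other way
    return completed
```

Calling `ping_pong(GUEST, HOSTS, rounds=10)` while the IO stress runs inside the guest would approximate steps 3 and 4; the number of rounds is arbitrary here.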
IBM is requesting general filesystem expertise from Canonical, if available, as we continue debugging.
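To quantify the rcu_sched stall warnings mentioned above, a saved guest console or dmesg log could be scanned with something like the sketch below. The message wording in the regular expression is an assumption based on 4.15-era kernels ("INFO: rcu_sched self-detected stall on CPU" and "INFO: rcu_sched detected stalls on CPUs/tasks:"), and `count_rcu_stalls` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed wording of the 4.15-era RCU stall messages; adjust if the log differs.
STALL_RE = re.compile(r"INFO: rcu_sched (self-)?detected stall")


def count_rcu_stalls(lines):
    """Count rcu_sched stall warnings in an iterable of log lines.

    Returns a Counter keyed by who reported the stall.
    """
    counts = Counter()
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            kind = "self-detected" if match.group(1) else "detected by another CPU"
            counts[kind] += 1
    return counts
```

For example, `count_rcu_stalls(open("guest-dmesg.txt"))` would tally the warnings in a captured log, where `guest-dmesg.txt` is a placeholder file name. Comparing counts taken before and during migration would show how strongly the stalls correlate with the migration window.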
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
tags: added: triage-g
tags: added: kernel-da-key
description: updated
Changed in ubuntu-power-systems:
status: New → Triaged
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
tags: added: p9
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
Changed in ubuntu-power-systems:
status: Incomplete → Triaged
tags: added: triage-a removed: triage-g
Changed in linux (Ubuntu):
status: New → Triaged
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
status: New → Triaged
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Changed in linux (Ubuntu Bionic):
status: Triaged → Incomplete
Changed in ubuntu-power-systems:
importance: High → Medium
Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Bionic):
assignee: Joseph Salisbury (jsalisbury) → Canonical Kernel Team (canonical-kernel-team)