Ubuntu17.04 KVM: Guest crashed @ xfs_perag_get_tag+0x6c

Bug #1678745 reported by bugproxy
This bug affects 1 person
Affects                            Status      Importance   Assigned to             Milestone
The Ubuntu-power-systems project   Won't Fix   Critical     Breno Leitão
linux (Ubuntu)                     Won't Fix   Critical     Canonical Kernel Team

Bug Description

== Comment: #0 - Lata Kuntal <email address hidden> - 2017-03-30 09:44:23 ==
The KVM guest gusg8, on an Ubuntu 17.04 host, was running Ubuntu 16.04.2 under the IO, Base, TCP and NFS stress tests. The guest uses XFS as its root filesystem, and after a few hours of regression testing it dropped into xmon.

Console logs :
============
root@guskvm:~# virsh console gusg8 --force
Connected to domain gusg8
Escape character is ^]

1:mon> r
R00 = d00000000288edf4 R16 = 00000000024200ca
R01 = c0000000378cb1f0 R17 = 0000000000000000
R02 = d000000002936080 R18 = 0000000000000020
R03 = 0000000000000001 R19 = c0000002734d1800
R04 = c0000000378cb190 R20 = 0000000000000000
R05 = 0000000000000000 R21 = 0000000000000000
R06 = 3c000000d03fe056 R22 = c00000027e26ccf0
R07 = 0000000000000000 R23 = 0000000000000000
R08 = c0000000048492d0 R24 = 0000000000000000
R09 = 3c000000d03fe056 R25 = 0000000000000000
R10 = 3c000000d03fe062 R26 = 000000024df4cd49
R11 = d0000000028fa360 R27 = 0000000000000000
R12 = 0000000000000000 R28 = d0000000028ac7b0
R13 = c00000000fb80900 R29 = c000000004849000
R14 = 0000000000000000 R30 = 0000000000000000
R15 = c00000000137ad08 R31 = 0000000000000000
pc = d00000000288ee0c xfs_perag_get_tag+0x6c/0x170 [xfs]
cfar= c00000000096a494 perf_trace_mmc_request_start+0x104/0x440
lr = d00000000288edf4 xfs_perag_get_tag+0x54/0x170 [xfs]
msr = 800000010280b033 cr = 82428424
ctr = c0000000005e4950 xer = 0000000020000000 trap = 300
dar = 3c000000d03fe062 dsisr = 40000000
1:mon> t
[c0000000378cb250] d0000000028ac7b0 xfs_reclaim_inodes_count+0x70/0xa0 [xfs]
[c0000000378cb290] d0000000028c0ea8 xfs_fs_nr_cached_objects+0x28/0x40 [xfs]
[c0000000378cb2b0] c0000000003292d8 super_cache_count+0x68/0x120
[c0000000378cb2f0] c000000000271530 shrink_slab.part.14+0x150/0x4f0
[c0000000378cb430] c000000000276db8 shrink_node+0x158/0x3f0
[c0000000378cb4f0] c000000000277178 do_try_to_free_pages+0x128/0x460
[c0000000378cb590] c0000000002775ac try_to_free_pages+0xfc/0x280
[c0000000378cb620] c000000000260158 __alloc_pages_nodemask+0x758/0xe30
[c0000000378cb7e0] c0000000002dbb98 alloc_pages_vma+0x108/0x360
[c0000000378cb880] c00000000029d080 wp_page_copy+0xf0/0x9d0
[c0000000378cb920] c0000000002a0770 do_wp_page+0x210/0xb20
[c0000000378cb9b0] c0000000002a656c handle_mm_fault+0x9cc/0x14c0
[c0000000378cba60] c000000000b511a0 do_page_fault+0x260/0x7d0
[c0000000378cbb10] c000000000008948 handle_page_fault+0x10/0x30
--- Exception: 301 (Data Access) at c00000000010aec4 schedule_tail+0x84/0xb0
[c0000000378cbe30] c000000000009844 ret_from_fork+0x4/0x54
--- Exception: c00 (System Call) at 00003fffa2b5bf44
1:mon> d
0000000000000000 **************** **************** | |
1:mon> c
cpus stopped: 0x0-0x3
1:mon>

Kernel host build
=============
root@guskvm:~# uname -r
4.10.0-13-generic
root@guskvm:~#

== Comment: #1 - Luciano Chavez <email address hidden> - 2017-03-30 10:42:15 ==
At first glance, based on the following assembly from around the failure point:

d00000000288edd4 38c00001 li r6,1
d00000000288edd8 7f8802a6 mflr r28
d00000000288eddc 78a70020 clrldi r7,r5,32
d00000000288ede0 7c7d1b78 mr r29,r3
d00000000288ede4 7c852378 mr r5,r4
d00000000288ede8 386302c8 addi r3,r3,712
d00000000288edec 38810020 addi r4,r1,32
d00000000288edf0 4806b571 bl d0000000028fa360 # exit_xfs_fs+0x180c/0xfd44 [xfs]
d00000000288edf4 e8410018 ld r2,24(r1)
d00000000288edf8 2f830000 cmpwi cr7,r3,0
d00000000288edfc 409d0104 ble cr7,d00000000288ef00 # xfs_perag_get_tag+0x160/0x170 [xfs]
d00000000288ee00 7c0004ac sync
d00000000288ee04 e9210020 ld r9,32(r1)
d00000000288ee08 3949000c addi r10,r9,12
d00000000288ee0c 7fc05028 lwarx r30,0,r10
d00000000288ee10 33de0001 addic r30,r30,1
d00000000288ee14 7fc0512d stwcx. r30,0,r10

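I believe the crash in xfs_perag_get_tag() happens after we come back from the radix_tree_gang_lookup_tag() call, while attempting the atomic_inc_return(). The struct xfs_perag *pag pointer is in R09 = 3c000000d03fe056, which is invalid. The faulting instruction is the lwarx at pc, which loads from r10 = r9 + 12, and that matches the reported dar = 3c000000d03fe062.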

        rcu_read_lock();
        found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
                        (void **)&pag, first, 1, tag);
        if (found <= 0) {
                rcu_read_unlock();
                return NULL;
        }
        ref = atomic_inc_return(&pag->pag_ref);

bugproxy (bugproxy) wrote : host(dmesg,var/log/syslog) guest(xmon & dl logs)

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-153040 severity-critical targetmilestone-inin1704
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
summary: - Ubuntu KVM guest crashed @ xfs_perag_get_tag+0x6c
+ Ubuntu17.04 KVM: Guest crashed @ xfs_perag_get_tag+0x6c
Changed in linux (Ubuntu):
status: New → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-04-10 19:38 EDT-------
Since we don't have an XFS developer in-house, does Canonical have any suggestions on what kernel config options or debug facilities we can use in the event this is reproducible?

Manoj Iyer (manjo)
tags: added: ubuntu-17.04
Manoj Iyer (manjo)
Changed in linux (Ubuntu):
importance: Undecided → Critical
Changed in ubuntu-power-systems:
importance: Undecided → Critical
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.13 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc1/

tags: added: kernel-da-key
Manoj Iyer (manjo)
tags: added: triage-g
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
status: New → Incomplete
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
assignee: Canonical Kernel Team (canonical-kernel-team) → Breno Leitão (breno-leitao)
Manoj Iyer (manjo) wrote :

IBM, Could you please test the kernel mentioned in comment #3 ?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-11-10 09:03 EDT-------
(In reply to comment #11)
> IBM, Could you please test the kernel mentioned in comment #3 ?

Hi Manoj,

The test team is trying to find an available system to recreate the problem on, but it may take a while as most of their machines are tied up with ongoing tests. We appreciate your patience.

Manoj Iyer (manjo) wrote :

4.10 was replaced with 4.13 linux-hwe. Please retest, and reopen this bug if you are able to reproduce the crash.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix
Changed in ubuntu-power-systems:
status: Incomplete → Won't Fix
bugproxy (bugproxy)
tags: removed: bugnameltc-153040 kernel-da-key severity-critical triage-g ubuntu-17.04