ISST-KVM:Ubuntu1510:abakvm: abag3 guest hung while running stress test.

Bug #1543480 reported by bugproxy
Affects          Status         Importance   Assigned to        Milestone
linux (Ubuntu)   Fix Released   Undecided    Taco Screen team

Bug Description

== Comment: #0 ==
Issue
====
The Ubuntu 15.10 guest abag3 was running stress tests (IO Base and TCP). After 40 hours it hung: SSH and console access both failed, although the guest still responded to ping. Xmon was enabled for the guest, but the guest did not drop into it.

[root@abakvm ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 11    abag6                          running
 23    abag5                          running
 25    abag1                          running
 27    abag2                          running
 29    abag3                          running
 30    abag4                          running

Logs
===
[127112.624493] NIP [c0000000005183d8] __blk_mq_run_hw_queue+0x98/0x4c0
[127112.624953] LR [c0000000000d9320] process_one_work+0x1e0/0x560
[127112.625343] Call Trace:
[127112.625505] [c000000179adbb50] [c000000001578190] cur_cpu_spec+0x0/0x8 (unreliable)
[127112.626079] [c000000179adbc50] [c0000000000d9320] process_one_work+0x1e0/0x560
[127112.627154] [c000000179adbce0] [c0000000000d9830] worker_thread+0x190/0x660
[127112.627620] [c000000179adbd80] [c0000000000e2280] kthread+0x110/0x130
[127112.628084] [c000000179adbe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
[127112.628743] Instruction dump:
[127112.629080] 391e0188 eb1e0298 fb810088 fac10078 a12d0008 792ad182 792906a0 794a1f24
[127112.630112] 7d48502a 7d494c36 792907e0 69290001 <0b090000> e93e0080 792a07e1 408202c8
[127112.631318] ---[ end trace de34117cdb302980 ]---
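
For anyone comparing this trace against the later ones in this bug, here is a minimal sketch of pulling the symbol, offset, and function size out of the NIP, LR, and call-trace lines. The regular expression and the `parse_frames` helper are illustrative assumptions, not part of any kernel tooling, and they only cover the line shapes seen in this report.

```python
import re

# Matches kernel log lines of the shapes seen in this report, e.g.
#   [127112.624493] NIP [c0000000005183d8] __blk_mq_run_hw_queue+0x98/0x4c0
#   [127112.625505] [c000000179adbb50] [c000000001578190] cur_cpu_spec+0x0/0x8 (unreliable)
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"          # dmesg timestamp
    r"(?:(?P<reg>NIP|LR)\s+)?"             # register name (NIP/LR lines only)
    r"(?:\[(?P<sp>[0-9a-f]{16})\]\s+)?"    # stack pointer (call-trace lines only)
    r"\[(?P<addr>[0-9a-f]{16})\]\s+"       # code address
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return one dict per line that carries a symbol+offset/size frame."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # header lines, register dumps, instruction dumps
        frames.append({
            "reg": m.group("reg"),
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("sym"),
            "offset": int(m.group("off"), 16),
            "size": int(m.group("size"), 16),
            "unreliable": "(unreliable)" in line,
        })
    return frames


SAMPLE = """\
[127112.624493] NIP [c0000000005183d8] __blk_mq_run_hw_queue+0x98/0x4c0
[127112.624953] LR [c0000000000d9320] process_one_work+0x1e0/0x560
[127112.625343] Call Trace:
[127112.625505] [c000000179adbb50] [c000000001578190] cur_cpu_spec+0x0/0x8 (unreliable)
[127112.626079] [c000000179adbc50] [c0000000000d9320] process_one_work+0x1e0/0x560
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        print(frame["reg"], frame["symbol"], hex(frame["offset"]))
```

Keying on symbol plus offset rather than the raw address makes it easier to see that two traces from different boots stop at the same place in `__blk_mq_run_hw_queue`.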

XML of abag3
========
[root@abakvm ~]# virsh dumpxml abag3
<domain type='kvm' id='29'>
  <name>abag3</name>
  <uuid>4d73aa99-64f3-4d1d-af02-995abb29cfc5</uuid>
  <memory unit='KiB'>6291456</memory>
  <currentMemory unit='KiB'>6291456</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static'>48</vcpu>
  <cputune>
    <shares>4096</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='ppc64' machine='pseries-2.4'>hvm</type>
    <boot dev='hd'/>
    <boot dev='network'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'>power8</model>
    <topology sockets='4' cores='3' threads='8'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>coredump-restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-ppc64</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-id/dm-uuid-part3-mpath-1IBM_IPR-0_5EC08100000006A0'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/abag3_io1'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/abag3_io2'/>
      <backingStore/>
      <target dev='vdc' bus='virtio'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:c3:a0:b0'/>
      <source bridge='brenP1p5s0f1'/>
      <target dev='vnet2'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/10'/>
      <target type='isa-serial' port='0'/>
      <alias name='serial0'/>
      <address type='spapr-vio' reg='0x30000000'/>
    </serial>
    <console type='pty' tty='/dev/pts/10'>
      <source path='/dev/pts/10'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
      <address type='spapr-vio' reg='0x30000000'/>
    </console>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none' model='selinux'/>
</domain>
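
The domain above is backed by hugepages, so the host must have enough free hugepages for the whole guest. Here is a rough sketch of reading that requirement out of the XML. The 16 MiB page size is an assumption (check `Hugepagesize` in the host's /proc/meminfo), the `hugepage_demand` helper is hypothetical, and the sample is a trimmed copy of the dump above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the abag3 domain XML from this report.
SAMPLE_XML = """\
<domain type='kvm' id='29'>
  <name>abag3</name>
  <memory unit='KiB'>6291456</memory>
  <currentMemory unit='KiB'>6291456</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
</domain>
"""


def hugepage_demand(domain_xml, hugepage_kib=16 * 1024):
    """Estimate how many hugepages a libvirt domain needs.

    hugepage_kib is an ASSUMED host hugepage size (16 MiB here);
    pass the real value taken from the host.
    """
    root = ET.fromstring(domain_xml)
    mem = root.find("memory")
    # libvirt treats a missing unit attribute as KiB.
    if mem is None or mem.get("unit", "KiB") != "KiB":
        raise ValueError("expected a <memory> element expressed in KiB")
    memory_kib = int(mem.text)
    uses_hugepages = root.find("memoryBacking/hugepages") is not None
    # Round up so a partially used page still counts as a whole page.
    pages = (memory_kib + hugepage_kib - 1) // hugepage_kib if uses_hugepages else 0
    return {
        "name": root.findtext("name"),
        "memory_kib": memory_kib,
        "uses_hugepages": uses_hugepages,
        "hugepages_needed": pages,
    }


if __name__ == "__main__":
    print(hugepage_demand(SAMPLE_XML))
```

Under the 16 MiB assumption the 6 GiB guest needs 384 hugepages. Comparing that figure with `HugePages_Free` on the host is one way to check whether the guest can be fully backed.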

SYSTEM INFORMATION
---------------------------------
HOST
KVM BUILD LEVEL: GA3 SP1 build 51
OPAL - FW840.10 - F4_1606A

bugproxy (bugproxy) wrote : abag3 logs received from VNC before hung

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-136711 severity-critical targetmilestone-inin1510
bugproxy (bugproxy) wrote : dmesg logs obtained from host abakvm.

Default Comment by Bridge

bugproxy (bugproxy) wrote : /var/log/messages of Host

Default Comment by Bridge

bugproxy (bugproxy) wrote : QEMU abag3.logs

Default Comment by Bridge

bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1543480/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-02-11 10:11 EDT-------
Hi Canonical,

IBM is still investigating.

bugproxy (bugproxy) wrote : dmesg logs when guest is up and running

------- Comment (attachment only) From <email address hidden> 2016-02-15 02:51 EDT-------

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-02-18 02:23 EDT-------
Hi,

The hang is not reproducible now.
The guest is working fine with other kernel builds.

Thanks

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-03-01 01:30 EDT-------
Hi Chandan,

I am going to test this guest with the latest kernel build 55.

Thanks

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-03-01 21:49 EDT-------
Hi,

I set xmon=on in the grub file, but it is not active even after updating grub and rebooting the guest.

linux /boot/vmlinux-4.2.0-16-generic root=UUID=b7301650-03c4-4983-a51b-ab0b061b50de ro splash xmon=on quiet crashkernel=384M-:128M

root@abag3:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.2.0-16-generic root=UUID=b7301650-03c4-4983-a51b-ab0b061b50de ro splash quiet
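
In the output above, both `xmon=on` and the `crashkernel=` setting from the grub entry are absent from the running command line. Here is a small sketch of automating that comparison. The `missing_params` helper is hypothetical, and it assumes parameters are plain whitespace-separated tokens (no quoted values containing spaces).

```python
def read_running_cmdline(path="/proc/cmdline"):
    """Return the kernel command line of the running system."""
    with open(path) as fh:
        return fh.read().strip()


def missing_params(expected_tokens, cmdline):
    """Return the expected tokens that do not appear in cmdline.

    Tokens are compared verbatim, so 'xmon=on' does not match 'xmon=early'.
    """
    present = set(cmdline.split())
    return [tok for tok in expected_tokens if tok not in present]


# Command line reported by the guest in this comment.
REPORTED = (
    "BOOT_IMAGE=/boot/vmlinux-4.2.0-16-generic "
    "root=UUID=b7301650-03c4-4983-a51b-ab0b061b50de ro splash quiet"
)

if __name__ == "__main__":
    wanted = ["xmon=on", "quiet", "crashkernel=384M-:128M"]
    print(missing_params(wanted, REPORTED))
    # → ['xmon=on', 'crashkernel=384M-:128M']
```

On a live guest, `missing_params(wanted, read_running_cmdline())` performs the same check after a reboot. A non-empty result suggests the edited grub entry is not the one actually being booted.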

I triggered a kernel panic, which showed that xmon is disabled. After the panic, the guest rebooted itself within a few minutes.
Details are given below:

root@abag3:~# echo c > /proc/sysrq-trigger
[ 1079.297301] sysrq: SysRq : Trigger a crash
[ 1079.297556] Unable to handle kernel paging request for data at address 0x00000000
[ 1079.297905] Faulting instruction address: 0xc00000000062f094
[ 1079.298482] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1079.298719] SMP NR_CPUS=2048 NUMA pSeries
[ 1079.298993] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache pseries_rng dm_multipath scsi_dh rtc_generic sunrpc autofs4 btrfs xor raid6_pq
[ 1079.300169] CPU: 25 PID: 2379 Comm: bash Not tainted 4.2.0-30-generic #36-Ubuntu
[ 1079.300589] task: c000000169825140 ti: c0000001698f0000 task.ti: c0000001698f0000
[ 1079.300957] NIP: c00000000062f094 LR: c000000000630148 CTR: c00000000062f060
[ 1079.301248] REGS: c0000001698f3990 TRAP: 0300 Not tainted (4.2.0-30-generic)
[ 1079.301537] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28242222 XER: 20000000
[ 1079.302264] CFAR: c000000000009958 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c000000000630148 c0000001698f3c10 c00000000151af00 0000000000000063
GPR04: c00000017f848450 c00000017f859cd8 c00000017f180000 0000000000000165
GPR08: 0000000000000007 0000000000000001 0000000000000000 c00000017f1862a0
GPR12: c00000000062f060 c00000000fb4ed80 ffffffffffffffff 0000000022000000
GPR16: 0000000010170710 0000010010640278 00000000101406f0 00000000100c7100
GPR20: 0000000000000000 000000001017df98 0000000010140588 0000000000000000
GPR24: 0000000010153440 000000001017b848 c000000001464098 0000000000000004
GPR28: c000000001464458 0000000000000063 c000000001420cb4 0000000000000000
[ 1079.306339] NIP [c00000000062f094] sysrq_handle_crash+0x34/0x50
[ 1079.306598] LR [c000000000630148] __handle_sysrq+0xe8/0x270
[ 1079.306786] Call Trace:
[ 1079.306898] [c0000001698f3c10] [c000000000da0e50] _fw_tigon_tg3_bin_name+0x2cbb8/0x34c90 (unreliable)
[ 1079.307285] [c0000001698f3c30] [c000000000630148] __handle_sysrq+0xe8/0x270
[ 1079.307589] [c0000001698f3cd0] [c0000000006308e8] write_sysrq_trigger+0x78/0xa0
[ 1079.307926] [c0000001698f3d00] [c00000000035f710] proc_reg_write+0xb0/0x110
[ 1079.308219] [c0000001698f3d50] [c0000000002cb28c] __vfs_write+0x6c/0xe0
[ 1079.308514] [c0000001698f3d90] [c0000000002cbfc0] vfs_write+0xc0/0x230
[ 1079.308808] [c0000001698f3de0] [c0000000002ccffc] SyS_write+0x6c/0x110
[ 1079.309100] [c0000001698f3e30] [c000000000009204] system_call+0x38/0xb4
[ 1079.309383] Instruction dump:
[ 1079.309541] 3842bea0 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 3949dfe4
[ 1079.310085] 39200001 912a0000 ...

bugproxy (bugproxy) wrote : host var/log/messages

------- Comment (attachment only) From <email address hidden> 2016-03-03 05:37 EDT-------

bugproxy (bugproxy) wrote : host dmesg logs

------- Comment (attachment only) From <email address hidden> 2016-03-03 05:38 EDT-------

bugproxy (bugproxy) wrote : qemu- abag3.logs for build 55

------- Comment (attachment only) From <email address hidden> 2016-03-03 05:40 EDT-------

bugproxy (bugproxy) wrote : dmesg of guest

------- Comment (attachment only) From <email address hidden> 2016-03-03 06:25 EDT-------

bugproxy (bugproxy) wrote : all required logs attached in the zip file

------- Comment on attachment From <email address hidden> 2016-03-08 01:31 EDT-------

As the guest is in a hung state, I could only collect dmesg and /var/log/messages from the host (abakvm).

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-03-08 06:11 EDT-------
From console output -
[14104.304001] ------------[ cut here ]------------
[14104.304213] WARNING: at /build/linux-jJaVgZ/linux-4.2.0/block/blk-mq.c:758
[14104.304351] Modules linked in: lz4_compress rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache pseries_rng rtc_generic dm_multipath scsi_dh sunrpc autofs4 btrfs xor raid6_pq [last unloaded: zram]
[14104.305271] CPU: 12 PID: 2861 Comm: kworker/12:1H Not tainted 4.2.0-30-generic #36-Ubuntu
[14104.305544] Workqueue: kblockd blk_mq_run_work_fn
[14104.305699] task: c00000022ef6e590 ti: c000000231df0000 task.ti: c000000231df0000
[14104.305876] NIP: c000000000518b98 LR: c0000000000d9420 CTR: c0000000005194b0
[14104.306040] REGS: c000000231df38d0 TRAP: 0700 Not tainted (4.2.0-30-generic)
[14104.306229] MSR: 8000000100029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24df2824 XER: 00000000
[14104.306651] CFAR: c0000000005194e8 SOFTE: 1
GPR00: c0000000000d9420 c000000231df3b50 c00000000151af00 c0000002317eac00
GPR04: 0000000000000320 c00000023f51cbd8 0000000000000000 0000000000000001
GPR08: c0000002317ead88 0000000000000001 0000010101010fff 0000000000000000
GPR12: 0000000000002200 c000000007b47200 c0000000000e2248 c000000004304480
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000001 0000000000000000 c000000231df3bc8 c00000000149f800
GPR24: c0000002317dce40 c0000000fc022430 0000000000000000 c00000023f51cf00
GPR28: c000000231df3bd8 c00000023f51cb00 c0000002317eac00 c0000002317eac88
[14104.308904] NIP [c000000000518b98] __blk_mq_run_hw_queue+0x98/0x4c0
[14104.309049] LR [c0000000000d9420] process_one_work+0x1e0/0x560
[14104.309168] Call Trace:
[14104.309332] [c000000231df3b50] [c000000001578190] cur_cpu_spec+0x0/0x8 (unreliable)
[14104.309507] [c000000231df3c50] [c0000000000d9420] process_one_work+0x1e0/0x560
[14104.309708] [c000000231df3ce0] [c0000000000d9930] worker_thread+0x190/0x660
[14104.309903] [c000000231df3d80] [c0000000000e2350] kthread+0x110/0x130
[14104.310056] [c000000231df3e30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
[14104.310228] Instruction dump:
[14104.310304] 391e0188 eb1e0298 fb810088 fac10078 a12d0008 792ad182 792906a0 794a1f24
[14104.310566] 7d48502a 7d494c36 792907e0 69290001 <0b090000> e93e0080 792a07e1 408202c8
[14104.310876] ---[ end trace 865a68f8e2647847 ]---

From the qemu logs -

2016-03-04T08:08:19.709047Z qemu-system-ppc64: unable to map backing store for hugepages: Cannot allocate memory
error: kvm run failed Bad address
NIP c000000000062824 LR c0000000008efac0 CTR 00000000000003d2 XER 0000000020000000 CPU#0
MSR 8000000000001033 HID0 0000000000000000 HF 0000000000000000 idx 1
TB 00000000 00000000 DECR 00000000
GPR00 00000000000003d2 c00000000151bdc0 c00000000151ae00 c00000023fff0b40
GPR04 0000000000000000 000000000000003c c00000023fff0b40 c0000000017b2f90
GPR08 000000023fff0b40 c000000000000000 c0000000017b2f30 0000000240000000
GPR12 0000000000002200 c00000000fb40000 000000007dd13d30 0000000000000060
GPR16 c000000000d03c58 c000000000cdb9b8 c000000000cdbc20 c000000000cdbc58
GPR20 c000000000cdbc80 c000000000cdcdc8 c000000000cdbc68 c000000000cdcd88
G...

Breno Leitão (breno-leitao) wrote :

Canonical,

From what we understand so far, a fix for this problem is already upstream: kernel commit 1356aae08338f1c19ce1c67bf8c543a267688fc3.

It has been included in the kernel since 4.3, so we just need it in the 4.2 series.

affects: ubuntu → linux (Ubuntu)
Tim Gardner (timg-tpi) wrote :

Breno - commit 1356aae08338f1c19ce1c67bf8c543a267688fc3 ('blk-mq: avoid setting hctx->tags->cpumask before allocation') was cherry-picked in Ubuntu-4.2.0-19.23 as part of the v4.2.4 stable update.
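
To check a given guest kernel against that version, here is a rough sketch. It assumes that within one upstream series (4.2.0 here) a higher Ubuntu ABI number means a later build that keeps earlier cherry-picks. That is a simplification, and the helper names are hypothetical.

```python
import re


def parse_release(release):
    """Extract (major, minor, patch, abi) from strings such as
    '4.2.0-30-generic' or 'Ubuntu-4.2.0-19.23'."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def includes_fix(running, first_fixed):
    """True if `running` is at or after `first_fixed` in the same series.

    ASSUMPTION: ABI numbers are only comparable within one upstream
    series, so differing series are rejected rather than guessed at.
    """
    run, fixed = parse_release(running), parse_release(first_fixed)
    if run[:3] != fixed[:3]:
        raise ValueError("releases belong to different upstream series")
    return run[3] >= fixed[3]


if __name__ == "__main__":
    # 4.2.0-30-generic is the kernel named in the traces in this bug.
    print(includes_fix("4.2.0-30-generic", "Ubuntu-4.2.0-19.23"))
    # 4.2.0-16-generic is the kernel named in the guest's BOOT_IMAGE.
    print(includes_fix("4.2.0-16-generic", "Ubuntu-4.2.0-19.23"))
```

By this comparison the 4.2.0-30 kernel named in the traces would already carry the commit, while a 4.2.0-16 kernel would not.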

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-03-08 13:05 EDT-------
> Breno - commit 1356aae08338f1c19ce1c67bf8c543a267688fc3 ('blk-mq: avoid
> setting hctx->tags->cpumask before allocation') was cherry-picked in
> Ubuntu-4.2.0-19.23 as part of the v4.2.4 stable update.

Thanks Tim

Breno Leitão (breno-leitao) wrote :

Closing this bug since the patch has already been integrated and is available in all currently supported releases.

Changed in linux (Ubuntu):
status: New → Fix Released