Ubuntu 16.10: System hangs after crash on Ubuntu KVM guest.

Bug #1635063 reported by bugproxy
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned

Bug Description

---Problem Description---

Ubuntu 16.10: System hangs after crash on Ubuntu KVM guest.

---Steps to Reproduce---

1) apt-get install linux-crashdump
2) increase crashdump size:
sudo vim /etc/default/grub.d/kexec-tools.cfg

GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"

3) sudo update-grub ; reboot the machine
4) sudo sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools
5) kdump-config show
6) echo "c" > /proc/sysrq-trigger
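
Before step 6 it can help to confirm that the crash kernel is actually loaded. The following is only a minimal sketch: it assumes the "current state: ready to kdump" line shown in the kdump-config output of this report, and kdump_ready is a hypothetical helper name, not part of kdump-tools.

```shell
#!/bin/sh
# kdump_ready (hypothetical helper): reads text on stdin, for example the
# output of `kdump-config show`, and returns 0 only if it contains the
# "current state: ready to kdump" line seen in the log of this report.
kdump_ready() {
    grep -q '^current state:[[:space:]]*ready to kdump'
}

# Possible usage on the guest (needs root and WILL crash the system):
#   kdump-config show | kdump_ready && echo c > /proc/sysrq-trigger
```

The function only inspects text, so it can be tried safely against a saved copy of the kdump-config output.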

Logs
====
root@ubuntu:/var/crash# uname -a
Linux ubuntu 4.8.0-17-generic #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
root@ubuntu:/var/crash# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr:
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.8.0-17-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.8.0-17-generic
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.8.0-17-generic root=UUID=70f3c690-fe90-444d-a74c-71c05eef8b0e ro splash quiet irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
root@ubuntu:/var/crash# echo c > /proc/sysrq-trigger
[ 202.677946] sysrq: SysRq : Trigger a crash
[ 202.678018] Unable to handle kernel paging request for data at address 0x00000000
[ 202.678098] Faulting instruction address: 0xc0000000006de134
[ 202.678169] Oops: Kernel access of bad area, sig: 11 [#1]
[ 202.678222] SMP NR_CPUS=2048 NUMA pSeries
[ 202.678281] Modules linked in: vmx_crypto ip_tables x_tables autofs4 ibmvscsi crc32c_vpmsum 8139too 8139cp mii
[ 202.678465] CPU: 3 PID: 1992 Comm: bash Not tainted 4.8.0-17-generic #19-Ubuntu
[ 202.678547] task: c000000004c5ce00 task.stack: c00000000428c000
[ 202.678612] NIP: c0000000006de134 LR: c0000000006df218 CTR: c0000000006de100
[ 202.678716] REGS: c00000000428f990 TRAP: 0300 Not tainted (4.8.0-17-generic)
[ 202.678796] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28242222 XER: 20000000
[ 202.678992] CFAR: c000000000014f84 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000006df218 c00000000428fc10 c0000000014e5e00 0000000000000063
GPR04: c00000007fecaca0 c00000007fedfb40 c00000000168d278 0000000000004b78
GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000001
GPR12: c0000000006de100 c000000007b81b00 ffffffffffffffff 0000000022000000
GPR16: 0000000010170dc8 0000010020c60488 0000000010140f58 00000000100c7570
GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608
GPR24: 00003fffd5f377c4 0000000000000001 c0000000013fe5d0 0000000000000004
GPR28: c0000000013fe990 0000000000000063 c0000000013b2590 0000000000000000
[ 202.680090] NIP [c0000000006de134] sysrq_handle_crash+0x34/0x50
[ 202.680159] LR [c0000000006df218] __handle_sysrq+0xe8/0x280
[ 202.680213] Call Trace:
[ 202.680246] [c00000000428fc10] [c000000000e79720] _fw_tigon_tg3_bin_name+0x2f1a8/0x36f48 (unreliable)
[ 202.680356] [c00000000428fc30] [c0000000006df218] __handle_sysrq+0xe8/0x280
[ 202.680440] [c00000000428fcd0] [c0000000006df9c8] write_sysrq_trigger+0x78/0xa0
[ 202.680535] [c00000000428fd00] [c0000000003cf890] proc_reg_write+0xb0/0x110
[ 202.680618] [c00000000428fd50] [c00000000032b5dc] __vfs_write+0x6c/0xe0
[ 202.680702] [c00000000428fd90] [c00000000032cae4] vfs_write+0xd4/0x240
[ 202.680783] [c00000000428fde0] [c00000000032e7fc] SyS_write+0x6c/0x110
[ 202.680867] [c00000000428fe30] [c000000000009584] system_call+0x38/0xec
[ 202.680948] Instruction dump:
[ 202.680991] 38427d00 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d22001a 3949d1e0
[ 202.681133] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
[ 202.681276] ---[ end trace 3e9cbc319000fff4 ]---
[ 202.684658]
[ 202.684718] Sending IPI to other CPUs
[ 202.686765] IPI complete
I'm in purgatory
 -> smp_release_cpus()
spinning_secondaries = 3
 <- smp_release_cpus()
Linux ppc64le
#19-Ubuntu SMP S[ 242.661649] INFO: task swapper/0:1 blocked for more than 120 seconds.
[ 242.661708] Not tainted 4.8.0-17-generic #19-Ubuntu
[ 242.661746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.493648] INFO: task swapper/0:1 blocked for more than 120 seconds.
[ 363.493712] Not tainted 4.8.0-17-generic #19-Ubuntu
[ 363.493755] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.325651] INFO: task swapper/0:1 blocked for more than 120 seconds.
[ 484.325704] Not tainted 4.8.0-17-generic #19-Ubuntu
[ 484.325738] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.157653] INFO: task swapper/0:1 blocked for more than 120 seconds.
[ 605.157712] Not tainted 4.8.0-17-generic #19-Ubuntu
[ 605.157750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 725.989649] INFO: task swapper/0:1 blocked for more than 120 seconds.

The kdump kernel hangs in the scheduler when booted with "nr_cpus=1" on a KVM guest:

    [c00000001a06aeb0] [c00000000801ac1c] __switch_to+0x29c/0x420
    [c00000001a06af10] [c000000008b8c414] __schedule+0x2f4/0x9b0
    [c00000001a06aff0] [c000000008b8cb18] schedule+0x48/0xc0

Need to debug further to find the root cause.

The issue is not seen with the "maxcpus=1" parameter. As a workaround,
please use "maxcpus=1" instead of "nr_cpus=1" for other kdump tests.

Thanks
Hari
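
A rough sketch of the workaround above, operating only on a command-line string. It assumes the kdump kernel parameters are held in a variable such as KDUMP_CMDLINE_APPEND in /etc/default/kdump-tools (an assumption, this report does not say where they are set); use_maxcpus is a hypothetical helper name.

```shell
#!/bin/sh
# use_maxcpus (hypothetical helper): print the given kernel command-line
# string with the whole token nr_cpus=1 replaced by maxcpus=1.
use_maxcpus() {
    printf '%s\n' "$1" | sed -E 's/(^| )nr_cpus=1( |$)/\1maxcpus=1\2/'
}

# Example with the parameters from the kexec command in this report:
use_maxcpus "irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service"
```

The function does not modify any file; the resulting string would still have to be put wherever the kdump parameters are configured, and the crash kernel reloaded.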

bugproxy (bugproxy) wrote : Sosreport after reboot

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-147161 severity-high targetmilestone-inin1610
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy) wrote : kdump kernel console with more prints enabled

------- Comment (attachment only) From <email address hidden> 2016-11-07 01:57 EDT-------

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-01-19 06:53 EDT-------
Hi Pavithra,

Use "noirqdistrib" instead of "irqpoll" since that is the expected parameter
for kdump kernel as already documented here:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/kdump/kdump.txt#n365

Please share the results for all the guests where the issue was reproducible
(since 16.04.1, I guess).

Thanks
Hari

bugproxy (bugproxy) wrote : 17.04 console log

------- Comment (attachment only) From <email address hidden> 2017-01-17 23:59 EDT-------

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-01-20 06:16 EDT-------
(In reply to comment #18)
> Hi Pavithra,
>
> Use "noirqdistrib" instead of "irqpoll" since that is the expected parameter
> for kdump kernel as already documented here:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/kdump/kdump.txt#n365
>

"irqpoll" & "noirqdistrib" are not mutually exclusive after all. So, the fix here
would be to pass "noirqdistrib" for kdump kernel along with "irqpoll".

Hi Louis/Canonical,

This fix, to append "noirqdistrib" to the kdump kernel command line on powerpc,
is needed on all active release versions (14.04.* to 17.04).

Thanks
Hari
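
The proposed fix can be sketched as a small string helper that adds "noirqdistrib" next to the existing parameters, including "irqpoll", if it is not already there. This is only an illustration: append_noirqdistrib is a hypothetical name, and where the resulting string is stored (for instance a KDUMP_CMDLINE_APPEND setting in /etc/default/kdump-tools) is an assumption not confirmed in this report.

```shell
#!/bin/sh
# append_noirqdistrib (hypothetical helper): print the given kdump kernel
# command-line string, adding noirqdistrib at the end unless that exact
# token is already present. Existing parameters such as irqpoll are kept.
append_noirqdistrib() {
    case " $1 " in
        *" noirqdistrib "*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "$1 noirqdistrib" ;;
    esac
}

# Example with the parameters from the kexec command in this report:
append_noirqdistrib "irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service"
```

Running the helper twice on its own output leaves the string unchanged, so it is safe to apply repeatedly.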

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-01-23 10:48 EDT-------
*** Bug 150359 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-02-01 14:30 EDT-------
I have added the workaround to recommendations in our wiki page at:

https://wiki.ubuntu.com/ppc64el/Recommendations#Kdump_not_generating_crash_dump_file

Please check it out and let me know if it looks ok.

Cheers

Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → nobody
Manoj Iyer (manjo) wrote :

Looks like the kernel option needs to be added when you set up kdump, and you have already updated the wiki pages to document this. I am marking this as Fix Released because there is nothing more to do from Canonical's side.

Changed in linux (Ubuntu):
status: New → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-03-06 07:30 EDT-------
(In reply to comment #28)
> Looks like the kernel option needs to be added when you setup kdump, and you
> have already updated the wiki pages to document this. I am marking this as
> fix released because there is nothing more to do from canonical's side.

While this is documented, it would be good to get it sorted in the package as well.
A similar request was made for Zesty here: https://bugs.launchpad.net/bugs/1664552

Thanks
Hari
