ISST-LTE: pVM:high cpus number need a high crashkernel value in kdump

Bug #1560552 reported by bugproxy
This bug affects 1 person
Affects           Status        Importance  Assigned to  Milestone
linux (Ubuntu)    Fix Released  High        Tim Gardner
  Xenial          Fix Released  High        Tim Gardner

Bug Description

The kdump kernel is booted with "maxcpus=1", so I expected that, no matter how many CPUs the system has, only one CPU would be brought up during kdump. However, on LPAR thymelp2, kdump works fine with crashkernel=512M when 40 CPUs are allocated, but when 200 CPUs are allocated it fails with an out-of-memory (OOM) condition. We have confirmed that in the latter case 1.5G of RAM must be reserved for kdump to work.
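One plausible explanation, consistent with the later comment recommending nr_cpus=1, is that the kernel documentation describes maxcpus= as limiting how many CPUs are brought up at boot, while nr_cpus= limits how many CPUs the kernel supports at all. Per-CPU data is set up for every possible CPU, so with maxcpus=1 a 200-CPU LPAR still pays for 200 CPUs. The sketch below is a simplified model under that assumption; the names cpu_limits, per_cpu_footprint and bytes_per_cpu are hypothetical, and no real per-CPU sizes are implied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CpuLimits:
    possible: int  # CPUs the kernel sets up per-CPU data for
    online: int    # CPUs actually brought up at boot


def cpu_limits(present: int, maxcpus: Optional[int] = None,
               nr_cpus: Optional[int] = None) -> CpuLimits:
    """Simplified model of how maxcpus= and nr_cpus= bound CPU counts.

    Assumption: every present CPU counts as possible unless nr_cpus=
    caps it; extra hotplug slots are not modelled.
    """
    # nr_cpus= caps how many CPUs the kernel will ever support.
    possible = present if nr_cpus is None else min(present, nr_cpus)
    # maxcpus= only caps how many of those are brought up at boot.
    online = possible if maxcpus is None else min(possible, maxcpus)
    return CpuLimits(possible=possible, online=online)


def per_cpu_footprint(limits: CpuLimits, bytes_per_cpu: int) -> int:
    """Memory for per-CPU data scales with possible CPUs, not online ones.

    bytes_per_cpu is a hypothetical placeholder, not a measured value.
    """
    return limits.possible * bytes_per_cpu


# 200-CPU LPAR as in this report: maxcpus=1 still leaves 200 possible CPUs.
print(cpu_limits(200, maxcpus=1))  # CpuLimits(possible=200, online=1)
print(cpu_limits(200, nr_cpus=1))  # CpuLimits(possible=1, online=1)
```

Under this model, only the possible-CPU count drives the footprint, which would explain why 40 CPUs fit in 512M while 200 CPUs do not.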

Contact Information = Ping Tian <email address hidden> Carrie <email address hidden>

---uname output---
Linux thymelp2 4.2.0-23-generic #28~14.04.1-Ubuntu SMP Thu Dec 31 13:41:19 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = 8247-22L, LPAR
---Debugger---
A debugger was configured, however the system did not enter into the debugger

---Steps to Reproduce---
1. Configure kdump with crashkernel=512M on thymelp2.
2. Allocate 40 CPUs on thymelp2 and trigger kdump.
3. Allocate 200 CPUs on thymelp2 and trigger kdump again.
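The reservations compared in these steps (512M, and the roughly 1.5G found to be needed with 200 CPUs) are set with the crashkernel= boot parameter. The kernel's kdump documentation describes a simple size[@offset] form and a range form, start-[end]:size,..., where start is inclusive and end is exclusive. A small parser like the one below can help when comparing configurations. It is a sketch: only decimal integers with K/M/G suffixes are handled (so 1.5G must be written as 1536M), and the function names are hypothetical.

```python
import re
from typing import Optional

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
_SIZE = re.compile(r"^(\d+)([KMG]?)$")


def parse_size(text: str) -> int:
    """Parse a size such as '512M' or '1536M' into bytes (integers only)."""
    m = _SIZE.match(text.strip().upper())
    if not m:
        raise ValueError(f"unsupported size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def crashkernel_bytes(value: str, system_ram: int) -> Optional[int]:
    """Return the reservation size selected by a crashkernel= value.

    Handles 'size[@offset]' and 'start-[end]:size,...[@offset]'.
    Returns None when no range matches system_ram (in bytes).
    """
    spec = value.split("@", 1)[0]  # the @offset part does not affect the size
    if ":" not in spec:
        return parse_size(spec)
    for entry in spec.split(","):
        rng, size = entry.split(":", 1)
        start, end = rng.split("-", 1)
        lo = parse_size(start)                    # inclusive lower bound
        hi = parse_size(end) if end else None     # exclusive upper bound
        if system_ram >= lo and (hi is None or system_ram < hi):
            return parse_size(size)
    return None


print(crashkernel_bytes("512M", 64 << 30) >> 20)  # 512
```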

*Additional Instructions for Ping Tian <email address hidden> Carrie <email address hidden>:
- Post a private note with access information for the machine on which the bug is occurring.

== Comment: #1 - Nadia N. Fry <email address hidden> - 2016-02-19 13:53:02 ==
Any update?

== Comment: #2 - Hari Krishna Bathini <email address hidden> - 2016-02-22 03:40:05 ==
Mahesh posted a patch upstream which should take care of this problem

http://patchwork.ozlabs.org/patch/577193/

Thanks
Hari
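The Xenial changelog entry for this bug summarizes the fix as "ppc64 boot: Wait for boot cpu to show up if nr_cpus limit is about to hit". The idea behind that title can be modeled as follows, under an assumption not stated in this report: the CPU that boots the kdump kernel (the crashing CPU) need not be the first CPU listed by firmware, so simply taking the first nr_cpus CPUs could leave it out. This is a hypothetical model of the changelog title, not the patch's actual code; assign_logical_cpus and wait_for_boot are illustrative names.

```python
from typing import List


def assign_logical_cpus(firmware_order: List[int], boot_cpu: int,
                        nr_cpus: int, wait_for_boot: bool) -> List[int]:
    """Return the physical CPUs that get a logical id, in assignment order.

    Hypothetical model: CPUs are taken in firmware order until nr_cpus
    slots are used. With wait_for_boot, the last free slot is held back
    until the boot CPU is seen, so the boot CPU is never left out.
    """
    assigned: List[int] = []
    for phys in firmware_order:
        if len(assigned) == nr_cpus:
            break  # all nr_cpus slots are used
        last_slot = len(assigned) == nr_cpus - 1
        boot_missing = boot_cpu not in assigned
        if wait_for_boot and last_slot and boot_missing and phys != boot_cpu:
            continue  # skip this CPU; keep the final slot for the boot CPU
        assigned.append(phys)
    return assigned


# Hypothetical: 200 CPUs, and kdump boots on CPU 137.
cpus = list(range(200))
print(assign_logical_cpus(cpus, 137, nr_cpus=1, wait_for_boot=False))  # [0]
print(assign_logical_cpus(cpus, 137, nr_cpus=1, wait_for_boot=True))   # [137]
```

Without the wait, nr_cpus=1 in this model would register a CPU other than the one actually running, which is why limiting CPUs with nr_cpus alone is not enough on its own.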

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-137281 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1560552/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-03-24 13:23 EDT-------
Hi Hari, Mahesh

> Mahesh posted a patch upstream which should take care of this problem
>
>http://patchwork.ozlabs.org/patch/577193/

I didn't see any review of this patch, and it doesn't seem to have been accepted yet. Is there any indication that it will be accepted, so that we can ask Canonical to include it in 16.04?

Thanks,
Breno

affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: New → Triaged
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Xenial):
assignee: Canonical Kernel Team (canonical-kernel-team) → Tim Gardner (timg-tpi)
status: Triaged → Fix Committed
bugproxy (bugproxy)
tags: added: targetmilestone-inin16041
removed: targetmilestone-inin---
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-17.33

---------------
linux (4.4.0-17.33) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1563441

  * ISST-LTE: pVM:high cpus number need a high crashkernel value in kdump
    (LP: #1560552)
    - SAUCE: (noup) ppc64 boot: Wait for boot cpu to show up if nr_cpus limit is
      about to hit.

  * Predictable naming mechanism is leading to issues in DLPAR operations of
    NICs (LP: #1560514)
    - SAUCE: (noup) powerpc/pci: Assign fixed PHB number based on device-tree
      properties

  * ThunderX: support alternative phy implementations (LP: #1562968)
    - net: thunderx: Cleanup PHY probing code.
    - [Config] CONFIG_MDIO_CAVIUM=m
    - phy: mdio-octeon: Refactor into two files/modules
    - [Config] CONFIG_MDIO_THUNDER=m
    - phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.
    - phy: mdio-cavium: Add missing MODULE_* annotations.
    - net: cavium: For Kconfig THUNDER_NIC_BGX, select MDIO_THUNDER.
    - phy: mdio-thunder: Fix some Kconfig typos
    - [d-i] Add phy drivers for Cavium ThunderX to nic-modules udeb

  * linux: exclude ZONE_DEVICE from GFP_ZONE_TABLE (LP: #1563293)
    - Revert "mm: CONFIG_NR_ZONES_EXTENDED"
    - mm: exclude ZONE_DEVICE from GFP_ZONE_TABLE

  * lots of printk to serial console can hang system for long time
    (LP: #1534216)
    - printk: set may_schedule for some of console_trylock() callers

  * [i915_bpo] Update i915 backport driver (LP: #1560395)
    - SAUCE: i915_bpo: Update to drm-intel-next-fixes-2016-03-16
    - PM / runtime: Add new helper for conditional usage count incrementation
    - drm/core: Add drm_for_each_encoder_mask, v2.
    - drm/atomic-helper: Implement subsystem-level suspend/resume

  * [Hyper-V] VM Sockets (LP: #1541585)
    - Drivers: hv: vmbus: Cleanup vmbus_set_event()
    - Drivers: hv: vmbus: Add vendor and device atttributes
    - Drivers: hv: vmbus: avoid infinite loop in init_vp_index()
    - Drivers: hv: vmbus: avoid scheduling in interrupt context in vmbus_initiate_unload()
    - Drivers: hv: vmbus: don't manipulate with clocksources on crash
    - Drivers: hv: vmbus: add a helper function to set a channel's pending send size
    - Drivers: hv: vmbus: define the new offer type for Hyper-V socket (hvsock)
    - Drivers: hv: vmbus: vmbus_sendpacket_ctl: hvsock: avoid unnecessary signaling
    - Drivers: hv: vmbus: define a new VMBus message type for hvsock
    - Drivers: hv: vmbus: add a hvsock flag in struct hv_driver
    - Drivers: hv: vmbus: add a per-channel rescind callback
    - Drivers: hv: vmbus: add an API vmbus_hvsock_device_unregister()
    - Drivers: hv: vmbus: Eliminate the spin lock on the read path
    - Drivers: hv: vmbus: Give control over how the ring access is serialized
    - drivers/hv: Move VMBus hypercall codes into Hyper-V UAPI header
    - Drivers: hv: vmbus: don't loose HVMSG_TIMER_EXPIRED messages
    - Drivers: hv: vmbus: avoid wait_for_completion() on crash
    - Drivers: hv: vmbus: remove code duplication in message handling
    - Drivers: hv: vmbus: avoid unneeded compiler optimizations in vmbus_wait_for_unload()
    - Drivers: hv: util: Pass the chann...


Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-07 22:56 EDT-------
It looks like this bug has been fixed in 4.4.0-17-generic. However, maxcpus=1 on the command line of the second (kdump) kernel should be changed to nr_cpus=1. I'll file a new bug for this.
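The change suggested here amounts to rewriting one token in the extra arguments the kdump tooling appends to the second kernel's command line (the exact file and variable holding them are not given in this report). A minimal sketch of that rewrite, assuming the arguments are available as a single string; prefer_nr_cpus is a hypothetical helper and the sample arguments are only an example.

```python
import re

# A standalone maxcpus=N token (not part of a longer word such as foo.maxcpus=).
_MAXCPUS = re.compile(r"(?<!\S)maxcpus=(\d+)(?!\S)")


def prefer_nr_cpus(cmdline: str) -> str:
    """Rewrite maxcpus=N as nr_cpus=N, leaving other parameters untouched."""
    return _MAXCPUS.sub(r"nr_cpus=\1", cmdline)


print(prefer_nr_cpus("irqpoll maxcpus=1 nousb"))  # irqpoll nr_cpus=1 nousb
```

Matching whole whitespace-delimited tokens avoids touching module-scoped parameters whose names merely end in maxcpus=.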

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-28 09:20 EDT-------
Mahesh spoke to the PPC maintainer offline about this patch. Because the changes are invasive (and touch a code path that runs on every boot), the review is taking time. The maintainer is also considering an alternative solution, but that would involve rewriting much of the CPU numbering code. For now, the maintainer has said he needs more time (no specific timeframe given) to review this patch and be sure it is not harmful.

So, as it stands, this patch may not go anywhere for some time.

Thanks
Hari

Brad Figg (brad-figg)
tags: added: cscc