Kernel WARN @drivers/base/memory.c:200 during DLPAR memory operation

Bug #1463654 reported by bugproxy on 2015-06-10
This bug affects 1 person
Affects         Importance  Assigned to
linux (Ubuntu)  Medium      Tim Gardner
Vivid           Undecided   Tim Gardner
Wily            Undecided   Tim Gardner
Xenial          Medium      Tim Gardner

Bug Description

---Problem Description---
Kernel WARN @drivers/base/memory.c:200 during DLPAR memory operation

Contact Information = Sachin Sant / <email address hidden>

---uname output---
3.19.0-18-generic

---Patches Installed---
A patched powerpc-ibm-utils package is required

Machine Type = POWER8

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1) Using the latest daily ISO, install 14.04.02 as a PowerVM guest.
2) Upgrade the kernel to the 3.19 level (3.19.0-18-generic).
3) Ensure the ksh and powerpc-ibm-utils packages are installed.
4) Download the following DLPAR packages from http://ausgsa.ibm.com/projects/r/rsctdev/builds/muthu/rmuts006a/ppc64le/

devices.chrp.base.servicerm_2.5.0.1-15111_ppc64el.deb
dynamicrm_2.0.1-3_ppc64el.deb
rsct.core_3.2.0.6-15111_ppc64el.deb
rsct.core.utils_3.2.0.6-15111_ppc64el.deb
src_3.2.0.6-15111_ppc64el.deb

5) Install the packages.
6) Perform an add memory operation via the HMC.

Stack trace output:

alp9 kernel: [44170.234662] ------------[ cut here ]------------
alp9 kernel: [44170.234667] WARNING: at /build/buildd/linux-lts-vivid-3.19.0/drivers/base/memory.c:200
alp9 kernel: [44170.234668] Modules linked in: rpadlpar_io rpaphp pseries_rng rtc_generic
alp9 kernel: [44170.234675] CPU: 2 PID: 1391 Comm: systemd-udevd Not tainted 3.19.0-18-generic #18~14.04.1-Ubuntu
alp9 kernel: [44170.234677] task: c0000003dbea7c80 ti: c0000003dbf10000 task.ti: c0000003dbf10000
Jun 4 19:44:47 alp9 kernel: [44170.234678] NIP: c000000000668f34 LR: c0000000006699b0 CTR: 0000000000000000
Jun 4 19:44:47 alp9 kernel: [44170.234680] REGS: c0000003dbf13910 TRAP: 0700 Not tainted (3.19.0-18-generic)
Jun 4 19:44:47 alp9 kernel: [44170.234680] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28042888 XER: 20000000
Jun 4 19:44:47 alp9 kernel: [44170.234686] CFAR: c000000000668ed8 SOFTE: 1
Jun 4 19:44:47 alp9 kernel: [44170.234686] GPR00: c0000000006699b0 c0000003dbf13b90 c00000000144c760 0000000000000001
Jun 4 19:44:47 alp9 kernel: [44170.234686] GPR04: 0000000000000779 0000000000000100 0000000000078000 f000000001de4000
Jun 4 19:44:47 alp9 kernel: [44170.234686] GPR08: c0000000013ac760 0000000000000001 0000000000007790 00000000003fffff
Jun 4 19:44:47 alp9 kernel: [44170.234686] GPR12: c00000000172cb00 c00000000e831200 00000100074a0010 0000000000000000
Jun 4 19:44:47 alp9 kernel: [44170.234686] GPR16: 0000000010032230 00000000100311d0 00003fffcb59bf20 0000000000000003
Jun 4 19:44:47 alp9 kernel: [44170.234686] GPR20: 00000000100527f8 00000000100322b0 0000000001312d00 00000100074af7f0
Jun 4 19:44:47 alp9 kernel: [44170.234686] GPR24: 00000000100322d0 00003fffcb59bf20 c0000003dbf13e00 0000000000000000
Jun 4 19:44:47 alp9 kernel: [44170.234686] GPR28: 0000000000001000 0000000000000000 0000000000077000 0000000000077900
Jun 4 19:44:47 alp9 kernel: [44170.234707] NIP [c000000000668f34] pages_correctly_reserved+0x134/0x1c0
Jun 4 19:44:47 alp9 kernel: [44170.234709] LR [c0000000006699b0] memory_subsys_online+0x70/0x140
Jun 4 19:44:47 alp9 kernel: [44170.234710] Call Trace:
Jun 4 19:44:47 alp9 kernel: [44170.234711] [c0000003dbf13b90] [0000000000000006] 0x6 (unreliable)
Jun 4 19:44:47 alp9 kernel: [44170.234714] [c0000003dbf13c00] [c0000000006699b0] memory_subsys_online+0x70/0x140
Jun 4 19:44:47 alp9 kernel: [44170.234716] [c0000003dbf13c40] [c0000000006476f4] device_online+0xb4/0x120
Jun 4 19:44:47 alp9 kernel: [44170.234718] [c0000003dbf13c80] [c00000000066987c] store_mem_state+0x8c/0x150
Jun 4 19:44:47 alp9 kernel: [44170.234721] [c0000003dbf13cc0] [c000000000643618] dev_attr_store+0x68/0xa0
Jun 4 19:44:47 alp9 kernel: [44170.234724] [c0000003dbf13d00] [c00000000035afd0] sysfs_kf_write+0x80/0xb0
Jun 4 19:44:47 alp9 kernel: [44170.234726] [c0000003dbf13d40] [c000000000359f0c] kernfs_fop_write+0x18c/0x1f0
Jun 4 19:44:47 alp9 kernel: [44170.234730] [c0000003dbf13d90] [c0000000002b450c] vfs_write+0xdc/0x260
Jun 4 19:44:47 alp9 kernel: [44170.234732] [c0000003dbf13de0] [c0000000002b53bc] SyS_write+0x6c/0x110
Jun 4 19:44:47 alp9 kernel: [44170.234735] [c0000003dbf13e30] [c000000000009258] system_call+0x38/0xd0
Jun 4 19:44:47 alp9 kernel: [44170.234736] Instruction dump:
Jun 4 19:44:47 alp9 kernel: [44170.234737] 419e0024 788a2428 7d095214 2fa80000 41de0014 7d29502a 38e74000 7928ffe3
Jun 4 19:44:47 alp9 kernel: [44170.234740] 4082ff7c 3d02fff6 892808e3 69290001 <0b090000> 2fa90000 40de0068 38600000
Jun 4 19:44:47 alp9 kernel: [44170.234744] ---[ end trace f8d28c560fa5c980 ]---
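
For anyone comparing traces across kernels, the frames above can be reduced to a list of symbol names. This is only a minimal sketch: the regular expression assumes the ppc64 frame layout shown in this report (stack pointer, address, then symbol+offset/size), and the name call_trace_symbols is made up for illustration.

```python
import re

# Matches frames such as
# "[c0000003dbf13c00] [c0000000006699b0] memory_subsys_online+0x70/0x140".
# Assumption: both bracketed fields are 16 hex digits, as in the trace above.
FRAME_RE = re.compile(
    r"\[(?P<sp>[0-9a-f]{16})\] \[(?P<addr>[0-9a-f]{16})\] "
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def call_trace_symbols(log_text):
    """Return the symbol names of call-trace frames, innermost first."""
    return [m.group("symbol") for m in FRAME_RE.finditer(log_text)]


sample = """\
[44170.234711] [c0000003dbf13b90] [0000000000000006] 0x6 (unreliable)
[44170.234714] [c0000003dbf13c00] [c0000000006699b0] memory_subsys_online+0x70/0x140
[44170.234716] [c0000003dbf13c40] [c0000000006476f4] device_online+0xb4/0x120
[44170.234718] [c0000003dbf13c80] [c00000000066987c] store_mem_state+0x8c/0x150
"""
print(call_trace_symbols(sample))
```

Frames without a symbol (such as the "0x6 (unreliable)" entry) and the single-bracket NIP/LR lines are skipped, so only the resolved call chain is returned.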

System Dump Info:
  The system is not configured to capture a system dump.

== Comment: #1 - SACHIN P. SANT <email address hidden> - 2015-06-05 01:28:00 ==
The DLPAR memory operation succeeds and is not affected by this warning.

== Comment: #3 - Nathan D. Fontenot <email address hidden> - 2015-06-05 07:11:45 ==
Looking at this issue further, it appears to happen only when adding memory that was previously removed. When adding memory that has not yet been assigned to the partition, the warnings are not generated.

Need to look into the memory remove code paths to make sure we are releasing the memory properly.
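
When checking the remove path, it may help to see which memory blocks the kernel currently reports as offline. The following is a minimal sketch, assuming the usual sysfs layout in which each /sys/devices/system/memory/memoryN directory exposes a state attribute (the file whose writes go through store_mem_state in the trace). The function name memory_block_states and its root parameter are illustrative, not part of any existing tool.

```python
import os
import re


def memory_block_states(root="/sys/devices/system/memory"):
    """Map memory block index -> contents of its 'state' attribute."""
    states = {}
    if not os.path.isdir(root):
        return states
    for name in os.listdir(root):
        m = re.fullmatch(r"memory(\d+)", name)
        if not m:
            continue  # skip non-block entries in the same directory
        try:
            with open(os.path.join(root, name, "state")) as f:
                states[int(m.group(1))] = f.read().strip()
        except OSError:
            continue  # block may have disappeared during a DLPAR operation
    return states


if __name__ == "__main__":
    for index, state in sorted(memory_block_states().items()):
        print(f"memory{index}: {state}")
```

Running it before and after a remove/add cycle would show which blocks changed state; the root parameter exists only so the function can be pointed at a test directory.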

== Comment: #4 - Nathan D. Fontenot <email address hidden> - 2015-06-05 07:59:17 ==
I will have to verify in the Ubuntu source, but this seems like this is an issue that was solved previously by commit 2bbcb8788311a40714b585fc11b51da6ffa2ab92.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/base/memory.c?id=2bbcb8788311a40714b585fc11b51da6ffa2ab92

bugproxy (bugproxy) on 2015-06-10
tags: added: architecture-ppc64le bugnameltc-125934 severity-high targetmilestone-inin---

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1463654/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment

------- Comment From <email address hidden> 2015-06-23 20:41 EDT-------
(In reply to comment #4)
> I will have to verify in the Ubuntu source, but this seems like this is an
> issue that was solved previously by commit
> 2bbcb8788311a40714b585fc11b51da6ffa2ab92.
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/base/memory.c?id=2bbcb8788311a40714b585fc11b51da6ffa2ab92

I have been able to reproduce the bug despite the inclusion of this commit in the version of the kernel in question (3.19.0-18-generic). I am continuing to debug the issue.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-06-26 18:44 EDT-------
After some further investigation, I have found that the warning occurs only on the first memory add after a reboot. Once the warning has been triggered, subsequent add memory operations complete without triggering it.
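
To check the "first add after reboot" behaviour against a saved kernel log, the warnings can be counted with a short script. This is a rough sketch only: the pattern assumes the "WARNING: at <path>:<line>" form seen in this report (other kernels may format the line differently), and memory_warnings is a hypothetical helper name.

```python
import re

# Assumption: the warning line looks like the one in this report, e.g.
# "WARNING: at /build/.../drivers/base/memory.c:200".
WARN_RE = re.compile(r"WARNING: at \S*drivers/base/memory\.c:(\d+)")
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def memory_warnings(log_text):
    """Return (kernel timestamp or None, source line) for each matching WARNING."""
    hits = []
    for line in log_text.splitlines():
        m = WARN_RE.search(line)
        if m:
            ts = TS_RE.search(line)
            hits.append((float(ts.group(1)) if ts else None, int(m.group(1))))
    return hits


log = (
    "alp9 kernel: [44170.234662] ------------[ cut here ]------------\n"
    "alp9 kernel: [44170.234667] WARNING: at "
    "/build/buildd/linux-lts-vivid-3.19.0/drivers/base/memory.c:200\n"
)
print(len(memory_warnings(log)))
```

If the observation above holds, a log covering one boot and several add operations should yield exactly one hit.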

bugproxy (bugproxy) on 2015-06-29
tags: added: severity-medium
removed: severity-high
bugproxy (bugproxy) on 2015-07-06
tags: added: targetmilestone-inin14043
removed: targetmilestone-inin---

------- Comment (attachment only) From <email address hidden> 2015-11-02 17:40 EDT-------

------- Comment From <email address hidden> 2016-01-07 18:41 EDT-------
Latest version of the patch has been submitted and can be seen here:
http://lkml.iu.edu/hypermail/linux/kernel/1601.0/03413.html

Luciano Chavez (lnx1138) on 2016-01-20
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-20 10:29 EDT-------
The patch has been tested against kernel version 4.4.0 and works as expected.

tags: added: targetmilestone-inin14044
removed: targetmilestone-inin14043
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-20 11:29 EDT-------
BTW, Canonical, please ignore the patch attached to the bug back in Nov 2015. The fix has been committed in linux-next (a link will follow), and the developer will backport the patch and supply a new one for the LTS kernel if necessary.

Tim Gardner (timg-tpi) wrote :

Yep, I noticed the patch was up to v4 so I thought I'd wait until it was closer to being merged.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-20 12:03 EDT-------
The patch (drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64) is merged into the 4.4.0 kernel and can be seen here:
https://github.com/torvalds/linux/commit/cb5490a5eea415106d7438df440da5fb1e17318d

Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Medium
status: New → Triaged
Tim Gardner (timg-tpi) wrote :

Indeed it has been merged, albeit with a slightly different commit subject (which is why I did not find it). I'm marking this fix released for Xenial since the v4.4 based kernel will be uploaded real soon now.

Changed in linux (Ubuntu Xenial):
assignee: Canonical Kernel Team (canonical-kernel-team) → Tim Gardner (timg-tpi)
status: Triaged → Fix Released
Changed in linux (Ubuntu Wily):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Tim Gardner (timg-tpi) on 2016-01-20
Changed in linux (Ubuntu Vivid):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-27 02:02 EDT-------
FYI. We can reproduce this problem with the 4.2.0-25-generic kernel.

Tim Gardner (timg-tpi) wrote :

Just to be sure, the upstream fix is 6add7cd618b4d4dc525731beb539c5e06e891855 ('memory hotplug: sysfs probe routine should add all memory sections'), right ?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-27 09:38 EDT-------
(In reply to comment #24)
>
> Just to be sure, the upstream fix is
> 6add7cd618b4d4dc525731beb539c5e06e891855 ('memory hotplug: sysfs probe
> routine should add all memory sections'), right ?

That commit (6add7cd618b4d4dc525731beb539c5e06e891855) introduced the old way of adding memory back in 2011. The commit that fixes that is cb5490a5eea415106d7438df440da5fb1e17318d ('drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64')

Brad Figg (brad-figg) on 2016-02-01
Changed in linux (Ubuntu Vivid):
status: In Progress → Fix Committed
Tim Gardner (timg-tpi) on 2016-02-01
Changed in linux (Ubuntu Wily):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-vivid
tags: added: verification-needed-wily
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-10 03:41 EDT-------
I have verified on Ubuntu 16.04 with kernel version 4.4.0-2-generic and it works as expected.

Brad Figg (brad-figg) on 2016-02-18
tags: added: verification-done-vivid verification-done-wily
removed: verification-needed-vivid verification-needed-wily
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.2.0-30.35

---------------
linux (4.2.0-30.35) wily; urgency=low

  [ Seth Forshee ]

  * SAUCE: cred: Add clone_cred() interface
    - LP: #1531747, #1534961, #1535150
    - CVE-2016-1575 CVE-2016-1576
  * SAUCE: overlayfs: Use mounter's credentials instead of selectively
    raising caps
    - LP: #1531747, #1534961, #1535150
    - CVE-2016-1575 CVE-2016-1576
  * SAUCE: overlayfs: Skip permission checking for trusted.overlayfs.*
    xattrs
    - LP: #1531747, #1534961, #1535150
    - CVE-2016-1575 CVE-2016-1576
  * SAUCE: overlayfs: Be more careful about copying up sxid files
    - LP: #1534961, #1535150
    - CVE-2016-1575 CVE-2016-1576
  * SAUCE: overlayfs: Propogate nosuid from lower and upper mounts
    - LP: #1534961, #1535150
    - CVE-2016-1575 CVE-2016-1576

linux (4.2.0-29.34) wily; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1543167

  [ Brad Figg ]

  * Revert "SAUCE: apparmor: fix sleep from invalid context"
    - LP: #1542049

  [ Upstream Kernel Changes ]

  * Revert "af_unix: Revert 'lock_interruptible' in stream receive code"
    - LP: #1540731

linux (4.2.0-28.33) wily; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1540634

  [ Brad Figg ]

  * CONFIG: CONFIG_DEBUG_UART_BCM63XX is not set

  [ J. R. Okajima ]

  * SAUCE: ubuntu: aufs: tiny, extract a new func xino_fwrite_wkq()
    - LP: #1533043
  * SAUCE: ubuntu: aufs: for 4.3, XINO handles EINTR from the dying process
    - LP: #1533043

  [ John Johansen ]

  * SAUCE: (no-up): apparmor: fix for failed mediation of socket that is
    being shutdown
    - LP: #1446906
  * SAUCE: apparmor: fix sleep from invalid context
    - LP: #1539349

  [ Tim Gardner ]

  * [Config] Add pvpanic to virtual flavour
    - LP: #1537923

  [ Upstream Kernel Changes ]

  * Revert "ACPI / LPSS: allow to use specific PM domain during ->probe()"
    - LP: #1540532
  * tools: Add a "make all" rule
    - LP: #1536370
  * vf610_adc: Fix internal temperature calculation
    - LP: #1536370
  * iio: lpc32xx_adc: fix warnings caused by enabling unprepared clock
    - LP: #1536370
  * iio:ad5064: Make sure ad5064_i2c_write() returns 0 on success
    - LP: #1536370
  * iio: ad5064: Fix ad5629/ad5669 shift
    - LP: #1536370
  * iio:ad7793: Fix ad7785 product ID
    - LP: #1536370
  * iio: adc: vf610_adc: Fix division by zero error
    - LP: #1536370
  * mmc: mmc: Improve reliability of mmc_select_hs200()
    - LP: #1536370
  * mmc: mmc: Fix HS setting in mmc_select_hs400()
    - LP: #1536370
  * mmc: mmc: Move mmc_switch_status()
    - LP: #1536370
  * mmc: mmc: Improve reliability of mmc_select_hs400()
    - LP: #1536370
  * crypto: qat - don't use userspace pointer
    - LP: #1536370
  * iio: si7020: Swap data byte order
    - LP: #1536370
  * iio: adc: xilinx: Fix VREFN scale
    - LP: #1536370
  * ipmi: Start the timer and thread on internal msgs
    - LP: #1536370
  * drm/i915: quirk backlight present on Macbook 4, 1
    - LP: #1536370
  * drm/i915: get runtime PM reference around GEM set_caching IOCTL
    - LP: #1536370
  * drm/radeon: Disable uncacheable CPU mappings of GTT with RV6xx
    - LP: #1536370
  *...

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-51.57

---------------
linux (3.19.0-51.57) vivid; urgency=low

  [ Seth Forshee ]

  * SAUCE: cred: Add clone_cred() interface
    - LP: #1531747, #1534961, #1535150
    - CVE-2016-1575 CVE-2016-1576
  * SAUCE: overlayfs: Use mounter's credentials instead of selectively
    raising caps
    - LP: #1531747, #1534961, #1535150
    - CVE-2016-1575 CVE-2016-1576
  * SAUCE: overlayfs: Skip permission checking for trusted.overlayfs.*
    xattrs
    - LP: #1531747, #1534961, #1535150
    - CVE-2016-1575 CVE-2016-1576
  * SAUCE: overlayfs: Be more careful about copying up sxid files
    - LP: #1534961, #1535150
    - CVE-2016-1575 CVE-2016-1576
  * SAUCE: overlayfs: Propogate nosuid from lower and upper mounts
    - LP: #1534961, #1535150
    - CVE-2016-1575 CVE-2016-1576

linux (3.19.0-50.56) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1540576

  [ J. R. Okajima ]

  * SAUCE: ubuntu: aufs: tiny, extract a new func xino_fwrite_wkq()
    - LP: #1533043
  * SAUCE: ubuntu: aufs: for 4.3, XINO handles EINTR from the dying process
    - LP: #1533043

  [ John Johansen ]

  * SAUCE: (no-up): apparmor: fix for failed mediation of socket that is
    being shutdown
    - LP: #1446906

  [ Upstream Kernel Changes ]

  * drivers/base/memory.c: fix kernel warning during memory hotplug on
    ppc64
    - LP: #1463654
  * sched/wait: Fix signal handling in bit wait helpers
    - LP: #1537859
  * sched/wait: Fix the signal handling fix
    - LP: #1537859
  * ARC: Fix silly typo in MAINTAINERS file
    - LP: #1537859
  * ip6mr: call del_timer_sync() in ip6mr_free_table()
    - LP: #1537859
  * gre6: allow to update all parameters via rtnl
    - LP: #1537859
  * atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation
    - LP: #1537859
  * sctp: use the same clock as if sock source timestamps were on
    - LP: #1537859
  * sctp: update the netstamp_needed counter when copying sockets
    - LP: #1537859
  * sctp: also copy sk_tsflags when copying the socket
    - LP: #1537859
  * net: qca_spi: fix transmit queue timeout handling
    - LP: #1537859
  * ipv6: sctp: clone options to avoid use after free
    - LP: #1537859
  * net: add validation for the socket syscall protocol argument
    - LP: #1537859
  * sh_eth: fix kernel oops in skb_put()
    - LP: #1537859
  * net: fix IP early demux races
    - LP: #1537859
  * vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
    - LP: #1537859
  * skbuff: Fix offset error in skb_reorder_vlan_header
    - LP: #1537859
  * pptp: verify sockaddr_len in pptp_bind() and pptp_connect()
    - LP: #1537859
  * bluetooth: Validate socket address length in sco_sock_bind().
    - LP: #1537859
  * fou: clean up socket with kfree_rcu
    - LP: #1537859
  * af_unix: Revert 'lock_interruptible' in stream receive code
    - LP: #1537859
  * KEYS: Fix race between read and revoke
    - LP: #1537859
  * tools: Add a "make all" rule
    - LP: #1537859
  * efi: Disable interrupts around EFI calls, not in the epilog/prolog
    calls
    - LP: #1537859
  * fuse: break infinite loop in fuse_fill_write_pages()
    - LP: #1537859
  * usb: gadget: pxa2...

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
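
To find which changelog entry actually carries the fix, the changelog text can be searched for this bug's number. The sketch below assumes the layout shown in the changelogs above ("linux (version)" headers, "* title" items that may wrap onto a second line, and "- LP: #..." lines beneath them); entries_for_bug is a hypothetical helper, not an existing tool.

```python
import re


def entries_for_bug(changelog_text, bug_number):
    """Return (version, title) for each changelog item tagged LP: #bug_number."""
    results = []
    version = None
    title_lines = []
    for raw in changelog_text.splitlines():
        line = raw.strip()
        header = re.match(r"linux \(([^)]+)\)", line)
        if header:
            version = header.group(1)
            title_lines = []
        elif line.startswith("* "):
            title_lines = [line[2:]]
        elif line.startswith("- LP:"):
            if str(bug_number) in re.findall(r"#(\d+)", line) and title_lines:
                results.append((version, " ".join(title_lines)))
        elif line.startswith("- ") or line.startswith("["):
            continue  # CVE lines and "[ Author ]" section markers
        elif line and title_lines:
            title_lines.append(line)  # continuation of a wrapped title
    return results


excerpt = """\
linux (3.19.0-50.56) vivid; urgency=low

  [ Upstream Kernel Changes ]

  * drivers/base/memory.c: fix kernel warning during memory hotplug on
    ppc64
    - LP: #1463654
  * sched/wait: Fix signal handling in bit wait helpers
    - LP: #1537859
"""
print(entries_for_bug(excerpt, 1463654))
```

On the Vivid excerpt quoted in this report, the match falls under the 3.19.0-50.56 entry, which is included in the cumulative changelog of the 3.19.0-51.57 upload.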
Sandhya Venugopala (vsandhya) wrote :

Tim,

Kindly let us know which kernel version of Ubuntu 16.04 will have this fix integrated.

Thank you
