DLPAR - Memory add operation fails

Bug #1463077 reported by bugproxy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
powerpc-ibm-utils (Ubuntu) | Fix Released | Undecided | Taco Screen team |
Trusty | Triaged | High | Adam Conrad |

Bug Description

---Problem Description---
DLPAR - Memory add operation fails

---uname output---
3.19.0-18-generic

Machine Type = POWER8

---Steps to Reproduce---
1) Using the latest daily ISO, install 14.04.02 as a PowerVM guest.
2) Upgrade the kernel to the 3.19 level (3.19.0-18-generic).
3) Ensure the ksh and powerpc-ibm-utils packages are installed.
4) Download the following DLPAR packages from http://ausgsa.ibm.com/projects/r/rsctdev/builds/muthu/rmuts006a/ppc64le/

devices.chrp.base.servicerm_2.5.0.1-15111_ppc64el.deb
dynamicrm_2.0.1-3_ppc64el.deb
rsct.core_3.2.0.6-15111_ppc64el.deb
rsct.core.utils_3.2.0.6-15111_ppc64el.deb
src_3.2.0.6-15111_ppc64el.deb

5) Install the packages.
6) Perform an add-memory operation via the HMC.

The operation fails with the following message:

The dynamic partitioning operation failed. - alp9
The dynamic addition of memory resources failed:
Allocation failed for drc 80000172 with -9001
Valid outstanding translations exist.
Unexpected error (src/drmgr/drslot_chrp_mem.c:958). Contact support and provide debug log from /var/log/drmgr.
Allocation failed for drc 80000173 with -9001
Valid outstanding translations exist.
Unexpected error (src/drmgr/drslot_chrp_mem.c:958). Contact support and provide debug log from /var/log/drmgr.
Allocation failed for drc 80000174 with -9001
Valid outstanding translations exist.
Unexpected error (src/drmgr/drslot_chrp_mem.c:958). Contact support and provide debug log from /var/log/drmgr.
Allocation failed for drc 80000175 with -9001
Valid outstanding translations exist.
Unexpected error (src/drmgr/drslot_chrp_mem.c:958). Contact support and provide debug log from /var/log/drmgr.
Allocation failed for drc 80000176 with -9001
.......
.......
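
When the HMC message runs on for many DRCs, tallying the failures makes it easier to see whether every LMB failed the same way. The following is a minimal triage sketch, assuming the message text is pasted verbatim into a string. `parse_failures` is a hypothetical helper, not part of powerpc-utils or the HMC.

```python
import re
from collections import Counter

# Matches lines such as "Allocation failed for drc 80000173 with -9001".
# The DRC index is printed in hex without a 0x prefix.
FAILURE_RE = re.compile(r"failed for drc ([0-9a-fA-F]+) with (-?\d+)")


def parse_failures(hmc_text):
    """Return (drc_index, return_code) tuples found in the HMC message.

    Hypothetical triage helper; not part of powerpc-utils or the HMC.
    """
    failures = []
    for line in hmc_text.splitlines():
        match = FAILURE_RE.search(line)
        if match:
            failures.append((int(match.group(1), 16), int(match.group(2))))
    return failures


if __name__ == "__main__":
    sample = """Allocation failed for drc 80000173 with -9001
Valid outstanding translations exist.
Allocation failed for drc 80000174 with -9001
Valid outstanding translations exist."""
    failures = parse_failures(sample)
    for drc, rc in failures:
        print(f"drc 0x{drc:08x}: rc {rc}")
    # Count of failures per return code
    print(Counter(rc for _, rc in failures))
```

In the excerpt above every DRC fails with -9001, which points at one common cause rather than a problem with individual LMBs.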

The following kernel warning can also be seen in the kernel log:

[ 2086.358528] section number 3210 page number 100 not reserved, was it already online?
[ 2086.358954] online_pages [mem 0xc80000000-0xc8fffffff] failed
[ 2088.983151] ------------[ cut here ]------------
[ 2088.983154] WARNING: at /build/buildd/linux-lts-vivid-3.19.0/drivers/base/memory.c:200
[ 2088.983155] Modules linked in: bnx2x mdio libcrc32c rpadlpar_io rpaphp rtc_generic pseries_rng
[ 2088.983165] CPU: 1 PID: 5867 Comm: systemd-udevd Not tainted 3.19.0-18-generic #18~14.04.1-Ubuntu
[ 2088.983168] task: c0000000fcc8bac0 ti: c0000000fce24000 task.ti: c0000000fce24000
[ 2088.983169] NIP: c000000000668f34 LR: c0000000006699b0 CTR: 0000000000000000
[ 2088.983171] REGS: c0000000fce27910 TRAP: 0700 Not tainted (3.19.0-18-generic)
[ 2088.983172] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28022888 XER: 20000000
[ 2088.983178] CFAR: c000000000668ed8 SOFTE: 1
[ 2088.983178] GPR00: c0000000006699b0 c0000000fce27b90 c00000000144c760 0000000000000001
[ 2088.983178] GPR04: 0000000000000c95 0000000000000100 00000000000ca000 f000000003254000
[ 2088.983178] GPR08: c0000000013ac760 0000000000000001 000000000000c950 00000000003fffff
[ 2088.983178] GPR12: c00000000172cb00 c000000007b30900 0000010033ce0010 0000000000000000
[ 2088.983178] GPR16: 0000000010032230 00000000100311d0 00003fffcdd01fb0 0000000000000003
[ 2088.983178] GPR20: 00003fffcdd06a10 00000000100322b0 0000000001312d00 0000010033d0ab20
[ 2088.983178] GPR24: 00000000100322d0 00003fffcdd01fb0 c0000000fce27e00 0000000000000000
[ 2088.983178] GPR28: 0000000000001000 0000000000000000 00000000000c9000 00000000000c9500
[ 2088.983199] NIP [c000000000668f34] pages_correctly_reserved+0x134/0x1c0
[ 2088.983201] LR [c0000000006699b0] memory_subsys_online+0x70/0x140
[ 2088.983202] Call Trace:
[ 2088.983203] [c0000000fce27b90] [0000000000000006] 0x6 (unreliable)
[ 2088.983206] [c0000000fce27c00] [c0000000006699b0] memory_subsys_online+0x70/0x140
[ 2088.983208] [c0000000fce27c40] [c0000000006476f4] device_online+0xb4/0x120
[ 2088.983210] [c0000000fce27c80] [c00000000066987c] store_mem_state+0x8c/0x150
[ 2088.983212] [c0000000fce27cc0] [c000000000643618] dev_attr_store+0x68/0xa0
[ 2088.983215] [c0000000fce27d00] [c00000000035afd0] sysfs_kf_write+0x80/0xb0
[ 2088.983217] [c0000000fce27d40] [c000000000359f0c] kernfs_fop_write+0x18c/0x1f0
[ 2088.983220] [c0000000fce27d90] [c0000000002b450c] vfs_write+0xdc/0x260
[ 2088.983222] [c0000000fce27de0] [c0000000002b53bc] SyS_write+0x6c/0x110
[ 2088.983225] [c0000000fce27e30] [c000000000009258] system_call+0x38/0xd0
[ 2088.983226] Instruction dump:
[ 2088.983227] 419e0024 788a2428 7d095214 2fa80000 41de0014 7d29502a 38e74000 7928ffe3
[ 2088.983230] 4082ff7c 3d02fff6 892808e3 69290001 <0b090000> 2fa90000 40de0068 38600000
[ 2088.983234] ---[ end trace 4d71c6f643fed65a ]---
[ 2088.983657] online_pages [mem 0xc90000000-0xc9fffffff] failed
[ 2091.419306] online_pages [mem 0xca0000000-0xcafffffff] failed
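
The warning is printed from pages_correctly_reserved() (see the call trace), which reports the sparsemem section number and the page offset within that section. The sketch below maps those two numbers back to a physical address and its memory block. It assumes 64 KiB base pages, 16 MiB sections, and a 256 MiB memory block size. The block size matches the online_pages ranges in the log; the page and section sizes are assumptions about this ppc64le kernel, not values read from the log.

```python
# Assumed kernel geometry for this ppc64le guest (not taken from the log):
PAGE_SHIFT = 16            # 64 KiB base pages
SECTION_SIZE_BITS = 24     # 16 MiB sparsemem sections
BLOCK_SIZE = 256 << 20     # 256 MiB memory block, matching the online_pages ranges

PAGES_PER_SECTION = 1 << (SECTION_SIZE_BITS - PAGE_SHIFT)


def decode_warning(section_nr, page_in_section):
    """Map 'section number N page number M' to an address and its memory block."""
    pfn = section_nr * PAGES_PER_SECTION + page_in_section
    phys = pfn << PAGE_SHIFT
    block_index = phys // BLOCK_SIZE          # N in /sys/devices/system/memory/memoryN
    block_start = block_index * BLOCK_SIZE
    block_end = block_start + BLOCK_SIZE - 1
    return phys, block_index, block_start, block_end


if __name__ == "__main__":
    phys, idx, start, end = decode_warning(3210, 100)
    print(f"page at {phys:#x} is in memory{idx} [mem {start:#x}-{end:#x}]")
```

Under these assumptions, section 3210, page 100 lands at 0xc8a640000, inside the block 0xc80000000-0xc8fffffff that the first online_pages line reports as failed.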

root@alp9:/var/log# lssrc -a
Subsystem Group PID Status
 ctrmc rsct 5528 active
 IBM.ServiceRM rsct_rm 5616 active
 IBM.DRM rsct_rm 5628 active
 IBM.HostRM rsct_rm 5690 active
 IBM.MgmtDomainRM rsct_rm 5729 active
 ctcas rsct inoperative
 IBM.ERRM rsct_rm inoperative
 IBM.AuditRM rsct_rm inoperative
 IBM.SensorRM rsct_rm inoperative
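
Before retrying from the HMC, it can help to confirm that the RSCT pieces involved in DLPAR are running. A minimal sketch follows. It assumes, as is usual for RSCT-based DLPAR, that the HMC reaches the guest through RMC (ctrmc) and that IBM.DRM handles the request. The REQUIRED list and `inactive_required` are illustrative assumptions, not an official check.

```python
# Subsystems assumed necessary for HMC-driven DLPAR on this guest.
REQUIRED = ("ctrmc", "IBM.DRM")


def inactive_required(lssrc_output, required=REQUIRED):
    """Return required subsystems that are missing or not 'active' in `lssrc -a` output."""
    status = {}
    for line in lssrc_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            # Columns: Subsystem, Group, [PID], Status; PID is absent when inoperative.
            status[fields[0]] = fields[-1]
    return [name for name in required if status.get(name) != "active"]


if __name__ == "__main__":
    # Abbreviated copy of the output shown above.
    sample = """Subsystem Group PID Status
 ctrmc rsct 5528 active
 IBM.DRM rsct_rm 5628 active
 ctcas rsct inoperative"""
    print(inactive_required(sample))
```

For the output shown above, both subsystems are active, which suggests the failure lies in drmgr or the kernel rather than in the RMC path.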

I was able to add and remove memory successfully using the latest git checkout of powerpc-utils. This is a known issue: Ubuntu needs to be given a cherry-picked list of patches to pick up. The installed version of powerpc-utils is three (soon to be four) releases old, and several memory hotplug and little-endian (LE) fixes have gone in since then.

Commit IDs of the patches that need to be pulled in:

09630ae59954de064f379f085e710ba34b999ae4 (drmgr: Correct memory affinity when adding memory)

7bc6978561aa994f7336318ef693bf8debeaac91 (drmgr: Fix LMB lookup by index)

948bbf83b43c9a3bc00f521326e096ca6c7848c5 (drmgr: Fix CPU/LMB add/removal using drc_index)

7f1ba6f2cefbc72112ee809d2b5fd90e7525417d (drmgr: Fix to check for drmgr REPLACE (-R) flag)

acfa92352e3b2bddf256de23e7ef07ad3a47ec08 (drmgr/lsslot: Fix broken memory support for little endian)

cacaab1631651ebb161088649d5a8861112ce667 (drmgr: Correct the -s option handling correction)

dd25be513fb5e31cca77bf259d66c52fec44ecff (drmgr: do not remove the last CPU)

019291552d6af9ee7ccea79f24a82ad76f48024f (drmgr: Correct -s option handling)

b8afd967c9f4fe27f0b31d681170978e52ec04f3 (drmgr: Correct null pointer usage)

15b063dc3a41a649a4860bdeeb5b1ef93f0cfc5b (ofpathname: Fix checking for hbtl)

c0e88a73b556ac633e55cd16bc13355fe82b520b (lsslot/drmgr: little endian support for memory)

e12d18c28acb428ff03e520c30a63b2488ddf503 (ofpathname: Convert OF format to logical device for virtio-scsi devices)

51f927e5caff80d08e3cf22d3f1270d29d4d7a61 (ofpathname: Convert logical path to OF device path for virtio-scsi devices)

ac4b16e40fe912054f82448789a998a9421e623d (snap: Display message for Ubuntu platform)
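
A minimal sketch for backporting the list above onto an older powerpc-utils tree. It only builds the `git cherry-pick -x` command lines and does not run them. It assumes, without confirmation from the report, that the list is newest-first (as `git log` prints), so it reverses the list to apply the oldest commit first. `cherry_pick_commands` is a hypothetical helper. How the result is carried into the Ubuntu package (for example as patch files) is outside this sketch.

```python
# Commit IDs copied from the list above, in the order given there.
COMMITS = [
    "09630ae59954de064f379f085e710ba34b999ae4",
    "7bc6978561aa994f7336318ef693bf8debeaac91",
    "948bbf83b43c9a3bc00f521326e096ca6c7848c5",
    "7f1ba6f2cefbc72112ee809d2b5fd90e7525417d",
    "acfa92352e3b2bddf256de23e7ef07ad3a47ec08",
    "cacaab1631651ebb161088649d5a8861112ce667",
    "dd25be513fb5e31cca77bf259d66c52fec44ecff",
    "019291552d6af9ee7ccea79f24a82ad76f48024f",
    "b8afd967c9f4fe27f0b31d681170978e52ec04f3",
    "15b063dc3a41a649a4860bdeeb5b1ef93f0cfc5b",
    "c0e88a73b556ac633e55cd16bc13355fe82b520b",
    "e12d18c28acb428ff03e520c30a63b2488ddf503",
    "51f927e5caff80d08e3cf22d3f1270d29d4d7a61",
    "ac4b16e40fe912054f82448789a998a9421e623d",
]


def cherry_pick_commands(commits, newest_first=True):
    """Build git cherry-pick command lines, oldest commit first.

    newest_first is an assumption about the list order; set it to False
    if the list turns out to be oldest-first.
    """
    ordered = list(reversed(commits)) if newest_first else list(commits)
    # -x appends "(cherry picked from commit ...)" to each commit message.
    return [["git", "cherry-pick", "-x", sha] for sha in ordered]


if __name__ == "__main__":
    for cmd in cherry_pick_commands(COMMITS):
        print(" ".join(cmd))
```

Recording the upstream SHA with -x makes it easy to check later which of these fixes a given package build already carries.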

bugproxy (bugproxy) wrote : /var/log/drmgr after the failure

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-125714 severity-critical targetmilestone-inin---
bugproxy (bugproxy) wrote : Contents of /var/log/syslog after failure

Default Comment by Bridge

bugproxy (bugproxy)
tags: added: targetmilestone-inin14043
removed: targetmilestone-inin---
Luciano Chavez (lnx1138)
affects: ubuntu → powerpc-utils (Ubuntu)
Changed in powerpc-utils (Ubuntu):
assignee: nobody → Taco Screen team (taco-screen-team)
Steve Langasek (vorlon) wrote :

I'm assuming, based on the list of commits, that this bug is fixed in 1.2.25 upstream, so I'm marking it fixed for the development series.

affects: powerpc-utils (Ubuntu) → powerpc-ibm-utils (Ubuntu)
Changed in powerpc-ibm-utils (Ubuntu):
status: New → Fix Released
Changed in powerpc-ibm-utils (Ubuntu Trusty):
status: New → Triaged
assignee: nobody → Adam Conrad (adconrad)
milestone: none → ubuntu-14.04.3
importance: Undecided → High
Breno Leitão (breno-leitao) wrote :

Right, Steve. We now need to fix it in 14.04, as you have already started tracking.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-06-24 10:38 EDT-------
*** Bug 126099 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-07-29 20:13 EDT-------
*** This bug has been marked as a duplicate of bug 124876 ***
