Kernel crashes on DLPAR add/remove commands for Halfsail on Ubuntu 14.04.5

Bug #1621544 reported by bugproxy
This bug affects 1 person
Affects: linux (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Taco Screen team

Bug Description

== Comment: #0 - Naresh Bannoth - 2016-07-01 02:07:26 ==
---Problem Description---
Executing DLPAR add/remove adapter operations on a Halfsail card in Ubuntu 14.04.5.
After these commands have been executed 5-10 times, the kernel crashes.

Syslog messages at the time of the crash
----------------------------------------

[52469.743034] Unable to handle kernel paging request for instruction fetch
[52469.743045] Faulting instruction address: 0x552f73746f6c732c
[52469.743051] Oops: Kernel access of bad area, sig: 11 [#1]
[52469.743055] SMP NR_CPUS=2048 NUMA pSeries
[52469.743061] Modules linked in: rpadlpar_io rpaphp dm_round_robin pseries_rng rtc_generic dm_multipath qla2xxx ibmvscsi ibmveth scsi_transport_fc
[52469.743079] CPU: 10 PID: 1954 Comm: multipathd Not tainted 4.4.0-29-generic #48~14.04.1-Ubuntu
[52469.743086] task: c0000003f9f42a20 ti: c0000003f22d4000 task.ti: c0000003f22d4000
[52469.743091] NIP: 552f73746f6c732c LR: c00000000004803c CTR: 552f73746f6c732f
[52469.743097] REGS: c0000003f22d6d60 TRAP: 0400 Not tainted (4.4.0-29-generic)
[52469.743101] MSR: 8000000140009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28008424 XER: 20000010
[52469.743116] CFAR: c000000000008468 SOFTE: 1
GPR00: c000000000048018 c0000003f22d6fe0 c000000001598f00 c000000006e6c000
GPR04: 0000000000000001 0000000000000000 000000000000de0b c000000000f7fdc8
GPR08: 00000003fe920000 552f73746f6c732f 0000000000000000 c000000006121e40
GPR12: 552f73746f6c732f c000000007af5f00 00003fff8f00eea8 00003fff8f00eea8
GPR16: 00003fff8f00eea8 00003fff8f00eea8 00003fff8f00eea8 0000000000000001
GPR20: 0000000000000000 0000000000000083 00003fff8eeb04cc 00003fff8f023810
GPR24: 000000000000004e c0000000062dfc28 c0000003f2c24000 0000000000000100
GPR28: c0000000015050e0 c000000005018e80 c0000003f3282c00 c000000006e6c000
[52469.743181] NIP [552f73746f6c732c] 0x552f73746f6c732c
[52469.743188] LR [c00000000004803c] pcibios_release_device+0x5c/0x80
[52469.743192] Call Trace:
[52469.743196] [c0000003f22d6fe0] [c000000000048018] pcibios_release_device+0x38/0x80 (unreliable)
[52469.743205] [c0000003f22d7010] [c0000000005c15f4] pci_release_dev+0x84/0xd0
[52469.743212] [c0000003f22d7040] [c0000000006b2390] device_release+0x60/0xf0
[52469.743219] [c0000003f22d70c0] [c00000000056bd14] kobject_cleanup+0xd4/0x240
[52469.743226] [c0000003f22d7140] [c0000000006b2b24] put_device+0x34/0x50
[52469.743232] [c0000003f22d7170] [c00000000073c034] scsi_host_dev_release+0x124/0x1a0
[52469.743239] [c0000003f22d71b0] [c0000000006b2390] device_release+0x60/0xf0
[52469.743246] [c0000003f22d7230] [c00000000056bd14] kobject_cleanup+0xd4/0x240
[52469.743253] [c0000003f22d72b0] [c0000000006b2b24] put_device+0x34/0x50
[52469.743265] [c0000003f22d72e0] [d0000000023d097c] fc_rport_dev_release+0x2c/0x50 [scsi_transport_fc]
[52469.743275] [c0000003f22d7310] [c0000000006b2390] device_release+0x60/0xf0
[52469.743283] [c0000003f22d7390] [c00000000056bd14] kobject_cleanup+0xd4/0x240
[52469.743292] [c0000003f22d7410] [c0000000006b2b24] put_device+0x34/0x50
[52469.743301] [c0000003f22d7440] [c000000000748a50] scsi_target_dev_release+0x40/0x60
[52469.743309] [c0000003f22d7470] [c0000000006b2390] device_release+0x60/0xf0
[52469.743317] [c0000003f22d74f0] [c00000000056bd14] kobject_cleanup+0xd4/0x240
[52469.743324] [c0000003f22d7570] [c0000000006b2b24] put_device+0x34/0x50
[52469.743332] [c0000003f22d75a0] [c00000000074d3e8] scsi_device_dev_release_usercontext+0x178/0x1b0
[52469.743341] [c0000003f22d7600] [c0000000000da4f4] execute_in_process_context+0xa4/0xd0
[52469.743349] [c0000003f22d7630] [c00000000074d254] scsi_device_dev_release+0x34/0x50
[52469.743357] [c0000003f22d7660] [c0000000006b2390] device_release+0x60/0xf0
[52469.743364] [c0000003f22d76e0] [c00000000056bd14] kobject_cleanup+0xd4/0x240
[52469.743371] [c0000003f22d7760] [c0000000006b2b24] put_device+0x34/0x50
[52469.743379] [c0000003f22d7790] [c0000000007399e0] scsi_device_put+0x40/0x60
[52469.743386] [c0000003f22d77c0] [c00000000075b528] scsi_disk_put+0x58/0x90
[52469.743394] [c0000003f22d7800] [c0000000003282c8] __blkdev_put+0x348/0x3d0
[52469.743402] [c0000003f22d78f0] [c0000000008cbc1c] dm_put_table_device+0xcc/0x130
[52469.743410] [c0000003f22d7930] [c0000000008d09b0] dm_put_device+0xa0/0x130
[52469.743418] [c0000003f22d79b0] [d000000004a41830] free_priority_group+0xb0/0x110 [dm_multipath]
[52469.743427] [c0000003f22d7a10] [d000000004a41914] free_multipath+0x84/0xf0 [dm_multipath]
[52469.743436] [c0000003f22d7a60] [c0000000008d1b30] dm_table_destroy+0xb0/0x1a0
[52469.743443] [c0000003f22d7af0] [c0000000008d711c] dev_suspend+0x14c/0x330
[52469.743449] [c0000003f22d7b30] [c0000000008d7fec] ctl_ioctl+0x1cc/0x380
[52469.743456] [c0000003f22d7d10] [c0000000008d81d8] dm_ctl_ioctl+0x38/0x50
[52469.743464] [c0000003f22d7d40] [c0000000002ec710] do_vfs_ioctl+0x4d0/0x7e0
[52469.743472] [c0000003f22d7de0] [c0000000002ecaf4] SyS_ioctl+0xd4/0xf0
[52469.743480] [c0000003f22d7e30] [c000000000009204] system_call+0x38/0xb4
[52469.743486] Instruction dump:
[52469.743490] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[52469.743501] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[52469.743516] ---[ end trace 93e18aa68a2c9922 ]---
[52469.747068]
[52469.747074] Sending IPI to other CPUs
[52469.748095] IPI complete
I'm in purgatory
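The faulting NIP (0x552f73746f6c732c) is not a kernel text address. Read as eight bytes in little-endian memory order (this is a ppc64le system), it decodes to ASCII text, and CTR decodes the same way. NIP also equals CTR with its two low-order bits cleared, which is what a branch to CTR does with its target. The short Python sketch below only performs that decoding. The interpretation, that pcibios_release_device() made an indirect call through a function pointer in memory that had already been freed and overwritten by a path-like string, is an inference from the log and is not stated in the report.

```python
# Decode 64-bit register values from the Oops as raw bytes, to check whether
# they are really text that ended up where a function pointer was expected.

def word_to_ascii(value: int) -> bytes:
    """Return the 8 bytes of a 64-bit value in little-endian (ppc64le) memory order."""
    return value.to_bytes(8, "little")


NIP = 0x552F73746F6C732C  # faulting instruction address from the Oops
CTR = 0x552F73746F6C732F  # count register, used as the indirect branch target

if __name__ == "__main__":
    print("NIP:", word_to_ascii(NIP))  # prints: NIP: b',slots/U'
    print("CTR:", word_to_ascii(CTR))  # prints: CTR: b'/slots/U'
    # A branch to CTR ignores the two low-order bits of the target address,
    # so the faulting NIP should be CTR with those bits cleared.
    print("NIP == CTR & ~3:", NIP == CTR & ~3)  # prints: NIP == CTR & ~3: True
```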

---uname output---
Linux ubuntu 4.4.0-29-generic #48~14.04.1-Ubuntu SMP Wed Jun 29 19:55:03 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = 8284-22A

---Steps to Reproduce---
1. Install all rsct packages required for DLPAR operations.
2. Make sure that the adapter is added as Desired in the HMC.
3. Start the DLPAR services using "startsrc -g rsct" and "startsrc -g rsct_rm".
4. Make sure that the DLPAR services are active using "lssrc -a":

root@ubuntu:~# lssrc -a
Subsystem Group PID Status
 ctrmc rsct 1558 active
 IBM.DRM rsct_rm 1720 active
 IBM.MgmtDomainRM rsct_rm 1785 active
 IBM.ServiceRM rsct_rm 1796 active
 IBM.HostRM rsct_rm 1855 active
 ctcas rsct 2918 active
 IBM.ERRM rsct_rm 2926 active
 IBM.AuditRM rsct_rm 2927 active
 IBM.SensorRM rsct_rm 2928 active
root@ubuntu:~#
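Step 4 can be checked from a script instead of by eye. The sketch below parses `lssrc -a` text output like the listing above and reports which required subsystems are not active. The helper names are hypothetical, and the choice of `ctrmc` (the RMC daemon) and `IBM.DRM` (the dynamic resource manager) as the subsystems DLPAR depends on is an assumption; adjust `REQUIRED` as needed.

```python
# Hypothetical helper: check that the RSCT subsystems needed for DLPAR are active,
# given the text printed by `lssrc -a`.
REQUIRED = ("ctrmc", "IBM.DRM")  # assumption: RMC daemon and the DLPAR resource manager


def parse_lssrc(output: str) -> dict:
    """Map each subsystem name to its status column from `lssrc -a` output."""
    statuses = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and shell prompts copied with the output.
        if not fields or fields[0] == "Subsystem" or fields[0].startswith("root@"):
            continue
        # Rows are "name group [pid] status"; inoperative rows have no PID,
        # so the status is always the last field.
        statuses[fields[0]] = fields[-1]
    return statuses


def not_active(statuses: dict, required=REQUIRED) -> list:
    """Return the required subsystems whose status is not 'active'."""
    return [name for name in required if statuses.get(name) != "active"]


if __name__ == "__main__":
    sample = """Subsystem Group PID Status
 ctrmc rsct 1558 active
 IBM.DRM rsct_rm inoperative
"""
    print(not_active(parse_lssrc(sample)))  # prints: ['IBM.DRM']
```

An empty list means every subsystem in `REQUIRED` reported `active`.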

Now run the DLPAR commands as follows:

1. List the adapters present in the LPAR and note the lpar_id and drc_index of the adapter on which you want to run DLPAR operations:
lshwres -r io -m server-name --rsubtype slot --filter "lpar_names=lparname"

e.g.:
lshwres -r io -m tuletapio2-fsp --rsubtype slot --filter "lpar_names=tuletapio2-lp5Naresh-ubuntu16.04.1"

output:
--------
unit_phys_loc=U78CB.001.WZS00E2,bus_id=29,phys_loc=C9,drc_index=2103001D,lpar_name=tuletapio2-lp5Naresh-ubuntu16.04.1,lpar_id=8,slot_io_pool_id=none,description=Fibre Channel Serial Bus,feature_codes=none,pci_vendor_id=1077,pci_device_id=2532,pci_subs_vendor_id=1014,pci_subs_device_id=F304,pci_class=0C04,pci_revision_id=02,bus_grouping=0,iop=0,parent_slot_drc_index=none,drc_name=U78CB.001.WZS00E2C9,interposer_present=0,interposer_pcie=0,lpar_assignment_capable=1,dynamic_lpar_assignment_capable=1
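Each line of the `lshwres` slot output is a comma-separated list of `key=value` attributes, so the `lpar_id` and `drc_index` needed below can be extracted programmatically. This is a minimal sketch with hypothetical helper names; it assumes no attribute value contains a comma, which holds for the sample line above but may not hold for all HMC output (values containing commas may be quoted, and this simple split does not handle that).

```python
# Hypothetical helper: turn `lshwres -r io --rsubtype slot` output into dicts.
# Assumption: no attribute value contains a comma.


def parse_slot_line(line: str) -> dict:
    """Split one 'key=value,key=value,...' line into a dict."""
    return dict(field.split("=", 1) for field in line.strip().split(","))


def parse_slots(output: str) -> list:
    """Parse every non-empty line of lshwres output (one slot per line)."""
    return [parse_slot_line(line) for line in output.splitlines() if line.strip()]


if __name__ == "__main__":
    # Abbreviated version of the sample line from the report.
    line = ("unit_phys_loc=U78CB.001.WZS00E2,bus_id=29,phys_loc=C9,"
            "drc_index=2103001D,lpar_id=8,description=Fibre Channel Serial Bus")
    slot = parse_slot_line(line)
    print(slot["lpar_id"], slot["drc_index"])  # prints: 8 2103001D
```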

Remove command:
-----------------------
chhwres -r io -m tuletapio2-fsp -o r --id lpar_id -l drc_index

e.g.:
chhwres -r io -m tuletapio2-fsp -o r --id 8 -l 2103001D

Add command:
-------------------
chhwres -r io -m tuletapio2-fsp -o a --id lpar_id -l drc_index

e.g.:
chhwres -r io -m tuletapio2-fsp -o a --id 8 -l 2103001D
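Since the crash appears only after 5-10 remove/add cycles, it can help to generate the alternating command sequence rather than type it by hand. The sketch below only builds the `chhwres` command lines shown above (it does not run them); the function name and the iteration count are hypothetical, and the commands are meant to be run on the HMC, where `chhwres` is available.

```python
# Hypothetical reproducer helper: build alternating DLPAR remove/add commands
# for one adapter slot, using the chhwres syntax from the report.
import shlex


def dlpar_cycle(managed_system: str, lpar_id: str, drc_index: str, iterations: int) -> list:
    """Return 2 * iterations command lines: remove, add, remove, add, ..."""
    commands = []
    for _ in range(iterations):
        for op in ("r", "a"):  # r = remove the slot from the LPAR, a = add it back
            argv = ["chhwres", "-r", "io", "-m", managed_system,
                    "-o", op, "--id", lpar_id, "-l", drc_index]
            commands.append(shlex.join(argv))  # shlex.join needs Python 3.8+
    return commands


if __name__ == "__main__":
    for cmd in dlpar_cycle("tuletapio2-fsp", "8", "2103001D", iterations=10):
        print(cmd)
```

The output can be pasted into an HMC session or fed to whatever remote-execution method is already in use.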

Userspace tool common name: DLPAR(rsct)

== Comment: #12 - Mauricio Faria De Oliveira - 2016-09-08 12:19:56 ==
Hi Canonical,

Can you please apply this patch to 16.04.x and the latest 14.04.x HWE kernel?
It was accepted in linux-4.8-rc5 [1].

Thanks,

[1] 2016-08-22 "powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/arch/powerpc/kernel/pci-common.c?id=refs/tags/v4.8-rc5
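The patch title in [1] describes freeing the PHB from the pci_host_bridge release callback instead of directly. The toy Python model below is not kernel code, and every class and function in it is hypothetical; it assumes, as a simplification, that each PCI device holds a reference on its host bridge. It illustrates why that ordering matters for the trace above: if the PHB is freed as soon as the slot is removed, a device that is still referenced (here, by multipathd via dm-multipath and the SCSI stack) later touches freed memory in its release path; if the free is deferred to the bridge's release callback, the device is released first.

```python
# Toy refcounting model of eager vs. deferred freeing of a PHB.


class Phb:
    """Stand-in for the PHB structure; 'freed' marks that kfree() has happened."""
    def __init__(self):
        self.freed = False


class HostBridge:
    """Refcounted object whose release_fn runs when the last reference is dropped."""
    def __init__(self):
        self.refcount = 1          # reference held by the slot-removal path
        self.release_fn = None

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0 and self.release_fn:
            self.release_fn()


class PciDev:
    def __init__(self, phb, bridge):
        self.phb, self.bridge = phb, bridge
        bridge.get()               # the device pins its host bridge

    def release(self):
        # Analogous to the release path dereferencing the PHB.
        if self.phb.freed:
            raise RuntimeError("use-after-free: phb already freed")
        self.bridge.put()


def remove_phb(phb, bridge, deferred):
    if deferred:
        bridge.release_fn = lambda: setattr(phb, "freed", True)
    else:
        phb.freed = True           # eager free while devices may still exist
    bridge.put()                   # drop the removal path's reference


def simulate(deferred):
    phb, bridge = Phb(), HostBridge()
    dev = PciDev(phb, bridge)      # still in use when the slot is removed
    remove_phb(phb, bridge, deferred)
    try:
        dev.release()              # last user (e.g. multipathd) lets go later
        return "ok, phb freed=%s" % phb.freed
    except RuntimeError as err:
        return str(err)


if __name__ == "__main__":
    print(simulate(deferred=False))  # prints: use-after-free: phb already freed
    print(simulate(deferred=True))   # prints: ok, phb freed=True
```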

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-143294 severity-critical targetmilestone-inin14045
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Tim Gardner (timg-tpi) wrote :

I think this is a duplicate, or at least bug #1618151 requires the same commit.
