Ubuntu:talclp1: Kdump failed with multipath disk

Bug #1635597 reported by bugproxy on 2016-10-21
Affects                           Importance  Assigned to
The Ubuntu-power-systems project  High        Canonical Kernel Team
linux (Ubuntu)                    High        Canonical Kernel Team
  Trusty                          High        Unassigned
  Xenial                          High        Unassigned
  Zesty                           High        Unassigned
makedumpfile (Ubuntu)             High        Canonical Kernel Team
  Trusty                          High        Canonical Kernel Team
  Xenial                          High        Thadeu Lima de Souza Cascardo
  Zesty                           High        Canonical Kernel Team

Bug Description

[Impact]
When the device that kdump should write the dump to is under a multipath configuration, dumping will fail, possibly leaving the system stuck in the kdump kernel.
The fix is to include the SCSI device handler modules needed for the multipath setup in the initramfs image that kdump uses.
All modules currently loaded in the system are included.
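
As a rough illustration of the idea (not the actual packaging change), the loaded SCSI device handler modules can be read from /proc/modules. The function name `loaded_scsi_dh_modules` is hypothetical and only used for this sketch; it assumes the standard /proc/modules layout, where the module name is the first field.

```shell
#!/bin/sh
# Hypothetical helper: print the names of loaded scsi_dh_* modules, one per line.
# $1: a file in /proc/modules format (defaults to /proc/modules itself).
loaded_scsi_dh_modules() {
    # The first whitespace-separated field of each line is the module name.
    awk '$1 ~ /^scsi_dh_/ { print $1 }' "${1:-/proc/modules}" | sort
}
```

Calling `loaded_scsi_dh_modules` with no argument on the affected system would list the handlers that are in use there, such as the scsi_dh_alua, scsi_dh_rdac and scsi_dh_emc modules seen in the traces below.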

[Test Case]
Set up kdump to target a multipath device on storage that requires such scsi_dh modules, then trigger a crash: kdump fails.
After the fix, it works fine.

[Regression Potential]
If a bug is introduced, loading kdump might fail, and a crash dump will not be generated. A worse possible regression is that the system gets stuck in the kdump kernel and needs to be rebooted locally (and the crash file is not generated either). But since this is what we are trying to fix, we don't expect other systems to break. This didn't happen on a small (less than 1GiB of RAM) x86 VM, though.

Problem Description
==========================
On talclp1, I enabled kdump, but kdump failed and dropped to a BusyBox shell.

root@talclp1:~# echo c> /proc/sysrq-trigger
[ 132.643690] sysrq: SysRq : Trigger a crash
[ 132.643739] Unable to handle kernel paging request for data at address 0x00000000
[ 132.643745] Faulting instruction address: 0xc0000000005c28f4
[ 132.643749] Oops: Kernel access of bad area, sig: 11 [#1]
[ 132.643753] SMP NR_CPUS=2048 NUMA pSeries
[ 132.643758] Modules linked in: fuse ufs qnx4 hfsplus hfs minix ntfs msdos jfs rpadlpar_io rpaphp rpcsec_gss_krb5 nfsv4 dccp_diag cifs nfs dns_resolver dccp tcp_diag fscache udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xfs libcrc32c pseries_rng rng_core ghash_generic gf128mul vmx_crypto sg nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache crc32c_generic btrfs xor raid6_pq dm_round_robin sr_mod sd_mod cdrom ses enclosure scsi_transport_sas ibmveth crc32c_vpmsum ipr scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath dm_mod
[ 132.643819] CPU: 49 PID: 10174 Comm: bash Not tainted 4.8.0-15-generic #16-Ubuntu
[ 132.643824] task: c000000111767080 task.stack: c0000000d82e0000
[ 132.643828] NIP: c0000000005c28f4 LR: c0000000005c39d8 CTR: c0000000005c28c0
[ 132.643832] REGS: c0000000d82e3990 TRAP: 0300 Not tainted (4.8.0-15-generic)
[ 132.643836] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28242422 XER: 00000001
[ 132.643848] CFAR: c0000000000087d0 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000005c39d8 c0000000d82e3c10 c000000000f67b00 0000000000000063
GPR04: c00000011d04a9b8 c00000011d05f7e0 c00000047fb00000 0000000000015998
GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000001
GPR12: c0000000005c28c0 c000000007b4b900 ffffffffffffffff 0000000022000000
GPR16: 0000000010170dc8 000001002b566368 0000000010140f58 00000000100c7570
GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608
GPR24: 00003ffffe87a294 0000000000000001 c000000000ebff60 0000000000000004
GPR28: c000000000ec0320 0000000000000063 c000000000e72a90 0000000000000000
[ 132.643906] NIP [c0000000005c28f4] sysrq_handle_crash+0x34/0x50
[ 132.643911] LR [c0000000005c39d8] __handle_sysrq+0xe8/0x280
[ 132.643914] Call Trace:
[ 132.643917] [c0000000d82e3c10] [c000000000a245e8] 0xc000000000a245e8 (unreliable)
[ 132.643923] [c0000000d82e3c30] [c0000000005c39d8] __handle_sysrq+0xe8/0x280
[ 132.643928] [c0000000d82e3cd0] [c0000000005c4188] write_sysrq_trigger+0x78/0xa0
[ 132.643935] [c0000000d82e3d00] [c0000000003ad770] proc_reg_write+0xb0/0x110
[ 132.643941] [c0000000d82e3d50] [c00000000030fc3c] __vfs_write+0x6c/0xe0
[ 132.643946] [c0000000d82e3d90] [c000000000311144] vfs_write+0xd4/0x240
[ 132.643950] [c0000000d82e3de0] [c000000000312e5c] SyS_write+0x6c/0x110
[ 132.643957] [c0000000d82e3e30] [c0000000000095e0] system_call+0x38/0x108
[ 132.643961] Instruction dump:
[ 132.643963] 38425240 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 3949ba60
[ 132.643972] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
[ 132.643981] ---[ end trace eed6bbcd2c3bdfdf ]---
[ 132.646105]
[ 132.646176] Sending IPI to other CPUs
[ 132.647490] IPI complete
I'm in purgatory
 -> smp_release_cpus()
spinning_secondaries = 104
 <- smp_release_cpus()
[ 2.011346] alg: hash: Test 1 failed for crc32c-vpmsum
[ 2.729254] sd 0:2:0:0: [sda] Assuming drive cache: write through
[ 2.731554] sd 1:2:5:0: [sdn] Assuming drive cache: write through
[ 2.739087] sd 1:2:4:0: [sdm] Assuming drive cache: write through
[ 2.739089] sd 1:2:6:0: [sdo] Assuming drive cache: write through
[ 2.739110] sd 1:2:7:0: [sdp] Assuming drive cache: write through
[ 2.739115] sd 1:2:0:0: [sdi] Assuming drive cache: write through
[ 2.739122] sd 1:2:3:0: [sdl] Assuming drive cache: write through
[ 2.739123] sd 1:2:2:0: [sdk] Assuming drive cache: write through
[ 2.739148] sd 1:2:1:0: [sdj] Assuming drive cache: write through
[ 2.748938] sd 0:2:1:0: [sdb] Assuming drive cache: write through
[ 2.748939] sd 0:2:7:0: [sdh] Assuming drive cache: write through
[ 2.748940] sd 0:2:6:0: [sdg] Assuming drive cache: write through
[ 2.748942] sd 0:2:2:0: [sdc] Assuming drive cache: write through
[ 2.748958] sd 0:2:5:0: [sdf] Assuming drive cache: write through
[ 2.748963] sd 0:2:4:0: [sde] Assuming drive cache: write through
[ 2.748978] sd 0:2:3:0: [sdd] Assuming drive cache: write through
[ 2.999087] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.119912] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.252513] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.343680] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.381234] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.419515] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.474587] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.482188] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.531439] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.552824] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.594489] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.619222] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.672208] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.680298] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.731718] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.761333] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.794955] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.819212] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.871913] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.889439] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.922620] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.960707] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 4.002959] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 4.035611] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 4.054476] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 4.092241] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 4.099432] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 4.182358] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 4.182823] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 4.234767] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 4.333309] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 4.402827] device-mapper: table: 254:0: multipath: error attaching hardware handler

Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! UUID=853769e5-1dc5-41be-a689-b430320d207f does not exist. Dropping to a shell!

BusyBox v1.22.1 (Ubuntu 1:1.22.0-19ubuntu2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

== Comment: #7 - Vaishnavi Bhat <email address hidden> - 2016-10-07 05:37:53 ==
The blkid output does not show any device with UUID=853769e5-1dc5-41be-a689-b430320d207f
which is the root device used in the kexec command line (from kdump-config show)
kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.8.0-15-generic root=UUID=853769e5-1dc5-41be-a689-b430320d207f ro xmon=on splash quiet irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

Hence the kdump kernel is failing to boot here.
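
The check described in this comment can be sketched as follows. This is only an illustration under stated assumptions: the function names `root_uuid_from_cmdline` and `uuid_in_blkid_output` are hypothetical, and the second one expects the default blkid output format (`/dev/...: UUID="..." TYPE="..."`) saved to a file.

```shell
#!/bin/sh
# Hypothetical helper: extract the value of root=UUID=... from a kernel command line.
# $1: the command line string (for example the contents of /proc/cmdline).
root_uuid_from_cmdline() {
    # Split on spaces, then keep only what follows "root=UUID=".
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^root=UUID=//p' | head -n 1
}

# Hypothetical helper: succeed if a filesystem UUID appears in saved blkid output.
# $1: UUID to look for, $2: file holding the output of `blkid`.
uuid_in_blkid_output() {
    # The leading space keeps PARTUUID="..." entries from matching.
    grep -F -q -- " UUID=\"$1\"" "$2"
}
```

From the (initramfs) prompt, one could save the blkid output to a file and run `uuid_in_blkid_output "$(root_uuid_from_cmdline "$(cat /proc/cmdline)")" FILE`; a non-zero exit status corresponds to the situation reported here, where the root UUID is not visible to the kdump kernel.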

== Comment: #11 - Xue Sheng Li <email address hidden> - 2016-10-17 01:54:56 ==
Recreated with the -24 kernel.

root@talclp1:~# echo c > /proc/sysrq-trigger
[ 72.655416] sysrq: SysRq : Trigger a crash
[ 72.655458] Unable to handle kernel paging request for data at address 0x00000000
[ 72.655463] Faulting instruction address: 0xc00000000069d148
[ 72.655469] Oops: Kernel access of bad area, sig: 11 [#1]
[ 72.655472] SMP NR_CPUS=2048 NUMA pSeries
[ 72.655477] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 nfsv4 nfs cifs fscache binfmt_misc xfs pseries_rng vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs xor raid6_pq dm_round_robin ses enclosure scsi_transport_sas bnx2x ipr mdio libcrc32c crc32c_vpmsum scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 72.655521] CPU: 25 PID: 9730 Comm: bash Not tainted 4.8.0-24-generic #26-Ubuntu
[ 72.655525] task: c0000001d8451e00 task.stack: c0000001d8494000
[ 72.655529] NIP: c00000000069d148 LR: c00000000069e198 CTR: c00000000069d120
[ 72.655534] REGS: c0000001d84979f0 TRAP: 0300 Not tainted (4.8.0-24-generic)
[ 72.655537] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28242222 XER: 00000001
[ 72.655549] CFAR: c000000000008750 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c00000000069e198 c0000001d8497c70 c000000001476700 0000000000000063
GPR04: c00000047e64aca0 c00000047e65fb40 c00000047df00000 0000000000015ed8
GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000001
GPR12: c00000000069d120 c000000007b3e100 ffffffffffffffff 0000000022000000
GPR16: 0000000010170dc8 0000010036d36398 0000000010140f58 00000000100c7570
GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608
GPR24: 00003ffff5582464 0000000000000001 c00000000138e6a0 0000000000000004
GPR28: c00000000138ea60 0000000000000063 c000000001342590 0000000000000000
[ 72.655608] NIP [c00000000069d148] sysrq_handle_crash+0x28/0x30
[ 72.655613] LR [c00000000069e198] __handle_sysrq+0xe8/0x280
[ 72.655616] Call Trace:
[ 72.655619] [c0000001d8497c70] [c00000000069e178] __handle_sysrq+0xc8/0x280 (unreliable)
[ 72.655625] [c0000001d8497d10] [c00000000069e8ec] write_sysrq_trigger+0x6c/0x90
[ 72.655631] [c0000001d8497d40] [c0000000003a9568] proc_reg_write+0x88/0xd0
[ 72.655637] [c0000001d8497d70] [c00000000030c40c] __vfs_write+0x3c/0x70
[ 72.655642] [c0000001d8497d90] [c00000000030d674] vfs_write+0xd4/0x240
[ 72.655647] [c0000001d8497de0] [c00000000030f1c8] SyS_write+0x68/0x110
[ 72.655652] [c0000001d8497e30] [c000000000009584] system_call+0x38/0xec
[ 72.655656] Instruction dump:
[ 72.655658] 60000000 60000000 3c4c00de 384295e0 7c0802a6 60000000 3d22001a 3949c8e0
[ 72.655667] 39200001 912a0000 7c0004ac 39400000 <992a0000> 4e800020 3c4c00de 384295b0
[ 72.655677] ---[ end trace 43b490f085103bf5 ]---
[ 72.659366]
[ 72.659429] Sending IPI to other CPUs
[ 72.660740] IPI complete
I'm in purgatory
 -> smp_release_cpus()
spinning_secondaries = 104
 <- smp_release_cpus()
[ 1.699068] ibmveth 30000002 (unnamed net_device) (uninitialized): unable to change IPv4 checksum offload settings. 1 rc=4
[ 1.699093] ibmveth 30000002 (unnamed net_device) (uninitialized): unable to change IPv6 checksum offload settings. 1 rc=4
[ 1.699101] ibmveth 30000002 (unnamed net_device) (uninitialized): unable to change tso settings. 1 rc=4
[ 2.657700] sd 0:2:1:0: [sdb] Assuming drive cache: write through
[ 2.657701] sd 0:2:0:0: [sda] Assuming drive cache: write through
[ 2.657781] sd 0:2:2:0: [sdc] Assuming drive cache: write through
[ 2.660641] sd 0:2:7:0: [sdh] Assuming drive cache: write through
[ 2.667731] sd 0:2:4:0: [sde] Assuming drive cache: write through
[ 2.677685] sd 0:2:6:0: [sdg] Assuming drive cache: write through
[ 2.677688] sd 0:2:5:0: [sdf] Assuming drive cache: write through
[ 2.677708] sd 0:2:3:0: [sdd] Assuming drive cache: write through
[ 2.697737] sd 1:2:6:0: [sdo] Assuming drive cache: write through
[ 2.697743] sd 1:2:1:0: [sdj] Assuming drive cache: write through
[ 2.697744] sd 1:2:4:0: [sdm] Assuming drive cache: write through
[ 2.697747] sd 1:2:2:0: [sdk] Assuming drive cache: write through
[ 2.697749] sd 1:2:3:0: [sdl] Assuming drive cache: write through
[ 2.697753] sd 1:2:5:0: [sdn] Assuming drive cache: write through
[ 2.699340] sd 1:2:7:0: [sdp] Assuming drive cache: write through
[ 2.699360] sd 1:2:0:0: [sdi] Assuming drive cache: write through
[ 3.350794] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.471468] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.540387] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.628523] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.657731] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 3.733416] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.752066] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 3.808884] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.838148] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 3.919247] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.950262] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 3.997839] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.007810] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.082174] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.089411] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.162200] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.202441] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.252289] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.279870] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.311712] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.348150] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.402076] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.432069] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.487871] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.518282] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.573338] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.599280] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.632144] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.671142] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.713352] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.782117] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.890336] device-mapper: table: 252:0: multipath: error attaching hardware handler

== Comment: #13 - Hari Krishna Bathini <email address hidden> - 2016-10-19 16:26:57 ==
(In reply to comment #12)
> Hi Hari,
>
> Can you please take a look at this issue and suggest what would be the next
> step ?
> We are facing this issue with -24 kernel as well. Can this be a issue with
> kdump kernel that has missing multipath modules or some other issue ?
>

Hi Vaishnavi,

Necessary hardware handler modules are missing in the kdump initrd.
Here is the console log of kdump kernel that says the same:

--
Begin: Loading multipath hardware handlers ... Failure: failed to load module scsi_dh_alua.
Failure: failed to load module scsi_dh_rdac.
Failure: failed to load module scsi_dh_emc.
--

After explicitly including these modules and rebuilding the initrd for kdump, I was able to get to the point
where makedumpfile starts to capture the dump, but it fails with:

    "get_mem_map: Can't distinguish the memory type."

which is already tracked in bug 146571.

Thanks
Hari

PS1: To explicitly add modules to kdump initrd

      1. List the necessary modules in /var/lib/kdump/initramfs-tools/modules file
      2. mkinitramfs -d /var/lib/kdump/initramfs-tools -o /var/lib/kdump/initrd.img-$kver
      3. systemctl restart kdump-tools.service
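
Step 1 can be sketched as a small helper that appends module names to the list file without creating duplicate entries. The function name `add_modules_to_list` is hypothetical, and this is only a minimal sketch of the workaround, not the change that was later made in the package.

```shell
#!/bin/sh
# Hypothetical helper: append module names to an initramfs-tools modules list,
# skipping names that are already present.
# $1: path to the modules list file; remaining arguments: module names.
add_modules_to_list() {
    list_file=$1
    shift
    for mod in "$@"; do
        # -x matches whole lines only, -F treats the module name literally.
        grep -q -x -F -- "$mod" "$list_file" 2>/dev/null ||
            printf '%s\n' "$mod" >> "$list_file"
    done
}
```

For example, `add_modules_to_list /var/lib/kdump/initramfs-tools/modules scsi_dh_alua scsi_dh_rdac scsi_dh_emc` run as root would cover step 1, after which steps 2 and 3 above rebuild the initrd and restart the service.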

Mirroring this bug to Canonical for their input on whether to include the missing hardware handler modules in the kdump initrd or to proceed with the workaround.

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-146907 severity-high targetmilestone-inin---

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)

I think this is likely a kdump issue. It should rebuild the initrd with appropriate modules when installed.

Changed in makedumpfile (Ubuntu):
assignee: nobody → Louis Bouchard (louis-bouchard)
Louis Bouchard (louis) on 2016-11-18
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in makedumpfile (Ubuntu):
status: New → Confirmed
status: Confirmed → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Invalid

------- Comment From <email address hidden> 2017-03-16 01:49 EDT-------
Hi Canonical,
Please advise which build will have the fix for this issue.

Thank you.

Louis Bouchard (louis) on 2017-07-06
Changed in makedumpfile (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Trusty):
status: New → Invalid
Changed in linux (Ubuntu Xenial):
status: New → Invalid
Changed in linux (Ubuntu Zesty):
status: New → Invalid
Changed in makedumpfile (Ubuntu Trusty):
assignee: nobody → Louis Bouchard (louis)
Changed in makedumpfile (Ubuntu Xenial):
assignee: nobody → Louis Bouchard (louis)
Changed in makedumpfile (Ubuntu Zesty):
assignee: nobody → Louis Bouchard (louis)
Changed in ubuntu-power-systems:
status: New → In Progress

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in makedumpfile (Ubuntu Trusty):
status: New → Confirmed
Changed in makedumpfile (Ubuntu Xenial):
status: New → Confirmed
Changed in makedumpfile (Ubuntu Zesty):
status: New → Confirmed
Louis Bouchard (louis) wrote :

Hello,

Would it be possible to test a potential fix in the following PPA :

ppa:louis/kdump-tools-multipath

I do not have the hardware to fully test a kernel dump, but the partial tests I ran confirm that the modules are correctly loaded.

Please let me know the outcome of your tests whenever possible.

Kind regards,

Manoj Iyer (manjo) on 2017-07-19
Changed in ubuntu-power-systems:
importance: Undecided → High
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Trusty):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
Changed in linux (Ubuntu Zesty):
importance: Undecided → High
Changed in makedumpfile (Ubuntu):
importance: Undecided → High
Changed in makedumpfile (Ubuntu Trusty):
importance: Undecided → High
Changed in makedumpfile (Ubuntu Xenial):
importance: Undecided → High
Changed in makedumpfile (Ubuntu Zesty):
importance: Undecided → High
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Manoj Iyer (manjo) on 2017-08-07
tags: added: triage-a
Manoj Iyer (manjo) on 2017-08-14
Changed in makedumpfile (Ubuntu):
assignee: Louis Bouchard (louis) → David Britton (davidpbritton)
Changed in makedumpfile (Ubuntu Trusty):
assignee: Louis Bouchard (louis) → David Britton (davidpbritton)
Changed in makedumpfile (Ubuntu Xenial):
assignee: Louis Bouchard (louis) → David Britton (davidpbritton)
Changed in makedumpfile (Ubuntu Zesty):
assignee: Louis Bouchard (louis) → David Britton (davidpbritton)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.1-2

---------------
makedumpfile (1:1.6.1-2) sid; urgency=medium

  * d/kernel-postinst-generate-initrd : Add scsi_dh_* modules if in
    use so the system can dump a crash when root is on multipath
    (LP: #1635597) (Closes: 862411)

  [ dann frazier ]
  * Select appropriate /etc/default/grub.d/kdump-tools.cfg at build time.
    A side-effect of this is that kdump-tools is no longer Arch: all.
    (Closes: #863858)

 -- Louis Bouchard <email address hidden> Wed, 16 Aug 2017 16:10:01 +0200

Changed in makedumpfile (Ubuntu):
status: In Progress → Fix Released
Changed in linux (Ubuntu):
status: Invalid → New
Changed in linux (Ubuntu Trusty):
status: Invalid → New
Changed in linux (Ubuntu Xenial):
status: Invalid → New
Changed in linux (Ubuntu Zesty):
status: Invalid → New
Manoj Iyer (manjo) on 2017-08-23
Changed in makedumpfile (Ubuntu):
assignee: David Britton (davidpbritton) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)

------- Comment From <email address hidden> 2017-08-23 08:54 EDT-------
Retested kdump today (23rd Aug 2017) on Ubuntu1610 and kdump hangs still:
-------------------------
root@thymelp3:~# echo c> /proc/sysrq-trigger
[ 1314.534126] sysrq: SysRq : Trigger a crash
[ 1314.534139] Unable to handle kernel paging request for data at address 0x00000000
[ 1314.534143] Faulting instruction address: 0xc0000000006a2428
[ 1314.534147] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1314.534150] SMP NR_CPUS=2048 NUMA pSeries
[ 1314.534154] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache binfmt_misc pseries_rng vmx_crypto sunrpc ip_tables x_tables autofs4 dm_round_robin btrfs xor raid6_pq lpfc crc32c_vpmsum be2net scsi_transport_fc scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 1314.534177] CPU: 9 PID: 3421 Comm: bash Not tainted 4.8.0-59-generic #64-Ubuntu
[ 1314.534181] task: c0000003efc25200 task.stack: c0000000fb970000
[ 1314.534184] NIP: c0000000006a2428 LR: c0000000006a3478 CTR: c0000000006a2400
[ 1314.534187] REGS: c0000000fb9739f0 TRAP: 0300 Not tainted (4.8.0-59-generic)
[ 1314.534190] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28222222 XER: 00000001
[ 1314.534198] CFAR: c000000000008750 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000006a3478 c0000000fb973c70 c000000001467500 0000000000000063
GPR04: c0000003ff64aca0 c0000003ff65fb40 c0000003ff380000 00000000000080a0
GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000001
GPR12: c0000000006a2400 c000000007b35100 0000000000000000 0000000022000000
GPR16: 0000000010170dc8 000001000bbf0258 0000000010140528 00000000100c6f60
GPR20: 0000000000000000 000000001017dd58 0000000010152bf0 000000001017b608
GPR24: 00003fffd72098a4 00003fffd72098a0 c00000000137e6e0 0000000000000004
GPR28: c00000000137eaa0 0000000000000063 c000000001332590 0000000000000000
[ 1314.534242] NIP [c0000000006a2428] sysrq_handle_crash+0x28/0x30
[ 1314.534246] LR [c0000000006a3478] __handle_sysrq+0xe8/0x280
[ 1314.534248] Call Trace:
[ 1314.534250] [c0000000fb973c70] [c0000000006a3458] __handle_sysrq+0xc8/0x280 (unreliable)
[ 1314.534255] [c0000000fb973d10] [c0000000006a3bcc] write_sysrq_trigger+0x6c/0x90
[ 1314.534260] [c0000000fb973d40] [c0000000003adb48] proc_reg_write+0x88/0xd0
[ 1314.534265] [c0000000fb973d70] [c0000000003105ac] __vfs_write+0x3c/0x70
[ 1314.534268] [c0000000fb973d90] [c000000000311814] vfs_write+0xd4/0x240
[ 1314.534272] [c0000000fb973de0] [c000000000313368] SyS_write+0x68/0x110
[ 1314.534276] [c0000000fb973e30] [c000000000009584] system_call+0x38/0xec
[ 1314.534279] Instruction dump:
[ 1314.534281] 60000000 60000000 3c4c00dc 38425100 7c0802a6 60000000 3d22001a 3949bc60
[ 1314.534288] 39200001 912a0000 7c0004ac 39400000 <992a0000> 4e800020 3c4c00dc 384250d0
[ 1314.534296] ---[ end trace efc32115f1d43c62 ]---
[ 1314.537099]
[ 1314.537123] Sending IPI to other CPUs
[ 1314.538149] IPI complete
I'm in purgatory
-> smp_release_cpus()
spinning_secondaries = 8
<- smp_release_cpus()
[ 0.172393] pci 001b:50:00.0: of_irq_parse_pci() failed with rc=-22
[ 0.425077] Kernel panic - not syncing: Out of memory and no killable processes.....

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-23 09:29 EDT-------
(In reply to comment #28)
> Retested kdump today (23rd Aug 2017) on Ubuntu1610 and kdump hangs still:
> -------------------------
> root@thymelp3:~# echo c> /proc/sysrq-trigger
> [ 1314.534126] sysrq: SysRq : Trigger a crash
> [ 1314.534139] Unable to handle kernel paging request for data at address
> 0x00000000
> [ 1314.534143] Faulting instruction address: 0xc0000000006a2428
> [ 1314.534147] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 1314.534150] SMP NR_CPUS=2048 NUMA pSeries
> [ 1314.534154] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss
> nfsv4 nfs lockd grace fscache binfmt_misc pseries_rng vmx_crypto sunrpc
> ip_tables x_tables autofs4 dm_round_robin btrfs xor raid6_pq lpfc
> crc32c_vpmsum be2net scsi_transport_fc scsi_dh_emc scsi_dh_rdac scsi_dh_alua
> dm_multipath
> [ 1314.534177] CPU: 9 PID: 3421 Comm: bash Not tainted 4.8.0-59-generic
> #64-Ubuntu
> [ 1314.534181] task: c0000003efc25200 task.stack: c0000000fb970000
> [ 1314.534184] NIP: c0000000006a2428 LR: c0000000006a3478 CTR:
> c0000000006a2400
> [ 1314.534187] REGS: c0000000fb9739f0 TRAP: 0300 Not tainted
> (4.8.0-59-generic)
> [ 1314.534190] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28222222
> XER: 00000001
> [ 1314.534198] CFAR: c000000000008750 DAR: 0000000000000000 DSISR: 42000000
> SOFTE: 1
> GPR00: c0000000006a3478 c0000000fb973c70 c000000001467500 0000000000000063
> GPR04: c0000003ff64aca0 c0000003ff65fb40 c0000003ff380000 00000000000080a0
> GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000001
> GPR12: c0000000006a2400 c000000007b35100 0000000000000000 0000000022000000
> GPR16: 0000000010170dc8 000001000bbf0258 0000000010140528 00000000100c6f60
> GPR20: 0000000000000000 000000001017dd58 0000000010152bf0 000000001017b608
> GPR24: 00003fffd72098a4 00003fffd72098a0 c00000000137e6e0 0000000000000004
> GPR28: c00000000137eaa0 0000000000000063 c000000001332590 0000000000000000
> [ 1314.534242] NIP [c0000000006a2428] sysrq_handle_crash+0x28/0x30
> [ 1314.534246] LR [c0000000006a3478] __handle_sysrq+0xe8/0x280
> [ 1314.534248] Call Trace:
> [ 1314.534250] [c0000000fb973c70] [c0000000006a3458]
> __handle_sysrq+0xc8/0x280 (unreliable)
> [ 1314.534255] [c0000000fb973d10] [c0000000006a3bcc]
> write_sysrq_trigger+0x6c/0x90
> [ 1314.534260] [c0000000fb973d40] [c0000000003adb48] proc_reg_write+0x88/0xd0
> [ 1314.534265] [c0000000fb973d70] [c0000000003105ac] __vfs_write+0x3c/0x70
> [ 1314.534268] [c0000000fb973d90] [c000000000311814] vfs_write+0xd4/0x240
> [ 1314.534272] [c0000000fb973de0] [c000000000313368] SyS_write+0x68/0x110
> [ 1314.534276] [c0000000fb973e30] [c000000000009584] system_call+0x38/0xec
> [ 1314.534279] Instruction dump:
> [ 1314.534281] 60000000 60000000 3c4c00dc 38425100 7c0802a6 60000000
> 3d22001a 3949bc60
> [ 1314.534288] 39200001 912a0000 7c0004ac 39400000 <992a0000> 4e800020
> 3c4c00dc 384250d0
> [ 1314.534296] ---[ end trace efc32115f1d43c62 ]---
> [ 1314.537099]
> [ 1314.537123] Sending IPI to other CPUs
> [ 1314.538149] IPI complete
> I'm in purgatory
> -> smp_release_cpus()
> spinning_secondaries = 8
> <- smp_release_cpus()
> [ 0.1723...

Kernel team,

makedumpfile is marked Fix Released in this bug, but the series are marked as Confirmed. Could you please take a look at this bug and make sure the makedumpfile fix is applied to those series as well?

Changed in makedumpfile (Ubuntu Trusty):
assignee: David Britton (davidpbritton) → Canonical Kernel Team (canonical-kernel-team)
Changed in makedumpfile (Ubuntu Xenial):
assignee: David Britton (davidpbritton) → Canonical Kernel Team (canonical-kernel-team)
Changed in makedumpfile (Ubuntu Zesty):
assignee: David Britton (davidpbritton) → Canonical Kernel Team (canonical-kernel-team)

I understand there is a fix in artful, as of version 1:1.6.1-2. I am working on a backport to xenial and will need someone at IBM to test it. In fact, Louis Bouchard had already asked for testing of a package in his PPA. Can someone test it? It's at ppa:louis/kdump-tools-multipath.

Regards.
Cascardo.

Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Manoj Iyer (manjo) on 2017-09-11
tags: added: triage-g
removed: triage-a

------- Comment From <email address hidden> 2017-09-11 14:27 EDT-------
Xue,

Canonical wants us to test whether the package at ppa:louis/kdump-tools-multipath fixes it before applying it to Ubuntu. I need you to test it and let Canonical know.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-13 13:18 EDT-------
Hi

Today I tested kdump with 16.10 on talclp3
Access info :
HMC: hmc-lte2.isst.aus.stglabs.ibm.com (hscroot/abc123)

Console Access: rmvterm -m talc -p talclp3;mkvterm -m talc -p talclp3;

Logs:

root@talclp3:~# echo c > /proc/sysrq-trigger
[ 424.180480] sysrq: SysRq : Trigger a crash
[ 424.180497] Unable to handle kernel paging request for data at address 0x00000000
[ 424.180500] Faulting instruction address: 0xc0000000006a2428
[ 424.180504] Oops: Kernel access of bad area, sig: 11 [#1]
[ 424.180506] SMP NR_CPUS=2048 NUMA pSeries
[ 424.180509] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) pseries_rng ib_core(OE) vmx_crypto binfmt_misc dm_round_robin sunrpc dm_multipath knem(OE) ip_tables x_tables autofs4 btrfs xor raid6_pq mlx4_en(OE) ibmvfc scsi_transport_fc ibmvscsi bnx2x mlx4_core(OE) devlink mlx_compat(OE)
mdio libcrc32c be2net crc32c_vpmsum
[ 424.180541] CPU: 0 PID: 2733 Comm: bash Tainted: G OE 4.8.0-59-generic #64-Ubuntu
[ 424.180545] task: c0000000b3d78600 task.stack: c0000000a2104000
[ 424.180547] NIP: c0000000006a2428 LR: c0000000006a3478 CTR: c0000000006a2400
[ 424.180550] REGS: c0000000a21079f0 TRAP: 0300 Tainted: G OE (4.8.0-59-generic)
[ 424.180553] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28222222 XER: 00000001
[ 424.180560] CFAR: c000000000008750 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000006a3478 c0000000a2107c70 c000000001467500 0000000000000063
GPR04: c0000000bd00aca0 c0000000bd01fb40 c00000017fd2e300 000000000000b240
GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000001
GPR12: c0000000006a2400 c000000007b30000 0000000000000000 0000000022000000
GPR16: 0000000010170dc8 000001000df90258 0000000010140528 00000000100c6f60
GPR20: 0000000000000000 000000001017dd58 0000000010152bf0 000000001017b608
GPR24: 00003ffff97be144 00003ffff97be140 c00000000137e6e0 0000000000000004
GPR28: c00000000137eaa0 0000000000000063 c000000001332590 0000000000000000
[ 424.180599] NIP [c0000000006a2428] sysrq_handle_crash+0x28/0x30
[ 424.180602] LR [c0000000006a3478] __handle_sysrq+0xe8/0x280
[ 424.180604] Call Trace:
[ 424.180606] [c0000000a2107c70] [c0000000006a3458] __handle_sysrq+0xc8/0x280 (unreliable)
[ 424.180610] [c0000000a2107d10] [c0000000006a3bcc] write_sysrq_trigger+0x6c/0x90
[ 424.180614] [c0000000a2107d40] [c0000000003adb48] proc_reg_write+0x88/0xd0
[ 424.180619] [c0000000a2107d70] [c0000000003105ac] __vfs_write+0x3c/0x70
[ 424.180622] [c0000000a2107d90] [c000000000311814] vfs_write+0xd4/0x240
[ 424.180625] [c0000000a2107de0] [c000000000313368] SyS_write+0x68/0x110
[ 424.180629] [c0000000a2107e30] [c000000000009584] system_call+0x38/0xec
[ 424.180631] Instruction dump:
[ 424.180633] 60000000 60000000 3c4c00dc 38425100 7c0802a6 60000000 3d22001a 3949bc60
[ 424.180639] 39200001 912a0000 7c0004ac 39400000 <992a0000> 4e800020 3c4c00dc 384250d0
[ 424.180645] ---[ end trace 8fd1cd...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-13 13:50 EDT-------
(In reply to comment #36)
> Hi
>
> Today I tested kdump with 16.10 on talclp3
> Access info :
> HMC: hmc-lte2.isst.aus.stglabs.ibm.com (hscroot/abc123)
>
> Console Access: rmvterm -m talc -p talclp3;mkvterm -m talc -p talclp3;
>
> Logs:
>
> root@talclp3:~# echo c > /proc/sysrq-trigger
> [ 424.180480] sysrq: SysRq : Trigger a crash
> [ 424.180497] Unable to handle kernel paging request for data at address
> 0x00000000
> [ 424.180500] Faulting instruction address: 0xc0000000006a2428
> [ 424.180504] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 424.180506] SMP NR_CPUS=2048 NUMA pSeries
> [ 424.180509] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss
> nfsv4 nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE)
> configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE)
> mlx5_core(OE) mlx4_ib(OE) pseries_rng ib_core(OE) vmx_crypto binfmt_misc
> dm_round_robin sunrpc dm_multipath knem(OE) ip_tables x_tables autofs4 btrfs
> xor raid6_pq mlx4_en(OE) ibmvfc scsi_transport_fc ibmvscsi bnx2x
> mlx4_core(OE) devlink mlx_compat(OE)
> mdio libcrc32c be2net crc32c_vpmsum
> [ 424.180541] CPU: 0 PID: 2733 Comm: bash Tainted: G OE
> 4.8.0-59-generic #64-Ubuntu
> [ 424.180545] task: c0000000b3d78600 task.stack: c0000000a2104000
> [ 424.180547] NIP: c0000000006a2428 LR: c0000000006a3478 CTR:
> c0000000006a2400
> [ 424.180550] REGS: c0000000a21079f0 TRAP: 0300 Tainted: G OE
> (4.8.0-59-generic)
> [ 424.180553] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28222222
> XER: 00000001
> [ 424.180560] CFAR: c000000000008750 DAR: 0000000000000000 DSISR: 42000000
> SOFTE: 1
> GPR00: c0000000006a3478 c0000000a2107c70 c000000001467500 0000000000000063
> GPR04: c0000000bd00aca0 c0000000bd01fb40 c00000017fd2e300 000000000000b240
> GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000001
> GPR12: c0000000006a2400 c000000007b30000 0000000000000000 0000000022000000
> GPR16: 0000000010170dc8 000001000df90258 0000000010140528 00000000100c6f60
> GPR20: 0000000000000000 000000001017dd58 0000000010152bf0 000000001017b608
> GPR24: 00003ffff97be144 00003ffff97be140 c00000000137e6e0 0000000000000004
> GPR28: c00000000137eaa0 0000000000000063 c000000001332590 0000000000000000
> [ 424.180599] NIP [c0000000006a2428] sysrq_handle_crash+0x28/0x30
> [ 424.180602] LR [c0000000006a3478] __handle_sysrq+0xe8/0x280
> [ 424.180604] Call Trace:
> [ 424.180606] [c0000000a2107c70] [c0000000006a3458]
> __handle_sysrq+0xc8/0x280 (unreliable)
> [ 424.180610] [c0000000a2107d10] [c0000000006a3bcc]
> write_sysrq_trigger+0x6c/0x90
> [ 424.180614] [c0000000a2107d40] [c0000000003adb48] proc_reg_write+0x88/0xd0
> [ 424.180619] [c0000000a2107d70] [c0000000003105ac] __vfs_write+0x3c/0x70
> [ 424.180622] [c0000000a2107d90] [c000000000311814] vfs_write+0xd4/0x240
> [ 424.180625] [c0000000a2107de0] [c000000000313368] SyS_write+0x68/0x110
> [ 424.180629] [c0000000a2107e30] [c000000000009584] system_call+0x38/0xec
> [ 424.180631] Instruction dump:
> [ 424.180633] 60000000 60000000 3c4c00dc 38425100 7c0802a6 60000000
> 3d22001a 3949bc6...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-14 01:58 EDT-------
Increased crashkernel to 512MB and triggered a crash. The console log shows "multipath: error getting device", but I see the dump is collected under /var/crash.

I'm in purgatory
-> smp_release_cpus()
spinning_secondaries = 47
<- smp_release_cpus()
[ 0.184883] pci 002b:50:00.0: of_irq_parse_pci() failed with rc=-22
/dev/sdc2: recovering journal
/dev/sdc2: clean, 88120/2514944 files, 892667/10046464 blocks
[ 11.676763] device-mapper: table: 253:2: multipath: error getting device
[ 11.689487] device-mapper: table: 253:2: multipath: error getting device

Complete console log is below:
root@talclp3:~# dmesg | grep crash
[ 0.000000] Reserving 512MB of memory at 32MB for crashkernel (System RAM: 6144MB)
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinux-4.8.0-59-generic root=UUID=30629c5d-7ff0-48db-b2ca-7c2255d0fa18 ro splash quiet crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@32M maxcpus=1
root@talclp3:/var/crash# echo c > /proc/sysrq-trigger
[ 93.923245] sysrq: SysRq : Trigger a crash
[ 93.923263] Unable to handle kernel paging request for data at address 0x00000000
[ 93.923266] Faulting instruction address: 0xc0000000006a2428
[ 93.923269] Oops: Kernel access of bad area, sig: 11 [#1]
[ 93.923271] SMP NR_CPUS=2048 NUMA pSeries
[ 93.923275] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) pseries_rng ib_core(OE) vmx_crypto binfmt_misc dm_round_robin sunrpc knem(OE) dm_multipath ip_tables x_tables autofs4 btrfs xor raid6_pq mlx4_en(OE) ibmvfc scsi_transport_fc ibmvscsi bnx2x mlx4_core(OE) devlink mlx_compat(OE) mdio libcrc32c be2net crc32c_vpmsum
[ 93.923307] CPU: 0 PID: 2665 Comm: bash Tainted: G OE 4.8.0-59-generic #64-Ubuntu
[ 93.923310] task: c0000000b3a5ce00 task.stack: c0000000b6c08000
[ 93.923313] NIP: c0000000006a2428 LR: c0000000006a3478 CTR: c0000000006a2400
[ 93.923316] REGS: c0000000b6c0b9f0 TRAP: 0300 Tainted: G OE (4.8.0-59-generic)
[ 93.923318] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28222222 XER: 00000001
[ 93.923326] CFAR: c000000000008750 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000006a3478 c0000000b6c0bc70 c000000001467500 0000000000000063
GPR04: c0000000bd00aca0 c0000000bd01fb40 c00000017fd2e300 000000000000b040
GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000001
GPR12: c0000000006a2400 c000000001b30000 0000000000000000 0000000022000000
GPR16: 0000000010170dc8 0000010033a96e18 0000000010140528 00000000100c6f60
GPR20: 0000000000000000 000000001017dd58 0000000010152bf0 000000001017b608
GPR24: 00003ffff7d5f954 00003ffff7d5f950 c00000000137e6e0 0000000000000004
GPR28: c00000000137eaa0 0000000000000063 c000000001332590 0000000000000000
[ 93.923365] NIP [c0000000006a2428] sysrq_handle_crash+0x28/0x30
[ 93.923368] LR [c0000000006a3478] __handle_sysrq+0xe8/0x280
[ 93.923370] Call Trace:
[ 93.923372] [c0000000b6c0bc70] [c0000000006a3458] _...
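
For readers following the crashkernel= line in the log above: the kernel picks the first range that contains the system RAM size (start inclusive, end exclusive; an empty end means "no upper limit"). Below is a minimal sketch of that selection in POSIX shell. The helper names to_mib and pick_crashkernel are made up for illustration and are not part of kdump-tools; only M and G suffixes are handled, and the optional @offset is simply dropped.

```shell
#!/bin/sh
# to_mib SIZE: convert a size such as 320M or 4G to MiB.
# Hypothetical helper; only the M and G suffixes are handled.
to_mib() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# pick_crashkernel SPEC RAM_MIB: print the reservation size that the
# range list SPEC selects for a machine with RAM_MIB MiB of memory.
pick_crashkernel() {
    spec=${1%@*}              # drop the optional @offset suffix
    ram=$2
    old_ifs=$IFS
    IFS=,                     # split the spec on commas
    for entry in $spec; do
        range=${entry%%:*}    # e.g. 4G-32G
        size=${entry#*:}      # e.g. 512M
        start=$(to_mib "${range%-*}")
        end=${range#*-}       # empty for an open-ended range such as 128G-
        if [ -n "$end" ]; then end=$(to_mib "$end"); fi
        if [ "$ram" -ge "$start" ] && { [ -z "$end" ] || [ "$ram" -lt "$end" ]; }; then
            IFS=$old_ifs
            echo "$size"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1                  # no range matched: nothing is reserved
}

# The command line and RAM size reported in the log above.
pick_crashkernel '2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@32M' 6144
```

With 6144MB of system RAM the 4G-32G range matches, so this prints 512M, which agrees with the "Reserving 512MB of memory at 32MB for crashkernel" line.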

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-18 06:29 EDT-------
Hi

It is the same config. Both LPARs have multipath disks. With that we recreated the issue.

Thanks
Lekshmi

Manoj Iyer (manjo) on 2017-09-18
Changed in ubuntu-power-systems:
status: Incomplete → Confirmed
tags: added: triage-r
removed: triage-g

By "recreated the issue", do you mean it fixes the issue? In the previous message, you mentioned you could create the crash files. Was that using the proposed package from Louis's PPA? Can you clarify?

Thank you very much.
Cascardo.

------- Comment on attachment From <email address hidden> 2017-09-29 09:56 EDT-------

(In reply to comment #41)
> Hi
>
>
> Its is same config.Both lpars its multipath disks.With that we recreated the
> issue
>
>

Hi Lekshmi

I am not really convinced it is the same configuration, looking at the output of
the blkid command on the system where this was tried (see attachment). The root disk in
this case does not seem to be multipath based. Also, the fix package mentioned
by Breno or Cascardo doesn't seem to have been used for this validation.
Could you please try this on the same system the issue was initially reported,
with package at ppa:louis/kdump-tools-multipath. Alternatively, I would be happy
to validate, if you can provide access to the system where the issue was
initially reported.

(In reply to comment #42)
> By "recreated the issue", do you mean it fixes the issue? Cause in the
> previous message, you mentioned you could create the crash files? Was that
> using the proposed package from louis ppa? Can you clarify?
>

Hello Canonical/Cascardo,

The issue was not seen on one of our systems with kdump-tools version 1:1.6.0-2ubuntu1.2.
I am not sure if that version has the fix, nor am I sure the failure was ever seen on this system to start with.
Our test team will try to set up the failing configuration to validate this again.

Thanks
Hari

Changed in makedumpfile (Ubuntu Xenial):
assignee: Canonical Kernel Team (canonical-kernel-team) → Thadeu Lima de Souza Cascardo (cascardo)
description: updated
tags: added: patch
summary: - Ubuntu16.10:talclp1: Kdump failed with multipath disk
+ Ubuntu:talclp1: Kdump failed with multipath disk
tags: added: ppc64el-kdump

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.5.9-5ubuntu0.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in makedumpfile (Ubuntu Xenial):
status: Confirmed → Fix Committed
tags: added: verification-needed verification-needed-xenial
tags: added: triage-a
removed: triage-r
Manoj Iyer (manjo) on 2018-02-26
Changed in ubuntu-power-systems:
status: Confirmed → Incomplete

------- Comment From <email address hidden> 2018-02-28 02:25 EDT-------
Hi

We are no longer testing Ubuntu on PowerVM, as 18.04 is not supported on PowerVM machines.

We can test only with 16.04.4, for which HST testing is going on now.

But it will require some time to do that. Also, the initial system config has changed, but we can verify on a multipath disk system.

Thanks
Lekshmi

Please, remember to install packages from -proposed for the testing.

Cascardo.

Manoj Iyer (manjo) on 2018-03-05
Changed in makedumpfile (Ubuntu Zesty):
status: Confirmed → Won't Fix
Changed in linux (Ubuntu Zesty):
status: New → Won't Fix
tags: added: triage-g
removed: triage-a
Andrew Cloke (andrew-cloke) wrote :

This fix has been in -proposed for approximately 80 days, and has not been verified. If it cannot be verified by the end of next week, it will be reverted.

Please, make sure to use the package from xenial-proposed, not from any ppa.

Cascardo.

------- Comment From <email address hidden> 2018-03-15 14:00 EDT-------
Marking Lekshmi's note external :

I verified this bug with Ubuntu 16.04.4 LTS with kernel 4.13.0-37-generic, with the boot disk on multipath. It's working fine and the crash file is generated.

Manoj Iyer (manjo) on 2018-03-19
tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial
Changed in ubuntu-power-systems:
status: Incomplete → Fix Committed
Robie Basak (racb) wrote :

What version of makedumpfile was used for SRU verification please?

Łukasz Zemczak (sil2100) wrote :

Since this update has been lingering in -proposed for so long, I am assuming from the context of #36 (and the earlier #35 notice) that the xenial-proposed version of the package was used. Also, the package was in -proposed for so long that if there were regressions we would have known by now.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.5.9-5ubuntu0.6

---------------
makedumpfile (1:1.5.9-5ubuntu0.6) xenial; urgency=medium

  * d/kernel-postinst-generate-initrd : Add scsi_dh_* modules if in
    use so the system can dump a crash when root is on multipath
    (LP: #1635597) (Closes: 862411)

  * KDUMP_CMDLINE_APPEND: add noirqdistrib to default command line. As it's
    only used by ppc64el, it's not required to be conditionally added.
    (LP: #1658733)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Tue, 29 Aug 2017 16:56:04 -0300
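
As a rough illustration of the first changelog item, the sketch below lists the scsi_dh_* device handler modules that are currently loaded, which is the set the kdump initrd needs when root is on multipath. This is not the actual code from d/kernel-postinst-generate-initrd; the function name list_scsi_dh_modules is hypothetical, and the sizes, reference counts and addresses in the sample data are invented for the demo.

```shell
#!/bin/sh
# list_scsi_dh_modules [FILE]: print the names of loaded scsi_dh_* modules.
# FILE defaults to /proc/modules, where each line starts with the module name.
# Hypothetical helper, not taken from the makedumpfile packaging.
list_scsi_dh_modules() {
    awk '$1 ~ /^scsi_dh_/ { print $1 }' "${1:-/proc/modules}"
}

# Demo on sample data in /proc/modules format. The module names come from
# the original report; the numeric fields are made up.
sample=$(mktemp)
cat > "$sample" <<'EOF'
scsi_dh_emc 20480 0 - Live 0x0000000000000000
scsi_dh_alua 20480 2 - Live 0x0000000000000000
dm_multipath 40960 1 dm_round_robin, Live 0x0000000000000000
EOF
list_scsi_dh_modules "$sample"
rm -f "$sample"
```

On an affected system, running list_scsi_dh_modules with no argument before triggering the crash shows which handlers the running kernel is using and would therefore have to be present in the kdump initrd.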

Changed in makedumpfile (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for makedumpfile has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

bugproxy (bugproxy) on 2018-03-26
tags: added: targetmilestone-inin16044
removed: targetmilestone-inin---
Manoj Iyer (manjo) on 2018-04-05
Changed in makedumpfile (Ubuntu Trusty):
status: Confirmed → Won't Fix
Changed in linux (Ubuntu Trusty):
status: New → Won't Fix
Manoj Iyer (manjo) on 2018-04-16
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released