[Ubuntu 15.04] Support firmware assisted dump on ppc64le

Bug #1415562 reported by bugproxy on 2015-01-28
This bug affects 1 person
Affects                 Importance   Assigned to
linux (Ubuntu)          Medium       Andy Whitcroft
makedumpfile (Ubuntu)   Medium       Chris J Arges

Bug Description

Starting with POWER6, the firmware can preserve the partition memory dump during a system crash and boot into a fresh copy of the kernel with a fully reset system. This feature adds the support needed to exploit the dump capture capability provided by Power firmware. With it, the production kernel registers for firmware-assisted dump using RTAS (Run-Time Abstraction Services) calls and builds the required ELF header, which is then exported through '/proc/vmcore' in the second kernel after a crash. This feature improves Power serviceability by making dump capture more robust than the current kdump mechanism on Linux.

The Ubuntu 15.04 kernel already includes the necessary fadump code; the only kernel change needed is to enable CONFIG_FA_DUMP in the kernel configuration. In addition, a script in the kdump-tools package needs to be updated.
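To check whether a given kernel build already has the option enabled, one can inspect its config file under /boot. This is a minimal sketch; the function name `kconfig_is_enabled` is hypothetical and not part of kdump-tools. It treats only an exact `=y` line as enabled, so a `# CONFIG_FA_DUMP is not set` line or a missing line counts as disabled.

```shell
# Hypothetical helper (not part of kdump-tools): succeed only if OPTION is
# built in, i.e. the config file contains the exact line OPTION=y.
kconfig_is_enabled() {
    config_file=$1
    option=$2
    # -q: report via exit status only; -x: match the whole line; -F: fixed string
    grep -qxF "${option}=y" "$config_file"
}
```

For example: `kconfig_is_enabled "/boot/config-$(uname -r)" CONFIG_FA_DUMP && echo "fadump support is built in"`.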

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-120986 severity-medium targetmilestone-inin1504
Luciano Chavez (lnx1138) on 2015-01-28
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Taco Screen team (taco-screen-team)
tags: added: kernel-da-key
Chris J Arges (arges) on 2015-01-29
Changed in makedumpfile (Ubuntu):
assignee: nobody → Chris J Arges (arges)
importance: Undecided → Medium
status: New → In Progress
Chris J Arges (arges) wrote :

Another version of the above patch formatted as a vivid debdiff.

tags: added: patch
Louis Bouchard (louis) on 2015-01-29
Changed in linux (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Louis Bouchard (louis) wrote :

Hello,

I have just synchronized the Debian version of makedumpfile with Ubuntu, as there is no longer any delta between the two. I would hate to reintroduce a delta just for Ubuntu, so I would much prefer to implement the fadump feature in Debian first.

I have just created a bug in the Debian BTS to track this request: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776574

I am starting work on the Debian side of things.

Chris J Arges (arges) wrote :

Here is a built version of the Debian makedumpfile for ppc64el:
http://people.canonical.com/~arges/lp1415562/

Note that I made a few changes to the original patch:
- Don't rename load/unload to start/stop
- Less verbose descriptions
- Minor changes to logic and removal of extra echos

If someone has a kernel with the CONFIG changes handy, could they verify this package?
In addition, we should ensure that _both_ modes are possible on ppc64el (kdump/fadump).

Thanks
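Testing both modes requires knowing which one the running kernel will use. A minimal sketch, assuming the read-only /sys/kernel/fadump_enabled file described in the kernel's firmware-assisted-dump documentation (1 when fadump is enabled, 0 otherwise); the function name `crash_dump_mode` and its optional path argument are hypothetical, and this is not necessarily how kdump-tools itself decides.

```shell
# Hypothetical helper: print "fadump" if the kernel reports firmware-assisted
# dump as enabled, otherwise "kdump". The sysfs path can be overridden.
crash_dump_mode() {
    enabled_file=${1:-/sys/kernel/fadump_enabled}
    # The file is absent on kernels built without CONFIG_FA_DUMP
    if [ -r "$enabled_file" ] && [ "$(cat "$enabled_file")" = 1 ]; then
        echo fadump
    else
        echo kdump
    fi
}
```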

Chris J Arges (arges) on 2015-02-02
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)

------- Comment From <email address hidden> 2015-02-03 05:35 EDT-------
(In reply to comment #10)
> Here is a built version of the debian makedumpfile for ppc64el:
> http://people.canonical.com/~arges/lp1415562/
>
> Note that I did a few changes to the original patch:
> The changes I have made were:
> - Don't rename load/unload to start/stop
> - Less verbose descriptions
> - Minor changes to logic and removal of extra echos
>
> If someone has a kernel with the CONFIG changes handy could they verify this
> package?
> In addition, we should ensure that _both_ modes are possible on ppc64el
> (kdump/fadump).
>
> Thanks

I don't see any fadump-related changes in the shared kdump-tools_1.5.7-5_all.deb.
Can you please check whether the correct package was shared?

Thanks
Hari

Louis Bouchard (louis) wrote :

Indeed, that's my fault. Looks like I got mixed up somewhere. Let me rebuild a source package so Chris can build it for ppc64el.

Sorry about that

Andy Whitcroft (apw) wrote :

Committed for the v3.19 kernel "unstable".

Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → Andy Whitcroft (apw)
milestone: none → ubuntu-15.02
status: Confirmed → Fix Committed
Chris J Arges (arges) wrote :

Hari,
Ok I've updated the packages here:
http://people.canonical.com/~arges/lp1415562/

Please test with these packages. Thanks,

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-02-04 05:32 EDT-------
(In reply to comment #16)
> Hari,
> Ok I've updated the packages here:
> http://people.canonical.com/~arges/lp1415562/
>
> Please test with these packages. Thanks,

I'm sorry. I don't see any packages at
http://people.canonical.com/~arges/lp1415562/

Thanks
Hari

Chris J Arges (arges) wrote :

Hari,
Sorry about that. The files are there now:
http://people.canonical.com/~arges/lp1415562/

Thanks,

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-02-04 19:20 EDT-------
Louis & Chris, thanks for your help.
The packages work as intended in both kdump and fadump modes.
A couple of observations, though:

1. When running load or unload twice in succession, echo throws an error:

root@lop831:~# kdump-config load
* fadump registered successfully
root@lop831:~# kdump-config load
/usr/sbin/kdump-config: line 301: echo: write error: Invalid argument
* fadump registered successfully
root@lop831:~#
--

root@lop831:~# kdump-config unload
* fadump un-registered successfully
root@lop831:~# kdump-config unload
/usr/sbin/kdump-config: line 323: echo: write error: Invalid argument
* fadump un-registered successfully
root@lop831:~#

2. When using network dump (SSH), it would be nice to warn the user to run
'kdump-config propagate' before 'kdump-config load' if /root/.ssh/kdump_id_rsa is missing.
This applies to both kdump and fadump modes.

Thanks
Hari
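The warning suggested in item 2 could be a non-fatal check run before loading. A minimal sketch; the function name `warn_if_ssh_key_missing` and passing the SSH target as an argument are assumptions, not kdump-tools' actual configuration interface. It only prints a hint and never creates or copies keys.

```shell
# Hypothetical pre-load check (not part of kdump-tools): if an SSH dump
# target is configured but the dump key is missing, print a hint to stderr.
# It never creates or copies keys; that stays with 'kdump-config propagate'.
warn_if_ssh_key_missing() {
    ssh_target=$1                             # e.g. user@host; empty means local dump
    key_file=${2:-/root/.ssh/kdump_id_rsa}
    if [ -n "$ssh_target" ] && [ ! -f "$key_file" ]; then
        echo "WARNING: $key_file not found; run 'kdump-config propagate' before 'kdump-config load'" >&2
        return 1                              # caller decides whether to continue
    fi
    return 0
}
```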

On 02/04/2015 01:29 PM, bugproxy wrote:
> ------- Comment From <email address hidden> 2015-02-04 19:20 EDT-------
> Louis & Chris, thanks for your help.
> The packages work as intended in both kdump & fadump modes.
> Couple of observations though:
>
> 1. When doing load/unload twice in succession, echo throws an error
>
> root@lop831:~# kdump-config load
> * fadump registered successfully
> root@lop831:~# kdump-config load
> /usr/sbin/kdump-config: line 301: echo: write error: Invalid argument
> * fadump registered successfully
> root@lop831:~#
> --
>
> root@lop831:~# kdump-config unload
> * fadump un-registered successfully
> root@lop831:~# kdump-config unload
> /usr/sbin/kdump-config: line 323: echo: write error: Invalid argument
> * fadump un-registered successfully
> root@lop831:~#
>

The error comes from the code that writes values into the sysfs file:
echo 1 > $sys_fadump_registered

Where:
sys_fadump_registered=/sys/kernel/fadump_registered

Hari, can you confirm whether kdump-config load/unload actually change
these values correctly? Perhaps the code needs to be adjusted.

> 2. While using network dump (SSH), would be nice if we can warn the user to run
> 'kdump-config propagate' before 'kdump-config load', if /root/.ssh/kdump_id_rsa is missing.
> This applies to both kdump & fadump modes
>
> Thanks
> Hari
>

Louis,
This might be a good general bug/issue to look into.

Thanks,
--chris

Louis Bouchard (louis) wrote :

kdump-config propagate is a manual configuration option and needs to remain one. It creates a passwordless SSH key and propagates it to the remote SSH server, which will most probably prompt for the remote server's password in doing so.

This cannot be done by default. It also directly affects the security of remote servers, so it must be an opt-in step by the user.

------- Comment From <email address hidden> 2015-02-06 15:55 EDT-------
(In reply to comment #20)
>
> The code is where we set values into the sysfs directory:
> echo 1 > $sys_fadump_registered
>
> Where:
> sys_fadump_registered=/sys/kernel/fadump_registered
>
> Hari, can you confirm if kdump-config load/unload are actually changing
> these values correctly? Perhaps the code needs to be adjusted.

Chris, I see the error from the command line as well
when echoing 1 or 0 repeatedly to /sys/kernel/fadump_registered.

---

root@lop831:~# cat /sys/kernel/fadump_registered
0
root@lop831:~# echo 1 > /sys/kernel/fadump_registered
root@lop831:~# echo 1 > /sys/kernel/fadump_registered
echo: write error: Invalid argument
root@lop831:~#

---

root@lop831:~# cat /sys/kernel/fadump_registered
1
root@lop831:~# echo 0 > /sys/kernel/fadump_registered
root@lop831:~# echo 0 > /sys/kernel/fadump_registered
echo: write error: Invalid argument
root@lop831:~#

---

This seems to be how the fadump kernel code handles this node: writing the value
it already holds is rejected with "Invalid argument". To avoid the error, we could
adjust the kdump-config load/unload functions to check the current value and
write to the node only if needed:

for load:
if [ "$(cat "$sys_fadump_registered")" != 1 ]; then
    echo 1 > "$sys_fadump_registered"
fi

for unload:
if [ "$(cat "$sys_fadump_registered")" != 0 ]; then
    echo 0 > "$sys_fadump_registered"
fi

I hope this is OK with you.

Thanks
Hari
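Hari's proposal can be wrapped in one helper shared by load and unload. This is a minimal sketch; `fadump_set_registered` is a hypothetical name, and the optional path argument (defaulting to /sys/kernel/fadump_registered) exists only so the logic can be exercised against an ordinary file. It writes only when the current value differs, so calling it twice in a row does not hit the "Invalid argument" write error.

```shell
# Hypothetical helper: set /sys/kernel/fadump_registered to 0 or 1, writing
# only when the value actually changes (repeating a write fails with EINVAL).
fadump_set_registered() {
    want=$1
    node=${2:-/sys/kernel/fadump_registered}
    case "$want" in
        0|1) ;;
        *) echo "fadump_set_registered: expected 0 or 1, got '$want'" >&2; return 2 ;;
    esac
    [ -r "$node" ] || { echo "fadump_set_registered: $node not readable" >&2; return 1; }
    current=$(cat "$node")
    if [ "$current" != "$want" ]; then
        # A failed open or write (e.g. EINVAL from the kernel) makes echo fail
        echo "$want" > "$node" || return 1
    fi
    return 0
}
```

With this, load would call `fadump_set_registered 1` and unload `fadump_set_registered 0`.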

Chris J Arges (arges) wrote :

Posted an updated patch to Debian.

Louis Bouchard (louis) wrote :

Chris,

An updated source package is available at: http://people.canonical.com/~lbouchard/makedumpfile-1.5.7-5/

I have prefixed the package version so it will not cause problems once the official package comes out.

Chris J Arges (arges) wrote :

Hari,
Can you test the following? If this works, we'll get this into Debian and then sync it into Ubuntu/vivid.
Thanks,
http://people.canonical.com/~arges/lp1415562v2/

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-02-10 09:42 EDT-------
Chris & Louis, I have tested the packages successfully for
kdump & fadump modes.

Thanks
Hari

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-02-10 10:04 EDT-------
Below is the documentation for fadump:
________________

Firmware-assisted dump

Firmware Assisted Dump (fadump) is an alternative to the kdump crash dumping mechanism,
available on the powerpc architecture. To understand how fadump works, please refer to
the kernel documentation below:

https://www.kernel.org/doc/Documentation/powerpc/firmware-assisted-dump.txt

Two steps are needed to use fadump as the crash dumping mechanism. First, enable
fadump by passing "fadump=on" on the kernel command line. Second, register fadump
by writing `1` to /sys/kernel/fadump_registered.

1. To enable fadump:

a. Add "fadump=on" to GRUB_CMDLINE_LINUX in the /etc/default/grub file.
b. Rebuild the grub config:
# grub-mkconfig -o /boot/grub/grub.cfg
c. Reboot.

2. To register fadump:

kdump-tools, the scripts and tools for automating kdump, has been updated to be fadump-aware.
When fadump is enabled, kdump-tools registers fadump as the crash dumping mechanism
by writing `1` to /sys/kernel/fadump_registered. For more help, see:

# kdump-config help

NOTE: If fadump fails to collect a dump with an Out Of Memory error, use the "fadump_reserve_mem="
parameter to increase the memory reserved for firmware-assisted dump.

____________________
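Step 1a of the documentation can be scripted. A minimal sketch that appends a parameter to GRUB_CMDLINE_LINUX in a grub defaults file, skipping it if it is already present; `grub_add_param` is a hypothetical name. It assumes a single `GRUB_CMDLINE_LINUX="..."` line in double quotes and GNU sed (for `-i`), and it edits the file in place, so keep a backup. Steps 1b and 1c (rebuild grub.cfg, reboot) still apply afterwards.

```shell
# Hypothetical helper (not part of kdump-tools or grub): add PARAM, e.g.
# fadump=on, to GRUB_CMDLINE_LINUX unless the same word is already present.
# Assumes a single GRUB_CMDLINE_LINUX="..." line and GNU sed (for -i).
grub_add_param() {
    param=$1
    grub_file=${2:-/etc/default/grub}
    # No such line yet: append one and stop
    if ! grep -q '^GRUB_CMDLINE_LINUX=' "$grub_file"; then
        echo "GRUB_CMDLINE_LINUX=\"$param\"" >> "$grub_file"
        return 0
    fi
    # Current value between the double quotes
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX="\(.*\)"$/\1/p' "$grub_file")
    case " $current " in
        *" $param "*) return 0 ;;   # already there, nothing to do
    esac
    new=${current:+$current }$param
    # '|' is the sed delimiter, so neither value may contain '|', '&' or '\'
    sed -i "s|^GRUB_CMDLINE_LINUX=\".*\"\$|GRUB_CMDLINE_LINUX=\"$new\"|" "$grub_file"
}
```

For example, `grub_add_param fadump=on` covers step 1a; the same helper could add a `fadump_reserve_mem=<size>` value if the NOTE applies, with the size chosen for the system at hand.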

Breno, please add this to
https://wiki.ubuntu.com/ppc64el/

Thanks
Hari

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.5.7-5

---------------
makedumpfile (1:1.5.7-5) experimental; urgency=medium

  * Fix panic_on_oops erratic handling
    Closes: #776582

  [ Hari Bathini <email address hidden> ]
  * Add firmware assisted dump support
    Closes: #776574, LP: #1415562

 -- Louis Bouchard <email address hidden> Thu, 30 Jan 2015 14:04:47 +0100

Changed in makedumpfile (Ubuntu):
status: In Progress → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-02-20 06:50 EDT-------
Current daily ISO builds for 15.04 still have CONFIG_FA_DUMP disabled. Any idea when this option will be enabled in 15.04?

# cat /boot/config-3.18.0-13-generic | grep FA_DUMP
# CONFIG_FA_DUMP is not set
#

Breno Leitão (breno-leitao) wrote :

Just checked the 3.19 kernel, and CONFIG_FA_DUMP is enabled.

ubuntu@ubuntu1504:/boot$ cat /boot/config-3.19.0-6-generic | grep FA_DUMP
CONFIG_FA_DUMP=y

Breno Leitão (breno-leitao) wrote :

Tested internally at IBM and the problem is solved in kernel 3.19 from the ckt PPA.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-7.7

---------------
linux (3.19.0-7.7) vivid; urgency=low

  [ Andy Whitcroft ]

  * Release Tracking Bug
    - LP: #1426013

  [ Upstream Kernel Changes ]

  * x86/irq: Fix regression caused by commit b568b8601f05
  * cxl: Fix leaking interrupts if attach process fails
    - LP: #1415102
  * cxl: Early return from cxl_handle_fault for a shut down context
    - LP: #1415102
  * cxl: Disable AFU debug flag
    - LP: #1415102
  * cxl: Disable SPAP register when freeing SPA
    - LP: #1415102
  * cxl: remove redundant increment of hwirq
    - LP: #1415102
  * cxl: Add tracepoints
    - LP: #1415102
  * cxl: Update CXL ABI documentation
    - LP: #1415102
  * cxl: Use image state defaults for reloading FPGA
    - LP: #1415102
  * cxl: Add image control to sysfs
    - LP: #1415102
  * cxl: Enable CAPP recovery
    - LP: #1415102
  * cxl: Add ability to reset the card
    - LP: #1415102
  * cxl: Fix device_node reference counting
    - LP: #1415102
  * cxl: Export optional AFU configuration record in sysfs
    - LP: #1415102
  * cxl: Fail AFU initialisation if an invalid configuration record is
    found
    - LP: #1415102
  * cxl: Add missing return statement after handling AFU errror
    - LP: #1415102
  * powerpc/eeh: Introduce flag EEH_PE_REMOVED
    - LP: #1415102
  * powerpc/eeh: Allow to set maximal frozen times
    - LP: #1415102
  * HID: i2c-hid: Limit reads to wMaxInputLength bytes for input events
 -- Andy Whitcroft <email address hidden> Thu, 26 Feb 2015 16:00:18 +0000

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
bugproxy (bugproxy) on 2015-09-09
tags: removed: bugnameltc-120986 kernel-da-key patch severity-medium
bugproxy (bugproxy) on 2016-01-25
tags: added: bugnameltc-120986 severity-medium