PowerNV: Restart opal-prd daemon on any kind of failure

Bug #1671019 reported by bugproxy
Affects                            Status        Importance  Assigned to                             Milestone
The Ubuntu-power-systems project   Fix Released  High        Unassigned
skiboot (Ubuntu)                   Fix Released  High        Ubuntu on IBM Power Systems Bug Triage
Xenial                             Fix Released  Undecided   Steve Langasek
Yakkety                            Fix Released  Undecided   Steve Langasek
Zesty                              Fix Released  Undecided   Unassigned

Bug Description

[SRU Justification]
As a hardware diagnostic service, it's important that opal-prd be kept running even in the face of hardware unreliability. The most effective way to do this is with a systemd unit policy of Restart=always, to ensure the service is not allowed to accidentally die.
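
As a rough illustration of that policy, the sketch below writes a systemd drop-in that sets Restart=always. The helper name write_restart_dropin and the file name restart.conf are made up for this example; the SRU itself changes debian/opal-prd.service in the package rather than adding a drop-in.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that sets Restart=always.
# $1 is the drop-in directory, for example
# /etc/systemd/system/opal-prd.service.d
write_restart_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # Quoted heredoc delimiter: the content is written literally.
    cat > "$dropin_dir/restart.conf" <<'EOF'
[Service]
Restart=always
EOF
}
```

If the drop-in is written under /etc/systemd/system/opal-prd.service.d, 'systemctl daemon-reload' is needed before systemd picks it up.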

[Test case]
1. Install the opal-prd package on a system that has /dev/mtd0.
2. Verify that the service is running with 'systemctl status opal-prd'.
3. Get the pid of the opal-prd process from systemctl and kill it with 'sudo kill -9 $pid'.
4. Verify via 'systemctl status opal-prd' that the service is no longer running.
5. Install opal-prd from -proposed.
6. Verify via 'systemctl status opal-prd' that the service is running again.
7. Kill the new process with 'sudo kill -9 $pid'.
8. Verify via 'systemctl status opal-prd' that the service has been restarted.
9. Install the opal-prd package from -proposed on a system that does not have /dev/mtd0.
10. Verify that 'systemctl status opal-prd' shows the service is inactive, and has not been allowed to restart indefinitely after failure, driving up the system load.
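
Steps 2-4 and 6-8 could be scripted roughly as follows. This is only a sketch: the function names and the 2-second wait are assumptions, and it reads the PID from 'systemctl show -p MainPID', which prints a line of the form MainPID=1234 (MainPID=0 when the service is not running).

```shell
#!/bin/sh
# Extract the PID from "systemctl show -p MainPID <unit>" output,
# which has the form "MainPID=1234".
parse_main_pid() {
    printf '%s\n' "${1#MainPID=}"
}

# Kill the daemon, wait briefly, and report whether a new process
# has taken over. The unit name defaults to opal-prd.
check_restart() {
    unit=${1:-opal-prd}
    old=$(parse_main_pid "$(systemctl show -p MainPID "$unit")")
    [ "$old" -gt 0 ] || { echo "$unit is not running"; return 2; }
    sudo kill -9 "$old"
    sleep 2   # arbitrary grace period for systemd to respawn the service
    new=$(parse_main_pid "$(systemctl show -p MainPID "$unit")")
    if [ "$new" -gt 0 ] && [ "$new" != "$old" ]; then
        echo "restarted: $old -> $new"
    else
        echo "not restarted"
        return 1
    fi
}
```

With the old package, check_restart would be expected to report "not restarted" (step 4); with the package from -proposed, it should report a new PID (step 8).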

[Regression potential]
Since the package may be installed on systems where opal-prd is useless and will not run, it's important to verify for each release that Restart=always doesn't cause systemd to go into a busy loop trying to restart the service under these conditions. The uploaded change should guard against this by checking for the correct path before starting the job, and the test case should further confirm this.
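
The guard described above amounts to a path check before the service is started. A minimal sketch of that logic in shell, assuming /dev/mtd0 (the device named in the test case) is the path of interest; the unit in the upload may check a different path, and both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the device node the daemon
# needs is present. /dev/mtd0 comes from the test case and is an
# assumption here, not necessarily the path the unit checks.
prd_device_present() {
    [ -e "${1:-/dev/mtd0}" ]
}

# Map the guard result to the state that step 10 of the test case
# expects 'systemctl is-active opal-prd' to report.
expected_state() {
    if prd_device_present "$1"; then
        echo active
    else
        echo inactive
    fi
}
```

On a system without the device, comparing the output of 'systemctl is-active opal-prd' with expected_state should yield a match; a value such as failed or activating would suggest that systemd keeps trying to restart the service.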

== Comment: #0 - VASANT HEGDE <email address hidden> - 2017-03-08 03:12:33 ==
---Problem Description---
The opal-prd package contains the PRD (Processor Runtime Diagnostics) daemon.

One of the requirements from a field service perspective is to be able
to restart opal-prd when it fails for whatever reason.

Direct systemd to restart the opal-prd service on any kind of failure.

Ubuntu ships the opal-prd package. This bug asks for the daemon to be restarted after a failure.

Contact Information = <email address hidden>

Machine Type = All Open Power Systems

Userspace tool common name: opal-prd

Userspace rpm: opal-prd

The userspace tool has the following bit modes: 64bit

== Comment: #3 - Ananth Narayan M G <email address hidden> - 2017-03-08 03:37:46 ==
Posted patch upstream for this -- https://lists.ozlabs.org/pipermail/skiboot/2017-March/006612.html

Revision history for this message
bugproxy (bugproxy) wrote : Direct systemd to always restart the daemon

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-152351 severity-high targetmilestone-inin1704
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → skiboot (Ubuntu)
summary: - Restart opal-prd daemon on any kind of failure
+ PowerNV: Restart opal-prd daemon on any kind of failure
Revision history for this message
Michael Hohnbaum (hohnbaum) wrote : Re: [Bug 1671019] [NEW] Restart opal-prd daemon on any kind of failure

Steve,

Can Foundations please look at this request? Thanks.

                   Michael


--
Michael Hohnbaum
OIL Program Manager
Power (ppc64el) Development Project Manager
Canonical, Ltd.

Revision history for this message
Frédéric Bonnard (frediz) wrote :

I'm planning a new upstream release in Debian and I'm interested in this patch, but it seems it is still being discussed for improvement at the moment.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-03-09 07:59 EDT-------
https://lists.ozlabs.org/pipermail/skiboot/2017-March/006612.html will be the final version of the patch. We do want rate-limiting and immediate restart. Please feel free to pick up the patch for Debian.
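
For readers unfamiliar with the systemd settings involved, "rate-limiting and immediate restart" map onto a handful of [Service] directives. The sketch below prints one possible combination; the values are illustrative and are not taken from the upstream patch, and the function name is made up. StartLimitInterval= in [Service] is the spelling used by systemd before version 230; later versions prefer StartLimitIntervalSec= in [Unit], so check systemd.service(5) and systemd.unit(5) for the release in question.

```shell
#!/bin/sh
# Print [Service] directives for an immediate, rate-limited restart.
# Values are illustrative, not taken from the upstream patch.
# $1: maximum number of starts per window (default 5)
# $2: window length in seconds (default 10)
restart_policy() {
    burst=${1:-5}
    interval=${2:-10}
    cat <<EOF
[Service]
Restart=always
RestartSec=0
StartLimitInterval=${interval}
StartLimitBurst=${burst}
EOF
}
```

RestartSec=0 removes the delay before a restart, while the start limit makes systemd give up once the service has been started more than the given number of times within the window.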

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-03-09 09:13 EDT-------
Canonical,

Can you pick up this patch for both 16.04 LTS release and 17.04 release?

-Vasant

Revision history for this message
Steve Langasek (vorlon) wrote :

This is fine for inclusion in 17.04, but to SRU it into 16.04 we would need more information about why, specifically, this is needed. Is opal-prd unstable? Are there specific cases where it's known to crash?

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package skiboot - 5.3.3-1ubuntu1

---------------
skiboot (5.3.3-1ubuntu1) zesty; urgency=medium

  * debian/opal-prd.service: Always restart the opal-prd daemon,
    irrespective of why it stopped. Thanks to Ananth N Mavinakayanahalli
    <email address hidden>. LP: #1671019.

 -- Steve Langasek <email address hidden> Thu, 09 Mar 2017 23:21:50 -0800

Changed in skiboot (Ubuntu):
status: New → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-03-10 04:16 EDT-------
(In reply to comment #10)
> This is fine for inclusion in 17.04, but to SRU it into 16.04 we would need
> more information about why, specifically, this is needed. Is opal-prd
> unstable? Are there specific cases where it's known to crash?

opal-prd is not unstable and is not known to crash. However, opal-prd must be running at all times on OPAL-based POWER systems to provide Processor Runtime Diagnostics. Without it, if a hardware failure occurs (memory UEs, core/nest errors, etc.), the faulty hardware components will not be isolated at the next IPL (reboot), and the machine may fail to boot at all.

To avoid such a scenario, we want:
a. opal-prd to be installed and started by default on all distributions running on OPAL (Debian/Ubuntu included). https://bugs.launchpad.net/ubuntu/+source/skiboot/+bug/1555904 is intended to address this issue.
b. systemd to restart the daemon if it stops for whatever reason (even a kill -9 by an ignorant superuser). That is the purpose of this bug.

Manoj Iyer (manjo)
Changed in skiboot (Ubuntu):
status: Fix Released → In Progress
Revision history for this message
Manoj Iyer (manjo) wrote :

Earlier this week, I spoke with our foundations team and I heard that this is on their TODO list and they have not gotten around to SRUing this yet. I will update this bug with more current information soon.

Changed in skiboot (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
importance: Undecided → High
Changed in ubuntu-power-systems:
importance: Undecided → High
Manoj Iyer (manjo)
tags: added: ubuntu-16.04
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → In Progress
Steve Langasek (vorlon)
Changed in skiboot (Ubuntu):
status: In Progress → Fix Released
Changed in skiboot (Ubuntu Xenial):
assignee: nobody → Steve Langasek (vorlon)
milestone: none → ubuntu-16.04.3
Steve Langasek (vorlon)
Changed in skiboot (Ubuntu Zesty):
status: New → Fix Released
Steve Langasek (vorlon)
description: updated
Changed in skiboot (Ubuntu Yakkety):
status: New → In Progress
Steve Langasek (vorlon)
Changed in skiboot (Ubuntu Xenial):
status: New → In Progress
Changed in skiboot (Ubuntu Yakkety):
assignee: nobody → Steve Langasek (vorlon)
milestone: none → yakkety-updates
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted skiboot into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/skiboot/5.3.3-1ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-yakkety to verification-done-yakkety. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-yakkety. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in skiboot (Ubuntu Yakkety):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-yakkety
Changed in skiboot (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello bugproxy, or anyone else affected,

Accepted skiboot into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/skiboot/5.1.13-0ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-07-11 04:12 EDT-------
I installed Ubuntu 16.04.3 on a P9 9006-22C system.

root@bostonp16:~# uname -a
Linux bostonp16 4.10.0-27-generic #30~16.04.2-Ubuntu SMP Thu Jun 29 16:06:52 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
root@bostonp16:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

root@bostonp16:~# ps -ef | grep opal
root 202 2 0 01:58 ? 00:00:00 [kopald]
root 15325 1 0 02:58 ? 00:00:00 /usr/sbin/opal-prd --pnor /dev/mtd0
root 16275 16260 0 03:10 pts/0 00:00:00 grep --color=auto opal

Then I killed the opal-prd daemon by PID and checked that it respawns.

root@bostonp16:~# kill -9 15325

root@bostonp16:~# ps -ef | grep opal
root 202 2 0 01:58 ? 00:00:00 [kopald]
root 16288 1 0 03:11 ? 00:00:00 /usr/sbin/opal-prd --pnor /dev/mtd0
root 16299 16260 0 03:11 pts/0 00:00:00 grep --color=auto opal

As can be seen, the opal-prd daemon restarts.
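
The before/after comparison in this check can also be automated. A small sketch, assuming the 'ps -ef' column layout shown above and the /usr/sbin/opal-prd path from that output; the function name is hypothetical:

```shell
#!/bin/sh
# Print the PID of opal-prd from "ps -ef" output read on stdin.
# Matching on the command column (field 8) skips both the [kopald]
# kernel thread and the grep process itself.
prd_pid_from_ps() {
    awk '$8 == "/usr/sbin/opal-prd" { print $2 }'
}
```

Running 'ps -ef | prd_pid_from_ps' before and after the kill and comparing the two values shows whether a new process was spawned.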

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-07-11 04:51 EDT-------
I installed Ubuntu 17.04 on a P8 8335-GTB system.

root@ltc-garri2:~# uname -a
Linux ltc-garri2 4.10.0-24-generic #28-Ubuntu SMP Wed Jun 14 08:14:41 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-garri2:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="17.04 (Zesty Zapus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 17.04"
VERSION_ID="17.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=zesty
UBUNTU_CODENAME=zesty

root@ltc-garri2:~# ps -ef | grep opal
root 780 2 0 03:21 ? 00:00:00 [kopald]
root 3434 1 0 03:21 ? 00:00:00 /usr/sbin/opal-prd --pnor /dev/mtd0
root 73729 4905 0 03:48 pts/0 00:00:00 grep --color=auto opal

Then I killed the opal-prd daemon by PID and checked that it respawns.

root@ltc-garri2:~# kill -9 3434

root@ltc-garri2:~# ps -ef | grep opal
root 780 2 0 03:21 ? 00:00:00 [kopald]
root 73732 1 2 03:49 ? 00:00:00 /usr/sbin/opal-prd --pnor /dev/mtd0
root 73736 4905 0 03:49 pts/0 00:00:00 grep --color=auto opal

As can be seen, the opal-prd daemon restarts.

tags: added: verification-done verification-done-xenial verification-done-yakkety
removed: verification-needed verification-needed-xenial verification-needed-yakkety
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-07-11 05:06 EDT-------
Hello,

We have verified the fix and it's working fine. Please commit the changes to the main repo.

-Vasant

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

I don't see any testing feedback regarding yakkety, even though it's marked as tested there. I see results of testing for zesty and xenial, but not yakkety. Should this be marked as verified for yakkety?

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package skiboot - 5.1.13-0ubuntu3

---------------
skiboot (5.1.13-0ubuntu3) xenial; urgency=medium

  * debian/opal-prd.service: set Restart=always, which is the correct policy
    for this hardware-related service in the unlikely event of crashes.
    LP: #1671019.

 -- Steve Langasek <email address hidden> Sat, 10 Jun 2017 21:34:52 -0700

Changed in skiboot (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for skiboot has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Revision history for this message
Robie Basak (racb) wrote :

This does not appear verified on Yakkety.

tags: added: verification-needed-yakkety
removed: verification-done-yakkety
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-07-19 10:14 EDT-------
I installed Ubuntu 16.10 on a P8 8335-GCA system.

root@ltc-firep1:~# uname -a
Linux ltc-firep1 4.8.0-59-generic #64-Ubuntu SMP Thu Jun 29 19:36:04 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-firep1:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.10 (Yakkety Yak)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.10"
VERSION_ID="16.10"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="http://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=yakkety
UBUNTU_CODENAME=yakkety

root@ltc-firep1:~# cat /proc/cpuinfo | tail
processor : 159
cpu : POWER8 (raw), altivec supported
clock : 3491.000000MHz
revision : 2.0 (pvr 004d 0200)

timebase : 512000000
platform : PowerNV
model : 8335-GCA
machine : PowerNV 8335-GCA
firmware : OPAL

Then I installed the opal-prd package from yakkety-proposed on the same system.

root@ltc-firep1:~# apt-get update
Hit:1 http://us.ports.ubuntu.com/ubuntu-ports yakkety InRelease
Hit:2 http://us.ports.ubuntu.com/ubuntu-ports yakkety-updates InRelease
Hit:3 http://ports.ubuntu.com/ubuntu-ports yakkety-security InRelease
Hit:4 http://ppa.launchpad.net/ibmpackages/pmlinux/ubuntu trusty InRelease
Hit:5 http://us.ports.ubuntu.com/ubuntu-ports yakkety-backports InRelease
Get:6 http://ports.ubuntu.com/ubuntu-ports yakkety-proposed InRelease [102 kB]
Get:7 http://ports.ubuntu.com/ubuntu-ports yakkety-proposed/main ppc64el Packages [17.6 kB]
Get:8 http://ports.ubuntu.com/ubuntu-ports yakkety-proposed/main Translation-en [11.9 kB]
Get:9 http://ports.ubuntu.com/ubuntu-ports yakkety-proposed/universe ppc64el Packages [16.6 kB]
Get:10 http://ports.ubuntu.com/ubuntu-ports yakkety-proposed/universe Translation-en [12.1 kB]
Fetched 160 kB in 1s (99.1 kB/s)
Reading package lists... Done

root@ltc-firep1:~# apt-get install opal-prd/yakkety-proposed
Reading package lists... Done
Building dependency tree
Reading state information... Done
Selected version '5.3.3-1ubuntu0.1' (Ubuntu:16.10/yakkety-proposed [ppc64el]) for 'opal-prd'
The following packages will be upgraded:
opal-prd
1 upgraded, 0 newly installed, 0 to remove and 15 not upgraded.
Need to get 36.3 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://ports.ubuntu.com/ubuntu-ports yakkety-proposed/universe ppc64el opal-prd ppc64el 5.3.3-1ubuntu0.1 [36.3 kB]
Fetched 36.3 kB in 0s (112 kB/s)
(Reading database ... 345024 files and directories currently installed.)
Preparing to unpack .../opal-prd_5.3.3-1ubuntu0.1_ppc64el.deb ...
Warning: Stopping opal-prd.service, but it can still be activated by:
opal-prd.socket
Unpacking opal-prd (5.3.3-1ubuntu0.1) over (5.3.3-1) ...
Processing triggers for ureadahead (0.100.0-19) ...
ureadahead will be reprofiled on next reboot
Processing triggers for systemd (231-9ubuntu5) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up opal-prd (5.3.3-1ubuntu0.1) ...

root@ltc-firep1:~# which opal-prd
/usr/sbin/opal-prd
root@ltc-firep1:~# dpkg -S /usr/sbin/...


tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package skiboot - 5.3.3-1ubuntu0.1

---------------
skiboot (5.3.3-1ubuntu0.1) yakkety; urgency=medium

  * debian/opal-prd.service: set Restart=always, which is the correct policy
    for this hardware-related service in the unlikely event of crashes.
    LP: #1671019.

 -- Steve Langasek <email address hidden> Sat, 10 Jun 2017 13:06:37 -0700

Changed in skiboot (Ubuntu Yakkety):
status: Fix Committed → Fix Released