HCPGIR450W CP entered; disabled wait PSW 00020000 80000000 00000000 00004502

Bug #1592990 reported by Ron Argent on 2016-06-15
This bug affects 1 person
Affects                      Importance   Assigned to
Ubuntu on IBM z Systems      Critical     Dimitri John Ledkov
debian-installer (Ubuntu)    Critical     Dimitri John Ledkov
s390-tools (Ubuntu)          Critical     Dimitri John Ledkov
s390-tools (Ubuntu Xenial)   Critical     Dimitri John Ledkov

Bug Description

zVM v6.3 zBC12 V7000 FICON SAN048B

Installed Ubuntu on zBC12 with FICON attached (via SAN048B switches) using instructions at https://wiki.ubuntu.com/S390X/Installation%20In%20zVM.

Installation build completed OK but IPL fails (after initial boot options appear and timeout to default) with following error message:
HCPGIR450W CP entered; disabled wait PSW 00020000 80000000 00000000 00004502

A trace was taken using:

#CP TR SSCH RUN
#CP IPL <ipldev>
#cp tr end

The resulting trace appears to fail at the same point in the trace on every IPL.
Several builds have been attempted on different VMs with identical symptoms.
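
For readers unfamiliar with the message, the PSW in HCPGIR450W can be decoded by hand. The following is a minimal sketch in Python, assuming the standard 128-bit z/Architecture PSW layout (bits numbered from 0 at the most significant end); the function name decode_psw is only illustrative and is not part of any tool mentioned in this report.

```python
def decode_psw(psw_text):
    """Decode a 128-bit PSW printed as four 8-digit hex words."""
    words = psw_text.split()
    if len(words) != 4:
        raise ValueError("expected four 32-bit hex words")
    psw = int("".join(words), 16)

    def bit(n):
        # PSW bits are numbered 0 (most significant) to 127.
        return (psw >> (127 - n)) & 1

    return {
        "io_enabled": bool(bit(6)),              # I/O interruption mask
        "external_enabled": bool(bit(7)),        # external interruption mask
        "machine_check_enabled": bool(bit(13)),  # machine-check mask
        "wait": bool(bit(14)),                   # wait state
        "problem_state": bool(bit(15)),
        "address": psw & ((1 << 64) - 1),        # bits 64-127
    }


info = decode_psw("00020000 80000000 00000000 00004502")
print(info["wait"], hex(info["address"]))  # True 0x4502
```

The wait bit is set while the I/O, external and machine-check masks are all off, which is what makes this a disabled wait. The address field then only carries a code (here 0x4502) rather than a real instruction address.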

Ron Argent (ron-argent) wrote :
tags: added: s390x
Frank Heimes (frank-heimes) wrote :

Hi Ron, thanks for opening this ticket.
The wiki page you mentioned covers an installation with dasd disks, in this case it's a zFCP/SCSI installation, so some additional information would be helpful:
I assume SCSI disks are used directly and no emulated devices (EDEV), right?
Please can you provide the 'user direct' of the guest and the output of the following CP commands:
#CP Q V FCP
#CP Q LOADDEV
#CP Q V DA
And are there any special kernel boot / zipl parameters in use (like cio_ignore) ?

It has been a while since I last tried to read those dumps.
It looks like sequential CCWs in the beginning, and then the same CCW/SSCH group repeats over and over.

Since the one that eventually fails doesn't seem to differ from the ones before it, and those repeat, I wonder whether Linux is already in an error/recovery retry mode and then eventually gives up.

Also I'd be curious what your vdev 0100 actually is, since it is an FCP-only setup.
Is that an EDEV, or is it part of the initial setup of a virtual HBA?
Anyway, attaching the output of "Q V" and "Q EDEV ALL" and describing your disk setup in general, so that we can try to recreate it, might help.

Finally a dump of the stopped system might help to indicate its last status.
 Prep your dump disk, then after crash
 cp cpu all stop
 cp cpu 0 store status
 cp ipl <yourdumpdisk>
But given you already attached a trace you clearly know what you are doing and can use the way to dump that you personally prefer.

@Frank/Dann - I almost expect we will need to get a dump sooner or later while handling this (depending on what the queries already tell us). In that case we need a way the reporter can share files with us up to (depends on the dump process) the size of system memory. Please try to prepare a way (or share if you have one already) for that to happen.

Changed in ubuntu-z-systems:
status: New → Triaged
importance: Undecided → High

Hi Frank, the minidisk is carved out of one large EDEV, and it should be an FBA installation, not an FCP/SCSI installation.

Although it is a SCSI disk it is attached as an emulated device which is carved into FBA minidisks.

Output from the User Directory is shown below.

0232 PEEK A0 V 80 Trunc=80 Size=5 Line=0 Col=1 Alt=0
File C02UBU2 DIRECT from DIRMAINT at TESTVM Format is NETDATA.
* * * Top of File * * *
USER C02UBU2 LBYONLY 2G 2G G 06140150

   INCLUDE COGLIN1 06140150

   MDISK 0100 9336 140000032 20000000 TSTM02 06140150

   MDISK 0191 9336 200032 200000 TSTM02 W 06140151

*DVHOPT LNK0 LOG1 RCM1 SMS0 NPW1 LNGAMENG PWC20160614 CRCsª

Output from commands requested
CP Q V FCP
HCPQFC040E Device FCP does not exist

CP Q LOADDEV
HCPFCL2824I No LOADDEV parameters are currently defined

CP Q V DA
DASD 0100 9336 TSTM02 R/W 20000000 BLK ON DASD 1006 SUBCHANNEL = 0000
DASD 0190 9336 TSTRES R/O 308160 BLK ON DASD 1002 SUBCHANNEL = 0009
DASD 0191 9336 TSTM02 R/W 200000 BLK ON DASD 1006 SUBCHANNEL = 0001
DASD 019D 9336 TSTRES R/O 420480 BLK ON DASD 1002 SUBCHANNEL = 000A
DASD 019E 9336 TSTRES R/O 720000 BLK ON DASD 1002 SUBCHANNEL = 000B
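
To compare such listings across guests, the "Q V DA" lines can be turned into records. This is a rough sketch only: the regular expression is an assumption derived from the five lines above (FBA minidisks reported in BLK) and is not a documented CP output format, so other device types will need a different pattern.

```python
import re

# Pattern derived from the sample lines above; an assumption, not a CP spec.
LINE = re.compile(
    r"DASD (?P<vdev>[0-9A-F]{4}) (?P<type>\S+) (?P<volid>\S+) "
    r"(?P<mode>R/[WO]) (?P<blocks>\d+) BLK ON DASD (?P<rdev>[0-9A-F]{4}) "
    r"SUBCHANNEL = (?P<subchannel>[0-9A-F]{4})"
)


def parse_q_v_da(text):
    """Return one dict per virtual DASD line that matches the pattern."""
    disks = []
    for line in text.splitlines():
        match = LINE.search(line)
        if match:
            disk = match.groupdict()
            disk["blocks"] = int(disk["blocks"])
            disks.append(disk)
    return disks


sample = """\
DASD 0100 9336 TSTM02 R/W 20000000 BLK ON DASD 1006 SUBCHANNEL = 0000
DASD 0190 9336 TSTRES R/O 308160 BLK ON DASD 1002 SUBCHANNEL = 0009
"""
for disk in parse_q_v_da(sample):
    print(disk["vdev"], disk["type"], disk["blocks"], "on", disk["rdev"])
```

Every virtual disk here reports device type 9336 with sizes in blocks, which matches the statement that this is an FBA installation on minidisks rather than native FCP/SCSI.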

I hope this helps

Regards

Ron Argent

Ron Argent (ron-argent) wrote :

As I understand the V7000 connection, it is FCP encapsulated in FICON. So I assume that the actual transmission from z/VM to the channel subsystem (the trace output) would be FICON protocol, not FCP. The actual device would see the FCP after it has been stripped of the FICON wrapper by the switch.
If I am correct in the above, does the Ubuntu build support this? Is this where the problem lies?

**************************************************************************

I am adding an update from another site (I hope that's within the rules) - it could be useful

From: Linux on 390 Port [mailto:<email address hidden>] On Behalf Of Stefan Haberland
>
> ======================================================================
> ==========
>> -> 0000000000002590 SSCH B2339000 000000000000FF20 CC 0
>> SCH 0000 DEV 0100
>> CPA 00008410 PARM 00000000 KEY 0 FPI 80 LPM 80
>> VDEV 0100 CCW 66400001 00000080 STS 0E
>> -> 0000000000002590 SSCH B2339000 000000000000FF20 CC 3
>> Start subchannel failed
> :
>> HCPGIR450W CP entered; disabled wait PSW 00020000 80000000 00000000
> 00004502
>
> Hmmmm.... What the heck is CCW op code X'66'? Something took the device
> offline or detached it from the guest. A real I/O error would have been
> reflected to the OPERATOR's console.

I do not know where the CCW 0x66 comes from. The bootloader only uses 0x06, 0x41, 0x42, 0x43 and 0x63 for FBA.
Also I was unable to reproduce this issue on my system but one of our tester reported he was able to do so.
I will try to investigate what happens.

Regards
Stefan
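
The observation about op code X'66' can be checked mechanically against the list of command codes Stefan gives. A small sketch, assuming only that the command code is the first byte of the 8-byte CCW (true for both format-0 and format-1 CCWs); the set name EXPECTED_FBA_OPCODES is made up for this illustration.

```python
# Command codes the FBA boot loader is said to use, per the comment above.
EXPECTED_FBA_OPCODES = {0x06, 0x41, 0x42, 0x43, 0x63}


def ccw_opcode(ccw_text):
    """Return the command code of a CCW printed as two 8-digit hex words."""
    raw = bytes.fromhex(ccw_text.replace(" ", ""))
    if len(raw) != 8:
        raise ValueError("a CCW is 8 bytes long")
    # The command code is byte 0 in both format-0 and format-1 CCWs.
    return raw[0]


op = ccw_opcode("66400001 00000080")  # CCW shown in the trace excerpt
print(hex(op), op in EXPECTED_FBA_OPCODES)  # 0x66 False
```

The failing CCW carries a command code outside the set the loader is supposed to issue, which fits a corrupted channel program better than a genuine device error.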

Ron Argent (ron-argent) wrote :

Hi Christian - this is my first time raising a problem in this way. I'm still not sure when to use email and when to post directly to the Bug 1592990 thread.

Have you seen the latest update I entered on the thread? Do you still need me to take a system dump?

Ron Argent

Hi Ron,
never mind - we are a friendly bunch of people.
The truth is - mail and web are the same.

If you reply on the Mail - it will become a new comment on the web UI.
If you comment on the WEB UI - it becomes a mail.

The only thing to watch out for is that, when answering by mail, you should delete all the quoted history that is not needed for the comment you want to make.

That said - the dump isn't needed right now - only once one explicitly asks for it.
I think the setup info you shared was good so we can try to reproduce.
Frank who is trying to set this up might get to you with more questions.

As for the protocol question: it is not FCP encapsulated in FICON. Here is a slightly simplified picture.
I hope it keeps its format :-)

+--------+           +--------------------+
| V7000  |           |        z/VM        |
|        |           |                    |         +-------+
|        |           |                    |         | Linux |
|        |   SCSI    |        EDEV        |  ECKD   | Guest |
|        +-----------+     Emulation      +---------+       |
+--------+    FCP    |                    | VIRTUAL |       |
                     |                    |         +-------+
+--------+           |                    |
|  DS8K  |           |                    |         +-------+
|        |   ECKD    |                    |  ECKD   | Linux |
|        +-----------+                    +---------+ Guest |
|        |   FICON   |                    | VIRTUAL |       |
|        |           |                    |         |       |
+--------+           +--------------------+         +-------+

FCP and FICON are alternative protocols you can drive via fiber.
ECKD and SCSI are the languages that a driver can use to talk to a device (one level above the FCP/FICON layer).

So with EDEVs in z/VM, for Ubuntu as a Linux guest it is actually the "same" as if you had a dasd disk on a DS8K.

The EDEV one might have a few special attributes / characteristics, though.

Ron Argent (ron-argent) wrote :

You made me smile.

Once upon a time - many years ago - I think I used to know what I was talking about. Then I moved into sales and 30 years of engineering was tucked away. Sometimes it likes to reach out for the sun again.

Thanks for the overview - it's nice to learn about the underlying details again.

Ron Argent

Hello Ron,

I'm sorry if you already said this, but how are you configuring those EDEVs?

What attributes are you using for the underlying FCP_DEVs of your EDEVs?

For V7K storage, iirc, IBM doesn't support direct-attached connections and asks you to use ATTRIBUTE 2145 in the EDEV:

https://www.ibm.com/support/knowledgecenter/SSB27U_6.3.0/com.ibm.zvm.v630.hcpb7/sedv.htm

That is why I'm asking that.

Could you share the EDEV and its paths (FCP_DEVs) definition ?

Thank you in advance

Rafael Tinoco

Ron Argent (ron-argent) wrote :

The Ubuntu build is on device 100

q dasd
DASD 0100 9336 TSTM02 R/W 20000000 BLK ON DASD 1006 SUBCHANNEL = 0000
DASD 0190 9336 TSTRES R/O 308160 BLK ON DASD 1002 SUBCHANNEL = 0009
DASD 0191 9336 TSTM02 R/W 200000 BLK ON DASD 1006 SUBCHANNEL = 0001
DASD 019D 9336 TSTRES R/O 420480 BLK ON DASD 1002 SUBCHANNEL = 000A
DASD 019E 9336 TSTRES R/O 720000 BLK ON DASD 1002 SUBCHANNEL = 000B

EDEV 1006 TYPE FBA ATTRIBUTES 2145
  VENDOR: IBM PRODUCT: 2145 REVISION: 0000
  BLOCKSIZE: 512 NUMBER OF BLOCKS: 209715200
  PATHS:
    FCP_DEV: A000 WWPN: 500507680B217632 LUN: 0006000000000000 PREF
      CONNECTION TYPE: SWITCHED STATUS: ONLINE
    FCP_DEV: A200 WWPN: 500507680B217633 LUN: 0006000000000000 NOTPREF
      CONNECTION TYPE: SWITCHED STATUS: ONLINE
    FCP_DEV: B000 WWPN: 500507680B337632 LUN: 0006000000000000 PREF
      CONNECTION TYPE: SWITCHED STATUS: ONLINE
    FCP_DEV: B200 WWPN: 500507680B337633 LUN: 0006000000000000 NOTPREF
      CONNECTION TYPE: SWITCHED STATUS: ONLINE
  EQID: 600507640081815B90000000000000F2C2000000000C7FFFFF

q edev all
EDEV 1000 TYPE FBA ATTRIBUTES 2145
EDEV 1001 TYPE FBA ATTRIBUTES 2145
EDEV 1002 TYPE FBA ATTRIBUTES 2145
EDEV 1003 TYPE FBA ATTRIBUTES 2145
EDEV 1004 TYPE FBA ATTRIBUTES 2145
EDEV 1005 TYPE FBA ATTRIBUTES 2145
EDEV 1006 TYPE FBA ATTRIBUTES 2145
EDEV 1007 TYPE FBA ATTRIBUTES 2145
EDEV 1008 TYPE FBA ATTRIBUTES 2145
EDEV 1009 TYPE FBA ATTRIBUTES 2145
EDEV 1010 TYPE FBA ATTRIBUTES 2145
EDEV 2000 TYPE UNK ATTRIBUTES UNK
EDEV 2001 TYPE UNK ATTRIBUTES UNK
EDEV 2002 TYPE UNK ATTRIBUTES UNK
EDEV 2003 TYPE UNK ATTRIBUTES UNK
EDEV 2004 TYPE UNK ATTRIBUTES UNK
EDEV 2006 TYPE UNK ATTRIBUTES UNK
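
As a sanity check on the layout, the numbers from the EDEV 1006 details and the two MDISK statements in the user directory can be cross-checked with a few lines of arithmetic. This is only a sketch: the figures are copied from the outputs in this thread, and the helper names are made up for the illustration.

```python
BLOCK_SIZE = 512          # BLOCKSIZE reported for EDEV 1006
EDEV_BLOCKS = 209715200   # NUMBER OF BLOCKS reported for EDEV 1006

# vdev -> (start block, block count), from the MDISK statements on TSTM02
MDISKS = {
    "0100": (140000032, 20000000),
    "0191": (200032, 200000),
}


def gib(blocks):
    """Size of a block range in GiB."""
    return blocks * BLOCK_SIZE / 2**30


def fits(start, count, total=EDEV_BLOCKS):
    """True if the extent lies completely inside the emulated device."""
    return start + count <= total


def overlaps(first, second):
    """True if two (start, count) extents share at least one block."""
    (s1, n1), (s2, n2) = first, second
    return s1 < s2 + n2 and s2 < s1 + n1


print(gib(EDEV_BLOCKS))                           # 100.0
print(round(gib(MDISKS["0100"][1]), 2))           # 9.54
print(all(fits(*ext) for ext in MDISKS.values())) # True
print(overlaps(MDISKS["0100"], MDISKS["0191"]))   # False
```

Both minidisks lie inside the 100 GiB EDEV and do not overlap each other, so the minidisk layout itself does not look like the cause of the IPL failure.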

Just looked a bit with jfh who prepped a similar setup - thanks.
jfh found some issues when recreating like: "fdasd: ioctl() error -- Could not retrieve disk size."

IMHO the root cause is that the installer uses fdasd/dasdfmt for all dasd devices.
Sounds right, but isn't always :-/

If the dasd is a FBA dasd - like in this case - it is not working in the usual track/record mode, but instead in fixed blocks.

That can be checked in the install environment per disk e.g.:
~ # cat /sys/block/dasda/device/discipline
FBA

If preferred, one can also use "lszdev" for that check. Start at "--list-types" on page 408 of http://public.dhe.ibm.com/software/dw/linux390/docu/l4n4dd29.pdf. From there one can find out how to get device info via lszdev, e.g. by parsing the output of "lszdev -i 0.0.0106", but IMO the kernel interface should be more stable and reliable.
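
The sysfs check can also be scripted, for example from an installer hook. A minimal sketch, assuming the /sys/block/<name>/device/discipline path shown above; the function name and the sysfs_root parameter (there only so the code can be tried against a fake tree) are illustrative.

```python
from pathlib import Path


def dasd_discipline(name, sysfs_root="/sys"):
    """Return the DASD discipline (e.g. "FBA" or "ECKD") of a block device.

    Returns None if the attribute does not exist, for instance when the
    device is not a DASD or the code runs on a non-s390x machine.
    """
    path = Path(sysfs_root) / "block" / name / "device" / "discipline"
    try:
        return path.read_text().strip()
    except OSError:
        return None


# On the install environment shown above this would print "FBA" for dasda.
print(dasd_discipline("dasda"))
```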

Be Aware:
See page 159 in http://public.dhe.ibm.com/software/dw/linux390/docu/l4n4dd29.pdf
We can see that FBA devices don't support CDL/LDL formatting and also can't be formatted by Linux itself.
Instead the documentation refers to "z/VM tools".
So be aware that Linux can't "fix up" a not yet formatted FBA-dasd.

jfh can share a bit about formatting them via z/VM if needed by anybody.
As a starter one might look at http://www.vm.ibm.com/devpages/ALTMARKA/diskfmt.html

In general handling is similar, but not always the same:
Enabling (the same)
$ sudo chzdev -e 0.0.0106
FBA DASD 0.0.0106 configured

Display (the same, but watch the Type column)
$ lsdasd -a
Bus-ID Status Name Device Type BlkSz Size Blocks
==============================================================================
0.0.0200 active dasda 94:0 ECKD 4096 7042MB 1802880
0.0.0106 active dasdd 94:12 FBA 512 65535MB 134217724

Partitioning (different)
fdisk (like on scsi luns, instead of fdasd as on dasd's)

Afterwards everything is the same again for all devices, as from now on it is a Linux block device.
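
The "watch the Type column" rule above could be automated roughly as follows. This is a sketch, not installer code: it assumes the eight-column "lsdasd -a" layout shown in this comment, and the mapping from type to partitioning tool simply restates the fdasd-versus-fdisk distinction made above.

```python
# Partitioning tool per DASD type, as described in this comment.
PARTITION_TOOL = {"ECKD": "fdasd", "FBA": "fdisk"}


def parse_lsdasd(text):
    """Parse active-device rows of `lsdasd -a` output into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with a bus ID like 0.0.0106;
        # the header and separator lines are skipped by this check.
        if len(fields) == 8 and fields[0].count(".") == 2:
            bus_id, _status, name, _device, dtype, _blksz, _size, _blocks = fields
            rows.append({
                "bus_id": bus_id,
                "name": name,
                "type": dtype,
                "tool": PARTITION_TOOL.get(dtype),
            })
    return rows


sample = """\
Bus-ID Status Name Device Type BlkSz Size Blocks
==============================================================================
0.0.0200 active dasda 94:0 ECKD 4096 7042MB 1802880
0.0.0106 active dasdd 94:12 FBA 512 65535MB 134217724
"""
for row in parse_lsdasd(sample):
    print(row["name"], row["type"], row["tool"])
```

Rows with fewer columns, such as devices that are not active, are ignored by this sketch and would need separate handling.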

TODO (IMHO):
- fix installer to recognize FBA dasds and handle them specially
- add to documentation that FBA dasds have to be pre-formatted by the user via z/VM tools, referring to the IBM doc

Changed in ubuntu-z-systems:
status: Triaged → Confirmed
assignee: nobody → Dimitri John Ledkov (xnox)
Ron Argent (ron-argent) wrote :

Is this suggesting that there is a work-around - or would the format be part of a solution along with a fix to the installer?
If there is a workaround what disk(s) would need to be pre-formatted and could you please detail the precise steps to help ensure we get it right first time.

Many Thanks

Ron Argent

Hi Ron,
no, it is not a workaround.
I added that to make it clear that, even once the installer is fixed (and maybe more), the documentation has to carry a statement telling users in the dasd-fba case to format the disks from z/VM as described in the IBM device driver handbook.

Ron Argent (ron-argent) wrote :

Thanks Christian.

What's the process for getting this fixed - especially timescales. This is really holding up a key project now.

Best Regards

Hi Ron,
I'm not the one working on it, just a friendly helper with a System z background.
jfh and xnox are closer to it and might be able to tell you more about that.

As a temporary workaround: if you can consider not using EDEVs, you can carry on working with native FCP disks just fine.
Which, by the way (I'm an old IBM performance guy), is faster and has less overhead anyway.

So provide (virtual) HBAs to the guests and get going.
If you want to go down that path, jfh might find a few how-tos for it and share them with you outside the scope of this bug.
On the other hand, if that changes more of your setup than you feel comfortable with, you had better wait for jfh/xnox to get it properly fixed.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

------- Comment From <email address hidden> 2016-06-22 07:26 EDT-------
Reverse mirror of Launchpad :
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1592990

tags: added: architecture-s39064 bugnameltc-142947 severity-high targetmilestone-inin---

Just to keep the bug status up-to-date quoting associated IBM Developer Stefan Haberland from LINUX-390 Mailing list about this:

16th June:
"I do not know where the CCW 0x66 comes from. The bootloader only uses
0x06, 0x41, 0x42, 0x43 and 0x63 for FBA.
Also I was unable to reproduce this issue on my system but one of our
tester reported he was able to do so.
I will try to investigate what happens."

22nd June:
"after some debugging I found a bug in the FBA loader that seems to occur
randomly. But in fact it depends on the size of the kernel image
and/or some offsets within it. So some kernel images might hit it, most
of them do not.
The image vmlinuz-4.4.0-24-generic I used for a reproduction has an
offset within the file that triggers the bug.
Unfortunately I am not able to recommend a workaround beside using a
different kernel image (vmlinuz-4.4.0-21-generic worked for me).

We already created a reverse mirror of launchpad 1592990. I will attach
a patch to it after I did some further testing."

------- Comment on attachment From <email address hidden> 2016-06-24 05:36 EDT-------

The FBA loader has only a limited amount of memory to build CCW requests.
Therefore larger I/O requests need to be split.
This splitting was off by one leading to the fact that one CCW request uses
memory of another data structure which in turn leads to corrupted data.
The resulting error message during IPL of a FBA device is:

Start subchannel failed
disabled wait PSW 00020000 80000000 00000000 00004502

The error might occur randomly depending on the size of the kernel image and
offsets within it.

Fix by correcting the split rule.

Signed-off-by: Stefan Haberland <email address hidden>
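
To illustrate the kind of defect the patch description refers to, here is a deliberately simplified Python sketch. It is NOT the zipl FBA loader code and does not reproduce the actual patch; the function names and the block limit are hypothetical. It only shows how an off-by-one in a splitting rule lets a request exceed its buffer for some sizes and not for others.

```python
def split_request(total_blocks, max_blocks):
    """Split an I/O request into chunks of at most max_blocks blocks."""
    chunks = []
    remaining = total_blocks
    while remaining > 0:
        count = min(remaining, max_blocks)
        chunks.append(count)
        remaining -= count
    return chunks


def split_request_off_by_one(total_blocks, max_blocks):
    """Same idea, but with an off-by-one in the limit (illustration only)."""
    chunks = []
    remaining = total_blocks
    while remaining > 0:
        # Bug: allows one block more than the request buffer can describe.
        count = min(remaining, max_blocks + 1)
        chunks.append(count)
        remaining -= count
    return chunks


print(split_request(10, 4))             # [4, 4, 2]
print(split_request_off_by_one(10, 4))  # [5, 5]
print(split_request_off_by_one(3, 4))   # [3]
```

In the faulty variant a small request still looks fine, while a larger one produces a chunk that spills past the space reserved for it. That mirrors the description above, where the failure depends on the size of the kernel image and offsets within it.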

tags: added: targetmilestone-inin1404
removed: targetmilestone-inin---
Changed in s390-tools (Ubuntu):
status: New → In Progress
Changed in s390-tools (Ubuntu Xenial):
status: New → In Progress
Changed in s390-tools (Ubuntu):
importance: Undecided → Critical
Changed in s390-tools (Ubuntu Xenial):
importance: Undecided → Critical
Changed in s390-tools (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
Changed in s390-tools (Ubuntu Xenial):
assignee: nobody → Dimitri John Ledkov (xnox)
Changed in debian-installer (Ubuntu):
status: New → In Progress
Changed in debian-installer (Ubuntu Xenial):
status: New → In Progress
Changed in debian-installer (Ubuntu):
importance: Undecided → Critical
Changed in debian-installer (Ubuntu Xenial):
importance: Undecided → Critical
Changed in debian-installer (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
Changed in debian-installer (Ubuntu Xenial):
assignee: nobody → Dimitri John Ledkov (xnox)
Changed in ubuntu-z-systems:
importance: High → Critical
status: Confirmed → In Progress

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package s390-tools - 1.34.0-0ubuntu11

---------------
s390-tools (1.34.0-0ubuntu11) yakkety; urgency=medium

  * zipl: Import off-by-one patch to resolve failure to start subchannel
    in FBA loader. LP: #1592990.

 -- Dimitri John Ledkov <email address hidden> Fri, 24 Jun 2016 12:41:30 +0100

Changed in s390-tools (Ubuntu):
status: In Progress → Fix Released

I'm not sure how to read the last update, "This bug was fixed in the package s390-tools - 1.34.0-0ubuntu11".

Does it mean that the bug has just been fixed, or that it was fixed in the past?

How do I integrate the fix into the install procedure at https://wiki.ubuntu.com/S390X/Installation%20In%20zVM ?

I've just checked the ports.ubuntu.com repository and nothing seems to have changed there.

Thanks in advance

Dimitri John Ledkov (xnox) wrote :

@ron-argent

Please familiarise yourself with launchpad and Ubuntu bug tracking process, and specifically bugs that affect multiple releases and multiple packages.

s390-tools, which contains the zipl component, is Fix Released in Yakkety Yak (the development series that will become 16.10), which means zipl there is now fixed.

This bug is also marked as affecting debian-installer; that is, the installer needs to be rebuilt to include the updated s390-tools package. Once this is done, there will be another message saying that the fix is released for debian-installer in the Yakkety Yak development series.

This bug is marked as affecting the xenial series too, which is 16.04 LTS stable. This means that the affected packages will have to go through the Stable Release Updates procedure to backport these fixes into the 16.04 series.

Yes, ports.ubuntu.com is updated -> it has the new s390-tools package, for yakkety/ and devel/ series. But that's only one out of four updates that need to be built and published. A more user-friendly website to look up packages per series is http://packages.ubuntu.com/search?keywords=s390-tools

For more information about Ubuntu bug tracker on Launchpad please see -

https://wiki.ubuntu.com/StableReleaseUpdates

https://help.ubuntu.com/community/ReportingBugs

https://wiki.ubuntu.com/Bugs/Bug%20statuses

https://wiki.ubuntu.com/Bugs/Importance

https://wiki.ubuntu.com/Bugs

In summary, one out of five tasks is done towards completely resolving the reported issue. For further advice, support and guidance please contact your z Ubuntu Advantage representative.

Regards,

Dimitri.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package debian-installer - 20101020ubuntu462

---------------
debian-installer (20101020ubuntu462) yakkety; urgency=medium

  * Rebuild with updated s390-tools. LP: #1592990.

 -- Dimitri John Ledkov <email address hidden> Sat, 25 Jun 2016 04:29:49 +0100

Changed in debian-installer (Ubuntu):
status: In Progress → Fix Released
Dimitri John Ledkov (xnox) wrote :

[ubuntu/xenial-proposed] s390-tools 1.34.0-0ubuntu8.2 (Waiting for approval)

s390-tools (1.34.0-0ubuntu8.2) xenial; urgency=medium

  * zipl: Import off-by-one patch to resolve failure to start subchannel
    in FBA loader. LP: #1592990.

Date: Tue, 28 Jun 2016 10:51:39 +0100
Changed-By: Dimitri John Ledkov <email address hidden>
Maintainer: Ubuntu Developers <email address hidden>
https://launchpad.net/ubuntu/+source/s390-tools/1.34.0-0ubuntu8.2

==

 OK: s390-tools_1.34.0.orig.tar.bz2
 OK: s390-tools_1.34.0-0ubuntu8.2.debian.tar.xz
 OK: s390-tools_1.34.0-0ubuntu8.2.dsc
     -> Component: main Section: admin

Upload Warnings:
Redirecting ubuntu xenial to ubuntu xenial-proposed.
This upload awaits approval by a distro manager

Hello Ron, or anyone else affected,

Accepted s390-tools into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/s390-tools/1.34.0-0ubuntu8.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in s390-tools (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed

Install retried using the files from ports.ubuntu.com/dists/xenial-proposed/main/installer-s390x/current/images/generic dated 16th June 08:25.

Installation was successful - and the IPL worked! We now have Ubuntu installed under zVM on our zBC12

Thanks guys for providing the fix :-)

Dimitri John Ledkov (xnox) wrote :

Removing d-i/xenial task, as it's not required. zipl is not shipped in the udeb, and instead is executed from the installed system, and thus only s390-tools needs to be fixed in the -updates pocket.

no longer affects: debian-installer (Ubuntu Xenial)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) on 2016-07-05
tags: added: targetmilestone-inin1604
removed: targetmilestone-inin1404
Dimitri John Ledkov (xnox) wrote :

Tested with FBA disk install.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package s390-tools - 1.34.0-0ubuntu8.2

---------------
s390-tools (1.34.0-0ubuntu8.2) xenial; urgency=medium

  * zipl: Import off-by-one patch to resolve failure to start subchannel
    in FBA loader. LP: #1592990.

 -- Dimitri John Ledkov <email address hidden> Tue, 28 Jun 2016 10:51:39 +0100

Changed in s390-tools (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for s390-tools has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

------- Comment From <email address hidden> 2016-08-02 05:28 EDT-------
s390-tools - 1.34.0-0ubuntu8.2 is available per xenial-updates and contains the fix. Closing this bug as per previous comments.
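
For anyone checking whether an installed system already carries the fix, the two fixed versions named in this report can be compared against the installed s390-tools version. The sketch below uses a simplified ordering (numeric runs only) that is an assumption and is adequate only for version strings of this shape; it is not full Debian version comparison, for which "dpkg --compare-versions" is the authoritative tool.

```python
import re

# Fixed versions as announced in this bug report.
FIXED_IN = {
    "xenial": "1.34.0-0ubuntu8.2",
    "yakkety": "1.34.0-0ubuntu11",
}


def version_key(version):
    """Simplified sort key: the numeric runs of the version, in order."""
    return [int(number) for number in re.findall(r"\d+", version)]


def has_fix(series, installed_version):
    """True if installed_version is at least the fixed version for series."""
    return version_key(installed_version) >= version_key(FIXED_IN[series])


print(has_fix("xenial", "1.34.0-0ubuntu8.2"))  # True
print(has_fix("xenial", "1.34.0-0ubuntu8"))    # False (hypothetical older version)
```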
