linux: 4.6 kernel fails to boot on ppc64el multi-path system

Bug #1588421 reported by Tim Gardner
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
linux (Ubuntu)   Fix Released  High        Unassigned
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1588421

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Stefan Bader (smb) wrote :

Initial data gathered:
- The base disks seem to be present
- The multipath modules are there, too (as far as I can see):
  dm_round_robin
  dm_multipath
  dm_round_robin
  scsi_dh_alua
- Manually creating the multipath mappings fails somewhere
  in the ALUA device handler:
[ 526.285984] sd 0:2:0:0: alua: supports implicit TPGS
[ 526.286088] sd 0:2:0:0: alua: No device descriptors found
[ 526.286135] sd 0:2:0:0: alua: Attach failed (-22)
[ 526.286192] device-mapper: table: 252:0: multipath: error attaching hardware handler
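The `(-22)` in `Attach failed (-22)` is a negative errno value: kernel functions report failures as `-errno`, and on Linux 22 is `EINVAL`. A small user-space sketch for decoding such values (`describe_kernel_err` is a hypothetical helper, not part of any kernel or library API):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn a negative kernel-style return code, such as
 * the -22 in "Attach failed (-22)", into the C library's message text.
 * Kernel code reports failures as -errno, so negate before the lookup. */
static const char *describe_kernel_err(int err)
{
    assert(err < 0); /* only negative values are error codes */
    return strerror(-err);
}
```

With glibc, `describe_kernel_err(-22)` returns "Invalid argument", the message for `EINVAL`.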

Stefan Bader (smb) wrote :

Compared with the working 4.4 kernel, it looks like the device handler fails to receive valid data. With 4.4 the output looks like this:

[ 1.653229] sd 0:2:0:0: alua: supports implicit TPGS
[ 1.653395] sd 0:2:0:0: alua: port group c339 rel port c339
[ 1.653426] sd 0:2:0:0: alua: rtpg failed with 8070002
[ 1.653459] sd 0:2:0:0: alua: port group c339 state A preferred supports TOlUSNA
[ 1.653616] sd 0:2:0:0: Attached scsi generic sg13 type 0
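The `port group` and `rel port` values above, and the LUN identification whose absence produces "No device descriptors found", come from designation descriptors in the SCSI Device Identification VPD page (page code 0x83). The sketch below walks that page following the SPC layout as I understand it: a 4-byte header, then descriptors whose byte 1 holds the association (bits 5-4) and designator type (bits 3-0). It is a simplified illustration, not the kernel's own parser; `scan_vpd83` and `struct vpd83_info` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a scan of a Device Identification VPD page (0x83) found. */
struct vpd83_info {
    int lun_designators; /* LU-associated T10/EUI-64/NAA/SCSI-name designators */
    int group_id;        /* target port group, -1 if absent */
    int rel_port;        /* relative target port identifier, -1 if absent */
};

/* Walk the designation descriptors: a 4-byte page header (page code in
 * byte 1, big-endian page length in bytes 2-3), then descriptors made of
 * 4 header bytes plus d[3] designator bytes. Simplified sketch only. */
static void scan_vpd83(const uint8_t *page, size_t len, struct vpd83_info *info)
{
    assert(page != NULL && info != NULL);
    info->lun_designators = 0;
    info->group_id = -1;
    info->rel_port = -1;
    if (len < 4 || page[1] != 0x83)
        return;

    size_t end = 4 + (((size_t)page[2] << 8) | page[3]);
    if (end > len)
        end = len; /* never read past the buffer */

    for (size_t off = 4; off + 4 <= end;) {
        const uint8_t *d = page + off;
        size_t dlen = d[3];
        if (off + 4 + dlen > end)
            break; /* truncated descriptor */

        int assoc = (d[1] >> 4) & 0x3; /* 0 = logical unit, 1 = target port */
        int type = d[1] & 0xf;         /* designator type */

        if (assoc == 0 && (type == 1 || type == 2 || type == 3 || type == 8))
            info->lun_designators++;
        else if (type == 4 && dlen >= 4) /* relative target port */
            info->rel_port = (d[6] << 8) | d[7];
        else if (type == 5 && dlen >= 4) /* target port group */
            info->group_id = (d[6] << 8) | d[7];

        off += 4 + dlen;
    }
}
```

For an invented page carrying only a relative-port and a port-group descriptor (both 0xc339, chosen to mirror the log above, not captured from this system), the scan reports the port group and relative port but zero LUN designators, the case the "No device descriptors found" message describes.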

While the exact symptoms differ, this may be related to the device scan problems seen on zSeries (LP: #1567602).

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → High
Stefan Bader (smb) wrote :

Or maybe not, as 4.6 seems to fix that race. However, I found one change to the ALUA handler after 4.6 which modifies the handling of missing VPD identification (the condition that produces the "No device descriptors found" message):

commit fe8b9534a0a0356f8a76467e2c561194bdb53c84
Author: Hannes Reinecke <email address hidden>
Date: Fri May 6 10:34:35 2016 +0200

    scsi_dh_alua: do not fail for unknown VPD identification

    Not every device will return a useable VPD identification, but still
    might support ALUA. Rather than disable ALUA support we should be
    allowing the device identification to be empty and attach individual
    ALUA device handler to each devices.
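The behaviour change described in that commit message can be summarised as a decision table. The sketch below is a hypothetical model (`model_alua_attach` is an invented name) that only reproduces the outcomes visible in this report: before the fix, a device without usable LUN identification ended in `Attach failed (-22)`; after it, the handler attaches with an empty identification. It does not follow the kernel's real code paths or intermediate return codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of "scsi_dh_alua: do not fail for unknown VPD
 * identification". Returns 0 if the ALUA handler would attach, or the
 * negative errno seen in this report if it would not. */
static int model_alua_attach(int lun_designators, bool fix_applied)
{
    assert(lun_designators >= 0);
    if (lun_designators > 0)
        return 0;       /* identification present: attach either way */
    if (!fix_applied)
        return -EINVAL; /* before: "No device descriptors found",
                           then "Attach failed (-22)" */
    return 0;           /* after: accept an empty device identification */
}
```

Only the no-identification case differs between the two versions, which matches the observation that other devices keep working while this one loses its multipath setup.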

Stefan Bader (smb) wrote :

So I can confirm that the above patch, cherry-picked into 4.6, allows the system to boot and configure multipath devices again (as long as the UNIX=m issue is not present, in which case there won't be any disks to multipath on).

Matthew Shapiro (matthewsha) wrote :

The same error appeared in 4.6.1 on amd64; 4.6.0 works without problems.
Matthew

Stefan Bader (smb) wrote :

This should be fixed in Xenial/Yakkety by now, unfortunately as part of the backport for bug #1567602 (fixed in 4.4.0-31.50 and later).

Changed in linux (Ubuntu):
status: Confirmed → Fix Released