curtin discovers HP /dev/cciss/c0d0 incorrectly

Bug #1263181 reported by Chris Stratford
This bug affects 10 people
Affects          Status        Importance  Assigned to  Milestone
curtin           Fix Released  Medium      Unassigned
curtin (Ubuntu)  Fix Released  Medium      Unassigned
  Trusty         Fix Released  Medium      Unassigned
  Vivid          Fix Released  Medium      Unassigned

Bug Description

=== Begin SRU Template ===

[Description]
Installing to an HP DL385, or any other system that has drives in an "HP Smart Array", will fail, showing:
  lsblk: /dev/cciss!c0d0: not a block device

This is due to curtin's fairly naive understanding of device name paths.

The fix is both:
  - better parsing of /sys to get information on device names and the partitions that are on them.
  - specific handling of the very oddly named 'cciss!' devices.

[Impact]
Installation via curtin cannot be done on these systems.

[Test Case]
Positive test case:
  Install via MAAS and curtin onto a system with an HP Smart Array.
  A successful installation and boot is success; anything else is failure.

Negative test case:
  Install on a system without these devices and expect it to boot.

[Regression Potential]
Curtin's understanding of /dev/ and /sys is generally improved. However, there are possibly still corner cases.
It seems unlikely that anything that worked before would fail with these changes.

[Other]
Related bugs:
  * bug 1401190 : curtin makes assumptions about partition names on all devices
=== End SRU Template ===

When trying the MAAS fast-installer against an HP DL385, it fails to run one of the scripts with:

lsblk: /dev/cciss!c0d0: not a block device

This looks to be due to the way the device is reported to lsblk:

lsblk --noheadings --bytes --pairs --out=ALIGNMENT,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,FSTYPE,GROUP,KNAME,LABEL,LOG-SEC,MAJ:MIN,MIN-IO,MODE,MODEL,MOUNTPOINT,NAME,OPT-IO,OWNER,PHY-SEC,RM,RO,ROTA,RQ-SIZE,SIZE,STATE,TYPE,UUID --nodeps
ALIGNMENT="0" DISC-ALN="0" DISC-GRAN="0" DISC-MAX="0" DISC-ZERO="0" FSTYPE="" GROUP="root" KNAME="cciss!c0d0" LABEL="" LOG-SEC="512" MAJ:MIN="104:0" MIN-IO="512" MODE="" MODEL="LOGICAL VOLUME " MOUNTPOINT="" NAME="cciss!c0d0" OPT-IO="0" OWNER="root" PHY-SEC="512" RM="0" RO="0" ROTA="1" RQ-SIZE="128" SIZE="440432713728" STATE="" TYPE="disk" UUID=""
ALIGNMENT="0" DISC-ALN="0" DISC-GRAN="0" DISC-MAX="0" DISC-ZERO="0" FSTYPE="" GROUP="cdrom" KNAME="sr0" LABEL="" LOG-SEC="512" MAJ:MIN="11:0" MIN-IO="512" MODE="brw-rw----" MODEL="CD-ROM TS-L162C" MOUNTPOINT="" NAME="sr0" OPT-IO="0" OWNER="root" PHY-SEC="512" RM="1" RO="0" ROTA="1" RQ-SIZE="128" SIZE="1073741312" STATE="running" TYPE="rom" UUID=""
ALIGNMENT="0" DISC-ALN="0" DISC-GRAN="0" DISC-MAX="0" DISC-ZERO="0" FSTYPE="ext4" GROUP="disk" KNAME="sda" LABEL="cloudimg-rootfs" LOG-SEC="512" MAJ:MIN="8:0" MIN-IO="512" MODE="brw-rw----" MODEL="VIRTUAL-DISK " MOUNTPOINT="/media/root-ro" NAME="sda" OPT-IO="0" OWNER="root" PHY-SEC="512" RM="0" RO="1" ROTA="1" RQ-SIZE="128" SIZE="1476395008" STATE="running" TYPE="disk" UUID="35938370-3a39-42a9-bf68-7ce0a75e4316"

The device name should be /dev/cciss/c0d0, so I guess the device name parsing needs special treatment for HP cciss devices.
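
A minimal sketch of the translation asked for here, assuming the only special case is the kernel's use of '!' where the /dev node has '/'. The helper names parse_lsblk_pairs and name_to_dev_path are hypothetical and are not curtin's API.

```python
import shlex


def parse_lsblk_pairs(line):
    """Parse one line of `lsblk --pairs` output into a dict."""
    fields = {}
    # shlex honours the double quotes, so values containing spaces
    # (for example MODEL="LOGICAL VOLUME ") stay in one token.
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def name_to_dev_path(name):
    """Map an lsblk NAME/KNAME to a /dev path (hypothetical helper).

    sysfs names cannot contain '/', so the kernel reports
    'cciss/c0d0' as 'cciss!c0d0'; the /dev node keeps the slash.
    """
    return "/dev/" + name.replace("!", "/")


line = ('KNAME="cciss!c0d0" MAJ:MIN="104:0" '
        'MODEL="LOGICAL VOLUME " NAME="cciss!c0d0" TYPE="disk"')
info = parse_lsblk_pairs(line)
print(name_to_dev_path(info["NAME"]))
```

Ordinary names such as sda pass through unchanged, so the same helper can be applied to every row of the lsblk output.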

Chris Stratford (chris-gondolin) wrote :

After attempting to hack around the cciss!c0d0 problem I discovered another issue: the partition command/function appears to assume that a device sda has partitions called sda1, sda2, etc. For HP cciss devices this isn't the case: c0d0's partitions are c0d0p1, c0d0p2, etc. (note the extra "p").

Scott Moser (smoser) wrote :

Thanks for the good bug report.
Can you do a 'find /sys/block/' and attach the output?

I agree that the conversion of the "NAME" output of lsblk to a device path (/dev/XXXX) is simplistic. It seems that we could possibly account for this specific case by replacing '!' with '/'. However, I assume that other parts of the code are doing basename and that would cause issues.

I don't know of a simple, consistent, *good* way to convert an lsblk device to an "entry in /dev". Possibly I could go looking for devices that match the given MAJ:MIN.
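
The MAJ:MIN idea could be sketched as below, using the /sys/dev/block/<MAJ>:<MIN> symlinks that sysfs provides. The function name and the sys_root/dev_root parameters are hypothetical, and the final step assumes the /dev node follows the usual kernel/udev convention of turning '!' back into '/'.

```python
import os


def dev_path_from_majmin(majmin, sys_root="/sys", dev_root="/dev"):
    """Resolve a 'MAJ:MIN' string to a /dev path via sysfs.

    /sys/dev/block/<MAJ>:<MIN> is a symlink to the device's sysfs
    directory; its last component is the kernel name.
    Returns None if no such device is known.
    """
    link = os.path.join(sys_root, "dev", "block", majmin)
    if not os.path.exists(link):
        return None
    kname = os.path.basename(os.path.realpath(link))
    # Assumption: the /dev node is named after the kernel name with
    # each '!' turned back into a path separator.
    return os.path.join(dev_root, *kname.split("!"))
```

This avoids guessing from the NAME column at all, at the cost of depending on sysfs being mounted.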

Then, there is the issue of 'p1'. One way seems to be looking for /sys/block/basename(device)/basename(device)p1 and falling back to basename(device)1. Another would be to use the heuristic "if it ends in a digit, then use 'pX' rather than just 'X'".
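
The second heuristic is small enough to show in full. This is only a sketch with a hypothetical function name; it works on names, not on what actually exists on the system.

```python
def partition_kname(disk_kname, number):
    """Guess the kernel name of partition `number` on `disk_kname`.

    A disk name that ends in a digit gets a 'p' separator
    (c0d0 -> c0d0p1); any other name does not (sda -> sda1).
    """
    separator = "p" if disk_kname[-1].isdigit() else ""
    return "{}{}{}".format(disk_kname, separator, number)


print(partition_kname("sda", 1))
print(partition_kname("cciss!c0d0", 1))
```

The same rule also covers names such as mmcblk0, which end in a digit for the same reason.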

Changed in curtin:
importance: Undecided → Medium
status: New → Triaged
Chris Stratford (chris-gondolin) wrote :

ubuntu@scandium:~$ find /sys/block/
/sys/block/
/sys/block/ram0
/sys/block/ram1
/sys/block/ram2
/sys/block/ram3
/sys/block/ram4
/sys/block/ram5
/sys/block/ram6
/sys/block/ram7
/sys/block/ram8
/sys/block/ram9
/sys/block/ram10
/sys/block/ram11
/sys/block/ram12
/sys/block/ram13
/sys/block/ram14
/sys/block/ram15
/sys/block/loop0
/sys/block/loop1
/sys/block/loop2
/sys/block/loop3
/sys/block/loop4
/sys/block/loop5
/sys/block/loop6
/sys/block/loop7
/sys/block/fd0
/sys/block/cciss!c0d0
/sys/block/sr0

And, in case it's useful:

$ cat /proc/partitions
major minor #blocks name

 104 0 430110072 cciss/c0d0
 104 1 421892096 cciss/c0d0p1
 104 2 1 cciss/c0d0p2
 104 5 8215552 cciss/c0d0p5
  11 0 1048575 sr0
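
Note that /proc/partitions, unlike /sys/block, already shows the name with a slash. A minimal sketch of reading it, with a hypothetical function name and the sample above as input:

```python
def parse_proc_partitions(text, dev_root="/dev"):
    """Return a list of (major, minor, blocks, dev_path) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and blank lines: data rows have four
        # fields and start with a numeric major number.
        if len(parts) != 4 or not parts[0].isdigit():
            continue
        major, minor, blocks, name = parts
        # The name already uses '/' (cciss/c0d0), so it can be
        # appended to /dev unchanged.
        entries.append((int(major), int(minor), int(blocks),
                        dev_root + "/" + name))
    return entries


sample = """major minor #blocks name

 104 0 430110072 cciss/c0d0
 104 1 421892096 cciss/c0d0p1
  11 0 1048575 sr0
"""
for entry in parse_proc_partitions(sample):
    print(entry)
```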

Abassett (abassett) wrote :

Chris, I've hit the same problem. I'm not trying to install to the device, but I think it's just being scanned. Were you able to find a workaround?

JuanJo Ciarlante (jjo)
tags: added: canonical-bootstack
JuanJo Ciarlante (jjo) wrote :

I was able to make it work by hacking around curtin files as per this
live diff (against installed files):
  http://paste.ubuntu.com/8564279/
Lines 63-66 there are the only 'cciss'-specific ones, and should be
reworked to scan /sys/class/block/ instead.
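
A sketch of what scanning /sys/class/block/ for a disk's partitions might look like, relying on the 'partition' attribute file that sysfs places in each partition directory. The function name and the sys_root parameter are hypothetical; this is not the code from the paste.

```python
import os


def list_partitions(disk_kname, sys_root="/sys"):
    """Return kernel names of the partitions of `disk_kname`.

    In sysfs each partition is a subdirectory of its disk and
    contains a file named 'partition' holding its number.
    """
    disk_dir = os.path.join(sys_root, "class", "block", disk_kname)
    found = []
    for entry in os.listdir(disk_dir):
        number_file = os.path.join(disk_dir, entry, "partition")
        if os.path.isfile(number_file):
            with open(number_file) as fh:
                found.append((int(fh.read().strip()), entry))
    # Sort by partition number rather than by name.
    return [name for _, name in sorted(found)]
```

Because the names come from the kernel, no 'p1' versus '1' guessing is needed, and nothing here is specific to cciss.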

tags: added: cts
Rob Thomas (xrobau) wrote :

Well, I just confirmed that this is still a problem, and the paste above is a bit stale and needed some manual patching to get it in place:

root@t0:/# patch -p0 < ~/hp.patch
patching file /usr/lib/python2.7/dist-packages/curtin/commands/install.py
Hunk #1 succeeded at 334 with fuzz 1 (offset 68 lines).
patching file /usr/lib/python2.7/dist-packages/curtin/commands/block_meta.py
Hunk #2 FAILED at 93.
1 out of 2 hunks FAILED -- saving rejects to file /usr/lib/python2.7/dist-packages/curtin/commands/block_meta.py.rej
patching file /usr/lib/python2.7/dist-packages/curtin/block/__init__.py
Hunk #1 succeeded at 25 (offset 1 line).
Hunk #2 FAILED at 61.
Hunk #3 succeeded at 132 (offset 1 line).
1 out of 3 hunks FAILED -- saving rejects to file /usr/lib/python2.7/dist-packages/curtin/block/__init__.py.rej
patching file /usr/lib/curtin/helpers/common
Hunk #1 FAILED at 43.
Hunk #2 succeeded at 191 (offset 107 lines).
Hunk #3 FAILED at 117.
Hunk #4 succeeded at 250 (offset 86 lines).
2 out of 4 hunks FAILED -- saving rejects to file /usr/lib/curtin/helpers/common.rej

Rob Thomas (xrobau) wrote :

Here's a tidied up patch

http://pastebin.com/VahpFT66

Rob Thomas (xrobau) wrote :

Curtin was just updated, and is still suffering from this bug. The patch is REALLY self-explanatory; surely it can't be that hard to merge?

Rob Thomas (xrobau) wrote :

Still broken.

Unexpected error while running command.
Command: ['lsblk', '--noheadings', '--bytes', '--pairs', '--output=ALIGNMENT,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,FSTYPE,GROUP,KNAME,LABEL,LOG-SEC,MAJ:MIN,MIN-IO,MODE,MODEL,MOUNTPOINT,NAME,OPT-IO,OWNER,PHY-SEC,RM,RO,ROTA,RQ-SIZE,SIZE,STATE,TYPE,UUID', '/dev/cciss!c0d1']
Exit code: 1
Reason: -
Stdout: ''
Stderr: u'lsblk: /dev/cciss!c0d1: not a block device\n'
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'simple']
Exit code: 3
Reason: -
Stdout: "Unexpected error while running command.\nCommand: ['lsblk', '--noheadings', '--bytes', '--pairs', '--output=ALIGNMENT,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,FSTYPE,GROUP,KNAME,LABEL,LOG-SEC,MAJ:MIN,MIN-IO,MODE,MODEL,MOUNTPOINT,NAME,OPT-IO,OWNER,PHY-SEC,RM,RO,ROTA,RQ-SIZE,SIZE,STATE,TYPE,UUID', '/dev/cciss!c0d1']\nExit code: 1\nReason: -\nStdout: ''\nStderr: u'lsblk: /dev/cciss!c0d1: not a block device\\n'\n"
Stderr: ''

Rob Thomas (xrobau) wrote :

The previous patch still applies, with a bit of fuzz.

I honestly can't understand why this patch hasn't been applied. This bug breaks __every single HP server ever made__. By not merging this in, you CAN NOT use HP machines, be they blades or standalone servers, with MAAS.

Scott Moser (smoser) wrote :

Hi,
  I commented in the MP at https://code.launchpad.net/~niedbalski/ubuntu/vivid/curtin/fix-1263181/+merge/250163
  Feedback there would be appreciated.

Narinder Gupta (narindergupta) wrote :

There is a wrong assumption that all HP servers are affected by this bug. The HP cciss driver is very old and was only used for the G1/G2/G3/G4 series of servers, which HP does not support anymore.

While all HP latest servers starting from G5/G6/G7/Gen8 and Gen9 have hpsa driver which gives the device name as sda/sdb etc.. rather than cciss. And Canonical certify those servers with MAAS and Trusty which uses hpsa driver rather than cciss driver.

James Troup (elmo) wrote : Re: [Bug 1263181] Re: curtin discovers HP /dev/cciss/c0d0 incorrectly

Narinder Gupta <email address hidden> writes:

> While all HP latest servers starting from G5/G6/G7/Gen8 and Gen9
> have hpsa driver which gives the device name as sda/sdb etc.. rather
> than cciss. And Canonical certify those servers with MAAS and Trusty
> which uses hpsa driver rather than cciss driver.

| root@wildcherry:~# mount | head -n 1
| /dev/cciss/c0d0p1 on / type ext3 (rw,noatime,errors=remount-ro)
| root@wildcherry:~# lshw -C system
| wildcherry
| description: Rack Mount Chassis
| product: ProLiant DL585 G5 (500924-421)
               ^^^^^^^^^^^^^^^^^
| vendor: HP

[...]

| root@wildcherry:~#

Regardless, I certainly never said *all* HP servers were affected by
this. I believe I said "a large swathe of HP servers". While I
recognise these servers are no longer supported by HP, they are still
widely deployed and in use. I apologise if my choice of wording
caused confusion or was misleading but I don't think it really changes
whether or not we should fix the bug.

--
James

Dave Chiluk (chiluk) wrote :

I agree with elmo. Even though these servers are older they are still in use. If they aren't in production deployments they are definitely still being used in test deployments (where old hardware goes to die).

Firl (celpa-firl) wrote :

This looks to be affecting my DL380 G5 system with a P400 controller.

Scott Moser (smoser) wrote :

This is believed to be fix-released in wily.
I'd appreciate someone telling me if it is not working.

Changed in curtin:
status: Triaged → Fix Committed
Changed in curtin (Ubuntu):
status: New → Fix Released
importance: Undecided → Medium
Changed in curtin (Ubuntu Trusty):
status: New → Confirmed
Changed in curtin (Ubuntu Vivid):
status: New → Confirmed
Changed in curtin (Ubuntu Trusty):
importance: Undecided → Medium
Changed in curtin (Ubuntu Vivid):
importance: Undecided → Medium
Scott Moser (smoser)
description: updated
eduardo serrano (eduardoutez) wrote :

Hi Scott Moser (smoser),

A question: I have the same problem with cciss.
Did you fix the error in MAAS with a modified curtin?
Do I have to download the curtin package from your profile and install it manually?

eduardo serrano (eduardoutez) wrote :

I have 6 ProLiant DL360 G5 machines,
but I cannot move forward with Landscape as it does not recognize them.
I would appreciate your support, please.

eduardo serrano (eduardoutez) wrote :

Scott Moser,

I ran 'sudo apt-get install curtin', which installed 0.1.0~bzr201-0ubuntu1~14.04.1,
but the cciss problem persists: MAAS still detects only 1 GB of storage,
and I cannot proceed to use Juju and Landscape.

ubuntu 14.04.2
MAAS 1.7

    - lshw:node:
              id: storage
              claimed: true
              class: storage
              handle: SCSI:02
              - lshw:description:
                RAID bus controller
              - lshw:product:
                Smart Array Controller
              - lshw:vendor:
                Hewlett-Packard Company
              - lshw:physid:
                0
              - lshw:businfo:
                pci@0000:06:00.0
              - lshw:logicalname:
                scsi2
              - lshw:version:
                01
              - lshw:width:
                units: bits
                64
              - lshw:clock:
                units: Hz
                33000000
              - lshw:configuration:
                - lshw:setting:
                  id: driver
                  value: cciss
                - lshw:setting:
                  id: latency
                  value: 0
              - lshw:capabilities:
                - lshw:capability:
                  id: storage
                - lshw:capability:
                  id: pciexpress
                  PCI Express
                - lshw:capability:
                  id: msix
                  MSI-X
                - lshw:capability:
                  id: pm
                  Power Management
                - lshw:capability:
                  id: vpd
                  Vital Product Data
                - lshw:capability:
                  id: bus_master
                  bus mastering
                - lshw:capability:
                  id: cap_list
                  PCI capabilities listing
                - lshw:capability:
                  id: rom
                  extension ROM
                - lshw:capability:
                  id: scsi-host
                  SCSI host adapter
              - lshw:resources:
                - lshw:resource:
                  type: irq
                  value: 16
                - lshw:resource:
                  type: memory
                  value: fdd00000-fddfffff
                - lshw:resource:
                  type: ioport
                  value: 4000(size=256)
                - lshw:resource:
                  type: memory
                  value: fdcf0000-fdcf0fff
                - lshw:resource:
                  type: memory
                  value: d0100000-d013ffff

Scott Moser (smoser)
description: updated
Chris J Arges (arges) wrote : Please test proposed package

Hello Chris, or anyone else affected,

Accepted curtin into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr221-0ubuntu1~14.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu Trusty):
status: Confirmed → Fix Committed
tags: added: verification-needed
Chris J Arges (arges) wrote :

Hello Chris, or anyone else affected,

Accepted curtin into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr221-0ubuntu1~14.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu Vivid):
status: Confirmed → Fix Committed
Timo (ti-mo) wrote :

Thanks a lot, Curtin package (bzr221) from vivid-proposed works for me.
Lucky I just stumbled across this issue right after this patch was merged.

Cheers!

Scott Moser (smoser)
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr221-0ubuntu1~14.04.1

---------------
curtin (0.1.0~bzr221-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * New upstream snapshot.
    - support installation to multipath devices. (LP: #1371634)
    - know that kernel version 4.2.0 maps to linux-generic-lts-wily
    - support install to arm64 systems that use UEFI for boot (LP: #1447834)
    - fix remaining usage of 'lsblk --out' rather than 'lsblk --output'
      (LP: #1386275)
    - retry 'apt-get update' on failure to avoid transient failures
      (LP: #1403133)
    - run udevadm settle before unmounting /dev in a target to avoid transient
      failures (LP: #1462139)
    - fixes and additions to tools used in development.
    - Add --no-nvram to the grub-install command for UEFI. (LP: #1311827)
    - avoid race condition and transient failure due busy device in mkfs
      (LP: #1443542)
    - improvements to device and partition naming code which allow installation
      devices with HP cciss smart array drives(LP: #1401190, #1263181)
    - do not consider devices < 1G as installable targets
  * debian/README.source fix doc on how to create new upstream snapshots

 -- Scott Moser <email address hidden> Wed, 24 Jun 2015 14:31:14 -0400

Changed in curtin (Ubuntu Trusty):
status: Fix Committed → Fix Released
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for curtin has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr221-0ubuntu1~14.10.1

---------------
curtin (0.1.0~bzr221-0ubuntu1~14.10.1) vivid-proposed; urgency=medium

  * New upstream snapshot.
    - support installation to multipath devices. (LP: #1371634)
    - know that kernel version 4.2.0 maps to linux-generic-lts-wily
    - support install to arm64 systems that use UEFI for boot (LP: #1447834)
    - fix remaining usage of 'lsblk --out' rather than 'lsblk --output'
      (LP: #1386275)
    - retry 'apt-get update' on failure to avoid transient failures
      (LP: #1403133)
    - run udevadm settle before unmounting /dev in a target to avoid transient
      failures (LP: #1462139)
    - fixes and additions to tools used in development.
    - Add --no-nvram to the grub-install command for UEFI. (LP: #1311827)
    - avoid race condition and transient failure due busy device in mkfs
      (LP: #1443542)
    - improvements to device and partition naming code which allow installation
      devices with HP cciss smart array drives(LP: #1401190, #1263181)
    - do not consider devices < 1G as installable targets
  * debian/README.source fix doc on how to create new upstream snapshots

 -- Scott Moser <email address hidden> Wed, 24 Jun 2015 16:12:59 -0400

Changed in curtin (Ubuntu Vivid):
status: Fix Committed → Fix Released
iulianpojar (iulianpojar) wrote :

Same problem with an HP DL380 G5, with the latest MAAS and curtin packages.

iulianpojar (iulianpojar) wrote :

How can we get a fix for Wily?

iulianpojar (iulianpojar) wrote :

A fix for Xenial?

iulianpojar (iulianpojar) wrote :

An error occured handling 'cciss!c0d0': FileNotFoundError - [Errno 2] No such file or directory: '/sys/block/c0d0/holders'
[Errno 2] No such file or directory: '/sys/block/c0d0/holders'
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3
Reason: -
Stdout: b"An error occured handling 'cciss!c0d0': FileNotFoundError - [Errno 2] No such file or directory: '/sys/block/c0d0/holders'\n[Errno 2] No such file or directory: '/sys/block/c0d0/holders'\n"
Stderr: ''

Stefan Berg (stefandberg) wrote :

Same for me

Error: /dev/cciss/c0d0: unrecognised disk label
An error occured handling 'cciss!c0d0': OSError - [Errno 2] No such file or directory: '/sys/block/c0d0/holders'
[Errno 2] No such file or directory: '/sys/block/c0d0/holders'
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3
Reason: -
Stdout: "Error: /dev/cciss/c0d0: unrecognised disk label\nAn error occured handling 'cciss!c0d0': OSError - [Errno 2] No such file or directory: '/sys/block/c0d0/holders'\n[Errno 2] No such file or directory: '/sys/block/c0d0/holders'\n"
Stderr: ''
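
The path in both tracebacks, /sys/block/c0d0/holders, is what a plain basename of /dev/cciss/c0d0 would yield, whereas the sysfs directory listed earlier in this report is /sys/block/cciss!c0d0. That may or may not be the actual cause; the sketch below only illustrates the difference, and sys_block_dir is a hypothetical helper, not curtin's code.

```python
import os


def sys_block_dir(dev_path, dev_root="/dev", sys_root="/sys"):
    """Map the /dev path of a whole disk to its /sys/block directory."""
    # Keep everything below /dev, not just the basename, then swap
    # '/' for '!' because sysfs names cannot contain a slash.
    relative = os.path.relpath(dev_path, dev_root)
    return os.path.join(sys_root, "block", relative.replace("/", "!"))


print(os.path.basename("/dev/cciss/c0d0"))  # c0d0 (the 'cciss' part is lost)
print(sys_block_dir("/dev/cciss/c0d0"))     # /sys/block/cciss!c0d0
```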

Stefan Berg (stefandberg) wrote :

Will this be fixed for Ubuntu 16.04 LTS Xenial Xerus?

Wesley Wiedenmeier (wesley-wiedenmeier) wrote :

Hi iulianpojar and stefandberg,

It appears that the trace you posted recently is actually a similar but slightly different issue. Work on it is being tracked in (LP: #1562249):
https://bugs.launchpad.net/curtin/+bug/1562249

I will try to have a patch shortly and work on getting it approved as an SRU to Xenial and Trusty. Sorry about the inconvenience.

Scott Moser (smoser) wrote : Fixed in Curtin 17.1

This bug is believed to be fixed in curtin 17.1. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released