get_device_blkid (swift-storage-relation-joined hook) failure when inspecting in-use block devices

Bug #1567198 reported by Ryan Beisner
Affects | Status | Importance | Assigned to | Milestone
OpenStack Swift Storage Charm | Fix Released | High | Unassigned |
swift-storage (Juju Charms Collection) | Invalid | High | Unassigned |

Bug Description

On two ppc64el machines, the charm fails hard when querying a disk device which has a mounted partition:

2016-04-07 02:47:02 INFO swift-storage-relation-joined subprocess.CalledProcessError: Command '['blkid', '-s', 'UUID', u'/dev/sde']' returned non-zero exit status 2

Trusty-Liberty + HWE-W Kernel

### From the swift-proxy unit log:
2016-04-07 02:47:01 INFO juju-log swift-storage:39: Valid ensured block devices: [u'/dev/sdb', u'/dev/sdc', u'/dev/sdd', u'/dev/sde']
2016-04-07 02:47:02 DEBUG juju-log swift-storage:39: Adding device 'b7f91dec-ea54-4e2e-8aee-b9d40f02e266' with blkid='sdb' to devstore
2016-04-07 02:47:02 DEBUG juju-log swift-storage:39: Adding device 'd7349737-83a6-4fb8-8fdd-f8768a0ce027' with blkid='sdc' to devstore
2016-04-07 02:47:02 DEBUG juju-log swift-storage:39: Adding device 'cc76982f-6c08-449c-8549-a3b749bbef88' with blkid='sdd' to devstore
2016-04-07 02:47:02 INFO swift-storage-relation-joined Traceback (most recent call last):
2016-04-07 02:47:02 INFO swift-storage-relation-joined File "/var/lib/juju/agents/unit-swift-storage-z1-0/charm/hooks/swift-storage-relation-joined", line 202, in <module>
2016-04-07 02:47:02 INFO swift-storage-relation-joined main()
2016-04-07 02:47:02 INFO swift-storage-relation-joined File "/var/lib/juju/agents/unit-swift-storage-z1-0/charm/hooks/swift-storage-relation-joined", line 194, in main
2016-04-07 02:47:02 INFO swift-storage-relation-joined hooks.execute(sys.argv)
2016-04-07 02:47:02 INFO swift-storage-relation-joined File "/var/lib/juju/agents/unit-swift-storage-z1-0/charm/hooks/charmhelpers/core/hookenv.py", line 717, in execute
2016-04-07 02:47:02 INFO swift-storage-relation-joined self._hooks[hook_name]()
2016-04-07 02:47:02 INFO swift-storage-relation-joined File "/var/lib/juju/agents/unit-swift-storage-z1-0/charm/hooks/swift-storage-relation-joined", line 128, in swift_storage_relation_joined
2016-04-07 02:47:02 INFO swift-storage-relation-joined remember_devices(devs)
2016-04-07 02:47:02 INFO swift-storage-relation-joined File "/var/lib/juju/agents/unit-swift-storage-z1-0/charm/hooks/lib/swift_storage_utils.py", line 383, in remember_devices
2016-04-07 02:47:02 INFO swift-storage-relation-joined blk_uuid = get_device_blkid("/dev/%s" % (dev))
2016-04-07 02:47:02 INFO swift-storage-relation-joined File "/var/lib/juju/agents/unit-swift-storage-z1-0/charm/hooks/lib/swift_storage_utils.py", line 362, in get_device_blkid
2016-04-07 02:47:02 INFO swift-storage-relation-joined blk_uuid = subprocess.check_output(['blkid', '-s', 'UUID', dev])
2016-04-07 02:47:02 INFO swift-storage-relation-joined File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
2016-04-07 02:47:02 INFO swift-storage-relation-joined raise CalledProcessError(retcode, cmd, output=output)
2016-04-07 02:47:02 INFO swift-storage-relation-joined subprocess.CalledProcessError: Command '['blkid', '-s', 'UUID', u'/dev/sde']' returned non-zero exit status 2
2016-04-07 02:47:02 ERROR juju.worker.uniter.operation runhook.go:107 hook "swift-storage-relation-joined" failed: exit status 1

### From the metal unit (manual repro):
ubuntu@gengar-ppc64:~$ sudo blkid -s UUID /dev/sda
/dev/sda: UUID="9d31bdbc-8092-4fe2-912e-dd6db4e6c86d"
ubuntu@gengar-ppc64:~$ sudo blkid -s UUID /dev/sdb
/dev/sdb: UUID="b7f91dec-ea54-4e2e-8aee-b9d40f02e266"
ubuntu@gengar-ppc64:~$ sudo blkid -s UUID /dev/sdc
/dev/sdc: UUID="d7349737-83a6-4fb8-8fdd-f8768a0ce027"
ubuntu@gengar-ppc64:~$ sudo blkid -s UUID /dev/sdd
/dev/sdd: UUID="cc76982f-6c08-449c-8549-a3b749bbef88"
ubuntu@gengar-ppc64:~$ sudo blkid -s UUID /dev/sde
ubuntu@gengar-ppc64:~$ echo $?
2

ubuntu@gengar-ppc64:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 264.3G 0 disk
sdb 8:16 0 264.3G 0 disk /srv/node/sdb
sdc 8:32 0 264.3G 0 disk /srv/node/sdc
sdd 8:48 0 264.3G 0 disk /srv/node/sdd
sde 8:64 0 264.3G 0 disk
├─sde1 8:65 0 8M 0 part
└─sde2 8:66 0 264.3G 0 part /
sr0 11:0 1 1024M 0 rom

...

### Confirmed same blkid behavior on a different unit with slightly different storage layout:
ubuntu@loudred-ppc64:~$ sudo blkid -s UUID /dev/sda
ubuntu@loudred-ppc64:~$ echo $?
2
ubuntu@loudred-ppc64:~$ sudo blkid -s UUID /dev/sdb
/dev/sdb: UUID="3c766875-511e-41c6-9143-453f4c669a6e"
ubuntu@loudred-ppc64:~$ sudo blkid -s UUID /dev/sdc
ubuntu@loudred-ppc64:~$ echo $?
2
ubuntu@loudred-ppc64:~$ sudo blkid -s UUID /dev/sdd
/dev/sdd: UUID="e4e8cb19-c716-4dbd-a1e9-dec9be1b9f94"
ubuntu@loudred-ppc64:~$ sudo blkid -s UUID /dev/sde
/dev/sde: UUID="c9db39ea-f0db-4919-baa8-1102eedf906f"

ubuntu@loudred-ppc64:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 264.3G 0 disk
└─sda1 8:1 0 264.3G 0 part /
sdb 8:16 0 264.3G 0 disk
sdc 8:32 0 264.3G 0 disk
└─sdc1 8:33 0 8M 0 part
sdd 8:48 0 264.3G 0 disk
sde 8:64 0 264.3G 0 disk
sr0 11:0 1 1024M 0 rom

Ryan Beisner (1chb1n) wrote :

I think get_device_blkid() could do with a Try/Except instead of the If/Then:

https://github.com/openstack/charm-swift-storage/blob/master/lib/swift_storage_utils.py#L361
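
A minimal sketch of the try/except approach suggested above. It is an illustration, not the charm's actual code: the `run` parameter (there only so the function can be exercised without real block devices) and the regular-expression parsing are assumptions, and returning None for a device without a UUID is one possible convention.

```python
import re
import subprocess


def get_device_blkid(dev, run=subprocess.check_output):
    """Return the filesystem UUID of dev, or None if blkid reports none.

    `run` is a hypothetical injection point for testing; by default the
    real blkid command is executed.
    """
    try:
        out = run(['blkid', '-s', 'UUID', dev])
    except subprocess.CalledProcessError:
        # blkid exits with status 2 when the requested tag is not found
        # or the device cannot be identified, e.g. a whole-disk device
        # that only carries a partition table.
        return None
    if isinstance(out, bytes):
        out = out.decode('utf-8', 'replace')
    # Expected output shape: /dev/sdb: UUID="b7f91dec-..."
    match = re.search(r'UUID="([^"]+)"', out)
    return match.group(1) if match else None
```

With this shape, a caller can treat None as "no filesystem UUID on this device" and decide whether to skip or format it, instead of the hook dying on CalledProcessError.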

Ryan Beisner (1chb1n) wrote :

This is also impacting multi-lpar deploys on s390x.

tags: added: s390x
Changed in swift-storage (Juju Charms Collection):
status: New → Confirmed
importance: Undecided → High
milestone: none → 16.10
Ryan Beisner (1chb1n) wrote :

Manually confirmed that blkid exits 2 on all disks in all lpars that we have in use at the moment.

Ryan Beisner (1chb1n) wrote :

The s390x deployment mentioned above is on Xenial.

James Page (james-page)
Changed in swift-storage (Juju Charms Collection):
milestone: 16.10 → 17.01
Chris MacNaughton (chris.macnaughton) wrote :

Shouldn't blkid _not_ fail to query a disk ID with a mounted partition? On my local machine, it exits 0:

$ blkid /dev/sda
$ echo $?
0

James Page (james-page)
Changed in charm-swift-storage:
importance: Undecided → High
status: New → Confirmed
Changed in swift-storage (Juju Charms Collection):
status: Confirmed → Invalid
Edward Hope-Morley (hopem) wrote :

I just hit this when adding an additional device (xenial 17.08):

...
unit-swift-storage-z2-0: 21:17:26 DEBUG unit.swift-storage-z2/0.config-changed blk_uuid = get_device_blkid("/dev/%s" % (dev))
unit-swift-storage-z2-0: 21:17:26 DEBUG unit.swift-storage-z2/0.config-changed File "/var/lib/juju/agents/unit-swift-storage-z2-0/charm/hooks/lib/swift_storage_utils.py", line 367, in get_device_blkid
unit-swift-storage-z2-0: 21:17:26 DEBUG unit.swift-storage-z2/0.config-changed blk_uuid = subprocess.check_output(['blkid', '-s', 'UUID', dev])
unit-swift-storage-z2-0: 21:17:26 DEBUG unit.swift-storage-z2/0.config-changed File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
unit-swift-storage-z2-0: 21:17:26 DEBUG unit.swift-storage-z2/0.config-changed raise CalledProcessError(retcode, cmd, output=output)
unit-swift-storage-z2-0: 21:17:26 DEBUG unit.swift-storage-z2/0.config-changed subprocess.CalledProcessError: Command '['blkid', '-s', 'UUID', u'/dev/vdc']' returned non-zero exit status 2
unit-swift-storage-z2-0: 21:17:26 ERROR juju.worker.uniter.operation hook "config-changed" failed: exit status 1

Let's get this fixed.

Changed in charm-swift-storage:
milestone: none → 17.11
Changed in swift-storage (Juju Charms Collection):
milestone: 17.01 → none
tags: added: stable-backport
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-swift-storage (master)

Reviewed: https://review.openstack.org/503849
Committed: https://git.openstack.org/cgit/openstack/charm-swift-storage/commit/?id=656e79da1854bee7db063644d0b2c0ce0e8955fd
Submitter: Jenkins
Branch: master

commit 656e79da1854bee7db063644d0b2c0ce0e8955fd
Author: Edward Hope-Morley <email address hidden>
Date: Wed Sep 13 16:00:09 2017 -0600

    Catch blkid error when device is not yet formatted

    When a new device is added to the ring we first try to
    identify whether the device is already in the ring by
    polling for an fs uuid. If the device has never been
    used this is expected to fail so let's catch the error.

    Also fixes log message.

    Change-Id: I20354dedfa27a6b8dec92828cabb50a20d0d8838
    Closes-Bug: 1567198
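
The behaviour described in the commit message can be sketched from the caller's side as follows. This is an illustration under stated assumptions, not the merged change: `remember_devices` here is a simplified stand-in for the charm function of the same name, and `lookup_uuid`, `devstore`, and the returned `skipped` list are hypothetical names. `lookup_uuid` is assumed to return None for a device that has no filesystem UUID yet.

```python
def remember_devices(devs, lookup_uuid, devstore=None):
    """Record UUID -> device name for every device that has a UUID.

    Devices for which lookup_uuid() returns None (for example, never
    formatted) are collected in `skipped` instead of raising.
    """
    devstore = {} if devstore is None else devstore
    skipped = []
    for dev in devs:
        uuid = lookup_uuid('/dev/%s' % dev)
        if uuid is None:
            # No filesystem UUID yet: nothing to remember for this device.
            skipped.append(dev)
            continue
        # Keep the first mapping seen for a UUID.
        devstore.setdefault(uuid, dev)
    return devstore, skipped
```

Returning the skipped devices rather than silently dropping them leaves room for the caller to log them, which may help when diagnosing cases like /dev/sde in the report.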

Changed in charm-swift-storage:
status: Confirmed → Fix Committed
James Page (james-page)
Changed in charm-swift-storage:
status: Fix Committed → Fix Released