juju stuck attaching storage to OSD

Bug #1778033 reported by Michał Ajduk on 2018-06-21
Affects   Status        Importance  Assigned to  Milestone
juju      Fix Released  High        Ian Booth    2.5-beta3
juju 2.3  Expired       Undecided   Unassigned
juju 2.4  Expired       High        Ian Booth

Bug Description

When deploying a bundle with storage support for Ceph OSD, the OSD units are stuck in an infinite pending state:
ceph-osd/0 waiting allocating 3 100.80.61.4 agent initializing
ceph-osd/1 waiting allocating 4 100.80.61.10 agent initializing
ceph-osd/2 waiting allocating 5 100.80.41.8 agent initializing
ceph-osd/3 waiting allocating 6 100.80.61.6 agent initializing
ceph-osd/4 waiting allocating 7 100.80.41.2 agent initializing
ceph-osd/5 waiting allocating 8 100.80.31.1 agent initializing
ceph-osd/6 waiting allocating 9 100.80.41.6 agent initializing
ceph-osd/7 waiting allocating 10 100.80.51.2 agent initializing
ceph-osd/8 waiting allocating 11 100.80.51.4 agent initializing
ceph-osd/9 waiting allocating 12 100.80.61.1 agent initializing
ceph-osd/10* waiting allocating 13 100.80.41.1 agent initializing
ceph-osd/11 waiting allocating 14 100.80.51.3

I can see that storage attachment is pending in ceph-osd/0 unit logs:

2018-06-20 00:54:18 DEBUG juju.worker.uniter.storage resolver.go:125 next hook op for storage-osd-devices-0: {Kind:0 Life:alive Attached:false Location:}
2018-06-20 00:54:18 DEBUG juju.worker.uniter.storage resolver.go:125 next hook op for storage-osd-devices-1: {Kind:0 Life:alive Attached:false Location:}
2018-06-20 00:54:18 DEBUG juju.worker.uniter.storage resolver.go:125 next hook op for storage-osd-devices-2: {Kind:0 Life:alive Attached:false Location:}
2018-06-20 00:54:18 DEBUG juju.worker.uniter.storage resolver.go:125 next hook op for storage-osd-journals-3: {Kind:0 Life:alive Attached:false Location:}
2018-06-20 00:54:18 DEBUG juju.worker.uniter.storage resolver.go:91 still pending [storage-osd-devices-0 storage-osd-devices-1 storage-osd-devices-2 storage-osd-journals-3]
2018-06-20 00:58:23 DEBUG juju.worker.uniter.remotestate watcher.go:455 update status timer triggered

Storage pools are created:

deployer@ic-skbrat2-infra1:~$ juju storage-pools
Name    Provider  Attrs
loop    loop
maas    maas
rootfs  rootfs
rotary  maas      tags=rotary
ssd     maas      tags=ssd
tmpfs   tmpfs

According to the ceph-osd/0 unit log above, storage-osd-devices-0 is not attached ({Kind:0 Life:alive Attached:false Location:}), but juju shows storage-osd-devices-0 as attached:

deployer@ic-skbrat2-infra1:~$ juju show-storage osd-devices/0
osd-devices/0:
  kind: block
  life: alive
  status:
    current: attached
    since: 19 Jun 2018 23:09:56Z
  persistent: false
  attachments:
    units:
      ceph-osd/0:
        machine: "3"
        life: alive

All storage devices are attached:

deployer@ic-skbrat2-infra1:~$ juju storage
[Storage]
Unit        Id              Type   Pool    Provider id  Size    Status    Message
ceph-osd/0  osd-devices/0   block  rotary  volume-0     1.5TiB  attached
ceph-osd/0  osd-devices/1   block  rotary  volume-1     1.5TiB  attached
ceph-osd/0  osd-devices/2   block  rotary  volume-2     1.5TiB  attached
ceph-osd/0  osd-journals/3  block  ssd     volume-3     1.5TiB  attached
ceph-osd/1  osd-devices/4   block  rotary  volume-4     1.5TiB  attached
...

Steps to reproduce:
- juju create-storage-pool ssd maas tags=ssd
- juju create-storage-pool rotary maas tags=rotary
- juju deploy bundle.yaml

Relevant bundle part:
  ceph-osd:
    charm: ../charms/ceph-osd
    num_units: *ceph-osd-count
    constraints: tags=storage-all
    storage:
      osd-devices: rotary,3
      osd-journals: ssd,1
    bindings:
      "": *oam-space
      public: *ceph-public-space
      cluster: *ceph-cluster-space
    options:
      osd-devices: *osd-devices
      osd-journal: *osd-journal
      source: *source
      customize-failure-domain: True
      autotune: true
      aa-profile-mode: enforce
      key: *key

Software version:
~$ juju version
juju 2.3.7-xenial-amd64

~$ juju status
Model Controller Cloud/Region Version SLA
openstack foundations-maas foundations-maas 2.3.8.1 unsupported

MAAS:
2.3.3 6498-ge4db91d-0ubuntu1~16.04.1

Attached:
- output of maas node read for machine 3 hosting ceph-osd/0
- machine log and unit log from machine 3 hosting ceph-osd/0

Michał Ajduk (majduk) wrote :
Ian Booth (wallyworld) wrote :

The symptoms seem to indicate that Juju has not recorded the block volumes as being provisioned, which is a prerequisite for running the storage-attached hooks.

Could you please attach the machine-0.log file with debug-level logging (i.e. we need to see the controller logs)? This will allow inspection of the storage provisioning operations.

I've attached all the logs (juju HA).

Ian Booth (wallyworld) wrote :

The logs are huge :-) Nothing immediately jumps out. The issue may well be specific to this particular deployment. It would be great to get access to the MAAS so we could deploy the minimum bundle needed to reproduce the issue with an instrumented version of Juju to capture the extra info needed to figure out what's happening. Is that possible?

Changed in juju:
status: New → Incomplete
Dmitrii Shcherbakov (dmitriis) wrote :

I hit the same issue just now on 2.3.8.

Launchpad Janitor (janitor) wrote :

[Expired for juju 2.4 because there has been no activity for 60 days.]

Launchpad Janitor (janitor) wrote :

[Expired for juju 2.3 because there has been no activity for 60 days.]

Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
Ante Karamatić (ivoks) wrote :

What we see from the trace is:

2018-11-20 09:55:05 DEBUG juju.worker.uniter.remotestate watcher.go:451 got storage change: [osd-devices/38 osd-devices/39] ok=true
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:225 -> {"request-id":78,"type":"Uniter","version":8,"request":"StorageAttachmentLife","params":{"ids":[{"storage-tag":"storage-osd-devices-38","unit-tag":"unit-testing-osd-0"},{"storage-tag":"storage-osd-devices-39","unit-tag":"unit-testing-osd-0"}]}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:120 <- {"request-id":78,"response":{"results":[{"life":"alive"},{"life":"alive"}]}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:225 -> {"request-id":79,"type":"Uniter","version":8,"request":"WatchStorageAttachments","params":{"ids":[{"storage-tag":"storage-osd-devices-38","unit-tag":"unit-testing-osd-0"}]}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:120 <- {"request-id":79,"response":{"results":[{"NotifyWatcherId":"20"}]}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:225 -> {"request-id":80,"type":"Uniter","version":8,"request":"StorageAttachments","params":{"ids":[{"storage-tag":"storage-osd-devices-38","unit-tag":"unit-testing-osd-0"}]}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:225 -> {"request-id":81,"type":"NotifyWatcher","version":1,"id":"20","request":"Next","params":{}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:120 <- {"request-id":80,"response":{"results":[{"result":{"storage-tag":"","owner-tag":"","unit-tag":"","kind":0,"location":"","life":""},"error":{"message":"storage osd-devices/38 not provisioned","code":"not provisioned"}}]}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:225 -> {"request-id":82,"type":"Uniter","version":8,"request":"WatchStorageAttachments","params":{"ids":[{"storage-tag":"storage-osd-devices-39","unit-tag":"unit-testing-osd-0"}]}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:120 <- {"request-id":82,"response":{"results":[{"NotifyWatcherId":"21"}]}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:225 -> {"request-id":83,"type":"Uniter","version":8,"request":"StorageAttachments","params":{"ids":[{"storage-tag":"storage-osd-devices-39","unit-tag":"unit-testing-osd-0"}]}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:225 -> {"request-id":84,"type":"NotifyWatcher","version":1,"id":"21","request":"Next","params":{}}
2018-11-20 09:55:05 TRACE juju.rpc.jsoncodec codec.go:120 <- {"request-id":83,"response":{"results":[{"result":{"storage-tag":"","owner-tag":"","unit-tag":"","kind":0,"location":"","life":""},"error":{"message":"storage osd-devices/39 not provisioned","code":"not provisioned"}}]}}

Changed in juju:
status: Expired → Confirmed
Ante Karamatić (ivoks) wrote :

lsblk from the machine:

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdf 8:80 0 1.5T 0 disk
sdd 8:48 0 1.5T 0 disk
sdb 8:16 0 1.5T 0 disk
├─sdb2 8:18 0 500M 0 part
│ └─md0 9:0 0 499.7M 0 raid1 /boot
├─sdb3 8:19 0 204.9G 0 part
│ └─md1 9:1 0 204.8G 0 raid1 /
└─sdb1 8:17 0 96M 0 part
sdi 8:128 0 1.8T 0 disk
sdg 8:96 0 1.5T 0 disk
sde 8:64 0 1.5T 0 disk
sdc 8:32 0 1.5T 0 disk
sda 8:0 0 1.5T 0 disk
├─sda2 8:2 0 500M 0 part
│ └─md0 9:0 0 499.7M 0 raid1 /boot
├─sda3 8:3 0 204.9G 0 part
│ └─md1 9:1 0 204.8G 0 raid1 /
└─sda1 8:1 0 96M 0 part /boot/efi
sdj 8:144 0 1.8T 0 disk
sdh 8:112 0 1.5T 0 disk

Felipe Reyes (freyes) wrote :

machine-115.log keeps logging about a non-matching path for the partitions sdn{1,2,3} and sdo{1,2,3}[0], which according to lsblk[1] are used to build a RAID, specifically:

sdn1 -> used for '/boot/efi'
sdo1 -> unused
sdn2 + sdo2 -> md0 used for '/boot'
sdn3 + sdo3 -> md1 used for '/'

The debug message "non matching DEVPATH for ..."[2] doesn't look like it's related to the stuck state of the hooks; it simply indicates that a SCSI path couldn't be found based on the device name.

The reason the DEVPATH message is printed over and over again, giving the impression that juju is stuck, is that it comes from the diskmanager worker, which polls disks every 30 seconds[3].

The sequence of calls is:

diskmanager worker every 30 secs:
-> listBlockDevices()
-> addHardwareInfo()
-> logger.Debugf("non matching DEVPATH for %q: %q", dev.DeviceName, devpath)
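
To make that loop concrete, here is a minimal Go sketch of the polling path. It is a paraphrase under the assumptions above, not the actual worker code: the function name echoes the one listed in the call sequence, but the regexp and plumbing are illustrative (see the linked lsblk.go and diskmanager.go for the real thing).

package main

import (
	"fmt"
	"regexp"
	"time"
)

// A DEVPATH for a whole disk is expected to end in ".../H:C:T:L/block/<dev>".
// The exact pattern Juju uses lives in worker/diskmanager/lsblk.go.
var devpathRE = regexp.MustCompile(`/\d+:\d+:\d+:\d+/block/[^/]+$`)

// addHardwareInfo checks whether a device's DEVPATH (as reported by udev)
// carries a parseable SCSI bus address.
func addHardwareInfo(deviceName, devpath string) {
	if !devpathRE.MatchString(devpath) {
		// This is the message flooding machine-115.log. Partition DEVPATHs
		// such as ".../block/sdn/sdn1" have an extra path element, so the
		// whole-disk pattern rejects them on every poll.
		fmt.Printf("non matching DEVPATH for %q: %q\n", deviceName, devpath)
	}
}

func main() {
	// The diskmanager worker polls block devices every 30 seconds.
	for range time.Tick(30 * time.Second) {
		// listBlockDevices() would supply real (name, DEVPATH) pairs here;
		// this hypothetical pair reproduces the repeated debug line.
		addHardwareInfo("sdn1", "/devices/pci0000:00/.../1:2:0:0/block/sdn/sdn1")
	}
}

As noted above, the message is harmless noise from this loop; it is not what blocks the hooks.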

[0] $ grep DEVPATH machine-115.log | awk -F' ' '{print $10}' | sort | uniq | cut -d'"' -f2
sdn1
sdn2
sdn3
sdo1
sdo2
sdo3
[1] NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdf 8:80 0 1.5T 0 disk
sdo 8:224 0 222.6G 0 disk
|-sdo3 8:227 0 204.9G 0 part
| `-md1 9:1 0 204.8G 0 raid1 /
|-sdo1 8:225 0 96M 0 part
`-sdo2 8:226 0 500M 0 part
`-md0 9:0 0 499.7M 0 raid1 /boot
sdd 8:48 0 1.5T 0 disk
sdm 8:192 0 1.5T 0 disk
sdb 8:16 0 1.5T 0 disk
sdk 8:160 0 1.5T 0 disk
sdi 8:128 0 1.5T 0 disk
sdg 8:96 0 1.5T 0 disk
sde 8:64 0 1.5T 0 disk
sdn 8:208 0 222.6G 0 disk
|-sdn3 8:211 0 204.9G 0 part
| `-md1 9:1 0 204.8G 0 raid1 /
|-sdn1 8:209 0 96M 0 part /boot/efi
`-sdn2 8:210 0 500M 0 part
`-md0 9:0 0 499.7M 0 raid1 /boot
sdc 8:32 0 1.5T 0 disk
sdl 8:176 0 1.5T 0 disk
sda 8:0 0 1.5T 0 disk
sdj 8:144 0 1.5T 0 disk
sdh 8:112 0 1.5T 0 disk
[2] https://github.com/juju/juju/blob/2.3/worker/diskmanager/lsblk.go#L228-L252
[3] https://github.com/juju/juju/blob/2.3/worker/diskmanager/diskmanager.go#L24

tags: added: sts
Felipe Reyes (freyes) wrote :

Machine 115, which hosts the unit testing-osd2/3, is reporting that there are still pending storage attachments to be processed[0]. This message comes from uniter/storage/resolver.go[1]; after it is emitted, the resolver checks whether the install hook has already run, and if it hasn't, it keeps waiting (a sketch of this gate follows the footnotes below).

A fresh "juju status" shows that the machine is in the "allocating" state[2], so the question here is why the agent hasn't reached the "ready" state.

[0] $ grep "still pending" juju-{1,2,3}/var/log/juju/logsink.log |grep testing-osd2-3
juju-2/var/log/juju/logsink.log:dc746ae6-fac2-415c-88e8-cad7c6e2c41e: unit-testing-osd2-3 2018-11-20 14:02:28 DEBUG juju.worker.uniter.storage resolver.go:91 still pending [storage-osd-devices-46 storage-osd-devices-47]
juju-2/var/log/juju/logsink.log:dc746ae6-fac2-415c-88e8-cad7c6e2c41e: unit-testing-osd2-3 2018-11-20 14:07:27 DEBUG juju.worker.uniter.storage resolver.go:91 still pending [storage-osd-devices-46 storage-osd-devices-47]
juju-2/var/log/juju/logsink.log:dc746ae6-fac2-415c-88e8-cad7c6e2c41e: unit-testing-osd2-3 2018-11-20 14:12:12 DEBUG juju.worker.uniter.storage resolver.go:91 still pending [storage-osd-devices-46 storage-osd-devices-47]
juju-2/var/log/juju/logsink.log:dc746ae6-fac2-415c-88e8-cad7c6e2c41e: unit-testing-osd2-3 2018-11-20 14:17:19 DEBUG juju.worker.uniter.storage resolver.go:91 still pending [storage-osd-devices-46 storage-osd-devices-47]
juju-2/var/log/juju/logsink.log:dc746ae6-fac2-415c-88e8-cad7c6e2c41e: unit-testing-osd2-3 2018-11-20 14:21:26 DEBUG juju.worker.uniter.storage resolver.go:91 still pending [storage-osd-devices-46 storage-osd-devices-47]
[1] https://github.com/juju/juju/blob/2.3/worker/uniter/storage/resolver.go#L91
[2] testing-osd2/3 waiting allocating 115 100.80.51.1 agent initializing
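
For readers unfamiliar with the uniter, here is a rough Go sketch of the gate described above. The type and field names are hypothetical simplifications of worker/uniter/storage/resolver.go, not the real API:

package storageresolver

import "errors"

// errWaiting signals that no hook can run yet.
var errWaiting = errors.New("waiting for storage attachments")

type resolver struct {
	pending   []string // e.g. ["storage-osd-devices-46", "storage-osd-devices-47"]
	installed bool     // true once the install hook has completed
}

// nextOp decides whether the uniter may proceed with its next hook.
func (r *resolver) nextOp() error {
	if !r.installed && len(r.pending) > 0 {
		// Logged as: still pending [storage-osd-devices-46 ...]
		return errWaiting // the unit stays at "agent initializing"
	}
	return nil
}

Because the attachments never leave the pending list, the install hook never runs and the unit never progresses past "allocating".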

Ian Booth (wallyworld) wrote :

When Juju provisions a machine with storage, depending on the provider, it generally gets back from the start API call information about each volume that has been provisioned, including:
- wwn
- hardware id

plus info about how that volume is/will be attached to the machine, including:
- bus address
- device link
- device name

The content of the volumes and volumeattachments collections shows that MAAS is providing Juju with the WWN of each volume and no information about the attachment of that volume.

For the uniter to run the storage-attached hook, it needs to know that the storage attachment has been completed, and this is determined by polling the block devices using lsblk. The info from lsblk is recorded and matched with the volume/attachment info obtained from the MAAS APIs when the machine was provisioned.

The block device info from lsblk that Juju can parse includes device name, size, label, UUID, type, etc.; just the basic info needed to record that a block device exists.
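
In outline, that matching step looks something like the sketch below. The types and field names are illustrative (Juju's real structures differ); the point is that with MAAS supplying only a WWN, WWN equality is the sole usable criterion:

package main

import "fmt"

// VolumeInfo is what the provider (MAAS) reported at provisioning time.
type VolumeInfo struct {
	WWN        string // e.g. "0x600605b00d6be9e023586a261060047f"
	HardwareID string // empty in this deployment
}

// BlockDevice is what the diskmanager worker discovered via lsblk/udevadm.
type BlockDevice struct {
	DeviceName string // e.g. "sda"
	WWN        string // filled in from udevadm's ID_WWN property
}

// matches reports whether a discovered block device corresponds to the
// provisioned volume. No bus address or device link is available to compare.
func matches(v VolumeInfo, d BlockDevice) bool {
	return v.WWN != "" && v.WWN == d.WWN
}

func main() {
	v := VolumeInfo{WWN: "0x600605b00d6be9e023586a261060047f"}
	d := BlockDevice{DeviceName: "sda", WWN: "0x600605b00d6be9e0"} // hypothetical shortened WWN
	fmt.Println(matches(v, d)) // false: the storage-attached hook never fires
}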

So we're down to matching by WWN. There is a volume info record for "osd-devices/40" which has a WWN of "0x600605b00d6be9e023586a261060047f", and the lsblk output records that block device as present:

2018-11-20 11:08:05 INFO juju.worker.diskmanager diskmanager.go:67 block devices changed: [{sda [/dev/disk/by-id/scsi-3600605b00d6be9e023586a261060047f /dev/disk/by-id/wwn-0x600605b00d6be9e023586a261060047f /dev/disk/by-path/pci-0000:0b:00.0-scsi-0:2:0:0]...

Juju can't reliably parse the WWN from this, so it then runs "udevadm" for each device:

udevadm info -q property --name <deviceName>

It looks for "ID_WWN=xxxxx" in the output.
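
A short Go sketch of that lookup, assuming only the behaviour described here (run udevadm, scan the KEY=VALUE output for ID_WWN); it is not Juju's exact code:

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

func deviceWWN(deviceName string) (string, error) {
	out, err := exec.Command("udevadm", "info", "-q", "property", "--name", deviceName).Output()
	if err != nil {
		return "", err
	}
	sc := bufio.NewScanner(bytes.NewReader(out))
	for sc.Scan() {
		if line := sc.Text(); strings.HasPrefix(line, "ID_WWN=") {
			// Note: this is the *unextended* WWN, e.g. "0x600508b1001cf703".
			// ID_WWN_WITH_EXTENSION is a different key and is never consulted.
			return strings.TrimPrefix(line, "ID_WWN="), nil
		}
	}
	return "", sc.Err()
}

func main() {
	wwn, err := deviceWWN("sdc")
	fmt.Println(wwn, err)
}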

So what we need to look at next is the udevadm output for, say, "osd-devices/40" and maybe a few others, plus the content of the "blockdevices" collection. We need to find out why Juju cannot match the block devices with the volume/attachment info from MAAS.

Note: udevadm is used to get the bus address of the device. The bus address is expected to be in the format:

.../X:X:X:X/block/<devicename>

whereas for this deployment the udevadm output produces:

.../1:2:0:0/block/sdo/sdo3

The extra "sdo" path element means that Juju cannot parse a bus address. That might be an issue, but it's not the problem here, because the volume attachment data from MAAS doesn't contain a bus address we could match against anyway.
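
The parsing failure is easy to demonstrate. Below, a pattern written to the expected shape (exactly one path element after "block") accepts a whole-disk DEVPATH and rejects the partition one quoted above; the regexp is illustrative, not Juju's:

package main

import (
	"fmt"
	"regexp"
)

// Expected shape: ".../H:C:T:L/block/<devicename>", nothing after the name.
var busAddrRE = regexp.MustCompile(`/(\d+:\d+:\d+:\d+)/block/[^/]+$`)

func main() {
	for _, devpath := range []string{
		".../0:1:0:2/block/sdc",      // whole disk: matches, bus address 0:1:0:2
		".../1:2:0:0/block/sdo/sdo3", // partition: extra "sdo" element, no match
	} {
		fmt.Printf("%-30s -> %v\n", devpath, busAddrRE.FindStringSubmatch(devpath))
	}
}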

Ian Booth (wallyworld) on 2018-11-21
Changed in juju:
importance: Undecided → High
Ante Karamatić (ivoks) wrote :

Currently I can work only with osd-devices/38. It's located on testing-osd/0. Its WWN is 0x600508b1001cf7030d45af15fa4ea14d. On the machine, this corresponds to /dev/sdc:

# udevadm info -q property --name sdc
DEVLINKS=/dev/disk/by-id/wwn-0x600508b1001cf7030d45af15fa4ea14d /dev/disk/by-id/scsi-3600508b1001cf7030d45af15fa4ea14d /dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:2
DEVNAME=/dev/sdc
DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:03:00.0/host0/target0:1:0/0:1:0:2/block/sdc
DEVTYPE=disk
ID_BUS=scsi
ID_MODEL=LOGICAL_VOLUME
ID_MODEL_ENC=LOGICAL\x20VOLUME\x20\x20
ID_PATH=pci-0000:03:00.0-scsi-0:1:0:2
ID_PATH_TAG=pci-0000_03_00_0-scsi-0_1_0_2
ID_REVISION=6.06
ID_SCSI=1
ID_SCSI_SERIAL=PDNLH0BRH9630O
ID_SERIAL=3600508b1001cf7030d45af15fa4ea14d
ID_SERIAL_SHORT=600508b1001cf7030d45af15fa4ea14d
ID_TYPE=disk
ID_VENDOR=HP
ID_VENDOR_ENC=HP\x20\x20\x20\x20\x20\x20
ID_WWN=0x600508b1001cf703
ID_WWN_VENDOR_EXTENSION=0x0d45af15fa4ea14d
ID_WWN_WITH_EXTENSION=0x600508b1001cf7030d45af15fa4ea14d
MAJOR=8
MINOR=32
SUBSYSTEM=block
TAGS=:systemd:
USEC_INITIALIZED=6417369

Hello Ian,

I've collected all the data you asked for, but on a different machine (115).
https://pastebin.canonical.com/p/ZvB6Rg64gm/
Note that the same ID_WWN appears on all the devices; ID_WWN_WITH_EXTENSION is the correct, unique match.

Ian Booth (wallyworld) wrote :

I looked at this pastebin, which has the block device info as recorded by Juju:
https://pastebin.canonical.com/p/VM4Gj5m5X4/

Around line 5488

The block device record in Juju (as determined from running udevadm) records the WWN as

0x600508b1001cf703

It is not the WWN with extension; Juju only grabs the unextended WWN from udevadm.

Yet MAAS reports the extended WWN where Juju expects the plain WWN:

0x600508b1001cf7030d45af15fa4ea14d

So Juju is trying to match an extended WWN with an unextended one.

We need to either fix MAAS or extend Juju to handle both the WWN and the extended WWN. The best approach is probably to enhance Juju to handle both, making it more robust.

Changed in juju:
milestone: none → 2.5-beta2
assignee: nobody → Ian Booth (wallyworld)
Ian Booth (wallyworld) wrote :

https://github.com/juju/juju/pull/9494

Further research indicates that where there are logical volumes on, say, a RAID controller, the controller is identified by the WWN and each volume by the vendor-extended WWN. So if there's an extended WWN, we'll use that, since that's how the volume is referenced under /dev/disk/by-id and hence what's reported by MAAS.
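
In essence the change prefers the vendor-extended WWN when udev reports one. A minimal sketch of that selection (illustrative names; see the PR above for the real diff):

package main

import "fmt"

// chooseWWN picks the identifier used to match a block device to a volume.
func chooseWWN(props map[string]string) string {
	// On a RAID controller, ID_WWN identifies the controller and is shared
	// by every logical volume; ID_WWN_WITH_EXTENSION is unique per volume
	// and is what appears under /dev/disk/by-id, hence what MAAS reports.
	if wwn := props["ID_WWN_WITH_EXTENSION"]; wwn != "" {
		return wwn
	}
	return props["ID_WWN"]
}

func main() {
	props := map[string]string{
		"ID_WWN":                "0x600508b1001cf703",
		"ID_WWN_WITH_EXTENSION": "0x600508b1001cf7030d45af15fa4ea14d",
	}
	fmt.Println(chooseWWN(props)) // the extended WWN, matching what MAAS reports
}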

Hello Ian,

I've got the full deployment finished successfully. Thanks for fixing the issue in such a short time!

Ian Booth (wallyworld) on 2018-11-26
Changed in juju:
status: Confirmed → Fix Committed
Ian Booth (wallyworld) on 2018-11-30
Changed in juju:
milestone: 2.5-beta2 → 2.5-beta3
Changed in juju:
status: Fix Committed → Fix Released