Azure provider: storage stuck in 'pending'

Bug #1884018 reported by Haw Loeung
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Ian Booth

Bug Description

Hi,

I'm trying to deploy the ubuntu-repository-cache charm, which I'm updating to support Juju storage. When deploying to Azure, the unit is stuck in the 'allocating' state. It looks like the storage volumes are still 'pending':

| Unit                       Workload  Agent       Machine  Public address  Ports  Message
| ubuntu-repository-cache/5  waiting   allocating  5        40.81.225.8            agent initializing

| Unit                       Storage id                 Type        Pool   Size  Status   Message
| ubuntu-repository-cache/5  ubuntu-repository-cache/2  filesystem  azure        pending

The storage is definitely attached, and I can see it there both via SSH to the unit itself and via the Azure portal.

| ubuntu@machine-5:~$ df -h -t ext4
| Filesystem Size Used Avail Use% Mounted on
| /dev/sda1 31G 1.6G 30G 5% /
| /dev/sdc1 98G 61M 93G 1% /mnt

| ubuntu@machine-5:~$ sudo fdisk -l /dev/sdb
| Disk /dev/sdb: 100 GiB, 107374182400 bytes, 209715200 sectors
| Units: sectors of 1 * 512 = 512 bytes
| Sector size (logical/physical): 512 bytes / 4096 bytes
| I/O size (minimum/optimal): 4096 bytes / 4096 bytes

This is seen with Juju 2.7.6.

Changed in juju:
milestone: none → 2.7.7
Haw Loeung (hloeung)
summary: - Azure storage stuck in 'pending'
+ Azure provider: storage stuck in 'pending'
Changed in juju:
importance: Undecided → Critical
Haw Loeung (hloeung)
summary: - Azure provider: storage stuck in 'pending'
+ Azure provider: storage stuck in 'pending' blocking deployments (stuck
+ 'allocating')
Haw Loeung (hloeung)
Changed in juju:
status: New → Triaged
Ian Booth (wallyworld) wrote : Re: Azure provider: storage stuck in 'pending' blocking deployments (stuck 'allocating')

The likely cause is that the volume is mounted as a device where Juju cannot reconcile the mount path with the pending volume. Matching is done based on a variety of factors (e.g. WWN, device ID) and each substrate is slightly different. Often, a substrate introduces a new storage class that mounts slightly differently and Juju needs updating to account for it. We've seen this on AWS with NVMe volumes.
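
To illustrate the kind of matching described above, here is a minimal Python sketch. It is not Juju's implementation (Juju is written in Go, and its real matching rules are not shown in this report). The `BlockDevice` and `VolumeInfo` types, their fields, and the order in which identifiers are tried are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BlockDevice:
    """A device observed on the machine (hypothetical type and fields)."""
    device_name: str
    wwn: str = ""
    hardware_id: str = ""


@dataclass(frozen=True)
class VolumeInfo:
    """Identifiers the provider reported for a volume (hypothetical)."""
    wwn: str = ""
    hardware_id: str = ""
    device_name: str = ""


def match_volume(volume: VolumeInfo,
                 devices: Sequence[BlockDevice]) -> Optional[BlockDevice]:
    """Return the first device that shares an identifier with the volume.

    Identifiers are tried in a fixed order (WWN, hardware id, device name);
    that order is an assumption made for this sketch.
    """
    for field in ("wwn", "hardware_id", "device_name"):
        wanted = getattr(volume, field)
        if not wanted:
            continue  # the provider did not report this identifier
        for dev in devices:
            if getattr(dev, field) == wanted:
                return dev
    return None  # no match: the storage would never leave 'pending'
```

If the provider reports a volume under an identifier that no device on the machine carries (for example a device name that differs from the one the kernel assigned), `match_volume` returns `None`, which corresponds to the storage staying 'pending' even though the disk is attached.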

Haw Loeung (hloeung) wrote :

It looks like this also happens with the OpenStack provider, on Juju 2.7.6 as well as 2.8.0.

Separate to Azure, I've tried using the shared Juju 2.x controllers in PS4.5 and that also fails:

| Model                 Controller    Cloud/Region                 Version  SLA          Timestamp
| stg-is-content-cache  prodstack-is  prodstack-45/bootstack-ps45  2.7.6    unsupported  05:22:56Z
| ...
| ubuntu-repository-cache-new/0* waiting allocating 4 10.2xx.xxx.xxx agent initializing

| $ juju list-storage
| ubuntu-repository-cache-new/0 ubuntu-repository-cache/3 filesystem pending

It also fails in a random CI environment, on a different controller running 2.8.0:

| Model  Controller             Cloud/Region                 Version  SLA          Timestamp
| XXX    jenkins-ci-controller  prodstack4.5/bootstack-ps45  2.8.0    unsupported  05:24:00Z
| ubuntu-repository-cache-new/0* waiting allocating 0 10.2x.xxx.xx agent initializing

| $ juju list-storage
| Unit                           Storage id                 Type        Size  Status   Message
| ubuntu-repository-cache-new/0  ubuntu-repository-cache/0  filesystem        pending

In both cases storage-default-block-source is 'cinder'. In Azure's case, it's 'azure'.

Haw Loeung (hloeung) wrote :

FWIW, before being upgraded to 2.7.6, the shared 2.x controllers were running 2.6.10 and were able to deploy units with attached storage.

summary: - Azure provider: storage stuck in 'pending' blocking deployments (stuck
- 'allocating')
+ Azure and OpenStack provider: storage stuck in 'pending'
Pen Gale (pengale)
Changed in juju:
milestone: 2.7.7 → 2.8.1
Ian Booth (wallyworld) wrote : Re: Azure and OpenStack provider: storage stuck in 'pending'

For the OpenStack case (on prodstack 4.5), 'juju storage --format yaml' shows the error:

'invalid status (400): {"badRequest": {"message": "Invalid input received: Availability zone ''prodstack-zone-1'' is invalid", "code": 400}}'

In 2.7, we introduced code that passes the AZ of the host machine (the machine the storage will be attached to) in the CreateVolume() API call. In 2.6 this parameter was empty.

$ nova availability-zone-list
+------------------+---------------+
| Name             | Status        |
+------------------+---------------+
| prodstack-zone-2 | available     |
| prodstack-zone-1 | available     |
| nova             | not available |
+------------------+---------------+

The compute node is in prodstack-zone-1 and that's what Juju is using for CreateVolume().

Yet showing an existing volume created outside of this test Juju deployment gives:

$ nova volume-show 663443e6-a558-485d-83a6-b880745b05f5

availability_zone | nova

So it looks like the allowable AZs for volumes do not match those for compute nodes. Juju doesn't know how to handle this.
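
One way to picture the mismatch is the following Python sketch. It is only an illustration and not the change that landed in Juju: `volume_zone_for` is a hypothetical helper that sends the machine's AZ with a volume-create request only when the volume service also lists that zone, and otherwise leaves the parameter out, as 2.6 did.

```python
from typing import Iterable, Optional


def volume_zone_for(machine_zone: str,
                    volume_zones: Iterable[str]) -> Optional[str]:
    """Choose the AZ to send with a volume-create request (hypothetical helper).

    Returns machine_zone only when the volume service lists the same zone.
    Returns None otherwise, meaning "omit the parameter" and let the
    volume service pick its default zone.
    """
    if machine_zone and machine_zone in set(volume_zones):
        return machine_zone
    return None
```

With the zones shown in this report, `volume_zone_for("prodstack-zone-1", ["nova"])` returns `None`, so the request would not carry the zone that the volume service rejects.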

Haw Loeung (hloeung) wrote :

It looks like there are two separate bugs here, one for the OpenStack provider and the other for Azure (unless the code that includes the AZ is also causing the issue with Azure?).

Should we split this into two bugs or keep it as one?

For OpenStack, in our setup there's only a single AZ for cinder:

$ nova availability-zone-list
+------------------+---------------+
| Name             | Status        |
+------------------+---------------+
| prodstack-zone-2 | available     |
| prodstack-zone-1 | available     |
| nova             | not available |
+------------------+---------------+

$ cinder availability-zone-list
+------+-----------+
| Name | Status    |
+------+-----------+
| nova | available |
+------+-----------+

This is unlike compute, where there are two AZs and the default one, 'nova', is disabled.
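
The comparison between the two listings above can be automated with a short Python sketch. `parse_az_table` and `usable` are hypothetical helpers written for this example; they assume the two-column ASCII table layout shown in this report and are not part of any OpenStack client library.

```python
from typing import Dict, Set

COMPUTE_AZS = """
+------------------+---------------+
| Name             | Status        |
+------------------+---------------+
| prodstack-zone-2 | available     |
| prodstack-zone-1 | available     |
| nova             | not available |
+------------------+---------------+
"""

VOLUME_AZS = """
+------+-----------+
| Name | Status    |
+------+-----------+
| nova | available |
+------+-----------+
"""


def parse_az_table(text: str) -> Dict[str, str]:
    """Parse a two-column ASCII table of zones into {name: status}."""
    zones: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Name":
            continue  # skip the header row and unexpected layouts
        zones[cells[0]] = cells[1]
    return zones


def usable(zones: Dict[str, str]) -> Set[str]:
    """Return the names of zones whose status is exactly 'available'."""
    return {name for name, status in zones.items() if status == "available"}


# Zones a machine can be placed in but a volume cannot be created in.
compute_only = usable(parse_az_table(COMPUTE_AZS)) - usable(parse_az_table(VOLUME_AZS))
print(sorted(compute_only))
```

For the listings in this report the script prints `['prodstack-zone-1', 'prodstack-zone-2']`: every zone a machine can land in is one the volume service does not offer.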

Ian Booth (wallyworld) wrote :

Yes, there are two separate issues here. Different bugs are warranted.

Haw Loeung (hloeung) wrote :

I've split the OpenStack issue out to LP:1885639, so this bug stays Azure-specific.

summary: - Azure and OpenStack provider: storage stuck in 'pending'
+ Azure provider: storage stuck in 'pending'
Ian Booth (wallyworld) wrote :
Changed in juju:
milestone: 2.8.1 → 2.7.8
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Ian Booth (wallyworld)
Changed in juju:
status: Fix Released → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released