Unable to deploy with storage if AZ of storage doesn't match AZ of compute instances

Bug #2020871 reported by Tom Haddon
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: Undecided
Assigned to: Joseph Phillips
Milestone: (none listed)

Bug Description

We have an OpenStack cloud in which storage is reported (by OpenStack) as `nova`, but instances are deployed into `availability-zone-1`, `availability-zone-2` or `availability-zone-3`.

I ran `juju deploy ubuntu --series focal --attach-storage files/8` given https://paste.ubuntu.com/p/T3dgWmVDJP/ and got the following in controller logs:

```
2023-05-26 09:22:19 INFO juju.worker.provisioner provisioner_task.go:1003 provisioning in zones: [availability-zone-1 availability-zone-2 availability-zone-3]
2023-05-26 09:22:19 INFO juju.worker.provisioner provisioner_task.go:504 found machine pending provisioning id:14, details:14
2023-05-26 09:22:19 DEBUG juju.worker.provisioner workerpool.go:138 worker 8: processing task "start-instance 14"
2023-05-26 09:22:19 DEBUG juju.worker.provisioner provisioner_task.go:1183 machine 14 does not match az availability-zone-2: excluded machine id
2023-05-26 09:22:19 DEBUG juju.worker.provisioner provisioner_task.go:1183 machine 14 does not match az availability-zone-1: excluded machine id
2023-05-26 09:22:19 DEBUG juju.worker.provisioner provisioner_task.go:1183 machine 14 does not match az availability-zone-3: excluded machine id
2023-05-26 09:22:19 ERROR juju.worker.provisioner provisioner_task.go:1312 cannot start instance for machine "14": suitable availability zone for machine 14 not found
```

Tags: canonical-is
Tom Haddon (mthaddon)
tags: added: canonical-is
Tom Haddon (mthaddon) wrote :

This is on juju 2.9.42, to be clear.

Ian Booth (wallyworld) wrote :

What availability zone does the volume exist in? The error message seems to indicate that the volume is reported to juju as being in an AZ that is not one where the instance can be provisioned. Generally, juju will only attach a volume to a machine if the machine can be provisioned in the same AZ as the volume.

I would have hoped that a suitable error message would have been displayed in status indicating why provisioning failed. You should not have had to dig into log messages.

Tom Haddon (mthaddon) wrote :

The availability zone of the storage is reported as `nova`.

Tom Haddon (mthaddon) wrote :

Also, just to mention that we can attach the storage after the unit has been deployed without any problem, so rather than doing:

```
juju deploy ubuntu --series focal --attach-storage files/8
```

We do:

```
juju deploy ubuntu --series focal
# Wait for active/idle status
juju attach-storage ubuntu/${unit} files/8
```

Ian Booth (wallyworld) wrote :

You are right: attach-storage does not perform the same AZ validation that deploy does. This is actually an attach-storage bug. The Juju model currently requires that a volume's AZ match that of the machine to which it is attached; otherwise things can break.

Is there any way to adjust the AZ reported by the storage? If the Juju modelling bug is fixed, the workaround will break. Or we'd need to add an option to ignore AZ mismatches or something.

Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
status: New → Triaged
Tom Haddon (mthaddon) wrote :

Wouldn't that mean we could only attach storage that was created in the same availability zone as the compute instance? So if we have three availability zones we could only attach a particular volume to one out of three instances? In a disaster recovery scenario that sounds less than ideal.

Haw Loeung (hloeung) wrote :

Is this a regression? I remember running into something similar way back, LP:1885639 (OpenStack) and LP:1884018 (Azure).

Ian Booth (wallyworld) wrote :

It looks like there are approximately three different code paths in play here, and I'm not sure they're all consistent.

Originally, when storage was added to juju, there was an underlying constraint that the machine AZ must match the volume AZ, or else attachment of said volume to machine would not be possible. I think this very much was driven by an AWS requirement.

There are various places in the code that enforce the above constraint. As already noted in this bug, it is checked at deploy time. It is also checked if a placement directive is used, e.g.

juju deploy blah --to zone:foo

does validation on the zone placement which might return an error formatted as

"cannot create instance in zone %q, as this will prevent attaching the requested disks in zone %q"

And yet also as noted, using --attach-storage does not do the placement check.

As noted by Haw, it seems code was added to the OpenStack provider to specifically handle the (OpenStack-specific) case where the default volume zone "nova" is used, and strict AZ matches with the machine are skipped. But there have been several changes to the provisioner worker since then, e.g. to deploy machines in parallel.

Looks like we need to revisit the logic here and consistently add a provider-specific AZ check in all provisioning workflows.
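
For illustration only, here is a minimal Go sketch of what a single, shared provider-specific AZ check might look like. It is not Juju's actual code: `zoneCompatible`, `pickZone`, `errNoSuitableZone` and the provider type strings are hypothetical names, and the only facts taken from this bug are that OpenStack reports the volume zone as `nova` and that strict matching is meant to be skipped in that case.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultOpenStackVolumeZone is the volume zone name reported in this bug.
const defaultOpenStackVolumeZone = "nova"

// errNoSuitableZone is a hypothetical sentinel error for this sketch.
var errNoSuitableZone = errors.New("suitable availability zone not found")

// zoneCompatible reports whether a machine placed in machineZone may attach
// a volume that lives in volumeZone. providerType is a hypothetical string
// identifying the cloud provider.
func zoneCompatible(providerType, volumeZone, machineZone string) bool {
	// Assumption: on OpenStack, the default "nova" volume zone is not tied
	// to any particular compute zone, so the strict match is skipped.
	if providerType == "openstack" && volumeZone == defaultOpenStackVolumeZone {
		return true
	}
	// Everywhere else, require an exact match between the two zones.
	return volumeZone == machineZone
}

// pickZone returns the first candidate zone that is compatible with every
// volume zone, or errNoSuitableZone if none qualifies.
func pickZone(providerType string, candidates, volumeZones []string) (string, error) {
	for _, zone := range candidates {
		compatible := true
		for _, volumeZone := range volumeZones {
			if !zoneCompatible(providerType, volumeZone, zone) {
				compatible = false
				break
			}
		}
		if compatible {
			return zone, nil
		}
	}
	return "", errNoSuitableZone
}

func main() {
	zones := []string{"availability-zone-1", "availability-zone-2", "availability-zone-3"}

	// OpenStack with a "nova" volume: the exemption lets a zone be chosen.
	zone, err := pickZone("openstack", zones, []string{"nova"})
	fmt.Println(zone, err)

	// A provider without the exemption: no candidate zone matches "nova".
	zone, err = pickZone("ec2", zones, []string{"nova"})
	fmt.Println(zone, err)
}
```

If deploy, placement validation and attach-storage all called one helper like this, they could not disagree about whether a given volume and machine zone pair is acceptable.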
