juju deploy bundle model comparison doesn't take into account still-deploying units

Bug #1786965 reported by Drew Freiberger on 2018-08-14
This bug affects 1 person
affects: juju
importance: Medium
assigned to: Unassigned

Bug Description

I am deploying an OpenStack cloud with the MAAS provider. I have successfully deployed 8 of 12 servers.

Two had failures. I cleaned up one that had several container placements on it, and then ran juju deploy ./bundle.yaml again. It properly calculated the missing machine and containers/application-units for re-deployment of that node.

My model currently has this status in juju machines:

23 pending 10.216.5.88 mg7msp xenial rack-2 Deploying
23/lxd/0 pending pending xenial
23/lxd/1 pending pending xenial
23/lxd/2 pending pending xenial
23/lxd/3 pending pending xenial
23/lxd/4 pending pending xenial
23/lxd/5 pending pending xenial

and these services:
ceph-mon/3 waiting allocating 23/lxd/0 waiting for machine
ceph-osd/9 waiting allocating 23 10.216.5.88 waiting for machine
ceph-radosgw/3 waiting allocating 23/lxd/1 waiting for machine
cinder/3 waiting allocating 23/lxd/2 waiting for machine
designate/3 waiting allocating 23/lxd/3 waiting for machine
mysql/3 waiting allocating 23/lxd/4 waiting for machine
nova-compute-kvm/7 waiting allocating 23 10.216.5.88 waiting for machine
openstack-dashboard/3 waiting allocating 23/lxd/5 waiting for machine

I then removed the other failed machine, which was a VM with a rabbitmq-server instance on it.

I then re-ran juju deploy bundle.yaml while machine 23 above was still pending MAAS deployment, expecting it to rebuild only the last rabbitmq-server VM, which was neither in the model nor pending in it.

The bug is that it instead allocated a new machine with the same services as machine 23 above, which then failed to deploy for lack of matching hardware:

23 pending 10.216.5.88 mg7msp xenial rack-2 Deploying: 'curtin' configuring partition: nvme0n1-part3
23/lxd/0 pending pending xenial
23/lxd/1 pending pending xenial
23/lxd/2 pending pending xenial
23/lxd/3 pending pending xenial
23/lxd/4 pending pending xenial
23/lxd/5 pending pending xenial
24 pending pending xenial failed to start machine 24 in zone "default", retrying in 10s with new availability zone: failed to acquire node: No available machine matches constraints: [('agent_name', ['38815777-75ac-4d98-80c9-24507f2b11a1']), ('tags', ['compute', 'rack-2']), ('zone', ['default'])] (resolved to "tags=compute,rack-2 zone=default")

I then tried to delete that machine with juju remove-machine --force 24, but it is still stuck in the model, although the containers on it were removed from the model.
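
To illustrate the behaviour I expected from the bundle/model comparison, here is a minimal sketch. It is not Juju's actual code: the Unit class, the units_to_add function and the count_pending flag are hypothetical names made up for this example, and the unit counts are simplified. The idea is that units still in "allocating" should count as already present when working out what the bundle is missing.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical, simplified view of a unit in the model."""
    application: str
    agent_status: str  # e.g. "idle" or "allocating"


def units_to_add(bundle, model_units, count_pending=True):
    """Return {application: number of units still missing from the model}.

    bundle maps application name -> desired unit count.
    With count_pending=True, units that are still allocating are treated
    as already present, which is the behaviour expected in this report.
    """
    existing = Counter(
        u.application
        for u in model_units
        if count_pending or u.agent_status != "allocating"
    )
    return {
        app: wanted - existing[app]
        for app, wanted in bundle.items()
        if wanted > existing[app]
    }


# Simplified stand-in for the situation above: one ceph-osd unit is
# still allocating and the rabbitmq-server unit has been removed.
bundle = {"ceph-osd": 2, "rabbitmq-server": 1}
model = [Unit("ceph-osd", "idle"), Unit("ceph-osd", "allocating")]

expected = units_to_add(bundle, model, count_pending=True)
observed = units_to_add(bundle, model, count_pending=False)
```

With pending units counted, only rabbitmq-server is reported as missing. Ignoring them also asks for a second copy of the ceph-osd unit, which matches the duplicate machine 24 seen above.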

Drew Freiberger (afreiberger) wrote :

machine 24 did finally clear out of the model.

Tim Penhey (thumper) on 2018-08-14
tags: added: bundles
affects: juju-core → juju
Changed in juju:
status: New → Triaged
importance: Undecided → Medium