juju deploy bundle model comparison doesn't take into account still-deploying units

Bug #1786965 reported by Drew Freiberger
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned

Bug Description

I am deploying an OpenStack cloud with the MAAS provider and have successfully deployed 8 of 12 servers.

Two had failures. I cleaned up one that had several container placements on it and re-ran juju deploy ./bundle.yaml. Juju correctly calculated the missing machine and its containers/application units and re-deployed that node.

My model currently shows this status in juju machines:

23 pending 10.216.5.88 mg7msp xenial rack-2 Deploying
23/lxd/0 pending pending xenial
23/lxd/1 pending pending xenial
23/lxd/2 pending pending xenial
23/lxd/3 pending pending xenial
23/lxd/4 pending pending xenial
23/lxd/5 pending pending xenial

and these services:
ceph-mon/3 waiting allocating 23/lxd/0 waiting for machine
ceph-osd/9 waiting allocating 23 10.216.5.88 waiting for machine
ceph-radosgw/3 waiting allocating 23/lxd/1 waiting for machine
cinder/3 waiting allocating 23/lxd/2 waiting for machine
designate/3 waiting allocating 23/lxd/3 waiting for machine
mysql/3 waiting allocating 23/lxd/4 waiting for machine
nova-compute-kvm/7 waiting allocating 23 10.216.5.88 waiting for machine
openstack-dashboard/3 waiting allocating 23/lxd/5 waiting for machine

I then went to delete the other failed machine, which was a VM with a rabbitmq-server unit on it.

I then re-ran juju deploy ./bundle.yaml while machine 23 above was still pending MAAS deployment, expecting it to simply rebuild the last rabbitmq-server VM, which was neither present nor pending in the model.

The bug is that Juju instead allocated a new machine with the same services as the still-deploying machine 23 above, which then failed for lack of matching hardware to deploy on:

23 pending 10.216.5.88 mg7msp xenial rack-2 Deploying: 'curtin' configuring partition: nvme0n1-part3
23/lxd/0 pending pending xenial
23/lxd/1 pending pending xenial
23/lxd/2 pending pending xenial
23/lxd/3 pending pending xenial
23/lxd/4 pending pending xenial
23/lxd/5 pending pending xenial
24 pending pending xenial failed to start machine 24 in zone "default", retrying in 10s with new availability zone: failed to acquire node: No available machine matches constraints: [('agent_name', ['38815777-75ac-4d98-80c9-24507f2b11a1']), ('tags', ['compute', 'rack-2']), ('zone', ['default'])] (resolved to "tags=compute,rack-2 zone=default")

I then deleted that machine with juju remove-machine --force 24. It remained stuck in the model, though the containers on it were removed.
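The sequence above can be sketched as a hypothetical reproduction script. The bundle path and machine numbers come from this report; the placeholder machine IDs and the assumption of a bootstrapped Juju controller on a MAAS cloud are mine, not part of the original report:

```shell
# Initial bundle deployment; some machines fail to provision in MAAS.
juju deploy ./bundle.yaml

# Clean up the first failed machine; its containers/units leave the model.
juju remove-machine --force <first-failed-machine-id>

# Re-deploy the bundle: Juju recalculates the delta and recreates the
# missing machine and its containers (this step works as expected).
juju deploy ./bundle.yaml

# While the recreated machine (23) is still "pending" in MAAS, remove the
# other failed machine and re-deploy a third time:
juju remove-machine --force <second-failed-machine-id>
juju deploy ./bundle.yaml

# Expected: only the missing rabbitmq-server machine is re-added.
# Actual:   a duplicate of still-pending machine 23 (machine 24) is also
#           allocated, and fails to acquire a node matching the constraints.
```

The key point is the third deploy: the bundle/model comparison appears not to count machine 23's still-allocating units as satisfying the bundle, so it requests them again.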

Revision history for this message
Drew Freiberger (afreiberger) wrote :

machine 24 did finally clear out of the model.

Tim Penhey (thumper)
tags: added: bundles
affects: juju-core → juju
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot