Canonical Juju

juju deploy bundle model comparison doesn't take into account still-deploying units

Bug #1786965 reported by Drew Freiberger on 2018-08-14

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Canonical Juju	Triaged	Low	Unassigned

Bug Description

I am deploying an OpenStack cloud with the MAAS provider. I have successfully deployed 8 of 12 servers.

Two had failures. I cleaned up one that had several container placements on it, and then ran juju deploy ./bundle.yaml again. It properly calculated the missing machine and containers/application-units for re-deployment of that node.

My model currently has this status in juju machines:

23 pending 10.216.5.88 mg7msp xenial rack-2 Deploying
23/lxd/0 pending pending xenial
23/lxd/1 pending pending xenial
23/lxd/2 pending pending xenial
23/lxd/3 pending pending xenial
23/lxd/4 pending pending xenial
23/lxd/5 pending pending xenial

and these services:

ceph-mon/3 waiting allocating 23/lxd/0 waiting for machine
ceph-osd/9 waiting allocating 23 10.216.5.88 waiting for machine
ceph-radosgw/3 waiting allocating 23/lxd/1 waiting for machine
cinder/3 waiting allocating 23/lxd/2 waiting for machine
designate/3 waiting allocating 23/lxd/3 waiting for machine
mysql/3 waiting allocating 23/lxd/4 waiting for machine
nova-compute-kvm/7 waiting allocating 23 10.216.5.88 waiting for machine
openstack-dashboard/3 waiting allocating 23/lxd/5 waiting for machine

I then deleted the other failed machine, which was a VM with a rabbitmq-server instance on it.

I then re-ran juju deploy bundle.yaml while machine 23 above was still pending MAAS deployment, expecting it to rebuild only the last rabbitmq-server VM, which was neither in the model nor pending in the model.

The bug is that it allocated a new machine with the same services as machine 23 above, which then could not find matching hardware to deploy on:

23 pending 10.216.5.88 mg7msp xenial rack-2 Deploying: 'curtin' configuring partition: nvme0n1-part3
23/lxd/0 pending pending xenial
23/lxd/1 pending pending xenial
23/lxd/2 pending pending xenial
23/lxd/3 pending pending xenial
23/lxd/4 pending pending xenial
23/lxd/5 pending pending xenial
24 pending pending xenial failed to start machine 24 in zone "default", retrying in 10s with new availability zone: failed to acquire node: No available machine matches constraints: [('agent_name', ['38815777-75ac-4d98-80c9-24507f2b11a1']), ('tags', ['compute', 'rack-2']), ('zone', ['default'])] (resolved to "tags=compute,rack-2 zone=default")
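The comparison expected here can be sketched as follows. This is only an illustration of the arithmetic, not Juju's actual bundle-diff code: the Unit type, the units_to_add function, the count_pending flag, the desired unit counts, and units ceph-mon/0 through ceph-mon/2 are all hypothetical, since the real bundle.yaml is not shown in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str          # e.g. "ceph-mon/3"
    application: str   # e.g. "ceph-mon"
    agent_status: str  # e.g. "idle" or "allocating"


def units_to_add(desired, units, count_pending=True):
    """Return {application: number of units the bundle would still add}."""
    existing = {}
    for unit in units:
        if unit.agent_status == "allocating" and not count_pending:
            continue  # reported behaviour: still-deploying units are ignored
        existing[unit.application] = existing.get(unit.application, 0) + 1
    return {app: max(want - existing.get(app, 0), 0) for app, want in desired.items()}


# Hypothetical bundle unit counts; the real bundle.yaml is not shown in this report.
desired = {"ceph-mon": 4, "rabbitmq-server": 1}
units = [
    Unit("ceph-mon/0", "ceph-mon", "idle"),        # hypothetical healthy unit
    Unit("ceph-mon/1", "ceph-mon", "idle"),        # hypothetical healthy unit
    Unit("ceph-mon/2", "ceph-mon", "idle"),        # hypothetical healthy unit
    Unit("ceph-mon/3", "ceph-mon", "allocating"),  # waiting for machine 23/lxd/0
]

expected = units_to_add(desired, units, count_pending=True)
reported = units_to_add(desired, units, count_pending=False)
```

With pending units counted, only rabbitmq-server is missing; with them ignored, ceph-mon is scheduled a second time, which matches the duplicate machine 24 seen above.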

I then tried to delete that machine with juju remove-machine --force 24, and it is still stuck in the model, though the containers on it did get removed from the model.

Tags: bundles expirebugs-bot

Drew Freiberger (afreiberger) wrote on 2018-08-14:

Machine 24 did finally clear out of the model.

Tim Penhey (thumper) on 2018-08-14

tags:	added: bundles
affects:	juju-core → juju
Changed in juju:
status:	New → Triaged
importance:	Undecided → Medium

Canonical Juju QA Bot (juju-qa-bot) wrote on 2022-11-03:

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance:	Medium → Low
tags:	added: expirebugs-bot
