can wait forever when one or more units fail during deployment

Bug #1512472 reported by Ryan Beisner
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Mojo: Continuous Delivery for Juju | Invalid | Undecided | Unassigned |
juju-deployer | Invalid | Undecided | Tim Van Steenburgh |
python-jujuclient | New | Undecided | Tim Van Steenburgh |

Bug Description

When a unit hangs or fails during deployment, mojo (and juju-deployer) deployments can wait indefinitely for the deploy to complete. The expected behavior, regardless of the health of the deployed units, is for the deployment to raise an error and exit when the timeout threshold is hit.

This occurs when one or more units in the deployment have crashed or hung during hook execution (a separate, unrelated issue, such as a hung_task).
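
To make the expected behavior concrete, here is a minimal sketch of a deadline-bounded wait loop. It is not the juju-deployer or python-jujuclient code: `wait_for_ready`, `DeploymentTimeout`, and the `is_ready` callback are hypothetical names used only for illustration, and the 2700-second default simply mirrors the default timeout reported later in this thread.

```python
import time


class DeploymentTimeout(Exception):
    """Hypothetical error raised when the deployment is not ready in time."""


def wait_for_ready(is_ready, timeout=2700, poll_interval=5.0):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    is_ready is a caller-supplied callable (an assumption of this sketch)
    that returns True once every unit has settled.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Raise regardless of unit health so the caller can exit.
            raise DeploymentTimeout(
                "deployment not ready after %s seconds" % timeout)
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))
```

Note that a loop like this only enforces the deadline if `is_ready()` itself returns promptly; a status call that blocks would prevent the deadline check from ever running.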

jenkins@juju-osci-machine-16:~$ apt-cache policy juju-core
juju-core:
  Installed: 1.25.0-0ubuntu1~14.04.1~juju1
  Candidate: 1.25.0-0ubuntu1~14.04.1~juju1
  Version table:
 *** 1.25.0-0ubuntu1~14.04.1~juju1 0
        500 http://ppa.launchpad.net/juju/stable/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     1.22.8-0ubuntu1~14.04.1 0
        500 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty-updates/universe amd64 Packages
     1.18.1-0ubuntu1 0
        500 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

jenkins@juju-osci-machine-16:~$ apt-cache policy juju-deployer
juju-deployer:
  Installed: 0.6.0-1
  Candidate: 0.6.0-1
  Version table:
 *** 0.6.0-1 0
        500 http://ppa.launchpad.net/juju/stable/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     0.3.6-0ubuntu2 0
        500 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

jenkins@juju-osci-machine-16:~$ apt-cache policy mojo
mojo:
  Installed: 0.1.15
  Candidate: 0.1.15
  Version table:
 *** 0.1.15 0
        500 http://ppa.launchpad.net/mojo-maintainers/ppa/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status

# mojo/deployer output:
http://paste.ubuntu.com/13085527/

# juju stat
http://paste.ubuntu.com/13085505/

Ryan Beisner (1chb1n) wrote :

Added juju-deployer as we did observe the same with vanilla (non-mojo) juju-deployer test runs.

For example, this run was stuck for over a day and juju-deployer never timed out:

http://paste.ubuntu.com/13086320/

description: updated
Changed in juju-deployer:
assignee: nobody → Tim Van Steenburgh (tvansteenburgh)
Ryan Beisner (1chb1n) wrote :

Just to clarify, this isn't specific to the openstack provider. Here is a bare metal deployment using the maas provider, where 2 metal units failed to come up (separate issue from this bug), and juju-deployer hasn't exited after nearly 1 day. The timeout threshold is not kicking in, and the "deploy is ready" logic is apparently still churning.

http://paste.ubuntu.com/13101588/

Ryan Beisner (1chb1n) wrote :

\o/

With the patched python-jujuclient on the metal deployment mentioned in my preceding comment, juju-deployer now times out at the default 2700s value.

2015-11-04 18:16:57 [ERROR] deployer.import: Reached deployment timeout.. exiting
2015-11-04 18:16:57 [INFO] deployer.cli: Deployment stopped. run time: 2700.63
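
For anyone wanting to check this automatically in a CI job, a rough sketch follows. It assumes the log format is exactly what the two lines above show; `run_time_seconds` and `hit_timeout` are hypothetical helper names, not part of juju-deployer.

```python
import re

# Pattern assumed from the single "Deployment stopped" line quoted above.
RUN_TIME_RE = re.compile(r"Deployment stopped\. run time: (\d+(?:\.\d+)?)")


def run_time_seconds(log_text):
    """Return the reported run time in seconds, or None if not present."""
    match = RUN_TIME_RE.search(log_text)
    return float(match.group(1)) if match else None


def hit_timeout(log_text, timeout=2700):
    """True if the log shows the timeout error and ran at least `timeout` s."""
    run_time = run_time_seconds(log_text)
    return ("Reached deployment timeout" in log_text
            and run_time is not None
            and run_time >= timeout)
```

Feeding the two log lines above to `hit_timeout` with the default threshold should report that the timeout was reached.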

Changed in python-jujuclient:
assignee: nobody → Tim Van Steenburgh (tvansteenburgh)
Ryan Beisner (1chb1n) wrote :

Exercised both the positive and negative cases with the patched python-jujuclient, i.e., confirmed that the 45-minute timeout is now hit and that successful deploys exit cleanly.

juju 1.25.0-0ubuntu1~14.04.1~juju1

Many thanks!

Ryan Beisner (1chb1n)
Changed in juju-deployer:
status: New → Invalid
Changed in mojo:
status: New → Invalid