juju-gui

Canonistack fails causing continuous integration tests to hang

Bug #1161890 reported by Gary Poster on 2013-03-29

Affects		Status	Importance	Assigned to	Milestone
	juju-gui	Invalid	Low	Unassigned

Bug Description

Sometimes when Canonistack fails, Juju or another part of the infrastructure notices, our CI tests fail, and we can automatically retry.

Other times we aren't that lucky: Juju doesn't notice the Canonistack error, and our tests will hang indefinitely.

This happens most often in lib/deploy_charm_for_testing.py.

It would be convenient if we could periodically check whether our calls to juju bootstrap, juju deploy, or juju status are still interacting with a healthy Canonistack.

The Python 2.x subprocess module has no concept of a timeout, so this is somewhat difficult to implement. Two reasonable solutions are to use a signal (http://stackoverflow.com/questions/1191374/subprocess-with-timeout/1191537#1191537) or to use the subprocess32 backport, which does support timeouts. The file attached to this bug is an untested sketch that uses subprocess32.
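For illustration only (this is not the attached file), a minimal sketch of the subprocess32 approach might look like the following. The names run_with_timeout and CommandTimeout are hypothetical, and the sketch assumes subprocess32 is installed on Python 2.x; on Python 3.3 and later it falls back to the standard library module, which offers the same timeout support.

```python
import sys

try:
    # Python 2.x: the third-party backport (assumed to be installed).
    import subprocess32 as subprocess
except ImportError:
    # Python 3.3+: the standard library module supports timeouts itself.
    import subprocess


class CommandTimeout(Exception):
    """Raised when a command runs longer than allowed (hypothetical name)."""


def run_with_timeout(cmd, timeout):
    """Run cmd (a list of arguments) and return (returncode, output).

    If the command has not finished after `timeout` seconds, kill it and
    raise CommandTimeout instead of hanging indefinitely.
    """
    process = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        output, _ = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.communicate()  # Reap the killed child.
        raise CommandTimeout(
            "%r did not finish within %s seconds" % (cmd, timeout))
    return process.returncode, output


if __name__ == "__main__":
    # Stand-in child that sleeps; a real caller would pass a juju command
    # such as ["juju", "status"] and a timeout of several minutes.
    try:
        run_with_timeout(
            [sys.executable, "-c", "import time; time.sleep(10)"], timeout=1)
    except CommandTimeout as error:
        print(error)
```

Note that this only kills the direct child process; anything that child spawned itself would need separate handling.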

Once we had this, we could change juju_command in deploy_charm_for_testing to verify that no machines are in a stopped or error state according to euca-describe-instances, and that we have not been waiting longer than 10 to 15 minutes for a given command to finish.

Note that this behavior should only be in place when we are using Canonistack, which the deploy_charm_for_testing script detects by checking whether os.environ.get("JUJU_INSTANCE_IP") returns a value. Otherwise (such as on ec2), we should not use euca to check status.
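A rough sketch of the health check described above could look like this. It is untested against a real Canonistack. The function names are hypothetical, and it assumes that euca-describe-instances prints one line per machine beginning with INSTANCE, with the machine state appearing as one of the whitespace-separated fields; the exact state names are also an assumption.

```python
import os

# Assumption: these state names mark a machine as unhealthy.
UNHEALTHY_STATES = frozenset(["stopped", "error"])


def using_canonistack(environ=None):
    """Return True if JUJU_INSTANCE_IP is set to a non-empty value."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("JUJU_INSTANCE_IP"))


def unhealthy_instances(describe_output):
    """Return the INSTANCE lines that mention an unhealthy state.

    describe_output is the text printed by euca-describe-instances.
    Every field after the INSTANCE marker is checked, so the sketch does
    not depend on the state being in a particular column.
    """
    bad = []
    for line in describe_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "INSTANCE":
            continue
        if UNHEALTHY_STATES.intersection(fields[1:]):
            bad.append(line)
    return bad
```

A caller would run euca-describe-instances only when using_canonistack() is true, pass its output to unhealthy_instances, and abort the waiting juju command if the returned list is not empty.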

I'm not pursuing this now because we have more important issues to address.