Canonistack fails causing continuous integration tests to hang

Bug #1161890 reported by Gary Poster
This bug affects 1 person
Affects    Status    Importance   Assigned to   Milestone
juju-gui   Invalid   Low          Unassigned

Bug Description

Sometimes when Canonistack fails, Juju or another part of the infrastructure notices, our CI tests fail, and we can automatically retry.

Other times we aren't that lucky: Juju doesn't notice the Canonistack error, and our tests will hang indefinitely.

This happens most often in lib/deploy_charm_for_testing.py.

If we could periodically check whether our calls to juju bootstrap, juju deploy, or juju status are still interacting with a healthy Canonistack, that would be convenient.

The Python 2.x subprocess module has no concept of a timeout, so this is somewhat difficult to implement. Two reasonable solutions are to use a signal (http://stackoverflow.com/questions/1191374/subprocess-with-timeout/1191537#1191537) or to use the subprocess32 backport, which does support timeouts. The file attached to this bug is an untested sketch that uses subprocess32.
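For illustration only (this is not the attached file), a minimal sketch of the timeout approach might look like the following. It uses the standard library subprocess module from Python 3, whose timeout support subprocess32 backports; on Python 2 the assumption is that "import subprocess32 as subprocess" would be a drop-in replacement. The name run_with_timeout is hypothetical.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_seconds):
    """Run a command, giving up after timeout_seconds.

    Returns (returncode, output). If the command times out, the child
    is killed and returncode is None; output holds whatever was
    captured before the kill.
    """
    process = subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        output, _ = process.communicate(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # The child is not killed automatically; do it ourselves and
        # then collect any remaining output so no zombie is left.
        process.kill()
        output, _ = process.communicate()
        return None, output
    return process.returncode, output
```

A caller could then treat a None return code from, say, run_with_timeout(["juju", "status"], 15 * 60) as the cue to check Canonistack's health or abort the test run.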

Once we had this, we could change juju_command in deploy_charm_for_testing to verify that no machines are in a stopped or error state according to euca-describe-instances, and that we have not been waiting longer than 10 or 15 minutes for a given command to finish.
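The two checks described above could be sketched roughly as follows. This is an untested outline under stated assumptions: it assumes euca-describe-instances prints one whitespace-separated line per machine starting with the word INSTANCE and followed by the instance id, and it scans the remaining fields for a bad state rather than relying on a fixed column. The helper names and the 15-minute default are hypothetical.

```python
import time

# Assumption: these are the state names to treat as unhealthy,
# taken from the wording of this bug report.
BAD_STATES = frozenset(["stopped", "error"])


def find_unhealthy_instances(describe_output):
    """Return (instance_id, state) pairs for machines in a bad state.

    describe_output is the text printed by euca-describe-instances.
    Only lines whose first field is INSTANCE are considered.
    """
    unhealthy = []
    for line in describe_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "INSTANCE":
            continue
        for field in fields[2:]:
            if field in BAD_STATES:
                unhealthy.append((fields[1], field))
                break
    return unhealthy


def waited_too_long(started_at, max_seconds=15 * 60, now=None):
    """Return True if more than max_seconds have passed since started_at."""
    if now is None:
        now = time.time()
    return now - started_at > max_seconds
```

Keeping the parsing separate from the command invocation means the checks can be exercised with canned text, without a live Canonistack.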

Note that this behavior should only be in place when we are using Canonistack, which the deploy_charm_for_testing script detects by checking whether os.environ.get("JUJU_INSTANCE_IP") is set. Otherwise (such as on ec2), we should not use euca to check status.
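A small guard along these lines could gate the euca checks. The function name is hypothetical, and treating an empty JUJU_INSTANCE_IP as "not Canonistack" is an assumption, not something the script is known to do.

```python
import os


def using_canonistack(environ=None):
    """Return True if the environment looks like a Canonistack run.

    Mirrors the JUJU_INSTANCE_IP check described above; an unset or
    empty value is treated as "not Canonistack" (an assumption).
    """
    if environ is None:
        environ = os.environ
    return bool(environ.get("JUJU_INSTANCE_IP"))
```

Passing the environment as an argument keeps the check easy to test without modifying os.environ.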

I'm not pursuing this now because we have more important issues to address.

Gary Poster (gary) wrote :
Changed in juju-gui:
status: Triaged → Invalid