HA tests timeout

Bug #1351030 reported by Curtis Hovey
This bug affects 1 person
Affects     Status         Importance   Assigned to           Milestone
juju-core   Fix Released   Critical     Menno Finlay-Smits

Bug Description

As of commit 62e172632c3e9d8496805ed5223f9f4acc28986a, the HA tests time out.
    http://juju-ci.vapour.ws:8080/job/functional-ha-recovery/560/console
    http://juju-ci.vapour.ws:8080/job/functional-ha-backup-restore/455/console

The second test, functional-ha-backup-restore, is an elaboration of the first, functional-ha-recovery. As the devel branch has several recent commits, I cannot say this commit is the root cause, but given that several other regressions may be related to this revision's changes to journalling, it is a good candidate.

Tags: ci regression
Revision history for this message
Curtis Hovey (sinzui) wrote :
Changed in juju-core:
assignee: nobody → Horacio Durán (hduran-8)
Revision history for this message
Horacio Durán (hduran-8) wrote :

As per sinzui, this might have been fixed by commit https://github.com/juju/juju/commit/62e172632c3e9d8496805ed5223f9f4acc28986a, but I could not get past a race condition with upgrade while trying to fix it.
I would end up getting:
2014-08-04 22:27:42 ERROR juju.cmd supercommand.go:323 upgrade in progress - Juju functionality is limited
Running from an Amazon machine.

Changed in juju-core:
assignee: Horacio Durán (hduran-8) → nobody
Changed in juju-core:
assignee: nobody → Horacio Durán (hduran-8)
Revision history for this message
Horacio Durán (hduran-8) wrote :

The upgrade-in-progress issue was a red herring; the CI test was just running too fast (or the upgrade too slow). I changed the CI test in my local version to wait between bootstrap and the rest of the test, and that let me get through. I am finding a lot of Amazon-related errors, though.
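
A minimal sketch of that kind of wait (assuming a plain subprocess poll against `juju status`; the helper name and timeouts are illustrative, not the actual change made to the CI test):

    import subprocess
    import time

    def wait_for_upgrade_to_finish(env, timeout=1800, poll_interval=30):
        # Illustrative helper: keep retrying `juju status` until it stops
        # failing with the "upgrade in progress" error, or the timeout expires.
        deadline = time.time() + timeout
        while time.time() < deadline:
            proc = subprocess.Popen(
                ['juju', 'status', '-e', env],
                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                universal_newlines=True)
            out, err = proc.communicate()
            if proc.returncode == 0:
                return
            if 'upgrade in progress' not in err:
                # Some other failure; surface it instead of retrying forever.
                raise RuntimeError(err.strip())
            time.sleep(poll_interval)
        raise RuntimeError('timed out waiting for the upgrade to finish')

Calling something like wait_for_upgrade_to_finish('perritoec2') right after bootstrap would sit between bootstrap and the rest of the test.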

Revision history for this message
Horacio Durán (hduran-8) wrote :

OK, I reverted to an old version from before the suspicious merge; I used 8275fa461e1644ac073db177c5e456db24991a0c.
I find that the error that seems to be causing the failures is still present; below is the relevant part, from which I was not able to draw any useful conclusion.

As a side note, the test seems to run faster after reverting, but since the error is present both before and after, I believe the timeout on the CI test might just be masking something else.

To trigger this I ran the test as follows:

bzr branch lp:juju-ci-tools/repository
bzr branch lp:juju-ci-tools
export JUJU_REPOSITORY=<path to>/repository
cd juju-ci-tools

then changed jujupy.py:

------------------------------------------------------------------------------------------------------------------------------------
=== modified file 'jujupy.py'
--- jujupy.py 2014-06-25 21:35:55 +0000
+++ jujupy.py 2014-07-24 15:29:21 +0000
@@ -69,7 +69,7 @@
     def __init__(self, version, full_path):
         self.version = version
         self.full_path = full_path
-        self.debug = False
+        self.debug = True

     @classmethod
     def get_version(cls):

------------------------------------------------------------------------------------------------------------------------------------

then ran ./test_recovery.py --ha --charm-prefix=local:precise/ $GOPATH/bin/ <your env>

For this you need to publish your tool set to have a custom stream. You can create that with the attached script; after running it, the script will provide instructions to change your environments file accordingly.
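
The attached script itself is not reproduced here. Purely as a hypothetical illustration of the idea (assuming the juju 1.x "metadata" plugin and the tools-metadata-url setting in environments.yaml; the real script may work differently), publishing a local tools directory could look roughly like this:

    import os
    import subprocess

    # Hypothetical sketch only -- the actual attached script may differ.
    # Generate simplestreams metadata for locally built juju tools, assuming
    # the tools tarballs have already been copied under the stream directory.
    stream_dir = os.path.expanduser('~/custom-tools-stream')
    subprocess.check_call(['juju', 'metadata', 'generate-tools', '-d', stream_dir])

    # The directory then needs to be served (HTTP or file://) and the
    # environment pointed at the published tools location via the
    # tools-metadata-url key in environments.yaml.
    print('publish %s and point tools-metadata-url at it' % stream_dir)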

With all this, the relevant part of the output is below:

Waiting for port to close on ec2-54-190-51-172.us-west-2.compute.amazonaws.com
Closed.
!!! 2014-08-05 22:30:42 INFO juju.cmd supercommand.go:37 running juju [1.21-alpha1-trusty-amd64 gc]
2014-08-05 22:30:42 DEBUG juju.api api.go:151 trying cached API connection settings
2014-08-05 22:30:42 INFO juju.api api.go:234 connecting to API addresses: [ec2-54-190-83-191.us-west-2.compute.amazonaws.com:17070 ip-10-81-2-210.us-west-2.compute.internal:17070 10.81.2.210:17070 54.190.83.191:17070 ec2-54-190-51-172.us-west-2.compute.amazonaws.com:17070 ip-10-238-43-121.us-west-2.compute.internal:17070 10.238.43.121:17070 54.190.51.172:17070 ec2-54-188-205-191.us-west-2.compute.amazonaws.com:17070 ip-10-36-71-92.us-west-2.compute.internal:17070 10.36.71.92:17070 54.188.205.191:17070]
2014-08-05 22:30:42 INFO juju.state.api apiclient.go:252 dialing "wss://ec2-54-190-83-191.us-west-2.compute.amazonaws.com:17070/environment/be46764e-be85-4c76-8d5d-d28c922b2d50/api"
2014-08-05 22:30:42 INFO juju.state.api apiclient.go:175 connection established to "wss://ec2-54-190-83-191.us-west-2.compute.amazonaws.com:17070/environment/be46764e-be85-4c76-8d5d-d28c922b2d50/api"
2014-08-05 22:30:44 INFO juju.provider.ec2 ec2.go:202 opening environment "perritoec2"
2014-08-05 22:30:44 DEBUG juju.environs utils.go:85 StateServerInstances returned: [i-8a384581]
2014-08-05 22:30:49 DEBUG juju.environs utils.go:67 error getting state instances: no instances found
2014-08-05 22:34:59 ERROR juju.cmd supercommand.go:323 connection is shut down

Changed in juju-core:
assignee: Horacio Durán (hduran-8) → Menno Smits (menno.smits)
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

Likely caused by 62e172632c3e9d8496805ed5223f9f4acc28986a.

Some discussions in progress over email.

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

Problematic revision reverted in https://github.com/juju/juju/pull/472

The HA CI tests now pass when run manually, where they didn't before.

Changed in juju-core:
status: Triaged → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released