HA and backup recovery tests failed

Bug #1450573 reported by Curtis Hovey
Affects: juju-core
Status: Triaged
Importance: Critical
Assigned to: Unassigned

Bug Description

Build: #2581 Revision: gitbranch:1.24:github.com/juju/juju fffe3e4f Version: 1.24-alpha1
     http://reports.vapour.ws/releases/2581

Failed tests
aws-deployer-bundle build #267 http://reports.vapour.ws/releases/2581/job/aws-deployer-bundle/attempt/267
functional-backup-restore build #2501 http://reports.vapour.ws/releases/2581/job/functional-backup-restore/attempt/2501
functional-ha-backup-restore build #1767 http://reports.vapour.ws/releases/2581/job/functional-ha-backup-restore/attempt/1767
functional-ha-recovery build #1690 http://reports.vapour.ws/releases/2581/job/functional-ha-recovery/attempt/1690
functional-restricted-network build #1455 http://reports.vapour.ws/releases/2581/job/functional-restricted-network/attempt/1455

The functional-restricted-network and aws-deployer-bundle tests do not assess recovery, but all of these jobs ran in AWS, and they involve interesting networking.

Ian Booth (wallyworld) wrote :

The deployer bundle test logs show:

    containers:
      2/lxc/0:
        agent-state-info: 'failed to retrieve the template to clone: template container
          "juju-trusty-lxc-template" did not stop'
        instance-id: pending
        series: trusty
      2/lxc/1:
        agent-state-info: 'lxc container cloning failed: cannot clone a running container'
        instance-id: pending
        series: trusty

This is already reported as bug 1441319 (it may be the same issue). When this has happened in the past, logging onto the node afterwards shows that the template containers did eventually stop, but not before Juju gave up waiting. Juju waits 5 minutes, but if the host node is I/O bound or has other similar issues, the container takes too long to run cloud-init and then shut down.
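
The pattern at issue is a bounded poll: clone only after the template container stops, giving up after a fixed window. Below is a minimal Go sketch of that pattern; the names (containerStopped, waitForStop) and the short demo timeout are illustrative assumptions, not Juju's actual implementation, which waits the full 5 minutes.

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // containerStopped is a hypothetical probe; real code would query LXC.
    // In this sketch it always reports the container as still running.
    func containerStopped(name string) bool {
        return false
    }

    // waitForStop polls until the template container stops or the deadline passes.
    func waitForStop(name string, timeout time.Duration) error {
        deadline := time.Now().Add(timeout)
        for time.Now().Before(deadline) {
            if containerStopped(name) {
                return nil
            }
            time.Sleep(2 * time.Second) // poll interval
        }
        return errors.New("template container " + name + " did not stop")
    }

    func main() {
        // A short timeout for demonstration; the comment above describes a
        // 5-minute cap in Juju itself.
        if err := waitForStop("juju-trusty-lxc-template", 10*time.Second); err != nil {
            fmt.Println(err)
        }
    }

On an I/O-bound host, running cloud-init and shutting down inside the template container can exceed any fixed window, which is exactly the failure the status output above reports.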

Ian Booth (wallyworld) wrote :

The stock functional-backup-restore test seems to show that the units were still installing the charm and that the test got impatient and gave up; i.e. Juju still seemed to be functioning OK, but was slow. This, coupled with the deployer failure due to slowness, suggests the system was experiencing performance issues when the test ran.
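
For context, a functional test in this situation is typically a polling loop with its own deadline: if the provider is slow, the harness times out even though the deployment would eventually succeed. A rough Go sketch of that shape follows, assuming the juju CLI is on PATH; the 20-minute deadline, the helper name, and the crude string matching (where a real harness would parse the status YAML) are all invented for illustration.

    package main

    import (
        "fmt"
        "os/exec"
        "strings"
        "time"
    )

    // unitsSettled shells out to `juju status` and reports whether any unit
    // still appears to be installing or pending.
    func unitsSettled() (bool, error) {
        out, err := exec.Command("juju", "status", "--format", "yaml").Output()
        if err != nil {
            return false, err
        }
        s := string(out)
        return !strings.Contains(s, "installing") && !strings.Contains(s, "pending"), nil
    }

    func main() {
        deadline := time.Now().Add(20 * time.Minute) // the harness's patience
        for time.Now().Before(deadline) {
            settled, err := unitsSettled()
            if err == nil && settled {
                fmt.Println("all units settled")
                return
            }
            time.Sleep(30 * time.Second)
        }
        // The failure mode described above: the deployment may be healthy
        // but slow, and the test gives up first.
        fmt.Println("timed out waiting for units")
    }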

Horacio Durán (hduran-8) wrote :

I agree with Ian: the new status indicates it is still doing something (although I was under the impression that cs:ubuntu was a shallow charm), so it seems to me it is a matter of slowness of the system.
All the machine logs and the unit's log say everything went OK too.
The unit log, though, seems to imply that the charm is indeed installed properly, so I wonder whether something is broken with the charm itself.

Ian Booth (wallyworld) wrote :

I think this is actually a duplicate of bug 1450917.
Some logic to transition the reported workload state out of "installing" wasn't being run, because missing hooks were not causing the correct code path to be executed.
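
To make the suspected code path concrete, here is a hedged Go sketch of that bug shape: if the transition out of "installing" only fires when a hook actually ran, a charm that ships no hook for the event stays stuck. All names here (WorkloadState, runHook) are illustrative, not Juju's real types.

    package main

    import "fmt"

    type WorkloadState string

    const (
        Installing WorkloadState = "installing"
        Active     WorkloadState = "active"
    )

    // runHook invokes the named hook if the charm provides one and reports
    // whether it actually ran.
    func runHook(hooks map[string]func(), name string) bool {
        if h, ok := hooks[name]; ok {
            h()
            return true
        }
        return false
    }

    func main() {
        charmHooks := map[string]func(){} // charm ships no hook for this event

        // Buggy pattern: the transition only fires when the hook actually
        // ran, so a charm with a missing hook reports "installing" forever.
        state := Installing
        if runHook(charmHooks, "install") {
            state = Active
        }
        fmt.Println("buggy:", state)

        // Fixed pattern: a missing hook is treated as a no-op success and
        // the transition runs unconditionally afterwards.
        state = Installing
        runHook(charmHooks, "install")
        state = Active
        fmt.Println("fixed:", state)
    }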
