HA and backup recovery tests failed

Bug #1450573 reported by Curtis Hovey
Affects: juju-core
Status: Triaged
Importance: Critical
Assigned to: Unassigned

Bug Description

Build: #2581 Revision: gitbranch:1.24:github.com/juju/juju fffe3e4f Version: 1.24-alpha1
     http://reports.vapour.ws/releases/2581

Failed tests
aws-deployer-bundle build #267 http://reports.vapour.ws/releases/2581/job/aws-deployer-bundle/attempt/267
functional-backup-restore build #2501 http://reports.vapour.ws/releases/2581/job/functional-backup-restore/attempt/2501
functional-ha-backup-restore build #1767 http://reports.vapour.ws/releases/2581/job/functional-ha-backup-restore/attempt/1767
functional-ha-recovery build #1690 http://reports.vapour.ws/releases/2581/job/functional-ha-recovery/attempt/1690
functional-restricted-network build #1455 http://reports.vapour.ws/releases/2581/job/functional-restricted-network/attempt/1455

The functional-restricted-network and aws-deployer-bundle tests do not assess recovery, but all of these jobs ran in AWS, and they involve interesting networking.

Ian Booth (wallyworld) wrote :

The deployer bundle test logs show:

    containers:
      2/lxc/0:
        agent-state-info: 'failed to retrieve the template to clone: template container
          "juju-trusty-lxc-template" did not stop'
        instance-id: pending
        series: trusty
      2/lxc/1:
        agent-state-info: 'lxc container cloning failed: cannot clone a running container'
        instance-id: pending
        series: trusty

This is already reported as bug 1441319 (it may be the same issue). When this has happened in the past, logging onto the node afterwards shows that the template containers did eventually stop, but not before Juju gave up waiting. Juju waits 5 minutes, but if the host node is I/O bound or has other similar issues, the container takes too long to run cloud-init and then shut down.
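
The pattern at issue is a bounded poll: clone only after the template container stops, giving up after a fixed window. Below is a minimal Go sketch of that pattern; the names (containerStopped, waitForStop) and the short demo timeout are illustrative assumptions, not Juju's actual implementation, which waits the full 5 minutes.

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // containerStopped is a hypothetical probe; real code would query LXC.
    // In this sketch it always reports the container as still running.
    func containerStopped(name string) bool {
        return false
    }

    // waitForStop polls until the template container stops or the deadline passes.
    func waitForStop(name string, timeout time.Duration) error {
        deadline := time.Now().Add(timeout)
        for time.Now().Before(deadline) {
            if containerStopped(name) {
                return nil
            }
            time.Sleep(2 * time.Second) // poll interval
        }
        return errors.New("template container " + name + " did not stop")
    }

    func main() {
        // A short timeout for demonstration; the comment above describes a
        // 5-minute cap in Juju itself.
        if err := waitForStop("juju-trusty-lxc-template", 10*time.Second); err != nil {
            fmt.Println(err)
        }
    }

On an I/O-bound host, running cloud-init and shutting down inside the template container can exceed any fixed window, which is exactly the failure the status output above reports.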

Ian Booth (wallyworld) wrote :

The stock functional-backup-restore test seems to show that the units were still installing the charm and that the test got impatient and gave up; i.e. Juju still seemed to be functioning OK, but was slow. This, coupled with the deployer failure due to slowness, suggests the system was experiencing performance issues when the test ran.
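
For context, a functional test in this situation is typically a polling loop with its own deadline: if the provider is slow, the harness times out even though the deployment would eventually succeed. A rough Go sketch of that shape follows, assuming the juju CLI is on PATH; the 20-minute deadline, the helper name, and the crude string matching (where a real harness would parse the status YAML) are all invented for illustration.

    package main

    import (
        "fmt"
        "os/exec"
        "strings"
        "time"
    )

    // unitsSettled shells out to `juju status` and reports whether any unit
    // still appears to be installing or pending.
    func unitsSettled() (bool, error) {
        out, err := exec.Command("juju", "status", "--format", "yaml").Output()
        if err != nil {
            return false, err
        }
        s := string(out)
        return !strings.Contains(s, "installing") && !strings.Contains(s, "pending"), nil
    }

    func main() {
        deadline := time.Now().Add(20 * time.Minute) // the harness's patience
        for time.Now().Before(deadline) {
            settled, err := unitsSettled()
            if err == nil && settled {
                fmt.Println("all units settled")
                return
            }
            time.Sleep(30 * time.Second)
        }
        // The failure mode described above: the deployment may be healthy
        // but slow, and the test gives up first.
        fmt.Println("timed out waiting for units")
    }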

Horacio Durán (hduran-8) wrote :

I agree with Ian: the new status indicates it is still doing something (although I was under the impression that cs:ubuntu was a shallow charm), so it seems to me it is a matter of slowness of the system.
All the machine logs and the unit's log say everything went OK too.
The unit log, though, seems to imply that the charm is indeed installed properly, so I wonder whether something is broken with the charm itself.

Ian Booth (wallyworld) wrote :

I think this is actually a duplicate of bug 1450917.
Some logic to transition the reported workload state out of "installing" wasn't being run, because missing hooks were not causing the correct code path to be executed.
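
To make the suspected code path concrete, here is a hedged Go sketch of that bug shape: if the transition out of "installing" only fires when a hook actually ran, a charm that ships no hook for the event stays stuck. All names here (WorkloadState, runHook) are illustrative, not Juju's real types.

    package main

    import "fmt"

    type WorkloadState string

    const (
        Installing WorkloadState = "installing"
        Active     WorkloadState = "active"
    )

    // runHook invokes the named hook if the charm provides one and reports
    // whether it actually ran.
    func runHook(hooks map[string]func(), name string) bool {
        if h, ok := hooks[name]; ok {
            h()
            return true
        }
        return false
    }

    func main() {
        charmHooks := map[string]func(){} // charm ships no hook for this event

        // Buggy pattern: the transition only fires when the hook actually
        // ran, so a charm with a missing hook reports "installing" forever.
        state := Installing
        if runHook(charmHooks, "install") {
            state = Active
        }
        fmt.Println("buggy:", state)

        // Fixed pattern: a missing hook is treated as a no-op success and
        // the transition runs unconditionally afterwards.
        state = Installing
        runHook(charmHooks, "install")
        state = Active
        fmt.Println("fixed:", state)
    }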
