deployer and quickstart are broken in 1.24-alpha1
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | Fix Released | Critical | Horacio Durán | 1.25.0 |
| juju-core 1.24 series | | Critical | Horacio Durán | |
Bug Description
There appear to be changes to the API, or a decline in its reliability, that broke deployer and later quickstart on AWS, HP, and Joyent. MAAS continued to work until recently.
Deployer broke on or before commit e374bae, as seen in the bundle tests section in:
http://
The deployer stack appears to be up, but I think the relation between the django units and haproxy is missing.
Quickstart continued to work until commit 0f79f48, as seen in the bundle tests section in:
http://
Quickstart isn't informative because it swallows the output of what it is doing.
The last successful run of both deployer and quickstart was
http://
which looks like this
http://
I believe CI's tests also changed, so the issue might be the test script exiting prematurely.
| Changed in juju-ci-tools: | |
| status: | New → Triaged |
| importance: | Undecided → High |
| Curtis Hovey (sinzui) wrote : | #1 |
| Curtis Hovey (sinzui) wrote : | #2 |
We have retested 1.24 and 1.23, and we can see that 1.23 works with deployer and quickstart on MAAS, HP, AWS, and Joyent, but none of these work for 1.24. This looks like a regression in 1.24.
The MAAS 1.7 deployer test is the most informative because it has the highest timeout set. We see deployer gave up:
2015-04-09 14:19:10 [DEBUG] deployer.env: Delta unit: landscape/0 change:executing
2015-04-09 14:19:20 [DEBUG] deployer.env: Delta unit: landscape/0 change:idle
2015-04-09 15:02:43 [DEBUG] deployer.env: Connecting to environment...
2015-04-09 15:02:44 [DEBUG] deployer.env: Connected to environment
2015-04-09 15:02:44 [ERROR] deployer.import: Reached deployment timeout.. exiting
2015-04-09 15:02:44 [INFO] deployer.cli: Deployment stopped. run time: 3041.44
We expect to see messages like this after connecting (a sketch of the wait loop deployer runs follows these logs):
2015-04-09 17:52:46 [INFO] deployer.import: Adding relations...
2015-04-09 17:52:46 [INFO] deployer.import: Adding relation landscape <-> rabbitmq-server
2015-04-09 17:52:46 [INFO] deployer.import: Adding relation landscape <-> haproxy
2015-04-09 17:52:47 [INFO] deployer.import: Adding relation landscape:
2015-04-09 17:52:47 [INFO] deployer.import: Adding relation landscape:db-admin <-> postgresql:db-admin
2015-04-09 17:52:47 [INFO] deployer.import: Adding relation haproxy:website <-> apache2:
2015-04-09 17:52:48 [INFO] deployer.import: Adding relation landscape-msg <-> rabbitmq-server
2015-04-09 17:52:48 [INFO] deployer.import: Adding relation landscape-msg <-> haproxy
2015-04-09 17:52:48 [INFO] deployer.import: Adding relation landscape-
2015-04-09 17:52:49 [DEBUG] deployer.import: Waiting for relation convergence 60s
2015-04-09 17:53:53 [INFO] deployer.import: Exposing service 'apache2'
2015-04-09 17:53:53 [INFO] deployer.cli: Deployment complete in 500.56 seconds
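What deployer is doing here, in essence, is consuming unit-status deltas from the API and blocking until every unit reaches the state it expects, giving up at a deadline. A minimal sketch of that wait loop, using a hypothetical Delta type and status names rather than juju's actual megawatcher API:

```go
// A minimal sketch of deployer's wait loop, assuming a hypothetical
// Delta event type; juju's real megawatcher API differs.
package main

import (
	"fmt"
	"time"
)

// Delta is a hypothetical unit-status change event, like the
// "Delta unit: landscape/0 change:idle" lines in the log above.
type Delta struct {
	Unit   string
	Status string
}

// waitForUnits blocks until every unit in pending has reported the
// expected status, or fails at the deadline, which is the
// "Reached deployment timeout.. exiting" path seen in the failed runs.
func waitForUnits(deltas <-chan Delta, pending map[string]bool, expected string, timeout time.Duration) error {
	deadline := time.After(timeout)
	for len(pending) > 0 {
		select {
		case d := <-deltas:
			if pending[d.Unit] && d.Status == expected {
				delete(pending, d.Unit)
			}
		case <-deadline:
			return fmt.Errorf("reached deployment timeout after %s", timeout)
		}
	}
	return nil
}

func main() {
	deltas := make(chan Delta, 2)
	deltas <- Delta{Unit: "landscape/0", Status: "started"}
	deltas <- Delta{Unit: "haproxy/0", Status: "started"}
	err := waitForUnits(deltas,
		map[string]bool{"landscape/0": true, "haproxy/0": true},
		"started", time.Second)
	fmt.Println("converged:", err == nil) // converged: true
}
```

If the watcher never reports the expected status, as in the failed runs above, the deadline branch fires and deployer exits without ever adding relations.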
| tags: | added: api deployer quickstart |
| Ian Booth (wallyworld) wrote : | #3 |
The issue appears to be related to a failure to start an lxc instance:
2/lxc/0:
series: trusty
2/lxc/1:
series: trusty
Probably related to, or a duplicate of, bug 1441319.
| Curtis Hovey (sinzui) wrote : | #4 |
The lxc container output looks like the deployer test of many app servers behind haproxy. The quickstart test uses the landscape scalable bundle, which doesn't use containers:
http://
| Curtis Hovey (sinzui) wrote : | #5 |
This might provide more information. Download
https:/
Then, with juju 1.24-alpha1, run:
juju --debug deployer --deploy-delay 10 --config landscape-
| Ian Booth (wallyworld) wrote : | #6 |
I tried the deployer on AWS. Got a deployer timeout. Tried juju debug-log, but that appears broken also:
$ juju debug-log -n 1000
ERROR cannot open log file: open /var/log/
NB I had to revert to an earlier revision of 1.24 to avoid the ssh bug preventing bootstrap.
The status below shows the landscape charms are stuck installing. SSHing into machine 4 and looking at the unit log shows the unit agent restarting due to rsyslog connection errors:
2015-04-23 04:27:08 INFO juju.worker runner.go:261 start "rsyslog"
2015-04-23 04:27:08 DEBUG juju.worker.rsyslog worker.go:93 starting rsyslog worker mode 1 for "unit-landscape
2015-04-23 04:27:08 DEBUG juju.worker.rsyslog worker.go:190 making syslog connection for "juju-unit-
2015-04-23 04:27:08 ERROR juju.worker runner.go:219 exited "rsyslog": dial tcp 10.236.
2015-04-23 04:27:08 INFO juju.worker runner.go:253 restarting "rsyslog" in 3s
So it appears recent changes to logging may be:
1. breaking debug-log
2. stopping some unit agents from starting (the restart loop is sketched below)
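The log above shows the standard worker-runner pattern: a worker that exits with an error is started again after a fixed delay ("restarting \"rsyslog\" in 3s"). A rough sketch of that behaviour, with an illustrative function rather than juju-core's actual worker.Runner API:

```go
// A rough sketch of the restart behaviour in the unit log above;
// runWithRestarts is illustrative, not juju-core's worker.Runner.
package main

import (
	"errors"
	"fmt"
	"time"
)

// runWithRestarts keeps restarting work after each failure, pausing
// delay between attempts (3s in the real log), and gives up after
// maxAttempts so this demo terminates.
func runWithRestarts(name string, work func() error, delay time.Duration, maxAttempts int) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err := work()
		if err == nil {
			fmt.Printf("%q finished cleanly\n", name)
			return
		}
		fmt.Printf("exited %q: %v; restarting in %s\n", name, err, delay)
		time.Sleep(delay)
	}
	fmt.Printf("giving up on %q\n", name)
}

func main() {
	// Simulate the rsyslog worker failing to dial the state server,
	// as in the "dial tcp 10.236..." errors above.
	dial := func() error { return errors.New("connection refused") }
	runWithRestarts("rsyslog", dial, 10*time.Millisecond, 3)
}
```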
juju --debug deployer --deploy-delay 10 --config ~/landscape-
2015-04-23 03:19:31 INFO juju.cmd supercommand.go:37 running juju [1.24-alpha1-
2015-04-23 13:19:31 Using deployment landscape-scalable
2015-04-23 13:19:31 Starting deployment of landscape-scalable
2015-04-23 13:19:48 Deploying services...
2015-04-23 13:19:51 Deploying service apache2 using cs:trusty/apache2-4
2015-04-23 13:20:14 Deploying service haproxy using cs:trusty/haproxy-1
2015-04-23 13:20:36 Deploying service landscape using cs:trusty/
2015-04-23 13:21:02 Deploying service landscape-msg using cs:trusty/
2015-04-23 13:21:26 Deploying service postgresql using cs:trusty/
2015-04-23 13:21:53 Deploying service rabbitmq-server using cs:trusty/
2015-04-23 14:16:44 Reached deployment timeout.. exiting
2015-04-23 14:16:44 Deployment stopped. run time: 3433.34
2015-04-23 04:16:44 ERROR juju.cmd supercommand.go:430 subprocess encountered error code 1
$ juju status
[Machines]
ID STATE VERSION DNS INS-ID SERIES HARDWARE
0 started 1.24-alpha1.1 54.82.28.180 i-d741242a trusty arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M availability-
1 started 1.24-alpha1.1 54.158.244.52 i-da1a3626 trusty arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M availability-
2 started 1.24-alpha1.1 54.91.176.112 i-3b4ba4ed trusty arch=amd64 cpu-cores=1 cpu-power=100 mem=1740M root-disk=8192M availability-
3 started 1.24-alpha1.1 54.90.7.109 i-1ceddc33 trusty arch=amd64 cpu-cores=1 cpu-power=300 mem=3840M root-disk=8192M availability-
4 started 1.24-alpha1.1 54.146.74.189 i-03eddc2c trusty arch=amd64 cpu-cores=1 cpu-power=300 mem=3840M root-disk=8192M availability-
5 started 1.24-alpha1.1 54.145.18.41 i-8d4b2e70 trust...
| tags: | added: blocker |
| Changed in juju-core: | |
| importance: | High → Critical |
| Changed in juju-core: | |
| milestone: | 1.24-alpha1 → 1.25.0 |
| importance: | Critical → High |
| Ian Booth (wallyworld) wrote : | #7 |
In comment 6 https:/
| Changed in juju-core: | |
| importance: | High → Critical |
| Horacio Durán (hduran-8) wrote : | #8 |
This was a regression introduced by changes in the multiwatcher; I am working on a patch to solve it.
| Changed in juju-core: | |
| assignee: | nobody → Horacio Durán (hduran-8) |
| Ian Booth (wallyworld) wrote : | #9 |
You found the cause, awesome. What was the regression?
| Horacio Durán (hduran-8) wrote : | #10 |
The legacy status for units in the megawatcher was not being set properly; it did not follow the correct rules, resulting in very odd statuses (different from the ones in the status output) that never reached the states deployer expects.
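For context, 1.24 splits unit status into an agent status ("executing", "idle", ...) and a workload status, while older clients such as deployer still watch the single legacy status and wait for "started" before adding relations. An illustrative sketch of the kind of mapping the megawatcher has to apply; the rules below are simplified assumptions, not the actual juju-core translation:

```go
// Illustrative sketch of legacy-status translation; the real
// juju-core rules are more involved. If the megawatcher forwards the
// new split statuses unmapped, deployer never sees "started" and
// times out, which matches the failures above.
package main

import "fmt"

// legacyUnitStatus maps the split statuses back to the single legacy
// status an old client expects. Simplified, hypothetical rules.
func legacyUnitStatus(agent, workload string) string {
	switch {
	case agent == "error" || workload == "error":
		return "error"
	case agent == "idle" || agent == "executing":
		// The agent is up and running hooks: legacy "started".
		return "started"
	default:
		return "pending"
	}
}

func main() {
	fmt.Println(legacyUnitStatus("executing", "maintenance")) // started
	fmt.Println(legacyUnitStatus("allocating", "waiting"))    // pending
}
```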
| Horacio Durán (hduran-8) wrote : | #11 |
I just proposed the fix:
http://
http://
| Changed in juju-core: | |
| assignee: | Horacio Durán (hduran-8) → nobody |
| status: | Triaged → In Progress |
| Changed in juju-core: | |
| status: | In Progress → Fix Committed |
| assignee: | nobody → Horacio Durán (hduran-8) |
| no longer affects: | juju-ci-tools |
| Changed in juju-core: | |
| status: | Fix Committed → Fix Released |


We can see in the last test of 1.23-beta3 that everything passed:
http://reports.vapour.ws/releases/2521
Then we tested 1.23-beta4, and HP failed because some instances were left behind from a previous test:
http://reports.vapour.ws/releases/2522
http://reports.vapour.ws/releases/2523
But when we tested master before and after the 1.23 version, we saw total failure. We retested some of the substrates several times:
http://reports.vapour.ws/releases/2520
http://reports.vapour.ws/releases/2525
So we know we can test 1.22 and 1.23 and expect all to pass, though one run might need retesting because the substrate was dirty.