bundle deployments appear broken under 2.5.2 models

Bug #1821418 reported by Drew Freiberger on 2019-03-22
This bug affects 3 people
Affects: juju
Importance: Critical
Assigned to: Heather Lanigan

Bug Description

When running juju deploy ./bundle.yaml --dry-run, I'm receiving the error:

ERROR cannot deploy bundle: connection is shut down

Output from a juju deploy --debug and the server-side logs here:

https://pastebin.ubuntu.com/p/XFjY8SJCgG/

I ruled out things like MTU/networking issues. It's happening on two discrete sites in different datacenters. One was upgraded from 2.4.1 to 2.5.2, and the other was upgraded from 2.4.4 to 2.5.2 in the past week.

Seyeong Kim (xtrusia) wrote :

I think the log timestamps don't match between server and client.

There is a 3-minute difference.

Drew Freiberger (afreiberger) wrote :

Sorry for the side-track with the timestamps. These were example logs taken from two of hundreds of different attempts to run this with various debugging options. The times are in sync between client and server.

Drew Freiberger (afreiberger) wrote :

When I run this against a blank model, the bundle deployment continues onward. The step that is just after the timeout outputs:

DEBUG juju.cmd.juju.application bundle.go:273 model: &bundlechanges.Model{
    Applications: {
    },
    Machines: {
    },
    Relations: nil,
    ConstraintsEqual: func(string, string) bool {...},
    Sequence: {},
    sequence: {},
    MachineMap: {},
    logger: nil,
}

Then it continues on to resolve charm URLs as the next step.

Perhaps generation of the bundlechanges.Model is failing on the server, causing the connection to abort.

Drew Freiberger (afreiberger) wrote :

When I switch to the controller model which has only the 3 machines in it, it also presents "connection is shut down" even though it is a very simple model.

Drew Freiberger (afreiberger) wrote :

Adding field critical.

This issue is delaying several project deployments on multiple customer sites.

Changed in juju:
status: New → Incomplete
status: Incomplete → Triaged
importance: Undecided → Critical
assignee: nobody → Heather Lanigan (hmlanigan)
Drew Freiberger (afreiberger) wrote :

I'm able to reproduce this simply in a lab on LXD provider:

sudo snap refresh juju --channel 2.4/stable
juju bootstrap localhost lxd --config config.yaml
juju deploy ubuntu
juju export-bundle > ./bundle24.yaml
juju deploy --dry-run ./bundle24.yaml (result is "no changes to apply")
sudo snap refresh juju --channel 2.5/stable
juju deploy --dry-run ./bundle24.yaml (result is "no changes to apply")
juju upgrade-model -m controller; sleep 300
juju deploy --dry-run ./bundle24.yaml (result is "ERROR cannot deploy bundle: connection is shut down")
juju export-bundle > ./bundle25.yaml
juju deploy ./bundle25.yaml (result is "ERROR cannot deploy bundle: connection is shut down")

There is no diff between the bundle24.yaml and bundle25.yaml files, so it seems the error is triggered by the controller upgrade from 2.4 to 2.5.

config.yaml contents:

default-series: xenial
apt-http-proxy: http://10.0.8.1:8000

Drew Freiberger (afreiberger) wrote :

reproducer bundle:

series: xenial
applications:
  ubuntu:
    charm: cs:ubuntu-12
    num_units: 1
    to:
    - "0"
machines:
  "0": {}

Changed in juju:
status: Triaged → In Progress
Heather Lanigan (hmlanigan) wrote :

Reproduced with info in comments #6 and #7.

The error is specific to models created before the controller was upgraded; new 2.5.2 models can deploy the bundle with no errors.

I did notice that if you go back to the 2.4.7 juju client, the deploy succeeds:

$ /snap/bin/juju version
2.5.2-cosmic-amd64
$ /snap/bin/juju status
Model   Controller     Cloud/Region         Version  SLA          Timestamp
ubuntu  1821418-2.4.7  localhost/localhost  2.5.2    unsupported  14:56:25-04:00

App     Version  Status  Scale  Charm   Store       Rev  OS      Notes
ubuntu  16.04    active  1      ubuntu  jujucharms  12   ubuntu

Unit       Workload  Agent  Machine  Public address  Ports  Message
ubuntu/0*  active    idle   0        10.121.191.12          ready

Machine  State    DNS            Inst id        Series  AZ  Message
0        started  10.121.191.12  juju-1f3ed3-0  xenial      Running

$ /snap/bin/juju deploy ./bundle25.yaml
ERROR cannot deploy bundle: connection is shut down
$ sudo snap refresh juju --channel 2.4/stable
juju (2.4/stable) 2.4.7 from Canonical✓ refreshed
$ /snap/bin/juju version
2.4.7-cosmic-amd64
$ /snap/bin/juju deploy ./bundle25.yaml
No changes to apply.
$

Heather Lanigan (hmlanigan) wrote :

The failure happens during this call:
annotations, err := apiRoot.GetAnnotations(annotationTags)

https://github.com/juju/juju/blob/2.5/cmd/juju/application/bundle.go#L1504
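
For context on the error text itself: "connection is shut down" is the wording an RPC client typically reports when its underlying connection is already gone, rather than a message specific to annotations. As a minimal sketch, Go's standard net/rpc package returns an error with exactly this text (rpc.ErrShutdown) for a call made on a closed client. Juju uses its own RPC package, so treating it as behaving the same way is an assumption here; the method name "Annotations.Get" and the helper callAfterClose are made up for illustration.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// callAfterClose makes an RPC call on a client whose connection has
// already been closed and returns the resulting error.
// This uses the standard library only; it is an analogy, not Juju code.
func callAfterClose() error {
	clientConn, serverConn := net.Pipe()
	defer serverConn.Close()

	client := rpc.NewClient(clientConn)
	client.Close() // simulate the connection having been shut down

	var reply string
	// "Annotations.Get" is a hypothetical method name; the call is
	// rejected locally and never reaches a server.
	return client.Call("Annotations.Get", "machine-0", &reply)
}

func main() {
	err := callAfterClose()
	fmt.Println(err == rpc.ErrShutdown, err)
}
```

In this sketch the client rejects the call locally, which is why the message says nothing about what originally caused the connection to close.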

James Troup (elmo) wrote :

Since a viable workaround is to downgrade your client, I'm downgrading this to field-high.

Tim Penhey (thumper) wrote :

Found the bug: it is a missing upgrade step for the machine modifications status document, along with a long-standing bug that had never been triggered before.

The missing document triggered an error status return over the API. The deserialization code couldn't handle it and the low level API RPC code just closes the connection.

I'm uploading a fix for the old status serialization bug, but we still need another branch to add an upgrade step for the necessary status documents to be added for machines on upgrade.
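
To illustrate the failure mode described above: a batch API response can carry a per-entity error alongside successful results, and a client that decodes it tolerantly can report that error instead of losing the whole connection. The Go sketch below is not Juju's code or the actual fix; the apiError and annotationResult types, the JSON layout, and the decodeResults helper are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError and annotationResult are hypothetical wire types used only
// for this illustration; they are not Juju's real API structs.
type apiError struct {
	Message string `json:"message"`
	Code    string `json:"code"`
}

type annotationResult struct {
	Entity      string            `json:"entity"`
	Annotations map[string]string `json:"annotations,omitempty"`
	Error       *apiError         `json:"error,omitempty"`
}

// decodeResults decodes a batch response. Per-entity errors are
// collected and returned to the caller; only a malformed payload is
// treated as a failure of the whole response.
func decodeResults(payload []byte) (map[string]map[string]string, []error, error) {
	var results []annotationResult
	if err := json.Unmarshal(payload, &results); err != nil {
		return nil, nil, fmt.Errorf("decoding response: %w", err)
	}
	ok := make(map[string]map[string]string)
	var entityErrs []error
	for _, r := range results {
		if r.Error != nil {
			entityErrs = append(entityErrs,
				fmt.Errorf("%s: %s (%s)", r.Entity, r.Error.Message, r.Error.Code))
			continue
		}
		ok[r.Entity] = r.Annotations
	}
	return ok, entityErrs, nil
}

func main() {
	// Made-up payload: one entity with an error, one with annotations.
	payload := []byte(`[
		{"entity": "machine-0", "error": {"message": "status not found", "code": "not found"}},
		{"entity": "application-ubuntu", "annotations": {"gui-x": "10"}}
	]`)
	ok, entityErrs, err := decodeResults(payload)
	fmt.Println(ok, entityErrs, err)
}
```

The design point is that an error attached to one entity stays a value the caller can inspect, so a missing document for one machine would surface as a readable message rather than as a dropped connection.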

Changed in juju:
status: In Progress → Fix Committed
Heather Lanigan (hmlanigan) wrote :

The fix was released in 2.5.4.

Changed in juju:
status: Fix Committed → Fix Released