`juju upgrade-juju --upload-tools` leaves local environment unusable
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | | High | Andrew Wilkins | |
| 1.24 | | Critical | Andrew Wilkins | |
Bug Description
I've been running the devel releases of juju 1.24. Each upgrade (beta1 -> beta2, beta2 -> beta3, and beta3 -> beta4) has left the local environment unusable.
My environment:
Trusty, running inside a Vagrant VM
Juju 1.24-beta3
provider: local
Steps to reproduce:
1) apt-get update && apt-get upgrade
2) verify new version with `juju version`
3) run `juju upgrade-juju --upload-tools`
Once the above steps are run, juju commands become non-responsive. The `juju status --debug` output shows a connection refused: https:/
I restarted `juju-agent-
The machine-0.log: https:/
I've been able to recreate this reliably with each beta upgrade. The only solution I've found is to `juju destroy-environment --force` and re-bootstrap.
| Changed in juju-core: | |
| milestone: | none → 1.24-beta5 |
| importance: | Undecided → High |
| Changed in juju-core: | |
| status: | New → Triaged |
| tags: | added: upgrade-juju vagrant |
| description: | updated |
| tags: | added: local-provider |
| Andrew Wilkins (axwalk) wrote : | #2 |
Unassigning myself for the minute, as the bug I was working on isn't actually fixed yet.
| Jesse Meek (waigani) wrote : | #3 |
This may be due to a bug in the golang.org/x/net package where a websocket was not being closed correctly. We hit similar error messages when we discovered this bug, in particular: "error closing codec: EOF".
The bug has been fixed in the upstream net package and in 1.24-beta4 which uses the new revision (bb64f4dc73). This appears to fix the issue on my box, but as it is intermittent could others also test and verify? Update dependencies.tsv:
golang.org/x/net git bb64f4dc73d4ab9
| Andrew Wilkins (axwalk) wrote : | #4 |
@waigani: I'm pretty sure I was testing on head of 1.25 yesterday, but will confirm later on. Also, this is new; our usage of websockets is not.
| Andrew Wilkins (axwalk) wrote : | #5 |
err sorry, s/1.25/1.24/
| Andrew Wilkins (axwalk) wrote : | #6 |
After much printf debugging, it appears that a call to LeadershipServi
| Andrew Wilkins (axwalk) wrote : | #7 |
So this turns out to be quite an insidious bug, related to lease/leadership. BlockUntilLeade
- BlockUntilLeade
- Subscribers are notified (of failure) when the lease manager exits.
or something like that.
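For anyone following along, here is a rough Go sketch of the second point above: callers blocked on a lease get an error when the manager's loop exits, rather than hanging forever. This is not the juju code; `manager`, `blockUntilReleased`, `releaseLease`, `kill` and `errManagerStopped` are all hypothetical names made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errManagerStopped is a hypothetical error handed to blocked callers.
var errManagerStopped = errors.New("lease manager stopped")

// manager is a hypothetical stand-in for a lease manager.
type manager struct {
	subscribe chan chan error // registrations from blocked callers
	release   chan struct{}   // signals that the lease was released
	stop      chan struct{}   // closed to ask the loop to exit
	done      chan struct{}   // closed once the loop has exited
}

func newManager() *manager {
	m := &manager{
		subscribe: make(chan chan error),
		release:   make(chan struct{}),
		stop:      make(chan struct{}),
		done:      make(chan struct{}),
	}
	go m.loop()
	return m
}

func (m *manager) loop() {
	var subscribers []chan error
	defer func() {
		// On exit, notify every remaining subscriber of the failure
		// so that nobody stays blocked on a dead manager.
		for _, s := range subscribers {
			s <- errManagerStopped
		}
		close(m.done)
	}()
	for {
		select {
		case s := <-m.subscribe:
			subscribers = append(subscribers, s)
		case <-m.release:
			for _, s := range subscribers {
				s <- nil
			}
			subscribers = nil
		case <-m.stop:
			return
		}
	}
}

// blockUntilReleased waits for the lease to be released, or returns
// an error if the manager has stopped or stops while waiting.
func (m *manager) blockUntilReleased() error {
	reply := make(chan error, 1) // buffered so the loop never blocks on send
	select {
	case m.subscribe <- reply:
	case <-m.done:
		return errManagerStopped
	}
	return <-reply
}

// releaseLease wakes the current subscribers with a nil error.
func (m *manager) releaseLease() {
	select {
	case m.release <- struct{}{}:
	case <-m.done:
	}
}

// kill stops the loop and waits for it to finish.
func (m *manager) kill() {
	close(m.stop)
	<-m.done
}

func main() {
	m := newManager()
	result := make(chan error, 1)
	go func() { result <- m.blockUntilReleased() }()
	m.kill()
	fmt.Println(<-result) // prints "lease manager stopped"
}
```

The caller gets `errManagerStopped` whether it registered before or after `kill` ran, so a restarting agent can retry instead of wedging.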
| Changed in juju-core: | |
| assignee: | nobody → Andrew Wilkins (axwalk) |
| status: | Triaged → Fix Committed |
| Changed in juju-core: | |
| status: | Fix Committed → Fix Released |


I can repro, but it doesn't happen all the time for me. Looking into it.