juju upgrade-juju --upload-tools broken on ec2
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
juju-core | Invalid | High | Jesse Meek | | |
Bug Description
juju 1.24 on ec2:
Server before upgrade:
ubuntu@
0
$ juju upgrade-juju --upload-tools
available tools:
1.24-
best version:
1.24-beta5.2
ERROR error receiving message: read tcp 54.82.43.195:17070: connection timed out
This error is consistently hit. On the server:
ubuntu@
22
ubuntu@
tcp 1 0 10.230.164.3:33074 91.189.88.141:80 CLOSE_WAIT 13218/jujud
tcp 1 0 10.230.164.3:47129 54.231.15.9:80 CLOSE_WAIT 13218/jujud
...
jujud is not upgraded:
ubuntu@
lrwxrwxrwx 1 root root 0 May 26 22:30 /proc/13218/exe -> /var/lib/
ubuntu@
1.24-beta5.
ubuntu@
jujud 13218 root 36u IPv4 21947 0t0 TCP ip-10-230-
jujud 13218 root 37u IPv4 21900 0t0 TCP ip-10-230-
...
The 22 connections are all from machine-0 to s3 and cloud images.
$ juju upgrade-juju
...
best version:
1.24-beta5.2
This selects the tools previously uploaded. Sometimes everything breaks at this point: juju status hangs, the apiserver does not come back up, and the old jujud and its CLOSE_WAIT sockets are still up. Rebooting the server (which kills the old stuck jujud) resolves the issue.
Sometimes the upgrade succeeds:
$ juju status
...
agent-version: 1.24-beta5.2
And the server looks healthy:
/var/lib/
1.24-beta5.
ubuntu@
0
ubuntu@
0
ubuntu@
tcp 0 0 127.0.0.1:41624 127.0.0.1:17070 ESTABLISHED 13583/jujud
tcp 0 0 10.230.164.3:45996 10.230.164.3:37017 ESTABLISHED 13583/jujud
Repeating the process:
$ juju upgrade-juju --upload-tools
(hangs as above)
ubuntu@
3
ubuntu@
jujud 13583 root 59u IPv4 41392 0t0 TCP ip-10-230-
... (all s3)
$ juju upgrade-juju --upload-tools
(hangs as above)
ubuntu@
6
3 more CLOSE_WAIT connections to s3 are added every time `juju upgrade-juju --upload-tools` is called. I left the server overnight and the connections had not been closed by the morning.
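For anyone wanting to track the leak between runs, here is a rough sketch of how the CLOSE_WAIT sockets could be tallied per remote address. It assumes `netstat -tnp`-style columns (proto, recv-q, send-q, local, remote, state, pid/program), which matches the lines pasted above but is not necessarily the exact command used on the server. The function name `count_close_wait` and the `sample` text are illustrative only.

```python
from collections import Counter


def count_close_wait(netstat_output: str) -> Counter:
    """Count CLOSE_WAIT sockets per remote address.

    Assumes netstat -tnp style columns:
    proto recv-q send-q local remote state pid/program
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] == "CLOSE_WAIT":
            counts[fields[4]] += 1
    return counts


# Sample rows copied from the output in this report.
sample = """\
tcp 1 0 10.230.164.3:33074 91.189.88.141:80 CLOSE_WAIT 13218/jujud
tcp 1 0 10.230.164.3:47129 54.231.15.9:80 CLOSE_WAIT 13218/jujud
tcp 0 0 127.0.0.1:41624 127.0.0.1:17070 ESTABLISHED 13583/jujud
"""

if __name__ == "__main__":
    for remote, n in count_close_wait(sample).items():
        print(remote, n)
```

Running this against saved output before and after each `juju upgrade-juju --upload-tools` call would show which remote addresses account for the growth.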
Changed in juju-core: | |
assignee: | nobody → Jesse Meek (waigani) |
status: | New → In Progress |
Changed in juju-core: | |
milestone: | 1.24-beta6 → none |
After much poking, it looks like this is a network problem, isolated to either my router or my ISP. I just tested on the university network in town and it works without a problem.