juju upgrade-juju --upload-tools broken on ec2

Bug #1459047 reported by Jesse Meek
This bug affects 1 person
Affects     Status    Importance   Assigned to   Milestone
juju-core   Invalid   High         Jesse Meek

Bug Description

juju 1.24 on ec2:

Server before upgrade:

ubuntu@ip-10-230-164-3:~$ sudo lsof | grep CLOSE_WAIT | wc -l
0

$ juju upgrade-juju --upload-tools
available tools:
    1.24-beta5.2-trusty-amd64
best version:
    1.24-beta5.2
ERROR error receiving message: read tcp 54.82.43.195:17070: connection timed out

This error is consistently hit. On the server:

ubuntu@ip-10-230-164-3:~$ sudo netstat -pan | grep CLOSE_WAIT | wc -l
22

ubuntu@ip-10-230-164-3:~$ sudo netstat -pan | grep CLOSE_WAIT
tcp 1 0 10.230.164.3:33074 91.189.88.141:80 CLOSE_WAIT 13218/jujud
tcp 1 0 10.230.164.3:47129 54.231.15.9:80 CLOSE_WAIT 13218/jujud
...

jujud is not upgraded:

ubuntu@ip-10-230-164-3:~$ sudo ls -l /proc/13218/exe
lrwxrwxrwx 1 root root 0 May 26 22:30 /proc/13218/exe -> /var/lib/juju/tools/1.24-beta5.1-trusty-amd64/jujud

ubuntu@ip-10-230-164-3:~$ /var/lib/juju/tools/machine-0/jujud version
1.24-beta5.1-trusty-amd64

ubuntu@ip-10-230-164-3:~$ sudo lsof -p 13218 | grep CLOSE_WAIT
jujud 13218 root 36u IPv4 21947 0t0 TCP ip-10-230-164-3.ec2.internal:33028->cloud-images-ubuntu-com.sawo.canonical.com:http (CLOSE_WAIT)
jujud 13218 root 37u IPv4 21900 0t0 TCP ip-10-230-164-3.ec2.internal:47074->s3-1-w.amazonaws.com:http (CLOSE_WAIT)
...

The 22 connections are all from machine-0 to s3 and cloud images.
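To see at a glance which remote endpoints the stuck sockets point at, the `netstat -pan` output can be grouped by process and remote address. The following is a minimal sketch, not part of juju: the function name `summarize_close_wait` is hypothetical, and it assumes the seven-column TCP layout shown in the output above. The sample text reuses lines from this report purely as illustration.

```python
from collections import Counter


def summarize_close_wait(netstat_output):
    """Count CLOSE_WAIT sockets per (process, remote endpoint) pair."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed `netstat -pan` TCP columns:
        # proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "CLOSE_WAIT":
            continue
        counts[(fields[6], fields[4])] += 1
    return counts


# Illustrative input: lines copied from the netstat output in this report.
sample = """\
tcp 1 0 10.230.164.3:33074 91.189.88.141:80 CLOSE_WAIT 13218/jujud
tcp 1 0 10.230.164.3:47129 54.231.15.9:80 CLOSE_WAIT 13218/jujud
tcp 0 0 127.0.0.1:41624 127.0.0.1:17070 ESTABLISHED 13583/jujud
"""

for (process, remote), n in sorted(summarize_close_wait(sample).items()):
    print(process, remote, n)
```

On a live server the input would come from `sudo netstat -pan`; only the two CLOSE_WAIT lines are counted, and the ESTABLISHED connection is ignored.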

$ juju upgrade-juju
...
best version:
    1.24-beta5.2

This selects the tools previously uploaded. Sometimes everything breaks at this point: juju status hangs, the apiserver does not come back up, and the old jujud process and its CLOSE_WAIT sockets are still present. Rebooting the server (which kills the old, stuck jujud) resolves the issue.

Sometimes the upgrade succeeds:
$ juju status
...
agent-version: 1.24-beta5.2

And the server looks healthy:

/var/lib/juju/tools/machine-0/jujud version
1.24-beta5.2-trusty-amd64

ubuntu@ip-10-230-164-3:~$ sudo netstat -an | grep CLOSE_WAIT | wc -l
0

ubuntu@ip-10-230-164-3:~$ sudo lsof | grep s3 | wc -l
0

ubuntu@ip-10-230-164-3:~$ sudo netstat -pan | grep jujud
tcp 0 0 127.0.0.1:41624 127.0.0.1:17070 ESTABLISHED 13583/jujud
tcp 0 0 10.230.164.3:45996 10.230.164.3:37017 ESTABLISHED 13583/jujud

Repeating the process:

$ juju upgrade-juju --upload-tools
(hangs as above)

ubuntu@ip-10-230-164-3:~$ sudo netstat -an | grep CLOSE_WAIT | wc -l
3

ubuntu@ip-10-230-164-3:~$ sudo lsof -p 13583 | grep CLOSE_WAIT
jujud 13583 root 59u IPv4 41392 0t0 TCP ip-10-230-164-3.ec2.internal:40429->s3-1-w.amazonaws.com:http (CLOSE_WAIT)
... (all s3)

$ juju upgrade-juju --upload-tools
(hangs as above)

ubuntu@ip-10-230-164-3:~$ sudo lsof -p 13583 | grep CLOSE_WAIT | wc -l
6

3 more CLOSE_WAIT connections to s3 are added every time `juju upgrade-juju --upload-tools` is called. I left the server overnight and the connections had still not been closed by the morning.
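For anyone wanting to track the growth without netstat or lsof, the count can also be taken from `/proc/net/tcp`, where the `st` column holds the TCP state in hex and `08` is CLOSE_WAIT. This is a rough sketch under that assumption; the helper names `count_close_wait` and `growth_per_sample` are hypothetical, and only IPv4 sockets are covered (IPv6 sockets are listed in `/proc/net/tcp6`).

```python
def count_close_wait(proc_net_tcp_text):
    """Count sockets in CLOSE_WAIT in text read from /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # The fourth column ("st") is the TCP state in hex; 08 is CLOSE_WAIT.
        if len(fields) > 3 and fields[3] == "08":
            count += 1
    return count


def growth_per_sample(counts):
    """Differences between consecutive samples, one sample per attempt."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


# Counts taken from this report: healthy server, then after each retry.
print(growth_per_sample([0, 3, 6]))  # prints [3, 3]
```

On the server, `count_close_wait(open("/proc/net/tcp").read())` could be sampled after each `juju upgrade-juju --upload-tools` run and the samples passed to `growth_per_sample`.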

Jesse Meek (waigani)
Changed in juju-core:
assignee: nobody → Jesse Meek (waigani)
status: New → In Progress
Jesse Meek (waigani) wrote :

After much poking, it looks like this is a network problem, isolated to either my router or my ISP. I just tested on the university network in town and it works without any problem.

Changed in juju-core:
status: In Progress → Invalid
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.24-beta6 → none