Juju-ci3 cannot upgrade to 1.21.1

Bug #1417308 reported by Curtis Hovey
This bug affects 1 person
Affects    Status   Importance  Assigned to  Milestone
juju-core  Triaged  High        Unassigned

Bug Description

juju-ci3 is in upgrade hell going to 1.21.1. The state server upgraded in a few minutes, but after an hour it fell over. I restarted the agents and can see the state server, but all the other machines are down. They are trying to contact 10.0.3.1, which is not where the state server is.

35 of 36 agents are wrong after buggered upgrade

2015-02-02 21:21:32 INFO juju.worker runner.go:260 start "api"
2015-02-02 21:21:32 INFO juju.state.api apiclient.go:242 dialing "wss://10.0.3.1:17070/"
2015-02-02 21:21:32 INFO juju.state.api apiclient.go:250 error dialing "wss://10.0.3.1:17070/": websocket.Dial wss://10.0.3.1:17070/: dial tcp 10.0.3.1:17070: connection refused
2015-02-02 21:21:32 ERROR juju.worker runner.go:218 exited "api": unable to connect to "wss://10.0.3.1:17070/"
2015-02-02 21:21:32 INFO juju.worker runner.go:252 restarting "api" in 3s

The address should be 172.31.9.3 or ip-172-31-9-3.ec2.internal
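
To check which agents are dialing the wrong host, the "dialing" log lines above can be scanned mechanically. This is only a rough sketch: `dialedHost` and the `expected` set are hypothetical names made up for illustration, and the regular expression assumes the log format shown in this report.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// dialRe matches the `dialing "wss://host:port/"` fragment seen in the
// agent logs quoted in this report (assumed format).
var dialRe = regexp.MustCompile(`dialing "wss://([^/"]+)/"`)

// dialedHost returns the host an agent log line says it is dialing.
// The second result is false when the line has no dial entry.
func dialedHost(line string) (string, bool) {
	m := dialRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	host, _, err := net.SplitHostPort(m[1])
	if err != nil {
		return "", false
	}
	return host, true
}

func main() {
	// The two addresses this report says are correct for the state server.
	expected := map[string]bool{
		"172.31.9.3":                 true,
		"ip-172-31-9-3.ec2.internal": true,
	}
	line := `2015-02-02 21:21:32 INFO juju.state.api apiclient.go:242 dialing "wss://10.0.3.1:17070/"`
	if host, ok := dialedHost(line); ok && !expected[host] {
		fmt.Printf("agent is dialing unexpected API host %s\n", host)
	}
}
```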

Setting the apiaddress to ip-172-31-9-3.ec2.internal:17070 in both the machine and service agent.conf files allowed the agents to call home. They saw they needed an upgrade...

2015-02-02 21:51:04 INFO juju.worker.upgrader upgrader.go:134 upgrade requested from 1.20.13-precise-amd64 to 1.21.1
2015-02-02 21:51:04 INFO juju.worker.upgrader upgrader.go:167 fetching tools from "https://10.0.3.1:17070/environment/e7513400-44ff-40a0-8e6a-8776784d94fa/tools/1.21.1-precise-amd64"
2015-02-02 21:51:04 INFO juju.utils http.go:66 hostname SSL verification disabled
2015-02-02 21:51:04 ERROR juju.worker.upgrader upgrader.go:157 failed to fetch tools from "https://10.0.3.1:17070/environment/e7513400-44ff-40a0-8e6a-8776784d94fa/tools/1.21.1-precise-amd64": Get https://10.0.3.1:17070/environment/e7513400-44ff-40a0-8e6a-8776784d94fa/tools/1.21.1-precise-amd64: dial tcp 10.0.3.1:17070: connection refused

This is again the wrong address. Resetting the apiaddress to the private IP
or the private name fails because the value is rewritten to an impossible
value.

Attached is a log from machine-8 (juju-reports, the app behind reports.vapour.ws). We can see the API server was contacted after I changed the API address, but the value is immediately changed back to 10.0.3.1.

Curtis Hovey (sinzui) wrote :

I tried changing state-server to
    apiaddress: ip-172-31-9-3.ec2.internal
and restarted, but it, like all the other machines, ignores the value; it is always rewritten to 10.0.3.1.

Tim Penhey (thumper) wrote :

This is the same problem as the manual provider install with lxc.

10.0.3.1 is the default IP address of lxcbr0.

Somewhere, the code that looks at the IP addresses of the local machine is picking up the bridge address.

This is then considered a 'cloud private' address because it is in the class A private network 10.0.0.0/8.
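
A minimal sketch of how that can happen, and of one possible way around it. This is not juju's actual address selection code: `ifaceAddr`, `isPrivate` and `pickAPIAddress` are hypothetical names, and the interface ordering in `main` (bridge first) is an assumption made to reproduce the symptom.

```go
package main

import (
	"fmt"
	"net"
)

// privateNets holds the RFC 1918 private IPv4 ranges.
var privateNets = mustParseCIDRs("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")

func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		nets = append(nets, n)
	}
	return nets
}

// isPrivate reports whether addr is an IP inside an RFC 1918 range.
func isPrivate(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	for _, n := range privateNets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// ifaceAddr pairs an address with the interface it was found on
// (hypothetical type, for illustration only).
type ifaceAddr struct {
	Iface string
	Addr  string
}

// pickAPIAddress returns the first private address that is not on one
// of the skipped interfaces.
func pickAPIAddress(addrs []ifaceAddr, skipIfaces ...string) (string, bool) {
	skip := make(map[string]bool, len(skipIfaces))
	for _, s := range skipIfaces {
		skip[s] = true
	}
	for _, a := range addrs {
		if skip[a.Iface] || !isPrivate(a.Addr) {
			continue
		}
		return a.Addr, true
	}
	return "", false
}

func main() {
	addrs := []ifaceAddr{{"lxcbr0", "10.0.3.1"}, {"eth0", "172.31.9.3"}}
	naive, _ := pickAPIAddress(addrs)              // bridge address wins
	filtered, _ := pickAPIAddress(addrs, "lxcbr0") // bridge excluded
	fmt.Println("naive:", naive)
	fmt.Println("filtered:", filtered)
}
```

Both 10.0.3.1 and 172.31.9.3 pass the private-range check, so a selection based on that check alone cannot tell the LXC bridge from the real interface. Excluding the bridge by name is just one option.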

Paul Gear (paulgear) wrote :

This may be related to bug 1416928.
