juju occasionally switches a unit's public-address if an additional interface is added post-deployment
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | | High | Michael Foord | |
| 1.24 | | High | Michael Foord | |
| 1.25 | | High | Michael Foord | |
Bug Description
If an additional port is added to a guest, then the juju public-address of that unit will occasionally change, which breaks some of the OpenStack team's tests (see below). It would be good if the public-address didn't flip like this.
In the example below, additional NICs were added to neutron-gateway/0 and neutron-gateway/1. The IP of neutron-gateway/0 flipped but that of neutron-gateway/1 did not.
$ juju status neutron-gateway
environment: lytrusty
machines:
  "11":
    agent-state: started
    agent-version: 1.23-beta1.1
    dns-name: 10.5.21.19
    instance-id: f9b14208-
    instance-state: ACTIVE
    series: trusty
    hardware: arch=amd64 cpu-cores=1 mem=1536M root-disk=10240M availability-
  "12":
    agent-state: started
    agent-version: 1.23-beta1.1
    dns-name: 10.5.21.10
    instance-id: 67b917f1-
    instance-state: ACTIVE
    series: trusty
    hardware: arch=amd64 cpu-cores=1 mem=1536M root-disk=10240M availability-
services:
  neutron-gateway:
    charm: local:trusty/
    exposed: false
    relations:
      amqp:
      - rabbitmq-server
      cluster:
      - neutron-gateway
      neutron-
      - neutron-api
      quantum-
      - nova-cloud-
      shared-db:
      - mysql
    units:
      neutron-
        machine: "11"
      neutron-
        machine: "12"
$ nova list | grep -E '\-(11|12)'
| f9b14208-
| 67b917f1-
neutron-gateway/0 is still up and running, but since it has switched to an IP which doesn't have the host's services listening on it, juju commands fail:
$ juju ssh neutron-gateway/0 "uname -n"
ERROR subprocess encountered error code 1
ssh_exchange_
ERROR subprocess encountered error code 255
$ juju ssh neutron-gateway/1 "uname -n"
juju-lytrusty-
Connection to 10.5.21.10 closed.
$ ssh 10.5.21.9 "uname -n"
juju-lytrusty-
Why does this matter? The OpenStack team's CI tests sometimes break because the neutron-gateway guest becomes inaccessible via juju {run,ssh}. The reason is that during the post-deployment network setup an additional NIC (eth1) is added to the guest. The additional NIC is on the same network as eth0 but acts as an external port and cannot be directly contacted for guest access.
| Liam Young (gnuoy) wrote : | #1 |
| Liam Young (gnuoy) wrote : | #2 |
| Liam Young (gnuoy) wrote : | #3 |
| Changed in juju-core: | |
| importance: | Undecided → High |
| status: | New → Triaged |
| tags: | added: network |
| Changed in juju-core: | |
| milestone: | none → 1.24-alpha1 |
| Changed in juju-core: | |
| milestone: | 1.24-alpha1 → 1.25.0 |
| Dimiter Naydenov (dimitern) wrote : | #4 |
Michael, once done with the forward-port of the feature flag stuff, please have a look at this one.
| Dimiter Naydenov (dimitern) wrote : | #5 |
I had a quick chat with Liam about this one. So far it appears the cause might be an ordering issue: we sort addresses in lexicographical order when we see new ones, before updating them in the state DB.
It would be useful to run $ juju set-env logging-config '<root>=TRACE' on the environment and post the logs of the affected unit (and its host machine) once it happens. At TRACE level we log in detail which address we pick for private/public when we have a list of possible addresses.
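To illustrate the suspected ordering issue, here is a minimal standalone Go sketch (not juju-core's actual code) showing how a plain lexicographic sort can flip which address comes first once a second NIC's address appears. The assumption that 10.5.21.9 was the original eth0 address and 10.5.21.19 the new eth1 address is mine, based on the output above.

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	// Addresses known for the unit before the extra NIC is attached
	// (assuming 10.5.21.9 was the original eth0 address).
	addrs := []string{"10.5.21.9"}
	sort.Strings(addrs)
	fmt.Println("picked before:", addrs[0]) // 10.5.21.9

	// The new eth1 address sorts lexicographically before the original one
	// ("10.5.21.19" < "10.5.21.9" as strings), so after re-sorting the
	// "first" address, and hence the reported public-address, changes.
	addrs = append(addrs, "10.5.21.19")
	sort.Strings(addrs)
	fmt.Println("picked after:", addrs[0]) // 10.5.21.19
}
```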
| James Page (james-page) wrote : | #6 |
OK - so I reproduced this on 1.23.2 - it happens in a very specific set of circumstances - four units have a second port allocated:
| 2f0e5d14-
| 4d1a00f3-
| d99e5fb0-
| 03b1db44-
only juju-devel3-
| Michael Foord (mfoord) wrote : | #7 |
Would it be sufficient to change the address-setting logic to leave the *first* address in place and only sort subsequent addresses?
| Michael Foord (mfoord) wrote : | #8 |
(So long as the first address is still in the new list of addresses to be set, of course.)
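A rough sketch of that idea (a hypothetical helper, not the actual juju-core change): sort the updated address list, but keep the previously chosen address at the front as long as it is still present in the new set.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeAddresses sorts the updated address list but keeps the previously
// chosen address in first position if it is still present.
func mergeAddresses(current string, updated []string) []string {
	sort.Strings(updated)
	for i, a := range updated {
		if a == current {
			// Shift the addresses before it one slot right and put the
			// existing choice back at the front so it stays selected.
			copy(updated[1:i+1], updated[:i])
			updated[0] = current
			break
		}
	}
	return updated
}

func main() {
	fmt.Println(mergeAddresses("10.5.21.9", []string{"10.5.21.19", "10.5.21.9"}))
	// Output: [10.5.21.9 10.5.21.19] - the original choice stays first.
}
```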
| Dimiter Naydenov (dimitern) wrote : | #9 |
Preserving the order so the first address stays on top is one option, but the real problem is that we're not acting consistently. After a charm runs $ unit-get private-address (or public-address), the address we return should be the same every time (assuming it's still there - e.g. if it was on a NIC which is now down, we should pick another valid one, I guess). So it might be a good idea to add "the address we picked initially for private/public" as metadata to the address in state. It has to be backwards-compatible.
| Dimiter Naydenov (dimitern) wrote : | #10 |
We won't manage to fix this for the scheduled 1.24 release on May 25; it will be in a follow-up point release or in 1.25. I'm dropping the 1.24 milestone from it for that reason.
| no longer affects: | juju-core/1.24 |
| Edward Hope-Morley (hopem) wrote : | #11 |
@dimitern If you want to rely solely on the information the API can provide, I think a good approach would be as follows:
1. deploy the service; juju creates an instance with 1 interface attached
2. juju gets the address given to that interface as allocated by Nova and uses this as the unit address
3. juju gets the port-id of that interface and remembers it
4. if the address of the interface remembered in (3) changes, the unit address changes accordingly
This should give us the behaviour we want and be sufficiently deterministic and persistent, assuming that the primary interface (port-id) never changes.
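A hedged Go sketch of what tracking the remembered port might look like. The unitNetInfo type, the port-ID-to-address map, and the "port-eth0"/"port-eth1" identifiers are hypothetical illustrations, not juju-core or OpenStack API types.

```go
package main

import "fmt"

// unitNetInfo is a hypothetical record of what juju would persist per unit.
type unitNetInfo struct {
	PrimaryPortID string // Neutron port ID remembered at deploy time (step 3)
	Address       string // address currently published as the unit address
}

// refresh updates the unit address only when the remembered port's own
// address changes (step 4); extra ports attached later are ignored.
func (u *unitNetInfo) refresh(portAddresses map[string]string) {
	if addr, ok := portAddresses[u.PrimaryPortID]; ok && addr != u.Address {
		u.Address = addr
	}
}

func main() {
	unit := unitNetInfo{PrimaryPortID: "port-eth0", Address: "10.5.21.9"}

	// A second port (eth1) shows up post-deployment; the unit address must not flip.
	unit.refresh(map[string]string{"port-eth0": "10.5.21.9", "port-eth1": "10.5.21.19"})
	fmt.Println(unit.Address) // still 10.5.21.9
}
```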
| tags: | added: addressability openstack-provider |
| Darryl Weaver (dweaver) wrote : | #12 |
This also applies to a MAAS environment: for example, deploying a multi-network OpenStack bundle exhibits the same inconsistency with addresses, and the private address can change to one on another network that gets plugged in.
| Cheryl Jennings (cherylj) wrote : | #13 |
This might be a dup of bug #1463480
| Dimiter Naydenov (dimitern) wrote : | #14 |
We aim to address this issue (most likely in the way suggested in comment #11) as soon as the feature freeze for 1.25.0 kicks in (on or around August 20).
| tags: | added: bug-squad |
| Changed in juju-core: | |
| assignee: | nobody → Michael Foord (mfoord) |
| Michael Foord (mfoord) wrote : | #15 |
The current way we pick public / private addresses for a unit looks for the "best match" for the requested scope (public / private) and type (ipv4 / ipv6) - allowing fallbacks if an exact match isn't available.
So we can't just switch to picking one and always returning that, as an exact match might not be available the *first time* we're asked - but an exact match may become available later.
My suggestion is to switch to something like the following:
The first time we're asked for an address, we use the current algorithm to find the best match on scope and type. Whatever is found, we store as the "default address" (we will store a default public and a default private address).
On subsequent requests we check whether the stored default is still available and an exact match for the requested scope/type.
If it is still available and an exact match, we just return it.
If it is no longer available, we remove the default and start again (we'll address using the same NIC for changed addresses at another point, as that's more complex).
If it is still available but wasn't an exact match, and an exact match is now available, we replace the current default with the exact match and return that. Subsequent requests will then always see the new default.
How does this sound, Ed?
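A simplified sketch of the selection rule described in comment #15. The address and picker types, and the best-match fallback, are assumptions for illustration, not juju-core's actual network code, and the ipv4/ipv6 type dimension is left out for brevity.

```go
package main

import "fmt"

type address struct {
	value string
	scope string // "public" or "private"
}

type picker struct {
	defaults map[string]address // cached default address per requested scope
}

// pick returns the cached default while it is still present, upgrades it when
// an exact scope match appears later, and otherwise falls back to a best-match
// search (here: first exact scope match, else the first address).
func (p *picker) pick(scope string, addrs []address) (address, bool) {
	if len(addrs) == 0 {
		return address{}, false
	}
	exact := func() (address, bool) {
		for _, a := range addrs {
			if a.scope == scope {
				return a, true
			}
		}
		return address{}, false
	}
	if def, ok := p.defaults[scope]; ok {
		for _, a := range addrs {
			if a == def {
				if def.scope == scope {
					return def, true // still available and an exact match
				}
				if better, found := exact(); found {
					p.defaults[scope] = better // upgrade to the new exact match
					return better, true
				}
				return def, true // still available, no better match yet
			}
		}
		delete(p.defaults, scope) // default vanished: start again
	}
	best, found := exact()
	if !found {
		best = addrs[0] // fallback when no exact scope match exists yet
	}
	p.defaults[scope] = best
	return best, true
}

func main() {
	p := &picker{defaults: map[string]address{}}
	a1 := address{"10.5.21.9", "private"}
	a2 := address{"10.5.21.19", "private"}
	fmt.Println(p.pick("private", []address{a1}))     // picks and caches 10.5.21.9
	fmt.Println(p.pick("private", []address{a2, a1})) // still 10.5.21.9, no flip
}
```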
| Changed in juju-core: | |
| milestone: | 1.25-alpha1 → 1.25-beta1 |
| Liam Young (gnuoy) wrote : | #16 |
Michael, that sounds like it would work perfectly for me, thanks.
| Dimiter Naydenov (dimitern) wrote : | #17 |
Fix for 1.25 is proposed and should be landing early next week: http://
| Changed in juju-core: | |
| status: | Triaged → In Progress |
| Changed in juju-core: | |
| milestone: | 1.25-beta1 → 1.25-beta2 |
| Michael Foord (mfoord) wrote : | #18 |
A fix for this is committed to 1.24. Forward ports to 1.25 and master are "in progress".
| Changed in juju-core: | |
| status: | In Progress → Fix Committed |
| Michael Foord (mfoord) wrote : | #19 |
On 1.25 and master as well now.
| Changed in juju-core: | |
| milestone: | 1.25-beta2 → 1.26-alpha1 |
| Changed in juju-core: | |
| status: | Fix Committed → Fix Released |
| tags: | added: sts |


I believe I've seen this on multiple versions of juju, but the one the debug output above was taken from was 1.23-beta1-trusty-amd64. The environment type was openstack.
I'll attach logs from the bootstrap node and from neutron-gateway/0.