juju agent using lxcbr0 address as apiaddress instead of juju-br0 breaks agents
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | | High | James Tunnicliffe | |
| 1.21 | | Critical | Dimiter Naydenov | |
| 1.22 | | Critical | James Tunnicliffe | |
Bug Description
juju-core 1.21.1
node 0: bootstrap, lxc/0-2
- juju-br0 10.10.18.51 (eth0)
- lxcbr0 10.0.3.1
- eth1 in promisc mode
node 1: lxc/0-6
- juju-br0 10.10.18.52 (eth0)
- lxcbr0 10.0.3.1
- virbr0 192.168.122.1
- eth1 in promisc mode
node 2: metal only
- juju-br0 10.10.18.53 (eth0)
- virbr0 192.168.122.1
- eth1 in promisc mode
- on node 0, the physical and lxc machine agents and all unit agents have the IP of node 0's lxcbr0 bridge assigned to apiaddress in agent.conf. They report their state correctly.
- on node 1, the physical and lxc machine agents and all unit agents have the node 0 lxcbr0 IP assigned to apiaddress in agent.conf. They can't reach WSS.
- on node 2, the physical machine agent has the node 0 lxcbr0 IP assigned to apiaddress in agent.conf. It can't reach WSS.
I am assuming all agents are getting the lxcbr0 IP from node 0, because node 2 does not have an lxcbr0 and yet has 10.0.3.1 assigned to its apiaddress in agent.conf.
Manually changing apiaddress on all agents to the juju-br0 IP of node 0, then stopping and starting all the agents, temporarily solved the issue.
After a reboot, the same problem happens because agent.conf is overwritten by juju on all machines, lxc containers and units.
To reproduce with 1.21 in a cloud:
juju bootstrap
juju ssh 0 "sudo grep -A 1 apiaddresses /var/lib/
apiaddresses:
- REAL_ETH0_
juju ssh 0 "sudo apt-get install -y lxc"
juju ssh 0 "sudo reboot -n"
juju ssh 0 "sudo grep -A 1 apiaddresses /var/lib/
apiaddresses:
- 10.0.3.1:17070
Or with manual-provider with lxc installed on the target state-server:
juju bootstrap
juju ssh 0 "sudo grep -A 1 apiaddresses /var/lib/
apiaddresses:
- 10.0.3.1:17070
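The check done by hand with grep above can be sketched in Python. This is only an illustration: it assumes the agent.conf text has already been read into a string (the file path is not reproduced here), and it assumes lxcbr0 uses the 10.0.3.0/24 network, inferred from the 10.0.3.1 address in this report. The function names are hypothetical.

```python
import ipaddress

# Assumption: lxcbr0 sits on 10.0.3.0/24, inferred from the 10.0.3.1
# address reported in this bug.
LXC_BRIDGE_NET = ipaddress.ip_network("10.0.3.0/24")


def parse_apiaddresses(conf_text):
    """Return the host:port entries listed under 'apiaddresses:'."""
    addresses = []
    in_block = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped == "apiaddresses:":
            in_block = True
            continue
        if in_block:
            if stripped.startswith("- "):
                addresses.append(stripped[2:].strip())
            else:
                break  # first non-list line ends the block
    return addresses


def on_lxc_bridge(hostport, bridge_net=LXC_BRIDGE_NET):
    """True if the host part of host:port is an IP inside bridge_net."""
    host = hostport.rsplit(":", 1)[0]
    try:
        return ipaddress.ip_address(host) in bridge_net
    except ValueError:
        return False  # not a plain IP literal (e.g. a hostname)


sample = "apiaddresses:\n- 10.0.3.1:17070\n"
bad = [a for a in parse_apiaddresses(sample) if on_lxc_bridge(a)]
print(bad)
```

Any entry reported by this check is an address that agents on other nodes cannot use to reach the state server.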
| description: | updated |
| summary: | - juju agent using lxcbr0 address as apiaddress instead of juju-br0 after reboot + juju agent using lxcbr0 address as apiaddress instead of juju-br0 breaks agents |
| description: | updated |
| tags: | added: lxc network |
| tags: | added: api |
| Paul Gear (paulgear) wrote : | #1 |
| Paul Gear (paulgear) wrote : | #2 |
(Apologies for the repeated comment above - no idea what happened there.)
There are corresponding messages on machine 0, where it shows:
2015-02-03 01:17:36 INFO juju.worker.
...
2015-02-03 01:17:36 DEBUG juju.apiserver apiserver.go:156 <- [6D] machine-0 {"RequestId"
| Changed in juju-core: | |
| status: | New → Triaged |
| importance: | Undecided → High |
| milestone: | none → 1.23 |
| Fabricio Costi (fabricio-9) wrote : | #3 |
This happens only AFTER the bootstrap node is rebooted for the first time.
On a new env I was able to reboot all other nodes without a problem - apiaddress still points to the right IP of bootstrap node - provided the bootstrap node hadn't been rebooted after the first lxc was deployed.
By inspecting the agent.conf from LXC running on the bootstrap node I could see the apiaddress is pointing to the right ip BEFORE reboot.
AFTER the first reboot of bootstrap node, the apiaddress was changed to the lxcbr0 ip. Subsequent reboots of bootstrap node didn't change the behaviour.
Any subsequent reboot of other nodes (AFTER the bootstrap node was rebooted for the first time AFTER the first lxc was deployed) caused a change in the apiaddress to the lxcbr0 ip in all agents on the rebooted node. So it seems something happens when the bootstrap node is rebooted for the first time.
| Paul Gear (paulgear) wrote : | #4 |
This may be related to bug 1417308.
| Curtis Hovey (sinzui) wrote : | #5 |
Per the duplicates:
manual provider does not work if lxc is already installed on the machine selected to be the state-server
Juju will fail to upgrade to 1.21.1 if a restart has happened.
| Dimiter Naydenov (dimitern) wrote : | #6 |
We need some more information about how to reproduce this and I don't want it to block the 1.21.2 release until then.
I've synced up this with Alexis and Wes.
| Curtis Hovey (sinzui) wrote : | #7 |
Per the duplicates:
A. start an instance in a cloud and install lxc on it. With the manual provider, attempt to bootstrap it.
B. The former juju-ci3 was bootstrapped with an older version of juju, then lxc was installed. The upgrade to 1.21.1 failed.
| Dimiter Naydenov (dimitern) wrote : | #8 |
Thanks for the info!
I still think this shouldn't block 1.21.2, so I'm removing the milestone (because 1.21.3 is not yet available).
| Curtis Hovey (sinzui) wrote : | #9 |
This job is bootstrapping a machine which has lxc installed. When it starts passing, we will know a fix works.
http://
| Curtis Hovey (sinzui) wrote : | #10 |
We see evidence that the manual-provider test case was broken by a separate change. We can see 1.21 was passing the case until a recent backport of a feature from master. I will split the manual case from this bug.
The case of a working stack switching the apiaddresses to an lxcbr0 address can be seen with these steps with 1.21 on a real cloud.
1. juju bootstrap
2. deploy ubuntu
3. juju ssh 0 "sudo apt-get install -y lxc"
4. reboot -n
The agents will then fail
~$ sudo grep -A 1 apiaddresses /var/lib/
apiaddresses:
- 10.0.3.1:17070
$ sudo grep -A 1 apiaddresses /var/lib/
apiaddresses:
- 10.0.3.1:17070
This issue also happens when bootstrapping 1.20, installing lxc, rebooting, then upgrading to 1.21. This is my command line for the fewest steps:
juju bootstrap
juju ssh 0 "sudo grep -A 1 apiaddresses /var/lib/
apiaddresses:
- 10.185.
juju ssh 0 "sudo apt-get install -y lxc"
juju ssh 0 "sudo reboot -n"
juju ssh 0 "sudo grep -A 1 apiaddresses /var/lib/
apiaddresses:
- 10.0.3.1:17070
| description: | updated |
| Curtis Hovey (sinzui) wrote : | #11 |
I confirmed that manual-provider and 1.21 is indeed broken the same way. With lxc installed on the target state-server:
juju bootstrap
juju ssh 0 "sudo grep -A 1 apiaddresses /var/lib/
apiaddresses:
- 10.0.3.1:17070
| description: | updated |
| Changed in juju-core: | |
| importance: | High → Critical |
| Changed in juju-core: | |
| importance: | Critical → High |
| assignee: | nobody → Dimiter Naydenov (dimitern) |
| Changed in juju-core: | |
| importance: | High → Critical |
| importance: | Critical → High |
| Dimiter Naydenov (dimitern) wrote : | #12 |
I've analyzed the problem and used the given steps to reproduce it.
The issue is due to the way we sort and pick cloud-local addresses for machines (incl. the api server). We should filter out any 10.0.x.y addresses we can see on the machine before considering them as usable, taking into account the existence and contents of /etc/default/
This should solve the problem and it could be easily ported to 1.20, 1.21, 1.22, and trunk. I'll start on the fix.
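The filtering idea described above can be sketched roughly as follows. This is not the actual juju fix, only a minimal Python illustration: the function name is hypothetical, and the bridge network (10.0.3.0/24) is passed in as an assumed value rather than read from any configuration file.

```python
import ipaddress


def usable_addresses(candidates, bridge_networks):
    """Drop candidate IPs that fall inside any local bridge network.

    candidates: list of IP address strings seen on the machine.
    bridge_networks: list of CIDR strings for local-only bridges
    (an assumed input in this sketch).
    """
    nets = [ipaddress.ip_network(n) for n in bridge_networks]
    return [
        addr
        for addr in candidates
        if not any(ipaddress.ip_address(addr) in net for net in nets)
    ]


# Addresses of node 0 as listed in the bug description.
node0 = ["10.0.3.1", "10.10.18.51"]
print(usable_addresses(node0, ["10.0.3.0/24"]))
```

With the lxcbr0 network excluded, only the juju-br0 address remains as a candidate for the API address, which is the behaviour the other nodes need.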
| Changed in juju-core: | |
| status: | Triaged → In Progress |
| Dimiter Naydenov (dimitern) wrote : | #13 |
Fix for 1.21 proposed at https:/
Once approved, this fix will be forward ported to 1.22 and 1.23 (trunk).
| Dimiter Naydenov (dimitern) wrote : | #14 |
Fix landed in 1.21, let's see how all the various CI jobs will react.
I'm assigning the forward porting of the same fix to 1.22 and trunk to James.
James, let's have a chat tomorrow about this.
| Changed in juju-core: | |
| status: | In Progress → Triaged |
| assignee: | Dimiter Naydenov (dimitern) → James Tunnicliffe (dooferlad) |
| Curtis Hovey (sinzui) wrote : | #16 |
The last failure was caused by a dirty machine. We found mongod from the failed runs still running. After cleaning the machine, we can see that the fixed juju passes and cleans up mongod.
| Changed in juju-core: | |
| status: | Triaged → In Progress |
| Changed in juju-core: | |
| status: | In Progress → Fix Committed |
| status: | Fix Committed → In Progress |
| Changed in juju-core: | |
| status: | In Progress → Fix Committed |
| Sacha Yunusic (sacha-m) wrote : | #17 |
I have the same behavior. After the reboot, it got the lxcbr0 IP address instead of the eth0 one.
I have Installed 1.20.14-
What if I manually change apiaddresses value in /var/lib/
BTW, all services are running (sudo juju status | grep agent-state)
| Dimiter Naydenov (dimitern) wrote : | #18 |
No Sacha, it won't help as they will get overwritten.
I'd suggest upgrading to 1.21.3 from the Juju stable releases PPA https:/
| Sacha Yunusic (sacha-m) wrote : | #19 |
I updated to 1.21.3. That fixed the problem. :)
| Dimiter Naydenov (dimitern) wrote : | #20 |
Another happy user :)
| Changed in juju-core: | |
| status: | Fix Committed → Fix Released |
| Changed in juju-core: | |
| milestone: | 1.23 → 1.23-beta1 |


I'm seeing this error on new deploys on juju 1.21.1-trusty-amd64 from the stable PPA. I've attached an extract of the (redacted) machine log showing it changing the API addresses from the correct value (192.168.99.0) to the wrong value (10.0.3.1).