MAAS 1.9.3 + Juju 1.25.5 - on the Juju controller node eth0 and juju-br0 interfaces have the same IP address at the same time

Bug #1590689 reported by Lorenzo Cavassa on 2016-06-09
This bug affects 6 people
Affects             Importance  Assigned to
MAAS                Undecided   Unassigned
juju                High        Unassigned
juju-core           High        Unassigned
juju-core 1.25      High        Unassigned
ifenslave (Ubuntu)  High        Unassigned

Bug Description

Running MAAS 1.9.3 - Juju 1.25.5 - Ubuntu 14.04.04

After a Juju bootstrap, the Juju controller node is left with both the eth0 and the juju-br0 interfaces configured at the same time, with the same IP address.

eth0 is defined in ENI as manual but also in interfaces.d/eth0.cfg as DHCP

juju-br0 is defined in ENI as static and with the same eth0 IP

It seems Juju rewrites /etc/network/interfaces, but doesn't remove/change /etc/network/interfaces.d/eth0.cfg. This results in one interface defined twice.
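The double definition described above can be checked for mechanically. The following is a minimal sketch, not anything Juju or MAAS ships: the file contents, the 10.0.0.5 address and the helper names are hypothetical and only modelled on this description. It collects the `iface` stanzas from the main file and from the sourced eth0.cfg and reports any interface that is defined more than once for the same address family.

```python
import re

# Hypothetical file contents, modelled on the bug description (not copied
# from an affected node). The address is made up for illustration.
ENI = """\
auto lo
iface lo inet loopback

iface eth0 inet manual

auto juju-br0
iface juju-br0 inet static
    address 10.0.0.5
    netmask 255.255.255.0
    bridge_ports eth0

source /etc/network/interfaces.d/*.cfg
"""

ETH0_CFG = """\
auto eth0
iface eth0 inet dhcp
"""

# Matches "iface <name> <family> <method>" at the start of a line.
IFACE_RE = re.compile(
    r"^[ \t]*iface[ \t]+(\S+)[ \t]+(\S+)[ \t]+(\S+)", re.MULTILINE
)


def conflicting(files):
    """Return {(name, family): [(path, method), ...]} for every interface
    that has more than one stanza for the same address family."""
    seen = {}
    for path, text in files.items():
        for name, family, method in IFACE_RE.findall(text):
            seen.setdefault((name, family), []).append((path, method))
    return {key: defs for key, defs in seen.items() if len(defs) > 1}


files = {
    "/etc/network/interfaces": ENI,
    "/etc/network/interfaces.d/eth0.cfg": ETH0_CFG,
}
conflicts = conflicting(files)
for (name, family), defs in conflicts.items():
    print(name, family, defs)
```

With these sample inputs only eth0 is reported, once as manual and once as dhcp, which is the situation this bug describes.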

Ante Karamatić (ivoks) on 2016-06-09
tags: added: cpec
Dimiter Naydenov (dimitern) wrote :

This should be addressed by backporting the fix for bug 1576674 to 1.25.

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0-beta9
status: Triaged → Fix Committed
Changed in maas:
status: New → Invalid
tags: added: 4010
tags: added: sts
removed: 4010
Dimiter Naydenov (dimitern) wrote :

Also related to this is https://github.com/juju/juju/pull/5597, which should be included in the backport of https://github.com/juju/juju/pull/5512

Ante Karamatić (ivoks) wrote :

I'd just like to restate that this is super critical for me and my team.

Ante Karamatić (ivoks) wrote :

We also had a case where juju generated ENI and configured an IP on both bond and juju-br0. That breaks LACP.

http://paste.ubuntu.com/17392160/

I'm not sure if the suggested code fixes this case too.

It's not clear that the generated ENI is the root cause of LACP failure. But for consistency we should move the address off the device that gets bridged.


Curtis Hovey (sinzui) on 2016-06-16
Changed in juju-core:
status: Fix Committed → Fix Released
Dimiter Naydenov (dimitern) wrote :

Fix for 1.25 proposed: https://github.com/juju/juju/pull/5642 (still testing)

Dimiter Naydenov (dimitern) wrote :

Proposed fix tested successfully on AWS with and without the address-allocation feature flag - no regressions. On MAAS 1.9.3 and trusty nodes the infamous /etc/network/interfaces.d/eth0.cfg:
  auto eth0
  iface eth0 inet dhcp
which is sourced by /etc/network/interfaces ('source /etc/network/interfaces.d/*.cfg') still caused issues. Similarly on xenial, with the equivalent '50-cloud-init.cfg'. Both of these are part of the cloud images, and cause issues because 'eth0' gets a DHCP address IN ADDITION TO its static address allocated by MAAS.

So the fix in comment #7 now omits the 'source' stanza from the modified /e/n/i, thereby avoiding the issue. Pending confirmation from Lorenzo, the fix is approved and can land.
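To illustrate the idea of omitting the 'source' stanza, here is a rough sketch in Python. It is not the code from the pull request (Juju itself is written in Go); the function name `drop_source_lines`, the sample text and the 10.0.0.5 address are assumptions made up for illustration.

```python
def drop_source_lines(eni_text):
    """Return the ENI text with every 'source ...' line removed.

    Commented-out lines such as '# source ...' are kept, because their
    first word is '#', not 'source'.
    """
    kept = []
    for line in eni_text.splitlines():
        words = line.split()
        if words and words[0] == "source":
            continue  # skip the stanza that pulls in interfaces.d/*.cfg
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical input, not taken from an affected node.
SAMPLE = (
    "auto juju-br0\n"
    "iface juju-br0 inet static\n"
    "    address 10.0.0.5\n"
    "    bridge_ports eth0\n"
    "\n"
    "source /etc/network/interfaces.d/*.cfg\n"
)

cleaned = drop_source_lines(SAMPLE)
print(cleaned)
```

Once the 'source' line is gone, ifupdown no longer reads eth0.cfg (or 50-cloud-init.cfg on xenial), so the extra DHCP definition of eth0 is never applied, even though the file stays on disk.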

Dimiter Naydenov (dimitern) wrote :

Lorenzo reports after 3 separate successful deployments with binaries including the proposed fix:

<lcavassa> dimitern, baremetals are ok. I deployed using the same charm on 2 different physical hosts.
<lcavassa> dimitern, it's ok.
<lcavassa> deployed 6 LXC container on 2 different hypervisors
<lcavassa> all is good.
<dimitern> lcavassa: awesome! thank you so much for confirming
<dimitern> ivoks: any objections to landing the fix for 1.25.6 then?
<lcavassa> really great work thank you dimitern for the support
<ivoks> none at all

Too bad. The bug showed up again last night during a trial deployment on 7 different baremetals.

Again eth0 and bond0 with 2 different IPs on the same network. Juju installation fails (node can't reach the Juju controller):

https://pastebin.canonical.com/159009/

But the problem seems to be MAAS-related. I'm trying to reconfigure only the broken node in MAAS (ethX and bonds).

Ante Karamatić (ivoks) wrote :

Marking this as Confirmed for MAAS because the node boots up with an improper network configuration before juju even kicks in to do its stuff.

Changed in maas:
status: Invalid → Confirmed
Ante Karamatić (ivoks) wrote :

Note: it might also be a cloud-init problem, I don't know. All I know is that once I set up an LACP bond on MAAS, the node provided by MAAS has an IP on both the bond and one of the bond's interfaces (eth0). With or without juju.
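A quick way to spot this symptom on a node is to look for more than one interface holding an address on the same subnet. The sketch below parses text in the one-line format printed by `ip -o -4 addr show`; the sample output, interface names and addresses are hypothetical, and the helper name is made up for illustration.

```python
import ipaddress

# Hypothetical output in the format of `ip -o -4 addr show`.
SAMPLE_OUTPUT = r"""
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0\       valid_lft forever preferred_lft forever
5: bond0    inet 10.0.0.6/24 brd 10.0.0.255 scope global bond0\       valid_lft forever preferred_lft forever
"""


def interfaces_sharing_a_subnet(ip_output):
    """Return {network: [interface, ...]} for every IPv4 network that has
    addresses on more than one interface."""
    by_network = {}
    for line in ip_output.splitlines():
        fields = line.split()
        # Expected layout: "<index>:", "<ifname>", "inet", "<addr>/<prefix>", ...
        if len(fields) < 4 or fields[2] != "inet":
            continue
        network = str(ipaddress.ip_interface(fields[3]).network)
        names = by_network.setdefault(network, [])
        if fields[1] not in names:
            names.append(fields[1])
    return {net: names for net, names in by_network.items() if len(names) > 1}


shared = interfaces_sharing_a_subnet(SAMPLE_OUTPUT)
for network, names in shared.items():
    print(network, names)
```

Two addressed interfaces on one subnet is not always an error, so a hit here is only a hint that a bond member or bridged device kept its own address.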

Dimiter Naydenov (dimitern) wrote :

The cause I expect is the same - sourcing /e/n/i.d/eth0.cfg (trusty) or /e/n/i.d/50-cloud-init.cfg (xenial).

Andrew McDermott (frobware) wrote :

Per comment in #19, see also: https://bugs.launchpad.net/maas/+bug/1588706

There is at least a working workaround:

late_commands:
  remove_eth0: ["curtin", "in-target", "--", 'rm', '/etc/network/interfaces.d/eth0.cfg']

put in /etc/maas/preseeds/curtin_userdata_custom

Ante Karamatić (ivoks) wrote :

After some discussion on Hangout between Andrew, Dimiter and myself, we realized that ifenslave in Trusty is compatible with neither kernel 3.13 nor the ifupdown version in Trusty.

So, this would mean that bonding in 14.04 is in general broken.

Bugs:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=742410
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=791906

Changed in ifenslave (Ubuntu):
importance: Undecided → High

The late_commands curtin line should also be set in /etc/maas/preseeds/curtin_userdata

Dimiter Naydenov (dimitern) wrote :

Closing this, as the LACP bonding issue on trusty is now tracked separately: https://bugs.launchpad.net/juju-core/+bug/1594855

Cheryl Jennings (cherylj) wrote :

Moving juju-core to "Won't Fix" as well, since later discussions show this is not a problem in juju.

Changed in juju-core:
status: Fix Released → Won't Fix
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ifenslave (Ubuntu):
status: New → Confirmed
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta9 → none
milestone: none → 2.0-beta9
Changed in juju-core:
importance: Undecided → High
status: New → Won't Fix
Ante Karamatić (ivoks) on 2017-09-27
tags: removed: cpec
Changed in maas:
status: Confirmed → Won't Fix
status: Won't Fix → Invalid