juju bootstrap fails to successfully configure the bridge juju-br0 when deploying with wily 4.2 kernel
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
juju-core | Fix Released | High | Andrew McDermott |
1.25 | Fix Released | High | Andrew McDermott |
Ubuntu | Invalid | High | Unassigned |
Wily | Invalid | High | Unassigned |
Bug Description
Maas: MAAS Version 1.8.0+bzr4001-
JuJu Version: 1.24.4-
User Space: Trusty:
HW: In-development ARM64 platform (Host) and HP Moonshot m400 (McDivitt) (Host1), also ARM64
Problem Description:
NOTE: The problem described below is also reproducible on a shipping ARM64 system (HP Moonshot McDivitt - m400) with Trusty userspace + 4.2 kernel from Wily.
When juju bootstrap is run against the state server on the in-development ARM64 hardware platform, it creates a bridge device bound to the PXE NIC (eth1), as expected. eth1 should then release its IP address, and the bridge should take over and route all traffic. This works reliably with a Trusty cloud image and the appropriate Trusty kernel.
In this case we are enabling new hardware, so I specifically need to use a hacked cloud root-tgz (modified to add the Wily 4.2 kernel to a Trusty userspace). I have done all of that correctly and am able to land the image on its assigned hardware using MAAS 1.8.
$ uname -a
Linux ms10-39-host 4.2.0-10-generic #11-Ubuntu SMP Sun Sep 13 11:26:21 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty
Now, when I use juju to bootstrap the image onto the assigned hardware, there appears to be a problem with the juju bridge and the default PXE NIC. The assigned interface does not release its IPv4 address and hand it over to the bridge, almost as if "sudo ifdown eth0" never runs successfully.
We constantly see the message "received packet on eth1 with own address as source address" in syslog
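To put a number on "constantly", the matching kernel messages can be counted. This is only a sketch: it assumes the log is at /var/log/syslog unless another path is given, and count_own_address_warnings is a hypothetical helper name, not something shipped with juju or MAAS.

```shell
# Count bridge warnings of the form
#   "<bridge>: received packet on <iface> with own address as source address"
# in a syslog file. Defaults to /var/log/syslog (assumed location).
count_own_address_warnings() {
    grep -c 'received packet on .* with own address as source address' \
        "${1:-/var/log/syslog}"
}

# Usage:
#   count_own_address_warnings                  # default log
#   count_own_address_warnings /path/to/syslog  # explicit log file
```

A count that keeps climbing between runs would suggest the bridge and its port are still both answering for the same address.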
$ ifconfig
eth0 Link encap:Ethernet HWaddr fc:15:b4:21:00:c2
inet addr:10.229.65.139 Bcast:10.
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2210 errors:0 dropped:0 overruns:0 frame:0
TX packets:1627 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:208450 (208.4 KB) TX bytes:297812 (297.8 KB)
juju-br0 Link encap:Ethernet HWaddr fc:15:b4:21:00:c2
inet addr:10.229.65.139 Bcast:10.
inet6 addr: fe80::fe15:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2212 errors:0 dropped:0 overruns:0 frame:0
TX packets:1478 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:177722 (177.7 KB) TX bytes:288314 (288.3 KB)
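The symptom in the output above is that eth0 and juju-br0 both hold the same IPv4 address. A rough way to detect that automatically is sketched below. It assumes the one-line-per-address output of "ip -o -4 addr show" is piped in on stdin; dup_ipv4_addrs is a hypothetical helper name.

```shell
# Read `ip -o -4 addr show` output on stdin and print every IPv4 address
# that is configured on more than one interface, as:
#   <address> <first interface> <second interface>
dup_ipv4_addrs() {
    awk '$3 == "inet" {
        split($4, a, "/")            # strip the /prefix length
        if (a[1] in seen)
            print a[1], seen[a[1]], $2
        else
            seen[a[1]] = $2
    }'
}

# Usage:
#   ip -o -4 addr show | dup_ipv4_addrs
```

On a correctly bridged host this should print nothing, since only the bridge is expected to carry the address.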
I also noticed that /etc/network/
$ cat /etc/network/
auto lo
iface eth1 inet dhcp
# Primary interface (defining the default route)
iface eth0 inet manual
# Bridge to use for LXC/KVM containers
auto juju-br0
iface juju-br0 inet dhcp
bridge_ports eth0
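For a bridge like this to own the address, each interface named in bridge_ports should itself be "inet manual" rather than "inet dhcp". The sketch below checks an interfaces-style file for that. It is a minimal check under stated assumptions: it handles only explicitly named ports (not "none", "all", or patterns), looks only at "inet" stanzas, and check_bridge_ports is a hypothetical helper name.

```shell
# Print every interface listed under bridge_ports whose "iface ... inet"
# stanza is not "manual". Exit status is 1 if any such port is found.
check_bridge_ports() {
    awk '
        $1 == "iface" && $3 == "inet" { method[$2] = $4 }
        $1 == "bridge_ports"          { for (i = 2; i <= NF; i++) ports[$i] = 1 }
        END {
            for (p in ports)
                if (method[p] != "manual") { print p; bad = 1 }
            exit bad
        }
    ' "$1"
}

# Usage (pass the path of the interfaces file to inspect):
#   check_bridge_ports "$interfaces_file" && echo "bridge ports look sane"
```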
-------
Here is the syslog output from the two state server attempts. The first set of logs, from 'host', comes from a Trusty userspace with the Wily kernel and shows the failure.
The second snippet, from 'host1', comes from a Trusty userspace with the Trusty kernel, which eventually completes the bootstrap as expected.
Aug 24 18:15:14 host acpid: 1 rule loaded
Aug 24 18:15:14 host acpid: waiting for events: event logging is off
Aug 24 18:15:15 host kernel: [ 46.174096] init: plymouth-
Aug 24 18:15:17 host ntpdate[1216]: adjust time server 91.189.89.199 offset 0.000248 sec
Aug 24 18:15:36 host dhclient: receive_packet failed on eth1: Network is down
Aug 24 18:15:36 host kernel: [ 66.764788] bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this.
Aug 24 18:15:36 host kernel: [ 66.772004] device eth1 entered promiscuous mode
Aug 24 18:15:37 host kernel: [ 68.144483] juju-br0: port 1(eth1) entered forwarding state
Aug 24 18:15:37 host kernel: [ 68.144504] juju-br0: port 1(eth1) entered forwarding state
Aug 24 18:15:37 host kernel: [ 68.160693] juju-br0: received packet on eth1 with own address as source address
Aug 24 18:15:37 host dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3 (xid=0x40e77812)
Aug 24 18:15:37 host kernel: [ 68.189099] juju-br0: received packet on eth1 with own address as source address
Aug 24 18:15:37 host dhclient: DHCPREQUEST of 10.110.24.114 on eth1 to 255.255.255.255 port 67 (xid=0x1278e740)
Aug 24 18:15:37 host dhclient: DHCPOFFER of 10.110.24.114 from 10.110.24.210
Aug 24 18:15:37 host kernel: [ 68.189891] juju-br0: received packet on eth1 with own address as source address
Aug 24 18:15:37 host dhclient: DHCPACK of 10.110.24.114 from 10.110.24.210
Aug 24 18:15:37 host dhclient: bound to 10.110.24.114 -- renewal in 298 seconds.
Aug 24 18:15:37 host kernel: [ 68.390614] thunder-nicvf 0002:01:00.2 eth1: eth1: Link is Up 10000 Mbps Full duplex
Aug 24 18:15:37 host dhclient: Internet Systems Consortium DHCP Client 4.2.4
Aug 24 18:15:37 host dhclient: Copyright 2004-2012 Internet Systems Consortium.
Aug 24 18:15:37 host dhclient: All rights reserved.
-------
Below is the output from a successful bootstrap using a Trusty userspace and Trusty kernel:
Aug 25 19:02:59 ms10-33-host1 acpid: 1 rule loaded
Aug 25 19:02:59 ms10-33-host1 acpid: waiting for events: event logging is off
Aug 25 19:02:59 ms10-33-host1 cron[1298]: (CRON) INFO (Running @reboot jobs)
Aug 25 19:02:59 ms10-33-host1 iscsid: iSCSI daemon with pid=1196 started!
Aug 25 19:03:00 ms10-33-host1 kernel: [ 34.028770] init: plymouth-
Aug 25 19:03:07 ms10-33-host1 ntpdate[1392]: adjust time server 91.189.89.199 offset 0.000016 sec
Aug 25 19:03:07 ms10-33-host1 kernel: [ 41.596548] mlx4_en: eth0: Close port called
Aug 25 19:03:09 ms10-33-host1 dhclient: receive_packet failed on eth0: Network is down
Aug 25 19:03:09 ms10-33-host1 kernel: [ 43.114195] mlx4_en: eth0: Link Down
Aug 25 19:03:09 ms10-33-host1 kernel: [ 43.135229] Bridge firewalling registered
Aug 25 19:03:09 ms10-33-host1 kernel: [ 43.139025] device eth0 entered promiscuous mode
Aug 25 19:03:09 ms10-33-host1 kernel: [ 43.140380] mlx4_en: eth0: frag:0 - size:1526 prefix:0 align:2 stride:1536
Aug 25 19:03:09 ms10-33-host1 kernel: [ 43.289820] IPv6: ADDRCONF(
Aug 25 19:03:09 ms10-33-host1 kernel: [ 43.291284] IPv6: ADDRCONF(
Aug 25 19:03:10 ms10-33-host1 kernel: [ 44.487804] mlx4_en: eth0: Link Up
Aug 25 19:03:10 ms10-33-host1 kernel: [ 44.487887] IPv6: ADDRCONF(
Aug 25 19:03:10 ms10-33-host1 kernel: [ 44.488297] juju-br0: port 1(eth0) entered forwarding state
Aug 25 19:03:10 ms10-33-host1 kernel: [ 44.488305] juju-br0: port 1(eth0) entered forwarding state
Aug 25 19:03:10 ms10-33-host1 kernel: [ 44.488321] IPv6: ADDRCONF(
Aug 25 19:03:10 ms10-33-host1 dhclient: Internet Systems Consortium DHCP Client 4.2.4
Aug 25 19:03:10 ms10-33-host1 dhclient: Copyright 2004-2012 Internet Systems Consortium.
Aug 25 19:03:10 ms10-33-host1 dhclient: All rights reserved.
Aug 25 19:03:10 ms10-33-host1 dhclient: For info, please visit https:/
Aug 25 19:03:10 ms10-33-host1 dhclient:
Aug 25 19:03:11 ms10-33-host1 dhclient: Listening on LPF/juju-
Aug 25 19:03:11 ms10-33-host1 dhclient: Sending on LPF/juju-
-------
Once this problem occurs, there are two workarounds:
1.) Restart the host, which then boots the system with its correct network config as outlined in /etc/network/
2.) Manually ifdown / ifup eth1. This is easier than workaround 1.
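Workaround 2 can be wrapped in a small helper, sketched below. bounce_iface and the RUN variable are hypothetical names introduced here for illustration; RUN defaults to sudo, and setting RUN=echo prints the commands instead of running them. Taking down the interface that carries your ssh session will drop the connection, so this is best run from a console.

```shell
# Take an interface down and bring it back up (workaround 2).
# RUN is a hypothetical override for the command prefix: it defaults to
# sudo, and RUN=echo gives a dry run that only prints the commands.
bounce_iface() {
    iface="${1:?usage: bounce_iface IFACE}"
    ${RUN:-sudo} ifdown "$iface" && ${RUN:-sudo} ifup "$iface"
}

# Usage:
#   RUN=echo bounce_iface eth1   # dry run
#   bounce_iface eth1            # really bounce eth1
```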
After restarting the host at least once, the route tables appear to fix themselves and I can ssh into the host from a system outside of the 10.229/16 net (if vpn allows)
I can provide hardware access for anyone who requests it.
Changed in juju-core: | |
status: | New → Confirmed |
Changed in juju-core: | |
status: | Confirmed → Triaged |
importance: | Undecided → High |
milestone: | none → 1.25-beta2 |
tags: | added: network |
Changed in juju-core: | |
assignee: | nobody → Andrew McDermott (frobware) |
tags: | added: kernel-da-key |
affects: | juju-core → linux |
Changed in linux: | |
assignee: | Andrew McDermott (frobware) → Joseph Salisbury (jsalisbury) |
milestone: | 1.25-beta2 → none |
affects: | linux → ubuntu |
Changed in ubuntu: | |
status: | Triaged → In Progress |
Changed in ubuntu: | |
status: | In Progress → Invalid |
Changed in juju-core: | |
milestone: | none → 1.24.8 |
Changed in juju-core: | |
milestone: | 1.24.8 → 1.25.1 |
Changed in juju-core: | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in juju-core: | |
milestone: | 1.25.1 → 1.26-alpha1 |
Changed in juju-core: | |
status: | Triaged → Fix Committed |
Changed in juju-core: | |
milestone: | 1.26-alpha1 → 1.26-alpha2 |
Changed in juju-core: | |
status: | Fix Committed → Fix Released |
Is this issue related to arm64 only; can juju bootstrap on x86 wily today?