Relation fails as units/machines are on different subnet on a multi NIC setup, juju 2.0 beta11

Bug #1603473 reported by Amarendra Meher
This bug affects 4 people
Affects: Canonical Juju
    Status: Invalid, Importance: High, Assigned to: Unassigned
Affects: Canonical Juju 2.0
    Status: Won't Fix, Importance: Undecided, Assigned to: Unassigned

Bug Description

On a multi-NIC setup with Juju 2.0 beta11 and MAAS 1.9, we notice that some machines' preferred private addresses are not the IP addresses provisioned and provided by MAAS, but those of a second NIC where we set the IP manually for SSH remote access. The machines and the containers have a valid IP on both subnets, but when the relation is established between nova-compute and nova-cloud-controller, it performs a reverse DNS lookup and fails with this error: http://paste.ubuntu.com/19349672/
With Juju 2.0 beta5, this did not happen.

Setup details:
As the charm is still being worked on, I have attached a yaml file that deploys only the charms of the OpenStack bundle; it is attached to this bug.
Note: we are not using the neutron-gateway and a few other OpenStack nodes here (our charm will take care of those).

With Juju 2.0 beta11:
MAAS (Running in a VM in Virtual machine manager)
                Version 1.9
                eth0: 10.10.11.152(Unmanaged)
                eth1: 10.9.1.10(Managed)

JUJU (in a VM in Virtual machine manager)
                Version 2.0 beta11
                eth0: 10.10.11.151
                eth1: 10.9.1.151
                DNS: 10.9.1.10
  .local/share/juju/controller.yaml file content:
      cplane-controller:
          unresolved-api-endpoints: ['10.9.1.160:17070', '10.10.11.140:17070']
          uuid: 1a5c7aba-18ad-457b-8bb6-7aa6feb6e1cf
          api-endpoints: ['10.9.1.160:17070', '10.10.11.140:17070']

Juju status: in tabular format: http://paste.ubuntu.com/19501334/

Juju status in yaml format: http://paste.ubuntu.com/19501233/

Here machine 0 is the bootstrap node, whose DNS name is an IP from the managed (DNS and DHCP) network, i.e. the 10.9.1.x range.
But for the other nodes (1, 2 and 3), the DNS names are IPs from the unmanaged network, i.e. the 10.10.11.x range.

With Juju 2.0 beta5:
MAAS (Running in a VM in Virtual machine manager)
                Version 1.9
                eth0: 192.168.7.101(Unmanaged)
                eth1: 10.14.0.1(MANAGED)
JUJU (in a VM in Virtual machine manager)
                Version 2.0 beta5
                eth0: 192.168.7.103
                eth1: 10.14.0.151
                DNS: 10.14.0.1

Content of .local/share/juju/controller.yaml
controllers:
  local.cplane-controller:
    unresolved-api-endpoints: ['192.168.7.113:17070', '10.14.0.100:17070']
    uuid: eab2b453-e719-40e4-8453-76441986207b
    api-endpoints: ['192.168.7.113:17070', '10.14.0.100:17070']

Juju status:
 [Machines]
ID STATE DNS INS-ID SERIES AZ
0 started 10.14.0.100 /MAAS/api/1.0/nodes/node-22777ed6-0de4-11e6-8c34-000c297e53d6/ trusty default
1 pending 10.14.0.102 /MAAS/api/1.0/nodes/node-248caeb2-0de4-11e6-94c7-000c297e53d6/ trusty default
2 pending 10.14.0.103 /MAAS/api/1.0/nodes/node-2523d274-0de4-11e6-8747-000c297e53d6/ trusty default
3 pending 10.14.0.101 /MAAS/api/1.0/nodes/node-22fd8fda-0de4-11e6-b551-000c297e53d6/ trusty default

Additional info:
We tried to use --bind to deploy the charm to a specific network space, but it doesn't work.
We defined 2 spaces in MAAS:
unused: 10.10.11.x series
default: 10.9.1.x series

These spaces can be listed with the juju command "juju list-spaces" on the juju controller node, which lists all the spaces in our setup properly. We ran 'juju deploy juju-gui --bind Default<please confirm>' but the preferred private-address is still from the 10.10.11.x range. We used juju run --unit juju-gui/0 "unit-get private-address" to check.

Tags: 11 2.0 beta juju rteam
Revision history for this message
Amarendra Meher (amarendra-meher) wrote :
Changed in juju-core:
importance: Undecided → High
Revision history for this message
Andrew McDermott (frobware) wrote :

Do we have a screenshot of the MAAS 1.9 network setup for these nodes? I am interested to see the order of the interfaces, what is configured (static, dhcp, et al) and which is the PXE interface. Given that I will look at reproducing. Thanks.

Changed in juju-core:
status: New → Incomplete
Revision history for this message
Andrew McDermott (frobware) wrote :

It's also possible that this was working accidentally before this PR landed:

  http://reviews.vapour.ws/r/4903/

Revision history for this message
Andrew McDermott (frobware) wrote :

Please could you also elaborate on:

 "We tried to use --bind to deploy the charm to a specific network space, but it doesn't work."

and the output of:

  $ juju list-spaces

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

juju list-spaces
spaces:
  space-0:
    10.9.1.0/24:
      type: ipv4
      provider-id: "2"
      status: in-use
      zones:
      - default
    10.10.11.0/24:
      type: ipv4
      provider-id: "1"
      status: in-use
      zones:
      - default
    192.168.220.0/24:
      type: ipv4
      provider-id: "3"
      status: in-use
      zones:
      - default

Revision history for this message
John A Meinel (jameinel) wrote :

To elaborate on what details we are looking for about: "we tried to use --bind".

From what I saw, you seem to be using a Bundle to deploy many applications. We don't currently support '--bind' syntax for bundles, only for single applications.
We either need you to edit the bundle to add a "bindings" section for the various applications, or you could try just deploying one of the applications manually and see if it comes up configured as you would expect.
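For reference, a bundle-level bindings section looks roughly like this (a sketch only; the charm names, unit counts, and the space name 'default' are assumptions that must match your model, and 2.0-era bundles used a top-level "services:" key where later formats use "applications:"):

```yaml
# Hypothetical bundle excerpt: bind every endpoint of each
# application to the 'default' space instead of using --bind.
services:                # "applications:" in later bundle formats
  nova-compute:
    charm: cs:nova-compute
    num_units: 1
    bindings:
      "": default        # "" binds all endpoints not listed explicitly
  nova-cloud-controller:
    charm: cs:nova-cloud-controller
    num_units: 1
    bindings:
      "": default
```

Deploying such a bundle should have the same effect as `--bind "default"` passed per application.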

Revision history for this message
John A Meinel (jameinel) wrote :

Looking at the very first pastebin:
2016-07-14 10:10:00 INFO worker.uniter.jujuc server.go:173 running hook tool "relation-get" ["--format=json" "hostname"]
2016-07-14 10:10:00 INFO cloud-compute-relation-changed Traceback (most recent call last):
...
2016-07-14 10:10:00 INFO cloud-compute-relation-changed File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 910, in query
2016-07-14 10:10:00 INFO cloud-compute-relation-changed raise NXDOMAIN
2016-07-14 10:10:00 INFO cloud-compute-relation-changed dns.resolver.NXDOMAIN

Hints that it might even be something else that is failing. Something about how we are telling MAAS about the multiple IP addresses for a container might cause MAAS to get confused and not able to return the normal hostname for one of the secondary addresses.

Revision history for this message
John A Meinel (jameinel) wrote :

We have a setup here with MAAS 1.9 with hosts that have multiple network cards. When we do that, we end up with MAAS providing 2 IP addresses for the machine (as expected), but DNS lookup does not return both IP addresses, and reverse DNS lookup fails for the second IP address.

We haven't confirmed if the behavior is the same on MAAS 2, but this bug comment:
https://bugs.launchpad.net/maas/+bug/1599223/comments/5

Seems to say that MAAS 2 will use what it considers the "primary" IP as "host.maas" and then create additional records for "nic.host.maas" for the rest of the network interfaces.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi John,

Tried installing single application example juju-gui

juju status
MODEL CONTROLLER CLOUD VERSION
controller cplane-controller cplane 2.0-beta11

APP STATUS EXPOSED ORIGIN CHARM REV OS
juju-gui unknown false jujucharms juju-gui 130 ubuntu

UNIT WORKLOAD AGENT MACHINE PORTS PUBLIC-ADDRESS MESSAGE
juju-gui/2 unknown idle 3 80/tcp,443/tcp 10.10.11.141

MACHINE STATE DNS INS-ID SERIES AZ
0 started 10.9.1.160 /MAAS/api/1.0/nodes/node-278facc0-4cb6-11e6-b6fa-5254004a236f/ trusty default
3 started 10.10.11.141 /MAAS/api/1.0/nodes/node-b43a591a-4cb4-11e6-aba7-5254004a236f/ trusty default

Even via juju, address fetching always uses fabric=maas-external and space=unused, whereas it is supposed to fetch from fabric=maas-management and space=default.

juju run --unit juju-gui/2 'unit-get private-address'
10.10.11.141

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi guys, I am still facing this issue, with no resolution yet.

Net info for one of the deployed nodes is shown below.

One strange thing: both eth0 and br-eth0 have the same IP, and it is not vanishing from the interface even though the bridge is also active.

ifconfig
br-eth0 Link encap:Ethernet HWaddr 52:54:00:46:62:07
          inet addr:10.9.1.161 Bcast:10.9.1.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe46:6207/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:33215 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1811 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:46808059 (46.8 MB) TX bytes:148181 (148.1 KB)

br-eth1 Link encap:Ethernet HWaddr 52:54:00:7b:fb:24
          inet addr:10.10.11.141 Bcast:10.10.11.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe7b:fb24/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:187342 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10081 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:183660270 (183.6 MB) TX bytes:1556527 (1.5 MB)

eth0 Link encap:Ethernet HWaddr 52:54:00:46:62:07
          inet addr:10.9.1.161 Bcast:10.9.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:34705 errors:0 dropped:6 overruns:0 frame:0
          TX packets:2569 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:62410780 (62.4 MB) TX bytes:218908 (218.9 KB)

eth1 Link encap:Ethernet HWaddr 52:54:00:7b:fb:24
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:188811 errors:0 dropped:49 overruns:0 frame:0
          TX packets:10091 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:187626400 (187.6 MB) TX bytes:1557343 (1.5 MB)

brctl show
bridge name bridge id STP enabled interfaces
br-eth0 8000.525400466207 no eth0
br-eth1 8000.5254007bfb24 no eth1

The contents of the interfaces file (/etc/network/interfaces):
auto lo
iface lo inet loopback
    dns-nameservers 10.9.1.10
    dns-search maas

iface eth0 inet manual

auto br-eth0
iface br-eth0 inet static
    gateway 10.9.1.10
    address 10.9.1.161/24
    mtu 1500
    bridge_ports eth0

iface eth1 inet manual

auto br-eth1
iface br-eth1 inet static
    address 10.10.11.141/24
    mtu 1500
    bridge_ports eth1

Revision history for this message
Andrew McDermott (frobware) wrote :

Was there a 'source /etc/network/interfaces.d/*.cfg' stanza at the bottom of /etc/network/interfaces? If so, could you attach any files that are in the interfaces.d directory? Thanks.

Revision history for this message
Andrew McDermott (frobware) wrote :

Please could you run the following on the machine (from comment #10) and share the output:

 $ ps -ef | grep dhclient

Revision history for this message
John A Meinel (jameinel) wrote :

How did you install juju-gui? Just plain "juju deploy juju-gui"? Can you try "juju deploy juju-gui --bind web=default" and then give the output of:

  juju run --unit juju-gui/0 'network-get --primary-address web'

That should be the same value you would get if the GUI charm was running "unit-get private-address" inside of a relation context. (Since it isn't actually related to anything yet, we can't just try unit-get private-address in a context.)

Charms should be updated to use 'network-get' where possible, but in the short term 'unit-get private-address' should be able to give the correct answer once we know what endpoint the Charm is asking about.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

ubuntu@juju-gui:~$ ps -ef | grep dhclient
root 5281 1 0 08:43 ? 00:00:00 dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

The contents of interfaces.d are as follows:--

ubuntu@juju-gui:~$ ls /etc/network/interfaces.d/
eth0.cfg
ubuntu@juju-gui:~$ cat /etc/network/interfaces.d/eth0.cfg
# The primary network interface
auto eth0
iface eth0 inet dhcp
ubuntu@juju-gui:~$

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi john,

I redeployed the application with the options you suggested; the various outputs are available at the link below.

http://paste.ubuntu.com/20042147/

But the DNS address for this machine is still from the external fabric and the unused space.

juju list-spaces
spaces:
  default:
    10.9.1.0/24:
      type: ipv4
      provider-id: "2"
      status: in-use
      zones:
      - default
    192.168.220.0/24:
      type: ipv4
      provider-id: "3"
      status: in-use
      zones:
      - default
  unused:
    10.10.11.0/24:
      type: ipv4
      provider-id: "1"
      status: in-use
      zones:
      - default

Juju still returns the following for private/public-address:

juju run --unit juju-gui/1 'unit-get private-address'
10.10.11.141
juju run --unit juju-gui/1 'unit-get public-address'
10.10.11.141

Revision history for this message
Andrew McDermott (frobware) wrote :
Revision history for this message
Andrew McDermott (frobware) wrote :

I plan to change Juju to not source the additional eth0.cfg's and that is captured as:

  https://bugs.launchpad.net/juju-core/+bug/1604482

Revision history for this message
Andrew McDermott (frobware) wrote :

Please see:

  https://bugs.launchpad.net/maas/+bug/1590689

for the workarounds listed in comments:

  https://bugs.launchpad.net/maas/+bug/1590689/comments/13
  https://bugs.launchpad.net/maas/+bug/1590689/comments/16
  https://bugs.launchpad.net/maas/+bug/1590689/comments/18

@rajesh - would it be possible for you to try these workarounds until the fix for bug #1604482 lands?

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi Andrew,

Using the above workaround, the IP duplication is resolved, but the issue of wrong IP allocation for the nodes is still there.

ifconfig
br-eth0 Link encap:Ethernet HWaddr 52:54:00:46:62:07
          inet addr:10.9.1.161 Bcast:10.9.1.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe46:6207/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:8259 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4217 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:46366996 (46.3 MB) TX bytes:742318 (742.3 KB)

br-eth1 Link encap:Ethernet HWaddr 52:54:00:7b:fb:24
          inet addr:10.10.11.141 Bcast:10.10.11.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe7b:fb24/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:10638 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4685 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:34751479 (34.7 MB) TX bytes:428812 (428.8 KB)

eth0 Link encap:Ethernet HWaddr 52:54:00:46:62:07
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:9739 errors:0 dropped:29 overruns:0 frame:0
          TX packets:5015 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:62080374 (62.0 MB) TX bytes:814704 (814.7 KB)

eth1 Link encap:Ethernet HWaddr 52:54:00:7b:fb:24
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:12064 errors:0 dropped:49 overruns:0 frame:0
          TX packets:4694 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:36175696 (36.1 MB) TX bytes:429526 (429.5 KB)

juju status
MODEL CONTROLLER CLOUD VERSION
controller cplane-controller cplane 2.0-beta11

APP STATUS EXPOSED ORIGIN CHARM REV OS
juju-gui maintenance false jujucharms juju-gui 130 ubuntu

UNIT WORKLOAD AGENT MACHINE PORTS PUBLIC-ADDRESS MESSAGE
juju-gui/0 maintenance executing 1 10.10.11.141 (install) installing charm software

MACHINE STATE DNS INS-ID SERIES AZ
0 started 10.9.1.160 /MAAS/api/1.0/nodes/node-278facc0-4cb6-11e6-b6fa-5254004a236f/ trusty default
1 started 10.10.11.141 /MAAS/api/1.0/nodes/node-b43a591a-4cb4-11e6-aba7-5254004a236f/ trusty default

juju run --unit juju-gui/0 'network-get --primary-address web'
10.9.1.161
juju run --unit juju-gui/0 'unit-get private-address'
10.10.11.141
juju run --unit juju-gui/0 'unit-get public-address'
10.10.11.141

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

The node-group interfaces at MAAS are shown here for reference:

http://paste.ubuntu.com/20164460/

Revision history for this message
john (g-john-p) wrote :

Hi Andrew and John.

I can provide you with VPN access to our lab/system that these servers are running on if this will help you diagnose the problem easier. I think it may be more efficient if you can debug this and try things in real-time. Please let me know and I will coordinate with you.

thanks,
john

Revision history for this message
john (g-john-p) wrote :

Here is the network diagram of the environment.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

Just a note that I'm working on reproducing the bug. I'll have some results to post soon, but I haven't yet been able to reproduce the issue exactly.

I can make Juju use the unmanaged subnet as its public-addresses by using a range of a lower sort order than the managed/DHCP/PXE range, but this does not prevent relations.

I will update soon.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

I can reproduce something very similar by using a statically assigned IP address of a lower sort order than the PXE/DHCP address, if the statically assigned address is also assigned by MAAS (therefore seen as a provider address).

To work around this, I added a MAAS preseed that puts an interfaces.d extension config in place to assign the static address, leaving Juju unaware of the address. Here is an example: http://pastebin.ubuntu.com/20468944/

In this example, my DHCP range (which I want Juju to use as public-address) is 192.168.124.0/24 (.2 is the MAAS server) and my static (your 'SSH'?) range is 192.168.123.0/24. I do not configure MAAS nodes with static addresses in the 123.x range; I add them in the /etc/network/interfaces.d/unmanaged.cfg file put in place by the curtin preseed.

Do you mind testing to see if this helps?

Another option is to use different ranges, ensuring the correct range for Juju to use is the lowest in sort order, but this may be more intrusive to your testing than the preseed option.
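A per-node unmanaged.cfg of the kind described above would be an ordinary ifupdown stanza along these lines (a sketch; the interface name and the address are assumptions and must match the node in question):

```
# /etc/network/interfaces.d/unmanaged.cfg (hypothetical example)
# Static address on the unmanaged 'SSH' subnet, configured outside
# MAAS so Juju never sees it as a provider address.
auto eth1
iface eth1 inet static
    address 192.168.123.50
    netmask 255.255.255.0
```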

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

@Mick Gregg

I am trying your suggestion to add a MAAS preseed, but I would like to point out that I am already using the lower-sort-order network for the PXE/DHCP address, i.e. 10.9.1.0/24, with 10.10.11.0/24 as the unmanaged network, as per #16.

https://bugs.launchpad.net/juju-core/+bug/1603473/comments/16

Surely 10.9.1.0/24 is of lower sort order than 10.10.11.0/24, so it should be assigned to the various nodes regardless of whether addresses are assigned statically in MAAS or via preseed.

I have another setup, with 10.14.0.0/24 managed and 192.168.7.0/24 unmanaged on Juju 2.0 beta5, where IP assignment works exactly as desired for the various nodes.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

@Mick Gregg

As I understand from #25, we do not give a static IP address to eth1 in MAAS; it is taken care of by unmanaged.cfg via the preseed, where you create the bridge interface br-eth0 with eth1.

Please correct me if that is not what you mean.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

The MAAS interfaces in my environment are shown below:

http://paste.ubuntu.com/20849310/

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

In my suggested test you would require the MAAS server itself to be configured with IP addresses, probably by a static means as you have in #28. It is the MAAS nodes (the machines deployed by your MAAS server) that would use the preseeds and interfaces files to get the unmanaged.cfg files in place.

Each node needs its own preseed file (named as per my example) and its own unmanaged.cfg file (again, as per my example). I hope that helps.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

I expect string, rather than numeric, sorting of IP addresses (10.10.x.x sorts lower than 10.9.x.x). Let me confirm that.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

I tried as shown below:

http://paste.ubuntu.com/20858760/

But it is not taking effect, even after rebooting the MAAS node. The 2nd interface, i.e. the br-eth0 bridge of eth1, is not coming up.

Please guide me on how to proceed, and on what the interfaces file should contain. Should I delete all files from /e/m/p/ other than curtin_userdata?

Right now I have the files listed below in that directory:

http://paste.ubuntu.com/20859130/

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

In my example, 'curtin_userdata_amd64_generic_trusty_juju0' is a preseed file for a machine named juju0. You will need something similar for each of your machines, using their hostnames in the file names.

The interfaces file that wget fetches (in the preseed) is also specific to each machine. Each machine needs its own unmanaged.cfg file (served from the MAAS server at /var/www/html/<machine_name>.interfaces_d).

Your per-machine unmanaged.cfg file should contain the interface configuration for the IP address you do not want Juju to know about (your 'SSH'/unmanaged interface?).

Mick Gregg (macgreagoir)
Changed in juju-core:
assignee: nobody → Mick Gregg (macgreagoir)
Revision history for this message
Mick Gregg (macgreagoir) wrote :

@g-john-p

Yes, please, if you can give me access to your system, I'm keen to take a look.

I'll contact you by email.

Revision history for this message
john (g-john-p) wrote : RE: [Bug 1603473] Re: Relation fails as units/machines are on different subnet on a multi NIC setup, juju 2.0 beta11

Hey Mick, let's have a call. Please contact me on my cell and I can set you up on our VPN, but let's go over it one-on-one.
Thanks,
John


Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote : Re: Relation fails as units/machines are on different subnet on a multi NIC setup, juju 2.0 beta11

@Mick Gregg,

Sure, we will provide you access; please contact John over email.

Secondly, your suggested workaround in #32 worked fine, but as I am deploying OpenStack, which has many applications spread across 2-3 hosts and various LXD containers, with this approach I would need to define many <curtin_userdata_amd64_generic_trusty_<hostname>> files and their respective interface_d files in the /v/w/h/ directory. This looks very painful and creates a big manual dependency.

So can we instead try the 2nd approach, the lower IP address sort order? Please help me understand this concept of IP address sorting.

How come, in your case, 192.168.123.0/24 is of lower sort order than 192.168.124.0/24, and similarly, in my case, 10.10.11.0/24 is of lower sort order than 10.9.1.0/24?

Once this concept is clear, we will make it a rule of thumb in our upcoming deployments and developments.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

If IP addresses are sorted as strings, 10.1x.x.x will be alphabetically lower than 10.9.x.x (1 before 9).
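A quick illustration of the difference (my own example, not from the bug data), using GNU sort: lexical order puts 10.10.x.x before 10.9.x.x, while version (numeric-component) order does the opposite:

```shell
# Lexical (string) sort: at the fourth character '1' < '9',
# so 10.10.11.0 sorts before 10.9.1.0.
printf '10.9.1.0\n10.10.11.0\n' | sort
# -> 10.10.11.0
#    10.9.1.0

# Version sort (-V) compares numeric components, giving the
# intuitive order: 9 < 10.
printf '10.9.1.0\n10.10.11.0\n' | sort -V
# -> 10.9.1.0
#    10.10.11.0
```

The behaviour described in this bug corresponds to the first, lexical ordering.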

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

@Mick

Got it!

One more clarification about the workaround mentioned in #32: for LXD containers, how do I write the <curtin_userdata_amd64_generic_trusty_<hostname>> and interface_d files? An example would help,

since LXD containers are assigned hostnames randomly, as shown below:

http://paste.ubuntu.com/20987920/

Please advise.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

I'm afraid I don't have a very good (or simple) workaround for the containers. I think we are seeing that the sort order of the addresses is causing your problems, so how happy are you (for now) to use different IP address ranges in your environment, with the addresses Juju requires in the lower-sorted range?

Even using `--bind managed` (for example, with the unmanaged subnet in its own space), I'm seeing an unmanaged address as the machine address, and with bug 1604482 the containers can miss the address they need.

(We are collecting good data here for a bug around all of this.)

Revision history for this message
Mick Gregg (macgreagoir) wrote :

Just noting that I've talked to John and have access to the cplane system.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

I want to summarise the results of my testing and update what I think the issues are, please.

I believe the main issue I'm seeing is the troublesome /etc/network/interfaces.d/eth0.cfg left by cloud-init on both metal and containers. This can break networking, leaving units unable to communicate; for example, the default gateway and the route to the nameserver can be broken. (This issue was mentioned earlier in this bug thread.)

Juju using unexpected ('unmanaged') IP addresses as private-address for units (due to IP address sorting) may be exacerbating this problem (and be a confusion) but, on its own, may not necessarily break as long as the units can communicate to API endpoints on whichever subnets their private-addresses are on.

I have tested with a 'managed' and an 'unmanaged' space for my PXE/DHCP and statically assigned (in MAAS) subnets, respectively. The unmanaged subnet sorts lower (as per your environment) to have Juju use it in preference. Also like your environment, eth1 has the managed subnet, not eth0.

Juju ends up deploying metal machines with their unmanaged address as private-address, while containers get their managed addresses. Applications are deployed with `--bind managed` to have relations use the managed space's subnet, whatever the private-address of the units.

This configuration will (by default) fail with broken container networking, where default gateway and route to nameserver, as well as the unmanaged address, are missing as a result of eth0.cfg trying to use DHCP.

Nothing fails specifically as a result of the container private addresses and the metal private addresses being in different subnets, as both cases can still reach the api server. (The containers just don't have the static unmanaged address, which is removed by the eth0.cfg file replacing the config with dhcp).

To work-around networking, to prove that eth0.cfg may be the main issue, I have treated metal and containers differently. For metal, I have added a late_commands line to
/etc/maas/preseeds/curtin_userdata to remove the eth0.cfg file. (As we did per machine in http://pastebin.ubuntu.com/20468944/ line 10.) For containers, I added an install hook to a fork of the 'ubuntu' charm to remove the file and reload the interfaces. Then, for any applications to be deployed to containers, I first deployed an 'ubuntu' application unit `--to lxd:$some_machine`, before deploying an application --to a machine with an 'ubuntu' unit. This essentially uses a modified 'ubuntu' charm to fix networking before deploying the required application.

With the work-arounds to fix eth0 networking (and the --bind space used in application deployments) I was able to see successful relations added, despite addresses from different subnets used as units' private-address.

When Juju machines were deployed with PXE boot and DHCP on eth0, and the DHCP address range of a lower sort-order than the static range, this behaviour was hidden and I was able to deploy without the work-arounds.

I believe bug 1604482 (to remove eth0.cfg and reload interfaces) should resolve this bug, and we are working towards this fix.
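For metal, the late_commands addition described above can be sketched as a curtin preseed fragment like this (hypothetical; the command label zz_remove_eth0cfg is arbitrary, and the rm mirrors the per-machine preseed in the linked pastebin):

```yaml
# Appended to /etc/maas/preseeds/curtin_userdata (sketch):
# remove cloud-init's eth0.cfg inside the installed system so it
# cannot clobber the static network configuration on first boot.
late_commands:
  zz_remove_eth0cfg: ["curtin", "in-target", "--", "rm", "-f", "/etc/network/interfaces.d/eth0.cfg"]
```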

Revision history for this message
Mick Gregg (macgreagoir) wrote :

I'm testing frobware's fixes for bug 1566801, which look promising for the container networking issues from eth0.cfg.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

frobware has just landed his change, which I have seen resolve your issue in my own test env. It should be available in the Juju daily PPA soon.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

I looked into this today, as requested, but couldn't find any information not already mentioned in the existing comments. Is this still an issue with the latest Juju 2.0 beta?

Revision history for this message
john (g-john-p) wrote :

@Dimiter, what is the latest 2.0 beta? As of beta12, this was still an issue. We did not get notified that a fix was available. Can you confirm? Are we at beta13 yet?

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@John

Yes, we are at beta13 :) With the aim of beta14 at the end of this week.

Could you confirm if this is still an issue with beta13?

Revision history for this message
john (g-john-p) wrote :

@anastasia - we will try this out and get back to you.

Revision history for this message
Richard Harding (rharding) wrote :

Actually, with the release of beta14, please use that. Some improvements landed in beta14 that should apply here.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

@Anastasia

We checked with Juju 2.0 beta14, using small applications like juju-gui in a LXD container, and the results seem positive. We are getting IPs allocated from the properly defined management space.

I still need to deploy the complete bundle at a larger scale to fully confirm it.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Rajesh

Thank you for feedback - very positive \o/

Let us know when you are confident that the issue is fixed :)

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

@Anastasia

Today, when trying to deploy the complete bundle, the issue remains as reported.

I even reinstalled the complete MAAS setup, but the metal nodes are still getting IPs allocated from the Maas-External pool.

The MAAS space lists and the relevant portion of the juju status output are available at the link below.

http://paste.ubuntu.com/22779786/

Please suggest.

The Juju version in use is:

juju version
2.0-beta14-xenial-amd64

Changed in juju-core:
assignee: Mick Gregg (macgreagoir) → nobody
assignee: nobody → Dimiter Naydenov (dimitern)
Changed in juju-core:
milestone: none → 2.0.0
Revision history for this message
Dimiter Naydenov (dimitern) wrote :

If the "unused" space should not be used, why is its subnet 10.10.0.0/x managed by MAAS?

Without knowing the exact setup of your MAAS I can't really advise you what needs fixing.
Please, provide more details, e.g.

maas profile subnets read
maas profile nodes list

Pasting the contents of /etc/network/interfaces on each machine and container will also help.

summary: - Relation fails as untis/machines are on different subnet on a multi NIC
+ Relation fails as units/machines are on different subnet on a multi NIC
setup, juju 2.0 beta11
Revision history for this message
john (g-john-p) wrote :

@Dimiter - please connect with Mick Gregg @macgreagoir to better understand how OpenStack works. It is beyond this bug report to describe the topological network layout of OpenStack. Mick has the context on this bug.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

I haven't asked how OpenStack works, thank you; I just asked for a couple of things to paste from your setup. Anyway, I'll leave it to Mick then.

Changed in juju-core:
assignee: Dimiter Naydenov (dimitern) → nobody
Revision history for this message
Richard Harding (rharding) wrote :

@john - I appreciate your feedback. Mick has been working hard on this bug with Andrew who's on holiday for a couple of weeks. We've not been able to provide a proven fix to this point. I've pulled in Dimiter, who has much more networking experience with Juju, and moved Mick onto other tasks.

Mick and Dimiter have synced up and Dimiter will be driving this through to a final solution. Please collaborate with Dimiter in order to help make that possible.

Changed in juju-core:
assignee: nobody → Dimiter Naydenov (dimitern)
Revision history for this message
john (g-john-p) wrote :

@Richard. Ok, thanks for the background on this.

@Dimiter, please look at the attached PNG, which shows the network diagram of our lab. We are trying to provide multiple NIC interfaces on the servers in support of OpenStack. OpenStack is a case where the MAAS server interfaces cannot be the only network supported on deployed nodes.

I will run the commands that you asked for and provide the output via pastebin.

Revision history for this message
john (g-john-p) wrote :

output from maas profile subnets http://paste.ubuntu.com/22818899/
output from maas profile nodes http://paste.ubuntu.com/22819085/

Revision history for this message
john (g-john-p) wrote :

@dimiter - I've provided Mick with VPN access to my lab, so if needed you can VNC or SSH into these servers directly to run some tests. Let me know and I can provide the same access to you.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

@john, I've been trying to carry out and document the steps for a clean OpenStack install with Juju 2.0 and MAAS 1.9 (on 4 dual-NIC KVM nodes). Unfortunately, there were some issues, mostly related to DNS reverse lookups not working properly, causing hook errors in nova-cloud-controller. Deploying with trusty-based charms is also problematic at times.

However, I managed to successfully and cleanly deploy OpenStack with multiple networks on MAAS 2.0. I'll post the details shortly.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

I've used MAAS version 2.0.0 (rc4+bzr5187), deployed from ppa:maas/next inside a xenial LXD container with 2 NICs connected to libvirt bridges - one external with NAT and DHCP enabled, the other internal (for MAAS) without NAT and DHCP.

MAAS UI screenshots of:
- rack controller (interfaces and served VLANs): http://pasteboard.co/9n5Ymdrc4.png
- networks (fabrics, VLANs, subnets, spaces): http://pasteboard.co/9n6nU3ubI.png
- /etc/network/interfaces contents: http://paste.ubuntu.com/23061857/

There are 6 dual-NIC KVM nodes, but we'll be using only 4 of them - 1 for Neutron/Ceph, and 3 for Nova/Ceph. All nodes also have 2GB RAM and 21.5GB disk space. UI screenshots:
- all nodes summary: http://pasteboard.co/9n69tmwSU.png
- maas-20-node-0 (the network node): http://pasteboard.co/W6RPapEr.png
- maas-20-node-5 (the compute and bootstrap node; the other compute nodes are configured the same way, except for the different IPs): http://pasteboard.co/9n6SD8H2b.png

MAAS also has 3 zones - default (empty), zone1 (nodes 0, 1, and 2), zone2 (nodes 3, 4, 5).

All of the subnets, except 10.10.20.0/24 (external) and 10.99.20.0/24 (compute-external), have DHCP enabled from a dynamic range 10.X.20.10-10.X.20.99 (X being the VLAN ID, or 20 for the PXE subnet), and have a static range 10.X.20.100-10.X.20.200. You can ignore the 'demo-*' subnets, spaces, and VLANs (they're not related to the OpenStack deployment I'm describing here).

To simplify the deployment steps, I'll be using the following short bash script, deploy-4-nodes-vmaas-20.sh: http://paste.ubuntu.com/23061880/

A slightly modified openstack-base bundle (original: https://jujucharms.com/openstack-base/) is deployed by the script (in bundle-3-nodes.yaml: http://paste.ubuntu.com/23061882/) with some minimal config (openstack-base-config.yaml: http://paste.ubuntu.com/23061885/)

It takes about 40-60 minutes to get OpenStack up and running on the KVMs with Juju 2.0-beta15.
Script output: http://paste.ubuntu.com/23061249/.

Juju status dumps:
 - at the beginning: http://paste.ubuntu.com/23061029/
 - midway, after all machines have started: http://paste.ubuntu.com/23061034/
 - at the end, once everything settles: http://paste.ubuntu.com/23061344/

As you can see, the private/public addresses shown in status might look odd, but that's not a problem for the charms, as they use 'network-get <binding> --primary-address' internally (with 'unit-get private|public-address' only as a fallback). The bundle contains a "bindings" section for setting up the endpoint bindings to spaces for each application.
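The address-selection pattern described above can be sketched as a charm hook might use it; the 'cluster' binding name and the get_addr helper are illustrative, not taken from any of the charms in the bundle:

```shell
# Prefer the space-scoped address for a given endpoint binding via
# network-get; fall back to the older unit-get tool only when
# network-get is unavailable or fails.
get_addr() {
    network-get "$1" --primary-address 2>/dev/null \
        || unit-get private-address
}

# Usage inside a hook:
#   addr=$(get_addr cluster)
```

With endpoint bindings set (via the bundle's "bindings" section or --bind), network-get returns an address from the bound space regardless of which NIC's address happens to be the machine's preferred private address.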

@john, please have a look at the steps and setup. How different is it from what you're trying to set up? With properly configured networks and nodes, it should be easy to replicate and modify as needed.

affects: juju-core → juju
Changed in juju:
milestone: 2.0.0 → none
milestone: none → 2.0.0
Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi Dimiter,

The above reported issue is still valid and unresolved with the latest MAAS/Juju versions.

http://pasteboard.co/10Lqts94j.jpg

http://paste.ubuntu.com/23154383/

cplane@juju:~/dvnd-juju$ juju list-spaces
spaces:
  external:
    10.10.11.0/24:
      type: ipv4
      provider-id: "3"
      status: in-use
      zones:
      - default
  managed:
    10.9.1.0/24:
      type: ipv4
      provider-id: "4"
      status: in-use
      zones:
      - default
  non-maas:
    192.168.122.0/24:
      type: ipv4
      provider-id: "6"
      status: in-use
      zones:
      - default
    192.168.220.0/24:
      type: ipv4
      provider-id: "8"
      status: in-use
      zones:
      - default

Changed in juju:
milestone: 2.0.0 → 2.0-rc1
Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi Dimiter/Narinder,

The various configurations and settings in my lab are as follows:

Maas-nodes:

http://pasteboard.co/dKtR3i3R.jpg

Networks-fabrics-spaces:

http://pasteboard.co/2dw5ifrff.jpg

Node's-network:

http://pasteboard.co/2dx1tPnOy.jpg

Managed-Vlan:

http://pasteboard.co/1nsxt6YE.jpg

Unmanaged-Vlan:

http://pasteboard.co/2dQJ7fxCM.jpg

/etc/network/interfaces contents:

http://paste.ubuntu.com/23169443/

Deployed bundle, using series trusty:

http://paste.ubuntu.com/23169424/

Juju status dump:

http://paste.ubuntu.com/23169329/

Changed in juju:
status: Incomplete → Triaged
Changed in juju:
assignee: Dimiter Naydenov (dimitern) → nobody
Changed in juju:
milestone: 2.0-rc1 → 2.0-rc2
Changed in juju:
milestone: 2.0-rc2 → 2.0.0
tags: added: teamb
tags: added: rteam
removed: teamb
Changed in juju:
assignee: nobody → Richard Harding (rharding)
milestone: 2.0.0 → 2.1.0
milestone: 2.1.0 → 2.0.0
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-rc3 → 2.0.0
Changed in juju:
milestone: 2.0.0 → 2.0.1
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0.1 → none
Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

Was this fixed in 2.0.1, or has it been moved along the timeline?

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Sandor,
We mark fixed issues as "Fix Committed" or "Fix Released". This one has not yet been fixed, and with 2.0.1 out the door, it will be in the next available release.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Marking as Won't Fix for 2.0.x as no further 2.0.x releases are planned.

Changed in juju:
milestone: none → 2.1.0
assignee: Richard Harding (rharding) → nobody
Revision history for this message
John A Meinel (jameinel) wrote :

The bundle described in https://bugs.launchpad.net/juju/+bug/1603473/comments/61 doesn't have any "bindings" sections, which means you are not informing Juju of which space you want the charm to use, and thus we end up guessing.

Things like:
  ceph-radosgw:
    annotations:
      gui-x: '1000'
      gui-y: '250'
    charm: cs:trusty/ceph-radosgw-19
    num_units: 1
    options:
      source: cloud:trusty-liberty
      use-embedded-webserver: true
    to:
    - lxc:0

Could be updated to look like:
  ceph-radosgw:
    annotations:
      gui-x: '1000'
      gui-y: '250'
    charm: cs:trusty/ceph-radosgw-19
    num_units: 1
    options:
      source: cloud:trusty-liberty
      use-embedded-webserver: true
    to:
    - lxc:0
    bindings:
      "": maas-management

The "" binding is a "default for all otherwise unnamed endpoints". If you want one part of an application to use a different subnet, then you can list it explicitly. Eg:
  ceph-radosgw:
    annotations:
      gui-x: '1000'
      gui-y: '250'
    charm: cs:trusty/ceph-radosgw-19
    num_units: 1
    options:
      source: cloud:trusty-liberty
      use-embedded-webserver: true
    to:
    - lxc:0
    bindings:
      "": maas-management
      public: maas-external
      admin: maas-management
# 'admin' is not necessary, as it is covered by "", but being explicit can be good

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Based on comment #65, I am marking this bug as Invalid, as Juju behaves as expected.

Changed in juju:
status: Triaged → Invalid
milestone: 2.1.0 → none