Relation fails as units/machines are on different subnet on a multi NIC setup, juju 2.0 beta11

Bug #1603473 reported by Amarendra Meher
This bug affects 4 people
Affects: Canonical Juju
    Status: Invalid, Importance: High, Assigned to: Unassigned
Affects: Canonical Juju 2.0
    Status: Won't Fix, Importance: Undecided, Assigned to: Unassigned

Bug Description

On a multi-NIC setup with Juju 2.0 beta11 and MAAS 1.9, we notice that some machines' preferred private addresses are not the IP addresses provisioned and provided by MAAS, but those of a second NIC where we set the IP manually for SSH remote access. The machines and the containers have a valid IP on both subnets, but when the relation is established between nova-compute and nova-cloud-controller, it performs a reverse DNS lookup and fails with this error: http://paste.ubuntu.com/19349672/
With Juju 2.0 beta5, this did not happen.

Setup details:
As the charm is still being worked on, I have attached a yaml file that deploys only the charms of the OpenStack bundle; it is attached to this bug.
Note: we are not using the neutron-gateway and a few other OpenStack nodes here (our charm will take care of those).

With Juju 2.0 beta11:
MAAS (Running in a VM in Virtual machine manager)
                Version 1.9
                eth0: 10.10.11.152(Unmanaged)
                eth1: 10.9.1.10(Managed)

JUJU (in a VM in Virtual machine manager)
                Version 2.0 beta11
                eth0: 10.10.11.151
                eth1: 10.9.1.151
                DNS: 10.9.1.10
  .local/share/juju/controller.yaml file content:
      cplane-controller:
          unresolved-api-endpoints: ['10.9.1.160:17070', '10.10.11.140:17070']
          uuid: 1a5c7aba-18ad-457b-8bb6-7aa6feb6e1cf
          api-endpoints: ['10.9.1.160:17070', '10.10.11.140:17070']

Juju status: in tabular format: http://paste.ubuntu.com/19501334/

Juju status in yaml format: http://paste.ubuntu.com/19501233/

Here machine 0 is the bootstrap node, whose DNS name is an IP from the managed (DNS and DHCP) network, i.e. the 10.9.1.x range.
But for the other nodes (1, 2 and 3), the DNS names are IPs from the unmanaged network, i.e. the 10.10.11.x range.

With Juju 2.0 beta5:
MAAS (Running in a VM in Virtual machine manager)
                Version 1.9
                eth0: 192.168.7.101(Unmanaged)
                eth1: 10.14.0.1(MANAGED)
JUJU (in a VM in Virtual machine manager)
                Version 2.0 beta5
                eth0: 192.168.7.103
                eth1: 10.14.0.151
                DNS: 10.14.0.1

Content of .local/share/juju/controller.yaml
controllers:
  local.cplane-controller:
    unresolved-api-endpoints: ['192.168.7.113:17070', '10.14.0.100:17070']
    uuid: eab2b453-e719-40e4-8453-76441986207b
    api-endpoints: ['192.168.7.113:17070', '10.14.0.100:17070']

Juju status:
 [Machines]
ID STATE DNS INS-ID SERIES AZ
0 started 10.14.0.100 /MAAS/api/1.0/nodes/node-22777ed6-0de4-11e6-8c34-000c297e53d6/ trusty default
1 pending 10.14.0.102 /MAAS/api/1.0/nodes/node-248caeb2-0de4-11e6-94c7-000c297e53d6/ trusty default
2 pending 10.14.0.103 /MAAS/api/1.0/nodes/node-2523d274-0de4-11e6-8747-000c297e53d6/ trusty default
3 pending 10.14.0.101 /MAAS/api/1.0/nodes/node-22fd8fda-0de4-11e6-b551-000c297e53d6/ trusty default

Additional info:
We tried to use --bind to deploy the charm to a specific network space, but it doesn't work.
We defined 2 spaces in MAAS:
unused: 10.10.11.x series
default: 10.9.1.x series

These spaces can be listed with the juju command "juju list-spaces" on the juju controller node, which lists all the spaces in our setup properly. We ran 'juju deploy juju-gui --bind Default<please confirm>' but the preferred private-address is still from the 10.10.11.x range. We used juju run --unit juju-gui/0 "unit-get private-address" to check.

Tags: 11 2.0 beta juju rteam
Revision history for this message
Amarendra Meher (amarendra-meher) wrote :
Changed in juju-core:
importance: Undecided → High
Revision history for this message
Andrew McDermott (frobware) wrote :

Do we have a screenshot of the MAAS 1.9 network setup for these nodes? I am interested to see the order of the interfaces, what is configured (static, dhcp, et al) and which is the PXE interface. Given that I will look at reproducing. Thanks.

Changed in juju-core:
status: New → Incomplete
Revision history for this message
Andrew McDermott (frobware) wrote :

It's also possible that this was working accidentally before this PR landed:

  http://reviews.vapour.ws/r/4903/

Revision history for this message
Andrew McDermott (frobware) wrote :

Please could you also elaborate on:

 "We tried to use --bind to deploy the charm to a specific network space, but it doesn't work."

and the output of:

  $ juju list-spaces

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

juju list-spaces
spaces:
  space-0:
    10.9.1.0/24:
      type: ipv4
      provider-id: "2"
      status: in-use
      zones:
      - default
    10.10.11.0/24:
      type: ipv4
      provider-id: "1"
      status: in-use
      zones:
      - default
    192.168.220.0/24:
      type: ipv4
      provider-id: "3"
      status: in-use
      zones:
      - default

Revision history for this message
John A Meinel (jameinel) wrote :

To elaborate on what details we are looking for about: "we tried to use --bind".

From what I saw, you seem to be using a Bundle to deploy many applications. We don't currently support '--bind' syntax for bundles, only for single applications.
We either need you to edit the bundle to add a "bindings" section for the various applications, or you could try just deploying one of the applications manually and see if it comes up configured as you would expect.
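For reference, a bundle-level bindings section looks roughly like this (a sketch only; the charm names, unit counts, and the space name 'default' are assumptions that must match your model, and 2.0-era bundles used a top-level "services:" key where later formats use "applications:"):

```yaml
# Hypothetical bundle excerpt: bind every endpoint of each
# application to the 'default' space instead of using --bind.
services:                # "applications:" in later bundle formats
  nova-compute:
    charm: cs:nova-compute
    num_units: 1
    bindings:
      "": default        # "" binds all endpoints not listed explicitly
  nova-cloud-controller:
    charm: cs:nova-cloud-controller
    num_units: 1
    bindings:
      "": default
```

Deploying such a bundle should have the same effect as `--bind "default"` passed per application.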

Revision history for this message
John A Meinel (jameinel) wrote :

Looking at the very first pastebin:
2016-07-14 10:10:00 INFO worker.uniter.jujuc server.go:173 running hook tool "relation-get" ["--format=json" "hostname"]
2016-07-14 10:10:00 INFO cloud-compute-relation-changed Traceback (most recent call last):
...
2016-07-14 10:10:00 INFO cloud-compute-relation-changed File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 910, in query
2016-07-14 10:10:00 INFO cloud-compute-relation-changed raise NXDOMAIN
2016-07-14 10:10:00 INFO cloud-compute-relation-changed dns.resolver.NXDOMAIN

Hints that it might even be something else that is failing. Something about how we are telling MAAS about the multiple IP addresses for a container might cause MAAS to get confused and not able to return the normal hostname for one of the secondary addresses.

Revision history for this message
John A Meinel (jameinel) wrote :

We have a setup here with MAAS 1.9 with hosts that have multiple network cards. When we do that, we end up with MAAS providing 2 IP addresses for the machine (as expected), but DNS lookup does not return both IP addresses, and reverse DNS lookup fails for the second IP address.

We haven't confirmed if the behavior is the same on MAAS 2, but this bug comment:
https://bugs.launchpad.net/maas/+bug/1599223/comments/5

Seems to say that MAAS 2 will use what it considers the "primary" IP as "host.maas" and then create additional records for "nic.host.maas" for the rest of the network interfaces.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi John,

Tried installing single application example juju-gui

juju status
MODEL CONTROLLER CLOUD VERSION
controller cplane-controller cplane 2.0-beta11

APP STATUS EXPOSED ORIGIN CHARM REV OS
juju-gui unknown false jujucharms juju-gui 130 ubuntu

UNIT WORKLOAD AGENT MACHINE PORTS PUBLIC-ADDRESS MESSAGE
juju-gui/2 unknown idle 3 80/tcp,443/tcp 10.10.11.141

MACHINE STATE DNS INS-ID SERIES AZ
0 started 10.9.1.160 /MAAS/api/1.0/nodes/node-278facc0-4cb6-11e6-b6fa-5254004a236f/ trusty default
3 started 10.10.11.141 /MAAS/api/1.0/nodes/node-b43a591a-4cb4-11e6-aba7-5254004a236f/ trusty default

Even via juju, address fetching always uses fabric=maas-external and space=unused, whereas it is supposed to fetch from fabric=maas-management and space=default.

juju run --unit juju-gui/2 'unit-get private-address'
10.10.11.141

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi guys, I am still facing this issue, with no resolution yet.

Net info for one of the deployed nodes is shown below.

One strange thing: both eth0 and br-eth0 have the same IP, and it is not vanishing from the interface even though the bridge is also active.

ifconfig
br-eth0 Link encap:Ethernet HWaddr 52:54:00:46:62:07
          inet addr:10.9.1.161 Bcast:10.9.1.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe46:6207/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:33215 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1811 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:46808059 (46.8 MB) TX bytes:148181 (148.1 KB)

br-eth1 Link encap:Ethernet HWaddr 52:54:00:7b:fb:24
          inet addr:10.10.11.141 Bcast:10.10.11.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe7b:fb24/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:187342 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10081 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:183660270 (183.6 MB) TX bytes:1556527 (1.5 MB)

eth0 Link encap:Ethernet HWaddr 52:54:00:46:62:07
          inet addr:10.9.1.161 Bcast:10.9.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:34705 errors:0 dropped:6 overruns:0 frame:0
          TX packets:2569 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:62410780 (62.4 MB) TX bytes:218908 (218.9 KB)

eth1 Link encap:Ethernet HWaddr 52:54:00:7b:fb:24
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:188811 errors:0 dropped:49 overruns:0 frame:0
          TX packets:10091 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:187626400 (187.6 MB) TX bytes:1557343 (1.5 MB)

brctl show
bridge name bridge id STP enabled interfaces
br-eth0 8000.525400466207 no eth0
br-eth1 8000.5254007bfb24 no eth1

The contents of the interfaces file (/etc/network/interfaces):
auto lo
iface lo inet loopback
    dns-nameservers 10.9.1.10
    dns-search maas

iface eth0 inet manual

auto br-eth0
iface br-eth0 inet static
    gateway 10.9.1.10
    address 10.9.1.161/24
    mtu 1500
    bridge_ports eth0

iface eth1 inet manual

auto br-eth1
iface br-eth1 inet static
    address 10.10.11.141/24
    mtu 1500
    bridge_ports eth1

Revision history for this message
Andrew McDermott (frobware) wrote :

Was there a 'source /etc/network/interfaces.d/*.cfg' stanza at the bottom of /etc/network/interfaces? If so, could you attach any files that are in the interfaces.d directory? Thanks.

Revision history for this message
Andrew McDermott (frobware) wrote :

Please could you run the following on the machine (from comment #10) and share the output:

 $ ps -ef | grep dhclient

Revision history for this message
John A Meinel (jameinel) wrote :

How did you install juju-gui? Just plain "juju deploy juju-gui"? Can you try "juju deploy juju-gui --bind web=default" and then give the output of:

  juju run --unit juju-gui/0 'network-get --primary-address web'

That should be the same value you would get if the GUI charm was running "unit-get private-address" inside of a relation context. (Since it isn't actually related to anything yet, we can't just try unit-get private-address in a context.)

Charms should be updated to use 'network-get' where possible, but in the short term 'unit-get private-address' should be able to give the correct answer once we know what endpoint the Charm is asking about.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

ubuntu@juju-gui:~$ ps -ef | grep dhclient
root 5281 1 0 08:43 ? 00:00:00 dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

The contents of interfaces.d are as follows:--

ubuntu@juju-gui:~$ ls /etc/network/interfaces.d/
eth0.cfg
ubuntu@juju-gui:~$ cat /etc/network/interfaces.d/eth0.cfg
# The primary network interface
auto eth0
iface eth0 inet dhcp
ubuntu@juju-gui:~$

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi john,

I redeployed the application with the options you suggested; the various outputs are available at the link below.

http://paste.ubuntu.com/20042147/

But the DNS address for this machine is still from the external fabric and the unused space.

juju list-spaces
spaces:
  default:
    10.9.1.0/24:
      type: ipv4
      provider-id: "2"
      status: in-use
      zones:
      - default
    192.168.220.0/24:
      type: ipv4
      provider-id: "3"
      status: in-use
      zones:
      - default
  unused:
    10.10.11.0/24:
      type: ipv4
      provider-id: "1"
      status: in-use
      zones:
      - default

Juju still returns the following for private/public-address:

juju run --unit juju-gui/1 'unit-get private-address'
10.10.11.141
juju run --unit juju-gui/1 'unit-get public-address'
10.10.11.141

Revision history for this message
Andrew McDermott (frobware) wrote :
Revision history for this message
Andrew McDermott (frobware) wrote :

I plan to change Juju to not source the additional eth0.cfg's and that is captured as:

  https://bugs.launchpad.net/juju-core/+bug/1604482

Revision history for this message
Andrew McDermott (frobware) wrote :

Please see:

  https://bugs.launchpad.net/maas/+bug/1590689

for the workarounds listed in comments:

  https://bugs.launchpad.net/maas/+bug/1590689/comments/13
  https://bugs.launchpad.net/maas/+bug/1590689/comments/16
  https://bugs.launchpad.net/maas/+bug/1590689/comments/18

@rajesh - would it be possible for you to try these workarounds until the fix for bug #1604482 lands?

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi Andrew,

Using the above workaround, the IP duplication is resolved, but the issue of wrong IP allocation for the nodes is still there.

ifconfig
br-eth0 Link encap:Ethernet HWaddr 52:54:00:46:62:07
          inet addr:10.9.1.161 Bcast:10.9.1.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe46:6207/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:8259 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4217 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:46366996 (46.3 MB) TX bytes:742318 (742.3 KB)

br-eth1 Link encap:Ethernet HWaddr 52:54:00:7b:fb:24
          inet addr:10.10.11.141 Bcast:10.10.11.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe7b:fb24/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:10638 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4685 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:34751479 (34.7 MB) TX bytes:428812 (428.8 KB)

eth0 Link encap:Ethernet HWaddr 52:54:00:46:62:07
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:9739 errors:0 dropped:29 overruns:0 frame:0
          TX packets:5015 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:62080374 (62.0 MB) TX bytes:814704 (814.7 KB)

eth1 Link encap:Ethernet HWaddr 52:54:00:7b:fb:24
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:12064 errors:0 dropped:49 overruns:0 frame:0
          TX packets:4694 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:36175696 (36.1 MB) TX bytes:429526 (429.5 KB)

juju status
MODEL CONTROLLER CLOUD VERSION
controller cplane-controller cplane 2.0-beta11

APP STATUS EXPOSED ORIGIN CHARM REV OS
juju-gui maintenance false jujucharms juju-gui 130 ubuntu

UNIT WORKLOAD AGENT MACHINE PORTS PUBLIC-ADDRESS MESSAGE
juju-gui/0 maintenance executing 1 10.10.11.141 (install) installing charm software

MACHINE STATE DNS INS-ID SERIES AZ
0 started 10.9.1.160 /MAAS/api/1.0/nodes/node-278facc0-4cb6-11e6-b6fa-5254004a236f/ trusty default
1 started 10.10.11.141 /MAAS/api/1.0/nodes/node-b43a591a-4cb4-11e6-aba7-5254004a236f/ trusty default

juju run --unit juju-gui/0 'network-get --primary-address web'
10.9.1.161
juju run --unit juju-gui/0 'unit-get private-address'
10.10.11.141
juju run --unit juju-gui/0 'unit-get public-address'
10.10.11.141

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

The node-group interfaces at MAAS are shown here for reference:

http://paste.ubuntu.com/20164460/

Revision history for this message
john (g-john-p) wrote :

Hi Andrew and John.

I can provide you with VPN access to our lab/system that these servers are running on if this will help you diagnose the problem easier. I think it may be more efficient if you can debug this and try things in real-time. Please let me know and I will coordinate with you.

thanks,
john

Revision history for this message
john (g-john-p) wrote :

Here is the network diagram of the environment.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

Just a note that I'm working on reproducing the bug. I'll have some results to post soon, but I haven't yet been able to reproduce the issue exactly.

I can make Juju use the unmanaged subnet as its public-addresses by using a range of a lower sort order than the managed/DHCP/PXE range, but this does not prevent relations.

I will update soon.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

I can reproduce something very similar by using a statically assigned IP address of a lower sort order than the PXE/DHCP address, if the statically assigned address is also assigned by MAAS (therefore seen as a provider address).

To work around this, I added a MAAS preseed that puts an interfaces.d extension config in place to assign the static address, leaving Juju unaware of the address. Here is an example: http://pastebin.ubuntu.com/20468944/

In this example, my DHCP range (which I want Juju to use as public-address) is 192.168.124.0/24 (.2 is the MAAS server) and my static (your 'SSH'?) range is 192.168.123.0/24. I do not configure MAAS nodes with static addresses in the 123.x range; I add them in the /etc/network/interfaces.d/unmanaged.cfg file put in place by the curtin preseed.

Do you mind testing to see if this helps?

Another option is to use different ranges, ensuring the correct range for Juju to use is the lowest in sort order, but this may be more intrusive to your testing than the preseed option.
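A per-node unmanaged.cfg of the kind described above would be an ordinary ifupdown stanza along these lines (a sketch; the interface name and the address are assumptions and must match the node in question):

```
# /etc/network/interfaces.d/unmanaged.cfg (hypothetical example)
# Static address on the unmanaged 'SSH' subnet, configured outside
# MAAS so Juju never sees it as a provider address.
auto eth1
iface eth1 inet static
    address 192.168.123.50
    netmask 255.255.255.0
```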

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

@Mick Gregg

I am trying your suggestion to add a MAAS preseed, but I would like to point out that I am already using the lower-sort-order network for the PXE/DHCP address, i.e. 10.9.1.0/24, with 10.10.11.0/24 as the unmanaged network, as per #16.

https://bugs.launchpad.net/juju-core/+bug/1603473/comments/16

Surely 10.9.1.0/24 is of lower sort order than 10.10.11.0/24, so it should be assigned to the various nodes regardless of whether addresses are assigned statically in MAAS or via preseed.

I have another setup, with 10.14.0.0/24 managed and 192.168.7.0/24 unmanaged on Juju 2.0 beta5, where IP assignment works exactly as desired for the various nodes.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

@Mick Gregg

As I understand from #25, we do not give a static IP address to eth1 in MAAS; it is taken care of by unmanaged.cfg via the preseed, where you create the bridge interface br-eth0 with eth1.

Please correct me if that is not what you mean.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

The MAAS interfaces in my environment are shown below:

http://paste.ubuntu.com/20849310/

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

In my suggested test you would require the MAAS server itself to be configured with IP addresses, probably by a static means as you have in #28. It is the MAAS nodes (the machines deployed by your MAAS server) that would use the preseeds and interfaces files to get the unmanaged.cfg files in place.

Each node needs its own preseed file (named as per my example) and its own unmanaged.cfg file (again, as per my example). I hope that helps.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

I expect string, rather than numeric, sorting of IP addresses (10.10.x.x sorts lower than 10.9.x.x). Let me confirm that.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

I tried as shown below:

http://paste.ubuntu.com/20858760/

But it is not taking effect, even after rebooting the MAAS node. The 2nd interface, i.e. the br-eth0 bridge of eth1, is not coming up.

Please guide me on how to proceed, and on what the interfaces file should contain. Should I delete all files from /e/m/p/ other than curtin_userdata?

Right now I have the files listed below in that directory:

http://paste.ubuntu.com/20859130/

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

In my example, 'curtin_userdata_amd64_generic_trusty_juju0' is a preseed file for a machine named juju0. You will need something similar for each of your machines, using their hostnames in the file names.

The interfaces file that wget fetches (in the preseed) is also specific to each machine. Each machine needs its own unmanaged.cfg file (served from the MAAS server at /var/www/html/<machine_name>.interfaces_d).

Your per-machine unmanaged.cfg file should contain the interface configuration for the IP address you do not want Juju to know about (your 'SSH'/unmanaged interface?).

Mick Gregg (macgreagoir)
Changed in juju-core:
assignee: nobody → Mick Gregg (macgreagoir)
Revision history for this message
Mick Gregg (macgreagoir) wrote :

@g-john-p

Yes, please, if you can give me access to your system, I'm keen to take a look.

I'll contact you by email.

Revision history for this message
john (g-john-p) wrote : RE: [Bug 1603473] Re: Relation fails as units/machines are on different subnet on a multi NIC setup, juju 2.0 beta11

Hey Mick, let's have a call. Please contact me on my cell and I can set you up on our VPN, but let's go over it one-on-one.
Thanks,
John


Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote : Re: Relation fails as units/machines are on different subnet on a multi NIC setup, juju 2.0 beta11

@Mick Gregg,

Sure, we will provide you access; please contact John over email.

Secondly, your suggested workaround in #32 worked fine, but as I am deploying OpenStack, which has many applications spread across 2-3 hosts and various LXD containers, with this approach I would need to define many <curtin_userdata_amd64_generic_trusty_<hostname>> files and their respective interface_d files in the /v/w/h/ directory. This looks very painful and creates a big manual dependency.

So can we instead try the 2nd approach, the lower IP address sort order? Please help me understand this concept of IP address sorting.

How come, in your case, 192.168.123.0/24 is of lower sort order than 192.168.124.0/24, and similarly, in my case, 10.10.11.0/24 is of lower sort order than 10.9.1.0/24?

Once this concept is clear, we will make it a rule of thumb in our upcoming deployments and developments.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

If IP addresses are sorted as strings, 10.1x.x.x will be alphabetically lower than 10.9.x.x (1 before 9).
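A quick illustration of the difference (my own example, not from the bug data), using GNU sort: lexical order puts 10.10.x.x before 10.9.x.x, while version (numeric-component) order does the opposite:

```shell
# Lexical (string) sort: at the fourth character '1' < '9',
# so 10.10.11.0 sorts before 10.9.1.0.
printf '10.9.1.0\n10.10.11.0\n' | sort
# -> 10.10.11.0
#    10.9.1.0

# Version sort (-V) compares numeric components, giving the
# intuitive order: 9 < 10.
printf '10.9.1.0\n10.10.11.0\n' | sort -V
# -> 10.9.1.0
#    10.10.11.0
```

The behaviour described in this bug corresponds to the first, lexical ordering.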

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

@Mick

Got it!

One more clarification about the workaround mentioned in #32: for LXD containers, how do I write the <curtin_userdata_amd64_generic_trusty_<hostname>> and interface_d files? An example would help,

since LXD containers are assigned hostnames randomly, as shown below:

http://paste.ubuntu.com/20987920/

Please advise.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

I'm afraid I don't have a very good (or simple) workaround for the containers. I think we are seeing that the sort order of the addresses is causing your problems, so how happy are you (for now) to use different IP address ranges in your environment, with the addresses Juju requires in the lower-sorted range?

Even using `--bind managed` (for example, with the unmanaged subnet in its own space), I'm seeing an unmanaged address as the machine address, and with bug 1604482 the containers can miss the address they need.

(We are collecting good data here for a bug around all of this.)

Revision history for this message
Mick Gregg (macgreagoir) wrote :

Just noting that I've talked to John and have access to the cplane system.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

I want to summarise the results of my testing and update what I think the issues are, please.

I believe the main issue I'm seeing is the troublesome /etc/network/interfaces.d/eth0.cfg left by cloud-init on both metal and containers. This can break networking, leaving units unable to communicate; for example, the default gateway and the route to the nameserver can be broken. (This issue was mentioned earlier in this bug thread.)

Juju using unexpected ('unmanaged') IP addresses as private-address for units (due to IP address sorting) may be exacerbating this problem (and be a confusion) but, on its own, may not necessarily break as long as the units can communicate to API endpoints on whichever subnets their private-addresses are on.

I have tested with a 'managed' and an 'unmanaged' space for my PXE/DHCP and statically assigned (in MAAS) subnets, respectively. The unmanaged subnet sorts lower (as per your environment) to have Juju use it in preference. Also like your environment, eth1 has the managed subnet, not eth0.

Juju ends up deploying metal machines with their unmanaged address as private-address, while containers get their managed addresses. Applications are deployed with `--bind managed` to have relations use the managed space's subnet, whatever the private-address of the units.

This configuration will (by default) fail with broken container networking, where default gateway and route to nameserver, as well as the unmanaged address, are missing as a result of eth0.cfg trying to use DHCP.

Nothing fails specifically as a result of the container private addresses and the metal private addresses being in different subnets, as both cases can still reach the api server. (The containers just don't have the static unmanaged address, which is removed by the eth0.cfg file replacing the config with dhcp).

To work-around networking, to prove that eth0.cfg may be the main issue, I have treated metal and containers differently. For metal, I have added a late_commands line to
/etc/maas/preseeds/curtin_userdata to remove the eth0.cfg file. (As we did per machine in http://pastebin.ubuntu.com/20468944/ line 10.) For containers, I added an install hook to a fork of the 'ubuntu' charm to remove the file and reload the interfaces. Then, for any applications to be deployed to containers, I first deployed an 'ubuntu' application unit `--to lxd:$some_machine`, before deploying an application --to a machine with an 'ubuntu' unit. This essentially uses a modified 'ubuntu' charm to fix networking before deploying the required application.

With the work-arounds to fix eth0 networking (and the --bind space used in application deployments) I was able to see successful relations added, despite addresses from different subnets used as units' private-address.

When Juju machines were deployed with PXE boot and DHCP on eth0, and the DHCP address range of a lower sort-order than the static range, this behaviour was hidden and I was able to deploy without the work-arounds.

I believe bug 1604482 (to remove eth0.cfg and reload interfaces) should resolve this bug, and we are working towards this fix.
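For metal, the late_commands addition described above can be sketched as a curtin preseed fragment like this (hypothetical; the command label zz_remove_eth0cfg is arbitrary, and the rm mirrors the per-machine preseed in the linked pastebin):

```yaml
# Appended to /etc/maas/preseeds/curtin_userdata (sketch):
# remove cloud-init's eth0.cfg inside the installed system so it
# cannot clobber the static network configuration on first boot.
late_commands:
  zz_remove_eth0cfg: ["curtin", "in-target", "--", "rm", "-f", "/etc/network/interfaces.d/eth0.cfg"]
```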

Revision history for this message
Mick Gregg (macgreagoir) wrote :

I'm testing frobware's fixes for bug 1566801, which look promising for the container networking issues from eth0.cfg.

Revision history for this message
Mick Gregg (macgreagoir) wrote :

@rajesh-canonical

frobware has just landed his change, which I have seen resolve your issue in my own test env. It should be available in the Juju daily PPA soon.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

I looked into this today, as requested, but couldn't find any information not already mentioned in the existing comments. Is this still an issue with the latest Juju 2.0 beta?

Revision history for this message
john (g-john-p) wrote :

@Dimiter, what is the latest 2.0 beta? As of beta12, this was still an issue. We did not get notified that a fix was available. Can you confirm? Are we at beta13 yet?

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@John

Yes, we are at beta13 :) With the aim of beta14 at the end of this week.

Could you confirm if this is still an issue with beta13?

Revision history for this message
john (g-john-p) wrote :

@anastasia - we will try this out and get back to you.

Revision history for this message
Richard Harding (rharding) wrote :

Actually, with the release of beta14, please use that. Some improvements landed in beta14 that should apply here.

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

@Anastasia

We checked with Juju 2.0 beta14, using small applications like juju-gui in a LXD container, and the results seem positive. We are getting IPs allocated from the properly defined management space.

I still need to deploy the complete bundle at a larger scale to fully confirm it.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Rajesh

Thank you for feedback - very positive \o/

Let us know when you are confident that the issue is fixed :)

Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

@Anastasia

Today, when trying to deploy the complete bundle, the issue remains as reported.

I even reinstalled the complete MAAS setup, but the metal nodes are still getting IPs allocated from the Maas-External pool.

The MAAS space lists and the relevant portion of the juju status output are available at the link below.

http://paste.ubuntu.com/22779786/

Please suggest.

The Juju version in use is:

juju version
2.0-beta14-xenial-amd64

Changed in juju-core:
assignee: Mick Gregg (macgreagoir) → nobody
assignee: nobody → Dimiter Naydenov (dimitern)
Changed in juju-core:
milestone: none → 2.0.0
Revision history for this message
Dimiter Naydenov (dimitern) wrote :

If the "unused" space should not be used, why is its subnet 10.10.0.0/x managed by MAAS?

Without knowing the exact setup of your MAAS I can't really advise you what needs fixing.
Please, provide more details, e.g.

maas profile subnets read
maas profile nodes list

Pasting the contents of /etc/network/interfaces on each machine and container will also help.

summary: - Relation fails as untis/machines are on different subnet on a multi NIC
+ Relation fails as units/machines are on different subnet on a multi NIC
setup, juju 2.0 beta11
Revision history for this message
john (g-john-p) wrote :

@Dimiter - please connect with Mick Gregg @macgreagoir to better understand how OpenStack works. It is beyond this bug report to describe the topological network layout of OpenStack. Mick has the context on this bug.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

I haven't asked how OpenStack works, thank you; I just asked for a couple of things to paste from your setup. Anyway, I'll leave it to Mick then.

Changed in juju-core:
assignee: Dimiter Naydenov (dimitern) → nobody
Revision history for this message
Richard Harding (rharding) wrote :

@john - I appreciate your feedback. Mick has been working hard on this bug with Andrew who's on holiday for a couple of weeks. We've not been able to provide a proven fix to this point. I've pulled in Dimiter, who has much more networking experience with Juju, and moved Mick onto other tasks.

Mick and Dimiter have synced up and Dimiter will be driving this through to a final solution. Please collaborate with Dimiter in order to help make that possible.

Changed in juju-core:
assignee: nobody → Dimiter Naydenov (dimitern)
Revision history for this message
john (g-john-p) wrote :

@Richard. Ok, thanks for the background on this.

@Dimiter, please look at the attached PNG, which shows the network diagram of our lab. We are trying to provide multiple NIC interfaces on the servers in support of OpenStack. OpenStack is a case where the MAAS server interfaces cannot be the only network supported on deployed nodes.

I will run the commands that you asked for and provide the output via pastebin.

Revision history for this message
john (g-john-p) wrote :

output from maas profile subnets http://paste.ubuntu.com/22818899/
output from maas profile nodes http://paste.ubuntu.com/22819085/

Revision history for this message
john (g-john-p) wrote :

@dimiter - I've provided Mick with VPN access to my lab, so if needed you can VNC or SSH into these servers directly to run some tests. Let me know and I can provide the same access to you.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

@john, I've been trying to carry out and document the steps for a clean OpenStack install with Juju 2.0 and MAAS 1.9 (on 4 dual-NIC KVM nodes). Unfortunately, there were some issues, mostly related to DNS reverse lookups not working properly, causing hook errors in nova-cloud-controller. Deploying with trusty-based charms is also problematic at times.

However, I managed to successfully and cleanly deploy OpenStack with multiple networks on MAAS 2.0. I'll post the details shortly.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

I've used MAAS version 2.0.0 (rc4+bzr5187), deployed from ppa:maas/next inside a xenial LXD container with 2 NICs connected to libvirt bridges - one external with NAT and DHCP enabled, the other internal (for MAAS) without NAT and DHCP.

MAAS UI screenshots of:
- rack controller (interfaces and served VLANs): http://pasteboard.co/9n5Ymdrc4.png
- networks (fabrics, VLANs, subnets, spaces): http://pasteboard.co/9n6nU3ubI.png
- /etc/network/interfaces contents: http://paste.ubuntu.com/23061857/

There are 6 dual-NIC KVM nodes, but we'll be using only 4 of them - 1 for Neutron/Ceph, and 3 for Nova/Ceph. All nodes also have 2GB RAM and 21.5GB disk space. UI screenshots:
- all nodes summary: http://pasteboard.co/9n69tmwSU.png
- maas-20-node-0 (the network node): http://pasteboard.co/W6RPapEr.png
- maas-20-node-5 (the compute and bootstrap node; the other compute nodes are configured the same way, except for the different IPs): http://pasteboard.co/9n6SD8H2b.png

MAAS also has 3 zones - default (empty), zone1 (nodes 0, 1, and 2), zone2 (nodes 3, 4, 5).

All of the subnets, except 10.10.20.0/24 (external) and 10.99.20.0/24 (compute-external), have DHCP enabled from a dynamic range 10.X.20.10-10.X.20.99 (X being the VLAN ID, or 20 for the PXE subnet), and have a static range 10.X.20.100-10.X.20.200. You can ignore the 'demo-*' subnets, spaces, and VLANs (they're not related to the OpenStack deployment I'm describing here).

To simplify the deployment steps, I'll be using the following short bash script, deploy-4-nodes-vmaas-20.sh: http://paste.ubuntu.com/23061880/

A slightly modified openstack-base bundle (original: https://jujucharms.com/openstack-base/) is deployed by the script (in bundle-3-nodes.yaml: http://paste.ubuntu.com/23061882/) with some minimal config (openstack-base-config.yaml: http://paste.ubuntu.com/23061885/)

It takes about 40-60 minutes to get OpenStack up and running on the KVMs with Juju 2.0-beta15.
Script output: http://paste.ubuntu.com/23061249/.

Juju status dumps:
 - at the beginning: http://paste.ubuntu.com/23061029/
 - midway, after all machines have started: http://paste.ubuntu.com/23061034/
 - at the end, once everything settles: http://paste.ubuntu.com/23061344/

As you can see, the private/public addresses shown in status might look odd, but that's not a problem for the charms, as they use 'network-get <binding> --primary-address' internally (with 'unit-get private|public-address' only as a fallback). The bundle contains a "bindings" section for setting up the endpoint bindings to spaces for each application.
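The address-selection pattern described above can be sketched as a charm hook might use it; the 'cluster' binding name and the get_addr helper are illustrative, not taken from any of the charms in the bundle:

```shell
# Prefer the space-scoped address for a given endpoint binding via
# network-get; fall back to the older unit-get tool only when
# network-get is unavailable or fails.
get_addr() {
    network-get "$1" --primary-address 2>/dev/null \
        || unit-get private-address
}

# Usage inside a hook:
#   addr=$(get_addr cluster)
```

With endpoint bindings set (via the bundle's "bindings" section or --bind), network-get returns an address from the bound space regardless of which NIC's address happens to be the machine's preferred private address.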

@john, please have a look at the steps and setup. How different is it from what you're trying to set up? With properly configured networks and nodes, it should be easy to replicate and modify as needed.

affects: juju-core → juju
Changed in juju:
milestone: 2.0.0 → none
milestone: none → 2.0.0
Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi Dimiter,

The above reported issue is still valid and unresolved with the latest MAAS/Juju versions.

http://pasteboard.co/10Lqts94j.jpg

http://paste.ubuntu.com/23154383/

cplane@juju:~/dvnd-juju$ juju list-spaces
spaces:
  external:
    10.10.11.0/24:
      type: ipv4
      provider-id: "3"
      status: in-use
      zones:
      - default
  managed:
    10.9.1.0/24:
      type: ipv4
      provider-id: "4"
      status: in-use
      zones:
      - default
  non-maas:
    192.168.122.0/24:
      type: ipv4
      provider-id: "6"
      status: in-use
      zones:
      - default
    192.168.220.0/24:
      type: ipv4
      provider-id: "8"
      status: in-use
      zones:
      - default

Changed in juju:
milestone: 2.0.0 → 2.0-rc1
Revision history for this message
Rajesh Kumar Chaturvedi (rajesh-canonical) wrote :

Hi Dimiter/Narinder,

The various configurations and settings in my lab are as follows:

Maas-nodes:

http://pasteboard.co/dKtR3i3R.jpg

Networks-fabrics-spaces:

http://pasteboard.co/2dw5ifrff.jpg

Node's-network:

http://pasteboard.co/2dx1tPnOy.jpg

Managed-Vlan:

http://pasteboard.co/1nsxt6YE.jpg

Unmanaged-Vlan:

http://pasteboard.co/2dQJ7fxCM.jpg

/etc/network/interfaces contents:

http://paste.ubuntu.com/23169443/

Deployed bundle, using series trusty:

http://paste.ubuntu.com/23169424/

Juju status dump:

http://paste.ubuntu.com/23169329/

Changed in juju:
status: Incomplete → Triaged
Changed in juju:
assignee: Dimiter Naydenov (dimitern) → nobody
Changed in juju:
milestone: 2.0-rc1 → 2.0-rc2
Changed in juju:
milestone: 2.0-rc2 → 2.0.0
tags: added: teamb
tags: added: rteam
removed: teamb
Changed in juju:
assignee: nobody → Richard Harding (rharding)
milestone: 2.0.0 → 2.1.0
milestone: 2.1.0 → 2.0.0
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-rc3 → 2.0.0
Changed in juju:
milestone: 2.0.0 → 2.0.1
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0.1 → none
Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

Was this fixed in 2.0.1, or has it been moved along the timeline?

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Sandor,
We mark fixed issues as "Fix Committed" or "Fix Released". This one has not yet been fixed, and with 2.0.1 out the door, it will be in the next available release.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Marking as Won't Fix for 2.0.x as no further 2.0.x releases are planned.

Changed in juju:
milestone: none → 2.1.0
assignee: Richard Harding (rharding) → nobody
Revision history for this message
John A Meinel (jameinel) wrote :

The bundle described in https://bugs.launchpad.net/juju/+bug/1603473/comments/61 doesn't have any "bindings" sections, which means you are not informing Juju of which space you want the charm to use, and thus we end up guessing.

Things like:
  ceph-radosgw:
    annotations:
      gui-x: '1000'
      gui-y: '250'
    charm: cs:trusty/ceph-radosgw-19
    num_units: 1
    options:
      source: cloud:trusty-liberty
      use-embedded-webserver: true
    to:
    - lxc:0

Could be updated to look like:
  ceph-radosgw:
    annotations:
      gui-x: '1000'
      gui-y: '250'
    charm: cs:trusty/ceph-radosgw-19
    num_units: 1
    options:
      source: cloud:trusty-liberty
      use-embedded-webserver: true
    to:
    - lxc:0
    bindings:
      "": maas-management

The "" binding is a "default for all otherwise unnamed endpoints". If you want one part of an application to use a different subnet, then you can list it explicitly. Eg:
  ceph-radosgw:
    annotations:
      gui-x: '1000'
      gui-y: '250'
    charm: cs:trusty/ceph-radosgw-19
    num_units: 1
    options:
      source: cloud:trusty-liberty
      use-embedded-webserver: true
    to:
    - lxc:0
    bindings:
      "": maas-management
      public: maas-external
      admin: maas-management
# 'admin' is not necessary, as it is covered by "", but being explicit can be good

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Based on comment #65, I am marking this bug as Invalid, as Juju behaves as expected.

Changed in juju:
status: Triaged → Invalid
milestone: 2.1.0 → none