"spaces" are not comprehensible

Bug #1845392 reported by Markus Kienast
Affects: Canonical Juju
Status: Fix Released
Importance: Undecided
Assigned to: Joseph Phillips

Bug Description

Without reading the test scenarios in bridgepolicy_test.go, it is practically impossible to understand "spaces". The online docs concerning spaces are completely inconclusive and insufficient.

And even after reading bridgepolicy_test.go, some of your decisions evade me, but at least I now know that I can stop expecting certain scenarios to work.

I have been trying to get this very simple example to work for days now.
I have been using MAAS 2.4.2 and 2.6.0 and Juju 2.6.8 and 2.6.9 in various combinations.

Basically, I want two networks on my MAAS hosts as well as in the LXD containers. The hosts run ceph-osd, while the ceph-mon units and ceph-fs run in LXD containers on these hosts.

MAAS VLANs, subnets and spaces are configured correctly (well, apparently except for the spaces), I have access to the necessary spaces on the host machines, and everything works well if I do not resort to LXD for the ceph-mon units.

But if I try, I always end up with this error:

no obvious space for container "0/lxd/0", host machine has spaces: storage, storagefront, ...

Test scenarios producing the same error are:

func (s *bridgePolicyStateSuite) TestPopulateContainerLinkLayerDevicesNoValidSpace(c *gc.C) {
 // The host machine will be in 2 spaces, but neither one is 'default',
 // thus we are unable to find a valid space to put the container in.

func (s *bridgePolicyStateSuite) TestPopulateContainerLinkLayerDevicesNoDefaultNoConstraints(c *gc.C) {
 // The host machine will be in 2 spaces, but neither one is 'default',
 // thus we are unable to find a valid space to put the container in.

Looks like I have to go for this scenario to make things work:

func (s *bridgePolicyStateSuite) TestPopulateContainerLinkLayerDevicesTwoBridgesNoSpaces(c *gc.C) {
 // The host machine has 2 network devices and 2 bridges, but none of them
 // are in a known space. The container also has no requested space.
 // In that case, we will use all of the unknown bridges for container
 // devices.

or this

func (s *bridgePolicyStateSuite) TestFindMissingBridgesForContainerNoSpaces(c *gc.C) {
 // There is a "default" and "dmz" space, and our machine has 2 network
 // interfaces, but is not part of any known space. In this circumstance,
 // we should try to bridge all of the unknown space devices, not just one
 // of them. This is our fallback mode when we don't understand the spaces of a machine.

Judging from the test scenarios, it sounds like it is not allowed to request two spaces for a container, which does not appear to make much sense from my point of view. Consider the bundle below, which I have been banging my head against for the last three days.

This is a rather straightforward real-life example.

Ceph is best deployed with a dedicated cluster network (space storage) for background data operations and a dedicated network for client requests (space storagefront).

As I want my ceph-mon nodes to reside on the ceph-osd hosts, I have to go with LXD, as the two do not mix well (they overwrite each other's configs if you put them on the same host directly).

applications:
  ceph-mon:
    charm: 'cs:~openstack-charmers-next/ceph-mon-395'
    num_units: 3
    options:
      expected-osd-count: 8
      ceph-cluster-network: 10.0.100.0/24
      ceph-public-network: 10.0.101.0/24
      pg-autotune: 'true'
    series: bionic
    annotations:
      gui-x: '750'
      gui-y: '500'
    to:
      - 'lxd:0'
      - 'lxd:1'
      - 'lxd:2'
    bindings:
        public: storagefront
        cluster: storage
  ceph-osd:
    charm: 'cs:~openstack-charmers-next/ceph-osd-420'
    num_units: 3
    options:
      autotune: true
      bluestore: true
      bluestore-block-db-size: '90000000000'
      bluestore-db: /dev/nvme0n1
      ceph-cluster-network: 10.0.100.0/24
      ceph-public-network: 10.0.101.0/24
      osd-devices: /dev/sdb /dev/sdc
    series: bionic
    annotations:
      gui-x: '1000'
      gui-y: '500'
    to:
      - '0'
      - '1'
      - '2'
      - '3'
    bindings:
        public: storagefront
        cluster: storage
  ceph-fs:
    charm: 'cs:~openstack-charmers-next/ceph-fs-56'
    num_units: 1
    options:
      ceph-public-network: 10.0.101.0/24
    series: bionic
    annotations:
      gui-x: '488'
      gui-y: '511'
    to:
      - 'lxd:3'
    bindings:
        public: storagefront
  ntp:
    charm: 'cs:ntp-35'
    series: bionic
    annotations:
      gui-x: '1000'
      gui-y: '0'
relations:
  - - 'ceph-osd:mon'
    - 'ceph-mon:osd'
  - - 'ntp:juju-info'
    - 'ceph-osd:juju-info'
  - - 'ceph-mon:mds'
    - 'ceph-fs:ceph-mds'
machines:
  '0':
    series: bionic
    constraints: tags=storage
  '1':
    series: bionic
    constraints: tags=storage
  '2':
    series: bionic
    constraints: tags=storage
  '3':
    series: bionic
    constraints: tags=storage

I'll try again, getting rid of all space assignments beforehand. According to the test scenarios, I should have better luck then.

Reading this in bridgepolicy.go, it sounds like a container can only have ONE single space.

// inferContainerSpaces tries to find a valid space for the container to be
// on. This should only be used when the container itself doesn't have any
// valid constraints on what spaces it should be in.
// If ContainerNetworkingMethod is 'local' we fall back to "" and use lxdbr0.
// If this machine is in a single space, then that space is used. Else, if
// the machine has the default space, then that space is used.
// If neither of those conditions is true, then we return an error.

As said above, it evades me why a single space for a container should be a requirement. Isn't it a much more realistic scenario that I want to attach different containers to different sets of networks (sets as in multiple)?

Be that as it may, the documentation does not mention any such requirement, nor does it explain what the "space" construct is intended to be used for. At first sight it seems redundant, as it appears to be nothing more than a naming scheme that allows charms to request placement in certain subnets without having to type numbers.

Or it might be a way of grouping such subnets into a "space" - that is, into networking scenarios. But if I am not allowed to assign certain subnets to more than one space, the "space" construct seems quite restricted as well and unfit for most conceivable scenarios.

So what is its intended purpose?

Judging from the tests, it seems to be only for partitioning the network into domains of ownership, so department A does not interfere with department B.

However, https://jaas.ai/docs/spaces does not describe it that way:

"From a security standpoint, with spaces, the Juju environment network topology can be organised in a way such that applications possess only the network connectivity they require."

And https://jaas.ai/docs/spaces#heading--use-case sounds a lot like I need access to multiple spaces if I want to connect the HAProxy in the "dmz" to the "cms" space ...

So assigning two spaces to one container should be fine, shouldn't it?

I am offering to update the Juju documentation with a comprehensive explanation if anybody can help me understand its intended purpose.

From what I know so far, I will probably only have one single space for the time being.

My best regards,
Markus

Changed in juju:
status: New → Triaged
assignee: nobody → Joseph Phillips (manadart)
Joseph Phillips (manadart) wrote:

Hi Markus,

Thanks for the thorough report.

We're working on Juju networking and spaces right now. The documentation should get some love in the process, but to summarise: a space is intended to be a collection of subnets with common ingress and egress rules.

There is no mandate that a container has access to just one space. We should be ensuring that the host has a bridge for every space that the container requires access to, based on constraints and endpoint bindings, with those bridges then being the parents of the NIC devices in the container.

The single-space requirement only comes into play when we fail to make a determination in the first case and fall through to choosing one space from those available on the host. If there is more than one, we error.

I am looking into this now because, based on the endpoint bindings in your bundle, the spaces required by the container should be determined correctly without falling back to inference. There might be a race between the bundle processing and the provisioner; I'll try to dig it up.
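
As an aside - a rough, untested sketch rather than something I have verified against your bundle - an explicit spaces constraint on the application should also make the required spaces explicit to the provisioner instead of relying on inference. Using the space names from your bundle:

    ceph-mon:
      # charm, options, placement and bindings as in your bundle above
      constraints: spaces=storage,storagefront

The constraint applies to the machines and containers the units land on, so it should complement, not replace, the endpoint bindings.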

Dmitrii Shcherbakov (dmitriis) wrote:

A default binding needs to be specified to make it work:

    bindings:
        "": oam # <--- this is a default binding to "oam" space
        public: storagefront
        cluster: storage

"oam" would then be used for any endpoint of the ceph-mon charm not explicitly bound to a space (endpoints are entries in metadata.yaml in provides, requires, peers and extra-bindings sections).

There are also spaces for Juju controller HA and Juju agent -> controller communication. If you do not have a default binding and only use 'public' and 'cluster' endpoints, Juju agents won't be able to communicate with your controller.

https://jaas.ai/docs/configuring-controllers
juju-ha-space
juju-mgmt-space
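
As a sketch only (the space name "oam" here is just an illustration), these can be set at bootstrap time, for example by passing a YAML file to juju bootstrap --config:

    # hypothetical controller-config.yaml
    # e.g. juju bootstrap <maas-cloud> <controller-name> --config controller-config.yaml
    juju-ha-space: oam
    juju-mgmt-space: oam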

Spaces resemble L3 networks, but in MAAS they are built out of L2 segments (VLANs) with their associated subnets. Those subnets add up to an address space for the L3 network. The routing constraint for hosts in a space is that they must be able to communicate with each other, which may require additional routing configuration on end hosts and network equipment (e.g. when hosts are multi-homed, see LP: #1737428).

In the simplest case a space has just one VLAN, so if you attach hosts to the same VLAN, no external routing configuration or static routes are needed, because directly connected routes and ARP broadcasts give you direct reachability.

Going back to your original question, it is perfectly possible to use multi-homed (multi-interface) containers by using multiple endpoint bindings. We use them all the time with a default binding specified.

Markus Kienast (elias1884) wrote:

Hi Joseph,

many thanks for your immediate response; rest assured, it is greatly appreciated.

I concur, it must be some kind of race condition.

I have now also tried the scenario in which I have no spaces at all defined in MAAS and therefore also no bindings in my bundle. According to one particular scenario in bridgepolicy_test.go, this should have resulted in bridges being created for all networks present on the host, with veth interfaces created in the LXD containers accordingly and each connected to its respective bridge - but it did not work as described.

I ended up with just one bridge and veth created for the first interface (subnet 10.99.0.0/24).

The only scenario I got to work was creating all the necessary bridges beforehand in MAAS, with their specific subnets assigned but no spaces defined or assigned. This way I finally got my LXD network interfaces created and everything working as expected.

For your convenience I have attached two screenshots showing the network configuration on my four hosts as well as the subnets/spaces overview page. The two bridges br100 and br101 each have one 10GbE interface as their only member (and the veth from the container thereafter).

I am happy to try any scenario you wish and submit the results. But I have pretty much tried everything already, and this is the only scenario which worked.

E.g. I tried the same scenario but with spaces assigned and bound to the charms -> no success. I don't remember the exact result, but for all scenarios except the successful one there were just two possible outcomes:

1. The mentioned complaint about "no obvious space ..."
2. A bridge created only for one interface.

One other thing I should probably mention: I have no Pods defined for this, no "virtualization hosts" or anything, just plain machines. The "lxd:0" placement directives are the only instruction for Juju to put these specific units in LXD containers.

I hope my bug report helps.

My best regards,
Markus

Markus Kienast (elias1884) wrote:

Hi Dmitrii,

many thanks for your explanation.

I stumbled over the default space notation before and had also given it a try, unfortunately with the same result. It does not make a difference; I still got the "no obvious space ..." error.

The race condition Joseph mentioned seems to also apply in this case, unfortunately.

I also presumed the Juju controller would need to be able to communicate with the hosts, which brings me to another bug (I'll have to file it separately). In order to get the Juju controller to bootstrap successfully, I had to "remove" all "disconnected" interfaces. Bootstrapping would otherwise fail with no real explanation in the logs.

My best regards,
Markus

Changed in juju:
status: Triaged → In Progress
Joseph Phillips (manadart) wrote:

This patch just landed. It should eliminate issues with bridging host NICs for containers that would otherwise occur due to a race between unit document creation and machine provisioning.

https://github.com/juju/juju/pull/10697

Should be available in the edge Snap shortly.

Changed in juju:
milestone: none → 2.7-beta1
Changed in juju:
milestone: 2.7-beta1 → 2.7-rc1
Richard Harding (rharding) wrote:

Pulling the milestone on this because there's more to the story than the patches, but it's definitely still a work in progress.

Changed in juju:
milestone: 2.7-rc1 → none
Changed in juju:
status: In Progress → Fix Committed
Ian Booth (wallyworld)
tags: added: maas-provider
removed: maas
Changed in juju:
status: Fix Committed → Fix Released