cmr not using space binding info correctly in network-get

Bug #1848392 reported by Ian Booth
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: High
Assigned to: Joseph Phillips
Milestone: none

Bug Description

As per

https://discourse.jujucharms.com/t/incorrect-binding-used-for-cmr/2232

it seems network-get needs some work to correctly use endpoint bindings when run in the context of a cross model relation.

Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
Edward Hope-Morley (hopem) wrote :

The version of Juju used to reproduce the problem linked from the GH PR in discourse was 2.6.9 (for both controller and model).

Edward Hope-Morley (hopem) wrote :

Another potentially relevant piece of information is that Juju is not letting me use --via when relating to the remote model's offer:

ubuntu@bionic-212844:~$ juju status| grep -A 2 SAAS
SAAS                Status  Store    URL
vault-certificates  active  ostctrl  admin/ost.vault-certificates
vault-secrets       active  ostctrl  admin/ost.vault-secrets
ubuntu@bionic-212844:~$ juju add-relation --via 10.100.0.0/24 kubernetes-master:vault-kv vault-secrets
ERROR the --via option can only be used when relating to offers in a different model

Joseph Phillips (manadart) wrote :

The information pertinent to this bug is scattered across several locations, so I'll attempt to bring it together here.

The issue was raised on Discourse here:
https://discourse.jujucharms.com/t/incorrect-binding-used-for-cmr/2232

It came up during testing of this patch:
https://github.com/openstack-charmers/charm-interface-vault-kv/pull/6

There is a fundamental issue here around cross-model relations in that when a relation is established, the only guarantee is that the controllers can communicate. There is no explicit contract or recording of data that declares a network route between the two *units*.

The old version of the charm logic works in some cases. I think this is when:
- There is only one subnet in play on either end and those subnets happen to be routable to each other.
- The relation endpoint is bound to a space with one or more subnets that are routable.
- The relation endpoint is unbound and the unit's public address happens to be in a routable subnet.

I believe the new charm logic (using network_get) would work under the same circumstances.

It fails here, because the secrets endpoint is not bound to a space. When calling network_get in the context of a relation, the question being asked is "as a unit, what do I know about my networking in this context?". When the relation endpoint is not bound to a space, the unit's default public address is returned. In this case it is the local cloud address that the other end knows nothing about.
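The behaviour described above can be sketched as a small lookup. This is only a simplified model of what the comment describes, not Juju's implementation: the function name, the dictionaries, and the example addresses are all hypothetical.

```python
from typing import Dict, Optional


def address_for_endpoint(
    endpoint: str,
    bindings: Dict[str, Optional[str]],
    address_by_space: Dict[str, str],
    default_address: str,
) -> str:
    """Toy model: pick the address a unit would report for an endpoint.

    If the endpoint is bound to a space, use the unit's address in that
    space. Otherwise fall back to the unit's default address.
    """
    space = bindings.get(endpoint)
    if space is not None:
        return address_by_space[space]
    return default_address


# Hypothetical unit with one address per space.
addresses = {"external": "10.100.0.15", "internal": "192.168.0.15"}

# Bound endpoint: the address in the bound space is reported.
bound = address_for_endpoint(
    "secrets", {"secrets": "external"}, addresses, "192.168.0.15"
)

# Unbound endpoint: the default address is reported, which the remote
# end of a cross-model relation may not be able to reach.
unbound = address_for_endpoint(
    "secrets", {"secrets": None}, addresses, "192.168.0.15"
)
```

With the endpoint unbound, the fallback address is chosen without any knowledge of what the remote model can route to, which is the failure mode reported here.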

The Juju team has discussed some options that might help here, such as adding parameters to the offer/consume commands (for example, an explicit space to be used for communication), but at present they are just ideas.

Options to make this work with the current Juju primitives are:

1) Bind the "secrets" endpoint to the "external" space, so that there is an explicit statement that the external subnet is the one used in the relation. This is undesirable when Vault is servicing both the under and over clouds, as it means the under-cloud uses the unit's virtual IP.

2) Modify the charm with another relation endpoint for use by external consumers, and bind each of the endpoints to the appropriate space.

I'm sure this will need further discussion, so if I have anything wrong in my assumptions, or if any of this is unclear, please contact me via any of the usual channels.

Changed in juju:
milestone: 2.7-beta1 → 2.7-rc1
Cory Johns (johnsca) wrote :

> I believe the new charm logic (using network_get) would work under the same circumstances.

I think there is still one edge case where the network_get logic would succeed while the existing logic would fail. Consider Vault on an instance with two networks, E and L, where E is external and bound to the "secrets" endpoint (or "secrets-external" if that's added). Then the operator wants to make a CMR to K8s on network P, which is an entirely different subnet from E but is routable to it (and P is not routable to L). In this case, the address from K8s over the relation will not match against either E or L, since it is an address in P. However, since the "secrets" endpoint is explicitly bound to E, network_get will return the right thing: the ingress-address in E. This was the case I was trying to cover in that PR, but I incorrectly assumed that Juju would be able to magically figure out which of E or L to return from network_get without an explicit binding, which it obviously can't, for the reasons given in the rest of your comment.
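To make the edge case concrete, here is a minimal sketch of subnet matching using Python's standard ipaddress module. It assumes the existing logic works by matching the remote address against the local subnets, which is only my reading of the description above. The function name and the CIDRs chosen for E, L and P are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def match_by_subnet(remote_addr: str, local_subnets: Dict[str, str]) -> Optional[str]:
    """Return the name of the local subnet containing remote_addr, or None."""
    remote = ipaddress.ip_address(remote_addr)
    for name, cidr in local_subnets.items():
        if remote in ipaddress.ip_network(cidr):
            return name
    return None


# Hypothetical subnets on the Vault instance.
vault_subnets = {"E": "10.10.0.0/24", "L": "192.168.0.0/24"}

# Hypothetical K8s address in P, which is routable to E but is not part of it.
k8s_address = "172.16.5.20"

# No local subnet contains the remote address, so matching finds nothing,
# even though traffic between P and E would actually work.
result = match_by_subnet(k8s_address, vault_subnets)
```

An explicit binding of the endpoint to E sidesteps the match entirely, because the address to advertise is then chosen from the binding rather than inferred from the remote address.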

Since the CMR doesn't imply any guarantee of connectivity between the units, I wonder if it would be reasonable to have the unit agents on either end of the relation perform some sort of "connectivity sanity check"? If we did that as part of the process of establishing the relation, then it could be rejected with a useful message to the operator, but I'm not sure that fits in with Juju's current process for establishing a relation. OTOH, if we had relation status, then Juju could report connectivity issues there, similar to reporting provisioning failures for machines.
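As a rough illustration of what such a check could look like at its simplest, the sketch below attempts a TCP connection to the advertised address. It is not an existing Juju feature. The function name is hypothetical, and which host and port to probe would be up to the charm or agent.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable networks and timeouts.
        return False
```

A unit on one side of the relation could call can_connect with the ingress-address received from the other side and report a failure to the operator. A successful TCP connect only shows reachability on that one port, so this is a sanity check rather than a guarantee.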

Canonical Juju QA Bot (juju-qa-bot) wrote :

I think your situation is part of what we need to figure out for the future spec. To make it clear to Juju what should happen when you CMR the E and P applications, we should be set up such that either the offer side of the CMR is aware of both P and E, or the consume side presents information that lets Juju understand that E and P are routable, etc.

There are a couple of issues. A relation doesn't necessarily imply a network traffic setup, and the network understanding should be in place when an application is deployed, so that we make sure the machine is set up with the interfaces it needs. This is a weakness of the "offer an endpoint on a space" approach: it would be easy to end up with a machine that is not set up to do that, since offering in a CMR comes later in the process than a deploy.

If P and E are routable and share ingress/egress, are they actually the same space in Juju's world? Should we allow the operator to add P to the spaces definition even though none of the applications in the model are leveraging that subnet? Does that give us the connective tissue we need in your proposed scenario?

Richard Harding (rharding) wrote :

Sorry, that was me above. That's the cost of spending the day logged in as a bot doing releasy stuff.

Richard Harding (rharding) wrote :

Removing the milestone as this is something we're actively designing/improving but is going to be part of a bigger picture for future releases.

Changed in juju:
milestone: 2.7-rc1 → none
assignee: Joseph Phillips (manadart) → nobody
Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
Pen Gale (pengale)
Changed in juju:
milestone: none → 3.0.0
Changed in juju:
milestone: 3.0.0 → 3.0.1
Changed in juju:
milestone: 3.0.1 → 3.0.2
Changed in juju:
milestone: 3.0.2 → 3.0.3
Changed in juju:
milestone: 3.0.3 → none
This report contains Public information  
Everyone can see this information.
