Comment 3 for bug 2056218

Andre Ruiz (andre-ruiz) wrote (last edit):

I understand this is a strange network config, but it is a common one for bare-metal machines in public clouds (in this case Equinix, which is a valid target for someone to install sunbeam on).

The real goal is to completely ignore the bond0 interface, which carries both the private and the public addresses with small (/31) netmasks, and instead use other real layer-2 networks that exist across all nodes.

There is a detailed installation description on that other bug (mentioned above), but here are the interesting details for a real case:

In /etc/hosts of ALL nodes:

10.0.1.11 sunbeam11.mydomain sunbeam11
10.0.1.12 sunbeam12.mydomain sunbeam12
10.0.1.13 sunbeam13.mydomain sunbeam13
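
A quick sanity check: on any node, those hostnames should resolve to the management addresses:

getent hosts sunbeam12.mydomain
# expected: 10.0.1.12       sunbeam12.mydomain sunbeam12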

Machine 1: hostname "sunbeam11.mydomain", which resolves to "10.0.1.11"; default route via the public IP:

5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 139.178.81.209/31 brd 255.255.255.255 scope global bond0
    inet 10.65.5.133/31 brd 255.255.255.255 scope global bond0:0
6: bond0.1002@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 10.0.1.11/24 brd 10.0.1.255 scope global bond0.1002
7: bond0.1003@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    <no-ips, to be used with OVN>

Machine 2: hostname "sunbeam12.mydomain", which resolves to "10.0.1.12"; default route via the public IP:

5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 147.28.187.65/31 brd 255.255.255.255 scope global bond0
    inet 10.65.5.135/31 brd 255.255.255.255 scope global bond0:0
6: bond0.1002@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 10.0.1.12/24 brd 10.0.1.255 scope global bond0.1002
7: bond0.1003@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    <no-ips, to be used with OVN>

In the manifest file, I state that I want the "10.0.1.0/24" network to be used as the sunbeam management network:

deployment:
  bootstrap:
    management_cidr: 10.0.1.0/24

And yet, when adding a second node, the generated token contains IPs from the other interfaces that I'm trying to ignore. On the first node (which is already deployed successfully):

sunbeam cluster add --format yaml --name sunbeam12.mydomain
17:50:54 DEBUG: Got token: eyJuYW1lIjoic3VuYmVhbTEyLm15ZG9tYWluIiwic2VjcmV0IjoiZGZlNGMyMTgxZTIyM2QzMWQ1NDBlZGRhMWJmYjFhODQ3YzJkNTU0Y2YyYzc4NjVkODJiMDQzNWYwYTAxMzczNiIsImZpbmdlcnByaW50IjoiNjJiOTVhOThlNjE2OTc5OTAxNmRlYzdiNzJlZjUxYjA4MDEzYTE1NjYzNDVjNWQ1MzczZDM5M2YwMmY5NjM4YyIsImpvaW5fYWRkcmVzc2VzIjpbIjEzOS4xNzguODEuMjA5OjcwMDAiXX0=
17:50:54 DEBUG: Decoded token: b'{"name":"sunbeam12.mydomain","secret":"dfe4c2181e223d31d540edda1bfb1a847c2d554cf2c7865d82b0435f0a013736","fingerprint":"62b95a98e6169799016dec7b72ef51b08013a1566345c5d5373d393f02f9638c","join_addresses":["139.178.81.209:7000"]}'
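
For reference, the token is just base64-encoded JSON, so the join addresses can be inspected directly (a quick sketch, assuming jq is installed; TOKEN is the string printed above):

echo "$TOKEN" | base64 -d | jq -r '.join_addresses[]'
# prints 139.178.81.209:7000 -- I would expect 10.0.1.11:7000 here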

I expected to see an IP from the 10.0.1.0/24 network in the "join_addresses".
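
Selecting that address would be unambiguous on these machines, since iproute2 can filter addresses by prefix. For example, on the first node:

ip -o addr show to 10.0.1.0/24
# lists only 10.0.1.11/24 on bond0.1002, i.e. the management address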

But then, this token is only used to form the initial sunbeam cluster. Later, when adding MicroK8s to this machine, the same thing happens again: a token is generated to form the microk8s cluster, and in that case I also noticed that it suggests using any of the available IPs, in seemingly random order. We should make it prefer the one on the management network, IMHO.

And finally, this may also be affecting microceph (and this is perhaps where this issue relates to the other one pointed to above, where microceph times out joining its cluster).

The work on implementing "spaces" for juju might help here, because it could force all juju-related operations to go through a specific network, defined as the management one, but I'm not sure it will automatically solve the issue for the other, non-juju parts (like the sunbeam clustering).
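
For illustration, the juju side of that could look roughly like this (a sketch only; the space name "management" and the charm name are placeholders):

juju add-space management 10.0.1.0/24      # define a space over the management subnet
juju spaces                                # verify the space and its subnet
juju deploy some-charm --bind management   # bind the charm's endpoints to that space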