I understand this is a strange network config, but it is a common one for bare-metal machines in public clouds (in this case, Equinix, which is a valid target for someone to install Sunbeam on).
The real goal is to completely ignore the bond0 interface (which carries both the private and public addresses with small netmasks) and to create other real layer-2 networks that exist across all nodes.
There is a detailed installation description on that other bug (mentioned above), but here are the interesting details for a real case:

In /etc/hosts of ALL nodes:
10.0.1.11 sunbeam11.mydomain sunbeam11
10.0.1.12 sunbeam12.mydomain sunbeam12
10.0.1.13 sunbeam13.mydomain sunbeam13
Machine 1: hostname "sunbeam11.mydomain" which resolves to "10.0.1.11", default route via public IP
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 139.178.81.209/31 brd 255.255.255.255 scope global bond0
    inet 10.65.5.133/31 brd 255.255.255.255 scope global bond0:0
6: bond0.1002@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 10.0.1.11/24 brd 10.0.1.255 scope global bond0.1002
7: bond0.1003@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    <no IPs; to be used with OVN>
Machine 2: hostname "sunbeam12.mydomain" which resolves to "10.0.1.12", default route via public IP
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 147.28.187.65/31 brd 255.255.255.255 scope global bond0
    inet 10.65.5.135/31 brd 255.255.255.255 scope global bond0:0
6: bond0.1002@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 10.0.1.12/24 brd 10.0.1.255 scope global bond0.1002
7: bond0.1003@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    <no IPs; to be used with OVN>
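The address that should be used for clustering can be picked out of this output mechanically. A minimal sketch (assuming the management CIDR from the manifest, and with the machine 1 addresses hard-coded for illustration) using Python's standard ipaddress module:

```python
import ipaddress

MGMT_CIDR = ipaddress.ip_network("10.0.1.0/24")

# Addresses as seen in `ip addr` on machine 1 (interface -> address/prefix list).
addrs = {
    "bond0": ["139.178.81.209/31", "10.65.5.133/31"],
    "bond0.1002": ["10.0.1.11/24"],
    "bond0.1003": [],
}

def management_addresses(addrs_by_iface, cidr):
    """Return (iface, ip) pairs whose address falls inside the management CIDR."""
    result = []
    for iface, cidrs in addrs_by_iface.items():
        for a in cidrs:
            ip = ipaddress.ip_interface(a).ip
            if ip in cidr:
                result.append((iface, str(ip)))
    return result

print(management_addresses(addrs, MGMT_CIDR))  # [('bond0.1002', '10.0.1.11')]
```

Only bond0.1002 matches; the bond0 /31s are unambiguously outside the management network, so there is no tie-breaking needed here.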
In the manifest file, I state that I want the "10.0.1.0/24" network for the Sunbeam management network:
deployment:
  bootstrap:
    management_cidr: 10.0.1.0/24
And yet, when adding a second node, the generated token contains IPs from the other interfaces that I'm trying to ignore. On the first node (which is already deployed successfully):
sunbeam cluster add --format yaml --name sunbeam12.mydomain
17:50:54 DEBUG: Got token: eyJuYW1lIjoic3VuYmVhbTEyLm15ZG9tYWluIiwic2VjcmV0IjoiZGZlNGMyMTgxZTIyM2QzMWQ1NDBlZGRhMWJmYjFhODQ3YzJkNTU0Y2YyYzc4NjVkODJiMDQzNWYwYTAxMzczNiIsImZpbmdlcnByaW50IjoiNjJiOTVhOThlNjE2OTc5OTAxNmRlYzdiNzJlZjUxYjA4MDEzYTE1NjYzNDVjNWQ1MzczZDM5M2YwMmY5NjM4YyIsImpvaW5fYWRkcmVzc2VzIjpbIjEzOS4xNzguODEuMjA5OjcwMDAiXX0=
17:50:54 DEBUG: Decoded token: b'{"name":"sunbeam12.mydomain","secret":"dfe4c2181e223d31d540edda1bfb1a847c2d554cf2c7865d82b0435f0a013736","fingerprint":"62b95a98e6169799016dec7b72ef51b08013a1566345c5d5373d393f02f9638c","join_addresses":["139.178.81.209:7000"]}'
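For anyone reproducing this: as the debug output shows, the join token is just base64-encoded JSON, so it is easy to inspect which addresses ended up in it. A minimal round-trip sketch (with hypothetical placeholder values for the secret and fingerprint, not the real ones):

```python
import base64
import json

def decode_join_token(token: str) -> dict:
    """Decode a base64-encoded JSON join token, as seen in the debug log."""
    return json.loads(base64.b64decode(token))

# Build a token with the same shape as the one above (placeholder values).
payload = {
    "name": "sunbeam12.mydomain",
    "secret": "dfe4...",
    "fingerprint": "62b9...",
    "join_addresses": ["139.178.81.209:7000"],
}
token = base64.b64encode(json.dumps(payload).encode()).decode()
print(decode_join_token(token)["join_addresses"])  # ['139.178.81.209:7000']
```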
I expected to see an IP from the 10.0.1.0/24 network in the "join_addresses".
But this token is only what initially forms the Sunbeam cluster. Later, when adding MicroK8s to this machine, the same thing happens again: a token is generated to form the MicroK8s cluster, and in that case I also noticed that it suggests using any of the available IPs, in seemingly random order. We should make it prefer the one from the management network, IMHO.
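The preference I'm suggesting could be as simple as sorting the candidate join addresses so that management-network ones come first. A hypothetical sketch (the function name and the management CIDR default are mine, not from any existing API):

```python
import ipaddress

def prefer_management(join_addresses, mgmt_cidr="10.0.1.0/24"):
    """Sort host:port candidates so addresses inside the management CIDR sort first."""
    net = ipaddress.ip_network(mgmt_cidr)

    def in_mgmt(addr):
        host = addr.rsplit(":", 1)[0]
        return ipaddress.ip_address(host) in net

    # sorted() is stable: management addresses (key False) come before the rest.
    return sorted(join_addresses, key=lambda a: not in_mgmt(a))

print(prefer_management(["139.178.81.209:7000", "10.0.1.11:7000"]))
# ['10.0.1.11:7000', '139.178.81.209:7000']
```

Sorting rather than filtering keeps the other addresses as fallbacks, which seems safer than dropping them outright.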
And finally, this may also be affecting MicroCeph (and this is perhaps where this issue relates to the other one pointed to above, where MicroCeph times out joining its cluster).
The work on implementing "spaces" for Juju might help here, because it could force all Juju-related operations to go through a specific network defined as the management one, but I'm not sure it will automatically solve the issue for the non-Juju parts (like the Sunbeam clustering).