container/lxd/initialisation_linux.go findNextAvailableIPv4Subnet should be careful about subnet allocation
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | Horacio Durán | 2.2.0
Bug Description
This function aims to find an available subnet for the lxd bridge that Juju will use on each machine to spawn LXD containers.
First of all, it assumes it can use 10.0.0.0/16, so if the machine actually sits on that network it may run into trouble. A fix strategy would be to look at which network the current machine is on and select something outside that range:
* if the machine is on 10.0.0.0/16, move LXD to a 172.16.0.0/20
* if it is on 172.16.0.0, move to 192.168.0.0/24
In any case, select something smaller than the network the machine is standing on, to minimize collision risk (see the sketch below).
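A minimal sketch of that selection strategy, assuming we only consult the host's interface addresses (the candidate blocks and all helper names here are illustrative, not Juju's actual implementation):

```go
package main

import (
	"fmt"
	"net"
)

// candidateBlocks follows the ordering suggested above: prefer 10.0.0.0/16,
// fall back to a 172.16.0.0/20, then a 192.168.0.0/24.
var candidateBlocks = []string{"10.0.0.0/16", "172.16.0.0/20", "192.168.0.0/24"}

// hostAddrs collects the IPs bound to the machine's interfaces. Note this
// misses routed-but-unbound subnets, one reason probing is also suggested.
func hostAddrs() ([]net.IP, error) {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return nil, err
	}
	var ips []net.IP
	for _, a := range addrs {
		if ipnet, ok := a.(*net.IPNet); ok {
			ips = append(ips, ipnet.IP)
		}
	}
	return ips, nil
}

// pickBridgeBlock returns the first candidate block containing none of the
// host's addresses, reducing the risk of shadowing a real subnet.
func pickBridgeBlock() (*net.IPNet, error) {
	ips, err := hostAddrs()
	if err != nil {
		return nil, err
	}
	for _, c := range candidateBlocks {
		_, block, err := net.ParseCIDR(c)
		if err != nil {
			return nil, err
		}
		clash := false
		for _, ip := range ips {
			if block.Contains(ip) {
				clash = true
				break
			}
		}
		if !clash {
			return block, nil
		}
	}
	return nil, fmt.Errorf("no unused RFC 1918 block found for the LXD bridge")
}

func main() {
	if block, err := pickBridgeBlock(); err == nil {
		fmt.Println("lxdbr0 candidate:", block)
	}
}
```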
Then, once the subnet is selected, there is no check that the host is not already using it in some manner. It is actually not that simple to assess whether a subnet is OK. A couple of ideas:
* when in doubt, ask the user for a proper subnet
* probe the selected subnet before committing to it, walking hosts as 1, 254, 2, 253... instead of 1, 2, 3... to maximize the chance of catching the gateway (usually the first or last IP of the subnet); see the sketch below
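A sketch of that probe ordering for a /24 (the probe itself, ARP or ICMP, needs raw sockets and is left as a placeholder here; the 10.0.2.x addresses are just this bug's example values):

```go
package main

import "fmt"

// probeOrder returns the host octets of a /24 in the order 1, 254, 2, 253,
// ... so that addresses at either edge of the subnet, where a gateway
// usually sits, are probed first.
func probeOrder() []int {
	order := make([]int, 0, 254)
	for lo, hi := 1, 254; lo <= hi; lo, hi = lo+1, hi-1 {
		order = append(order, lo)
		if lo != hi {
			order = append(order, hi)
		}
	}
	return order
}

func main() {
	fmt.Println(probeOrder()[:6]) // [1 254 2 253 3 252]
	for _, host := range probeOrder() {
		addr := fmt.Sprintf("10.0.2.%d", host)
		// Placeholder: send an ARP request or ICMP echo to addr and reject
		// the candidate subnet on any reply.
		_ = addr
	}
}
```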
To reproduce a corner case where this issue fails a deployment:
1. Set up a VPC with CIDR 10.0.0.0/16 on AWS (or any other cloud) with 3 subnets, one per AZ of the region, that have consecutive CIDRs (e.g. 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24)
2. Deploy a workload such as k8s core that places one of its charms in an LXD container
3. Scale out so that some consumers of that container land in a second subnet
With k8s core, at this stage the master goes into the first subnet, 10.0.1.0/24. It gets an LXD container for easyrsa, and that container is given a 10.0.2.0/24 network (the next "available" subnet).
At this point, the master cannot talk to the real 10.0.2.0/24 subnet anymore, as the lxdbr0 route takes precedence.
Any node in that subnet will fail.
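For illustration, a small sketch of the overlap check that would have caught this case; two whole subnets overlap exactly when either contains the other's base address (the VPC values are taken from the steps above):

```go
package main

import (
	"fmt"
	"net"
)

// overlaps reports whether two subnets share any addresses: since both are
// whole networks, it suffices to test each base address against the other.
func overlaps(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

func main() {
	bridge := mustCIDR("10.0.2.0/24") // what the bridge ended up on
	for _, s := range []string{"10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"} {
		if overlaps(mustCIDR(s), bridge) {
			fmt.Printf("lxdbr0 %s shadows VPC subnet %s\n", bridge, s)
		}
	}
}
```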
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.2.0
Changed in juju:
assignee: nobody → Horacio Durán (hduran-8)
status: Triaged → In Progress
Changed in juju:
status: Fix Committed → Fix Released
It's true the existing subnet code is short-sighted. But the intent has always been to move away from containers behind NAT, and I'd rather spend the effort on getting containers into the provider subnet than on a slightly better guessing game.
There are two directions we're looking at for next steps:
1) containers in the host/provider subnet, where the substrate allows it
2) containers on a fan bridge, with the fan configured on all machines in the model, so everything within the model can talk to all the containers as if they were part of the host network (a sketch of the fan's address mapping follows below)
If we're going to put effort into making this better, I think we're better off moving toward one of those ends.
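For context on option 2, a hedged sketch of the standard Ubuntu Fan address mapping (a /16 underlay onto a /8 overlay, so each host's two low underlay octets select a dedicated /24 slice for its containers; the 252.0.0.0/8 overlay is just the conventional example value, not anything this bug prescribes):

```go
package main

import (
	"fmt"
	"net"
)

// fanSlice maps a host's underlay address (within a /16) onto its /24 slice
// of a /8 overlay: e.g. 172.31.5.6 -> 252.5.6.0/24. Container subnets are
// then derived from host addresses rather than guessed.
func fanSlice(host net.IP, overlayFirstOctet byte) *net.IPNet {
	h := host.To4()
	return &net.IPNet{
		IP:   net.IPv4(overlayFirstOctet, h[2], h[3], 0),
		Mask: net.CIDRMask(24, 32),
	}
}

func main() {
	for _, h := range []string{"172.31.5.6", "172.31.7.8"} {
		fmt.Printf("host %s -> container subnet %s\n", h, fanSlice(net.ParseIP(h), 252))
	}
}
```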