improve reporting of IP address starvation in a subnet (or multiple subnets in a space)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | Medium | Witold Krecicki |
Bug Description
I don't think this is a problem unique to any one charm, so I think there could be better reporting on it.
When IP address starvation happens in a subnet (or in a space with multiple subnets), one just gets a cryptic error about the lack of network config for a binding:
root@juju-
ERROR no network config found for binding "public"
This can be debugged by looking at the machine agent's logs; however, it is still hard to tell why the network config is empty for that specific interface.
root@juju-
2017-10-20 14:32:59 DEBUG juju.worker.
2017-10-20 14:33:02 DEBUG juju.worker.
root@juju-
auto lo eth1 eth0

iface lo inet loopback
    dns-nameservers 10.30.20.21

iface eth0 inet static
    address 10.30.21.250/22
    gateway 10.30.20.21

iface eth1 inet manual
Tracing this to the host, it can be seen that the container's veth interface is properly plugged into a host bridge (so the bridger worked) and that there is an IP address on that bridge interface. This means (and I confirmed it by looking at MAAS) that there is a proper subnet configuration for that interface. The problem is revealed by looking at the subnet's address allocations: there were no more addresses left to allocate for that container.
65: eth1@if66: <BROADCAST,
link/ether 00:16:3e:95:51:7c brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::216:
valid_lft forever preferred_lft forever
root@juju-
...
63: eth0@if64: <BROADCAST,
link/ether 00:16:3e:f6:2b:8c brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.30.21.250/22 brd 10.30.23.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::216:
valid_lft forever preferred_lft forever
65: eth1@if66: <BROADCAST,
link/ether 00:16:3e:95:51:7c brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::216:
valid_lft forever preferred_lft forever
nova003:~$ ip a s | grep if65
66: veth2RWQGO@if65: <BROADCAST,
ubuntu@nova003:~$ brctl show | grep veth2RWQGO
veth2RWQGO
nova003:~$ ip -4 -o a s br-bond0.201
40: br-bond0.201 inet 103.77.105.137/26 brd 103.77.105.191 scope global br-bond0.201\ valid_lft forever preferred_lft forever
Changed in juju:
  milestone: none → 2.3.0
  status: New → Triaged
  importance: Undecided → High
Changed in juju:
  importance: High → Medium
  milestone: 2.3.0 → 2.3-rc1
Changed in juju:
  assignee: nobody → Witold Krecicki (wpk)
Changed in juju:
  status: Triaged → In Progress
Changed in juju:
  status: In Progress → Fix Committed
Changed in juju:
  status: Fix Committed → Fix Released
https://github.com/juju/juju/pull/8084
So, if I understand the patch correctly, the issue is essentially that our retry logic interacts poorly: we create a device, fail to give it an IP address, and then when we retry provisioning we see the device already exists and assume it was set up correctly.
A different fix could be to notice that the device exists and validate it more thoroughly, but removing the device seems OK. It does seem, though, that if a network break were what caused provisioning to fail, you would end up in the same position: you can't finish setting up the device *nor* delete it, because you lost connectivity.
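The retry interaction described in this comment can be sketched as follows. This is a minimal, hypothetical Go model, not Juju's actual code: the `device`, `pool`, `provision`, and `provisionFixed` names are all invented for illustration. It shows how a half-configured device left behind by a failed attempt makes a naive retry "succeed", and how discarding the stale device first (in the spirit of the patch) lets the real allocation error surface instead.

```go
package main

import (
	"errors"
	"fmt"
)

// device models a container NIC as the provisioner sees it (hypothetical).
type device struct {
	name string
	addr string // empty until an IP address has been allocated
}

// pool is a toy stand-in for the subnet's address pool.
type pool struct{ free []string }

func (p *pool) allocate() (string, error) {
	if len(p.free) == 0 {
		return "", errors.New("no addresses available in subnet")
	}
	addr := p.free[0]
	p.free = p.free[1:]
	return addr, nil
}

// provision mimics the buggy flow: the device is created before the address
// is allocated, and if it already exists from an earlier failed attempt it
// is assumed to be fully configured.
func provision(devs map[string]*device, p *pool, name string) (*device, error) {
	if d, ok := devs[name]; ok {
		return d, nil // BUG: d.addr may still be empty
	}
	d := &device{name: name}
	devs[name] = d // created before allocation succeeds
	addr, err := p.allocate()
	if err != nil {
		return nil, err // failure leaves a half-configured device behind
	}
	d.addr = addr
	return d, nil
}

// provisionFixed discards a leftover device that never got an address
// before retrying, so the retry reports the real error.
func provisionFixed(devs map[string]*device, p *pool, name string) (*device, error) {
	if d, ok := devs[name]; ok && d.addr == "" {
		delete(devs, name)
	}
	return provision(devs, p, name)
}

func main() {
	devs := map[string]*device{}
	p := &pool{} // starved pool: nothing left to allocate

	_, err := provision(devs, p, "eth1")
	fmt.Println("first attempt:", err)

	d, err := provision(devs, p, "eth1") // buggy retry
	fmt.Printf("buggy retry: err=%v addr=%q\n", err, d.addr)

	_, err = provisionFixed(devs, p, "eth1") // fixed retry
	fmt.Println("fixed retry:", err)
}
```

Note that even the fixed version would loop forever in the connectivity-loss scenario raised above; the sketch only covers the starvation case, where failing loudly on every retry is the desired behaviour.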