Improve reporting of IP address starvation in a subnet (or multiple subnets in a space)

Bug #1725356 reported by Dmitrii Shcherbakov
Affects: Canonical Juju
Status: Fix Released
Importance: Medium
Assigned to: Witold Krecicki

Bug Description

I don't think this is a charm-specific problem, so Juju itself could report this situation better.

When IP address starvation happens in a subnet (or in a space with multiple subnets), the only visible symptom is a cryptic error about missing network config for a binding:

root@juju-750932-22-lxd-5:/var/lib/juju/agents/unit-keystone-2/charm# network-get --primary-address public
ERROR no network config found for binding "public"
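For scale, "starvation" here simply means that every usable address in the subnet has already been allocated. As a rough illustration (my own aside, not part of the original report), Python's `ipaddress` module shows how many hosts the 10.30.20.0/22 subnet seen in the logs below can hold:

```python
import ipaddress

# The container's subnet, as seen in the machine agent's logs below.
subnet = ipaddress.ip_network("10.30.20.0/22")

# Total addresses in the block, and usable hosts once the
# network and broadcast addresses are excluded.
total = subnet.num_addresses
usable = total - 2

print(total, usable)  # 1024 1022
```

Once those ~1022 addresses are handed out, any further container provisioning on this subnet will hit the error above.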

This can be debugged by looking at the machine agent's logs; however, it is still hard to tell why the network config is empty for that specific interface.

root@juju-750932-22-lxd-5:/var/lib/juju/agents/unit-keystone-2/charm# grep 'observed network config' /var/log/juju/machine-22-lxd-5.log
2017-10-20 14:32:59 DEBUG juju.worker.machiner machiner.go:172 observed network config updated for "machine-22-lxd-5" to [{1 127.0.0.0/8 65536 0 lo loopback false false loopback 127.0.0.1 [] [] []} {1 ::1/128 65536 0 lo loopback false false loopback ::1 [] [] []} {63 00:16:3e:f6:2b:8c 10.30.20.0/22 1500 0 eth0 ethernet false false static 10.30.21.250 [] [] []} {63 00:16:3e:f6:2b:8c 1500 0 eth0 ethernet false false manual [] [] []} {65 00:16:3e:95:51:7c 1500 0 eth1 ethernet false false manual [] [] []}]
2017-10-20 14:33:02 DEBUG juju.worker.machiner machiner.go:172 observed network config updated for "machine-22-lxd-5" to [{1 127.0.0.0/8 65536 0 lo loopback false false loopback 127.0.0.1 [] [] []} {1 ::1/128 65536 0 lo loopback false false loopback ::1 [] [] []} {2 5a:ac:af:94:c3:9d 1500 0 lxdbr0 bridge false false manual [] [] []} {2 5a:ac:af:94:c3:9d 1500 0 lxdbr0 bridge false false manual [] [] []} {63 00:16:3e:f6:2b:8c 10.30.20.0/22 1500 0 eth0 ethernet false false static 10.30.21.250 [] [] []} {63 00:16:3e:f6:2b:8c 1500 0 eth0 ethernet false false manual [] [] []} {65 00:16:3e:95:51:7c 1500 0 eth1 ethernet false false manual [] [] []}]

root@juju-750932-22-lxd-5:/var/lib/juju/agents/unit-keystone-2/charm# cat /etc/network/interfaces

auto lo eth1 eth0

iface lo inet loopback
  dns-nameservers 10.30.20.21

iface eth0 inet static
  address 10.30.21.250/22
  gateway 10.30.20.21

iface eth1 inet manual

Tracing this to the host shows that the container's veth interface is properly plugged into a host bridge (so the bridger worked), and that the bridge interface has an IP address. This means (and I confirmed it by looking at MAAS) that the subnet configuration for that interface is correct. The real problem only becomes visible when looking at the subnet's address allocations: there were no addresses left to allocate for this container.

root@juju-750932-22-lxd-5:/var/lib/juju/agents/unit-keystone-2/charm# ip a s
...
63: eth0@if64: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:f6:2b:8c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.30.21.250/22 brd 10.30.23.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fef6:2b8c/64 scope link
       valid_lft forever preferred_lft forever
65: eth1@if66: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:95:51:7c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::216:3eff:fe95:517c/64 scope link
       valid_lft forever preferred_lft forever

nova003:~$ ip a s | grep if65
66: veth2RWQGO@if65: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-bond0.201 state UP group default qlen 1000

ubuntu@nova003:~$ brctl show | grep veth2RWQGO
       veth2RWQGO

nova003:~$ ip -4 -o a s br-bond0.201
40: br-bond0.201 inet 103.77.105.137/26 brd 103.77.105.191 scope global br-bond0.201\ valid_lft forever preferred_lft forever
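The host-side tracing above boils down to mapping interface indexes and names to their IPv4 addresses from `ip -4 -o a s` output. As a small sketch (the helper name and parsing are my own, not Juju code), one such line can be picked apart like this:

```python
import re

def parse_ip_oneline(line):
    """Parse one line of `ip -4 -o a s` output into (ifindex, ifname, cidr).

    The one-line format starts with "<ifindex>: <ifname>    inet <cidr> ...".
    Returns None if the line does not match.
    """
    m = re.match(r"(\d+):\s+(\S+)\s+inet\s+(\S+)", line)
    if not m:
        return None
    idx, name, cidr = m.groups()
    return int(idx), name, cidr

# The bridge line from the host above.
line = ("40: br-bond0.201 inet 103.77.105.137/26 "
        "brd 103.77.105.191 scope global br-bond0.201")
print(parse_ip_oneline(line))  # (40, 'br-bond0.201', '103.77.105.137/26')
```

This is the kind of check a better error message could automate: the bridge clearly has an address, so the missing piece is on the allocation side, not the host networking side.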

Tags: cpe-onsite
Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.3.0
status: New → Triaged
importance: Undecided → High
Tim Penhey (thumper)
Changed in juju:
importance: High → Medium
milestone: 2.3.0 → 2.3-rc1
Witold Krecicki (wpk)
Changed in juju:
assignee: nobody → Witold Krecicki (wpk)
Witold Krecicki (wpk)
Changed in juju:
status: Triaged → In Progress
Revision history for this message
John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/8084

So, do I understand the patch correctly that it is essentially our retry logic interacting poorly? We create a device and fail to give it an IP address, but then, when we retry provisioning, we see the device already exists and assume it was set up correctly.

A different fix could be to detect that the device exists and validate it more thoroughly, but removing the device seems OK. It does seem, though, that if a network break caused provisioning to fail, you would be in the same position: you can't finish setting up the device *nor* delete it, because you have lost connectivity.
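The failure mode described in this comment, where a retry sees a half-created device and assumes success, can be sketched roughly as follows. This is an illustrative model only; the names and data structures are mine, not Juju's actual code.

```python
class AddressExhausted(Exception):
    """No free addresses left in the subnet."""

def provision(devices, allocate_ip, name):
    """Naive provisioning: create the device, then assign an address.

    The bug pattern: if allocate_ip fails, the half-created device is
    left behind, and a later retry sees it and assumes it is complete.
    """
    if name in devices:                  # retry path: device exists...
        return devices[name]             # ...so (wrongly) assume it's done
    devices[name] = {"name": name, "ip": None}
    devices[name]["ip"] = allocate_ip()  # may raise AddressExhausted
    return devices[name]

def provision_cleanly(devices, allocate_ip, name):
    """Variant in the spirit of the fix: remove the device on failure
    so retries start from scratch instead of trusting a stale one."""
    if name in devices and devices[name]["ip"] is not None:
        return devices[name]
    devices[name] = {"name": name, "ip": None}
    try:
        devices[name]["ip"] = allocate_ip()
    except AddressExhausted:
        del devices[name]                # don't leave a half-built device
        raise
    return devices[name]
```

With the naive version, a failed first attempt leaves a device with no IP, and the retry happily returns it; the clean version deletes the device on failure, so a later retry (once addresses are free again) provisions it properly.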

Witold Krecicki (wpk)
Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Probably fix-released now.

Changed in juju:
status: Fix Committed → Fix Released