Unable to determine local ip using default route
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Snap | Fix Released | High | Billy Olsen |
Bug Description
On a node with multiple networks attached and a default route, sunbeam cluster bootstrap cannot determine the default gateway, so bootstrap fails with the output below.
The Python netifaces module is unable to determine the default gateway:
>>> import netifaces
>>> netifaces.gateways()
{'default': {}, 2: [('10.245.128.1', 'ens10f1', False), ('10.1.10.2', 'ens10f2', False), ('10.1.10.2', 'ens10f3', False), ('10.1.24.2', 'ens4f0', False), ('10.1.10.2', 'ens10f0', False)]}
Yet the node does have a default gateway:
ubuntu@fleetroc:~$ ip route
default via 10.1.10.2 dev ens10f0 proto static
10.1.10.0/23 dev ens10f3 proto kernel scope link src 10.1.11.126
10.1.10.0/23 dev ens10f2 proto kernel scope link src 10.1.11.172
10.1.10.0/23 dev ens10f0 proto kernel scope link src 10.1.11.71
10.1.24.0/22 dev ens4f0 proto kernel scope link src 10.1.25.5
10.245.128.0/21 dev ens10f1 proto kernel scope link src 10.245.130.10
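The routing table above does contain the gateway that netifaces missed. As a rough illustration only (a hypothetical helper, not part of sunbeam or netifaces), the default gateway can be read back out of this ip route text by taking the first "default via" entry. The parsing assumes the exact line layout printed above.

```python
# Hypothetical helper: find the default gateway by parsing `ip route` text.
# Assumption: default routes use the "default via <gw> dev <iface> ..." layout
# shown above. This is a sketch, not how sunbeam or netifaces work.
from typing import Optional, Tuple

IP_ROUTE_OUTPUT = """\
default via 10.1.10.2 dev ens10f0 proto static
10.1.10.0/23 dev ens10f3 proto kernel scope link src 10.1.11.126
10.1.10.0/23 dev ens10f2 proto kernel scope link src 10.1.11.172
10.1.10.0/23 dev ens10f0 proto kernel scope link src 10.1.11.71
10.1.24.0/22 dev ens4f0 proto kernel scope link src 10.1.25.5
10.245.128.0/21 dev ens10f1 proto kernel scope link src 10.245.130.10
"""


def default_gateway(route_text: str) -> Optional[Tuple[str, str]]:
    """Return (gateway, interface) from the first 'default via' route, or None."""
    for line in route_text.splitlines():
        fields = line.split()
        # Expect: default via <gateway> dev <interface> ...
        if (
            len(fields) >= 5
            and fields[0] == "default"
            and fields[1] == "via"
            and fields[3] == "dev"
        ):
            return fields[2], fields[4]
    return None


print(default_gateway(IP_ROUTE_OUTPUT))  # ('10.1.10.2', 'ens10f0')
```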
ubuntu@fleetroc:~$ sunbeam -v cluster bootstrap
[18:16:22] DEBUG Bootstrap node: roles CONTROL,COMPUTE bootstrap.py:139
DEBUG Updating /home/ubuntu/
DEBUG Updating /home/ubuntu/
DEBUG Updating /home/ubuntu/
DEBUG Updating /home/ubuntu/
DEBUG Updating /home/ubuntu/
DEBUG Starting pre-flight check Check for juju snap common.py:195
DEBUG Starting pre-flight check Check for ssh-keys interface common.py:195
DEBUG Starting pre-flight check Check for snap_daemon group membership common.py:195
DEBUG Starting pre-flight check Check for .local/share directory common.py:195
DEBUG /var/snap/
DEBUG 2 utils.py:142
WARNING An unexpected error has occurred. Please run 'sunbeam inspect' to generate an inspection report. utils.py:147
ERROR Error: 2
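The final error is just "2". One plausible reading, not confirmed by the log, is a KeyError raised when code indexes the empty 'default' entry with AF_INET, which is 2 on Linux. The sketch below reproduces that kind of failure against the netifaces output captured above; the lookup expression gateways["default"][AF_INET] is an assumption about how the gateway is read, not sunbeam's actual code.

```python
# Assumption: the caller reads the default IPv4 gateway as
# gateways()["default"][AF_INET], a common netifaces idiom.
import socket

# Plain int, like netifaces.AF_INET (2 on Linux). int() matters: socket.AF_INET
# is an IntEnum, and a KeyError raised with it would print
# "<AddressFamily.AF_INET: 2>" instead of "2".
AF_INET = int(socket.AF_INET)

# netifaces.gateways() output captured on the affected node (copied from above).
gateways = {
    "default": {},
    2: [
        ("10.245.128.1", "ens10f1", False),
        ("10.1.10.2", "ens10f2", False),
        ("10.1.10.2", "ens10f3", False),
        ("10.1.24.2", "ens4f0", False),
        ("10.1.10.2", "ens10f0", False),
    ],
}

message = None
try:
    gateway, iface = gateways["default"][AF_INET]
except KeyError as exc:
    # str(KeyError(2)) is "2", the same bare text as "ERROR Error: 2".
    message = f"Error: {exc}"

print(message)  # Error: 2
```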
Changed in snap-openstack:
importance: Undecided → High
Changed in snap-openstack:
status: Fix Committed → Fix Released
Changed in snap-openstack:
milestone: none → 2023.1.2
assignee: nobody → Billy Olsen (billy-olsen)
@billy
As I suspected, this looks related to the complex routing configuration generated by netplan, which creates many ip rules pointing at separate routing tables.
This is the ip rule output after a fresh deployment:
0: from all lookup local
96: from 10.245.128.0/21 to 10.245.128.0/21 lookup main proto static
97: from 10.1.10.0/23 to 10.1.10.0/23 lookup main proto static
98: from 10.1.10.0/23 to 10.1.10.0/23 lookup main proto static
99: from 10.1.24.0/22 to 10.1.24.0/22 lookup main proto static
100: from 10.1.24.0/22 lookup 4 proto static
100: from 10.1.10.0/23 lookup 3 proto static
100: from 10.1.10.0/23 lookup 2 proto static
100: from 10.245.128.0/21 lookup 1 proto static
32766: from all lookup main
32767: from all lookup default
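To make the structure of those rules easier to see, here is a small parsing sketch (hypothetical helper names, assuming the exact "<priority>: <selector> lookup <table> ..." layout printed above). It shows that besides the standard local, main and default tables, there are four source-based rules at priority 100 that send traffic from each subnet to tables 1 to 4.

```python
# Sketch: list the routing tables referenced by the `ip rule` output above.
# Assumption: every line follows "<prio>: <selector> lookup <table> ...".
IP_RULE_OUTPUT = """\
0: from all lookup local
96: from 10.245.128.0/21 to 10.245.128.0/21 lookup main proto static
97: from 10.1.10.0/23 to 10.1.10.0/23 lookup main proto static
98: from 10.1.10.0/23 to 10.1.10.0/23 lookup main proto static
99: from 10.1.24.0/22 to 10.1.24.0/22 lookup main proto static
100: from 10.1.24.0/22 lookup 4 proto static
100: from 10.1.10.0/23 lookup 3 proto static
100: from 10.1.10.0/23 lookup 2 proto static
100: from 10.245.128.0/21 lookup 1 proto static
32766: from all lookup main
32767: from all lookup default
"""


def parse_rules(text):
    """Yield (priority, selector, table) for each rule line."""
    for line in text.splitlines():
        prio, _, rest = line.partition(":")
        selector, _, after = rest.partition(" lookup ")
        table = after.split()[0]  # first word after "lookup" is the table
        yield int(prio), selector.strip(), table


rules = list(parse_rules(IP_RULE_OUTPUT))
tables = sorted({table for _, _, table in rules})
# Rules that route by source address into the extra numbered tables.
extra = [(p, s, t) for p, s, t in rules if t not in ("local", "main", "default")]

print(tables)  # ['1', '2', '3', '4', 'default', 'local', 'main']
for rule in extra:
    print(rule)
```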
After experimenting with the netplan configuration, I got the rules down to:
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
and ip r now shows:
default via 10.1.10.2 dev ens10f0 proto static
default via 10.245.128.1 dev ens10f1 proto static
10.1.10.0/23 dev ens10f0 proto kernel scope link src 10.1.11.201
10.1.10.0/23 dev ens10f3 proto kernel scope link src 10.1.11.74
10.1.10.0/23 dev ens10f2 proto kernel scope link src 10.1.11.225
10.1.24.0/22 dev ens4f0 proto kernel scope link src 10.1.25.5
10.245.128.0/21 dev ens10f1 proto kernel scope link src 10.245.130.10
This is not ideal (typed characters take a long time to appear on screen), but netifaces now reports the default gateway:
>>> import netifaces
>>> netifaces.gateways()
{'default': {2: ('10.1.10.2', 'ens10f0')}, 2: [('10.1.10.2', 'ens10f0', True), ('10.245.128.1', 'ens10f1', False)]}
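With the reduced rule set, the 'default' entry is populated. Comparing the two captured outputs, a lookup that uses dict.get instead of indexing returns None rather than raising when no default is reported, so a caller could print a clear message instead of a bare "Error: 2". This is a hypothetical helper for illustration, not the fix that was released.

```python
# Sketch of a defensive default-gateway lookup over netifaces.gateways() output.
# Assumption: callers prefer None over an exception when no default is reported.
from typing import Optional, Tuple

AF_INET = 2  # value of netifaces.AF_INET on Linux

# Outputs captured on the node before and after the netplan change (from above).
BEFORE = {
    "default": {},
    2: [
        ("10.245.128.1", "ens10f1", False),
        ("10.1.10.2", "ens10f2", False),
        ("10.1.10.2", "ens10f3", False),
        ("10.1.24.2", "ens4f0", False),
        ("10.1.10.2", "ens10f0", False),
    ],
}
AFTER = {
    "default": {2: ("10.1.10.2", "ens10f0")},
    2: [("10.1.10.2", "ens10f0", True), ("10.245.128.1", "ens10f1", False)],
}


def default_ipv4_gateway(gws: dict) -> Optional[Tuple[str, str]]:
    """Return (gateway, interface) for the IPv4 default route, or None."""
    entry = gws.get("default", {}).get(AF_INET)
    if entry is None:
        return None
    return entry[0], entry[1]


print(default_ipv4_gateway(BEFORE))  # None
print(default_ipv4_gateway(AFTER))   # ('10.1.10.2', 'ens10f0')
```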