Unable to determine local ip using default route

Bug #2025403 reported by Billy Olsen
This bug affects 2 people
Affects         Status        Importance  Assigned to  Milestone
OpenStack Snap  Fix Released  High        Billy Olsen

Bug Description

On a node with multiple networks attached and a default route, sunbeam cluster bootstrap is unable to determine the default gateway, so bootstrap fails with the stack trace below.

The Python netifaces module does not report a default gateway:

>>> import netifaces
>>> netifaces.gateways()
{'default': {}, 2: [('10.245.128.1', 'ens10f1', False), ('10.1.10.2', 'ens10f2', False), ('10.1.10.2', 'ens10f3', False), ('10.1.24.2', 'ens4f0', False), ('10.1.10.2', 'ens10f0', False)]}

even though the node does have a default gateway:

ubuntu@fleetroc:~$ ip route
default via 10.1.10.2 dev ens10f0 proto static
10.1.10.0/23 dev ens10f3 proto kernel scope link src 10.1.11.126
10.1.10.0/23 dev ens10f2 proto kernel scope link src 10.1.11.172
10.1.10.0/23 dev ens10f0 proto kernel scope link src 10.1.11.71
10.1.24.0/22 dev ens4f0 proto kernel scope link src 10.1.25.5
10.245.128.0/21 dev ens10f1 proto kernel scope link src 10.245.130.10

ubuntu@fleetroc:~$ sunbeam -v cluster bootstrap
[18:16:22] DEBUG Bootstrap node: roles CONTROL,COMPUTE bootstrap.py:139
           DEBUG Updating /home/ubuntu/snap/openstack/common/etc/deploy-sunbeam-machine from /snap/openstack/182/etc/deploy-sunbeam-machine... bootstrap.py:155
           DEBUG Updating /home/ubuntu/snap/openstack/common/etc/deploy-microk8s from /snap/openstack/182/etc/deploy-microk8s... bootstrap.py:155
           DEBUG Updating /home/ubuntu/snap/openstack/common/etc/deploy-microceph from /snap/openstack/182/etc/deploy-microceph... bootstrap.py:155
           DEBUG Updating /home/ubuntu/snap/openstack/common/etc/deploy-openstack from /snap/openstack/182/etc/deploy-openstack... bootstrap.py:155
           DEBUG Updating /home/ubuntu/snap/openstack/common/etc/deploy-openstack-hypervisor from /snap/openstack/182/etc/deploy-openstack-hypervisor... bootstrap.py:155
           DEBUG Starting pre-flight check Check for juju snap common.py:195
           DEBUG Starting pre-flight check Check for ssh-keys interface common.py:195
           DEBUG Starting pre-flight check Check for snap_daemon group membership common.py:195
           DEBUG Starting pre-flight check Check for .local/share directory common.py:195
           DEBUG /var/snap/openstack/common/state/control.socket service.py:109
           DEBUG 2 utils.py:142
                    Traceback (most recent call last):
                      File "/snap/openstack/182/lib/python3.10/site-packages/sunbeam/utils.py", line 140, in __call__
                        return self.main(*args, **kwargs)
                      File "/snap/openstack/182/lib/python3.10/site-packages/click/core.py", line 1055, in main
                        rv = self.invoke(ctx)
                      File "/snap/openstack/182/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
                        return _process_result(sub_ctx.command.invoke(sub_ctx))
                      File "/snap/openstack/182/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
                        return _process_result(sub_ctx.command.invoke(sub_ctx))
                      File "/snap/openstack/182/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
                        return ctx.invoke(self.callback, **ctx.params)
                      File "/snap/openstack/182/lib/python3.10/site-packages/click/core.py", line 760, in invoke
                        return __callback(*args, **kwargs)
                      File "/snap/openstack/182/lib/python3.10/site-packages/sunbeam/commands/bootstrap.py", line 167, in bootstrap
                        plan.append(ClusterInitStep(roles_to_str_list(roles)))
                      File "/snap/openstack/182/lib/python3.10/site-packages/sunbeam/commands/clusterd.py", line 51, in __init__
                        self.ip = utils.get_local_ip_by_default_route()
                      File "/snap/openstack/182/lib/python3.10/site-packages/sunbeam/utils.py", line 61, in get_local_ip_by_default_route
                        interface = netifaces.gateways()["default"][netifaces.AF_INET][1]
                    KeyError: 2
           WARNING An unexpected error has occurred. Please run 'sunbeam inspect' to generate an inspection report. utils.py:147
           ERROR Error: 2
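
The KeyError comes from indexing netifaces.gateways()["default"] with AF_INET (2) when that dict is empty. As a minimal sketch of a more defensive lookup (not the fix that was released for this bug), the hypothetical helper pick_default_interface below takes a dict shaped like the netifaces.gateways() output and returns None instead of raising when no default entry exists:

```python
import socket


def pick_default_interface(gateways, family=socket.AF_INET):
    """Return the default-gateway interface name, or None if there is none.

    `gateways` is a dict shaped like the output of netifaces.gateways().
    This helper is hypothetical and only illustrates the failure mode.
    """
    default = gateways.get("default", {})
    entry = default.get(family)
    if entry is None:
        # The case from this report: 'default' is an empty dict.
        return None
    # entry looks like ('10.1.10.2', 'ens10f0'); index 1 is the interface.
    return entry[1]


# Output copied from the bug description above.
reported = {
    "default": {},
    2: [
        ("10.245.128.1", "ens10f1", False),
        ("10.1.10.2", "ens10f2", False),
        ("10.1.10.2", "ens10f3", False),
        ("10.1.24.2", "ens4f0", False),
        ("10.1.10.2", "ens10f0", False),
    ],
}
print(pick_default_interface(reported))
```

With the output from this report the helper returns None, so a caller could print a clear "no default gateway found" message rather than surfacing "Error: 2".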

Marian Gasparovic (marosg) wrote :

@billy
As I suspected, this looks related to the complicated routing generated by netplan, which creates a lot of ip rules and routing tables.

This is the ip rule output after a fresh deploy:

0: from all lookup local
96: from 10.245.128.0/21 to 10.245.128.0/21 lookup main proto static
97: from 10.1.10.0/23 to 10.1.10.0/23 lookup main proto static
98: from 10.1.10.0/23 to 10.1.10.0/23 lookup main proto static
99: from 10.1.24.0/22 to 10.1.24.0/22 lookup main proto static
100: from 10.1.24.0/22 lookup 4 proto static
100: from 10.1.10.0/23 lookup 3 proto static
100: from 10.1.10.0/23 lookup 2 proto static
100: from 10.245.128.0/21 lookup 1 proto static
32766: from all lookup main
32767: from all lookup default

I played with netplan and got the rules down to

0: from all lookup local
32766: from all lookup main
32767: from all lookup default

and ip r shows
default via 10.1.10.2 dev ens10f0 proto static
default via 10.245.128.1 dev ens10f1 proto static
10.1.10.0/23 dev ens10f0 proto kernel scope link src 10.1.11.201
10.1.10.0/23 dev ens10f3 proto kernel scope link src 10.1.11.74
10.1.10.0/23 dev ens10f2 proto kernel scope link src 10.1.11.225
10.1.24.0/22 dev ens4f0 proto kernel scope link src 10.1.25.5
10.245.128.0/21 dev ens10f1 proto kernel scope link src 10.245.130.10

which is not great (typed characters take too long to appear), but netifaces now reports a default gateway:

>>> import netifaces
>>> netifaces.gateways()
{'default': {2: ('10.1.10.2', 'ens10f0')}, 2: [('10.1.10.2', 'ens10f0', True), ('10.245.128.1', 'ens10f1', False)]}
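
For comparison, a local address can also be found by asking the kernel directly, which applies the ip rules and routing tables itself. The sketch below is an assumption on my part and not what sunbeam does: local_ip_towards is a hypothetical helper that connects a UDP socket to a target address (no packet is sent) and reads back the source address the kernel selected.

```python
import socket


def local_ip_towards(target, port=9):
    """Return the source IPv4 address the kernel would use to reach target.

    Hypothetical helper for illustration. connect() on a UDP socket sends
    no traffic; it only makes the kernel choose a route and source address.
    Raises OSError if the kernel has no route to the target.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((target, port))
        return sock.getsockname()[0]


# Loopback is always routable, so this works without external network.
print(local_ip_towards("127.0.0.1"))
```

On the node in this report, passing the gateway address (for example 10.1.10.2) would be one way to see which source address the kernel picks under the netplan-generated rules.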

Billy Olsen (billy-olsen) wrote :
Changed in snap-openstack:
status: New → Fix Committed
James Page (james-page)
Changed in snap-openstack:
importance: Undecided → High
James Page (james-page)
Changed in snap-openstack:
status: Fix Committed → Fix Released
James Page (james-page)
Changed in snap-openstack:
milestone: none → 2023.1.2
assignee: nobody → Billy Olsen (billy-olsen)