ovn-bgp-agent fails on startup if interface names > 16 symbols

Bug #2054599 reported by Dmitriy Rabotyagov
This bug affects 1 person
Affects        Status       Importance  Assigned to  Milestone
ovn-bgp-agent  In Progress  High        Unassigned

Bug Description

When a user makes the mistake of defining an interface name for bgp_vrf or bgp_nic that is too long, ovn-bgp-agent fails early on startup, because it tries to re-create an interface that already exists.

For example:

[DEFAULT]
....
bgp_vrf = BGP-LOOPBACK-NETWORK

This results in the following stack trace on the second restart:

2024-02-21 17:53:06.612 64903 ERROR oslo_service.service [-] Error starting thread.: ovn_bgp_agent.exceptions.InterfaceAlreadyExists: Interface BGP-LOOPBACK-NET already exists.
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service Traceback (most recent call last):
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service File "/openstack/venvs/neutron-28.0.1/lib/python3.10/site-packages/oslo_service/service.py", line 806, in run_service
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service service.start()
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service File "/openstack/venvs/neutron-28.0.1/lib/python3.10/site-packages/ovn_bgp_agent/agent.py", line 41, in start
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service self.agent_driver.start()
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service File "/openstack/venvs/neutron-28.0.1/lib/python3.10/site-packages/ovn_bgp_agent/drivers/openstack/nb_ovn_bgp_driver.py", line 111, in start
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service bgp_utils.ensure_base_bgp_configuration()
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service File "/openstack/venvs/neutron-28.0.1/lib/python3.10/site-packages/ovn_bgp_agent/drivers/openstack/utils/bgp.py", line 42, in ensure_base_bgp_configuration
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service linux_net.ensure_vrf(CONF.bgp_vrf, CONF.bgp_vrf_table_id)
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service File "/openstack/venvs/neutron-28.0.1/lib/python3.10/site-packages/ovn_bgp_agent/utils/linux_net.py", line 104, in ensure_vrf
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service ovn_bgp_agent.privileged.linux_net.ensure_vrf(vrf_name, vrf_table)
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service File "/openstack/venvs/neutron-28.0.1/lib/python3.10/site-packages/oslo_privsep/priv_context.py", line 271, in _wrap
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service return self.channel.remote_call(name, args, kwargs,
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service File "/openstack/venvs/neutron-28.0.1/lib/python3.10/site-packages/oslo_privsep/daemon.py", line 215, in remote_call
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service raise exc_type(*result[2])
2024-02-21 17:53:06.612 64903 ERROR oslo_service.service ovn_bgp_agent.exceptions.InterfaceAlreadyExists: Interface BGP-LOOPBACK-NET already exists.

This happens because the interface name is trimmed to a valid length when the interface is created. However, when the agent tries to ensure the interface state, it looks the interface up by the untrimmed (too long) name and catches NetworkInterfaceNotFound.
When the create request is then sent, the name is trimmed, and an interface with the trimmed name already exists:

https://opendev.org/openstack/ovn-bgp-agent/src/commit/921e39ba02411d42028088c842f7aa8d52b633fd/ovn_bgp_agent/privileged/linux_net.py#L57-L61
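
The sequence described above can be reproduced with a small self-contained sketch. Everything here is a hypothetical stand-in rather than the agent's real code: the `existing` set plays the role of the kernel's interface table, `set_device_up` and `create_interface` are simplified, and the trim length of 16 is taken from this report.

```python
# Hypothetical stand-ins that mimic the behaviour described in this report.
MAX_IFNAME_LEN = 16  # assumption: trim length as stated in the report


class NetworkInterfaceNotFound(Exception):
    pass


class InterfaceAlreadyExists(Exception):
    pass


existing = set()  # stands in for the kernel's interface table


def set_device_up(name):
    # Looks the interface up by the name exactly as configured (no trimming).
    if name not in existing:
        raise NetworkInterfaceNotFound(name)


def create_interface(name):
    # Trims the name before creating the interface.
    name = name[:MAX_IFNAME_LEN]
    if name in existing:
        raise InterfaceAlreadyExists(name)
    existing.add(name)


def ensure_vrf(name):
    try:
        set_device_up(name)
    except NetworkInterfaceNotFound:
        create_interface(name)


ensure_vrf("BGP-LOOPBACK-NETWORK")  # first start: creates the trimmed name
try:
    ensure_vrf("BGP-LOOPBACK-NETWORK")  # second start: lookup misses again
except InterfaceAlreadyExists as exc:
    print(f"Interface {exc} already exists.")
```

The first call succeeds because nothing exists yet. The second call fails because the lookup and the creation disagree about the name.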

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovn-bgp-agent (master)
Changed in ovn-bgp-agent:
status: New → In Progress
Changed in ovn-bgp-agent:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to ovn-bgp-agent (master)

Reviewed: https://review.opendev.org/c/openstack/ovn-bgp-agent/+/909788
Committed: https://opendev.org/openstack/ovn-bgp-agent/commit/1ac77aba49c36614628adca010d2815f3b520dee
Submitter: "Zuul (22348)"
Branch: master

commit 1ac77aba49c36614628adca010d2815f3b520dee
Author: Dmitriy Rabotyagov <email address hidden>
Date: Wed Feb 21 19:53:51 2024 +0100

    Trimm interface name consistently

    In create_interface method we trimm interface name up to 16 symbols,
    however in all following methods we do not care about same trimm, which
    causes calls to such interfaces fail with NetworkInterfaceNotFound

    Closes-Bug: #2054599
    Change-Id: I15f773afa64079eec6534c53eab0e9b7dd796d5f
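
The idea in the commit message, applying the same trim everywhere a name is used, could look roughly like the sketch below. This is not the merged patch: `trim_ifname`, the `existing` set, and the simplified functions are hypothetical, and the length of 16 is taken from the commit message.

```python
MAX_IFNAME_LEN = 16  # assumption: length quoted in the commit message


class NetworkInterfaceNotFound(Exception):
    pass


existing = set()  # stands in for the kernel's interface table


def trim_ifname(name):
    """Return the name in the form that create_interface stores it."""
    return name[:MAX_IFNAME_LEN]


def create_interface(name):
    existing.add(trim_ifname(name))


def set_device_up(name):
    # The lookup now uses the same trimmed name as the creation.
    name = trim_ifname(name)
    if name not in existing:
        raise NetworkInterfaceNotFound(name)
    return name


def ensure_vrf(name):
    try:
        return set_device_up(name)
    except NetworkInterfaceNotFound:
        create_interface(name)
        return trim_ifname(name)
```

With a single helper used by both paths, calling `ensure_vrf` repeatedly with the same long name finds the interface created on the first call instead of trying to create it again.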

Changed in ovn-bgp-agent:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovn-bgp-agent (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/ovn-bgp-agent/+/910304

OpenStack Infra (hudson-openstack) wrote : Fix merged to ovn-bgp-agent (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/ovn-bgp-agent/+/910304
Committed: https://opendev.org/openstack/ovn-bgp-agent/commit/e37b16bb9c440c10256a74be88833f80905bb2aa
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit e37b16bb9c440c10256a74be88833f80905bb2aa
Author: Dmitriy Rabotyagov <email address hidden>
Date: Wed Feb 21 19:53:51 2024 +0100

    Trimm interface name consistently

    In create_interface method we trimm interface name up to 16 symbols,
    however in all following methods we do not care about same trimm, which
    causes calls to such interfaces fail with NetworkInterfaceNotFound

    Closes-Bug: #2054599
    Change-Id: I15f773afa64079eec6534c53eab0e9b7dd796d5f
    (cherry picked from commit 1ac77aba49c36614628adca010d2815f3b520dee)

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Folks, although this was partially solved by the patch above, it seems there are more places that need to be fixed.
Going further down the road, I found another related issue: https://paste.openstack.org/show/bGynX5STHDxOwvOGLtPh/

In this case the interface name also ended up trimmed:
# ip l | grep br-provider
15: br-provider: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
24: br-provider.311@br-provider: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
#

As you can see, the interface is `br-provider.311`, while it is expected to be `br-provider.3112`.
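
A short sketch of why the last digit disappears. On Linux, IFNAMSIZ is 16 and includes the terminating NUL byte, so a name can hold at most 15 visible characters, and `br-provider.3112` has 16. Where exactly the truncation happens in this case is not stated in the report. The helper `vlan_device_name` is hypothetical and only illustrates one possible way to surface the problem instead of silently changing the VLAN ID.

```python
IFNAMSIZ = 16                   # Linux constant, includes the terminating NUL
MAX_VISIBLE_LEN = IFNAMSIZ - 1  # longest name that fits


def vlan_device_name(bridge, vlan_id):
    # Hypothetical helper, not part of ovn-bgp-agent: build "<bridge>.<vlan>"
    # and refuse names that would not fit rather than truncating them.
    name = f"{bridge}.{vlan_id}"
    if len(name) > MAX_VISIBLE_LEN:
        raise ValueError(
            f"{name!r} is {len(name)} characters long, "
            f"the limit is {MAX_VISIBLE_LEN}"
        )
    return name


name = "br-provider.3112"
# Truncation turns VLAN 3112 into what looks like VLAN 311.
print(len(name), name[:MAX_VISIBLE_LEN])  # prints: 16 br-provider.311
```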

Changed in ovn-bgp-agent:
status: Fix Released → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovn-bgp-agent (master)
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovn-bgp-agent 2.0.0.0rc1

This issue was fixed in the openstack/ovn-bgp-agent 2.0.0.0rc1 release candidate.
