[NBBGP] After running the sync method, it removes all previously exposed IPs
Bug #2078778 reported by Michel Nederlof
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| ovn-bgp-agent | New | Undecided | Michel Nederlof | |
Bug Description
Sometimes, when the periodic sync runs the `sync` method, it clears out all routes of a specific exposed VRF.
For now we've increased the sync interval to an extremely high value (e.g. 3 months or so), but this problem is really hard to debug.
Restarting ovn-bgp-agent (or waiting for the next sync interval) normally fixes the issue.
Changed in ovn-bgp-agent:
assignee: nobody → Michel Nederlof (mnederlof)
Our current working theory is that, while processing events (in the `match_fn` methods), the agent calls some functionality that causes it to believe a logical switch is not exported (yet).
We found at least one case where this could be happening, namely in this call stack:
- event -> agent.is_ls_provider -> _get_provider_ls_info
There, if the logical switch is not in `self.ovn_provider_ls`, the method populates that dictionary with information it receives from other helper methods.
If the sync is just starting (and has not yet wired the devices), this dictionary is empty, and that method stores None for the logical switch, because that is the result of this call stack:
- _get_provider_ls_info -> _get_ls_localnet_info -> _get_bridge_for_localnet_port

This call stack relies on information from `self.ovn_bridge_mappings`, which is also reset during a sync.
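To make the suspected race easier to follow, here is a minimal, hypothetical Python model of it. The method names mirror the call stacks in this report, but their bodies, the `sync_start`/`sync_wire_devices` helpers, the `_localnet_ports` data and the "`is not None` means provider" check are all assumptions for illustration, not the actual ovn-bgp-agent code. It shows how a lookup made between the reset and the rewiring can cache None and keep returning it afterwards.

```python
# Hypothetical model of the suspected race. Method names mirror the report;
# the bodies and the sync helpers are illustrative assumptions only.


class AgentModel:
    def __init__(self):
        # physnet -> OVS bridge; rebuilt while the sync wires devices
        self.ovn_bridge_mappings = {}
        # logical switch -> provider info; filled lazily on first lookup
        self.ovn_provider_ls = {}
        # Assumed static data: logical switch -> localnet physnet
        self._localnet_ports = {"ls-public": "physnet1"}

    def _get_bridge_for_localnet_port(self, physnet):
        # Returns None while the bridge mappings are still empty
        return self.ovn_bridge_mappings.get(physnet)

    def _get_ls_localnet_info(self, ls_name):
        physnet = self._localnet_ports.get(ls_name)
        return self._get_bridge_for_localnet_port(physnet)

    def _get_provider_ls_info(self, ls_name):
        return self._get_ls_localnet_info(ls_name)

    def is_ls_provider(self, ls_name):
        # The result is cached even when it is None
        if ls_name not in self.ovn_provider_ls:
            self.ovn_provider_ls[ls_name] = self._get_provider_ls_info(ls_name)
        return self.ovn_provider_ls[ls_name] is not None

    def sync_start(self):
        # Both dictionaries are reset before the devices are rewired
        self.ovn_bridge_mappings = {}
        self.ovn_provider_ls = {}

    def sync_wire_devices(self):
        self.ovn_bridge_mappings["physnet1"] = "br-ex"


agent = AgentModel()
agent.sync_start()
# An event's match_fn runs before the sync has wired the devices
early = agent.is_ls_provider("ls-public")  # caches None
agent.sync_wire_devices()
# The mappings are now correct, but the cached None is still returned
late = agent.is_ls_provider("ls-public")
print(early, late)  # False False
```

In this model the switch stays "not a provider" until the cache is reset again, which would match the observation that a restart or the next sync interval clears the problem.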
I will provide a patch that should resolve this.
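One possible mitigation, sketched against a similar hypothetical model (this is an illustration, not necessarily what the patch does), is to avoid caching negative results, so a lookup made before `self.ovn_bridge_mappings` is rebuilt is simply retried on the next call. All names and bodies below are assumptions; the helper chain is collapsed into one method for brevity.

```python
# Hypothetical mitigation sketch: only cache positive provider lookups.


class AgentModel:
    def __init__(self):
        self.ovn_bridge_mappings = {}  # physnet -> OVS bridge
        self.ovn_provider_ls = {}      # logical switch -> provider info
        # Assumed static data: logical switch -> localnet physnet
        self._localnet_ports = {"ls-public": "physnet1"}

    def _get_provider_ls_info(self, ls_name):
        # Stands in for _get_ls_localnet_info -> _get_bridge_for_localnet_port
        physnet = self._localnet_ports.get(ls_name)
        return self.ovn_bridge_mappings.get(physnet)

    def is_ls_provider(self, ls_name):
        info = self.ovn_provider_ls.get(ls_name)
        if info is None:
            info = self._get_provider_ls_info(ls_name)
            if info is not None:
                # Cache only positive results so an early miss is retried later
                self.ovn_provider_ls[ls_name] = info
        return info is not None

    def sync_start(self):
        self.ovn_bridge_mappings = {}
        self.ovn_provider_ls = {}

    def sync_wire_devices(self):
        self.ovn_bridge_mappings["physnet1"] = "br-ex"


agent = AgentModel()
agent.sync_start()
early = agent.is_ls_provider("ls-public")  # miss, nothing cached
agent.sync_wire_devices()
late = agent.is_ls_provider("ls-public")   # lookup retried, now succeeds
print(early, late)  # False True
```

The trade-off is that a switch that genuinely is not a provider is looked up again on every call; an alternative would be to make event handlers skip the lookup while a sync is still rebuilding its state.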