[fullstack] Race condition when updating the router port information and updating the network MTU

Bug #1845364 reported by Rodolfo Alonso
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
neutron | Expired | Medium | Unassigned |

Bug Description

[1] shows a race condition in the L3 agent between processing a router port and processing an update of the network that port belongs to. Ordered list of events:

1) In [2] 03:05:13.563: The router update process starts.
   Starting router update for bc221d89-8e74-4112-b42a-3c3c0908404e, action 3, priority 1, update_id 0ce8cd8d-b393-4a48-8dbf-5e8f537a32ae
This process is asynchronous.

2) In [2] 03:05:22.318: BaseRouterInfo.process() starts processing a newly added internal network.
   adding internal network: prefix(qr-), port(d15ca83e-d6c7-47c0-ae13-05e074707148)

3) In [3] 03:05:23.348: The network MTU is updated.
   Request body: {'network': {'mtu': 1499}} prepare_request_body
The L3 agent receives and processes this event in L3NATAgent.network_update. However, at this point the router's BaseRouterInfo.internal_ports does not yet contain the new internal port; it is only added in 4).

4) In [2] 03:05:26.671: The port d15ca83e-d6c7-47c0-ae13-05e074707148 is added to BaseRouterInfo.internal_ports, still carrying the old MTU (1500) instead of the updated value (1499).
   appending port {'id': 'd15ca83e-d6c7-47c0-ae13-05e074707148', ..., 'mtu': 1500} to internal_ports cache
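The interleaving of steps 2) to 4) can be replayed deterministically with a toy model. This is a minimal sketch, not neutron code: RouterInfoSketch, network_update_sketch, simulate and the network id 'net-1' are hypothetical, and the model assumes (as described in step 3) that a network update only reaches ports already present in a router's internal_ports cache.

```python
class RouterInfoSketch:
    """Hypothetical, simplified stand-in for BaseRouterInfo (not neutron code)."""

    def __init__(self):
        self.internal_ports = []  # models the BaseRouterInfo.internal_ports cache


def network_update_sketch(routers, network_id, mtu):
    """Hypothetical stand-in for L3NATAgent.network_update.

    Assumption: only ports already cached in internal_ports get the new MTU.
    Returns how many cached ports were refreshed.
    """
    refreshed = 0
    for router in routers:
        for port in router.internal_ports:
            if port['network_id'] == network_id:
                port['mtu'] = mtu
                refreshed += 1
    return refreshed


def simulate(mtu_update_first):
    """Replay steps 2-4 in a chosen order; return (cached MTU, ports refreshed)."""
    # Step 2: the port being added was fetched before the MTU change.
    port = {'id': 'd15ca83e-d6c7-47c0-ae13-05e074707148',
            'network_id': 'net-1',  # hypothetical network id
            'mtu': 1500}
    router = RouterInfoSketch()
    if mtu_update_first:
        # Step 3: the MTU update arrives while the port is not yet cached.
        refreshed = network_update_sketch([router], 'net-1', 1499)
        # Step 4: the port is cached afterwards, still with MTU 1500.
        router.internal_ports.append(port)
    else:
        # Opposite order: the port is cached first, so the update finds it.
        router.internal_ports.append(port)
        refreshed = network_update_sketch([router], 'net-1', 1499)
    return router.internal_ports[0]['mtu'], refreshed


print(simulate(True))   # (1500, 0): the race seen in the logs
print(simulate(False))  # (1499, 1): the expected result
```

In the racy order no cached port matches the update, so the port ends up cached with the stale MTU; only the ordering of steps 3) and 4) differs between the two runs.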

LOGS:
[1] https://df0eb3e2e26f1607f7d8-b5f72c94f829be93029a2756be493e29.ssl.cf2.rackcdn.com/679813/2/gate/neutron-fullstack/dfbde3f/testr_results.html.gz
[2] L3 agent: https://df0eb3e2e26f1607f7d8-b5f72c94f829be93029a2756be493e29.ssl.cf2.rackcdn.com/679813/2/gate/neutron-fullstack/dfbde3f/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_mtu_update/neutron-l3-agent--2019-09-12--03-04-54-685728_log.txt.gz
[3] Neutron server: https://df0eb3e2e26f1607f7d8-b5f72c94f829be93029a2756be493e29.ssl.cf2.rackcdn.com/679813/2/gate/neutron-fullstack/dfbde3f/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_mtu_update/neutron-server--2019-09-12--03-04-44-432445_log.txt.gz

description: updated
Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
Bence Romsics (bence-romsics) wrote :

So the error is that router-interface-add is not fully completed when the mtu-update already arrives, right?

According to logstash this seems to be rare in the gate (zero hits in the last 10 days; the linked logs are older than that). In real environments we'd need router-interface-adds and mtu-updates for the same network in close succession, which also sounds rare. So I'm setting the importance to low, but clearly we have a bug here.

Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium
Bence Romsics (bence-romsics) wrote :

I meant medium.

tags: added: gate-failure
tags: added: fullstack
Changed in neutron:
assignee: Rodolfo Alonso (rodolfo-alonso-hernandez) → nobody
Slawek Kaplonski (slaweq) wrote :

I think we recently made some changes to how neutron-l3-agent processes events. Is this still a valid issue? Do we have any fresh data about it? I haven't seen it in the gate failures for a long time.

Changed in neutron:
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired