After restarting an ovs agent, it still drops useful flows if the neutron server is busy/down

Bug #1515075 reported by Jian Wen
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Jian Wen

Bug Description

How to reproduce:
1. Stop the neutron server to simulate it being busy/down.
2. Restart an ovs agent.

After the first RPC times out (60 seconds by default), the agent deletes some of the existing flows.

Expected:
Keep the existing flows so that the instances do not lose network connectivity.

Jian Wen (wenjianhn)
description: updated
Changed in neutron:
assignee: nobody → Jian Wen (wenjianhn)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/243915

Changed in neutron:
status: New → In Progress
Jian Wen (wenjianhn)
description: updated
Jian Wen (wenjianhn)
Changed in neutron:
assignee: Jian Wen (wenjianhn) → nobody
Changed in neutron:
importance: Undecided → High
tags: added: ovs
Jian Wen (wenjianhn)
Changed in neutron:
assignee: nobody → Jian Wen (wenjianhn)
tags: added: liberty-backport-potential
Miguel Angel Ajo (mangelajo) wrote : Re: After restarting a ovs agent, it still drops useful flows if the neutron server is busy/down

I agree, this should be backported to Liberty if possible.

Ryan Moats (rmoats) wrote :

I'm looking into the possibility that this is occurring in a production Kilo cloud as well. If so, I'll tag it for Kilo backport too.

Miguel Angel Ajo (mangelajo) wrote :

@ryan, could be, but the failure mode and code would be slightly different, since in Liberty we introduced the new mechanism that uses OF cookies to clean up old flows once the new ones are installed.

At that time, we just blew away all the old rules and then installed the new ones. Otherwise, that should not happen if the agent is contacting the server.
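
A minimal sketch of the cookie-based cleanup mentioned above, assuming a toy in-memory flow table rather than a real OVS bridge. `FlowTable`, `install`, `delete_stale`, and `restart_agent` are hypothetical names for illustration, not neutron's actual API:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FlowTable:
    """Toy in-memory stand-in for a bridge's flow table (hypothetical)."""
    flows: dict = field(default_factory=dict)  # flow match -> cookie

    def install(self, match, cookie):
        # Installing a flow (or re-installing an existing one) stamps it
        # with the cookie of the current agent run.
        self.flows[match] = cookie

    def delete_stale(self, current_cookie):
        # Remove every flow that was not stamped during the current run.
        stale = [m for m, c in self.flows.items() if c != current_cookie]
        for match in stale:
            del self.flows[match]
        return stale


def restart_agent(table, wanted_matches):
    """Simulate one agent restart: install the new flows, then clean up."""
    # A fresh 64-bit cookie identifies the flows of this agent run.
    cookie = uuid.uuid4().int & 0xFFFFFFFFFFFFFFFF
    for match in wanted_matches:
        table.install(match, cookie)
    removed = table.delete_stale(cookie)
    return cookie, removed
```

In this sketch the old flows stay in place while the new ones are installed, and only flows carrying an older cookie are removed afterwards.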

Kevin Benton (kevinbenton) wrote :

Right, I don't think this really affects Kilo because the Kilo agent just destroys all flows regardless of server load.

Eugene Nikanorov (enikanorov) wrote :

Well, I think Kilo is affected too, but that would be kind of as-designed behavior, no?

Kevin Benton (kevinbenton) wrote :

Right. Kilo is going to destroy the flows anyway. I suppose we could still defer that behavior until the sync is successful, but the backport would have to be different to do that.

Changed in neutron:
assignee: Jian Wen (wenjianhn) → Kevin Benton (kevinbenton)
Miguel Angel Ajo (mangelajo) wrote :

My thinking was in line with what Kevin said.

The agent is going to flush them anyway, but maybe we can defer that until neutron-server is up? Is it really worth the backport?

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/243915
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0c8121ee68085d67a7b865b5b25c4b3880defc89
Submitter: Jenkins
Branch: master

commit 0c8121ee68085d67a7b865b5b25c4b3880defc89
Author: Jian Wen <email address hidden>
Date: Wed Nov 11 11:32:20 2015 +0800

    More graceful ovs-agent restart

    When the neutron server is down/busy the agent is not able to get any
    port info. After the agent restarts, it will not install any new flow.
    Cleaning the existing flows will break all networking until the agent
    succeeds to sync with the neutron server.

    This patch ensures the agent cleans the stale flows only after it
    succeeds to sync with the neutron server.

    Change-Id: I763fc06a73b6d2f010da65e74241182636dda44d
    Closes-bug: #1515075
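
The behavior described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: `ServerUnavailable`, `run_iterations`, and the `fetch_ports` callable are hypothetical names, and the flow table is a plain dict mapping a flow match to the cookie of the agent run that installed it.

```python
class ServerUnavailable(Exception):
    """Raised by the hypothetical fetch_ports callable when the RPC times out."""


def run_iterations(fetch_ports, flows, current_cookie, max_iterations):
    """Run the agent loop; return True once stale flows have been cleaned.

    flows maps a flow match to the cookie of the run that installed it.
    """
    cleaned = False
    for _ in range(max_iterations):
        try:
            ports = fetch_ports()
        except ServerUnavailable:
            # Server is busy/down: leave the existing flows untouched
            # and try again on the next iteration.
            continue
        # Sync succeeded: (re)install flows for the ports we know about.
        for port in ports:
            flows[port] = current_cookie
        # Only now is it safe to drop flows left over from earlier runs.
        if not cleaned:
            stale = [m for m, c in flows.items() if c != current_cookie]
            for match in stale:
                del flows[match]
            cleaned = True
    return cleaned
```

The point of the ordering is that a timed-out sync never reaches the cleanup step, so instances keep their existing flows for as long as the server is unreachable.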

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/244465

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/244465
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=943b156c36f8b8d02c2a720189f0b4769df6a210
Submitter: Jenkins
Branch: stable/liberty

commit 943b156c36f8b8d02c2a720189f0b4769df6a210
Author: Jian Wen <email address hidden>
Date: Wed Nov 11 11:32:20 2015 +0800

    More graceful ovs-agent restart

    When the neutron server is down/busy the agent is not able to get any
    port info. After the agent restarts, it will not install any new flow.
    Cleaning the existing flows will break all networking until the agent
    succeeds to sync with the neutron server.

    This patch ensures the agent cleans the stale flows only after it
    succeeds to sync with the neutron server.

    Change-Id: I763fc06a73b6d2f010da65e74241182636dda44d
    Closes-bug: #1515075
    (cherry picked from commit 0c8121ee68085d67a7b865b5b25c4b3880defc89)

tags: added: in-stable-liberty
Jian Wen (wenjianhn)
tags: removed: liberty-backport-potential
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/kilo)

Related fix proposed to branch: stable/kilo
Review: https://review.openstack.org/249935

Brent Eagles (beagles)
description: updated
description: updated
Changed in neutron:
status: Fix Committed → Fix Released
Jian Wen (wenjianhn)
Changed in neutron:
assignee: Kevin Benton (kevinbenton) → Jian Wen (wenjianhn)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/kilo)

Change abandoned by Doug Wiegley (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/249935
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

Jian Wen (wenjianhn)
summary: - After restarting a ovs agent, it still drops useful flows if the
+ After restarting an ovs agent, it still drops useful flows if the
neutron server is busy/down