l2 pop doesn't always provide the whole list of fdb entries on agent restart

Bug #1799178 reported by Oleg Bondarev
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Wishlist
Assigned to: LIU Yulong

Bug Description

The whole list of fdb entries is provided to the agent when a port from a new network appears on the agent's host, or when the agent is restarted.
Currently, an agent restart is detected using the agent_boot_time option (180 seconds by default): the agent is treated as restarted only while its uptime is within that window.
In practice, boot time depends on the number of ports, and on heavily loaded clusters it can easily exceed 180 seconds on gateway nodes. Increasing agent_boot_time in the configuration works, but it is not an ideal solution.
There should be a smarter way to detect an agent restart (for example, the agent itself sending a flag in its state report).
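A minimal sketch of the server-side decision described above, contrasting the uptime heuristic driven by agent_boot_time with an explicit restart flag sent by the agent. The helper name `needs_full_fdb` and its parameters are hypothetical and are not neutron's actual l2pop API; only the 180-second default comes from this report.

```python
from typing import Optional

AGENT_BOOT_TIME = 180  # seconds; default of the agent_boot_time option


def needs_full_fdb(ports_on_network: int,
                   agent_uptime: float,
                   restart_flag: Optional[bool] = None,
                   boot_time: float = AGENT_BOOT_TIME) -> bool:
    """Decide whether the agent should receive the whole list of fdb entries.

    Hypothetical helper. ports_on_network is the number of active ports the
    host has on this network (1 means the first port of a new network),
    agent_uptime is the number of seconds since the agent reported its start,
    and restart_flag is an explicit restart indication from the agent
    (None when the agent does not send one).
    """
    if ports_on_network == 1:
        # First port of a network on this host: the agent holds no fdb
        # entries for that network yet.
        return True
    if restart_flag is not None:
        # Explicit signal from the agent: no guessing based on timing.
        return restart_flag
    # Fallback heuristic: the agent counts as restarted only while its
    # uptime is inside the boot_time window. A slow initial sync overruns it.
    return agent_uptime < boot_time
```

With the heuristic alone, `needs_full_fdb(5, 300)` returns False even though an agent with many ports may still be in its initial sync, which is the failure mode reported here; passing `restart_flag=True` removes the dependence on timing.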

Changed in neutron:
importance: Undecided → Wishlist
tags: added: l2-pop l3-dvr-backlog l3-ha
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

I agree that there should be a better way to handle this than changing the config option.

Changed in neutron:
status: New → Confirmed
Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/615246

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/634666

Changed in neutron:
assignee: Rodolfo Alonso (rodolfo-alonso-hernandez) → Oleg Bondarev (obondarev)
Changed in neutron:
assignee: Oleg Bondarev (obondarev) → LIU Yulong (dragon889)
LIU Yulong (dragon889) wrote :

Another approach to fix this is here:
https://review.openstack.org/#/c/640797/

Oleg Bondarev (obondarev) wrote :

The RPC change ^^ does seem to be the correct approach, but we also need a backportable fix.

Changed in neutron:
assignee: LIU Yulong (dragon889) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → LIU Yulong (dragon889)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Rodolfo Alonso Hernandez (<email address hidden>) on branch: master
Review: https://review.openstack.org/615246

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/645405

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/645406

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/645408

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/640797
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a5244d6d44d2b66de27dc77efa7830fa657260be
Submitter: Zuul
Branch: master

commit a5244d6d44d2b66de27dc77efa7830fa657260be
Author: LIU Yulong <email address hidden>
Date: Mon Mar 4 21:17:20 2019 +0800

    More accurate agent restart state transfer

    Ovs-agent can be very time-consuming in handling a large number
    of ports. At this point, the ovs-agent status report may have
    exceeded the set timeout value. Some flows updating operations
    will not be triggered. This results in flows loss during agent
    restart, especially for hosts to hosts of vxlan tunnel flow.

    This fix will let the ovs-agent explicitly, in the first rpc loop,
    indicate that the status is restarted. Then l2pop will be required
    to update fdb entries.

    Closes-Bug: #1813703
    Closes-Bug: #1813714
    Closes-Bug: #1813715
    Closes-Bug: #1794991
    Closes-Bug: #1799178

    Change-Id: I8edc2deb509216add1fb21e1893f1c17dda80961
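A minimal sketch of the idea in the commit message above: the agent marks only its first RPC loop as a restart, however long that loop takes. `FakePluginRpc`, `SketchAgent`, `rpc_loop_iteration`, and the `agent_restarted` keyword are hypothetical names chosen for illustration, not taken from the merged commit; see the commit itself for the real signatures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakePluginRpc:
    """Hypothetical stand-in for the server RPC proxy; records each call."""
    calls: List[Tuple[List[str], bool]] = field(default_factory=list)

    def update_device_list(self, devices_up: List[str],
                           agent_restarted: bool) -> None:
        self.calls.append((list(devices_up), agent_restarted))


class SketchAgent:
    """Hypothetical agent loop: only iteration 0 reports a restart."""

    def __init__(self, plugin_rpc: FakePluginRpc) -> None:
        self.plugin_rpc = plugin_rpc
        self.iter_num = 0

    def rpc_loop_iteration(self, devices_up: List[str]) -> None:
        # The first loop after a (re)start processes all existing ports,
        # so it is the one loop that carries the restart flag, independent
        # of any boot-time window on the server.
        agent_restarted = self.iter_num == 0
        self.plugin_rpc.update_device_list(devices_up,
                                           agent_restarted=agent_restarted)
        self.iter_num += 1
```

Running two iterations records the flag as True for the first batch of ports and False for the next, so the server can send the full fdb list exactly once per restart.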

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0rc1

This issue was fixed in the openstack/neutron 14.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/649729

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/634666
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8f20963c5b9e6762b6322f686bce99871bec6be9
Submitter: Zuul
Branch: master

commit 8f20963c5b9e6762b6322f686bce99871bec6be9
Author: Oleg Bondarev <email address hidden>
Date: Mon Feb 4 14:58:27 2019 +0400

    OVS agent: always send start flag during initial sync

    In order to avoid inaccurate agent_boot_time setting,
    this patch suggests to consider agent as "started" only
    after completion of initial sync with server.

    Change-Id: Icba05288889219e8a606c3809efd88b2c234bef3
    Closes-Bug: #1799178
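A minimal sketch of the approach in this commit message, assuming (as the message implies) that the server treats an agent whose state report carries `start_flag` as freshly started. The `StateReporter` class and its methods are hypothetical; they only illustrate keeping the flag set until the initial sync with the server completes, instead of sending it once.

```python
from typing import Any, Dict


class StateReporter:
    """Hypothetical sketch: keep reporting start_flag until initial sync ends.

    Assumption: the server treats a report carrying start_flag as an agent
    start, so repeating the flag during a long initial sync keeps the agent
    looking "started" no matter how long the sync takes.
    """

    def __init__(self, host: str) -> None:
        self.host = host
        self.initial_sync_done = False

    def build_report(self) -> Dict[str, Any]:
        report: Dict[str, Any] = {"host": self.host}
        if not self.initial_sync_done:
            # Still processing existing ports: keep signalling a start.
            report["start_flag"] = True
        return report

    def mark_initial_sync_done(self) -> None:
        # Called once the first full sync with the server has completed.
        self.initial_sync_done = True
```

Every report sent before `mark_initial_sync_done()` includes `start_flag`, and reports sent afterwards omit it.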

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.openstack.org/650398

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/650399

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/650400

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/650402

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/650403

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/645408
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=62fe7852bbd70a24174853997096c52ee015e269
Submitter: Zuul
Branch: stable/pike

commit 62fe7852bbd70a24174853997096c52ee015e269
Author: LIU Yulong <email address hidden>
Date: Mon Mar 4 21:17:20 2019 +0800

    More accurate agent restart state transfer

    Ovs-agent can be very time-consuming in handling a large number
    of ports. At this point, the ovs-agent status report may have
    exceeded the set timeout value. Some flows updating operations
    will not be triggered. This results in flows loss during agent
    restart, especially for hosts to hosts of vxlan tunnel flow.

    This fix will let the ovs-agent explicitly, in the first rpc loop,
    indicate that the status is restarted. Then l2pop will be required
    to update fdb entries.

    Conflicts:
     neutron/plugins/ml2/rpc.py

    Conflicts:
     neutron/plugins/ml2/drivers/l2pop/mech_driver.py

    Closes-Bug: #1813703
    Closes-Bug: #1813714
    Closes-Bug: #1813715
    Closes-Bug: #1794991
    Closes-Bug: #1799178

    Change-Id: I8edc2deb509216add1fb21e1893f1c17dda80961
    (cherry picked from commit a5244d6d44d2b66de27dc77efa7830fa657260be)
    (cherry picked from commit cc49ab550179bdc76d79f48be67886681cb43d4e)
    (cherry picked from commit 5ffca4966877454c605442e9e429aa83ea7d7348)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/645405
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cc49ab550179bdc76d79f48be67886681cb43d4e
Submitter: Zuul
Branch: stable/rocky

commit cc49ab550179bdc76d79f48be67886681cb43d4e
Author: LIU Yulong <email address hidden>
Date: Mon Mar 4 21:17:20 2019 +0800

    More accurate agent restart state transfer

    Ovs-agent can be very time-consuming in handling a large number
    of ports. At this point, the ovs-agent status report may have
    exceeded the set timeout value. Some flows updating operations
    will not be triggered. This results in flows loss during agent
    restart, especially for hosts to hosts of vxlan tunnel flow.

    This fix will let the ovs-agent explicitly, in the first rpc loop,
    indicate that the status is restarted. Then l2pop will be required
    to update fdb entries.

    Conflicts:
     neutron/plugins/ml2/rpc.py

    Closes-Bug: #1813703
    Closes-Bug: #1813714
    Closes-Bug: #1813715
    Closes-Bug: #1794991
    Closes-Bug: #1799178

    Change-Id: I8edc2deb509216add1fb21e1893f1c17dda80961
    (cherry picked from commit a5244d6d44d2b66de27dc77efa7830fa657260be)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.openstack.org/650398
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=382431c944e21ef6aab06f3746f23a84f7425cb2
Submitter: Zuul
Branch: stable/stein

commit 382431c944e21ef6aab06f3746f23a84f7425cb2
Author: Oleg Bondarev <email address hidden>
Date: Mon Feb 4 14:58:27 2019 +0400

    OVS agent: always send start flag during initial sync

    In order to avoid inaccurate agent_boot_time setting,
    this patch suggests to consider agent as "started" only
    after completion of initial sync with server.

    Change-Id: Icba05288889219e8a606c3809efd88b2c234bef3
    Closes-Bug: #1799178
    (cherry picked from commit 8f20963c5b9e6762b6322f686bce99871bec6be9)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/650399
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6b9d8bf3087a8fea2a991d392b314706039191a3
Submitter: Zuul
Branch: stable/rocky

commit 6b9d8bf3087a8fea2a991d392b314706039191a3
Author: Oleg Bondarev <email address hidden>
Date: Mon Feb 4 14:58:27 2019 +0400

    OVS agent: always send start flag during initial sync

    In order to avoid inaccurate agent_boot_time setting,
    this patch suggests to consider agent as "started" only
    after completion of initial sync with server.

    Change-Id: Icba05288889219e8a606c3809efd88b2c234bef3
    Closes-Bug: #1799178
    (cherry picked from commit 8f20963c5b9e6762b6322f686bce99871bec6be9)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/650400
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=55fa2d7ed42d255a89c23cb89df287727bf1ed03
Submitter: Zuul
Branch: stable/queens

commit 55fa2d7ed42d255a89c23cb89df287727bf1ed03
Author: Oleg Bondarev <email address hidden>
Date: Mon Feb 4 14:58:27 2019 +0400

    OVS agent: always send start flag during initial sync

    In order to avoid inaccurate agent_boot_time setting,
    this patch suggests to consider agent as "started" only
    after completion of initial sync with server.

    Change-Id: Icba05288889219e8a606c3809efd88b2c234bef3
    Closes-Bug: #1799178
    (cherry picked from commit 8f20963c5b9e6762b6322f686bce99871bec6be9)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/650402
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8865466e9fc39ff8e58576875f401f9fb8301beb
Submitter: Zuul
Branch: stable/pike

commit 8865466e9fc39ff8e58576875f401f9fb8301beb
Author: Oleg Bondarev <email address hidden>
Date: Mon Feb 4 14:58:27 2019 +0400

    OVS agent: always send start flag during initial sync

    In order to avoid inaccurate agent_boot_time setting,
    this patch suggests to consider agent as "started" only
    after completion of initial sync with server.

    Change-Id: Icba05288889219e8a606c3809efd88b2c234bef3
    Closes-Bug: #1799178
    (cherry picked from commit 8f20963c5b9e6762b6322f686bce99871bec6be9)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/650403
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=64e98ff9597684868fc07979c1e861d2dd506688
Submitter: Zuul
Branch: stable/ocata

commit 64e98ff9597684868fc07979c1e861d2dd506688
Author: Oleg Bondarev <email address hidden>
Date: Mon Feb 4 14:58:27 2019 +0400

    OVS agent: always send start flag during initial sync

    In order to avoid inaccurate agent_boot_time setting,
    this patch suggests to consider agent as "started" only
    after completion of initial sync with server.

    Change-Id: Icba05288889219e8a606c3809efd88b2c234bef3
    Closes-Bug: #1799178
    (cherry picked from commit 8f20963c5b9e6762b6322f686bce99871bec6be9)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.7

This issue was fixed in the openstack/neutron 11.0.7 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.6

This issue was fixed in the openstack/neutron 12.0.6 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.3

This issue was fixed in the openstack/neutron 13.0.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.1

This issue was fixed in the openstack/neutron 14.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.opendev.org/649729
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9583dc0549da2b4529a59b5862ba42aebc5ae15f
Submitter: Zuul
Branch: stable/ocata

commit 9583dc0549da2b4529a59b5862ba42aebc5ae15f
Author: LIU Yulong <email address hidden>
Date: Mon Mar 4 21:17:20 2019 +0800

    More accurate agent restart state transfer

    Ovs-agent can be very time-consuming in handling a large number
    of ports. At this point, the ovs-agent status report may have
    exceeded the set timeout value. Some flows updating operations
    will not be triggered. This results in flows loss during agent
    restart, especially for hosts to hosts of vxlan tunnel flow.

    This fix will let the ovs-agent explicitly, in the first rpc loop,
    indicate that the status is restarted. Then l2pop will be required
    to update fdb entries.

    Conflicts:
     neutron/plugins/ml2/rpc.py

    Conflicts:
     neutron/plugins/ml2/drivers/l2pop/mech_driver.py

    Closes-Bug: #1813703
    Closes-Bug: #1813714
    Closes-Bug: #1813715
    Closes-Bug: #1794991
    Closes-Bug: #1799178

    Change-Id: I8edc2deb509216add1fb21e1893f1c17dda80961
    (cherry picked from commit a5244d6d44d2b66de27dc77efa7830fa657260be)
    (cherry picked from commit cc49ab550179bdc76d79f48be67886681cb43d4e)
    (cherry picked from commit 5ffca4966877454c605442e9e429aa83ea7d7348)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 15.0.0.0b1

This issue was fixed in the openstack/neutron 15.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron ocata-eol

This issue was fixed in the openstack/neutron ocata-eol release.
