Bug #1494682 “l3 agent avoid unnecessary full_sync” : Bugs : neutron

Sudhakar Gariganti (sudhakar-gariganti) on 2015-09-11

Changed in neutron:
assignee:	nobody → Sudhakar Gariganti (sudhakar-gariganti)
status:	New → In Progress
tags:	added: l3-ipam-dhcp removed: l3-dvr-backlog

OpenStack Infra (hudson-openstack) wrote on 2015-09-16: Fix proposed to neutron (master)

#1

Fix proposed to branch: master
Review: https://review.openstack.org/224019

Carl Baldwin (carl-baldwin) on 2015-09-21

Changed in neutron:
importance:	Undecided → Low

Sudhakar Gariganti (sudhakar-gariganti) wrote on 2015-09-22:

#2

From a functionality point of view, I agree it is Low. From a scale point of view, however, the impact is significant.

A single random RPC timeout at scale puts the l3 agent into an indefinite cycle. This has a terrible impact on the DB and on controller operations, and will eventually degrade the performance of other agents as well.

At a scale of fewer than 1000 networks, it took multiple hours for the cloud to get back into shape. Imagine the situation at a higher scale.

I agree it's late in the cycle, but if there is a chance, I feel it would be good to have this in Liberty.

OpenStack Infra (hudson-openstack) wrote on 2015-12-09: Fix merged to neutron (master)

#3

Reviewed: https://review.openstack.org/224019
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4957b5b43521a61873a041fe3e8989ed399903d9
Submitter: Jenkins
Branch: master

commit 4957b5b43521a61873a041fe3e8989ed399903d9
Author: Sudhakar Babu Gariganti <email address hidden>
Date: Wed Sep 16 15:53:57 2015 +0530

Avoid full_sync in l3_agent for router updates

While processing a router update in _process_router_update method,
if an exception occurs, we try to do a full_sync.

We only need to re-sync the router whose update failed.

Addressed a TODO in the same method, which falls in similar lines.

Change-Id: I7c43a508adf46d8524f1cc48b83f1e1c276a2de0
Closes-Bug: #1494682
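
The behaviour change described in this commit message can be illustrated with a small sketch. This is not the neutron code: `ToyL3Agent`, `configure_router`, `updates`, and the `resync_single_router` switch are hypothetical names chosen for illustration, and the real agent's queueing is more involved. The sketch only shows the idea that a failed update re-queues the one affected router instead of flagging every router for a full sync.

```python
import queue


class ToyL3Agent:
    """Toy stand-in for an L3 agent (hypothetical, not neutron's class)."""

    def __init__(self, configure_router):
        # configure_router is a hypothetical callback that may raise.
        self.configure_router = configure_router
        self.updates = queue.Queue()
        self.fullsync = False

    def process_next_update(self, resync_single_router=True):
        """Process one queued router ID, handling a failed update."""
        router_id = self.updates.get()
        try:
            self.configure_router(router_id)
        except Exception:
            if resync_single_router:
                # New behaviour: retry only the router whose update failed.
                self.updates.put(router_id)
            else:
                # Old behaviour: flag all routers for an expensive full sync.
                self.fullsync = True
```

With the single-router path, one transient failure costs one extra attempt for that router, and the `fullsync` flag stays unset.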

Changed in neutron:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2015-12-18: Fix proposed to neutron (stable/liberty)

#4

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/259510

OpenStack Infra (hudson-openstack) wrote on 2015-12-19: Related fix proposed to neutron (master)

#5

Related fix proposed to branch: master
Review: https://review.openstack.org/259708

OpenStack Infra (hudson-openstack) wrote on 2015-12-23: Related fix merged to neutron (master)

#6

Reviewed: https://review.openstack.org/259708
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=822ad5f06bcef8f95f36032d4fd4709975cecc31
Submitter: Jenkins
Branch: master

commit 822ad5f06bcef8f95f36032d4fd4709975cecc31
Author: Assaf Muller <email address hidden>
Date: Sat Dec 19 14:13:43 2015 -0500

Force L3 agent to resync router it could not configure

    If the L3 agent fails to configure a router, commit:
    4957b5b43521a61873a041fe3e8989ed399903d9 changed it so
    that instead of performing an expensive full sync, only that
    router is reconfigured. However, it tries to reconfigure the
    cached router. This is a change of behavior from the fullsync
    days. The retry is more likely to succeed if the
    router is retrieved from the server, instead of using
    the locally cached version, in case the user or operator
    fixed bad input, or if the router was retrieved in a bad
    state due to a server-side race condition.

    Note that this is only relevant to full syncs, as those retrieve
    routers from the server and queue updates with the router object.
    Incremental updates queue up updates without router objects,
    so if one of those fails it would always be resynced on a
    second attempt.

Related-Bug: #1494682
Change-Id: Id0565e11b3023a639589f2734488029f194e2f9d
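
As a rough illustration of the point made in this commit message, the sketch below re-queues a failed update without its cached router object, so the next attempt has to fetch a fresh copy from the server. All names here (`RouterUpdate`, `process_update`, `fetch_router`, `configure`, and the `valid` field) are hypothetical and chosen for the example; they are not neutron's API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouterUpdate:
    router_id: str
    # Cached router object; None means "fetch from the server first".
    router: Optional[dict] = None


def process_update(update, pending, fetch_router, configure):
    """Try to configure one router; on failure, queue a retry.

    fetch_router and configure are hypothetical callbacks.
    Returns True on success, False if a retry was queued.
    """
    router = update.router
    if router is None:
        router = fetch_router(update.router_id)
    try:
        configure(router)
    except Exception:
        # Drop the cached object so the retry uses fresh server data.
        pending.append(RouterUpdate(update.router_id, router=None))
        return False
    return True
```

An update that arrives without a router object already takes the fetch path on every attempt, which matches the note that incremental updates are always resynced on a second attempt.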

OpenStack Infra (hudson-openstack) wrote on 2015-12-23: Related fix proposed to neutron (stable/liberty)

#7

Related fix proposed to branch: stable/liberty
Review: https://review.openstack.org/261044

OpenStack Infra (hudson-openstack) wrote on 2016-01-07: Fix merged to neutron (stable/liberty)

#8

Reviewed: https://review.openstack.org/259510
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=430892ab60480ea084429bab2379f378c5b7c5c8
Submitter: Jenkins
Branch: stable/liberty

commit 430892ab60480ea084429bab2379f378c5b7c5c8
Author: Sudhakar Babu Gariganti <email address hidden>
Date: Wed Sep 16 15:53:57 2015 +0530

Avoid full_sync in l3_agent for router updates

While processing a router update in _process_router_update method,
if an exception occurs, we try to do a full_sync.

We only need to re-sync the router whose update failed.

Addressed a TODO in the same method, which falls in similar lines.

    Change-Id: I7c43a508adf46d8524f1cc48b83f1e1c276a2de0
    Closes-Bug: #1494682
    (cherry picked from commit 4957b5b43521a61873a041fe3e8989ed399903d9)

tags:	added: in-stable-liberty

OpenStack Infra (hudson-openstack) wrote on 2016-01-07: Related fix merged to neutron (stable/liberty)

#9

Reviewed: https://review.openstack.org/261044
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d11e9cb550f7b80d2d81327cd17d540352dce43d
Submitter: Jenkins
Branch: stable/liberty

commit d11e9cb550f7b80d2d81327cd17d540352dce43d
Author: Assaf Muller <email address hidden>
Date: Sat Dec 19 14:13:43 2015 -0500

Force L3 agent to resync router it could not configure

    If the L3 agent fails to configure a router, commit:
    4957b5b43521a61873a041fe3e8989ed399903d9 changed it so
    that instead of performing an expensive full sync, only that
    router is reconfigured. However, it tries to reconfigure the
    cached router. This is a change of behavior from the fullsync
    days. The retry is more likely to succeed if the
    router is retrieved from the server, instead of using
    the locally cached version, in case the user or operator
    fixed bad input, or if the router was retrieved in a bad
    state due to a server-side race condition.

    Note that this is only relevant to full syncs, as those retrieve
    routers from the server and queue updates with the router object.
    Incremental updates queue up updates without router objects,
    so if one of those fails it would always be resynced on a
    second attempt.

    Related-Bug: #1494682
    Change-Id: Id0565e11b3023a639589f2734488029f194e2f9d
    (cherry picked from commit 822ad5f06bcef8f95f36032d4fd4709975cecc31)

Thierry Carrez (ttx) wrote on 2016-01-19: Fix included in openstack/neutron 8.0.0.0b2

#10

This issue was fixed in the openstack/neutron 8.0.0.0b2 development milestone.

Doug Hellmann (doug-hellmann) wrote on 2016-01-27: Fix included in openstack/neutron 7.0.2

#11

This issue was fixed in the openstack/neutron 7.0.2 release.

Ihar Hrachyshka (ihar-hrachyshka) wrote on 2016-03-30:

#12

It's at least Medium, maybe High since it hits our scalability really hard.

Changed in neutron:
importance:	Low → Medium