neutron

[RFE] Use call_monitor_timeout of oslo.messaging RPCClient instead of custom backoff mechanism and hardcoded timeouts

Bug #2045058 reported by Ihar Hrachyshka on 2023-11-28

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	neutron	In Progress	Wishlist	Ihar Hrachyshka

Bug Description

Currently, neutron RPC clients repeat calls, time out, back off, and repeat again; this logic is implemented in the neutron-lib RPCClient itself. This is done to handle requests that take a very long time.
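For readers unfamiliar with the pattern, here is a toy model of a "time out, bump the timeout, retry" loop. It is only a sketch under stated assumptions, not the neutron-lib code: the names `BackoffTimeouts` and `call_with_backoff`, the doubling factor, and the 10x ceiling are all illustrative.

```python
class BackoffTimeouts:
    """Toy per-method timeout table that grows after each timeout."""

    def __init__(self, base_timeout=60, max_multiplier=10):
        # Both values are illustrative assumptions, not neutron-lib settings.
        self.base_timeout = base_timeout
        self.ceiling = base_timeout * max_multiplier
        self._timeouts = {}

    def current(self, method):
        return self._timeouts.get(method, self.base_timeout)

    def on_timeout(self, method):
        # Double the timeout for this method, capped at the ceiling.
        self._timeouts[method] = min(self.current(method) * 2, self.ceiling)


def call_with_backoff(send, method, timeouts, max_attempts=5):
    """Retry send(method, timeout), bumping the timeout after each failure."""
    for _ in range(max_attempts):
        try:
            return send(method, timeouts.current(method))
        except TimeoutError:
            # The server is never told that the client gave up and retried.
            timeouts.on_timeout(method)
    raise TimeoutError(f"{method} still timing out after {max_attempts} attempts")


if __name__ == "__main__":
    attempts = []

    def slow_send(method, timeout):
        # Hypothetical server call that needs 200 seconds to complete.
        attempts.append(timeout)
        if timeout < 200:
            raise TimeoutError
        return "ok"

    print(call_with_backoff(slow_send, "sync_state", BackoffTimeouts()))
    print(attempts)
```

In this model a request that needs 200 seconds is sent three times (with 60, 120 and 240 second timeouts) before it succeeds, which is the kind of wasted work the RFE is about.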

Instead of failing, then bumping the timeout and hoping that it's enough this time (while leaving the server unaware), we could enable active heartbeating with the oslo.messaging call_monitor_timeout option.
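To illustrate the idea, below is a small simulation of a client waiting for a reply with a hard overall timeout plus a shorter monitor timeout that is reset by every heartbeat. This is a sketch of the concept only, not oslo.messaging's implementation; the function name `wait_for_reply` and the event format are made up for the example.

```python
def wait_for_reply(events, timeout, call_monitor_timeout=None):
    """Return the time at which the reply arrives, or raise TimeoutError.

    events: iterable of (seconds_since_call, kind) tuples, where kind is
    "heartbeat" or "reply". Hypothetical format, for illustration only.
    """
    hard_deadline = timeout
    monitor_deadline = call_monitor_timeout  # None means no heartbeating
    for when, kind in sorted(events):
        deadline = hard_deadline
        if monitor_deadline is not None:
            deadline = min(hard_deadline, monitor_deadline)
        if when > deadline:
            raise TimeoutError(f"gave up at t={deadline}s")
        if kind == "reply":
            return when
        if kind == "heartbeat" and monitor_deadline is not None:
            # The server is alive and still working: extend the short deadline.
            monitor_deadline = when + call_monitor_timeout
    raise TimeoutError("no reply received")


if __name__ == "__main__":
    # A call that takes 200s, with the server heartbeating every 30s.
    healthy = [(t, "heartbeat") for t in range(30, 200, 30)] + [(200, "reply")]
    print(wait_for_reply(healthy, timeout=600, call_monitor_timeout=60))

    # A server that stops heartbeating after 30s is detected long before
    # the 600s hard timeout expires.
    try:
        wait_for_reply([(30, "heartbeat")], timeout=600, call_monitor_timeout=60)
    except TimeoutError as exc:
        print("dead server:", exc)
```

The point of the model: a long timeout no longer means waiting a long time to notice a dead server, and a slow but healthy call is not retried.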

See how nova did this for its clients: https://opendev.org/openstack/nova/commit/fe26a52024416ed2d37c2d5027da4b23231dc515

I believe this should replace backoff logic in neutron-lib.

Mamatisa Nurmatov (isabek) on 2023-11-29

Changed in neutron:
status:	New → Triaged
importance:	Undecided → Wishlist

Brian Haley (brian-haley) on 2023-11-30

tags:

added: rfe-triaged

Brian Haley (brian-haley) on 2023-11-30

tags:

added: rfe-confirmed
removed: rfe-triaged

Brian Haley (brian-haley) wrote on 2023-12-01:

I know we can find the review from the commit ID, but here is a direct link. It did have a couple of small follow-ons based on the topic.

https://review.opendev.org/c/openstack/nova/+/566696

Ihar Hrachyshka (ihar-hrachyshka) on 2024-01-12

Changed in neutron:
assignee:	nobody → Ihar Hrachyshka (ihar-hrachyshka)

Ihar Hrachyshka (ihar-hrachyshka) wrote on 2024-01-16:

This RFE was discussed during the drivers meeting, and the suggestion was to not require a spec for this change; instead:

"let's add some more details to the RFE like how it works for nova, how to handle shorter than 60sec calls etc, and push PoC"

Ihar Hrachyshka (ihar-hrachyshka) wrote on 2024-01-16:

Logs of the drivers discussion here: https://meetings.opendev.org/meetings/neutron_drivers/2023/neutron_drivers.2023-12-01-14.00.log.html

Ihar Hrachyshka (ihar-hrachyshka) wrote on 2024-01-16:

The plan is:

- engage the call_monitor_timeout option without touching rpc client backoff mechanism. (In this way, the backoff will serve as a failsafe option when timeout misbehaves for some reason.)
- monitor behavior of the automatic timeout mechanism over several cycles.
- eventually, consider removal of the backoff mechanism from neutron-lib.

Nova enabled active heartbeating for RPC calls when the RPC timeout is bumped from the default 60 seconds. This seems to be a historical decision made, to quote, to "keep the failure timing characteristics that our code likely expects (from history)". I will check with Dan Smith, who wrote this (and the patch that integrated the mechanism in nova in ~2018), to see whether there is a good reason to follow this example or whether we can proactively enable it for all calls. For now, I plan to apply it unconditionally, unless there is a good scaling- or stability-related reason not to.
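The two policies discussed above (nova-style, only for calls with a raised timeout, versus unconditional) could be sketched as follows. This is a hypothetical helper, not code from nova or neutron-lib: `prepare_kwargs` and `DEFAULT_RPC_TIMEOUT` are names invented here, and the only oslo.messaging API assumed is that `RPCClient.prepare()` accepts `timeout` and `call_monitor_timeout` keyword arguments.

```python
DEFAULT_RPC_TIMEOUT = 60  # assumed default, per the discussion above


def prepare_kwargs(timeout, monitor_timeout, unconditional=True):
    """Build keyword arguments for an RPCClient.prepare() call.

    unconditional=True  -> heartbeat every call (the plan in this RFE).
    unconditional=False -> heartbeat only calls whose timeout was raised
                           above the default (the nova-style policy).
    """
    kwargs = {"timeout": timeout}
    if unconditional or timeout > DEFAULT_RPC_TIMEOUT:
        kwargs["call_monitor_timeout"] = monitor_timeout
    return kwargs


if __name__ == "__main__":
    # Intended use, given some oslo.messaging RPCClient instance `client`:
    #   cctxt = client.prepare(**prepare_kwargs(600, 60))
    #   cctxt.call(context, "some_method")
    print(prepare_kwargs(600, 60))
    print(prepare_kwargs(60, 30, unconditional=False))
```

The timeout values here are placeholders; which monitor interval is appropriate, and how it should relate to the overall timeout, is exactly what the PoC is meant to establish.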

Ihar Hrachyshka (ihar-hrachyshka) on 2024-02-14

Changed in neutron:
status:	Triaged → In Progress

Slawek Kaplonski (slaweq) on 2024-07-26

tags:

added: rfe-approved
removed: rfe-confirmed
