cheesecake failover-host can hit rpc timeout

Bug #1555342 reported by Patrick East
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: Patrick East
Milestone: (none listed)

Bug Description

Right now, when handling a failover-host API call, we do a blocking RPC 'call' to the volume API, which then handles the failover and returns the new backend ID.

Unfortunately, on some backends with large numbers of volumes, the failover can take some time, as it may require API calls on a per-volume basis to ensure the failover is done correctly. It is relatively easy to run into the RPC call timeout once you are in the thousands of volumes.

A fix for this would be to make it async and use an RPC 'cast' instead.
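
To illustrate the difference between a blocking 'call' and a non-blocking 'cast', here is a rough, standard-library-only sketch. It is not Cinder or oslo.messaging code: `ToyRpcClient`, `RpcTimeout`, `failover_host`, the backend id string, and the timing values are all hypothetical names and numbers chosen for the example.

```python
import queue
import threading
import time


class RpcTimeout(Exception):
    """Raised when a blocking call gets no reply within the timeout."""


class ToyRpcClient:
    """Hypothetical stand-in for an RPC client (not the oslo.messaging API)."""

    def __init__(self, handler, timeout):
        self._handler = handler
        self._timeout = timeout

    def call(self, *args):
        # Blocking: run the handler in the background and wait for its reply.
        reply = queue.Queue(maxsize=1)
        worker = threading.Thread(
            target=lambda: reply.put(self._handler(*args)), daemon=True)
        worker.start()
        try:
            return reply.get(timeout=self._timeout)
        except queue.Empty:
            raise RpcTimeout("no reply within %.2fs" % self._timeout)

    def cast(self, *args):
        # Non-blocking: start the work and return; no result comes back.
        worker = threading.Thread(
            target=self._handler, args=args, daemon=True)
        worker.start()
        return worker


def failover_host(volume_count, per_volume_seconds=0.0005):
    # Simulated failover whose duration grows with the number of volumes.
    time.sleep(volume_count * per_volume_seconds)
    return "secondary-backend"  # made-up backend id


if __name__ == "__main__":
    client = ToyRpcClient(failover_host, timeout=0.2)
    print(client.call(10))        # few volumes: reply arrives in time
    try:
        client.call(2000)         # thousands of volumes: exceeds the timeout
    except RpcTimeout as exc:
        print("call failed:", exc)
    client.cast(2000).join()      # cast returns at once; join only for the demo
```

The trade-off shown here is the one described above: the cast cannot time out on the caller's side, but it also cannot hand back the new backend id.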

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/290857

Changed in cinder:
assignee: nobody → Patrick East (patrick-east)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/290857
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=fddb9c7bc08b893960a5d9a09dfca62b94aa27ed
Submitter: Jenkins
Branch: master

commit fddb9c7bc08b893960a5d9a09dfca62b94aa27ed
Author: Patrick East <email address hidden>
Date: Wed Mar 9 11:08:27 2016 -0800

    Switch failover-host from rpc call to cast

    There is some concern that with large numbers of volumes it will be
    difficult for drivers to failover the host before the rpc timeout hits.

    To avoid asking admins to bump the timeout just for these cases we can
    switch it to do a non-blocking cast instead of call. The difference now
    being that the active_backend_id is not returned from the API call to
    failover-host. An admin will have to look at the service-list output
    to see when it has changed states from ‘failing-over’ and then check
    what its active_backend_id is at that time.

    Change-Id: I69b4908fe783cf785d3e1612422fca15fea01c6f
    Closes-Bug: #1555342
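
Since the API call no longer returns the active_backend_id, an admin script would have to poll for it. The following is a minimal sketch of such a loop, under stated assumptions: `wait_for_failover` and `fetch_service` are hypothetical helpers (the latter standing in for whatever reads one row of the service-list output), and the `replication_status` key is an assumed name for the field that holds the 'failing-over' state.

```python
import time


def wait_for_failover(fetch_service, interval=1.0, max_wait=600.0,
                      sleep=time.sleep):
    """Poll until the service leaves 'failing-over'; return its backend id.

    fetch_service is a hypothetical callable returning a dict with the keys
    'replication_status' (assumed field name) and 'active_backend_id'.
    """
    waited = 0.0
    while True:
        service = fetch_service()
        if service["replication_status"] != "failing-over":
            # The state has changed, so the backend id is now meaningful.
            return service["active_backend_id"]
        if waited >= max_wait:
            raise TimeoutError("service still failing-over after %.0fs" % waited)
        sleep(interval)
        waited += interval
```

The returned id should still be checked against the expected target, since the service may have left 'failing-over' because the failover failed rather than succeeded.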

Changed in cinder:
status: In Progress → Fix Released
Thierry Carrez (ttx) wrote : Fix included in openstack/cinder 8.0.0.0rc1

This issue was fixed in the openstack/cinder 8.0.0.0rc1 release candidate.
