Cinder

cheesecake failover-host can hit rpc timeout

Bug #1555342 reported by Patrick East on 2016-03-09

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Cinder	Fix Released	Undecided	Patrick East

Bug Description

Right now, when handling a failover-host API call, we do a blocking RPC 'call' to the volume API, which then handles the failover and returns the new backend id.

Unfortunately, on some backends with large numbers of volumes, the failover can take some time, since it may require per-volume API calls to ensure the failover is done correctly. It is relatively easy to hit the RPC call timeout once you are in the thousands of volumes.

A fix for this would be to make it async and use an RPC 'cast' instead.
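
To illustrate the difference between the two styles, here is a minimal sketch using only the Python standard library. It is not Cinder or oslo.messaging code: FakeRpcClient, RpcTimeout, and this failover_host function are hypothetical stand-ins, and the per-volume delay is simulated with a sleep. It only shows why a blocking 'call' can time out on a slow failover while a 'cast' returns right away.

```python
import queue
import threading
import time


class RpcTimeout(Exception):
    """Raised when a blocking call gets no reply in time (hypothetical)."""


class FakeRpcClient:
    """Toy stand-in for an RPC client; NOT the oslo.messaging API."""

    def __init__(self, handler, timeout):
        self._handler = handler
        self._timeout = timeout

    def call(self, *args):
        """Blocking: wait for the handler's result or raise RpcTimeout."""
        reply = queue.Queue(maxsize=1)
        worker = threading.Thread(
            target=lambda: reply.put(self._handler(*args)), daemon=True)
        worker.start()
        try:
            return reply.get(timeout=self._timeout)
        except queue.Empty:
            raise RpcTimeout("no reply within %.2fs" % self._timeout)

    def cast(self, *args):
        """Non-blocking: start the work and return without a result."""
        worker = threading.Thread(
            target=self._handler, args=args, daemon=True)
        worker.start()
        # The thread is returned only so a demo or test can join it;
        # a real cast hands nothing back to the caller.
        return worker


def failover_host(volume_count, per_volume_seconds):
    """Simulated failover that does some work for every volume."""
    for _ in range(volume_count):
        time.sleep(per_volume_seconds)  # stands in for a backend API call
    return "secondary-backend"  # hypothetical new backend id


if __name__ == "__main__":
    client = FakeRpcClient(failover_host, timeout=0.2)
    print(client.call(3, 0.001))      # small host: reply arrives in time
    try:
        client.call(100, 0.01)        # large host: about 1s of work
    except RpcTimeout as exc:
        print("call failed:", exc)
    client.cast(100, 0.01).join()     # cast returns at once; join is demo-only
```

With a cast the caller no longer gets the new backend id back, so it has to be read from the service state afterwards.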


OpenStack Infra (hudson-openstack) wrote on 2016-03-09: Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/290857

Changed in cinder:
assignee:	nobody → Patrick East (patrick-east)
status:	New → In Progress


OpenStack Infra (hudson-openstack) wrote on 2016-03-10: Fix merged to cinder (master)

Reviewed: https://review.openstack.org/290857
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=fddb9c7bc08b893960a5d9a09dfca62b94aa27ed
Submitter: Jenkins
Branch: master

commit fddb9c7bc08b893960a5d9a09dfca62b94aa27ed
Author: Patrick East <email address hidden>
Date: Wed Mar 9 11:08:27 2016 -0800

Switch failover-host from rpc call to cast

There is some concern that with large numbers of volumes it will be
difficult for drivers to failover the host before the rpc timeout hits.

To avoid asking admins to bump the timeout just for these cases we can
switch it to do a non-blocking cast instead of call. The difference now
being that the active_backend_id is not returned from the API call to
failover-host. An admin will have to look at the service-list output
to see when it has changed states from ‘failing-over’ and then check
what its active_backend_id is at that time.

Change-Id: I69b4908fe783cf785d3e1612422fca15fea01c6f
Closes-Bug: #1555342

Changed in cinder:
status:	In Progress → Fix Released
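
The polling workflow described in the commit message could be scripted roughly as follows. This is a sketch under assumptions, not part of the fix: wait_for_failover and the get_service callable are hypothetical, and the replication_status key is an assumed name for the field that shows 'failing-over' in the service-list output. Only active_backend_id and the 'failing-over' state come from the commit message itself.

```python
import time


def wait_for_failover(get_service, poll_interval=5.0, timeout=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the service leaves 'failing-over'.

    get_service is a hypothetical callable returning a dict for the
    volume service, e.g. parsed from the service-list output. The
    'replication_status' key is an assumption; 'active_backend_id' is
    the field named in the commit message.
    """
    deadline = clock() + timeout
    while True:
        service = get_service()
        status = service["replication_status"]
        if status != "failing-over":
            # Failover finished (successfully or not); report both values.
            return status, service["active_backend_id"]
        if clock() >= deadline:
            raise TimeoutError(
                "service still failing-over after %s seconds" % timeout)
        sleep(poll_interval)


if __name__ == "__main__":
    # Canned responses standing in for repeated service-list queries.
    responses = iter([
        {"replication_status": "failing-over", "active_backend_id": None},
        {"replication_status": "failed-over", "active_backend_id": "backend-2"},
    ])
    print(wait_for_failover(lambda: next(responses), poll_interval=0.0))
```

The caller should still inspect the returned status, since leaving 'failing-over' does not by itself mean the failover succeeded.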


Thierry Carrez (ttx) wrote on 2016-03-17: Fix included in openstack/cinder 8.0.0.0rc1

This issue was fixed in the openstack/cinder 8.0.0.0rc1 release candidate.
