OpenStack DBaaS (Trove)

Increase replication timeouts for snapshot/restore

Bug #1362310 reported by Morgan Jones on 2014-08-27

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
OpenStack DBaaS (Trove)	Fix Released	High	Nikhil Manchanda	OpenStack DBaaS (Trove) 2014.2 "juno"

Bug Description

In the current implementation, replication snapshots and creating a new slave from a snapshot may fail due to timeouts waiting for large amounts of data to be backed up or restored.

A potential solution is to somehow incorporate monitoring heartbeats from the guestagent to ensure that the operation can have as much time as necessary, without creating a situation where a failed guestagent will lock out the taskmanager. However, such a solution is beyond the scope of implementation for Juno.

As a temporary solution, change the timeouts on the snapshot backup calls and the instance restore calls to effectively be "timeout = maxint".
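
For illustration, the heartbeat idea described above could look roughly like the sketch below. This is not Trove code: `wait_for_operation`, `is_done`, `last_heartbeat`, and `GuestUnresponsive` are hypothetical names, and the 60-second silence window is an assumed value. The operation gets as much time as it needs, but the caller gives up once the guest stops reporting in.

```python
import time


class GuestUnresponsive(Exception):
    """Raised when no heartbeat has arrived within the allowed window."""


def wait_for_operation(is_done, last_heartbeat, heartbeat_timeout=60.0,
                       poll_interval=1.0, clock=time.monotonic,
                       sleep=time.sleep):
    """Wait without an overall deadline while heartbeats keep arriving.

    is_done:        hypothetical callable, True once the backup or
                    restore has finished
    last_heartbeat: hypothetical callable returning the clock time of
                    the most recent guest heartbeat
    """
    while not is_done():
        # How long the guest has been silent, in seconds.
        silence = clock() - last_heartbeat()
        if silence > heartbeat_timeout:
            # A dead guest must not block the caller forever.
            raise GuestUnresponsive(
                "no heartbeat for %.1f seconds" % silence)
        sleep(poll_interval)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised with a fake clock; with the defaults it polls in real time.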

Morgan Jones (6-morgan) on 2014-09-08

Changed in trove:
assignee:	nobody → Morgan Jones (6-morgan)

OpenStack Infra (hudson-openstack) wrote on 2014-09-16: Fix proposed to trove (master)

Fix proposed to branch: master
Review: https://review.openstack.org/121938

Changed in trove:
status:	New → In Progress

Amrith Kumar (amrith) wrote on 2014-09-22:

This is a replication bug fix that was targeted for Juno during the mid-cycle.

Changed in trove:
milestone:	none → juno-rc1

Nikhil Manchanda (slicknik) on 2014-09-22

Changed in trove:
importance:	Undecided → High

OpenStack Infra (hudson-openstack) on 2014-09-24

Changed in trove:
assignee:	Morgan Jones (6-morgan) → Nikhil Manchanda (slicknik)

OpenStack Infra (hudson-openstack) on 2014-09-25

Changed in trove:
assignee:	Nikhil Manchanda (slicknik) → Morgan Jones (6-morgan)

OpenStack Infra (hudson-openstack) on 2014-09-26

Changed in trove:
assignee:	Morgan Jones (6-morgan) → Nikhil Manchanda (slicknik)

OpenStack Infra (hudson-openstack) wrote on 2014-09-30: Fix merged to trove (master)

Reviewed: https://review.openstack.org/121938
Committed: https://git.openstack.org/cgit/openstack/trove/commit/?id=0fe9c9dd59cf4bc084a9875ce789f0b2943c799a
Submitter: Jenkins
Branch: master

commit 0fe9c9dd59cf4bc084a9875ce789f0b2943c799a
Author: Morgan Jones <email address hidden>
Date: Wed Sep 10 10:31:00 2014 -0700

Make the replication snapshot timeout configurable

    There is no way to tell how long the snapshot for replication
    will take, and we have no good way to poll for the slave state.
    Eventually, we will need to have an intelligent poll (perhaps
    based on guest heartbeats), but in the meantime we will have
    the snapshot use a configurable timeout which can be set
    as needed, and independently of the agent_call timeouts.

    Co-Authored-By: Nikhil Manchanda <email address hidden>
    Change-Id: I6316d748e91d1ec3eebe25a14bb43fbfe10db669
    Closes-bug: 1362310
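
As a rough sketch of what the commit message describes, a snapshot timeout can be kept as its own setting so that raising it does not affect ordinary agent calls. The option names, default values, and operation names below are assumptions made for illustration and are not taken from the merged change; a plain dict stands in for the real configuration system.

```python
# Assumed defaults, in seconds. The names and values are illustrative only.
DEFAULTS = {
    "agent_call_timeout": 10,     # short, ordinary guest agent calls
    "snapshot_timeout": 36000,    # long-running snapshot and restore work
}

# Hypothetical operation names treated as long-running.
LONG_RUNNING = {"get_replication_snapshot", "restore_from_snapshot"}


def load_timeouts(overrides=None):
    """Merge operator overrides into the defaults and validate them."""
    conf = dict(DEFAULTS)
    conf.update(overrides or {})
    for name, value in conf.items():
        if not isinstance(value, int) or value <= 0:
            raise ValueError("%s must be a positive integer" % name)
    return conf


def timeout_for(operation, conf):
    """Pick the timeout that applies to the given operation."""
    if operation in LONG_RUNNING:
        return conf["snapshot_timeout"]
    return conf["agent_call_timeout"]


conf = load_timeouts({"snapshot_timeout": 7200})
print(timeout_for("get_replication_snapshot", conf))
print(timeout_for("restart", conf))
```

Keeping the two settings separate means an operator with very large datasets can raise only the snapshot value, while a hung guest still fails quickly on every other call.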

Changed in trove:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2014-10-03

Changed in trove:
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2014-10-16

Changed in trove:
milestone:	juno-rc1 → 2014.2
