Cinder

GlusterFS: Do not time out long-running volume snapshot operations

Bug #1273894 reported by Eric Harney on 2014-01-28

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Cinder	Won't Fix	Low	Unassigned	Cinder next
OpenStack Compute (nova)	Expired	Undecided	Unassigned

Bug Description

Currently, when Cinder sends a snapshot create or delete job to Nova for the GlusterFS driver, it waits only for a fixed timeout window. If the job takes longer than that, Cinder marks the snapshot operation as failed. (The assumption is that Nova has somehow failed.)

This is problematic because it fails operations that are still active but running very slowly.
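To make the failure mode concrete, here is a minimal Python sketch of a fixed-deadline poll loop. It is not the actual Cinder GlusterFS driver code: `wait_fixed_deadline`, `SnapshotTimeout`, the `FakeClock` helper and the status strings are hypothetical names chosen for illustration. The deadline is computed once, when polling starts, so a job that is still making progress is reported as failed as soon as it outlives the window.

```python
import time
from typing import Callable


class SnapshotTimeout(Exception):
    """Raised when a job is still running after the fixed window closes."""


def wait_fixed_deadline(get_status: Callable[[], str],
                        timeout: float,
                        interval: float = 1.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until the job reaches a terminal state.

    The deadline is fixed when polling starts, so a job that is slow but
    still making progress is failed as soon as it outlives `timeout`.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("available", "error"):
            return status
        if clock() >= deadline:
            raise SnapshotTimeout(f"job still {status!r} after {timeout}s")
        sleep(interval)


class FakeClock:
    """Deterministic clock so the demo runs without real sleeps."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def run_demo(finish_at: float, timeout: float) -> str:
    """Simulate a job that completes `finish_at` seconds after it starts."""
    clock = FakeClock()

    def get_status() -> str:
        return "available" if clock.now >= finish_at else "creating"

    try:
        return wait_fixed_deadline(get_status, timeout,
                                   clock=clock.time, sleep=clock.sleep)
    except SnapshotTimeout:
        return "timed out"
```

With a 10-second window, `run_demo(5, 10)` returns `"available"`, while `run_demo(15, 10)` returns `"timed out"` even though the simulated job would have completed five seconds later.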

The fix proposed here is to have Nova send periodic updates while the operation is in progress, using the same update_snapshot_status API that is already used to finalize these operations. Cinder then knows that Nova is still active and that the job does not need to be timed out.
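A sketch of the proposed behavior, assuming that each update_snapshot_status call from Nova stores a timestamp alongside the snapshot status that Cinder can read back. Instead of a fixed deadline, the loop fails only when no update has arrived for `stall_timeout` seconds. `SnapshotRecord`, `wait_with_heartbeat` and the simulation helpers are hypothetical; they are not Cinder or Nova APIs.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class SnapshotTimeout(Exception):
    """Raised when Nova has been silent for longer than the stall timeout."""


@dataclass
class SnapshotRecord:
    status: str          # e.g. "creating", "available", "error"
    last_update: float   # time of the most recent status update from Nova


def wait_with_heartbeat(get_record: Callable[[], SnapshotRecord],
                        stall_timeout: float,
                        interval: float = 1.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job finishes; fail only if updates stop arriving."""
    while True:
        record = get_record()
        if record.status in ("available", "error"):
            return record.status
        # Each periodic update moves last_update forward, which pushes the
        # effective deadline out, so a slow but live job never times out.
        if clock() - record.last_update >= stall_timeout:
            raise SnapshotTimeout(
                f"no update for {stall_timeout}s, assuming Nova failed")
        sleep(interval)


class FakeClock:
    """Deterministic clock so the demo runs without real sleeps."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def run_demo(finish_at: float, stall_timeout: float, heartbeat_every: float,
             crash_at: Optional[float] = None) -> str:
    """Simulate Nova sending an update every `heartbeat_every` seconds.

    If `crash_at` is set, updates stop at that time and the job never ends.
    """
    clock = FakeClock()

    def get_record() -> SnapshotRecord:
        now = clock.now
        crashed = crash_at is not None and now >= crash_at
        if not crashed and now >= finish_at:
            return SnapshotRecord("available", now)
        alive_until = min(now, crash_at) if crashed else now
        last = (alive_until // heartbeat_every) * heartbeat_every
        return SnapshotRecord("creating", last)

    try:
        return wait_with_heartbeat(get_record, stall_timeout,
                                   clock=clock.time, sleep=clock.sleep)
    except SnapshotTimeout:
        return "timed out"
```

With `heartbeat_every=5` and `stall_timeout=10`, a job that needs 100 seconds completes, whereas the same job whose compute node stops sending updates at t=30 is timed out at t=40. The stall timeout still exists, but it now measures silence from Nova rather than total job duration.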

This is backward compatible with both Havana Cinder and Havana Nova.

Eric Harney (eharney) on 2014-01-28

Changed in nova:
assignee:	nobody → Eric Harney (eharney)

OpenStack Infra (hudson-openstack) wrote on 2014-01-29: Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/69759

Changed in cinder:
status:	New → In Progress
Changed in nova:
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2014-01-29: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/69761

Eric Harney (eharney) on 2014-01-29

Changed in cinder:
milestone:	none → icehouse-3
tags:	added: glusterfs libvirt

Eric Harney (eharney) on 2014-01-30

Changed in cinder:
importance:	Undecided → Medium

OpenStack Infra (hudson-openstack) wrote on 2014-02-26: Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/76587

OpenStack Infra (hudson-openstack) wrote on 2014-02-27: Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/76587
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=5c79c08f7bb88484a96427d30244057f7cd7cdfc
Submitter: Jenkins
Branch: master

commit 5c79c08f7bb88484a96427d30244057f7cd7cdfc
Author: Eric Harney <email address hidden>
Date: Wed Feb 26 11:38:29 2014 -0500

GlusterFS: Increase snapshot delete job timeout to two hours

    Increase the timeout for Nova snapshot delete operations from ten
    minutes to two hours. This helps prevent Cinder from terminating
    operations prematurely that are still being processed by Nova.

    It is not uncommon for snapshot delete jobs to run for longer than
    ten minutes depending on the size of the snapshot and speed of the
    storage backend.

This will be followed up with a more robust mechanism to keep track
of snapshot job progress as a later effort.

Related-Bug: 1273894

Change-Id: I1ad52568aed1ce1bf593e71704e481b6fe5f44fb
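The effect of raising the limit can be checked with back-of-the-envelope arithmetic. The sketch below assumes, purely for illustration, that a delete job's duration is roughly the amount of data it has to copy divided by the backend's sustained throughput; the sizes and throughput figures are made up, not measurements from this bug. `estimated_seconds` and `fits` are hypothetical helpers.

```python
OLD_TIMEOUT_S = 10 * 60        # previous limit: ten minutes
NEW_TIMEOUT_S = 2 * 60 * 60    # limit after this commit: two hours


def estimated_seconds(data_gib: float, throughput_mib_s: float) -> float:
    """Rough job duration: data to copy divided by sustained throughput.

    Illustrative assumption only; real block jobs also depend on guest I/O,
    backing-chain depth and backend load.
    """
    return data_gib * 1024 / throughput_mib_s


def fits(data_gib: float, throughput_mib_s: float, timeout_s: float) -> bool:
    """True if the estimated duration stays within the timeout."""
    return estimated_seconds(data_gib, throughput_mib_s) <= timeout_s


if __name__ == "__main__":
    for size_gib in (10, 100, 1024):
        secs = estimated_seconds(size_gib, 50)
        print(f"{size_gib:>5} GiB @ 50 MiB/s: {secs / 60:7.1f} min, "
              f"old ok={fits(size_gib, 50, OLD_TIMEOUT_S)}, "
              f"new ok={fits(size_gib, 50, NEW_TIMEOUT_S)}")
```

Under that assumption, 100 GiB at 50 MiB/s takes 2048 s (about 34 minutes): past the old ten-minute limit but within two hours. A 1 TiB job at the same rate needs roughly 5.8 hours, so a larger fixed timeout only moves the failure point, which is why the commit message defers to a more robust progress-tracking mechanism.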

Eric Harney (eharney) wrote on 2014-03-04:

Nothing really in shape here for I-3. Moving to RC1 in hopes of finding some decent solution.

Changed in cinder:
milestone:	icehouse-3 → icehouse-rc1

Eric Harney (eharney) on 2014-03-19

Changed in cinder:
milestone:	icehouse-rc1 → next

Mike Perez (thingee) on 2014-08-07

tags:	added: drivers

Joe Gordon (jogo) wrote on 2014-12-03:

The Nova patch has been abandoned; this clearly isn't in progress anymore.

Changed in nova:
status:	In Progress → Confirmed

Eric Harney (eharney) wrote on 2015-06-23:

This appears to still need a fix.

Changed in cinder:
assignee:	Eric Harney (eharney) → Deepak C Shetty (dpkshetty)

Deepak C Shetty (dpkshetty) wrote on 2015-07-07:

@Eric Harney,
I read up on the history of this bug and looked at the previous patches.

IIUC, overloading the 'progress' field isn't a good idea, so the recent work to introduce new fields in Nova
(see https://review.openstack.org/#/c/172813/) should address the 'progress' field overloading part.

But we still need a loop on the Cinder side with _some_ timeout: we wouldn't know if the Nova/compute node
crashed or stopped responding for some reason, and we don't want to hang forever on the Cinder side.

Do you intend to have some mechanism wherein we wouldn't need a loop on the Cinder side?
IIUC that's only possible if Nova sends _real_ progress data for the blockjob. Do you have any other solution in mind?
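The distinction between a liveness timeout and _real_ progress data can be sketched as a watchdog that resets its timer only when the reported progress actually advances, so repeated "still running" updates from a stuck block job do not keep it alive forever. This is a hypothetical illustration: `ProgressWatchdog` and the numeric progress percentage are assumptions, not the fields proposed in review 172813.

```python
class ProgressWatchdog:
    """Declare a job stalled only when its progress stops advancing.

    Progress is assumed to be a non-decreasing percentage reported by the
    compute side (hypothetical field). Repeated reports of the same value
    do not reset the timer.
    """

    def __init__(self, stall_timeout: float, now: float) -> None:
        self.stall_timeout = stall_timeout
        self.best = -1.0          # highest progress seen so far
        self.last_advance = now   # when progress last moved forward

    def observe(self, progress: float, now: float) -> bool:
        """Record a sample; return False once the job looks stalled."""
        if progress > self.best:
            self.best = progress
            self.last_advance = now
        return now - self.last_advance < self.stall_timeout


if __name__ == "__main__":
    dog = ProgressWatchdog(stall_timeout=60, now=0)
    samples = [(30, 10.0), (80, 20.0), (130, 20.0), (140, 20.0)]
    for t, pct in samples:
        print(t, pct, "ok" if dog.observe(pct, now=t) else "stalled")
```

A Cinder-side poll loop would call `observe()` on every iteration with the latest stored progress. A crashed compute node sends nothing new, so it looks the same as a stalled job and is caught by the same `stall_timeout`; Cinder still never hangs forever, but a slow job that keeps advancing is never failed.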

Changed in cinder:
status:	In Progress → New

Sean McGinnis (sean-mcginnis) wrote on 2015-11-24:

Automatically unassigning due to inactivity.

Changed in cinder:
assignee:	Deepak C Shetty (dpkshetty) → nobody

Davanum Srinivas (DIMS) (dims-v) on 2016-03-04

Changed in nova:
assignee:	Eric Harney (eharney) → nobody

Sean McGinnis (sean-mcginnis) wrote on 2016-03-13:

#10

Do we still have this issue?

tags:	added: cinder-nova

Eric Harney (eharney) wrote on 2016-04-05:

#11

Yes, operations will still fail if they take longer than X minutes/hours due to a slow I/O path.

Changed in cinder:
importance:	Medium → Low

Markus Zoeller (markus_z) (mzoeller) wrote on 2016-07-05: Cleanup EOL bug report

#12

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
status:	Confirmed → Expired

Justin A Wilson (justin-wilson) on 2016-07-07

Changed in cinder:
status:	New → Incomplete
status:	Incomplete → New
status:	New → Incomplete

Sean McGinnis (sean-mcginnis) wrote on 2016-09-28:

#13

GlusterFS driver is now being removed, so marking as Won't Fix.

https://review.openstack.org/#/c/377028/

Changed in cinder:
status:	Incomplete → Won't Fix
