GlusterFS: Do not time out long-running volume snapshot operations

Bug #1273894 reported by Eric Harney
This bug affects 1 person
Affects                   Status     Importance  Assigned to  Milestone
Cinder                    Won't Fix  Low         Unassigned
OpenStack Compute (nova)  Expired    Undecided   Unassigned

Bug Description

Currently, when Cinder sends a snapshot create or delete job to Nova for the GlusterFS driver, it waits only for a fixed timeout window. If the job takes longer than that, Cinder marks the snapshot operation as failed, on the assumption that Nova has somehow failed.

This is problematic because it fails operations that are still active but running very slowly.
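
A minimal sketch of the fixed-window behavior described above, assuming a hypothetical `get_status()` callable that returns the snapshot status Nova has reported so far ('creating', 'deleting', 'available', 'error', ...). The clock and sleep functions are injectable so the loop can be simulated without real waiting. None of these names come from the Cinder code base.

```python
import time
from typing import Callable


class SnapshotTimeout(Exception):
    """Raised when a snapshot job does not finish inside the fixed window."""


def wait_fixed_deadline(get_status: Callable[[], str],
                        timeout: float,
                        interval: float = 1.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job leaves its in-progress state or the window closes.

    The deadline is fixed when polling starts, so a job that is still
    running, just slowly, is failed exactly like a job whose compute
    node has died.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status not in ("creating", "deleting"):
            return status  # finished: 'available', 'deleted', 'error', ...
        if clock() >= deadline:
            raise SnapshotTimeout(f"job still {status!r} after {timeout}s")
        sleep(interval)
```

With timeout=600 (ten minutes), a delete that needs 700 simulated seconds raises SnapshotTimeout even though it was about to succeed; a larger window only moves the cliff further out.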

The fix proposed here is to reuse the update_snapshot_status API, which Nova already calls to finalize these operations, to send periodic updates while the operation is in progress. Cinder then knows that Nova is still active and that the job does not need to be timed out.
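
The proposed fix amounts to replacing the fixed deadline with an inactivity window: every update Nova sends through update_snapshot_status pushes the deadline forward, and the job is failed only if Nova goes silent. A minimal sketch, assuming a hypothetical `get_report()` that returns the latest status together with a counter that changes each time Nova posts an update for the job. The names and the report shape are illustrative, not Cinder's actual data model.

```python
import time
from typing import Callable, Optional, Tuple


class NovaUnresponsive(Exception):
    """Raised when Nova sends no update within the inactivity window."""


def wait_with_heartbeat(get_report: Callable[[], Tuple[str, int]],
                        idle_timeout: float,
                        interval: float = 1.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a snapshot job, failing it only if Nova stops reporting.

    get_report() returns (status, update_count); update_count is assumed
    to change every time Nova posts an update for this job.
    """
    last_seen: Optional[int] = None
    deadline = clock() + idle_timeout
    while True:
        status, update_count = get_report()
        if status not in ("creating", "deleting"):
            return status
        if update_count != last_seen:
            # Nova is still alive: restart the inactivity window.
            last_seen = update_count
            deadline = clock() + idle_timeout
        elif clock() >= deadline:
            raise NovaUnresponsive(
                f"no update for {idle_timeout}s while {status!r}")
        sleep(interval)
```

The window now bounds how long Nova may stay silent rather than how long the job may run, so a three-hour delete that reports every minute completes, while a compute node that stops reporting is still detected within idle_timeout.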

This change is backward compatible with both Havana Cinder and Havana Nova.

Eric Harney (eharney)
Changed in nova:
assignee: nobody → Eric Harney (eharney)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/69759

Changed in cinder:
status: New → In Progress
Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/69761

Eric Harney (eharney)
Changed in cinder:
milestone: none → icehouse-3
tags: added: glusterfs libvirt
Eric Harney (eharney)
Changed in cinder:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/76587

OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/76587
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=5c79c08f7bb88484a96427d30244057f7cd7cdfc
Submitter: Jenkins
Branch: master

commit 5c79c08f7bb88484a96427d30244057f7cd7cdfc
Author: Eric Harney <email address hidden>
Date: Wed Feb 26 11:38:29 2014 -0500

    GlusterFS: Increase snapshot delete job timeout to two hours

    Increase the timeout for Nova snapshot delete operations from ten
    minutes to two hours. This helps prevent Cinder from terminating
    operations prematurely that are still being processed by Nova.

    It is not uncommon for snapshot delete jobs to run for longer than
    ten minutes depending on the size of the snapshot and speed of the
    storage backend.

    This will be followed up with a more robust mechanism to keep track
    of snapshot job progress as a later effort.

    Related-Bug: 1273894

    Change-Id: I1ad52568aed1ce1bf593e71704e481b6fe5f44fb

Eric Harney (eharney) wrote :

Nothing really in shape here for I-3. Moving to RC1 in hopes of finding some decent solution.

Changed in cinder:
milestone: icehouse-3 → icehouse-rc1
Eric Harney (eharney)
Changed in cinder:
milestone: icehouse-rc1 → next
Mike Perez (thingee)
tags: added: drivers
Joe Gordon (jogo) wrote :

Nova patch is abandoned; this clearly isn't in progress anymore.

Changed in nova:
status: In Progress → Confirmed
Eric Harney (eharney) wrote :

This appears to still need a fix.

Changed in cinder:
assignee: Eric Harney (eharney) → Deepak C Shetty (dpkshetty)
Deepak C Shetty (dpkshetty) wrote :

@Eric Harney,
  I read up on the history of this bug and looked at the previous patches.

IIUC, overloading the 'progress' field isn't a good idea, so the recent work to introduce new fields in Nova
(see https://review.openstack.org/#/c/172813/) should address the 'progress' field overloading issue.

But we still need a loop on the Cinder side with _some_ timeout: we wouldn't otherwise know
if the Nova/compute node crashed or stopped responding, and we don't want to hang forever on the Cinder side.

Do you intend to have some mechanism wherein we wouldn't need a loop on the Cinder side?
IIUC, that's only possible if Nova sends _real_ progress data for the block job. Do you have any other solution in mind?
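
If Nova did report real block-job progress (for example a percentage derived from libvirt's block job information), the Cinder-side loop could keep a timeout but apply it to lack of progress rather than to total duration, with an optional hard cap as a last-resort safety net. A minimal sketch of that idea: `ProgressWatchdog`, its parameters, and the 0-100 progress scale are all hypothetical and not part of Cinder or Nova. The caller is assumed to call observe() on every poll, re-submitting the last known value when nothing new has arrived, so a silent or crashed compute node shows up as a stall.

```python
from dataclasses import dataclass, field
from typing import Optional


class SnapshotStalled(Exception):
    """Raised when block-job progress stops advancing or the cap is hit."""


@dataclass
class ProgressWatchdog:
    """Track real block-job progress and flag jobs that stop advancing.

    stall_timeout: seconds allowed without any increase in progress.
    hard_limit: optional absolute cap on total job time, in seconds.
    """
    stall_timeout: float
    hard_limit: Optional[float] = None
    _start: Optional[float] = field(default=None, init=False)
    _best: float = field(default=-1.0, init=False)
    _last_advance: float = field(default=0.0, init=False)

    def observe(self, progress: float, now: float) -> None:
        """Record a progress sample (0-100) taken at time `now` (seconds)."""
        if self._start is None:
            self._start = now
            self._last_advance = now
        if progress > self._best:
            # Real forward progress: reset the stall timer.
            self._best = progress
            self._last_advance = now
        elif now - self._last_advance >= self.stall_timeout:
            raise SnapshotStalled(
                f"progress stuck at {self._best}% for "
                f"{now - self._last_advance:.0f}s")
        if self.hard_limit is not None and now - self._start >= self.hard_limit:
            raise SnapshotStalled(
                f"job exceeded hard limit of {self.hard_limit}s")
```

A slow but steadily advancing job never trips the stall check, which is the behavior the original bug asks for; the hard cap remains available if an operator still wants an upper bound.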

Changed in cinder:
status: In Progress → New
Sean McGinnis (sean-mcginnis) wrote :

Automatically unassigning due to inactivity.

Changed in cinder:
assignee: Deepak C Shetty (dpkshetty) → nobody
Changed in nova:
assignee: Eric Harney (eharney) → nobody
Sean McGinnis (sean-mcginnis) wrote :

Do we still have this issue?

tags: added: cinder-nova
Eric Harney (eharney) wrote :

Yes, operations will still fail if they take longer than X minutes/hours due to a slow I/O path.

Changed in cinder:
importance: Medium → Low
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
status: Confirmed → Expired
Changed in cinder:
status: New → Incomplete
status: Incomplete → New
status: New → Incomplete
Sean McGinnis (sean-mcginnis) wrote :

GlusterFS driver is now being removed, so marking as Won't Fix.

https://review.openstack.org/#/c/377028/

Changed in cinder:
status: Incomplete → Won't Fix