Periodic update replication status causing issues

Bug #1383524 reported by John Griffith
This bug affects 2 people
Affects: Cinder
Status: Fix Released
Importance: High
Assigned to: Steven Kaufer

Bug Description

The periodic task that updates replication status is rather "heavy" for a periodic task: it first fetches all volumes from the DB, then makes a get_rep status call to the driver for each of those volumes and updates the model for every one of them.

I've been noticing random hangs and blocks on this call while working on other things. I wonder whether this has something to do with the other issues we're seeing around timed-out operations on volumes.

Tags: replication
Jay Bryant (jsbryant) wrote :

I have seen the same issues while working on https://bugs.launchpad.net/cinder/+bug/1373513 .

Changed in cinder:
status: Triaged → Confirmed
RonenKat (ronenkat)
Changed in cinder:
assignee: nobody → RonenKat (ronenkat)
Avishay Traeger (avishay-il) wrote :

The status should be checked periodically, as the state of the replication is important. However, it should be optimized:
1. If the driver has no replication enabled, don't run the task.
2. A db query that fetches only replicated volumes for this host would help.
3. Adjust how often this periodic task runs.
4. Maybe a batch operation to the driver would also help - i.e., give me the replication statuses for these 100 volumes instead of one at a time. Some drivers might be able to save some communication with the storage that way.

My initial thought was to cache the IDs of the replicated volumes in an instance variable and update it when a new replicated volume is added, but that would probably interfere with running multiple instances of the driver for HA.
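As a rough illustration of item 4 above, batching could look something like the sketch below. This is not Cinder code: `collect_replication_statuses`, `chunked`, and the `get_statuses_batch` callable are hypothetical names, and the sketch assumes a driver that can return a mapping of volume ID to status for a whole list of IDs in one call.

```python
from typing import Callable, Dict, Iterator, List


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def collect_replication_statuses(
    volume_ids: List[str],
    get_statuses_batch: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 100,
) -> Dict[str, str]:
    """Query the (hypothetical) batch driver call once per chunk of IDs.

    `get_statuses_batch` stands in for a driver method that does not exist
    in the source; it is assumed to return {volume_id: status}.
    """
    statuses: Dict[str, str] = {}
    for batch in chunked(volume_ids, batch_size):
        statuses.update(get_statuses_batch(batch))
    return statuses
```

With 250 replicated volumes and the default batch size, the driver would be contacted three times rather than 250 times.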

Jay Bryant (jsbryant) wrote :

Avishay,

Is this something you can look into or do you need me to find someone else to investigate?

Jay Bryant (jsbryant)
tags: added: replication
Avishay Traeger (avishay-il) wrote :

Jay, please assign to someone, I can help guide them if necessary.

Jay Bryant (jsbryant) wrote :

Avishay, Thanks for the response. Will do.

Jay Bryant (jsbryant) wrote :

Steven,

This is the issue we talked about. Thanks for taking a look. Let me know what you think.

Changed in cinder:
assignee: RonenKat (ronenkat) → Steven Kaufer (kaufer)
Steven Kaufer (kaufer) wrote :

The plan to address this bug will come in 2 phases:

Phase 1: Implement the optimization #1 as suggested by Avishay above -- If the driver has no replication enabled, don't run the task.

Phase 2: Implement #2 in the same list above. This implementation will require a change to the 'volume_get_all_by_host' DB API so that it supports filters. Because adding this support will likely take more time and effort, it will be pushed in a separate commit.
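
To make the Phase 2 idea concrete, here is a minimal in-memory sketch of a host query that accepts filters. It is only an illustration of the behaviour being asked for, not the real 'volume_get_all_by_host' DB API: the function name `get_volumes_by_host`, its signature, and the dictionary-based volume records are all assumptions.

```python
from typing import Any, Dict, List, Optional


def get_volumes_by_host(
    volumes: List[Dict[str, Any]],
    host: str,
    filters: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Return volumes on `host` whose fields equal every entry in `filters`.

    Hypothetical stand-in for a DB query; a real implementation would push
    the filters into the database query instead of filtering in Python.
    """
    filters = filters or {}
    return [
        vol for vol in volumes
        if vol.get("host") == host
        and all(vol.get(key) == value for key, value in filters.items())
    ]
```

Calling it with no filters keeps the current "all volumes for this host" behaviour, so existing callers would not need to change.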

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/154673

Changed in cinder:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/154673
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=fa6ab2483de36c447d199643eb1e5f7de6390165
Submitter: Jenkins
Branch: master

commit fa6ab2483de36c447d199643eb1e5f7de6390165
Author: Steven Kaufer <email address hidden>
Date: Tue Feb 10 21:12:16 2015 +0000

    Replication status periodic task optimization

    A periodic task exists to update the replication status for all volumes.
    Currently, this task executes for all drivers and always retrieves all
    volumes for the current host from the DB.

    This patch set:
    * Ensures that the periodic task is only activated if the driver actually
      supports replication
    * Only retrieves volumes from the DB if 'replication_status'!='disabled' in
      the periodic task

    Also, the driver documentation in cinder.volume.driver.VolumeDriver.
    get_volume_stats() is updated to reflect that the 'replication' key indicates
    that the driver supports replication; this is the key that was actually
    implemented in the drivers that support replication.

    Change-Id: I61fbc31567ad0b6908a00113adeaccf415343e8e
    Closes-Bug: 1383524
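
The two behaviours described in the commit message can be sketched roughly as follows. This is not the merged patch: the function names are hypothetical, volumes are modelled as plain dictionaries, and the only details taken from the commit message are the 'replication' stats key and the 'replication_status' != 'disabled' condition.

```python
from typing import Any, Callable, Dict, List


def driver_supports_replication(stats: Dict[str, Any]) -> bool:
    """Treat a truthy 'replication' key in the driver stats as support."""
    return bool(stats.get("replication"))


def volumes_to_check(volumes: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep volumes whose replication_status is not 'disabled'.

    Note: in this in-memory check a volume with no status field is kept too.
    """
    return [v for v in volumes if v.get("replication_status") != "disabled"]


def run_replication_status_task(
    stats: Dict[str, Any],
    volumes: List[Dict[str, Any]],
    update_status: Callable[[Dict[str, Any]], None],
) -> int:
    """Run the status update only when the driver supports replication.

    Returns the number of volumes that were passed to `update_status`.
    """
    if not driver_supports_replication(stats):
        return 0  # nothing to do for drivers without replication
    checked = 0
    for vol in volumes_to_check(volumes):
        update_status(vol)
        checked += 1
    return checked
```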

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: kilo-3 → 2015.1.0