solidfire: update_cluster_status() needs better error handling for connectivity issues

Bug #1398877 reported by Huang Zhiteng
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: High
Assigned to: John Griffith

Bug Description

If the SolidFire driver has trouble talking to the backend, get_volume_stats() ignores any exception and reports whatever old stats it has to the scheduler. The problem is that if there is no old data, i.e. self.cluster_stats is an empty dict, this causes trouble in the scheduler. Although that is also a bug in the scheduler, I think it would be better for the SF driver to report the mandatory stats with all values zeroed instead of an empty dict.
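
A minimal sketch of the fallback proposed above, not the actual Cinder driver code. Only get_volume_stats(), self.cluster_stats, and the idea of zeroed mandatory stats come from the report; the SketchDriver class, the BackendUnreachable exception, the fetch_capacity callable, and the exact set of stat keys are assumptions made for illustration.

```python
class BackendUnreachable(Exception):
    """Hypothetical error raised when the backend cannot be contacted."""


# Hypothetical set of mandatory stats, all zeroed. The key set the real
# scheduler expects may differ.
ZEROED_STATS = {
    "total_capacity_gb": 0,
    "free_capacity_gb": 0,
    "reserved_percentage": 0,
}


class SketchDriver:
    def __init__(self, fetch_capacity):
        # fetch_capacity is a hypothetical callable standing in for the
        # API call to the cluster; it returns a dict or raises.
        self._fetch_capacity = fetch_capacity
        self.cluster_stats = {}

    def _update_cluster_status(self):
        try:
            self.cluster_stats = dict(self._fetch_capacity())
        except BackendUnreachable:
            # Report zeroed stats instead of stale data or an empty dict.
            self.cluster_stats = dict(ZEROED_STATS)

    def get_volume_stats(self, refresh=False):
        # Refresh on request, or when nothing has been cached yet.
        if refresh or not self.cluster_stats:
            self._update_cluster_status()
        return self.cluster_stats
```

With this shape, a driver that has never reached the backend still hands the scheduler a dict containing every expected key, so the empty-dict case described in the report cannot occur.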

John Griffith (john-griffith) wrote :

Seems reasonable. I intentionally don't fail and kill the driver when communications are lost, but I agree that we either shouldn't report the last known stats (zero them out instead) or, even better, we should add an "available" field to the stats that the scheduler can check, ignoring everything else if it is False.
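
A rough sketch of how a scheduler-side check on such an "available" field might look. This is an illustration of the suggestion only: the usable_backends function and the backend names are hypothetical, and treating a missing flag as unavailable is an assumption, not something decided in this thread.

```python
def usable_backends(reported_stats):
    """Return the names of backends whose stats say they are available.

    reported_stats maps a backend name to the stats dict it reported.
    """
    usable = []
    for name, stats in reported_stats.items():
        # Assumption: a missing "available" flag counts as unavailable,
        # which also filters out a backend that reported an empty dict.
        if not stats.get("available", False):
            continue
        usable.append(name)
    return usable


reported = {
    "sf-1": {"available": False, "free_capacity_gb": 500},
    "sf-2": {"available": True, "free_capacity_gb": 10},
    "sf-3": {},
}
candidates = usable_backends(reported)
```

Here the cached capacity reported by "sf-1" is ignored entirely because the flag is False, which is the behavior the comment describes.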

Changed in cinder:
status: New → Triaged
importance: Undecided → High
Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
milestone: none → mitaka-1
Sean McGinnis (sean-mcginnis) wrote :

Is this still an issue?

Changed in cinder:
milestone: mitaka-1 → newton-rc1
Changed in cinder:
milestone: newton-rc1 → ocata-1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/425842

Changed in cinder:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/425842
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=cc325cb3f64e7453db686148132037f50a3628c0
Submitter: Jenkins
Branch: master

commit cc325cb3f64e7453db686148132037f50a3628c0
Author: John Griffith <email address hidden>
Date: Thu Jan 26 19:00:55 2017 +0000

    Zero out SolidFire capacity when unreachable

    If for some reason connectivity to the cluster goes down, we just return the
    cached capacity info. Then the scheduler still sees the cluster as an
    available resource in the pool.

    This change detects connectivity issues during the get_cluster_stats call and
    if the call fails, we report 0 available capacity to keep create calls from
    being scheduled to the cluster.

    Change-Id: I3f730e140c2b61fdd407c90b134916108312278a
    Closes-Bug: #1398877

Changed in cinder:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 10.0.0.0rc1

This issue was fixed in the openstack/cinder 10.0.0.0rc1 release candidate.
