Cinder

solidfire: update_cluster_status() needs better error handling for connectivity issues

Bug #1398877 reported by Huang Zhiteng on 2014-12-03

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Cinder	Fix Released	High	John Griffith	Cinder ocata-1 "o1"

Bug Description

If the SolidFire driver cannot talk to the backend, get_volume_stats() ignores the exception and reports whatever old stats it has to the scheduler. The problem is that if there is no old data, i.e. self.cluster_stats is an empty dict, this causes trouble in the scheduler. Although that is a bug in the scheduler too, I think it would be better for the SF driver to report the mandatory stats with all values zeroed instead of an empty dict.
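The failure mode described above can be illustrated with a minimal sketch. This is not the actual Cinder driver or scheduler code: FakeDriver, has_capacity() and the ConnectionError are hypothetical stand-ins, and free_capacity_gb is assumed to be one of the mandatory stats the consumer expects.

```python
# Hypothetical stand-ins for illustration only; not the real SolidFire driver.
class FakeDriver:
    def __init__(self):
        # No stats have been cached yet, as in the bug report.
        self.cluster_stats = {}

    def _update_cluster_status(self):
        # Simulate a backend that cannot be reached.
        raise ConnectionError("backend unreachable")

    def get_volume_stats(self, refresh=False):
        if refresh:
            try:
                self._update_cluster_status()
            except ConnectionError:
                # The error is swallowed and the cached data is returned.
                pass
        return self.cluster_stats


def has_capacity(stats, requested_gb):
    # A consumer that assumes the mandatory key is always present.
    return stats["free_capacity_gb"] >= requested_gb
```

With an empty cache, get_volume_stats(refresh=True) returns {} and a consumer written like has_capacity() raises KeyError instead of simply rejecting the backend.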

John Griffith (john-griffith) wrote on 2015-10-01:

Seems reasonable. I intentionally don't fail and kill the driver in the case of lost communications, but I agree that we either shouldn't report the last known stats (zero them out instead) or, even better, we should probably add an "available" field to the stats that the scheduler can check, ignoring everything else if it is False.
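(Editorial sketch, not part of the original comment.) One way the suggested "available" field could be consumed is shown below. The backend_passes() helper and the exact key names are assumptions for illustration; the thread does not show such a check in the Cinder scheduler.

```python
def backend_passes(stats, requested_gb):
    """Hypothetical scheduler-side check honouring an "available" flag."""
    # If the driver says the backend is unavailable (or says nothing at all),
    # ignore every other stat and reject the backend.
    if not stats.get("available", False):
        return False
    # Only trust the capacity figure once the backend is known to be reachable.
    return stats.get("free_capacity_gb", 0) >= requested_gb
```

Treating a missing flag as False means an empty stats dict is rejected rather than causing an error.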

Changed in cinder:
status:	New → Triaged
importance:	Undecided → High

John Griffith (john-griffith) on 2015-10-02

Changed in cinder:
assignee:	nobody → John Griffith (john-griffith)
milestone:	none → mitaka-1

Sean McGinnis (sean-mcginnis) wrote on 2016-09-08:

Is this still an issue?

Changed in cinder:
milestone:	mitaka-1 → newton-rc1

Sean McGinnis (sean-mcginnis) on 2016-09-16

Changed in cinder:
milestone:	newton-rc1 → ocata-1

OpenStack Infra (hudson-openstack) wrote on 2017-01-26: Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/425842

Changed in cinder:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2017-01-27: Fix merged to cinder (master)

Reviewed: https://review.openstack.org/425842
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=cc325cb3f64e7453db686148132037f50a3628c0
Submitter: Jenkins
Branch: master

commit cc325cb3f64e7453db686148132037f50a3628c0
Author: John Griffith <email address hidden>
Date: Thu Jan 26 19:00:55 2017 +0000

Zero out SolidFire capacity when unreachable

    If for some reason connectivity to the cluster goes down, we just return the
    cached capacity info. Then the scheduler still sees the cluster as an
    available resource in the pool.

    This change detects connectivity issues during the get_cluster_stats call and
    if the call fails, we report 0 available capacity to keep create calls from
    being scheduled to the cluster.

Change-Id: I3f730e140c2b61fdd407c90b134916108312278a
Closes-Bug: #1398877
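
The behaviour the commit message describes might be sketched roughly as follows. This is an illustration under stated assumptions, not the merged patch: SketchDriver, the query_backend callable and its result keys (total_gb, free_gb) are hypothetical, and zeroing both total_capacity_gb and free_capacity_gb is one reading of "report 0 available capacity".

```python
class SketchDriver:
    """Hypothetical driver that zeroes capacity when the backend is down."""

    def __init__(self, query_backend):
        # query_backend is an assumed callable returning a dict with
        # "total_gb" and "free_gb", or raising OSError on connectivity loss.
        self._query_backend = query_backend
        self.cluster_stats = {}

    def _update_cluster_status(self):
        try:
            result = self._query_backend()
        except OSError:
            # Keep the driver alive, but report zero capacity rather than
            # stale or missing data so nothing new is scheduled here.
            self.cluster_stats = dict(
                self.cluster_stats,
                total_capacity_gb=0,
                free_capacity_gb=0,
            )
            return
        self.cluster_stats = {
            "total_capacity_gb": result["total_gb"],
            "free_capacity_gb": result["free_gb"],
        }

    def get_volume_stats(self, refresh=False):
        if refresh:
            self._update_cluster_status()
        return self.cluster_stats
```

Because the mandatory capacity keys are always present after a refresh, a capacity-based check sees 0 GB free and skips the backend, whether or not older stats had been cached.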

Changed in cinder:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2017-02-06: Fix included in openstack/cinder 10.0.0.0rc1

This issue was fixed in the openstack/cinder 10.0.0.0rc1 release candidate.
