Cinder

sheepdog: use a single node for cinder-backend

Bug #1560807 reported by zhangsong on 2016-03-23

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Cinder	Fix Released	High	zhangsong	Cinder newton-1 "n1"

Bug Description

I use a Sheepdog cluster as the Cinder backend. The cluster has 5 nodes:
10.133.17.61
10.133.17.62
10.133.17.63
10.133.17.64
10.133.17.65

Node 10.133.17.61 is specified as the Sheepdog store address in the Cinder config file:
[sdcluster1]
volume_backend_name = sdcluster1
volume_driver = cinder.volume.drivers.sheepdog.SheepdogDriver
sheepdog_store_address = 10.133.17.61

I created 10 QEMU instances with devstack, each using a Sheepdog volume as its system disk.

Node 10.133.17.61 then went down for some reason. As a result, Cinder could no longer connect to the Sheepdog cluster and stopped working. Even worse, all the QEMU instances crashed.

Sheepdog is a cluster without a central node; every node plays the same role. It uses 3 copies for redundancy by default, so the cluster should not go down unless more than 3 nodes have crashed. Here only one node went down, yet it caused the serious problems described above.

The reason is that the Sheepdog driver in Cinder uses only a single node of the cluster to run dog commands, to read and write data, and to give to QEMU as the Sheepdog location path. In this case that node is 10.133.17.61: it not only processes every dog command issued by Cinder but also handles all the data streams from the QEMU instances. It is both a single point of failure and a hot spot.

Because all the QEMU instances read and write data through this single node, it may also become the bottleneck of the Sheepdog cluster.

So the Sheepdog driver in Cinder should use a policy that makes full use of all the nodes in a cluster rather than a single node. That way, if one node goes down, Cinder can still connect to the Sheepdog cluster through the other nodes, and only the QEMU instances connected to the failed node are affected; the other instances keep working.
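
The kind of policy requested above could look roughly like the following Python sketch. It is only an illustration, not the driver's code: `run_with_failover`, `NoNodeAvailable` and `fake_dog_command` are hypothetical names, and the "operation" stands in for whatever the driver does against one node (for example running a dog command).

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class NoNodeAvailable(Exception):
    """Raised when the operation failed on every node (hypothetical name)."""


def run_with_failover(nodes: Sequence[str],
                      operation: Callable[[str], T]) -> T:
    """Try `operation` on the nodes in random order until one succeeds."""
    errors = {}
    # random.sample returns a shuffled copy, so the caller's list is untouched
    # and successive calls start from different nodes (spreads the load).
    for node in random.sample(list(nodes), k=len(nodes)):
        try:
            return operation(node)
        except OSError as exc:  # includes ConnectionRefusedError, timeouts
            errors[node] = exc
    raise NoNodeAvailable(f"all nodes failed: {errors}")


# Usage with the five nodes from this report; .61 is simulated as down.
NODES = ["10.133.17.61", "10.133.17.62", "10.133.17.63",
         "10.133.17.64", "10.133.17.65"]
DOWN = {"10.133.17.61"}


def fake_dog_command(node: str) -> str:
    """Stand-in for a real call against one node (assumption)."""
    if node in DOWN:
        raise ConnectionRefusedError(node)
    return f"ok via {node}"


result = run_with_failover(NODES, fake_dog_command)
```

With this approach the loss of 10.133.17.61 only costs one failed attempt; the call still succeeds through one of the remaining four nodes.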

zhangsong (zhangsong) on 2016-03-23

Changed in cinder:
assignee:	nobody → zhangsong (zhangsong)

OpenStack Infra (hudson-openstack) wrote on 2016-03-23: Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/296220

Changed in cinder:
status:	New → In Progress

Sean McGinnis (sean-mcginnis) on 2016-03-24

Changed in cinder:
importance:	Undecided → High
milestone:	none → newton-1
tags:	added: drivers sheepdog

OpenStack Infra (hudson-openstack) wrote on 2016-05-24: Fix merged to cinder (master)

Reviewed: https://review.openstack.org/296220
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=bac0d4e550c0a1a9cef6befc9171c9d85cb0ac39
Submitter: Jenkins
Branch: master

commit bac0d4e550c0a1a9cef6befc9171c9d85cb0ac39
Author: zhangsong <email address hidden>
Date: Wed Mar 23 14:39:56 2016 +0800

Sheepdog:make full use of all sheepdog nodes

    The sheepdog driver of cinder should use a policy to make full
    use of all sheepdog nodes of a cluster instead of single node.
    It can solve single point of failure and hot spot problem.

    This patch uses random method. When it needs to run a dog command
    or provide a location path to qemu, it gets a random node from all
    sheepdog nodes of a cluster.

Change-Id: I0280b71203d99829796244afbd9a8f308b7e910a
Closes-Bug: #1560807
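
The random method described in the commit message can be sketched as below. This is a minimal illustration under stated assumptions, not the merged patch: the node list is hard-coded here, port 7000 is assumed, the `-a`/`-p` options of dog and the `sheepdog:ADDR:PORT:VDI` location form are assumptions about the dog CLI and QEMU, and `pick_node`, `dog_command` and `location_path` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Node = Tuple[str, int]  # (address, port)

# Assumed node list: the five addresses from the bug report, assumed port 7000.
NODES: List[Node] = [
    ("10.133.17.61", 7000),
    ("10.133.17.62", 7000),
    ("10.133.17.63", 7000),
    ("10.133.17.64", 7000),
    ("10.133.17.65", 7000),
]


def pick_node(nodes: Sequence[Node]) -> Node:
    """Return one node chosen uniformly at random."""
    return random.choice(list(nodes))


def dog_command(nodes: Sequence[Node], command: str, subcommand: str,
                *params: str) -> List[str]:
    """Build a dog command line aimed at a randomly chosen node."""
    addr, port = pick_node(nodes)
    # "-a" / "-p" are assumed to select the node address and port.
    return ["dog", command, subcommand, "-a", addr, "-p", str(port), *params]


def location_path(nodes: Sequence[Node], vdiname: str) -> str:
    """Build the location string handed to QEMU, using a random node."""
    addr, port = pick_node(nodes)
    return f"sheepdog:{addr}:{port}:{vdiname}"


cmd = dog_command(NODES, "vdi", "list")
path = location_path(NODES, "volume-1")
```

Because each call picks a node independently, dog commands and QEMU connections end up spread across the cluster instead of all landing on one address. Random choice alone does not skip a node that is already down, so a real driver would still need some handling for that case.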