sheepdog:use a single node for cinder-backend

Bug #1560807 reported by zhangsong
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Cinder | Fix Released | High | zhangsong |

Bug Description

I use a Sheepdog cluster as the Cinder backend. The cluster has 5 nodes:
10.133.17.61
10.133.17.62
10.133.17.63
10.133.17.64
10.133.17.65

Node 10.133.17.61 is specified as the Sheepdog store address in the Cinder config file:
[sdcluster1]
volume_backend_name = sdcluster1
volume_driver = cinder.volume.drivers.sheepdog.SheepdogDriver
sheepdog_store_address = 10.133.17.61

I created 10 QEMU instances with devstack, each using a Sheepdog volume as its system disk.

Node 10.133.17.61 then went down for some reason. As a result, Cinder could no longer connect to the Sheepdog cluster and stopped working. Even more seriously, all the QEMU instances crashed.

Sheepdog is a cluster with no central node; every node plays the same role. It keeps 3 redundant copies by default, so the cluster should not go down unless more than 3 nodes have crashed. Here only one node went down, yet it caused the serious problems described above.

The cause is that Cinder's Sheepdog driver uses only a single node of the cluster to run dog commands, to read and write data, and to pass to QEMU as the Sheepdog location path. In this case that node is 10.133.17.61: it not only processes every dog command issued by Cinder but also handles all the data streams from the QEMU instances. It is both a single point of failure and a hot spot.

Because all the QEMU instances read and write data through that single node, it may also become a bottleneck for the Sheepdog cluster.

So Cinder's Sheepdog driver should use a policy that makes full use of all the nodes in a Sheepdog cluster rather than a single node. That way, if one node goes down, Cinder can still reach the cluster through the other nodes, and only the QEMU instances connected to the failed node are affected; the other instances keep working.
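
The policy proposed above could be sketched roughly as follows. This is only an illustration, not the driver's actual code: `pick_node`, `is_alive`, and `NODES` are hypothetical names, and the liveness check is passed in as a function so the sketch does not assume any particular Sheepdog or Cinder API.

```python
import random

# Node addresses taken from the cluster described in this report.
NODES = [
    "10.133.17.61",
    "10.133.17.62",
    "10.133.17.63",
    "10.133.17.64",
    "10.133.17.65",
]


def pick_node(nodes, is_alive, rng=random):
    """Return a reachable node, trying the nodes in random order.

    is_alive is a caller-supplied (hypothetical) callable that takes an
    address and returns True when the node can be contacted.
    """
    candidates = list(nodes)
    rng.shuffle(candidates)  # spread load instead of always using nodes[0]
    for addr in candidates:
        if is_alive(addr):
            return addr
    raise RuntimeError("no reachable sheepdog node")


# Example: 10.133.17.61 is down, so it is never selected.
down = {"10.133.17.61"}
chosen = pick_node(NODES, lambda addr: addr not in down)
```

With a check like this, losing one node only removes it from the candidate set; the remaining four nodes continue to serve requests.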

zhangsong (zhangsong)
Changed in cinder:
assignee: nobody → zhangsong (zhangsong)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/296220

Changed in cinder:
status: New → In Progress
Changed in cinder:
importance: Undecided → High
milestone: none → newton-1
tags: added: drivers sheepdog
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/296220
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=bac0d4e550c0a1a9cef6befc9171c9d85cb0ac39
Submitter: Jenkins
Branch: master

commit bac0d4e550c0a1a9cef6befc9171c9d85cb0ac39
Author: zhangsong <email address hidden>
Date: Wed Mar 23 14:39:56 2016 +0800

    Sheepdog:make full use of all sheepdog nodes

    The sheepdog driver of cinder should use a policy to make full
    use of all sheepdog nodes of a cluster instead of single node.
    It can solve single point of failure and hot spot problem.

    This patch uses random method. When it needs to run a dog command
    or provide a location path to qemu, it gets a random node from all
    sheepdog nodes of a cluster.

    Change-Id: I0280b71203d99829796244afbd9a8f308b7e910a
    Closes-Bug: #1560807
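
As a rough illustration of the random method described in the commit message (not the merged code itself), the sketch below picks a random node each time it builds a `dog` command line or a location string for QEMU. The helper names, the node list, port 7000, and the `sheepdog://host:port/vdi` location form are assumptions made for the example.

```python
import random

# Assumed inputs for this example: the five nodes from the report and
# Sheepdog's default port.
NODES = ["10.133.17.61", "10.133.17.62", "10.133.17.63",
         "10.133.17.64", "10.133.17.65"]
PORT = 7000


def random_node(nodes=NODES, rng=random):
    """Pick one node uniformly at random."""
    return rng.choice(nodes)


def dog_command(*args, nodes=NODES, port=PORT, rng=random):
    """Build a dog command line aimed at a randomly chosen node."""
    addr = random_node(nodes, rng)
    return ["dog", *args, "-a", addr, "-p", str(port)]


def location_path(vdi_name, nodes=NODES, port=PORT, rng=random):
    """Build a QEMU-style location string for a randomly chosen node."""
    addr = random_node(nodes, rng)
    return "sheepdog://%s:%d/%s" % (addr, port, vdi_name)


cmd = dog_command("vdi", "list")
loc = location_path("volume-0001")
```

Note that this sketch does not check whether the chosen node is reachable; a purely random choice spreads load across the cluster but can still land on a node that is down.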

Changed in cinder:
status: In Progress → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/cinder 9.0.0.0b1

This issue was fixed in the openstack/cinder 9.0.0.0b1 development milestone.
