Comment 4 for bug 1400497

Christian Schwede (cschwede) wrote : Re: [Bug 1400497] Re: swift-ring-builder not distributing partitions evenly between zones

Hello Tim,

On 09.12.14 23:43, Tim Leak wrote:
> Thanks for the response, Christian. I understand now that the
> device weight is being considered when the partitions are assigned,
> but what I am seeing is that this behavior can cause both copies of
> ingested content to be stored on devices in the same zone. Again,
> this seems to break a fundamental concept of swift:
>
> <from the Swift Architectural Overview, under The Ring> "Data can be
> isolated with the concept of zones in the ring. Each replica of a
> partition is guaranteed to reside in a different zone. A zone could
> represent a drive, a server, a cabinet, a switch, or even a
> datacenter."

Yes, you're right - unfortunately this section of the documentation was
not updated along with the patch. I submitted a fix for this:

https://review.openstack.org/#/c/140478/

> I would expect that the weight of the devices would need to be
> considered when assigning partitions within a zone, say to allow a
> 3TB disk to be allocated more partitions than a 1TB disk. I would
> not expect that the cumulative weights of the zones would need to be
> the same in order to guarantee that copies of ingested data are
> isolated.

So, let's take the following example: you have one zone with 10 x 4 TB
disks and an assigned weight of 10 x 4000, and another zone with 10 x 2 TB
disks and a weight of 10 x 2000. With Swift < 2.1 both zones store roughly
the same amount of data regardless of weight, so the second zone will run
out of space very quickly, and you have to add more capacity to that zone -
and by doing this you also add more weight.
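
To put rough numbers on that, here is a back-of-the-envelope Python
sketch - not the actual ring-builder code, and the partition count is
just an example:

    # Two zones, weights roughly proportional to raw capacity
    zone_weights = {"z1": 10 * 4000, "z2": 10 * 2000}   # 10x4TB vs. 10x2TB
    total_weight = sum(zone_weights.values())
    replicas = 2
    parts = 2 ** 16   # partition count for a part power of 16

    # Swift < 2.1: each zone holds one full replica of every partition,
    # so both zones store the same amount of data and the smaller zone
    # (z2) fills up first.
    pre_21 = {zone: parts for zone in zone_weights}

    # Swift >= 2.1: partition-replicas are assigned proportionally to
    # weight, so the stored data roughly follows the capacity.
    weight_based = {zone: int(parts * replicas * weight / total_weight)
                    for zone, weight in zone_weights.items()}

    print(pre_21)          # {'z1': 65536, 'z2': 65536}
    print(weight_based)    # {'z1': 87381, 'z2': 43690}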

> If this is by design, and I do need to guarantee that different
> copies of a piece of ingested content are stored on devices in
> different zones, then what specific characteristics do I need to
> configure for the devices/zones in order to make that happen?

This is by design. The problem arises when you have fewer regions/zones/nodes
than replicas and then add a new region/zone/node. Before Swift 2.1 one
replica of each partition was moved to the new region/zone right away - which
has a huge impact on larger deployments or deployments with a lot of
requests. By including the weight in the calculation it is now possible to
control how much data will be moved to other regions/zones.
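
As a rough illustration (again just a Python sketch with invented
weights, not the ring-builder algorithm itself):

    def share_of_new_zone(existing_weight, new_zone_weight, replicas):
        """Estimate the share of all partition-replicas that end up in a
        newly added zone."""
        total = existing_weight + new_zone_weight
        # With weight-based assignment the new zone receives data in
        # proportion to its weight ...
        weight_share = float(new_zone_weight) / total
        # ... while before 2.1, with fewer zones than replicas, it got a
        # full replica (1/replicas of all partition-replicas) right away,
        # regardless of how small its weight was.
        pre_21_share = 1.0 / replicas
        return weight_share, pre_21_share

    # A new zone with 10% of the total weight now receives roughly 10% of
    # the partition-replicas instead of a full third:
    print(share_of_new_zone(9000, 1000, replicas=3))   # (0.1, 0.333...)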

The documentation recommends 5 zones, and in that case (or with rings that
have even more zones) you only have to ensure that no zone has more than
1/replicas of the total weight assigned. This should be the case if all of
your zones are similarly sized.
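
If you want to check that condition on your own ring, something like
the following sketch does it (the zone weights here are made up, and
this helper is not part of swift-ring-builder):

    def zones_at_risk(zone_weights, replicas):
        """Return zones whose weight exceeds 1/replicas of the total
        weight, i.e. zones that would have to store more than one
        replica of some partitions."""
        total = sum(zone_weights.values())
        limit = float(total) / replicas
        return [zone for zone, weight in zone_weights.items()
                if weight > limit]

    # Hypothetical weights for a 5-zone ring with 3 replicas:
    weights = {"z1": 4000, "z2": 4000, "z3": 4000, "z4": 4000, "z5": 12000}
    print(zones_at_risk(weights, replicas=3))   # ['z5']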

The submitted patch will now raise a warning after rebalancing if there
is any tier at risk.

> Is there any difference when I introduce different regions for
> further isolation?

No, it's the same behavior.

Let me know if you have more questions on this. I can also have a look
at a specific ring if you can share that data.

Best,

Christian