swift 2.1.0 replication issue with rsync

Bug #1536620 reported by Cosmin coroiu
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
New
Undecided
Unassigned

Bug Description

We have had a swift cluster running for about 4 months now. We mainly use it to store images that we archive from our Media Database. We have a 7 node cluster, and each node has a 2TB volume attached for the files that we store in swift (24GB RAM, 4 CPU virtual machines).

We initially started with a 3 node cluster, but as free space started to shrink, we added 4 more nodes with the same specs. Our archive jobs have been running almost constantly since the day we started the initial cluster, and now 2 of the nodes have gotten full. Jobs are obviously failing, and we noticed that the issue is caused by the rsync replication.

On further analysis we noticed that swift replicates its files using rsync by appending container/partition paths to the command, but these paths have grown so large that they reach the POSIX limit of 4096 characters per command.

We looked into the code a bit and found this section in /usr/lib/python2.6/site-packages/swift/obj/replicator.py:

for suffix in suffixes:
    spath = join(job['path'], suffix)
    if os.path.exists(spath):
        args.append(spath)

We added a simple loop that iterated over the args and wrote them to a file (see attached).

As can be seen there, the command is very long and exceeds the 4096-character limit by a lot.
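A debug loop like the one described might look like this. This is a hypothetical sketch (the function name and log path are illustrative, not from Swift's code); it mirrors the args list built in the replicator snippet above and records the total command length:

```python
# Hypothetical debug helper: append each rsync argument to a file and
# record the total command length, as the reporter describes doing
# around the snippet in swift/obj/replicator.py.
def dump_rsync_args(args, logfile="/tmp/rsync_args.txt"):
    with open(logfile, "a") as f:
        for arg in args:
            f.write(arg + "\n")
        # +1 per arg accounts for the space separating arguments
        total = sum(len(a) + 1 for a in args)
        f.write("total command length: %d chars\n" % total)
```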

We have currently switched to rsync_method = ssync and set replication_concurrency = 0.

Is there maybe a solution to balance the nodes in their current state?

The image below is a section from the New Relic dashboard where we can see CPU, disk I/O, memory, and used space:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/129305/3188530/AoXHY6hujdxfVou/statusofnodes.png

Revision history for this message
Cosmin coroiu (cosmin-coroiu) wrote :
Revision history for this message
Christian Schwede (cschwede) wrote : Re: [Bug 1536620] [NEW] swift 2.1.0 replication issue with rsync

On 21.01.16 13:51, Cosmin coroiu wrote:
> We initially started with a 3 node cluster, but as space started to
> shrink there, we added 4 more nodes with the same specs. Our archive
> jobs were running almost constantly since the day we started the initial
> cluster, and now 2 of the nodes, have gotten full. Jobs are obviously
> failing and what we noticed is that the issue is caused by the rsync
> replication.

I assume the jobs are failing due to missing space, right? And rsync
might be failing because some of the destination nodes are full?

I assume you already rebalanced the object ring, and need to move data
more quickly from the full nodes to the remaining ones?

> On further analysis we noticed that swift replicates it's files using
> rsync by appending container/partition paths to the command but these
> paths have gotten so big that they reach the POSIX limit of 4096
> characters per command.

What OS are you running? I quickly checked on rhel7 and ubuntu 14.04,
and the limit is much higher on both systems:

$ getconf ARG_MAX
2097152
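The same limit can also be queried from Python's standard library; a minimal sketch, assuming a Linux system where `SC_ARG_MAX` is exposed via `os.sysconf`:

```python
import os

# The real per-exec argument-length limit on this system.
arg_max = os.sysconf("SC_ARG_MAX")

# _POSIX_ARG_MAX (4096) is only the smallest value POSIX permits a
# conforming system to choose; actual Linux systems are far higher.
POSIX_MINIMUM = 4096

print(arg_max, arg_max >= POSIX_MINIMUM)
```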

Also, I created a long argument list using the following command to see if it fails:

 for i in `seq 1 5000`; do cmd+=" hello world" ; done ; echo $cmd > out

That worked on both systems. What error do you get? Or is this a
separate issue?

> Is there maybe a solution to balance the nodes as they are in the
> current state

You might want to have a look at the handoffs_first setting:

https://github.com/openstack/swift/blob/master/etc/object-server.conf-sample#L216-L224
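For reference, the relevant fragment of object-server.conf would look roughly like this (values illustrative; see the sample config linked above for the authoritative defaults):

```ini
[object-replicator]
# Process handoff partitions first, pushing misplaced data off
# overfull nodes before doing in-place replication.
handoffs_first = True
handoff_delete = auto
```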

Revision history for this message
Cosmin coroiu (cosmin-coroiu) wrote :

Thank you for your reply and recommendations.

We are running swift on CentOS 6.5

getconf ARG_MAX returns
2621440

The small bash snippet you posted seems to be working on our machines too.

However, the way we checked the limit was via xargs --show-limits, which outputs:
POSIX smallest allowable upper limit on argument length (all systems): 4096

and as such we concluded the limit is 4096. Also, the rsync command itself stops suddenly at 4096 characters when checking the command with something like ps aux | grep rsync.

The log looks something like this when failing with rsync:
account-replicator: Beginning replication run
Jan 21 04:12:55 srv-hostname account-replicator: Replication run OVER
Jan 21 04:12:55 srv-hostname account-replicator: Attempted to replicate 0 dbs in 0.00171 seconds (0.00000/s)
Jan 21 04:12:55 srv-hostname account-replicator: Removed 0 dbs
Jan 21 04:12:55 srv-hostname account-replicator: 0 successes, 0 failures
Jan 21 04:12:55 srv-hostname account-replicator: no_change:0 ts_repl:0 diff:0 rsync:0 diff_capped:0 hashmatch:0 empty:0
Jan 21 04:12:59 srv-hostname object-replicator: Killing long-running rsync: ['rsync', '--recursive', '--whole-file', '--human-readable', '--xattrs', '--itemize-changes', '--ignore-existing', '--timeout=30', '--contimeout=30', '--bwlimit=0', '/srv/osd/objects/2736/1c1', '/srv/osd/objects/2736/588', '/srv/osd/objects/2736/fa0', '/srv/osd/objects/2736/1c2', '/srv/osd/objects/2736/f78', '/srv/osd/objects/2736/574', '/srv/osd/objects/2736/46c', '/srv/osd/objects/2736/3db', '/srv/osd/objects/2736/604', '/srv/osd/objects/2736/ff0', '/srv/osd/objects/2736/542', '/srv/osd/objects/2736/a6e', '/srv/osd/objects/2736/36e', '/srv/osd/objects/2736/e50', '/srv/osd/objects/2736/771', '/srv/osd/objects/2736/8dd', '/srv/osd/objects/2736/cba', '/srv/osd/objects/2736/d06', '/srv/osd/objects/2736/512', '/srv/osd/objects/2736/fd2', '/srv/osd/objects/2736/bb7', '/srv/osd/objects/2736/670', '/srv/osd/objects/2736/923', '/srv/osd/objects/2736/352', '/srv/osd/objects/2736/ab7', '/srv/osd/objects/2736/2ca', '/srv/osd/objects/2736/809', '/srv/osd/objects/2736/1cc', '/srv/osd/objects/2736/d42', '/srv/osd/objects/2736/5b6', '/srv/osd/objects/2736/9b2', '/srv/osd/objects/2736/023', '/srv/osd/objects/2736/cdb', '/srv/osd/objects/2736/01e', '/srv/osd/objects/2736/694', '/srv/osd/objects/2736/1b9', '/srv/osd/objects/2736/a15', '/srv/osd/objects/2736/bd4', '/srv/osd/objects/2736/af0', '/srv/osd/objects/2736/efe', '/srv/osd/objects/2736/c72', '/srv/osd/objects/2736/d3d', '/srv/osd/objects/2736/884', '/srv/osd/objects/2736/978', '/srv/osd/objects/2736/f2f', '/srv/osd/objects/2736/8ee', '/srv/osd/objects/2736/72f', '/srv/osd/objects/2736/f62', '/srv/osd/objects/2736/2af', '/srv/osd/objects/2736/9a1', '/srv/osd/objects/2736/60c', '/srv/osd/objects/2736/273', '/srv/osd/objects/2736/cbd', '/srv/osd/objects/2736/8e3', '/srv/osd/objects/2736/69c', '/srv/osd/objects/2736/e5c', '/srv/osd/objects/2736/8c6', '/srv/osd/objects/2736/bb6', '/srv/osd/objects/2736/e8f', '/srv/osd/objects/2736/37b', '/srv/osd/objects/2736/2c5', '/srv/osd/objects/2736/0c5', '/srv/osd/objects/2736/bd2', '/sr
Jan 21 04:13:01 srv-hostname container-replicator: Beginni...


Revision history for this message
Christian Schwede (cschwede) wrote : Re: [Bug 1536620] Re: swift 2.1.0 replication issue with rsync

On 21.01.16 14:42, Cosmin coroiu wrote:
> however the way we checked the limit is by xargs --show-limits which outputs:
> POSIX smallest allowable upper limit on argument length (all systems): 4096

That's the default for xargs if nothing else is set. However, there
should be a line above with a much higher value, like this:

$ xargs --show-limits
POSIX upper limit on argument length (this system): 2092804
POSIX smallest allowable upper limit on argument length (all systems): 4096

Also, this is for xargs only, not the system itself.

> and as such we concluded the limit is 4096, also the rsync command
> itself stops suddenly at 4096 characters when checking the command with
> something like ps aux | grep rsync

That might just be the output line that is cut off; in your reported case
it seems to be even less than 2000 chars.

You can increase the output with something like "ps aux --cols=10000".

> The log looks something like this when failing with rsync:
> Jan 21 04:12:59 srv-hostname object-replicator: Killing long-running rsync:
>
>[... snip - long line ... ]
>
> We tried increasing the rsync_timeout to 4 hours, but that doesn't help,
> as the destination path is missing because of the characters limit, we
> think.

I don't think the character limit is the problem, but there might be
another issue if it takes so long to replicate one partition.

What is your partition power for the object ring? You are only using one
2TB volume per node, so only 7 disks for the whole cluster?

Revision history for this message
Cosmin coroiu (cosmin-coroiu) wrote :

Yes, there are 7 disks in the whole cluster.

We are using a partition power of 12. When the create command was first run for the object ring, it looked like this:
swift-ring-builder object.builder create 12 1 1
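Back-of-the-envelope, those ring parameters (part_power=12, replicas=1, min_part_hours=1, per the create command's argument order) over 7 disks work out roughly as follows; a quick sketch, assuming one device per node:

```python
# Ring parameters as reported: swift-ring-builder ... create 12 1 1
part_power = 12
replicas = 1
disks = 7

partitions = 2 ** part_power         # 4096 partitions in the ring
assignments = partitions * replicas  # 4096 partition-replicas to place
per_disk = assignments / disks       # roughly 585 partitions per disk

print(partitions, round(per_disk))
```

With only ~585 partitions per 2TB disk, each partition averages several GB, which helps explain why a single rsync of one partition can run long enough to hit the replicator's timeout.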
