Cyclic replication in one cluster
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | Undecided | Unassigned |
Bug Description
Hi, we have several clusters. In one of them I have been seeing endless replication from 3 servers for the last 2-3 months:
root@str-20 /srv/node/
Sep 4 06:26:06 str-20 object-replicator: <f+++++++++ 91c/f94b7a601fc
Sep 4 06:26:06 str-20 object-replicator: Successful rsync of /srv/node/
--
Sep 4 06:52:17 str-20 object-replicator: <f+++++++++ 91c/f94b7a601fc
Sep 4 06:52:17 str-20 object-replicator: Successful rsync of /srv/node/
And there are more records like these in older syslog files.
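To confirm that the same object really is being pushed on every pass, the itemized rsync lines can be tallied per path. A minimal sketch, assuming the log format shown in the excerpt above (object-replicator: <f+++++++++ <path>); the regex and the helper names count_pushes and repeated are hypothetical, not part of Swift:

```python
import re
from collections import Counter

# Matches an rsync --itemize-changes "sent file" line as logged by
# object-replicator, e.g. "... object-replicator: <f+++++++++ 91c/<hash>/<file>".
# The format is assumed from the syslog excerpt in this report.
PUSH_RE = re.compile(r"object-replicator: <f\S* (\S+)")


def count_pushes(lines):
    """Count how many times each relative path was sent by rsync."""
    counts = Counter()
    for line in lines:
        match = PUSH_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated(lines, threshold=2):
    """Return (path, count) pairs sent at least `threshold` times, most frequent first."""
    return [(path, n) for path, n in count_pushes(lines).most_common() if n >= threshold]
```

Passing it an open syslog file (for example with open("/var/log/syslog") as f: print(repeated(f)); the log path is an assumption) would list every object that is re-sent on more than one replication pass.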
OK, now on str-26:
root@str-26 /srv/node/
File: ‘1398473456.
Size: 467330599 Blocks: 912760 IO Block: 4096 regular file
Device: 8a1h/2209d Inode: 23489931 Links: 1
Access: (0600/-rw-------) Uid: ( 1001/ swift) Gid: ( 1001/ swift)
Access: 2014-05-22 08:24:20.637048253 +0000
Modify: 2014-05-22 08:24:41.892736780 +0000
Change: 2014-05-22 08:24:41.892736780 +0000
Birth: -
root@str-26 /srv/node/
....
....
Use your own device location on the servers, for example:
export DEVICE=/srv/node
ssh 10.10.2.26 "ls -lah ${DEVICE:
ssh 10.10.2.20 "ls -lah ${DEVICE:
....
OK, now the second node:
root@str-20 /srv/node/
total 68928
drwxr-xr-x 2 swift swift 42 Sep 1 01:44 .
drwxr-xr-x 3 swift swift 45 Sep 1 01:41 ..
-rw------- 1 swift swift 70582272 Sep 1 01:56 .1398473456.
Now the file name starts with a dot.
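A leading dot plus a short random suffix is what rsync's default temporary file names look like: the receiver writes to ".<name>.XXXXXX" and renames the file only when the transfer completes, so an interrupted transfer can leave such a file behind. A small sketch that recovers the intended final name; the pattern is an approximation of rsync's default naming (no --temp-dir, no truncation of very long names) and rsync_temp_target is a hypothetical helper:

```python
import re

# Approximates rsync's default temp-file naming: "." + final name + "." + 6
# random alphanumeric characters. Assumes no --temp-dir and no name truncation.
RSYNC_TMP_RE = re.compile(r"^\.(?P<name>.+)\.(?P<rand>[A-Za-z0-9]{6})$")


def rsync_temp_target(filename):
    """Return the final name an rsync temp file was destined for, or None."""
    match = RSYNC_TMP_RE.match(filename)
    return match.group("name") if match else None
```

For a hypothetical name such as ".1398473456.33743.data.FG9inr" this returns "1398473456.33743.data", i.e. an ordinary Swift .data file name, while normal object files and hashes.pkl return None.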
Changed in swift:
status: Invalid → New
I have no good explanation for why the file name on str-20 looks like that. Maybe you can just get rid of it?
Is the md5 of the two files the same?
str-20: /srv/node/sdp1/objects/255277/91c/f94b7a601fce1a35b92ef3c8e928b91c/.1398473456.33743.data.FG9inr
and
str-26: /srv/node/dev24/objects/255277/91c/f94b7a601fce1a35b92ef3c8e928b91c/1398473456.33743.data
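The two copies live on different hosts, so each digest has to be computed locally (md5sum on each server does the same job). A minimal Python equivalent, reading in chunks so a 467 MB object does not need to fit in memory; file_md5 and same_content are hypothetical helper names:

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both (locally reachable) files have identical MD5 digests."""
    return file_md5(path_a) == file_md5(path_b)
```

Note that the listings above already show different sizes (467330599 bytes on str-26 versus 70582272 bytes for the dot-file on str-20), so the digests should not be expected to match; a short copy is consistent with an interrupted transfer.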
If so, I'd probably start with a targeted audit of str-20:/srv/node/sdp1:
swift-object-auditor /etc/swift/object-server.conf once verbose -d sdp1
I think once mode on the auditor may be running the ZBF audit more than once these days, after the parallel auditor change. Either way, if possible, it would be useful to post any interesting log lines here.
If that doesn't clean it up, then I'd probably remove it manually and clear out the hashes.pkl for that partition on str-20:
rm /srv/node/sdp1/objects/255277/hashes.pkl
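Before deleting anything by hand, a read-only pass over the partition can list every hidden file inside its hash directories. A sketch assuming the on-disk layout visible in the paths above (objects/<partition>/<suffix>/<hash>/<file>); find_dot_files is a hypothetical helper and removes nothing:

```python
import os


def find_dot_files(partition_dir):
    """Return paths of files whose names start with '.' inside the hash
    directories of one partition (<partition>/<suffix>/<hash>/<file>).
    Read-only: nothing is modified or deleted."""
    found = []
    for suffix in sorted(os.listdir(partition_dir)):
        suffix_dir = os.path.join(partition_dir, suffix)
        # Plain files at the partition level (e.g. hashes.pkl) are skipped.
        if not os.path.isdir(suffix_dir):
            continue
        for obj_hash in sorted(os.listdir(suffix_dir)):
            hash_dir = os.path.join(suffix_dir, obj_hash)
            if not os.path.isdir(hash_dir):
                continue
            found.extend(
                os.path.join(hash_dir, name)
                for name in sorted(os.listdir(hash_dir))
                if name.startswith(".")
            )
    return found
```

Calling find_dot_files("/srv/node/sdp1/objects/255277") on str-20 would show whether the stray file is the only one in that partition before hashes.pkl is removed.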
And then push from str-26:
swift-object-replicator /etc/swift/object-server.conf once verbose -d dev24 -p 255277
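The partition passed with -p is not arbitrary. To my understanding, the Swift ring takes the first four bytes of an object's MD5 hash as a big-endian integer and keeps its top part_power bits, and the suffix directory is the last three hex characters of the hash. A sketch under that assumption; the ring's part_power is not given in this report, and 18 is assumed here only because it reproduces partition 255277 for hash f94b7a601fce1a35b92ef3c8e928b91c. The function names are hypothetical:

```python
# Assumption: this cluster's ring part_power. Not stated in the report; 18 is
# chosen because it maps the hash in this report to partition 255277.
ASSUMED_PART_POWER = 18


def partition_for_hash(obj_hash, part_power):
    """Top `part_power` bits of the first 4 bytes (8 hex chars) of the hash."""
    return int(obj_hash[:8], 16) >> (32 - part_power)


def suffix_for_hash(obj_hash):
    """Suffix directory name: the last three hex characters of the hash."""
    return obj_hash[-3:]


def object_dir(obj_hash, part_power=ASSUMED_PART_POWER):
    """Relative directory under <device>/objects/ holding the object's files."""
    return "%d/%s/%s" % (partition_for_hash(obj_hash, part_power),
                         suffix_for_hash(obj_hash), obj_hash)
```

This makes it easy to check that both copies in the thread (str-20 and str-26) sit in the same partition and suffix, i.e. that they really are the same object.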
FWIW, I couldn't reproduce the issue in my dev environment just by renaming the file. The auditor seems to skip the bogus file name, which is a little disappointing. But when replication from a healthy node pushed, the dot file got cleaned up.
What version of swift are you running?