Slow replication with lack of visibility can cause availability problems (esp. in multi-region)

Bug #1700585 reported by Greg Smethells
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

We are noticing that our Swift system, running version 2.4.0, has orphaned objects. For example:

```
[me1@joints-io-1 swift-investigation-2017-06-23]$ curl -X GET 'http://swift-r1p1:8080/v1/AUTH_75673124ca7f42968e28bc264ed32331/1?format=json&path=1.2.840.114204.2.2.4.1.243395414945023.14589405468080000' -H "X-Auth-Token: 47901120da53431e9dd7808d780048ee"
[{"hash": "dcf6d30625586dc8c577231cecb410c6", "last_modified": "2016-04-24T23:31:11.593740", "bytes": 81282, "name": "1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm", "content_type": "application/dicom"}, {"hash": "2a9274a1f8b87af57744406a91c6984f", "last_modified": "2016-04-24T23:31:11.763470", "bytes": 6676, "name": "1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.199754063548486.14589405856570000.dcm", "content_type": "application/dicom"}][me1@joints-io-1 swift-investigation-2017-06-23]$
[me1@joints-io-1 swift-investigation-2017-06-23]$ curl --head 'http://swift-r1p1:8080/v1/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm' -H "X-Auth-Token: 47901120da53431e9dd7808d780048ee"
HTTP/1.1 200 OK
Content-Length: 81282
Accept-Ranges: bytes
Last-Modified: Sun, 24 Apr 2016 23:31:12 GMT
Etag: dcf6d30625586dc8c577231cecb410c6
X-Timestamp: 1461540671.59374
Content-Type: application/dicom
X-Trans-Id: txa7722a5cec044d9396770-00595122ba
Date: Mon, 26 Jun 2017 15:05:30 GMT

[me1@joints-io-1 swift-investigation-2017-06-23]$ curl --head 'http://swift-r1p1:8080/v1/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.199754063548486.14589405856570000.dcm' -H "X-Auth-Token: 47901120da53431e9dd7808d780048ee"
HTTP/1.1 404 Not Found
Content-Length: 0
Content-Type: text/html; charset=UTF-8
X-Trans-Id: tx5b2de3447b6a45e8a5a5e-00595122c4
Date: Mon, 26 Jun 2017 15:05:41 GMT

[me1@joints-io-1 swift-investigation-2017-06-23]$
```

Please advise.

Greg Smethells (gsmethells) wrote :

This appears to only affect containers that have existed for some time. Recently created containers are not affected.

Greg Smethells (gsmethells) wrote :

The system has 3 nodes with 28 disks each. It has been running since Sept 2015.

Greg Smethells (gsmethells) wrote :

The orphaned state affects only some of the files in a pseudo-folder. Our smallest container is "1" and a full audit of it shows:

Loaded 41 pseudo dirs
Searching for missing objects within the 41 pseudo dirs in container 1
...........
1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/ - missing 1 out of 2 objects
..........
1.2.840.113564.99.1.345051786560.22940.2015729856590.21508.2/ - missing 1 out of 49 objects
.
1.2.392.200036.9125.2.691113118120.64569371690.1660414/ - missing 1 out of 2 objects
..............
2.16.840.113849.2.1.00000000/ - missing 1 out of 1 objects
.....
Found 4 objects missing from 4 different pseudo dirs in stream 1
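
An audit like the one above can be approximated with nothing more than the container listing API and a HEAD per listed object. The sketch below is illustrative only, not the reporter's actual tool: the storage URL is taken from the curls earlier in this report, the token is a placeholder, and pagination beyond the first 10,000 listed names is ignored.

```
#!/bin/bash
# Sketch: list what the container *claims* to hold, then HEAD each object
# and report names that return 404 (listed but unreachable).
STORAGE_URL="http://swift-r1p1:8080/v1/AUTH_75673124ca7f42968e28bc264ed32331"
TOKEN="<X-Auth-Token>"   # placeholder
CONTAINER="1"

# The default (plain-text) container listing returns one object name per line.
# Note: listings are paged at 10,000 entries; a complete tool would follow
# the 'marker' query parameter to walk the whole container.
curl -s "$STORAGE_URL/$CONTAINER" -H "X-Auth-Token: $TOKEN" |
while read -r name; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --head \
           "$STORAGE_URL/$CONTAINER/$name" -H "X-Auth-Token: $TOKEN")
    if [ "$code" = "404" ]; then
        echo "missing: $name"
    fi
done
```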

Greg Smethells (gsmethells) wrote :

Larger containers, such as container 7, show a similar pattern:

Searching for missing objects within the 225361 pseudo dirs in stream 7
.
1.2.124.113532.80.22177.543.20150128.110134.239093179/ - missing 3 out of 541 objects
...
1.2.826.0.1.3680043.9.1692.4.0.1703.1471534627.23231/ - missing 1 out of 25 objects
.....
2.16.124.113531.4.5.130114807185930000.1190778084/ - missing 2 out of 170 objects
................
1.2.840.113837.3740465092.1451569073.0/ - missing 1 out of 251 objects
.....
1.2.124.113532.80.22177.543.20151217.121244.99810360/ - missing 4 out of 242 objects
...
1.2.840.113619.2.222.2025.1773246.844.1367845472.509/ - missing 2 out of 78 objects
...
1.2.392.200036.9125.2.2616017352170.64610568879.230893/ - missing 1 out of 8 objects
..
1.2.124.113532.80.22177.543.20161005.121742.135725280/ - missing 1 out of 226 objects
...........
1.2.840.113711.41813.2.41476.437484696.26.2116281012.1870/ - missing 1 out of 117 objects
..
1.2.840.113619.2.222.2025.1773246.22374.1259938340.619/ - missing 1 out of 182 objects
............
1.2.392.200036.9125.2.26160195117127.64630958155.2840160/ - missing 1 out of 1 objects
.
1.2.392.200036.9123.100.12.11.21007.201405073100320/ - missing 1 out of 165 objects
.
2.16.124.113531.4.5.129212647947810000.2356597374/ - missing 3 out of 177 objects
.........
2.16.124.113531.4.5.130032534539060000.383710149/ - missing 4 out of 192 objects
....
1.2.840.113619.2.222.2025.1773246.32721.1316007717.779/ - missing 1 out of 158 objects
...................
1.2.840.113619.2.222.2025.1773246.24047.1379334966.827/ - missing 3 out of 131 objects
....
1.2.840.113619.2.222.2025.1773246.24394.1369227386.914/ - missing 1 out of 166 objects
.
1.2.124.113532.80.22177.543.20140421.123635.196391627/ - missing 6 out of 381 objects
........
1.2.124.113532.80.22177.543.20150526.104809.334781399/ - missing 6 out of 407 objects
.........
1.2.392.200036.9116.2.5.1.37.2421400870.1392161787.88058/ - missing 6 out of 460 objects
..
1.2.840.113619.2.222.2025.1773246.17446.1328884505.698/ - missing 1 out of 174 objects
............
1.2.840.113619.2.222.2025.1773246.1800.1423055838.703/ - missing 2 out of 152 objects
.
1.2.392.200036.9125.2.261601729513.64595879367.3124136/ - missing 1 out of 1 objects
.
1.2.124.113532.80.22177.543.20160129.114305.208579858/ - missing 1 out of 187 objects
.......
1.2.124.113532.80.22177.543.20140411.225.174694817/ - missing 5 out of 432 objects

et cetera ...

Greg Smethells (gsmethells) wrote :

There appears to be no rhyme or reason to which data goes missing. Our data is read-only once stored, and we have ruled out local client applications as the root cause, leaving only Swift.

Christian Schwede (cschwede) wrote :

Did you do any rebalance recently? Are all *.ring.gz files identical on all nodes? Same for
swift_hash_path_prefix and swift_hash_path_suffix in /etc/swift/swift.conf - are these unchanged and identical on all nodes?

Can you check whether some of the missing files are still on disk? For example:

swift-get-nodes /etc/swift/object.ring.gz AUTH_75673124ca7f42968e28bc264ed32331 1 1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm

That should return lines similar to the following, which you can use to check whether the *.data files exist:

ssh 192.168.24.1 "ls -lah ${DEVICE:-/srv/node*}/1/objects/535/4ba/85f17dbde7c0bb8545be8061e4b714ba"

Greg Smethells (gsmethells) wrote :

We added 7 drives to each of the 3 nodes early last week.

The *.ring.gz files were re-created and copied to all 3 nodes. They are identical on all 3 nodes. We did a rebalance at that time.

The swift_hash_path_prefix and swift_hash_path_suffix in /etc/swift/swift.conf are unchanged and identical on all nodes.

Here is the swift-recon output.

[root@swift-r1p1 ~]# swift-recon --all
===============================================================================
--> Starting reconnaissance on 3 hosts
===============================================================================
[2017-06-26 11:41:04] Checking async pendings
[async_pending] - No hosts returned valid data.
===============================================================================
[2017-06-26 11:41:04] Checking on replication
[replication_time] low: 34, high: 36, avg: 35.2, total: 105, Failed: 0.0%, no_result: 0, reported: 3
Oldest completion was 2017-06-21 23:15:48 (4 days ago) by 192.168.10.236:6000.
Most recent completion was 2017-06-21 23:16:23 (4 days ago) by 10.11.12.76:6000.
===============================================================================
[2017-06-26 11:41:04] Checking auditor stats
[ALL_audit_time_last_path] low: 400766, high: 404103, avg: 402302.9, total: 1206908, Failed: 0.0%, no_result: 0, reported: 3
[ALL_quarantined_last_path] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3
[ALL_errors_last_path] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3
[ALL_passes_last_path] low: 47289, high: 48313, avg: 47631.3, total: 142894, Failed: 0.0%, no_result: 0, reported: 3
[ALL_bytes_processed_last_path] low: 34480426331, high: 35853409281, avg: 35055774061.7, total: 105167322185, Failed: 0.0%, no_result: 0, reported: 3
[ZBF_audit_time_last_path] low: 391647, high: 396734, avg: 394729.9, total: 1184189, Failed: 0.0%, no_result: 0, reported: 3
[ZBF_quarantined_last_path] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3
[ZBF_errors_last_path] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3
[ZBF_bytes_processed_last_path] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3
===============================================================================
[2017-06-26 11:41:04] Checking updater times
[updater_last_sweep] low: 9, high: 32, avg: 17.9, total: 53, Failed: 0.0%, no_result: 0, reported: 3
===============================================================================
[2017-06-26 11:41:04] Checking on expirers
[object_expiration_pass] - No hosts returned valid data.
[expired_last_pass] - No hosts returned valid data.
===============================================================================
[2017-06-26 11:41:04] Getting unmounted drives from 3 hosts...
===============================================================================
[2017-06-26 11:41:04] Checking load averages
[5m_load_avg] low: 6, high: 7, avg: 6.8, total: 20, Failed: 0.0%, no_result: 0, reported: 3
[15m_load_avg] low: 6, high: 7, avg: 7.1, total: 21, Failed: 0.0%, no_result: 0, reported: 3
[1m_load_avg] low: 5, high: 6, avg: 5.8, total: 17, Failed: 0.0%, n...


Greg Smethells (gsmethells) wrote :

Here is the swift-get-nodes output for both files in container 1.

[root@swift-r1z1n1 ~]# swift-get-nodes /etc/swift/object.ring.gz AUTH_75673124ca7f42968e28bc264ed32331 1 1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm

Account AUTH_75673124ca7f42968e28bc264ed32331
Container 1
Object 1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm

Partition 506
Hash 7eb259c3a61b11add960a8470849b490

Server:Port Device 10.11.12.78:6000 r1z3n1-d23
Server:Port Device 10.11.12.76:6000 r1z1n1-d7
Server:Port Device 192.168.10.236:6000 r1z2n1-d25
Server:Port Device 10.11.12.76:6000 r1z1n1-d28 [Handoff]
Server:Port Device 10.11.12.78:6000 r1z3n1-d27 [Handoff]
Server:Port Device 192.168.10.236:6000 r1z2n1-d3 [Handoff]

curl -I -XHEAD "http://10.11.12.78:6000/r1z3n1-d23/506/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm"
curl -I -XHEAD "http://10.11.12.76:6000/r1z1n1-d7/506/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm"
curl -I -XHEAD "http://192.168.10.236:6000/r1z2n1-d25/506/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm"
curl -I -XHEAD "http://10.11.12.76:6000/r1z1n1-d28/506/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm" # [Handoff]
curl -I -XHEAD "http://10.11.12.78:6000/r1z3n1-d27/506/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm" # [Handoff]
curl -I -XHEAD "http://192.168.10.236:6000/r1z2n1-d3/506/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm" # [Handoff]

Use your own device location of servers:
such as "export DEVICE=/srv/node"
ssh 10.11.12.78 "ls -lah ${DEVICE:-/srv/node*}/r1z3n1-d23/objects/506/490/7eb259c3a61b11add960a8470849b490"
ssh 10.11.12.76 "ls -lah ${DEVICE:-/srv/node*}/r1z1n1-d7/objects/506/490/7eb259c3a61b11add960a8470849b490"
ssh 192.168.10.236 "ls -lah ${DEVICE:-/srv/node*}/r1z2n1-d25/objects/506/490/7eb259c3a61b11add960a8470849b490"
ssh 10.11.12.76 "ls -lah ${DEVICE:-/srv/node*}/r1z1n1-d28/objects/506/490/7eb259c3a61b11add960a8470849b490" # [Handoff]
ssh 10.11.12.78 "ls -lah ${DEVICE:-/srv/node*}/r1z3n1-d27/objects/506/490/7eb259c3a61b11add960a8470849b490" # [Handoff]
ssh 192.168.10.236 "ls -lah ${DEVICE:-/srv/node*}/r1z2n1-d3/objects/506/490/7eb259c3a61b11add960a8470849b490" # [Handoff]

note: `/srv/node*` is used as default value of `devices`, the real value is set in the config file on each storage node.
[root@swift-r1z1n1 ~]# swift-get-nodes /etc/swift/object.ring.gz AUTH_75673124ca7f42968e28bc264ed32331 1 1.2.840.114204.2.2.4.1.24339541...


Greg Smethells (gsmethells) wrote :

Here is the curl output for those two. The first is there, so we only do the one check. The last three curl command lines are all the places it ought to be, but it is not:

[root@swift-r1z1n1 ~]# curl -I -XHEAD "http://10.11.12.78:6000/r1z3n1-d23/506/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm"
HTTP/1.1 200 OK
Content-Length: 81282
X-Backend-Timestamp: 1461540671.59374
Last-Modified: Sun, 24 Apr 2016 23:31:12 GMT
Etag: "dcf6d30625586dc8c577231cecb410c6"
X-Timestamp: 1461540671.59374
Content-Type: application/dicom
Date: Mon, 26 Jun 2017 16:45:33 GMT

[root@swift-r1z1n1 ~]# curl -I -XHEAD "http://10.11.12.76:6000/r1z1n1-d29/461/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.199754063548486.14589405856570000.dcm"
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Mon, 26 Jun 2017 16:45:46 GMT

[root@swift-r1z1n1 ~]# curl -I -XHEAD "http://10.11.12.76:6000/r1z1n1-d7/506/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.193684909484984.14589405855150000.dcm"
HTTP/1.1 200 OK
Content-Length: 81282
X-Backend-Timestamp: 1461540671.59374
Last-Modified: Sun, 24 Apr 2016 23:31:12 GMT
Etag: "dcf6d30625586dc8c577231cecb410c6"
X-Timestamp: 1461540671.59374
Content-Type: application/dicom
Date: Mon, 26 Jun 2017 16:46:01 GMT

[root@swift-r1z1n1 ~]# curl -I -XHEAD "http://192.168.10.236:6000/r1z2n1-d25/461/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.199754063548486.14589405856570000.dcm"
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Mon, 26 Jun 2017 16:46:14 GMT

[root@swift-r1z1n1 ~]#

Greg Smethells (gsmethells) wrote :

Rather: the first curl is for the file that is fine (200 OK).

The last three are for the file that is missing, and it is missing from all three replicas.

Greg Smethells (gsmethells) wrote :

Let me know if you need anything else.

Greg Smethells (gsmethells) wrote :

[root@swift-r1z1n1 ~]# ll /srv/node/r1z1n1-d7/objects/506/490/7eb259c3a61b11add960a8470849b490
total 84
-rw------- 1 swift swift 81282 Dec 8 2016 1461540671.59374.data

Greg Smethells (gsmethells) wrote :

The missing one, however, really is not there on disk:

[root@swift-r1z1n1 ~]# ll /srv/node/r1z1n1-d29/objects/461/bb4/737feb8aac1d14a84646f82d670ccbb4
ls: cannot access /srv/node/r1z1n1-d29/objects/461/bb4/737feb8aac1d14a84646f82d670ccbb4: No such file or directory
[root@swift-r1z1n1 ~]# ll /srv/node/r1z1n1-d29/objects/461/bb4
total 0
drwxr-xr-x 2 swift swift 42 Jun 26 04:10 7341b6a9f04770647d3b02985aef1bb4
drwxr-xr-x 2 swift swift 42 Jun 24 15:41 7354e6abd5a157574ea62ec7de72abb4
drwxr-xr-x 2 swift swift 42 Jun 20 18:02 735b6a311b0e3e677828c6be87588bb4
drwxr-xr-x 2 swift swift 42 Jun 20 21:30 737986920b62ae74e4f1423c0e8fabb4
[root@swift-r1z1n1 ~]#

Greg Smethells (gsmethells) wrote :

I think swift-recon covers this, but here is another look:

[root@swift-r1z1n1 ~]# md5sum /etc/swift/*.ring.gz
7623d8ca43cf2254baa762ac30c18836 /etc/swift/account.ring.gz
d1d15966e1059543846bf1583e63ab02 /etc/swift/container.ring.gz
e899b36c344236c4fcf3c290a0e836d8 /etc/swift/object.ring.gz
[root@swift-r1z1n1 ~]# logout
Connection to swift-r1z1n1 closed.
[root@swift-r1p1 ~]# ssh swift-r1z2n1
Last login: Mon Jun 26 11:50:42 2017 from swift-r1p1

[root@swift-r1z2n1 ~]# md5sum /etc/swift/*.ring.gz
7623d8ca43cf2254baa762ac30c18836 /etc/swift/account.ring.gz
d1d15966e1059543846bf1583e63ab02 /etc/swift/container.ring.gz
e899b36c344236c4fcf3c290a0e836d8 /etc/swift/object.ring.gz
[root@swift-r1z2n1 ~]# logout
Connection to swift-r1z2n1 closed.
[root@swift-r1p1 ~]# ssh swift-r1z3n1
Last login: Mon Jun 26 11:42:32 2017 from swift-r1p1
[root@swift-r1z3n1 ~]# md5sum /etc/swift/*.ring.gz
7623d8ca43cf2254baa762ac30c18836 /etc/swift/account.ring.gz
d1d15966e1059543846bf1583e63ab02 /etc/swift/container.ring.gz
e899b36c344236c4fcf3c290a0e836d8 /etc/swift/object.ring.gz

[root@swift-r1z3n1 ~]# grep swift_hash_path /etc/swift/swift.conf
# swift_hash_path_suffix and swift_hash_path_prefix are used as part of the
swift_hash_path_suffix = ca34a4c3aac7e6c8291e
swift_hash_path_prefix = 3856ea011469b0c93d4a
[root@swift-r1z3n1 ~]# logout
Connection to swift-r1z3n1 closed.
[root@swift-r1p1 ~]# ssh swift-r1z2n1
Last login: Mon Jun 26 12:06:34 2017 from swift-r1p1

[root@swift-r1z2n1 ~]# grep swift_hash_path /etc/swift/swift.conf
# swift_hash_path_suffix and swift_hash_path_prefix are used as part of the
swift_hash_path_suffix = ca34a4c3aac7e6c8291e
swift_hash_path_prefix = 3856ea011469b0c93d4a
[root@swift-r1z2n1 ~]# logout
Connection to swift-r1z2n1 closed.
[root@swift-r1p1 ~]# ssh swift-r1z1n1
Last login: Mon Jun 26 11:59:18 2017 from swift-r1p1
[root@swift-r1z1n1 ~]# grep swift_hash_path /etc/swift/swift.conf
# swift_hash_path_suffix and swift_hash_path_prefix are used as part of the
swift_hash_path_suffix = ca34a4c3aac7e6c8291e
swift_hash_path_prefix = 3856ea011469b0c93d4a
[root@swift-r1z1n1 ~]#

Greg Smethells (gsmethells) wrote :

For the record, according to the output above:

The filename matching ".*2.1.1936.*.dcm" exists.

The filename matching ".*2.1.1997.*.dcm" does NOT exist.

Is there a bin utility that can be used to find or detect orphans (an audit tool)? I do not see swift-recon flagging the problem we are experiencing, which seems odd to me since it appears to be doing auditing.

Greg Smethells (gsmethells) wrote :

Comment #9 above has the wrong curl command line for the 2.1997 file, in that it does not cover 10.11.12.78, so here is the corrected one:

[root@swift-r1p1 ~]# curl -I -XHEAD "http://10.11.12.78:6000/r1z3n1-d24/461/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.199754063548486.14589405856570000.dcm"
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Mon, 26 Jun 2017 17:18:17 GMT

clayg (clay-gerrard) wrote :

Can you turn your proxies' request_node_count [3] up to >= 84 (3 nodes * 28 devices) and check again for the 404'd objects?

Do you have any dispersion populate/report data you can query [1]?

Can you describe your device failure/replacement procedure [2]?

Can you check the quarantined datadirs on the hosts? The recon output shows they're empty - that could be a misconfiguration throwing off reporting - unless... how have you been handling corrupted data?

Since this issue is troubleshooting a particular deployment to attempt to locate data or discover underlying bug(s) - it might be useful to move this to #openstack-swift on freenode for more responsive feedback - and then take the results of the investigation to create issue(s) for whatever gaps are uncovered.

e.g. this may turn out to be a duplicate of lp bug #1619408

1. as covered in the admin guide -
 https://docs.openstack.org/developer/swift/admin_guide.html#dispersion-report
2. https://docs.openstack.org/developer/swift/admin_guide.html#handling-drive-failure
3. https://docs.openstack.org/developer/swift/deployment_guide.html#proxy-server-configuration
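
For the quarantine question above, a quick way to look at the quarantine directories directly on disk is sketched below, assuming the default /srv/node mount point used elsewhere in this report; an empty or absent quarantined/ directory means nothing has been quarantined on that device.

```
# Count quarantined object entries per device on a storage node.
# "objects*" also catches per-policy dirs such as objects-1, if any exist.
for d in /srv/node/*/quarantined/objects*; do
    [ -d "$d" ] && echo "$d: $(ls "$d" | wc -l) entries"
done
```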

Changed in swift:
status: New → Incomplete
Greg Smethells (gsmethells) wrote :

These are definitely not ghost listings since we do (in some cases) have the original files in another file system (removable drives, etc) outside of Swift. In addition, these files in Swift should not have been deleted. So, I do not think bug #1619408 applies here.

I will have a sys admin of the system work with someone on #openstack-swift as soon as possible.

They will also update with answers to the above questions.

Greg Smethells (gsmethells) wrote :

clayg, please look for a ghebda (Greg Hebda) on #openstack-swift if you can continue helping this investigation for us. Thanks.

Greg Smethells (gsmethells) wrote :

As of this morning, we have updated our Swift nodes from version 2.4.0 to 2.10.2. We are unsure if this will help mitigate the issue, but we believe we ought to be on a version listed as "stable" on https://wiki.openstack.org/wiki/Swift/version_map, which 2.4.0 is not.

Greg Smethells (gsmethells) wrote :

I may be wrong in comment #18 if I misunderstand ghost listings. I am assuming them to be due to a delete operation. If they are instead due to a create operation that fails, then perhaps this is a dupe of that. I think we'd need more evidence to corroborate it first, though.

John Dickinson (notmyname) wrote :

Greg, based on your comment in #20, I realized that there is some confusion around "stable" releases versus the others. I've updated the version map page to hopefully make it clearer.

What was called "stable" I've renamed to "backports for distros". The word "stable" comes from the OpenStack system of managing branches for older releases. We, the Swift team, release production-ready releases more frequently than the six-month OpenStack cycle. Like most OpenStack repositories, Swift follows the "cycle with intermediary" system, meaning that while we do a release every six months to align with the OpenStack cycle, we also release more frequently.

The reason we have the "backports for distros" (née "stable") releases is so that distros like Debian, Red Hat, and Ubuntu can update their packages with critical patches even if their policy doesn't allow them to consume more recent releases of the project.

If you are not consuming distro-provided packages, there is never a circumstance where I'd recommend using (e.g.) 2.10.2 instead of the most recent 2.14.0 release.

Greg (ghebda) wrote :

In response to comment #17:

Do you have any dispersion populate/report data you can query [1]?

# swift-dispersion-populate
Created 10 containers for dispersion reporting, 1s, 0 retries
Created 10 objects for dispersion reporting, 2s, 0 retries

# swift-dispersion-report
Queried 11 containers for dispersion reporting, 1s, 0 retries
100.00% of container copies found (33 of 33)
Sample represents 1.07% of the container partition space
Queried 10 objects for dispersion reporting, 2s, 0 retries
There were 10 partitions missing 0 copy.
100.00% of object copies found (30 of 30)
Sample represents 0.98% of the object partition space

Can you describe your device failure/replacement procedure [2]?
For a drive being replaced immediately with a drive of the same size:
We have only had 1 failed drive so far, and the following procedure was used:
1. make sure it's unmounted
2. remove from ring and rebalance
3. add new drive, format, mount in same place, and change permissions
4. add to ring, and rebalance
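
For reference, the ring side of steps 2 and 4 typically boils down to a few swift-ring-builder calls. This is only a sketch: the device id, IP, port, device name, and weight are placeholders, and the regenerated *.ring.gz must be copied back out to every node after each rebalance.

```
# Step 2: drop the failed device (find its id in the output of
# "swift-ring-builder object.builder"), then rebalance.
swift-ring-builder object.builder remove d42
swift-ring-builder object.builder rebalance

# Step 4: add the formatted, mounted replacement and rebalance again.
# Add syntax: r<region>z<zone>-<ip>:<port>/<device> <weight>
swift-ring-builder object.builder add r1z1-10.11.12.76:6000/r1z1n1-d99 100
swift-ring-builder object.builder rebalance
# Note: rebalance may defer moving partitions until min_part_hours has elapsed.
```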

For quarantine info:
[root@swift-r1p1 ~]# swift-recon --all | grep quarantined
[ALL_quarantined_last_path] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3
[ZBF_quarantined_last_path] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3
[quarantined_objects] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3
[quarantined_accounts] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3
[quarantined_containers] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 3

For this one: Can you turn your proxies request_node_count [3] >= 84 (3 nodes * 28 devices) and check again for the 404'd objects.

What exactly should that line look like? If we have 85 devices, should it read:
request_node_count = 85

We have not changed that yet, because we don't want to pile on a misconfiguration if we get it wrong.
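
For what it's worth, request_node_count lives in the proxy's [app:proxy-server] section. A minimal sketch of the relevant stanza is below; the file path and surrounding options reflect a default install, and the values are only examples of the accepted forms, not a recommendation for this cluster.

```
# /etc/swift/proxy-server.conf
[app:proxy-server]
use = egg:swift#proxy
# How many nodes (primaries plus handoffs) the proxy will try when looking
# for data. Accepts either an absolute count or a multiple of the replica
# count, e.g.:
request_node_count = 84
# request_node_count = 28 * replicas
```

With 3 replicas, both forms above evaluate to 84; the proxy service needs a reload/restart to pick up the change.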

Greg Smethells (gsmethells) wrote :

John, we are using distro packages:

```
[root@swift-r1p1 ~]# cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)
```

Thanks for the clarification concerning the version map.

Greg Smethells (gsmethells) wrote :

The previous comment showed the distro for the proxy. These are for the three Swift storage nodes:

[root@swift-r1z1n1 ~]# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

[root@swift-r1z2n1 ~]# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

[root@swift-r1z3n1 ~]# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

Pete Zaitcev (zaitcev) wrote :

You're talking about the base OS packages, but John meant Swift in comment #22. Only you know where you are obtaining it. AFAIK, CentOS itself does not publish Swift. Most of the time, it comes from RDO. But some people just run pip to get a stable branch (on top of CentOS, yes). In your original report you are aware that you run Swift 2.4.0, so you should also know where that comes from.

Greg Smethells (gsmethells) wrote :

Appears to be from here (yum output):

openstack-swift-account.noarch 2.10.1-1.el7 @centos-openstack-newton
openstack-swift-container.noarch 2.10.1-1.el7 @centos-openstack-newton
openstack-swift-object.noarch 2.10.1-1.el7 @centos-openstack-newton

clayg (clay-gerrard)
summary: - Objects can become orphaned in Swift 2.4.0
+ Slow replication with lack of visibility can cause availability problems
Changed in swift:
status: Incomplete → Confirmed
clayg (clay-gerrard) wrote :

Maybe we need recon-cron plumbing for part counting?

https://gist.github.com/clayg/4261e7dc654cc2c80a529b741a7cdd5f
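
A much cruder approximation of that gist, using only stock CLI tools (and slow on large partition counts), is sketched below. It assumes the default /srv/node mount point and the object ring path seen elsewhere in this report, and simply asks swift-get-nodes, for each on-disk partition, whether the local device is one of that partition's primaries.

```
#!/bin/bash
# Rough sketch: per device on this node, count partitions held as a primary
# vs. partitions held only as a handoff (data still waiting to be replicated
# back to its primary locations).
RING=/etc/swift/object.ring.gz
NODE_ROOT=/srv/node

for devpath in "$NODE_ROOT"/*; do
    dev=$(basename "$devpath")
    primary=0; handoff=0
    for partpath in "$devpath"/objects/*/; do
        part=$(basename "$partpath")
        [[ "$part" =~ ^[0-9]+$ ]] || continue    # only numeric partition dirs
        # Primary assignments appear as "Server:Port Device" lines without
        # the "[Handoff]" marker.
        if swift-get-nodes "$RING" -p "$part" 2>/dev/null |
             grep 'Server:Port Device' | grep -v Handoff | grep -qw "$dev"; then
            primary=$((primary + 1))
        else
            handoff=$((handoff + 1))
        fi
    done
    echo "$dev: $primary primary partition(s), $handoff handoff partition(s)"
done
```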

summary: Slow replication with lack of visibility can cause availability problems
+ (esp. in multi-region)
Greg Smethells (gsmethells) wrote :

Regarding comment #28, that is kind of what I was attempting to get at in comment #15 above. Some additional help from the system would be nice. It appears to me that the system can currently have "the wheels almost coming off", yet nothing in swift-recon will indicate there is anything amiss. Unless there is another utility that would catch this situation, I would think Swift ought to provide something above and beyond what is in Clay's script. I love that there is a script, but it feels like the reason it exists is that there is something lacking in Swift itself. IMHO, it is time to address that and entertain what Clay is suggesting in comment #28. Just my 2 cents.

clayg (clay-gerrard) wrote :

yes, the recommended existing tooling for alerting about this specific problem (unavailability) is swift-dispersion-report:

https://docs.openstack.org/developer/swift/admin_guide.html#geographically-distributed-swift-considerations

That might not be documented sufficiently; it came up at some point on the ML too:

http://lists.openstack.org/pipermail/openstack/2017-January/018327.html

The specifics of the multi-region issue are described here:

https://docs.openstack.org/developer/swift/admin_guide.html#geographically-distributed-swift-considerations

Some general ideas about what you might *do* when the above monitoring techniques indicate there is an issue are described here:

https://docs.openstack.org/developer/swift/admin_guide.html#object-replicator

Some of the material here covers some of the mechanics of rebalance and might be useful:

https://www.youtube.com/watch?v=ger20cqOypE

Further enhancements to recon data might be the best implementable idea we have - so maybe this bug is just that feature. Always room to improve!
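
As one concrete way to wire swift-dispersion-report into alerting, a periodic check along these lines could run from cron; the grep is keyed to the report's output format shown earlier in this bug, and where the alert goes is left to site-specific tooling.

```
#!/bin/bash
# Sketch: flag anything other than 100% of object copies found, e.g. the
# report line "100.00% of object copies found (30 of 30)".
line=$(swift-dispersion-report 2>/dev/null | grep 'of object copies found')
if ! echo "$line" | grep -q '^100\.00%'; then
    echo "swift object dispersion below 100%: $line" >&2
    # hook site-specific alerting here (mail, monitoring check result, etc.)
    exit 1
fi
```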

Greg Smethells (gsmethells) wrote :

clayg, are we correct to assume that in the output of

https://gist.github.com/clayg/4261e7dc654cc2c80a529b741a7cdd5f

the "handoff" and "misplaced" columns should ideally be 0 (or nearly so)? This is our current standard for dispersion visibility. If that assumption is correct, then we shall monitor using that methodology on a periodic basis going forward.

Greg Smethells (gsmethells) wrote :

clayg, also, is it possible that the script

https://gist.github.com/clayg/4261e7dc654cc2c80a529b741a7cdd5f

could be improved not only to identify the issue we have been seeing, but also to perform the correct operations in response to what it finds? Meaning, can it do the operations described here:

https://docs.openstack.org/developer/swift/admin_guide.html#object-replicator

based upon some default metric or a command-line-defined metric? If so, how big of a script improvement to

https://gist.github.com/clayg/4261e7dc654cc2c80a529b741a7cdd5f

would that be?
