racy problem on fetching swift rings from swift-proxy
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Swift Proxy Charm | Fix Released | High | Edward Hope-Morley |
OpenStack Swift Storage Charm | Fix Released | High | Edward Hope-Morley |
Bug Description
I'm re-deploying swift-storage onto a node that it was previously installed on and finding that swift-storage-
This is openstack charms version 18.02 on trusty.
Scenario and possible reproducer:
I have 3 swift-proxy units, let's call them p/0, p/1, p/2, and 12 swift-storage units: z1/0, z1/1, z1/2, z1/3, and the same four units for z2 and z3.
I manually took the swift-storage-z1/3 disks out of the rings on the swift-proxy leader (let's say p/2) and then changed min-hours on swift-proxy from 0 to 1 to trigger a charm managed rebalance.
I then stopped the swift-* and rsync services on the node to keep it from participating in any rebalancing efforts.
I then ran juju remove-unit z1/3, let it complete, then juju add-unit z1 --to <same metal>.
At this point, it installed, and the rings were updated by swift-proxy and started being distributed to all swift-proxy and swift-storage nodes other than this new unit, now z1/4. On z1/4, I got a hook failed on swift-storage-
My supposition is that either swift-proxy after handshaking new keys with the leader needs to drop the rings into each proxy unit web server directory, or swift-storage needs to only try to pull rings from an is-leader=true unit.
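To illustrate the second option in the supposition above, here is a minimal sketch of a storage-side fetch that skips any proxy unit not yet serving rings instead of failing the hook on the first 404. It is not the charm's actual code: `fetch_rings`, `RingsUnavailable`, and the injected `fetch` callable are hypothetical names, and the only thing taken from Swift itself is the standard ring file names.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Standard Swift ring file names.
RING_FILES = ("account.ring.gz", "container.ring.gz", "object.ring.gz")


class RingsUnavailable(Exception):
    """Raised when no proxy unit serves a complete set of rings."""


def fetch_rings(
    proxy_urls: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
) -> Tuple[str, Dict[str, bytes]]:
    """Return (base_url, rings) from the first proxy serving all ring files.

    `fetch` is a hypothetical callable that returns the response body,
    or None when the proxy answers with an error such as 404 Not Found.
    """
    for base in proxy_urls:
        rings: Dict[str, bytes] = {}
        for name in RING_FILES:
            data = fetch(base.rstrip("/") + "/" + name)
            if data is None:
                # This proxy has no rings yet; try the next unit.
                break
            rings[name] = data
        else:
            # Inner loop finished without a break: all rings were fetched.
            return base, rings
    raise RingsUnavailable("no proxy unit served all ring files")
```

Injecting `fetch` keeps the selection logic separate from the transport, so the same function could be driven by wget, urllib, or a test double. Only accepting a proxy that serves every ring file avoids mixing rings from two different rebalances.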
In the log below, 10.101.140.110 is p/1 and swift-storage-z1-23 is z1/4:
--2018-04-18 21:00:28-- http://
Connecting to 10.101.
HTTP request sent, awaiting response... 404 Not Found
2018-04-18 21:00:28 ERROR 404: Not Found.
--2018-04-18 21:00:30-- http://
Connecting to 10.101.
HTTP request sent, awaiting response... 404 Not Found
2018-04-18 21:00:30 ERROR 404: Not Found.
--2018-04-18 21:00:34-- http://
Connecting to 10.101.
HTTP request sent, awaiting response... 404 Not Found
2018-04-18 21:00:34 ERROR 404: Not Found.
--2018-04-18 21:00:40-- http://
Connecting to 10.101.
HTTP request sent, awaiting response... 404 Not Found
2018-04-18 21:00:40 ERROR 404: Not Found.
Traceback (most recent call last):
File "hooks/
main()
File "hooks/
hooks.
File "/var/lib/
self.
File "/var/lib/
restart_
File "/var/lib/
r = lambda_f()
File "/var/lib/
(lambda: f(*args, **kwargs)), restart_map, stopstart,
File "hooks/
fetch_
File "/var/lib/
return f(*args, **kwargs)
File "/var/lib/
check_call(cmd)
File "/usr/lib/
raise CalledProcessEr
subprocess.
tags: added: stable-backport
Changed in charm-swift-proxy: status: Fix Committed → Fix Released
Changed in charm-swift-storage: status: Fix Committed → Fix Released
Oddly, querying relation-get on the 3 proxy units suggests that p/0 shared only private-address, while both p/1 and p/2 are advertising all of the relation info, including rings_url, rsync_allowed_hosts, swift_hash, timestamp, and trigger, and the triggers don't match.
Perhaps this is stuck relation info from a prior swift-proxy ring-balance disconnect.
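The mismatch described above can be checked mechanically once the relation-get output of each proxy unit has been collected into a dictionary. The sketch below assumes that has already been done. The function names are hypothetical and the sample values in the usage note are invented for illustration; only the key name `trigger` comes from the relation data listed above.

```python
from typing import Dict, List


def group_units_by_trigger(
    relation_data: Dict[str, Dict[str, str]],
) -> Dict[str, List[str]]:
    """Group proxy units by the trigger value they advertise.

    Units that advertise no trigger (for example, only private-address)
    are left out of the result.
    """
    by_trigger: Dict[str, List[str]] = {}
    for unit, settings in sorted(relation_data.items()):
        trigger = settings.get("trigger")
        if trigger is not None:
            by_trigger.setdefault(trigger, []).append(unit)
    return by_trigger


def triggers_disagree(relation_data: Dict[str, Dict[str, str]]) -> bool:
    """True when at least two units advertise different triggers."""
    return len(group_units_by_trigger(relation_data)) > 1
```

For example, with `{"p/0": {"private-address": "10.0.0.1"}, "p/1": {"trigger": "aaa"}, "p/2": {"trigger": "bbb"}}` (made-up values), `triggers_disagree` returns `True`, which would be consistent with stale relation data left on one of the units.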