Adding metadata with POST after creating an object returns 404 (multi-DC) after upgrading from Kilo to Ussuri
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | New | Undecided | Unassigned |
Bug Description
Hello,
We are encountering a change of behavior in our Swift clusters while trying to upgrade from the Kilo to the Ussuri version.
Below are some explanations of the problem/bug we are seeing; we would appreciate your help/input:
When adding the metadata 'x-delete-at: 30' to a newly created object, we expect a 2XX return code, as we were seeing with the Kilo version.
However, with Ussuri it now returns a 404 Not Found, despite the header being successfully added to the object. We see a successful POST with a 202 return code on the local DC, but 404 on the other DCs because the object has not yet been replicated there, so the client ends up receiving a 404 for the POST request. Fetching the object's headers afterwards shows that the x-delete-at header has indeed been added.
This behavior only happens in our multi-datacenter Swift environment, not in our second environment, which has a single datacenter. We are also using the region feature, with 3 regions inside our cluster.
After some digging and testing, we found the following article: https:/
Our cluster is currently running the Kilo version and we were upgrading to Ussuri when we ran into this issue during the upgrade process.
Since the post_as_copy option now defaults to false, and since all post_as_copy related code and configuration has been removed, there is a regression in the behavior of adding metadata headers to a newly created object in a multi-datacenter OpenStack Swift deployment.
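For reference, a rough sketch of how that option used to be set in proxy-server.conf on older releases (the snippet is illustrative, not copied from our configuration):

[app:proxy-server]
use = egg:swift#proxy
# Kilo behaved as if this were true; the default changed to false in Ocata and
# the option has since been removed entirely, leaving only the fast-POST path.
object_post_as_copy = true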
We have a test script that performs the following tasks in a row (a sketch of it is included after this list):
- PUT: create an object
- GET: read that object back
- POST: add an x-delete-at header
- HEAD: read the headers back
- GET: check the object to ensure it has been correctly expired/deleted
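Here is a rough sketch of that test sequence using python-swiftclient (the endpoint, credentials, container and object names are placeholders, not our real ones):

#!/usr/bin/env python
# Sketch of the PUT / GET / POST / HEAD / GET test sequence described above.
import time
from swiftclient import client as swift

conn = swift.Connection(authurl='http://proxy.example.com:8080/auth/v1.0',
                        user='test:tester', key='testing')

container, obj = 'test-container', 'test-object'
conn.put_container(container)

# 1. PUT: create the object
conn.put_object(container, obj, contents=b'payload')

# 2. GET: read it back
headers, body = conn.get_object(container, obj)

# 3. POST: add an X-Delete-At header (expire ~30 seconds from now)
expire_at = str(int(time.time()) + 30)
conn.post_object(container, obj, headers={'X-Delete-At': expire_at})

# 4. HEAD: confirm the header was applied
head = conn.head_object(container, obj)
print('X-Delete-At =', head.get('x-delete-at'))

# 5. GET after the expiry time to confirm the object is gone (expect 404)
time.sleep(35)
try:
    conn.get_object(container, obj)
except swift.ClientException as exc:
    print('GET after expiry returned', exc.http_status)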
Before upgrading the Swift proxy (to Ussuri or anything after Ocata), this test script worked 100% of the time.
After upgrading to the Stein/Ussuri version, where post_as_copy is no longer available and only fast-POST remains, adding metadata fails with a 404, while the header is in fact added, as confirmed with a HEAD.
Adding metadata fails with a 404 as long as the object is not fully replicated across our 3 datacenters.
Once an existing object is fully replicated, adding metadata works 100% of the time again.
What is really confusing is that the POST operation always succeeds even though it is reported as a 404. We found that on the proxy server receiving the POST, the local storage node (same datacenter) returns a 202 success while the 2 others return a 404, which results in a 404 Not Found being returned to the swift client.
Whereas we expect a 20X when adding metadata to a new object.
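To illustrate (a simplified sketch of majority-based status selection, not Swift's actual proxy code): with one replica per region, a fresh object only exists on the local node, so the backend statuses for the POST are [202, 404, 404] and the majority wins:

# Illustrative only: pick the status reported by a majority of the replicas.
from collections import Counter

def pick_status(backend_statuses, replica_count=3):
    status, votes = Counter(backend_statuses).most_common(1)[0]
    if votes >= replica_count // 2 + 1:   # simple majority
        return status
    return 503  # no majority reached

print(pick_status([202, 404, 404]))  # -> 404: what the client currently sees
print(pick_status([202, 202, 404]))  # -> 202: once replication has caught up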
We tried reverting one proxy back to the Kilo version, and we get the previous behavior back when creating an object and adding metadata to it.
From what I understand, with the removal of the post_as_copy code in newer releases, we cannot get the original behavior back.
Are there any workarounds to restore this behavior, or at least to mitigate it?
Our clients using our cluster will need to make a lot of changes to their scripts/projects in order to handle this new behavior.
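As a possible client-side mitigation (a sketch only, assuming python-swiftclient; the helper name is ours), since the header is in fact applied even when the POST reports a 404, a client could confirm with a HEAD before treating the POST as failed:

# Sketch: double-check a 404 response to a metadata POST with a HEAD.
from swiftclient import client as swift

def post_metadata_with_check(conn, container, obj, headers):
    try:
        conn.post_object(container, obj, headers=headers)
        return True
    except swift.ClientException as exc:
        if exc.http_status != 404:
            raise
        # The POST reported 404: check whether the headers were applied anyway.
        head = conn.head_object(container, obj)
        wanted = {k.lower(): str(v) for k, v in headers.items()}
        return all(head.get(k) == v for k, v in wanted.items())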
-------
Below are some logs showing the process on Ussuri, as well as what we are seeing with Kilo in our production environment for the same test operations:
Current preproduction environment (1 proxy - 1 storage node per datacenter)
USSURI
DC 1 proxy01 :
28/Jul/
28/Jul/
28/Jul/
/28/Jul/
DC 1 Storage01 (local)
[28/Jul/
[28/Jul/
[28/Jul/
[28/Jul/
[28/Jul/
[28/Jul/
DC 2 Storage
28/Jul/
DC 3 Storage
28/Jul/
Whereas with the Kilo version, you can see that the POSTs are converted to PUT operations on the storage servers, and they are also replicated to the other storage servers:
KILO (current production environment - 2 proxies - 3 storage nodes per datacenter)
DC 1 proxy01
30/Jul/
30/Jul/
30/Jul/
30/Jul/
DC 1 storage1
[30/Jul/
[30/Jul/
[30/Jul/
[30/Jul/
[30/Jul/
DC 1 storage2
[30/Jul/
[30/Jul/
DC 1 storage3
[30/Jul/
[30/Jul/
[30/Jul/
30/Jul/
[30/Jul/
DC 2 Storage01
[30/Jul/
[30/Jul/
30/Jul/
This is a very detailed bug report, thank you! I think this is describing the same issue as https://bugs.launchpad.net/swift/+bug/1818931
I think we roughly know how the code should work (we fixed a similar issue for DELETE: https://bugs.launchpad.net/swift/+bug/1318375), but I haven't gotten any further than the failing unit test.
I normally discourage the use of write affinity - can you turn it off in your multi-datacenter deployment?
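If write affinity is in use, it would typically be set in the proxy's [app:proxy-server] section; disabling it might look something like this (option names from proxy-server.conf, values illustrative):

[app:proxy-server]
use = egg:swift#proxy
# Write affinity steers new writes to the local region; commenting these out
# (the default is unset) lets writes spread across regions again.
#write_affinity = r1
#write_affinity_node_count = 2 * replicas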