test_list_container_contents_with_prefix failed with: ['tempest-TestObject-1992896707'] != []

Bug #1718525 reported by Ihar Hrachyshka
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

http://logs.openstack.org/50/505750/1/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/d7d55af/logs/testr_results.html.gz

Traceback (most recent call last):
  File "tempest/api/object_storage/test_container_services.py", line 271, in test_list_container_contents_with_prefix
    self.assertEqual([object_name], object_list)
  File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/testtools/testcase.py", line 411, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/testtools/testcase.py", line 498, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: ['tempest-TestObject-1992896707'] != []

Tags: gate-failure
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

This happened on Ocata. I am not an expert in Swift and logs don't reveal anything suspicious to my untrained eye.

tags: added: gate-failure
Revision history for this message
clayg (clay-gerrard) wrote :

Looks like an out-of-order operations race to me...

container-server: 127.0.0.1 - - [20/Sep/2017:18:57:45 +0000] "GET /sdb1/230/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167" 200 185 "GET http://198.72.124.45:8081/v1/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167?&limit=9999&format=json" "tx96ef81ec9fac4a06bb39c-0059c2ba29" "proxy-server 28100" 0.0033 "-" 27659 0
container-server: 127.0.0.1 - - [20/Sep/2017:18:57:45 +0000] "DELETE /sdb1/230/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167/tempest-TestObject-1992896707" 204 - "DELETE http://198.72.124.45:8081/sdb1/93/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167/tempest-TestObject-1992896707" "txeb862930db4749ff92c08-0059c2ba29" "object-server 27632" 0.0006 "-" 27661 0

In this log we don't have fine-grained timestamps, but the front-end GET request from the proxy-server that returned the listing was logged before the back-end DELETE request from the object-server that removed the object record.

I'm not sure what a client should expect to be able to do about this from the front end; we just call that "eventual consistency". The back-end object request is *supposed* to be allowed to offload back-pressure from the container DBs updating their listings to asynchronous updates, which absolutely allows a client to see a successful DELETE followed by a container listing that still shows the object, even without any failure, as seen above.

You can either update the test to repeat the listing after the DELETE operation until it matches the assertion, up to a timeout (hitting the timeout would then indicate some sort of real failure), or alternatively configure the Swift deployment with a very high container_update_timeout:

https://github.com/openstack/swift/blob/master/swift/obj/server.py#L346
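A minimal sketch of the first option (poll the listing until it converges), assuming a hypothetical wait_for_listing() helper and a client exposing a list_container_objects(container) call; the real tempest test would use its own client methods:

import time

def wait_for_listing(client, container, expected, timeout=30, interval=1):
    # Poll the container listing until it equals `expected` or `timeout`
    # seconds elapse; hitting the timeout is treated as a real failure.
    deadline = time.time() + timeout
    while True:
        listing = client.list_container_objects(container)  # hypothetical call
        if listing == expected:
            return listing
        if time.time() >= deadline:
            raise AssertionError(
                'listing %r never converged to %r within %ss'
                % (listing, expected, timeout))
        time.sleep(interval)

The same loop would work for the prefix-listing assertion in the failing test: replace the direct assertEqual with something like wait_for_listing(..., expected=[object_name]).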

Changed in swift:
status: New → Incomplete
Revision history for this message
Tim Burke (1-tim-z) wrote :

Looking at the container-server logs for that, I see:

container-server: 127.0.0.1 - - [20/Sep/2017:18:57:41 +0000] "PUT /sdb1/230/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167" 201 - "PUT http://198.72.124.45:8081/v1/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167" "tx89466f8c01dc438999c73-0059c2ba22" "proxy-server 28100" 3.4021 "-" 27659 0
container-server: 127.0.0.1 - - [20/Sep/2017:18:57:41 +0000] "HEAD /sdb1/230/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167" 204 - "HEAD http://198.72.124.45:8081/v1/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167" "txf8b6cdf966664b769b89f-0059c2ba25" "proxy-server 28100" 0.0021 "-" 27661 0
container-server: 127.0.0.1 - - [20/Sep/2017:18:57:43 +0000] "GET /sdb1/230/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167" 204 - "GET http://198.72.124.45:8081/v1/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167?&prefix=tempest-" "tx1dd7cb7bd4464cf3ad936-0059c2ba27" "proxy-server 28100" 0.0036 "-" 27659 0
container-server: 127.0.0.1 - - [20/Sep/2017:18:57:43 +0000] "PUT /sdb1/230/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167/tempest-TestObject-1992896707" 201 - "PUT http://198.72.124.45:8081/sdb1/93/AUTH_8767ed26eafc44239b72a971f73865fd/tempest-TestContainer-69766167/tempest-TestObject-1992896707" "txf8b6cdf966664b769b89f-0059c2ba25" "object-server 27632" 0.0011 "-" 27661 0

So it seems the container GET came in *before* the update arrived?

Seems weird though, since the object server does the container update before sending back the 201: https://github.com/openstack/swift/blob/2.15.1/swift/obj/server.py#L894-L898

And the object-server logs don't show any "Container update timeout" lines...

I assume the tempest client is synchronous (i.e., create_object waits for a response before returning)... so I don't have an explanation for the observed order of events :-(

Revision history for this message
clayg (clay-gerrard) wrote :

OK, so I was looking in the wrong spot, but the principle still seems solid? These logs seem to show some sign of out-of-order operations, probably because of I/O contention on the container DBs.

I notice the "Container update timeout" log lines are emitted at debug level? The captured logs look like they're mostly INFO and above to me?

http://logs.openstack.org/50/505750/1/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/d7d55af/logs/etc/swift/container-server/1.conf.txt.gz

Changed in swift:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Object Storage (swift) because there has been no activity for 60 days.]

Changed in swift:
status: Incomplete → Expired