traceback on small PUTs with EC

Bug #1490286 reported by paul luse
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: New
Importance: Undecided
Assigned to: paul luse
Milestone: (none)

Bug Description

I ran a full set of mixed-size PUT/GET/DELETE operations over the weekend on our perf cluster running master, and the only issue in the logs appears to be with small (4K and 512B) PUTs. It seems to be intermittent, but it happens quite often. I was able to reproduce the same traceback on a SAIO running master with the ssbench config below. All settings are standard except the EC segment size, which is set to 640K (unclear whether this is related). I have not done any debugging yet (will start Monday).

Here's the trace:
Aug 30 08:03:20 peluse-VirtualBox object-server: ERROR __call__ error with PUT /sdb6/414/AUTH_test/ssbench_000000/zero_021078 :
Traceback (most recent call last):
  File "/home/peluse/swift/swift/obj/server.py", line 938, in __call__
    res = method(req)
  File "/home/peluse/swift/swift/common/utils.py", line 2668, in wrapped
    return func(*a, **kw)
  File "/home/peluse/swift/swift/common/utils.py", line 1208, in _timing_stats
    resp = func(ctrl, *args, **kwargs)
  File "/home/peluse/swift/swift/obj/server.py", line 670, in PUT
    if not self._read_put_commit_message(mime_documents_iter):
  File "/home/peluse/swift/swift/obj/server.py", line 404, in _read_put_commit_message
    commit_hdrs, commit_iter = next(mime_documents_iter)
  File "/home/peluse/swift/swift/obj/server.py", line 63, in iter_mime_headers_and_bodies
    hdrs = HeaderKeyDict(rfc822.Message(file_like, 0))
  File "/usr/lib/python2.7/rfc822.py", line 108, in __init__
    self.readheaders()
  File "/usr/lib/python2.7/rfc822.py", line 155, in readheaders
    line = self.fp.readline()
  File "/home/peluse/swift/swift/common/utils.py", line 3329, in readline
    chunk = self.wsgi_input.read(self.read_chunk_size)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 189, in read
    return self._chunked_read(self.rfile, length)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 179, in _chunked_read
    self.chunk_length = int(rfile.readline().split(b";", 1)[0], 16)
ValueError: invalid literal for int() with base 16: '' (txn: txbe51035c82c34b96a65ca-0055e31b1d)
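The last frame is eventlet parsing the chunk-size line of a chunked request body, and the failure happens while the object server is still waiting for the trailing commit MIME document. In HTTP/1.1 chunked transfer coding, each chunk begins with a hexadecimal size line (optionally followed by `;extensions`). If the sender stops sending mid-body, `readline()` returns an empty byte string and `int(b"", 16)` raises this exact `ValueError`. The following is a minimal, hypothetical decoder (the name `read_chunked` is illustrative; this is not eventlet's code) that uses the same naive parse and reproduces the error on a truncated body:

```python
import io


def read_chunked(rfile):
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Illustrative sketch only: trailers are ignored, and the size-line parse
    mirrors the one in the failing eventlet frame.
    """
    body = b""
    while True:
        size_line = rfile.readline()
        # Drop any chunk extensions and parse the hex size. If the peer has
        # gone away, size_line is b"" and int() raises ValueError.
        chunk_length = int(size_line.split(b";", 1)[0], 16)
        if chunk_length == 0:
            rfile.readline()  # consume the CRLF after the last-chunk line
            return body
        body += rfile.read(chunk_length)
        rfile.readline()  # consume the CRLF that terminates each chunk


complete = io.BytesIO(b"5\r\nhello\r\n0\r\n\r\n")
print(read_chunked(complete))  # b'hello'

truncated = io.BytesIO(b"5\r\nhello\r\n")  # sender vanished before the last chunk
try:
    read_chunked(truncated)
except ValueError as exc:
    print("ValueError:", exc)
```

An empty size line is therefore what a dropped upstream connection looks like to this parser, which fits the "client going away" theory in the comments below.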

Scenario file:
{
  "name": "Small test scenario",
  "sizes": [{
    "name": "zero",
    "size_min": 512,
    "size_max": 512,
    "crud_profile": [50, 50, 0, 0]
  }, {
    "name": "tiny",
    "size_min": 512,
    "size_max": 512,
    "crud_profile": [50, 50, 0, 0]
  }, {
    "name": "small",
    "size_min": 4096,
    "size_max": 4096
  }],
  "initial_files": {
    "zero": 300,
    "tiny": 100,
    "small": 10
  },
  "operation_count": 500,
  "crud_profile": [50, 50, 0, 0],
  "user_count": 4,
  "container_base": "ssbench",
  "container_count": 100,
  "container_concurrency": 10
}

Cmd Line: ssbench-master run-scenario -f /usr/local/share/ssbench/scenarios/very_small.scenario --keep-objects -u 32 -r 1020 --worker 4

swift.conf EC policy definition (note the non-default segment size I was experimenting with):
[storage-policy:2]
default = yes
name = ec42
policy_type = erasure_coding
ec_type = jerasure_rs_vand
ec_num_data_fragments = 4
ec_object_segment_size = 655360
ec_num_parity_fragments = 2
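A quick calculation suggests that the 640K segment size is unlikely to matter for these objects. Both 512 B and 4 KiB bodies are far smaller than one segment, so each is encoded as a single segment whether `ec_object_segment_size` is 655360 (640 KiB) or Swift's default of 1048576 (1 MiB). With 4 data and 2 parity fragments, every PUT still fans out to 6 fragment archives. The helper below (the name `ec_layout` is hypothetical) is a back-of-the-envelope sketch that ignores the per-fragment metadata and padding added by the erasure-coding library, so real fragment sizes are somewhat larger:

```python
import math


def ec_layout(object_size, segment_size=655360, ndata=4, nparity=2):
    """Rough EC layout for one object.

    Assumption: fragment headers and padding are ignored; payload bytes are
    simply divided evenly across the data fragments.
    """
    return {
        # Number of segments the object body is split into before encoding.
        "segments": math.ceil(object_size / segment_size),
        # One fragment archive per data + parity fragment (one per node).
        "fragment_archives": ndata + nparity,
        # Approximate payload bytes carried by each fragment archive.
        "approx_bytes_per_archive": math.ceil(object_size / ndata),
    }


for size in (512, 4096):
    for seg in (655360, 1048576):
        print(size, seg, ec_layout(size, seg))
```

For both test sizes and both segment sizes the result is one segment and six small fragment archives, so the segment-size experiment should not change the request pattern for these PUTs.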

Tags: ec
paul luse (paul-e-luse) wrote:

Hmm... this could be another case of the client going away combined with bad error handling; see https://review.openstack.org/#/c/211338/

paul luse (paul-e-luse) wrote:

Looks that way: there are timeouts on the proxy side. With the patch mentioned above applied, there are no errors, just thousands of retries that succeed (heavy load, small IO).
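The conclusion is that the traceback is a symptom of the proxy abandoning requests under load, not a bug in the object server's MIME parsing. On the "bad error handling" side, a truncated body could also be reported more quietly than a full traceback. The following hypothetical sketch (the `handle_put` name, the 499 status choice, and the return shape are illustrative; this is neither Swift's code nor the patch referenced above) maps a body read that fails with `ValueError` or `EOFError` to a client-disconnect result:

```python
def handle_put(read_body):
    """Hypothetical PUT handler sketch.

    read_body: zero-argument callable that returns the full request body.
    It may raise ValueError (for example, an empty chunk-size line) or
    EOFError when the sender goes away mid-request.
    """
    try:
        body = read_body()
    except (ValueError, EOFError) as exc:
        # Treat a truncated body as a client disconnect: return a one-line
        # 499 result instead of letting a traceback reach the error log.
        return 499, "client disconnected: %s" % exc
    return 201, "stored %d bytes" % len(body)


print(handle_put(lambda: b"x" * 512))  # (201, 'stored 512 bytes')
print(handle_put(lambda: int(b"", 16)))  # simulated truncated chunked body
```

Catching `ValueError` this broadly could mask unrelated bugs. A narrower design would have the body reader raise a dedicated disconnect exception and catch only that.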
