EC reconstructor (ssync_sender) raises exceptions when sending requests to an object-server whose disk is already unmounted

Bug #1466138 reported by Charles Hsu
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: High
Assigned to: clayg

Bug Description

The following disks are unmounted:

192.168.12.14:6003/d59
192.168.12.13:6003/d21
192.168.12.13:6003/d22
192.168.12.14:6003/d57

swift code version: 2.3.0rc2 (commit SHA f8dee761)
Applied patch 1: https://launchpadlibrarian.net/205919648/reconstructor.patch (https://bugs.launchpad.net/swift/+bug/1452553)
Applied patch 2: https://review.openstack.org/#/c/191521/

object-reconstructor: 192.168.12.15:6003/d34/2114 Early disconnect
object-reconstructor: 192.168.12.13:6003/d23/1253 Early disconnect
object-reconstructor: 192.168.12.12:6003/d8/1762 0.5 seconds: connect send
object-reconstructor: 192.168.12.11:6003/d47/1354 0.5 seconds: connect send
object-reconstructor: 192.168.12.13:6003/d20/3276 0.5 seconds: connect send
object-reconstructor: 548/576 (95.14%) partitions reconstructed in 2976.66s (0.18/sec, 2m remaining)
object-reconstructor: 113118 suffixes checked - 0.00% hashed, 99.84% synced
object-reconstructor: Partition times: max 993.2518s, min 0.0101s, med 123.9985s
object-reconstructor: Trying to sync suffixes with 192.168.12.12:6003/d8/1464 policy#2 frag#1: Timeout (60s)
object-reconstructor: 192.168.12.13:6003/d22/1464 EXCEPTION in replication.Sender: #012Traceback (most recent call last):#012 File "/usr/lib/pymodules/python2.7/swift/obj/ssync_sender.py", line 72, in __call__#012 self.connect()#012 File "/usr/lib/pymodules/python2.7/swift/obj/ssync_sender.py", line 144, in connect#012 self.response = self.connection.getresponse()#012 File "/usr/lib/pymodules/python2.7/swift/common/bufferedhttp.py", line 126, in getresponse#012 response = HTTPConnection.getresponse(self)#012 File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse#012 response.begin()#012 File "/usr/lib/python2.7/httplib.py", line 409, in begin#012 version, status, reason = self._read_status()#012 File "/usr/lib/python2.7/httplib.py", line 373, in _read_status#012 raise BadStatusLine(line)#012BadStatusLine: ''
object-reconstructor: 192.168.12.14:6003/d59/1609 EXCEPTION in replication.Sender: #012Traceback (most recent call last):#012 File "/usr/lib/pymodules/python2.7/swift/obj/ssync_sender.py", line 72, in __call__#012 self.connect()#012 File "/usr/lib/pymodules/python2.7/swift/obj/ssync_sender.py", line 144, in connect#012 self.response = self.connection.getresponse()#012 File "/usr/lib/pymodules/python2.7/swift/common/bufferedhttp.py", line 126, in getresponse#012 response = HTTPConnection.getresponse(self)#012 File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse#012 response.begin()#012 File "/usr/lib/python2.7/httplib.py", line 409, in begin#012 version, status, reason = self._read_status()#012 File "/usr/lib/python2.7/httplib.py", line 373, in _read_status#012 raise BadStatusLine(line)#012BadStatusLine: ''
object-reconstructor: 192.168.12.14:6003/d57/1609 policy#2 frag#8 responded as unmounted
object-reconstructor: 192.168.12.13:6003/d21/2581 EXCEPTION in replication.Sender: #012Traceback (most recent call last):#012 File "/usr/lib/pymodules/python2.7/swift/obj/ssync_sender.py", line 72, in __call__#012 self.connect()#012 File "/usr/lib/pymodules/python2.7/swift/obj/ssync_sender.py", line 144, in connect#012 self.response = self.connection.getresponse()#012 File "/usr/lib/pymodules/python2.7/swift/common/bufferedhttp.py", line 126, in getresponse#012 response = HTTPConnection.getresponse(self)#012 File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse#012 response.begin()#012 File "/usr/lib/python2.7/httplib.py", line 409, in begin#012 version, status, reason = self._read_status()#012 File "/usr/lib/python2.7/httplib.py", line 373, in _read_status#012 raise BadStatusLine(line)#012BadStatusLine: ''
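
The tracebacks above are hard to read because syslog has folded each one onto a single line, writing every newline as `#012` (octal 012 is a line feed). A minimal sketch for unfolding such lines follows; the helper name `unfold_syslog` is made up for illustration, and the pattern assumes the `#` plus three octal digits escape style seen in this log.

```python
import re

# Matches escapes such as "#012": '#' followed by exactly three octal digits.
# Text like "policy#2 frag#8" is left alone because it has fewer digits.
_ESCAPE = re.compile(r"#([0-7]{3})")


def unfold_syslog(line):
    """Replace #NNN octal escapes with the characters they stand for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


# Shortened sample built from the traceback lines quoted in this report.
sample = (
    "EXCEPTION in replication.Sender: #012Traceback (most recent call last):"
    "#012 File \"/usr/lib/python2.7/httplib.py\", line 373, in _read_status"
    "#012 raise BadStatusLine(line)#012BadStatusLine: ''"
)

if __name__ == "__main__":
    print(unfold_syslog(sample))
```

Note that a literal `#` followed by three octal digits in ordinary log text would also be rewritten, so this is only suitable for reading logs, not for transforming them in place.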

clayg (clay-gerrard) wrote :

It would be interesting to turn on DEBUG logging on .14 and .13 and see if they're logging the 507 response to the REPLICATE request.

Also there's a change on master post-kilo that may help with this issue [1] - can you verify if this is reproducible with the latest code?

1. https://review.openstack.org/#/c/177836/
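
Once the object-server logs are available, one way to check for the 507 responses is to filter the access-log lines by method and status. The sketch below assumes lines shaped like the object-server lines quoted later in this report (`"METHOD /path" STATUS ...`); the function name `find_507s` and the regular expression are illustrative, not part of Swift.

```python
import re

# Pulls the quoted request and the status code that follows it, e.g.
#   "REPLICATE /sdb7/65/852" 507 -
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>/\S*)" (?P<status>\d{3}) ')


def find_507s(lines, methods=("REPLICATE", "SSYNC")):
    """Return (method, path) for each matching request answered with 507."""
    hits = []
    for line in lines:
        m = REQUEST.search(line)
        if m and m.group("status") == "507" and m.group("method") in methods:
            hits.append((m.group("method"), m.group("path")))
    return hits


# Sample lines copied from the log excerpt in this report.
sample_lines = [
    'Jun 25 08:17:15 saio object-6030: 127.0.0.1 - - [25/Jun/2015:08:17:15 +0000] '
    '"SSYNC /sdb7/65" 507 - "-" "-" "-" 0.0008 "-" 26055 1',
    'Jun 25 08:17:15 saio object-6020: 127.0.0.1:6030/sdb7/65 Expected status 200; got 507',
    'Jun 25 08:17:15 saio object-6030: 127.0.0.1 - - [25/Jun/2015:08:17:15 +0000] '
    '"REPLICATE /sdb7/65/852" 507 - "-" "-" "obj-reconstructor 26062" 0.0003 "-" 26055 1',
]

if __name__ == "__main__":
    for method, path in find_507s(sample_lines):
        print(method, path)
```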

Charles Hsu (charles0126) wrote :

@clayg,

I'll turn it on to see what I get.

Should I cherry-pick this patch (https://review.openstack.org/#/c/177836/), or should I move to the latest code?

description: updated
tags: added: ec
clayg (clay-gerrard)
Changed in swift:
status: New → Confirmed
clayg (clay-gerrard) wrote :

this is very closely related to lp bug #1466138

On the latest code, here's an example conversation:

    Jun 25 08:17:15 saio object-6030: STDERR: (26055) accepted ('127.0.0.1', 42375)
    Jun 25 08:17:15 saio object-6030: 127.0.0.1 - - [25/Jun/2015:08:17:15 +0000] "SSYNC /sdb7/65" 507 - "-" "-" "-" 0.0008 "-" 26055 1

    Jun 25 08:17:15 saio object-6020: 127.0.0.1:6030/sdb7/65 Expected status 200; got 507

    Jun 25 08:17:15 saio object-6030: STDERR: Traceback (most recent call last):
    Jun 25 08:17:15 saio object-6030: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/greenpool.py", line 82, in _spawn_n_impl
    Jun 25 08:17:15 saio object-6030: STDERR: func(*args, **kwargs)
    Jun 25 08:17:15 saio object-6030: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 686, in process_request
    Jun 25 08:17:15 saio object-6030: STDERR: proto.__init__(sock, address, self)
    Jun 25 08:17:15 saio object-6030: STDERR: File "/usr/lib/python2.7/SocketServer.py", line 649, in __init__
    Jun 25 08:17:15 saio object-6030: STDERR: self.handle()
    Jun 25 08:17:15 saio object-6030: STDERR: File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
    Jun 25 08:17:15 saio object-6030: STDERR: self.handle_one_request()
    Jun 25 08:17:15 saio object-6030: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 358, in handle_one_request
    Jun 25 08:17:15 saio object-6030: STDERR: self.handle_one_response()
    Jun 25 08:17:15 saio object-6030: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 507, in handle_one_response
    Jun 25 08:17:15 saio object-6030: STDERR: while self.environ['eventlet.input'].read(MINIMUM_CHUNK_SIZE):
    Jun 25 08:17:15 saio object-6030: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 189, in read
    Jun 25 08:17:15 saio object-6030: STDERR: return self._chunked_read(self.rfile, length)
    Jun 25 08:17:15 saio object-6030: STDERR: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 179, in _chunked_read
    Jun 25 08:17:15 saio object-6030: STDERR: self.chunk_length = int(rfile.readline().split(b";", 1)[0], 16)
    Jun 25 08:17:15 saio object-6030: STDERR: ValueError: invalid literal for int() with base 16: ''
    Jun 25 08:17:15 saio object-6030: STDERR: (26055) accepted ('127.0.0.1', 42376)
    Jun 25 08:17:15 saio object-6030: 127.0.0.1 - - [25/Jun/2015:08:17:15 +0000] "REPLICATE /sdb7/65/852" 507 - "-" "-" "obj-reconstructor 26062" 0.0003 "-" 26055 1

object-6030 is blowing up *after* it responds 507 and *after* object-6020 logs the 507 response.

This seems to indicate that object-6020 isn't closing down the connection well.
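
The `ValueError` in the traceback comes from parsing an empty chunk-size line: the server is still draining a chunked request body after answering 507, and the peer closes the socket without sending the terminating zero-length chunk. The sketch below reproduces that parse with an in-memory stream. It is an illustration only, not eventlet's or Swift's code, and `drain_chunked` is a made-up name.

```python
import io


def drain_chunked(rfile):
    """Read and discard a chunked body; return the payload bytes seen."""
    total = 0
    while True:
        size_line = rfile.readline()
        # Same expression as the eventlet frame in the traceback. If the
        # peer has closed the connection, readline() returns b"" and int()
        # raises ValueError.
        size = int(size_line.split(b";", 1)[0], 16)
        if size == 0:
            rfile.readline()  # blank line after the last chunk
            return total
        total += len(rfile.read(size))
        rfile.readline()  # CRLF that ends the chunk


# A body that ends with the terminating "0" chunk drains without error.
clean = io.BytesIO(b"5\r\nhello\r\n0\r\n\r\n")

# A body that simply stops, as when the sender closes the socket early.
abrupt = io.BytesIO(b"5\r\nhello\r\n")

if __name__ == "__main__":
    print(drain_chunked(clean))
    try:
        drain_chunked(abrupt)
    except ValueError as err:
        print("ValueError:", err)
```

Under this reading, a sender that writes the final `0\r\n\r\n` before closing would let the receiver finish draining cleanly; whether that matches the fix that was eventually merged is not stated in this report.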

clayg (clay-gerrard) wrote :
Changed in swift:
importance: Undecided → High
assignee: nobody → clayg (clay-gerrard)
status: Confirmed → In Progress
clayg (clay-gerrard) wrote :
Changed in swift:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/hummingbird)

Fix proposed to branch: feature/hummingbird
Review: https://review.openstack.org/202227

OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (feature/hummingbird)

Change abandoned by Michael Barton (<email address hidden>) on branch: feature/hummingbird
Review: https://review.openstack.org/202227
Reason: Apparently I did this wrong.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/hummingbird)

Fix proposed to branch: feature/hummingbird
Review: https://review.openstack.org/202230

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/hummingbird)

Reviewed: https://review.openstack.org/202230
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=f7cb1777e1b514b3345b9e516ed8f89ad1a4ae87
Submitter: Jenkins
Branch: feature/hummingbird

commit 51f806d3e3d3a1fcbc80d2f7d7ddbe5cc4c024c9
Author: John Dickinson <email address hidden>
Date: Tue Jul 14 20:49:08 2015 -0700

    remove Python 2.6 from the classifier

    Change-Id: I67233e9c7b69826242546bd6bd98c24b81070579

commit 278adf5c20101a191979ce1e4d6277e5f209149e
Author: Hisashi Osanai <email address hidden>
Date: Tue Jul 14 15:33:45 2015 +0900

    Make logic of unit tests responsive to the method names

    The two methods, test_authorize_succeeds_for_tenant_name_in_roles and
    test_authorize_succeeds_for_tenant_id_in_roles, have names that don't
    match what they are testing. tenant_name and tenant_id need to be
    switched.

    Change-Id: I7cb0a7d2b2111127fd5d6b55f2da6a3eadf2235d

commit 1cc3eff958fdd4fb07c2b74c52df7829d3125466
Author: Victor Stinner <email address hidden>
Date: Fri Jul 10 13:04:44 2015 +0200

    Fixes for mock 1.1

    The new release of mock 1.1 is more strict. It helped to find bugs in
    tests.

    Closes-Bug: #1473369
    Change-Id: Id179513c6010d827cbcbdda7692a920e29213bcb

commit ff192cfe5705324497a389aa2f22227d75dc0f8e
Author: janonymous <email address hidden>
Date: Wed Jul 8 18:38:22 2015 +0530

    Replace reduce and unichr , these are no longer available in py3

    * Replace reduce() with six.moves.reduce()
    * Replace unichr with six.unichr

    Change-Id: I2038e47e0a6522dd992fd2a4aeff981cf7750fe0

commit 4beceab4f4be99f14025815cf7ed4510ea77f460
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Jul 9 06:14:56 2015 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I9ff1dde06be45fc7d6c441a1e1c07221f839a9a1

commit 56ee39a7e13417203c5e1816d7a3184a07f85826
Author: Matthew Oliver <email address hidden>
Date: Thu Jul 9 15:19:32 2015 +1000

    Ring builder code clean up follow up patch

    This is a simple change that cleans up a NIT from Sam's 'stop moving
    partitions unnecessarily when overload is on' patch.

    Change-Id: I9d9f1cc23e2bb625d8e158f4d3f64e10973176a1

commit 6cafd0a4c0bb8f311fc59df580b42e801214effd
Author: Oshrit Feder <email address hidden>
Date: Wed Jul 8 15:18:22 2015 +0300

    Fix Container Sync example

    Container-sync realm uses cluster_ as a prefix to specify clusters'
    names. At use, the prefix should not be included. Fixing the examples
    and sample conf to make it clearer that only the name of the cluster
    should be passed.

    Change-Id: I2e521d86faffb59e1b45d3f039987ee023c5e939

commit 125238612f58481316db68d7087252bb7729f447
Author: Janie Richling <email address hidden>
Date: Sat Jul 4 17:08:32 2015 -0500

    Add CORS unit tests to base

    In earlier versions of swift when a request was made with an
    existing origin, but without any CORS settings in the container,
    it was possible to get an u...

tags: added: in-feature-hummingbird
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/crypto)

Fix proposed to branch: feature/crypto
Review: https://review.openstack.org/205579

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/crypto)

Reviewed: https://review.openstack.org/205579
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=8ab46b64365b8eab80680f2562f81e8adb3032a3
Submitter: Jenkins
Branch: feature/crypto

commit 89f705e8aab144092d40a13fc4ef19ffef5f3eba
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Jul 23 06:11:27 2015 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I94cf347564cb33977f33b1e64259bcb39a8cf809

commit a65e9db8752793ec37b594dc9eca5066171825db
Author: Christian Schwede <email address hidden>
Date: Wed Jul 22 10:43:17 2015 +0000

    Removing commented out code in test/unit/account/test_backend.py

    Noticed this while reviewing another change. Looks like the test itself already
    ensures correct functionality of the reclaim() method in AccountBroker without
    the commented code, thus removing this stale code.

    Change-Id: I6a26a7591adef9fd794ca68a4e9c493d1127f93c

commit 99d052772a9585e0befdfd292fd03aefde77180a
Author: Kota Tsuyuzaki <email address hidden>
Date: Mon Jul 13 01:12:43 2015 -0700

    Fix 499 client disconnected on COPY EC object

    Currently, a COPY request for an EC object might go to fail as 499 Client
    disconnected because of the difference between destination request content
    length and actual transferred bytes.

    That is because the conditional response status and content length for
    an EC object range GET is handled at calling the response instance on
    proxy server. Therefore the calling response instance (resp()) will change
    the conditional status from 200 (HTTP_OK) to 206 (PartialContent) and will
    change the content length for the range GET.

    In EC case, sometimes Swift needs whole stored contents to decode a segment.
    It will make 200 HTTP OK response from object-server and proxy-server
    will unfortunately set whole content length to the destination content
    length and it makes the bug 1467677.

    This patch introduces a new method "fix_conditional_response" for
    swift.common.swob.Response that calling _response_iter() and cached the
    iter in the Response instance. By calling it, Swift can set correct conditional
    response any time after setting whole content_length to the response
    instance like EC case.

    Change-Id: If85826243f955d2f03c6ad395215c73daab509b1
    Closes-Bug: #1467677

commit 62ed4f81ef80440550633eaaaa962a4f9383c2d3
Author: Timur Alperovich <email address hidden>
Date: Tue Jul 14 16:56:44 2015 -0700

    Add two functional tests for delimiter.

    The first test verifies that a delimiter will trim entries beyond the
    first matching instance of delimiter (after the given matching prefix,
    if any) and squash duplicates. So, when setting the delimiter
    to "-", given blobs "test", "test-foo" and "test-bar-baz", we expect
    only "test" (no matching delim) and "test-" (trim all characters after
    the first "-", and squash duplicates).

    The second test verifies that when a prefix is provid...

tags: added: in-feature-crypto
Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.4.0
status: Fix Committed → Fix Released