Bug #1359018 “Inappropriate checking of connection timeout in db...” : Bugs : OpenStack Object Storage (swift)

Takashi Kajinami (kajinamit) wrote on 2014-08-20:

#1

Here is a detailed description of the problem.

_repl_to_node applies the connection timeout around the call to _http_connect (swift.common.db_replicator, line 373):

    def _repl_to_node(self, node, broker, partition, info):
        ...
        with ConnectionTimeout(self.conn_timeout):
            http = self._http_connect(node, partition, broker.db_file)

_http_connect only creates and returns a ReplConnection instance (swift.common.db_replicator, line 338).
It never calls "connect".

    def _http_connect(self, node, partition, db_file):
        ...
        return ReplConnection(node, partition,
                              os.path.basename(db_file).split('.', 1)[0],
                              self.logger)

ReplConnection doesn't open a connection in __init__ either (swift.common.db_replicator, line 115):

    def __init__(self, node, partition, hash_, logger):
        ...
        self.logger = logger
        self.node = node
        host = "%s:%s" % (node['replication_ip'], node['replication_port'])
        BufferedHTTPConnection.__init__(self, host)
        self.path = '/%s/%s/%s' % (node['device'], partition, hash_)
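
The effect can be reproduced with the standard library alone. The sketch below is not Swift code: it assumes Python 3's http.client.HTTPConnection as a stand-in for the httplib-style class that ReplConnection builds on, and unused_port is a hypothetical helper written for this illustration. It shows that construction succeeds even when nothing is listening, so a timeout wrapped around construction has nothing to time.

```python
import socket
from http.client import HTTPConnection


def unused_port():
    """Return a local TCP port that nothing listens on (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


# Constructing the object succeeds although no server exists, because
# __init__ only records the host and port.
conn = HTTPConnection("127.0.0.1", unused_port(), timeout=1)
socket_after_init = conn.sock  # still None: no socket has been opened

# The failure only appears once something actually opens the connection.
try:
    conn.connect()
    connect_failed = False
except OSError:
    connect_failed = True
finally:
    conn.close()
```

Any guard that is meant to catch a slow or refused connection therefore has to wrap the code that triggers connect(), not the constructor.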

clayg (clay-gerrard) on 2014-08-20

Changed in swift:
status:	New → Confirmed

clayg (clay-gerrard) wrote on 2014-08-20:

#2

I thought maybe under the hood eventlet's green HTTPConnection or httplib's base were calling connect in __init__; but no...

They've all got this autoconnect thing going on where the connection is made when you first try to send data, which for REPLICATE is basically once you've got the whole request ready to go.

That is to say, I don't think we can do much to catch a connection timeout before we try to send the request. We could try to make some expect-100 something work on the REPLICATE verb; but since it's always been this way I think the smartest thing may just be to remove the useless ConnectionTimeout and the one silly `test_repl_to_node_http_connect_fails` that seemed to think a socket was getting opened in there and it would somehow magically return None on failure.
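
The autoconnect behaviour described here can be observed with a short sketch. It assumes Python 3's http.client rather than eventlet's green HTTPConnection, and the request path is a made-up example. A bare listening socket plays the remote server; the kernel completes the TCP handshake from the listen backlog, so no accept() call is needed.

```python
import socket
from http.client import HTTPConnection

# A listening socket stands in for the remote replication server.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

conn = HTTPConnection("127.0.0.1", port, timeout=1)
connected_after_init = conn.sock is not None

# Queuing the request line and a header only fills an in-memory buffer.
conn.putrequest("REPLICATE", "/sda/0/hash")  # path is a made-up example
conn.putheader("Content-Length", "0")
connected_after_headers_queued = conn.sock is not None

# endheaders() flushes the buffer; this first send is what opens the socket.
conn.endheaders()
connected_after_send = conn.sock is not None

conn.close()
server.close()
```

The socket stays unopened through construction and header queuing and appears only at the first send, which is why a connection timeout placed around _http_connect can never fire.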

clayg (clay-gerrard) on 2014-08-20

Changed in swift:
importance:	Undecided → Low

OpenStack Infra (hudson-openstack) wrote on 2014-11-26: Fix merged to swift (master)

#3

Reviewed: https://review.openstack.org/116218
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=7a0c4d248257259612d3471ab42669ca9d90c573
Submitter: Jenkins
Branch: master

commit 7a0c4d248257259612d3471ab42669ca9d90c573
Author: Takashi Kajinami <email address hidden>
Date: Mon Nov 24 22:05:07 2014 +0900

Remove invalid connection checking in db_replicator

    Account/Container-replicator checks connection generation and timeout
    in HTTP REPLICATE Request in _repl_to_node, but it doesn't really check
    the connection, only the construction of the ReplConnection class.
    This patch removes that invalid checking.

Change-Id: Ie6b4062123d998e69c15638b741e7d1ba8a08b62
Closes-Bug: #1359018

Changed in swift:
status:	Confirmed → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2014-12-01: Fix proposed to swift (feature/ec)

#4

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/138165

OpenStack Infra (hudson-openstack) wrote on 2014-12-01: Fix merged to swift (feature/ec)

#5

Reviewed:  https://review.openstack.org/138165
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0d3ebf09b94b41782b2c2a6bbcf255bf1203eca0
Submitter: Jenkins
Branch:    feature/ec

commit 977d7c14daa38ab9c9d79bbf8b92371024b93fc8
Author: John Dickinson <me@not.mn>
Date:   Wed Nov 26 14:19:08 2014 -0800

Fix tempfile bugs from commit 6978275
    
    Commit 6978275 changed xprofile middleware's usage of mktemp
    and moved to using tempfile. But it was clearly never tested,
    because the os.close() calls never worked. This patch updates
    that previous patch to use a context to open and close the file.
    
    Change-Id: I40ee42e8539551fd8e4dfb353f50146ab40a7847

commit dec97fc3ba2c71884f1c098e7d9cd1f709f74958
Author: OpenStack Proposal Bot <openstack-infra@lists.openstack.org>
Date:   Wed Nov 26 06:13:29 2014 +0000

Imported Translations from Transifex
    
    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure
    
    Change-Id: Ibf319f7cc1b5036ad8031776cf2c6018fb8a0159

commit 01f6e860066640a2ba1406a23c93a72b34ec495e
Author: Clay Gerrard <clay.gerrard@gmail.com>
Date:   Fri Nov 21 17:28:13 2014 -0800

Add Expected Failure for ssync with sys-meta
    
    Sysmeta included with an object PUT persists with the PUT data - if an
    internal operation such as POST-as-copy during partial failure, or ssync
    with fast-POST (not supported), causes that data to be lost then the
    associated sysmeta will also be lost.
    
    Since object sys-meta persistence in the face of a POST when the
    original .data is unavailable requires fast-POST with .meta files the
    probetest that validates object sys-meta persistence of a POST when the
    most up-to-date copy of the object with sys-meta is unavailable
    configures an InternalClient with object_post_as_copy = false.
    
    This non-default configuration option is not supported by ssync and
    results in a loss of sys-meta very similar to the object sys-meta
    failure you would see with object_post_as_copy = true when the COPY part
    of the POST is unable to retrieve the most recently written object with
    sys-meta.
    
    Until we can fix the default POST behavior to make metadata updates
    without stomping on newer data file timestamps we should expect object
    sys-meta to be "very very best possible but not really guaranteed
    effort".
    
    Until we can fix ssync to replicate metadata updates without stomping on
    newer data file timestamps we should expect this test to fail.
    
    When ssync replication of fast-POST metadata update is fixed this test
    will fail signaling that the expected failure cruft should be removed,
    but other parts of ssync replication will still work and some other bugs
    can be fixed while we wait.
    
    Change-Id: Ifc5d49514de79b78f7715408e0fe0908357771d3

commit a8751ae557616cab1cafd98a338cad352526a262
Author: Cedric Dos Santos <cedric.dos.sant@gmail.com>
Date:   Tue Nov 25 12:37:05 2014 +0100

Correct misspelled words
    
    In some files I found misspelling words.
    
    bin/swift-reconciler-enqueue#l26
       primarly => primarily
    swift/account/backend.py#l309
       ommited => omitted
    swift/container/replicator.py#l158
       successfull => successful
    test/unit/account/test_backend.py#1450
       non_existant_policy_index => non_existent_policy_index
    test/unit/account/test_backend.py#1451
       'test-non-existant-policy'=> 'test-non-existent-policy'
    test/unit/account/test_backend.py#1453
       non_existant_policy_index => non_existent_policy_index
    
    Change-Id: I976236e3200a6fbdc20be464acff182b6cface81

commit 98de48d898f1419b0a0cfc273ec778e60331e623
Author: Shilla Saebi <shilla.saebi@gmail.com>
Date:   Sat Nov 22 15:38:48 2014 -0500

Fix typo in apache_deployment doc
    
    Change-Id: I42d76f544290dbda62633de90608d41caadac084

commit a1872b0498e1aa4182a4373c89beeaaaa219ea17
Author: Shilla Saebi <shilla.saebi@gmail.com>
Date:   Sat Nov 22 15:35:10 2014 -0500

Fix 2 typos in admin_guide file
    
    Change-Id: Ibf1e5dbf6ff4747c7f23f6638321ab41bba3021b

commit 0dc4b0a7b75237c09caffdac8c0dfd92bf8e3286
Author: Shilla Saebi <shilla.saebi@gmail.com>
Date:   Sat Nov 22 16:11:37 2014 -0500

Fix typos in overview_large_objects and versioning doc
    
    
    Change-Id: I1a919ad1b0298d5817f9eb2caf5e3bd7b3243c2c

commit 7a0c4d248257259612d3471ab42669ca9d90c573
Author: Takashi Kajinami <kajinamit@nttdata.co.jp>
Date:   Mon Nov 24 22:05:07 2014 +0900

Remove invalid connection checking in db_replicator
    
    Account/Container-replicator checks connection generation and timeout
    in HTTP REPLICATE Request in _repl_to_node, but it doesn't really check
    the connection, only the construction of the ReplConnection class.
    This patch removes that invalid checking.
    
    Change-Id: Ie6b4062123d998e69c15638b741e7d1ba8a08b62
    Closes-Bug: #1359018

commit 1c9bc0b522bed333b04a46ed7bd2c66a4eb89860
Author: Jay S. Bryant <jsbryant@us.ibm.com>
Date:   Thu Oct 2 14:10:04 2014 -0500

Handle os.listdir failures in object-updater
    
    While investigating bug 1375348 I discovered the problem
    reported there was not limited to the object-auditor.  The
    object-updater has similar bugs.
    
    This patch catches the unhandled exception that can be thrown
    by os.listdir if the self.devices directory is inaccessible.
    
    Change-Id: I6293b840916bb63cf9eebbc05068d9a3c871bdc3
    Related-bug: 1375348
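
A minimal sketch of the kind of guard this commit message describes, assuming nothing about the actual patch: list_devices is a hypothetical function name, and returning an empty list on failure is an illustrative choice rather than the object-updater's real behaviour.

```python
import logging
import os


def list_devices(devices_dir, logger=None):
    """List a devices directory, logging instead of raising if it is unreadable."""
    logger = logger or logging.getLogger(__name__)
    try:
        return os.listdir(devices_dir)
    except OSError as err:
        # os.listdir raises OSError (e.g. ENOENT, EACCES) for inaccessible paths.
        logger.error("Unable to list %s: %s", devices_dir, err)
        return []


missing = list_devices("/nonexistent/devices/path")
```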

commit 8cc075a8fb7561c736cb38d629f5b3d8ddb67497
Author: Jay S. Bryant <jsbryant@us.ibm.com>
Date:   Thu Nov 20 15:56:58 2014 -0600

mock out os.listdir to return a list
    
    os.listdir returns a list of items.  The test case had been
    written to return a single item which, though not really changing
    the result of the test, was not the best approach.
    
    This patch updates the test case to return a list instead of a single
    item.
    
    Change-Id: I793e0636440c0de0ca339c6592adec3e8b4ee1b4

commit fb353e1756df02622ea257acc987df4ccd094872
Author: John Dickinson <me@not.mn>
Date:   Thu Nov 20 10:22:27 2014 -0800

update AUTHORS
    
    Change-Id: I416e81b20a129377782f5d9298f8b8f5be079c27

commit 6c02adc33e3238f3fe0b75f2857503d1036f4737
Author: OpenStack Proposal Bot <openstack-infra@lists.openstack.org>
Date:   Thu Nov 20 06:11:14 2014 +0000

Imported Translations from Transifex
    
    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure
    
    Co-Authored-By: Pearl Yajing Tan <pearl.y.tan@seagate.com>
    
    Change-Id: Ifa3e292b8d5afbef8a99121b233e5ea596e672b7

commit 87d8626505c31511911facd5e1a1c3b3a65e8663
Author: Eohyung Lee <liquidnuker@gmail.com>
Date:   Thu Nov 20 11:38:49 2014 +0900

fix example typo
    
    5 * 1024 * 1024 = 5242880
    
    Change-Id: I0eeb6e2d9fbd79103cd8c658627344f73fed9498

commit ddf8b0594bb7e5ea9022982a7c5e15d9b261c22e
Author: Andreas Jaeger <aj@suse.de>
Date:   Wed Nov 19 09:11:55 2014 -0500

Fix translation setup
    
    Fix the output directory, it should be swift/locale.
    This fixes the importing of translations.
    
    Change-Id: I48311773c9d200c3b1739dc796618849416096ed

commit e0307f950bccde1337898e16087af726429e13f4
Author: Clay Gerrard <clay.gerrard@gmail.com>
Date:   Mon Nov 17 12:30:15 2014 -0800

Always use FakeMemcache for in-process tests
    
    Better isolation and consistency for in-process functests to always use
    the FakeMemcache.  If you want to test the real memcache you have real
    functional tests.
    
    Change-Id: Ic483f794e122130bd7694c9a5f9a2b1cd0b9a653

commit 6f9ca6122efac6c1c252a948cd5cc18c58c625ff
Author: Anne Gentle <anne@openstack.org>
Date:   Mon Nov 17 16:11:05 2014 -0600

Adds v1 API documentation to doc/source/api
    
    After discussion https://review.openstack.org/#/c/129384/ moving
    to the doc directory in swift repo.
    
    This lets us eliminate the object-api repo along with all the
    <service>-api repos and move content to audience-centric locations.
    
    Change-Id: Ia0d9973847f7409a02dcc1a0e19400a3c3ecdf32

commit 11a72a4a5084dbcb5539596c50793e45c5dac525
Author: Thiago da Silva <thiago@redhat.com>
Date:   Mon Nov 17 11:33:41 2014 -0500

move slo, dlo after tempauth in pipeline
    
    Noticed that slo and dlo middleware were placed before
    tempauth, they should be placed after
    
    DocImpact
    
    Change-Id: Ia931e2280125d846f248b23e219aebad14c66210
    Signed-off-by: Thiago da Silva <thiago@redhat.com>

commit 2792fe81a93dbaa95e58f14099db5e11dd8cde68
Author: Daisuke Morita <morita.daisuke@lab.ntt.co.jp>
Date:   Tue Sep 30 11:06:08 2014 -0400

Show the sum of every policy's amount in /recon/async
    
    After the release of Swift ver. 2.0.0, some recon responses do not
    show each policy's information yet. To make things worse, some recon
    results only count on policy-0's score, therefore the total is not
    shown in the recon results.
    
    With this patch, async_pending count of recon results becomes
    policy-aware. Suppose a number of async_pending files for policy-0 is 2
    and a number for policy-1 is 3, recon sums up every policy's amount
    as follows.
    
    $ curl http://<host>:<port>/recon/async
    {"async_pending": 5} # It showed 2 before this commit
    
    Related-Bug: 1375332
    Change-Id: Ifc88b8c9e06b9f022a926a87ed807e938e1e0412

commit c9f824637845f342b6996058e0fea8338bd1305d
Author: Alistair Coles <alistair.coles@hp.com>
Date:   Mon Aug 11 17:09:48 2014 +0100

Make in process functional tests use sample proxy-server.conf
    
    This patch was first motivated by noticing that the proxy
    server pipeline used for in process functional tests was
    out of date with respect to the pipeline in
    /etc/proxy-server.conf.sample. Rather than cut and paste
    the current pipeline into the in process setup, it seems
    like a better idea would be to have the in process tests
    always use the sample config.
    
    A further benefit is that in process functional tests will
    pick up changes to the sample config introduced by patches -
    previously test/functional/__init__.py would need to be
    manually modified to run in process functional tests
    on new middleware for example.
    
    Note: because the pipeline is now loaded using entry points,
    'python setup.py [develop|install]' will now be needed
    before running the tests.
    
    Obvious next steps would be to do the same for the backend
    servers, and to allow alternative config files and dir's
    to be specified, but this patch is the first step.
    
    Also drive-by fixes some typos in proxy-server.conf.sample
    
    Change-Id: If442bd7c2b1721ec92839c4490924ba33e1545d8

commit e429cd81be711f42441a08e34c077dcd7a97bed0
Author: Samuel Merritt <sam@swiftstack.com>
Date:   Thu Nov 13 16:40:05 2014 -0800

Make error limits survive a ring reload
    
    The proxy was storing the error count and last-error time in the
    ring's internal data, specifically in the device dictionaries. This
    works okay, but it means that whenever a ring changes, all the error
    stats reset.
    
    Now the error stats live in the proxy server object, so they survive a
    ring reload.
    
    Better yet, the error stats are now keyed off of the node's
    IP/port/device triple, so if you have the same device in two rings
    (like with multiple storage policies), then the error stats are
    combined. If the proxy server sees a 507 for an object request in
    policy X, then that will now result in that particular object disk
    being error-limited for requests in policies Y and Z as well.
    
    Change-Id: Icc72b68b99f37367bb16d43688e7e45327e3e022
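
The keying scheme in this commit message can be sketched as follows. ErrorLimiter and its methods are hypothetical names for illustration, not the proxy server's API; the only idea taken from the message is that stats live outside the ring's device dictionaries and are keyed by the IP/port/device triple.

```python
import time


class ErrorLimiter:
    """Hypothetical stand-in for error stats kept on the proxy server object."""

    def __init__(self):
        # (ip, port, device) -> {"errors": int, "last_error": float}
        self._stats = {}

    @staticmethod
    def _key(node):
        return (node["ip"], node["port"], node["device"])

    def record_error(self, node):
        stats = self._stats.setdefault(
            self._key(node), {"errors": 0, "last_error": 0.0})
        stats["errors"] += 1
        stats["last_error"] = time.time()

    def error_count(self, node):
        return self._stats.get(self._key(node), {}).get("errors", 0)


limiter = ErrorLimiter()

# Two rings (say, two storage policies) describe the same disk with separate
# dictionaries, just as a reloaded ring would hand out fresh dictionaries.
node_in_policy_x = {"ip": "10.0.0.1", "port": 6000, "device": "sda", "id": 3}
node_in_policy_y = {"ip": "10.0.0.1", "port": 6000, "device": "sda", "id": 7}

limiter.record_error(node_in_policy_x)
count_seen_from_policy_y = limiter.error_count(node_in_policy_y)
```

Because the key is derived from the node's address rather than from the dictionary object itself, an error recorded through one ring is visible when the same disk is looked up through another.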

commit b98fe3b77b6b422e5e5978f6cf82a11fb87aedfc
Author: Clay Gerrard <clay.gerrard@gmail.com>
Date:   Tue Nov 11 17:03:29 2014 -0800

Prefer X-Backend-Timestamp for X-Newest
    
    When an X-Backend-Timestamp is available it would generally be preferred
    over a less specific value, and it sorts correctly against any
    X-Timestamp values anyway.
    
    Change-Id: I08b7eb37ab8bd6eb3afbb7dee44ed07a8c69b57e

commit 466403723c4c1fd575b1340e0f9214ee28f0aeb7
Author: Samuel Merritt <sam@swiftstack.com>
Date:   Mon Nov 3 14:20:08 2014 -0800

Make resetswift customizable via environment
    
    Instead of recommending to edit resetswift to replace "/dev/sdb1" with
    "/srv/swift-disk", use an environment variable instead. This way I can
    set SAIO_BLOCK_DEVICE=/srv/swift-disk in my .bashrc, and then when I'm
    testing out changes to resetswift, I don't need to remember to edit
    the modified script, nor do I end up submitting changes with the wrong
    default in there.
    
    The variable defaults to /dev/sdb1, so if you use the script unmodified
    and don't set SAIO_BLOCK_DEVICE, nothing changes for you.
    
    Change-Id: I741a8c91c2c54a4f32bc391cd794ef4206402753

commit 331b14238effc9d1928e478bba86122f7e2525c1
Author: Samuel Merritt <sam@swiftstack.com>
Date:   Fri Nov 7 13:53:46 2014 -0800

Reject object names with Unicode surrogates
    
    Technically, you can't encode surrogates into UTF-8 at all, but Python
    2 lets you get away with it. Python 3 does not.
    
    We already have a check for surrogate pairs (commit 0080337), but not
    one for lone surrogates. This commit forbids object names with lone
    surrogates in them.
    
    The problem with surrogates is trivially reproducible:
    
        swift@saio:~$ python2.7
        Python 2.7.3 (default, Feb 27 2014, 19:58:35)
        [GCC 4.6.3] on linux2
        Type "help", "copyright", "credits" or "license" for more information.
        >>> b'\xed\xa0\xbc'.decode('utf-8')
        u'\ud83c'
        >>>
    
        swift@saio:~$ python3.3
        Python 3.3.5 (default, Aug  4 2014, 15:27:24)
        [GCC 4.6.3] on linux
        Type "help", "copyright", "credits" or "license" for more information.
        >>> b'\xed\xa0\xbc'.decode('utf-8')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
        >>>
    
    See also http://bugs.python.org/issue9133
    
    Change-Id: I7c31022e8a028c3cdf2ed1586349509d96cfded9
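
For Python 3 text strings, a check of this kind can be sketched as below. has_surrogate is a hypothetical helper, not the function added by this commit; it flags any code point in the surrogate range U+D800..U+DFFF, which Python 3 refuses to encode as UTF-8.

```python
def has_surrogate(name):
    """Return True if the text contains a code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in name)


ok_name = "caf\u00e9"
bad_name = "obj\ud83c"  # lone high surrogate, as in the transcript above

# Python 3 rejects the surrogate at encode time as well.
try:
    bad_name.encode("utf-8")
    encodes = True
except UnicodeEncodeError:
    encodes = False
```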

commit 2659888c921d09bd1dd23cda6ee2f158187d80e6
Author: Matthew Oliver <matt@oliver.net.au>
Date:   Fri Jun 13 19:12:31 2014 +1000

When a filesystem doesn't support xattr return a 507
    
    Currently when the object server tries to write an object's metadata
    to a filesystem that doesn't support xattr, it errors with a stacktrace
    and returns a 500 error back to the user with no information.
    
    This patch catches the resulting IOError when attempting to read or write
    the xattr metadata, logs the error nicely and then returns a 507 error
    back to the user.
    
    Seeing as this change is sending back a 507, it also catches and logs
    the out of disk space errors (ENOSPC and EDQUOT).
    
    Change-Id: I31932b57582817a0b3b58dd315a996bd0bcbc99b
    Closes-Bug: #966671

commit 0a5268c34caa25487c48380a1821e4deac178538
Author: Christian Schwede <christian.schwede@enovance.com>
Date:   Tue Sep 16 14:46:08 2014 +0000

Fix bug in swift-ring-builder list_parts
    
    The number of shown replicas in the partition list might differ from the
    actual number of replicas (as shown in the bugreport).
    
    This code simply iterates over builder._replica2part2dev and
    remembers the number of replicas for each partition.
    
    The code to find the partitions was moved to swift/common/ring/utils.py
    to make it easier to test, and a test to ensure the correct number of
    replicas is returned was added.
    
    Closes-Bug: 1370070
    Change-Id: Id6a3ed437bb86df2f43f8b0b79aa8ccb50bbe13e

Thierry Carrez (ttx) on 2014-12-15

Changed in swift:
milestone:	none → 2.2.1
status:	Fix Committed → Fix Released

Inappropriate checking of connection timeout in db_replicator._repl_to_node
