Inappropriate checking of connection timeout in db_replicator._repl_to_node

Bug #1359018 reported by Takashi Kajinami
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
Low
Unassigned

Bug Description

Account/Container-replicator checks connection timeout in HTTP REPLICATE Request in _repl_to_node, but it doesn't really checks connection but only construction of ReplConnection class.

Revision history for this message
Takashi Kajinami (kajinamit) wrote :

Here I show detailed description about this problem.

_repl_to_node checks timeout in _http_connect function. (swift.common.db_replicator l373)

    def _repl_to_node(self, node, broker, partition, info):
        ...
        with ConnectionTimeout(self.conn_timeout):
            http = self._http_connect(node, partition, broker.db_file)

_http_connect creates and returns ReplConecction class. (swift.common.db_replicator l338)
It never calls "connect".

    def _http_connect(self, node, partition, db_file):
        ...
        return ReplConnection(node, partition,
                              os.path.basename(db_file).split('.', 1)[0],
                              self.logger)

ReplConnection doesn't create connection in __init__. (swift.common.db_replicator l115)

    def __init__(self, node, partition, hash_, logger):
        ...
        self.logger = logger
        self.node = node
        host = "%s:%s" % (node['replication_ip'], node['replication_port'])
        BufferedHTTPConnection.__init__(self, host)

        self.path = '/%s/%s/%s' % (node['device'], partition, hash_)

clayg (clay-gerrard)
Changed in swift:
status: New → Confirmed
Revision history for this message
clayg (clay-gerrard) wrote :

I thought maybe under the hood eventlet's green HTTPConnection or httplib's base were calling connect in __init__; but no...

They've all got this autoconnect thing going on where it'll make the connection when you first try to send data. Which for REPLICATE is basically once you've the whole request ready to go.

That is to say, I don't think we can do much to catch a connection timeout before we try to send the request. We could try to make some expect-100 something work on the REPLICATE verb; but since it's always been this way I think the smartest thing may just be to remove the useless ConnectionTimeout and the one silly `test_repl_to_node_http_connect_fails` that seemed to think a socket was getting opened in there and it would somehow magically return None on failure.

clayg (clay-gerrard)
Changed in swift:
importance: Undecided → Low
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/116218
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=7a0c4d248257259612d3471ab42669ca9d90c573
Submitter: Jenkins
Branch: master

commit 7a0c4d248257259612d3471ab42669ca9d90c573
Author: Takashi Kajinami <email address hidden>
Date: Mon Nov 24 22:05:07 2014 +0900

    Remove invalid connection checking in db_replicator

    Account/Container-replicator checks connection generation and timeout
    in HTTP REPLICATE Request in _repl_to_node, but it doesn't really checks
    connection but only construction of ReplConnection class.
    This patch removes that invalid checking.

    Change-Id: Ie6b4062123d998e69c15638b741e7d1ba8a08b62
    Closes-Bug: #1359018

Changed in swift:
status: Confirmed → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/138165

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)
Download full text (15.6 KiB)

Reviewed: https://review.openstack.org/138165
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0d3ebf09b94b41782b2c2a6bbcf255bf1203eca0
Submitter: Jenkins
Branch: feature/ec

commit 977d7c14daa38ab9c9d79bbf8b92371024b93fc8
Author: John Dickinson <email address hidden>
Date: Wed Nov 26 14:19:08 2014 -0800

    Fix tempfile bugs from commit 6978275

    Commit 6978275 changed xprofile middleware's usage of mktemp
    and moved to using tempfile. But it was clearly never tested,
    because the os.close() calls never worked. This patch updates
    that previous patch to use a context to open and close the file.

    Change-Id: I40ee42e8539551fd8e4dfb353f50146ab40a7847

commit dec97fc3ba2c71884f1c098e7d9cd1f709f74958
Author: OpenStack Proposal Bot <email address hidden>
Date: Wed Nov 26 06:13:29 2014 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: Ibf319f7cc1b5036ad8031776cf2c6018fb8a0159

commit 01f6e860066640a2ba1406a23c93a72b34ec495e
Author: Clay Gerrard <email address hidden>
Date: Fri Nov 21 17:28:13 2014 -0800

    Add Expected Failure for ssync with sys-meta

    Sysmeta included with an object PUT persists with the PUT data - if an
    internal operation such as POST-as-copy during partial failure, or ssync
    with fast-POST (not supported), causes that data to be lost then the
    associated sysmeta will also be lost.

    Since object sys-meta persistence in the face of a POST when the
    original .data is unavailable requires fast-POST with .meta files the
    probetest that validates object sys-meta persistence of a POST when the
    most up-to-date copy of the object with sys-meta is unavailable
    configures an InternalClient with object_post_as_copy = false.

    This non-default configuration option is not supported by ssync and
    results in a loss of sys-meta very similar to the object sys-meta
    failure you would see with object_post_as_copy = true when the COPY part
    of the POST is unable to retrieve the most recently written object with
    sys-meta.

    Until we can fix the default POST behavior to make metadata updates
    without stomping on newer data file timestamps we should expect object
    sys-meta to be "very very best possible but not really guaranteed
    effort".

    Until we can fix ssync to replicate metadata updates without stomping on
    newer data file timestamps we should expect this test to fail.

    When ssync replication of fast-POST metadata update is fixed this test
    will fail signaling that the expected failure cruft should be removed,
    but other parts of ssync replication will still work and some other bugs
    can be fixed while we wait.

    Change-Id: Ifc5d49514de79b78f7715408e0fe0908357771d3

commit a8751ae557616cab1cafd98a338cad352526a262
Author: Cedric Dos Santos <email address hidden>
Date: Tue Nov 25 12:37:05 2014 +0100

    Correct misspelled words

    In some files I found misspelling words.

    bin/swift-reconciler-enqueue#l26
       prima...

Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.2.1
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.