Proxy sends commit (creating .durable) when a quorum of the responses are 4xx statuses

Bug #1491748 reported by Kota Tsuyuzaki
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: High
Assigned to: Kota Tsuyuzaki

Bug Description

For an EC PUT request, the proxy-server may send commits to the object-servers, which creates a .durable file, even though the request failed because it did not reach a quorum of successful responses.

For example:
- Almost all object-servers fail with 422 Unprocessable Entity
- The EC scheme is 4 + 2
- 5 object-servers (the quorum size) fail with 422, and 1 object-server succeeds with 201 Created
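
The numbers in this example can be checked with a small sketch. It is not Swift's code: the function names are hypothetical, and the quorum rule (data fragments plus one parity fragment) is an assumption chosen because it reproduces the quorum size of 5 that this report states for a 4 + 2 scheme.

```python
from collections import Counter


def ec_quorum_size(ndata, min_parity_needed=1):
    """Assumed rule: data fragments plus the minimum parity fragments needed."""
    return ndata + min_parity_needed


def count_by_status(statuses):
    """Tally how many object-servers returned each status code."""
    return Counter(statuses)


# The response list from the example: five 422s and one 201.
responses = [422] * 5 + [201]

quorum = ec_quorum_size(ndata=4)
tally = count_by_status(responses)

print("quorum size:", quorum)
print("422 responses:", tally[422])
print("201 responses:", tally[201])
```

Under that assumption, the 422 responses alone reach the quorum size, while the single 201 falls well short of it.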

How it works:
- Client creates a PUT request
- Proxy will open connections to backend object-servers
- Proxy will send whole encoded chunks to object-servers
- Proxy will send content-md5 as footers.
- Proxy will get responses [422, 422, 422, 422, 422, 201] (currently this list is regarded as "we have a quorum of responses")
- Proxy will then send commits to the object-servers (only the object-server that returned 201 will create a .durable file)
- Proxy will return 503 because the commits yield no response statuses from any object-server except the 201 node.
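
The steps above can be modelled as follows. This is a simplified illustration, not the proxy's actual implementation: both function names are hypothetical, and grouping statuses by hundreds class is an assumed stand-in for whatever check regards this list as having quorum. The second function reflects the stricter condition (a quorum of successful responses) that the merged fix in this report describes.

```python
from collections import Counter


def has_quorum_any_class(statuses, quorum_size):
    """Hypothetical model of the faulty check: quorum in ANY status class."""
    per_class = Counter(status // 100 for status in statuses)
    return any(count >= quorum_size for count in per_class.values())


def has_successful_quorum(statuses, quorum_size):
    """Hypothetical model of the stricter check: only 2xx responses count."""
    successes = sum(1 for status in statuses if 200 <= status < 300)
    return successes >= quorum_size


responses = [422, 422, 422, 422, 422, 201]
quorum_size = 5  # value stated in this report for EC 4 + 2

# Decide whether the proxy should move on to the commit phase.
print("any-class check says commit:", has_quorum_any_class(responses, quorum_size))
print("success-only check says commit:", has_successful_quorum(responses, quorum_size))
```

With the response list from this report, the any-class check returns True, so commits would be sent and the 201 node would write a .durable file. The success-only check returns False, so no commit would be sent.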

Tested with Swift 2.4.0

OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/220059

Changed in swift:
status: New → In Progress
Changed in swift:
importance: Undecided → Medium
clayg (clay-gerrard) wrote :

This might be related to, or fixed by, patch 225357 [1]. The bug associated with that change (lp bug #1496205) also tracks invalid writing of .durable when there are errors while sending the object body phase of the request [2].

1. https://review.openstack.org/#/c/225357/
2. https://bugs.launchpad.net/swift/+bug/1496205

Changed in swift:
importance: Medium → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/220059
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=8f1c7409e7b6a854125a234b8a2b969075d26dae
Submitter: Jenkins
Branch: master

commit 8f1c7409e7b6a854125a234b8a2b969075d26dae
Author: Kota Tsuyuzaki <email address hidden>
Date: Thu Sep 3 00:40:41 2015 -0700

    Don't send commits for quorum *BAD* requests on EC

    In EC PUT request case, proxy-server may send commits to object-servers
    it may make .durable file even though the request failed due to a lack
    of quorum number.

    For example:
    - Considering the case that almost all object-servers fail by 422
      Unprocessable Entity
    - Using ec scheme 4 + 2
    - 5 (quorum size) object-server failed with 422, 1 object-servers
      succeeded as 201 created

    How it works:
    - Client creates a PUT request
    - Proxy will open connections to backend object-servers
    - Proxy will send whole encoded chunks to object-servers
    - Proxy will send content-md5 as footers.
    - Proxy will get responses [422, 422, 422, 422, 422, 201] (currently
      this list will be regarded as "we have quorum response")
    - And then proxy will send commits to object-servers (the only
      object-server with 201 will create .durable file)
    - Proxy will return 503 because the commits results in no response
      statuses from object-servers except the 201 node.

    This patch fixes the quorum handling at ObjectController to check
    that it has *successful* quorum responses before sending durable commits.

    Closes-Bug: #1491748
    Change-Id: Icc099993be76bcc687191f332db56d62856a500f

Changed in swift:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.5.0
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/crypto)

Fix proposed to branch: feature/crypto
Review: https://review.openstack.org/234065

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/crypto)

Reviewed: https://review.openstack.org/234065
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=c80229fd86853f5f8541aeef4b5044170572640d
Submitter: Jenkins
Branch: feature/crypto

commit 9cafa472a336f66d149a20c12f4251703d96f04d
Author: Ondřej Nový <email address hidden>
Date: Sat Oct 10 20:57:07 2015 +0200

    Autodetect systemctl in SAIO and use it on systemd distros

    Change-Id: I84a9b27baac89327749d8774032860f8ad5166f2

commit 92767f28d668643bc2affee7b2fd46fd9349656a
Author: Emile Snyder <email address hidden>
Date: Sun Oct 11 21:24:54 2015 -0700

    Fix 'swift-ring-builder write_builder' after you remove a device

    clayg already posted the code fix in the bug, but noted it needs a test.

    Closes-Bug: #1487280
    Change-Id: I07317754afac7165baac4e696f07daeba2e72adc

commit a48649002970b2150d24d0622a100f54045443c5
Author: Lisak, Peter <email address hidden>
Date: Mon Oct 12 14:42:01 2015 +0200

    swift-recon fails with socket.timeout exception

    If some server is overloaded or timeout set too low, swift-recon fails with
    raised socket.timeout exception.

    This error should be processed the same way as HTTPError/URLError.

    Change-Id: Ide8843977ab224fa866097d0f0b765d6899c66b8

commit 767fac8186ea4541f4466ac9a55c03abea6a878b
Author: Christian Schwede <email address hidden>
Date: Mon Oct 12 07:09:00 2015 +0000

    Enable H234 check (assertEquals is deprecated, use assertEqual)

    All usages of assertEquals and assertNotEquals are fixed now, so let's enable
    the H234 check to avoid regressions in the future.

    Change-Id: I2c2ccb3b268cf9eb11f2db045378ab125a02bc31

commit 1882801be1d8983cd718786bd409cf09f65a00b0
Author: janonymous <email address hidden>
Date: Mon Aug 31 21:49:49 2015 +0530

    pep8 fix: assertNotEquals -> assertNotEqual

    assertNotEquals is deprecated in py3

    Change-Id: Ib611351987bed1199fb8f73a750955a61d022d0a

commit f5f9d791b0b8b32350bd9a47fbc00ff86a65f09d
Author: janonymous <email address hidden>
Date: Wed Aug 5 23:58:14 2015 +0530

    pep8 fix: assertEquals -> assertEqual

    assertEquals is deprecated in py3, replacing it.

    Change-Id: Ida206abbb13c320095bb9e3b25a2b66cc31bfba8
    Co-Authored-By: Ondřej Nový <email address hidden>

commit 1ba7641c794104de57e5010f76cecbf146a2a63b
Author: Zack M. Davis <email address hidden>
Date: Thu Oct 8 16:16:18 2015 -0700

    minutæ: port ClientException tweaks from swiftclient; dict .pop

    openstack/python-swiftclient@5ae4b423 changed python-swiftclient's
    ClientException to have its http_status attribute default to
    None (rather than 0) and to use super in its __init__ method. For
    consistency's sake, it's nice for Swift's inlined copy of
    ClientException to receive the same patch. Also, the retry function in
    direct_client (a major user of ClientException) was using a somewhat
    awkward conditional-assignment-and-delete construction where the .pop
    method of dictionaries would be more idiomatic.

    Change-Id: I70a12f934f84f57549617af28b86f7f5637bd8fa

commit 01f9d15045129d09...

tags: added: in-feature-crypto
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/hummingbird)

Fix proposed to branch: feature/hummingbird
Review: https://review.openstack.org/236162

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/hummingbird)

Reviewed: https://review.openstack.org/236162
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=18ddcaf0d6b67fcbb6b0a4cf4a9a99c72f6f3a08
Submitter: Jenkins
Branch: feature/hummingbird

commit a9ddc2d9ea402eaac7ccd8992387f77855968ab5
Author: Mahati Chamarthy <email address hidden>
Date: Fri Oct 16 18:18:33 2015 +0530

    Hyperlink fix to first contribution doc

    Change-Id: I19fc1abc89f888233b80a57c68a152c1c1758640

commit 83a1151d13e096b480aefe6ec18259f2d7d021db
Author: Pete Zaitcev <email address hidden>
Date: Fri Oct 9 16:45:20 2015 -0600

    Interpolate the explanation string not whole HTML body

    The only reason this exists is that I promised to do it.
    But in our case, there's no big advantage, and here's why.

    The general thinking goes that strings must be interpolated
    first because the body may contain a syntax that confuses the
    interpolation. So this patch makes the code "more correct".
    However, our HTML template is tightly controlled. It's not
    like it contains additional percents.

    So I'll just leave this here for now while I'm asking if
    the content type is set correctly.

    Change-Id: Ia18aeb0f94ef389f8b95450986a566e5fa06aa10

commit 384b91eb824376659989b904f9396cbf2e02d2bd
Author: asettle <email address hidden>
Date: Thu Sep 3 15:11:46 2015 +1000

    Moving DLO functionality doc to the middleware code

    This change moves the RST DLO documentation from
    statically inside overview_large_objects.rst and moves it
    to middleware/dlo.py.
    This is where all middleware RST documentation is defined.

    The overview_large_objects.rst is still the main page
    for information on large objects, so now dynamically
    points to both the DLO and SLO middleware RST
    documentation and the relevant middleware.rst page
    simply points to it.

    Change-Id: I40d918c8b7bc608ab945805d69fe359521df038a
    Closes-bug: #1276852

commit 2996974e5d48b4efaa1b271b8fbd0387bced7242
Author: Ondřej Nový <email address hidden>
Date: Sat Oct 10 14:56:30 2015 +0200

    Script for running unit, func and probe tests at once

    When developing Swift it's often needed to run all tests.
    This script makes it much simpler.

    Change-Id: I67e6f7cc05ebd0475001c1b56e8f6fd09c8c644f

commit c2182fd4163050a5f76eb3dedb7703dc821fa83d
Author: janonymous <email address hidden>
Date: Fri Jul 17 20:20:15 2015 +0530

    Python3: do not use im_self/im_func/func_closure

    Use __self__, __func__ and __closure__ instead, as they work
    with both Python 2 and 3.

    Modifying usage of __func__ in codebase.

    Change-Id: I57e907c28c1d4646605e70194ea3650806730b83

commit c0866ceaac2f69ae01345a795520141f59ec64f5
Author: Samuel Merritt <email address hidden>
Date: Fri Sep 25 17:26:37 2015 -0700

    Improve SLO PUT error checking

    This commit tries to give the user a reason that their SLO manifest
    was invalid instead of just saying "Invalid SLO Manifest File". It
    doesn't get every error condition, but it's better than before.

    Examples of things that now have real error...

tags: added: in-feature-hummingbird
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.5.0

This issue was fixed in the openstack/swift 2.5.0 release.
