Invalid data in hashes.pkl doesn't get fixed

Bug #1830881 reported by Christian Schwede
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Christian Schwede

Bug Description

If the data in hashes.pkl is invalid (e.g. corrupted, but pickle is still able to load it), replication raises TypeErrors on every pass and the file is never repaired. For example:

May 29 09:58:11 controller-0 object-server: Starting object replication pass.
May 29 09:58:11 controller-0 object-server: Error syncing partition:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/swift/obj/replicator.py", line 407, in update
    self.replication_cycle))
  File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 3666, in tpool_reraise
    raise resp
TypeError: must be encoded string without NULL bytes, not unicode
May 29 09:58:11 controller-0 object-server: 1/1 (100.00%) partitions replicated in 0.01s (193.88/sec, 0s remaining)
May 29 09:58:11 controller-0 object-server: 0 successes, 0 failures
May 29 09:58:11 controller-0 object-server: Object replication complete. (0.00 minutes)
May 29 09:58:11 controller-0 object-server: Replication sleeping for 30 seconds.

May 29 09:58:24 controller-0 object-server: ERROR __call__ error with REPLICATE /d1/174 :
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 1150, in __call__
    res = getattr(self, req.method)(req)
  File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1672, in _timing_stats
    resp = func(ctrl, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/swift/obj/server.py", line 1123, in REPLICATE
    device, partition, suffixes, policy)
  File "/usr/lib/python2.7/site-packages/swift/obj/diskfile.py", line 1408, in get_hashes
    self._get_hashes, device, partition, policy, recalculate=suffixes)
  File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 3666, in tpool_reraise
    raise resp
TypeError: must be encoded string without NULL bytes, not unicode

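A simple reproducer is to overwrite a partition's hashes.pkl with a pickled dict whose key consists of NULL bytes, e.g.:

import pickle

# Overwrite the partition's hashes.pkl with a dict keyed by NULL bytes.
with open("/srv/node/d1/objects/174/hashes.pkl", "wb") as fh:
    fh.write(pickle.dumps({'\x00\x00\x00': False}))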

The replicators never repair this file.
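
Why pickle does not catch this can be shown in isolation: the dict round-trips cleanly, and the failure only shows up once a key is used as part of a filesystem path. The sketch below assumes the TypeError in the tracebacks comes from such a path operation (the report does not name the exact call). The helper name `key_breaks_path_ops` is made up for illustration, and `os.listdir` merely stands in for whatever call Swift makes. Python 2 raises TypeError for a NULL byte in a path, Python 3 raises ValueError, so both are caught.

```python
import os
import pickle


def key_breaks_path_ops(blob, base="/srv/node/d1/objects/174"):
    """Return True if any key of the unpickled dict cannot be used in a path."""
    hashes = pickle.loads(blob)  # succeeds: the pickle itself is well-formed
    for key in hashes:
        try:
            os.listdir(os.path.join(base, key))
        except (TypeError, ValueError):
            # Rejected before reaching the filesystem, e.g. NULL bytes in key.
            return True
        except OSError:
            # A missing directory is not the failure being demonstrated.
            continue
    return False


corrupt = pickle.dumps({'\x00\x00\x00': False})
print(key_breaks_path_ops(corrupt))
```

Because the load step succeeds, nothing marks the file as bad, which would explain why the same error repeats on every replication pass.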

Changed in swift:
assignee: nobody → Christian Schwede (cschwede)
status: New → In Progress
Christian Schwede (cschwede) wrote :

Note: the manual fix is to remove the hashes.pkl files from the partition directories that raise the error.
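
To locate candidates for that manual cleanup, something like the following could be used. It is only a sketch written for Python 3: `find_suspect_hashes` is a hypothetical helper, not part of Swift, and it assumes the layout from the reproducer above (`<objects dir>/<partition>/hashes.pkl`). It flags files that load successfully but contain non-string keys or keys with NULL bytes, and it deliberately does not delete anything.

```python
import os
import pickle


def find_suspect_hashes(objects_dir):
    """Yield paths of hashes.pkl files that load but hold unusable keys."""
    for partition in sorted(os.listdir(objects_dir)):
        path = os.path.join(objects_dir, partition, "hashes.pkl")
        if not os.path.isfile(path):
            continue
        try:
            with open(path, "rb") as fh:
                hashes = pickle.load(fh)
        except Exception:
            # Files that fail to load are a different failure mode; skip them.
            continue
        if not isinstance(hashes, dict) or any(
                not isinstance(key, str) or "\x00" in key for key in hashes):
            yield path
```

Calling `list(find_suspect_hashes("/srv/node/d1/objects"))` would return the paths to review; removing them is left to the operator.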

OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.opendev.org/661930

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/661930
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=c9e78d15e1e05f23facf7c28758b442bb25bde68
Submitter: Zuul
Branch: master

commit c9e78d15e1e05f23facf7c28758b442bb25bde68
Author: Christian Schwede <email address hidden>
Date: Wed May 29 11:37:54 2019 +0200

    Remove invalid dict entries from hashes.pkl

    If the data in a hashes.pkl is corrupted but still de-serialized without
    errors, it will mess up the replication and gets never fixed. This
    happens for example if one of the keys is a NULL byte.

    This patch checks if the dict keys in hashes.pkl are valid strings and
    invalidates it if not.

    Closes-Bug: 1830881
    Change-Id: I84b062d062ff49935feed0aee3e1963bb72eb5ea
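
The check described in the commit message can be sketched roughly as follows. This is not the merged patch. It assumes that valid keys are either three-character lowercase hex suffixes or the bookkeeping keys `valid` and `updated`, and that a file with any other key is treated as invalid as a whole. The names `looks_like_suffix`, `sanitize_hashes` and `RESERVED_KEYS` are hypothetical.

```python
RESERVED_KEYS = {"valid", "updated"}  # assumed bookkeeping keys


def looks_like_suffix(key):
    """True for a three-character lowercase hex string such as 'a3f'."""
    return (isinstance(key, str) and len(key) == 3
            and all(c in "0123456789abcdef" for c in key))


def sanitize_hashes(hashes):
    """Return hashes unchanged if every key is plausible, else invalidate."""
    if all(k in RESERVED_KEYS or looks_like_suffix(k) for k in hashes):
        return hashes
    # Any unexpected key marks the whole structure as needing a rebuild.
    return {"valid": False}
```

With this shape, the dict from the reproducer, `{'\x00\x00\x00': False}`, comes back as `{'valid': False}`, so a caller can rebuild the hashes instead of failing on every pass.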

Changed in swift:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/losf)

Fix proposed to branch: feature/losf
Review: https://review.opendev.org/668007

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/losf)

Reviewed: https://review.opendev.org/668007
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=aca7474ca3c7bba54473fca35b6d88a3a1efdca4
Submitter: Zuul
Branch: feature/losf

commit 9d1b7497400e8c3a7159b11dd8455c55a31db985
Author: Tim Burke <email address hidden>
Date: Tue Mar 26 13:02:24 2019 -0700

    py3: port staticweb and domain_remap func tests

    Drive-by: Tighten domain_remap assertions on listings, which required
    that we fix proxy pipeline placement. Add a note about it to the sample
    config.

    Change-Id: I41835148051294088a2c0fb4ed4e7a7b61273e5f

commit 38a24571ad5c192bacaf60a5634ea66164dbbb71
Author: Tim Burke <email address hidden>
Date: Wed Jul 10 09:13:44 2019 -0700

    functests: make container creation less flakey in test_object

    Change-Id: If62d82beb202dea553776920a95c177518b162ab

commit 4c4bd778ea8fe8d02a2892524c7918da0ca25ea9
Author: Tim Burke <email address hidden>
Date: Wed Jul 3 09:52:41 2019 -0700

    container-replicator: Add a timeout for get_shard_ranges

    Previously this had no timeout, which meant that the replicator might
    hang and fail to make progress indefinitely while trying to receive
    shard ranges.

    While we're at it, only call get_shard_ranges when the remote indicates
    that it has shard ranges for us to sync -- this reduces the number of
    requests necessary to bring unsharded replicas in sync.

    Change-Id: I32f51f42d76db38271442a261600089404a00f91
    Closes-Bug: #1835260

commit 345f577ff191ee01c3cb4626805338028815c2b4
Author: Tim Burke <email address hidden>
Date: Wed Jul 3 07:28:14 2019 -0700

    s3token: fix conf option name

    Related-Change: Ica740c28b47aa3f3b38dbfed4a7f5662ec46c2c4
    Change-Id: I71f411a2e99fa8259b86f11ed29d1b816ff469cb

commit 76fde89261e1940daadb720c41df1a3595314a97
Author: Tim Burke <email address hidden>
Date: Mon Jun 17 09:25:52 2019 -0700

    py3: Be able to read and write non-ASCII headers

    Apparently Python's stdlib got more picky about what a header should
    look like. As a result, if an account, container, or object had a
    non-ASCII metadata name (values were fine), the proxy-server wouldn't
    parse all of the headers. See https://bugs.python.org/issue37093 for
    more information.

    This presented several problems:
    - Since the non-ASCII header aborts parsing, we may lose important
      HTTP-level information like Content-Length or Transfer-Encoding.
    - Since the offending header wouldn't get parsed, the client wouldn't
      even know what the problem was.
    - Even if the client knew what the bad header was, it would have no way
      to clear it, as the server uses the same logic to parse incoming
      requests.

    So, hack in our own header parsing if we detect that parsing was
    aborted. Note that we also have to mangle bufferedhttp's putheader so we
    can get non-ASCII headers to the backend servers.

    Now, we can run the test_unicode_metadata tests in
    test/functional/test_account.py and test/functional/test_container.py
    under py2 against services running under py3.

    Change-Id: I0f03c211f35a9a49e047a571...

tags: added: in-feature-losf
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.22.0

This issue was fixed in the openstack/swift 2.22.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/675434

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/stein)

Reviewed: https://review.opendev.org/675434
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=46da6c1c01bc8beaf9a2b0a46122376b1bc7441f
Submitter: Zuul
Branch: stable/stein

commit 46da6c1c01bc8beaf9a2b0a46122376b1bc7441f
Author: Christian Schwede <email address hidden>
Date: Wed May 29 11:37:54 2019 +0200

    Remove invalid dict entries from hashes.pkl

    If the data in a hashes.pkl is corrupted but still de-serialized without
    errors, it will mess up the replication and gets never fixed. This
    happens for example if one of the keys is a NULL byte.

    This patch checks if the dict keys in hashes.pkl are valid strings and
    invalidates it if not.

    Closes-Bug: 1830881
    Change-Id: I84b062d062ff49935feed0aee3e1963bb72eb5ea
    (cherry picked from commit c9e78d15e1e05f23facf7c28758b442bb25bde68)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/690737

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/rocky)

Reviewed: https://review.opendev.org/690737
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=b90c87158df46513489f64e34224de7424859874
Submitter: Zuul
Branch: stable/rocky

commit b90c87158df46513489f64e34224de7424859874
Author: Christian Schwede <email address hidden>
Date: Wed May 29 11:37:54 2019 +0200

    Remove invalid dict entries from hashes.pkl

    If the data in a hashes.pkl is corrupted but still de-serialized without
    errors, it will mess up the replication and gets never fixed. This
    happens for example if one of the keys is a NULL byte.

    This patch checks if the dict keys in hashes.pkl are valid strings and
    invalidates it if not.

    Closes-Bug: 1830881
    Change-Id: I84b062d062ff49935feed0aee3e1963bb72eb5ea
    (cherry picked from commit c9e78d15e1e05f23facf7c28758b442bb25bde68)
    (cherry picked from commit 46da6c1c01bc8beaf9a2b0a46122376b1bc7441f)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.19.2

This issue was fixed in the openstack/swift 2.19.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.21.1

This issue was fixed in the openstack/swift 2.21.1 release.
