crc32 implementation is schizophrenic

Bug #1666320 reported by Tim Burke
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
liberasurecode
Fix Released
Undecided
Tim Burke

Bug Description

There are two implementations of the crc used for fragment checksums; one [1] seems optimized for SSE4, with the other [2] as a fallback implementation.

Bad news: Despite comments declaring that these are using the same polynomials [3], they're not! The first implements CRC32C [4], while the table-based implementation seems to be based off of zlib [5].

Good news: That first implementation doesn't seem to actually get used. While we may set INTEL_SSE41 or INTEL_SSE42 if CPU seems to support it [6], we never seem to set INTEL_SSE4.

Bad news: While the second implementation is close to zlib's, it isn't actually the same -- zlib uses an unsigned long to hold the crc [7], while we use an int. There are two problems:

1. An int is only guaranteed to have at least 16 bits. This probably isn't a problem; most modern CPUs use at least 32 bits for an int.
2. A signed int behaves differently when we left-shift; instead of always filling with zeros on the left, it will fill with zeros when positive (or zero) and ones when negative.

Good (?) news: we don't (necessarily) use that second implementation, either. If libz.so is loaded before liberasurecode.so, we use libz's implementation.

[1] https://github.com/openstack/liberasurecode/blob/1.4.0/src/utils/chksum/crc32.c#L46-L66
[2] https://github.com/openstack/liberasurecode/blob/1.4.0/src/utils/chksum/crc32.c#L69-L130
[3] https://github.com/openstack/liberasurecode/blob/1.4.0/src/utils/chksum/crc32.c#L115-L116
[4] https://en.wikipedia.org/wiki/SSE4#SSE4.2
[5] https://github.com/madler/zlib/blob/v1.2.11/crc32.h#L8-L59
[6] https://github.com/openstack/liberasurecode/blob/1.4.0/get_flags_from_cpuid.c#L87-L92
[7] https://github.com/madler/zlib/blob/master/crc32.c#L238
[8] https://github.com/openstack/liberasurecode/blob/1.4.0/src/utils/chksum/crc32.c#L119

Revision history for this message
Tim Burke (1-tim-z) wrote :

This script may be useful to check whether fragment headers are using zlib CRCs or not.

Usage: python check_crc.py fragment_file [fragment_file2 [...]]

Tim Burke (1-tim-z)
description: updated
Revision history for this message
Tim Burke (1-tim-z) wrote :
Changed in liberasurecode:
status: New → In Progress
assignee: nobody → Tim Burke (1-tim-z)
Revision history for this message
clayg (clay-gerrard) wrote :

On my "bad" frags it doesn't look like there's any checksum information at all:

00000000 03 00 00 00 f5 07 00 00 00 00 00 00 de 17 00 00 |....?.......?...|
00000010 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 04 00 0d 02 00 cc 5e 0c 0b 09 |...........?^...|
00000040 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 4a 78 20 53 2d 65 25 78 2d 60 09 3b 27 72 24 51 |Jx S-e%x-`.;'r$Q|

compared to

00000000 02 00 00 00 00 00 08 00 00 00 00 00 00 00 10 00 |................|
00000010 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 04 00 0d 02 00 cc 5e 0c 0b 00 |............^...|
00000040 04 01 00 3e 17 51 ec 00 00 00 00 00 00 00 00 00 |...>.Q..........|
00000050 cb fd 36 8e 4a d7 1e f8 3d 9a 39 8c 22 90 25 8e |..6.J...=.9.".%.|

so on my bad node it's not possible no matter what the LD_PRELOAD to get the checksum to come back valid...

What's weird is that depending on the LD_PRELOAD the check_crc gets a different computed meta_crc:

ubuntu@saio:/vagrant/.scratch$ LD_PRELOAD=/usr/local/lib/liberasurecode.so.1.4.0 ./check_crc.py /srv/node3/sdb3/objects-1/589/c2b/93464008f434853ad9b74c8c5381ac2b/1498598728.24878#2#d.data
3985286910 3964737342 '>\x17Q\xec'
checksum mismatch!
ubuntu@saio:/vagrant/.scratch$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libz.so ./check_crc.py /srv/node3/sdb3/objects-1/589/c2b/93464008f434853ad9b74c8c5381ac2b/1498598728.24878#2#d.data
3964737342 3964737342 '>\x17Q\xec'
first frag checksum OK

^ different computed value matches stable recorded version depending on preload

root@swift-test-02:~# LD_PRELOAD=/usr/lib/liberasurecode.so.1.4.0 /home/swiftstack/check_crc.py /srv/node/d11/objects-2/398/465/6381800fa3e800cd36d6ea5bf4fd6465/1458349982.69822#3.data
283779307 0 '\x00\x00\x00\x00'
checksum mismatch!
root@swift-test-02:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libz.so /home/swiftstack/check_crc.py /srv/node/d11/objects-2/398/465/6381800fa3e800cd36d6ea5bf4fd6465/1458349982.69822#3.data
2055872539 0 '\x00\x00\x00\x00'
checksum mismatch!

^ different computed value never matches stable (empty) recorded version regardless of preload

Revision history for this message
clayg (clay-gerrard) wrote :

I wonder how "chksum_type" is determined?

root@swift-test-02:~# cat audit_frag.py
#!/usr/bin/env python
import sys
from swift.common.storage_policy import POLICIES

policy = POLICIES[2]

with open(sys.argv[1]) as f:
    for frag_data in iter(lambda: f.read(policy.fragment_size), ''):
        print '%r' % policy.pyeclib_driver.get_metadata(frag_data, formatted=True)
root@swift-test-02:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libz.so python audit_frag.py /srv/node/d11/objects-2/398/465/6381800fa3e800cd36d6ea5bf4fd6465/1458349982.69822#3.data
{'index': 3, 'chksum_mismatch': 0, 'backend_id': 'isa_l_rs_vand', 'orig_data_size': 6110L, 'chksum_type': 'none', 'backend_version': 140376711236864, 'chksum': '', 'size': 2037}
root@swift-test-02:~# LD_PRELOAD=/usr/lib/liberasurecode.so.1.4.0 python audit_frag.py /srv/node/d11/objects-2/398/465/6381800fa3e800cd36d6ea5bf4fd6465/1458349982.69822#3.data
{'index': 3, 'chksum_mismatch': 0, 'backend_id': 'isa_l_rs_vand', 'orig_data_size': 6110L, 'chksum_type': 'none', 'backend_version': 139775415815424, 'chksum': '', 'size': 2037}

ubuntu@saio:/vagrant/.scratch$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libz.so python audit_frag.py /srv/node3/sdb3/objects-1/589/c2b/93464008f434853ad9b74c8c5381ac2b/1498598728.24878#2#d.data
{'index': 2, 'chksum_mismatch': 0, 'backend_id': 'isa_l_rs_vand', 'orig_data_size': 1048576L, 'chksum_type': 'none', 'backend_version': 134400, 'chksum': '', 'size': 524288}
{'index': 2, 'chksum_mismatch': 0, 'backend_id': 'isa_l_rs_vand', 'orig_data_size': 1048576L, 'chksum_type': 'none', 'backend_version': 134400, 'chksum': '', 'size': 524288}
{'index': 2, 'chksum_mismatch': 0, 'backend_id': 'isa_l_rs_vand', 'orig_data_size': 5473L, 'chksum_type': 'none', 'backend_version': 134400, 'chksum': '', 'size': 2737}
ubuntu@saio:/vagrant/.scratch$ LD_PRELOAD=/usr/local/lib/liberasurecode.so.1.4.0 python audit_frag.py /srv/node3/sdb3/objects-1/589/c2b/93464008f434853ad9b74c8c5381ac2b/1498598728.24878#2#d.data
Traceback (most recent call last):
  File "audit_frag.py", line 9, in <module>
    print policy.pyeclib_driver.get_metadata(frag_data, formatted=True)
  File "/vagrant/pyeclib/pyeclib/ec_iface.py", line 335, in get_metadata
    return self.ec_lib_reference.get_metadata(fragment, formatted)
  File "/vagrant/pyeclib/pyeclib/core.py", line 127, in get_metadata
    formatted)
pyeclib.ec_iface.ECInvalidFragmentMetadata: pyeclib_c_get_metadata ERROR: Fragment integrity check failed. Please inspect syslog for liberasurecode error report.

Revision history for this message
Kevin Greenan (kmgreen2) wrote : Re: [Bug 1666320] Re: crc32 implementation is schizophrenic
Download full text (4.9 KiB)

Hey Dude,

I'll have a little ok this afternoon.

-kevin

Sent from my iPhone

> On Jun 27, 2017, at 8:09 PM, clayg <email address hidden> wrote:
>
> I wonder how "chksum_type" is determined?
>
> root@swift-test-02:~# cat audit_frag.py
> #!/usr/bin/env python
> import sys
> from swift.common.storage_policy import POLICIES
>
> policy = POLICIES[2]
>
> with open(sys.argv[1]) as f:
> for frag_data in iter(lambda: f.read(policy.fragment_size), ''):
> print '%r' % policy.pyeclib_driver.get_metadata(frag_data, formatted=True)
> root@swift-test-02:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libz.so python audit_frag.py /srv/node/d11/objects-2/398/465/6381800fa3e800cd36d6ea5bf4fd6465/1458349982.69822#3.data
> {'index': 3, 'chksum_mismatch': 0, 'backend_id': 'isa_l_rs_vand', 'orig_data_size': 6110L, 'chksum_type': 'none', 'backend_version': 140376711236864, 'chksum': '', 'size': 2037}
> root@swift-test-02:~# LD_PRELOAD=/usr/lib/liberasurecode.so.1.4.0 python audit_frag.py /srv/node/d11/objects-2/398/465/6381800fa3e800cd36d6ea5bf4fd6465/1458349982.69822#3.data
> {'index': 3, 'chksum_mismatch': 0, 'backend_id': 'isa_l_rs_vand', 'orig_data_size': 6110L, 'chksum_type': 'none', 'backend_version': 139775415815424, 'chksum': '', 'size': 2037}
>
>
> ubuntu@saio:/vagrant/.scratch$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libz.so python audit_frag.py /srv/node3/sdb3/objects-1/589/c2b/93464008f434853ad9b74c8c5381ac2b/1498598728.24878#2#d.data
> {'index': 2, 'chksum_mismatch': 0, 'backend_id': 'isa_l_rs_vand', 'orig_data_size': 1048576L, 'chksum_type': 'none', 'backend_version': 134400, 'chksum': '', 'size': 524288}
> {'index': 2, 'chksum_mismatch': 0, 'backend_id': 'isa_l_rs_vand', 'orig_data_size': 1048576L, 'chksum_type': 'none', 'backend_version': 134400, 'chksum': '', 'size': 524288}
> {'index': 2, 'chksum_mismatch': 0, 'backend_id': 'isa_l_rs_vand', 'orig_data_size': 5473L, 'chksum_type': 'none', 'backend_version': 134400, 'chksum': '', 'size': 2737}
> ubuntu@saio:/vagrant/.scratch$ LD_PRELOAD=/usr/local/lib/liberasurecode.so.1.4.0 python audit_frag.py /srv/node3/sdb3/objects-1/589/c2b/93464008f434853ad9b74c8c5381ac2b/1498598728.24878#2#d.data
> Traceback (most recent call last):
> File "audit_frag.py", line 9, in <module>
> print policy.pyeclib_driver.get_metadata(frag_data, formatted=True)
> File "/vagrant/pyeclib/pyeclib/ec_iface.py", line 335, in get_metadata
> return self.ec_lib_reference.get_metadata(fragment, formatted)
> File "/vagrant/pyeclib/pyeclib/core.py", line 127, in get_metadata
> formatted)
> pyeclib.ec_iface.ECInvalidFragmentMetadata: pyeclib_c_get_metadata ERROR: Fragment integrity check failed. Please inspect syslog for liberasurecode error report.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1666320
>
> Title:
> crc32 implementation is schizophrenic
>
> Status in liberasurecode:
> In Progress
>
> Bug description:
> There are two implementations of the crc used for fragment checksums;
> one [1] seems optimized for SSE4, with the other [2] as a fallback
> implementation.
>
> Bad news: Despite comments declaring that these a...

Read more...

Revision history for this message
Tim Burke (1-tim-z) wrote :

Clay, my check_crc.py script didn't fully capture is_invalid_fragment_header; in particular, it didn't include the version check [1], which would have caused your first frag (which seems to have been written by libec 1.0.9) to skip the checksumming. My understanding is that pre-1.2.0 libec would always store 0 for the checksum, which matches your observations.

chksum_type applies to *fragment data* checksumming; it's only relevant insofar as it would use the same implementation of crc32 as the metadata checksumming. It defaults to 1 (no checksum is stored) and though pyeclib exposes a way to change it [2], Swift never uses it [3]. Instead, Swift relies on its existing MD5-based integrity checks.

[1] https://github.com/openstack/liberasurecode/blob/1.5.0/src/erasurecode.c#L1092-L1094
[2] https://github.com/openstack/pyeclib/blob/1.5.0/pyeclib/ec_iface.py#L186-L192
[3] https://github.com/openstack/swift/blob/2.14.0/swift/common/storage_policy.py#L482-L484

Revision history for this message
Tim Burke (1-tim-z) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to liberasurecode (master)

Fix proposed to branch: master
Review: https://review.openstack.org/480776

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to liberasurecode (master)

Reviewed: https://review.openstack.org/480776
Committed: https://git.openstack.org/cgit/openstack/liberasurecode/commit/?id=9b4d8bcf8dc97d7edad3dc1443b317ecb5a0a254
Submitter: Jenkins
Branch: master

commit 9b4d8bcf8dc97d7edad3dc1443b317ecb5a0a254
Author: Tim Burke <email address hidden>
Date: Thu Jul 6 00:08:45 2017 +0000

    Stop pretending to support SSE4-optimized CRC-32C

    It isn't the CRC we want, and we never used it anyway. While we may
    define INTEL_SSE41 or INTEL_SSE42 if CPU seems to support it, we've
    never defined INTEL_SSE4.

    Change-Id: I04e1dd6458ccde58a0a2f3f4d6947569a31e9697
    Partial-Bug: 1666320

Changed in liberasurecode:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/459023
Committed: https://git.openstack.org/cgit/openstack/liberasurecode/commit/?id=a9b20ae6a38073afe91ae2b7d789ddfb7dabade8
Submitter: Jenkins
Branch: master

commit a9b20ae6a38073afe91ae2b7d789ddfb7dabade8
Author: Tim Burke <email address hidden>
Date: Thu Jul 6 00:21:01 2017 +0000

    Use zlib for CRC-32

    Previously, we had our own CRC that was almost but not quite like
    zlib's implementation. However,

    * it hasn't been subjected to the same rigor with regard to error-detection
      properties and
    * it may not even get used, depending upon whether zlib happens to get
      loaded before or after liberasurecode.

    Now, we'll use zlib's CRC-32 when writing new frags, while still
    tolerating frags that were created with the old implementation.

    Change-Id: Ib5ea2a830c7c23d66bf2ca404a3eb84ad00c5bc5
    Closes-Bug: 1666320

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to liberasurecode (master)

Reviewed: https://review.openstack.org/636556
Committed: https://git.openstack.org/cgit/openstack/liberasurecode/commit/?id=7e97b2f808e1115425eef527899b27528f05af3a
Submitter: Zuul
Branch: master

commit 7e97b2f808e1115425eef527899b27528f05af3a
Author: Pete Zaitcev <email address hidden>
Date: Tue Feb 12 22:08:26 2019 -0600

    Make our alt crc32 more portable

    Apparently the author of our old crc32 assumed that shifting an int
    to the right sign-extends, which is not always the case. Result is,
    building and running make test on s390x fails. The fix is to force
    a sign-extension using the "xor 0x80; sub 0x80" trick.

    N.B. This does not cause a compatibility problem, because by a
    miracle the "broken" crc32_alt was actually computing a stock
    crc32, same that zlib has. Therefore, if someone, somewhere,
    ran a Swift cluster on s390x with erasure coding policy,
    the data in it is already stored with zlib checksums, as we
    do it now anyway. This fix only permits the tests pass, which
    used the bad data sample from x86.

    Change-Id: Ibd5e4e6c02be00540a9648cc7e0f8efda275bf3f
    Related-Change: Ib5ea2a830c7c23d66bf2ca404a3eb84ad00c5bc5
    Related-Bug: 1666320

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to liberasurecode (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/738959

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to liberasurecode (master)

Reviewed: https://review.opendev.org/738959
Committed: https://git.openstack.org/cgit/openstack/liberasurecode/commit/?id=5626cd5791bd35279594e067c10581e89ae66abc
Submitter: Zuul
Branch: master

commit 5626cd5791bd35279594e067c10581e89ae66abc
Author: Tim Burke <email address hidden>
Date: Wed Jul 1 21:59:45 2020 -0700

    Be willing to write fragments with legacy crc

    ...if users *really* want to. They opt-in at run time by setting

        LIBERASURECODE_WRITE_LEGACY_CRC=1

    in the environment; leaving it unset, set to an empty string, or set to
    the string "0" continues to write zlib crcs.

    UpgradeImpact
    =============
    This option is intended to allow a smooth upgrade from liberasurecode
    1.5.0 and earlier in a system with multiple readers and writers:

      * Before upgrade, ensure the environment variable is set on all nodes.
        This will be ignored by earlier versions.
      * Upgrade liberasurecode on each node in the system, restarting any
        services that use it. Every node continues writing CRCs that are
        still usable by nodes that have not yet upgraded.
      * Now that every node is capable of reading zlib CRCs, remove the
        environment variable from each node to start writing new CRCs.

    If you are already using 1.6.0 or later, just upgrade normally.

    Closes-Bug: #1886088
    Closes-Bug: #1867937
    Related-Bug: #1666320
    Needed-By: https://review.opendev.org/#/c/739164/
    Change-Id: I9adfbe631a2dddc592fd08f8a325f3e8331b92f1

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.