[21.10 FEAT] zlib CRC32 optimization for s390x

Bug #1932010 reported by bugproxy
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
High
Skipper Bug Screeners
zlib (Ubuntu)
Fix Released
High
Simon Chopin

Bug Description

Use SIMD instructions to accelerate the zlib CRC32 implementation.
Business value: Performance improvement in database area

The zlib CRC32 optimization for IBM Z is currently being discussed for inclusion into zlib-ng:
https://github.com/zlib-ng/zlib-ng/pull/912

The patch for zlib will be based on that.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-192731 severity-high targetmilestone-inin2110
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → zlib (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in zlib (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Foundations Team (canonical-foundations)
Changed in ubuntu-z-systems:
importance: Undecided → High
status: New → Triaged
Steve Langasek (vorlon)
tags: added: fr-1445
Revision history for this message
Simon Chopin (schopin) wrote :

Here's a debdiff containing a backported version of the feature. There is also a small refactoring as it allowed me to easily integrate the new feature in the build process.

Changed in zlib (Ubuntu):
assignee: Canonical Foundations Team (canonical-foundations) → Simon Chopin (schopin)
status: New → In Progress
information type: Private → Public
Revision history for this message
Simon Chopin (schopin) wrote :
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-08-10 17:45 EDT-------
*** Bug 193835 has been marked as a duplicate of this bug. ***

Revision history for this message
bugproxy (bugproxy) wrote : patch

------- Comment on attachment From <email address hidden> 2021-08-10 17:47 EDT-------

Argh, I haven't noticed this ticket and made a backport on my own a week ago. I'm attaching my patch here just for reference and will give yours a try tomorrow.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

At a very quick glance, Simon's patch looks a bit closer to the pull request than the one in comment #4, so if you could test that soon that would be good.

Simon, aside from debian/patches/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch which I'm not sure I'm competent to review beyond seeing how it compares to the pull request, your patch looks mostly fine but it seems you've duplicated the definition of CONFIGURE_* in debian/rules? You probably don't want to do that?

Revision history for this message
Simon Chopin (schopin) wrote :

Indeed, nice catch! The _32 and _64 variants weren't used anymore, either.

Updated debdiff attached, and a ~ppa2 version has been uploaded to my PPA.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-08-11 06:59 EDT-------
zlib1g_1.2.11.dfsg-2ubuntu7~ppa2_s390x.deb works fine, thanks!

# time -p env DFLTCC=0 python3 -c 'import gzip; gzip.decompress(gzip.compress(b"\x00" * 500000000, 1))'
real 2.59
user 2.23
sys 0.35

# dpkg -i zlib1g_1.2.11.dfsg-2ubuntu7~ppa2_s390x.deb

# time -p env DFLTCC=0 python3 -c 'import gzip; gzip.decompress(gzip.compress(b"\x00" * 500000000, 1))'
real 1.99
user 1.62
sys 0.36

However, I'm not quite sure why does

++#ifdef S390_CRC32_VX
++ if (getauxval(AT_HWCAP) & HWCAP_S390_VX)
++ return s390_crc32_vx;
++#endif

work. For me getauxval() causes a segfault when called from an ifunc, that's why I have to use the s390-specific ifunc's hwcap parameter.

Revision history for this message
Simon Chopin (schopin) wrote : Re: [Bug 1932010] Comment bridged from LTC Bugzilla

> However, I'm not quite sure why does
>
> ++#ifdef S390_CRC32_VX
> ++ if (getauxval(AT_HWCAP) & HWCAP_S390_VX)
> ++ return s390_crc32_vx;
> ++#endif
>
> work. For me getauxval() causes a segfault when called from an ifunc,
> that's why I have to use the s390-specific ifunc's hwcap parameter.

Perhaps a bug in your compiler version?

In any case, I believe we can consider the current patch as working as
intended, despite this mystery.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-08-11 10:36 EDT-------
I think it rather has to do with glibc - ifunc are normally called quite early and therefore cannot use the dynamically resolved symbols, but maybe newer glibcs tolerate that. Unfortunately I couldn't find more information on this subject. The code seems to work, but just to be on the safe side I'd rather switch to the "old way" of doing things, that involves the hwcap argument:

https://bugzilla.linux.ibm.com/attachment.cgi?id=150618&action=diff#a/crc32.c_sec1
https://bugzilla.linux.ibm.com/attachment.cgi?id=150618&action=diff#a/crc32.c_sec3

Revision history for this message
Simon Chopin (schopin) wrote :

I don't have access to your bugzilla, so I'm assuming you pointed to your own patch and its crc32.c section where you pass hwcap by argument and call it earlier in the stack? If so, I've implemented this approach in this new debdiff, and the PPA has been updated as well :-).

Michael, if you're wondering why the s390_crc32_vx has migrated to crc32.c, it's because the GLibc documentation mentions something about ifunc only working properly if the deported functions are in the same translation unit.

In the end, my version of the patch doesn't look that different to Ilya's ;-)

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-08-11 14:45 EDT-------
Oh, sorry, I forgot that you can't use the IBM BZ links. I indeed had the following diffs in mind:

1.
#ifndef Z_IFUNC_ASM
local
#endif
-unsigned long (*(crc32_z_ifunc(void)))(unsigned long, const unsigned char FAR *, z_size_t)
+unsigned long (*(crc32_z_ifunc(
+#ifdef __s390__
+unsigned long hwcap
+#else
+void
+#endif
+)))(unsigned long, const unsigned char FAR *, z_size_t)
{
#if _ARCH_PWR8==1
#if defined(__BUILTIN_CPU_SUPPORTS__)

2.
#endif
#endif /* _ARCH_PWR8 */

+#ifdef HAVE_S390X_VX
+ if (hwcap & HWCAP_S390_VX)
+ return s390_crc32_vx;
+#endif
+
/* return a function pointer for optimized arches here */

3.
static unsigned long ZEXPORT (*crc32_func)(unsigned long, const unsigned char FAR *, z_size_t) = NULL;

if (!crc32_func)
- crc32_func = crc32_z_ifunc();
+ crc32_func = crc32_z_ifunc(
+#ifdef __s390__
+ getauxval(AT_HWCAP)
+#endif
+ );
return (*crc32_func)(crc, buf, len);
}

I think the latest debdiff still misses the second one (the getauxval() call is still there):

++#if defined(__s390__) && defined(S390_CRC32_VX)
++ if (getauxval(AT_HWCAP) & HWCAP_S390_VX)
++ return s390_crc32_vx;
++#endif

A couple nits:
- changelog: it's dfltcc, not dltss (yeah, the name is super weird :-))
- "/* Work around a probable bug in glibc when invoking getauxval in the ifunc */" - I don't think it's actually a bug, since https://sourceware.org/glibc/wiki/GNU_IFUNC says that:

When LD_BIND_NOW=1 or -Wl,z,now is in effect symbols must be immediately resolved at startup. In cases where an external function call depends needs to be made that may fail if such a call has not been initialized yet (PLT-based relocation which is processed later). For example calling strlen in an IFUNC resolver built with -Wl,z,now may lead to a segfault because the PLT is not yet resolved. This may work on x86_64 where the R_*_IRELATIVE relocations happen in DT_JMPREL after the DT_REL* relocations, but that is no guarantee that it will work on AArch64, PPC64, or other architectures that are slightly different. Such fundamental limitations may be lifted at a future point.

What's surprising is rather that, given this paragraph, the code nevertheless works and there doesn't seem to be an obvious glibc commit that lifts this limitation.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I made the changes iii suggested and uploaded.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-08-12 11:39 EDT-------
Thanks, everything looks good now!

# time -p env DFLTCC=0 python3 -c 'import gzip; gzip.decompress(gzip.compress(b"\x00" * 500000000, 1))'
real 2.56
user 2.23
sys 0.33

# sha1sum zlib1g_1.2.11.dfsg-2ubuntu7_s390x.deb
56f20f19ee81e025b6ee26efb803f442aeb2819a zlib1g_1.2.11.dfsg-2ubuntu7_s390x.deb
# dpkg -i zlib1g_1.2.11.dfsg-2ubuntu7_s390x.deb

# time -p env DFLTCC=0 python3 -c 'import gzip; gzip.decompress(gzip.compress(b"\x00" * 500000000, 1))'
real 1.95
user 1.62
sys 0.32

Mathew Hodson (mhodson)
Changed in zlib (Ubuntu):
importance: Undecided → High
Revision history for this message
Frank Heimes (fheimes) wrote :

Since the updated package version arrived in impish-proposed:
 zlib | 1:1.2.11.dfsg-2ubuntu7 | impish-proposed | source
I'm updating the status to Fix Committed.

Changed in zlib (Ubuntu):
status: In Progress → Fix Committed
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zlib - 1:1.2.11.dfsg-2ubuntu7

---------------
zlib (1:1.2.11.dfsg-2ubuntu7) impish; urgency=medium

  [ Simon Chopin ]
  * d/rules: use configure options for dfltcc instead of hardcoding
    the CFLAGS
  * d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch
    ported from zlib-ng #912, adding a vectorized implementation
    of CRC32 on s390x architectures based on kernel code. LP: #1932010

  [ Michael Hudson-Doyle ]
  * d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch: adjust to
    not make a PLT call in an ifunc on s390/s390x.

 -- Simon Chopin <email address hidden> Thu, 12 Aug 2021 15:45:49 +1200

Changed in zlib (Ubuntu):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-10-25 05:29 EDT-------
Fix is part of impish U21.10, hence closing the bug.
IBM BZ status change:->CLOSED

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments