libatlas not using vector instructions - large performance impact

Bug #1803077 reported by bugproxy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ubuntu on IBM z Systems | Fix Released | High | Canonical Foundations Team |
atlas (Ubuntu) | Fix Released | Undecided | Dimitri John Ledkov |

Bug Description

The libatlas library delivered with Ubuntu 18.04.1 is built for zEC12. There is no alternative library available for z13 and z14 that exploits the vector instructions. The source package from Ubuntu seems to have the z13 patches applied.

---uname output---
Linux m42lp10 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:42:24 UTC 2018 s390x s390x s390x GNU/Linux

---Additional Hardware Info---
standard Ubuntu install

Machine Type = z13, z14

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Install libatlas, call any of the standard blas routines, observe that there are no Z instructions.
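
One way to carry out the "observe" step is to disassemble the library and count vector mnemonics. The sketch below is only an illustration: it assumes the output of `objdump -d` has already been captured as text, that instruction lines use the tab-separated layout "address, bytes, mnemonic, operands", and that a small hand-picked sample of z13 vector mnemonics is enough to spot vectorized kernels. The function name and the mnemonic set are hypothetical choices, not part of any tool.

```python
from collections import Counter

# Assumption: a small illustrative sample of z13 vector-facility mnemonics,
# far from the complete instruction set.
VECTOR_MNEMONICS = {"vl", "vst", "vfadb", "vfmdb", "vfmadb"}


def count_vector_instructions(disassembly: str) -> Counter:
    """Count the sampled vector mnemonics in `objdump -d` style text."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Instruction lines are assumed to look like:
        #   "  addr:<TAB>raw bytes<TAB>mnemonic<TAB>operands"
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        words = fields[2].split()
        if words and words[0] in VECTOR_MNEMONICS:
            counts[words[0]] += 1
    return counts
```

An empty counter for the BLAS kernels would be consistent with the behaviour described in this report.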

Userspace tool common name: libatlas

The userspace tool has the following bit modes: 64 bit

Userspace deb: atlas package in Ubuntu

Userspace tool obtained from project website: na

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-173087 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy)
tags: added: targetmilestone-inin1804
removed: targetmilestone-inin---
Frank Heimes (fheimes)
affects: linux (Ubuntu) → atlas (Ubuntu)
Frank Heimes (fheimes)
tags: added: universe
Changed in ubuntu-z-systems:
importance: Undecided → High
Frank Heimes (fheimes) wrote :

Canonical focuses on having a single library build for each architecture in its archive, containing all possible optimizations (LP 1702917, #4).
If a separate z13 optimized library is desired it needs to be placed in a PPA - for example.
I think that optimizations should ideally be addressed in the upstream code, for example with (#ifndef) approaches like HW_CAPS or with S390_ALTERNATIVE macros.
Any thoughts and opinions?

Changed in ubuntu-z-systems:
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Changed in atlas (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Dimitri John Ledkov (xnox)
Dimitri John Ledkov (xnox) wrote :

Hi, I'm looking through the atlas code. In 2017, I did contribute changes to packaging to compile atlas with the ARCHS set to z12 on ubuntu (up from z9, and provided the matching archdefs).

Looking at the upstream code, I do not see any z13- or z14-specific code in atlas. Is there any? Do we need to upgrade atlas?

Is this a request to simply compile the library with `-march=zEC12 -mtune=z14` options?

Ideally, we would want to avoid providing three separate builds, one each for -march=zEC12, -march=z13 and -march=z14.

Ideally, we would have a single `fat` binary that does runtime detection and uses the best vectorization available, either through explicit code changes or through automatic compiler function multi-versioning. Many other libraries, on s390x and other architectures, do runtime detection to exploit NEON, Altivec, AVX2, etc., rather than provide separate builds.
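
To make the runtime-detection idea concrete, here is a minimal sketch of the selection logic only, written in Python for brevity. It assumes the s390x `features` line of /proc/cpuinfo lists `vx` on z13 and later and `vxe` on z14 and later; the level names, the `LEVELS` ordering and both function names are hypothetical. A real library would do this once at load time in C, not by parsing text.

```python
LEVELS = ["zEC12", "z13", "z14"]  # oldest to newest (illustrative ordering)


def detect_level(cpuinfo_text: str) -> str:
    """Derive a build level from the 'features' line of /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "features":
            features = set(value.split())
            break
    # Assumed mapping: 'vxe' implies z14 or newer, 'vx' implies z13 or newer.
    if "vxe" in features:
        return "z14"
    if "vx" in features:
        return "z13"
    return "zEC12"


def resolve(level: str, implementations: dict):
    """Return the newest implementation that is not newer than `level`."""
    for candidate in reversed(LEVELS[: LEVELS.index(level) + 1]):
        if candidate in implementations:
            return implementations[candidate]
    raise LookupError("no implementation usable at level " + level)
```

The fallback in `resolve` is the point of a fat binary: a z14 machine can still use a z13 kernel when no z14-specific variant of a routine exists.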

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Triaged
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-11-27 03:43 EDT-------
There were some z13 related patches from IBM and modified versions of that got upstream.
e.g. https://sourceforge.net/p/math-atlas/patches/74/

From our experiments, the z13 support does not appear to be functional in current versions of libatlas. We are working on a patch that also includes z14 support. First measurements show improvements of up to 5x compared to the libatlas version from 18.04. There is one major problem left to fix. After that we plan to bring these patches upstream.

We also plan to provide the proper tuning files for z13 and z14. These will allow building libatlas tuned for z13/z14 without actually requiring such a machine during the build. If such tuning files are missing during the build phase, libatlas will attempt a lengthy tuning run. This step alone would take more than a day per machine.

In order to make use of this in a distro I think having separate libraries is the way to go. All the infrastructure is in place for many years. The dynamic loader already checks various lib subdirs depending on the hardware capabilities. Just placing a library version in the proper subdir will do the trick. No runtime overhead. No extra memory occupied.
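
The subdirectory lookup described above can be modelled roughly as follows. This is a deliberately simplified sketch, not the dynamic loader's actual algorithm: it considers only one platform subdirectory plus the default directory and ignores the loader cache and hardware-capability combinations. The function names, the `exists` parameter and the library name `libexample.so.3` are hypothetical.

```python
import os
import posixpath


def candidate_paths(libdir: str, platform: str, soname: str) -> list:
    """Return lookup candidates, most specific first (simplified model)."""
    return [
        posixpath.join(libdir, platform, soname),  # platform-specific build
        posixpath.join(libdir, soname),            # default build
    ]


def pick_library(libdir: str, platform: str, soname: str, exists=os.path.exists):
    """Return the first candidate that exists, or None if there is none."""
    for path in candidate_paths(libdir, platform, soname):
        if exists(path):
            return path
    return None
```

In this model, a machine whose platform subdirectory is missing falls straight back to the default build, which is why shipping an extra tuned copy does not affect older hardware.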

wrt IFUNC:

This is good for function-level optimizations. I.e. if there is a subset of functions in a lib which would benefit from machine optimizations, this is the way to go. We do this already in glibc. But in the case of libatlas, IFUNC would be needed for basically everything in the lib. In the end you would have a lib 3 times the size of the current version, plus one more copy for every new CPU level. Without investing some effort in libatlas to keep the ifunc versions for one CPU level together in one particular area (an ELF section probably), the different versions would end up interleaved in the binary, forcing the entire thing to occupy memory.

But the toughest part would probably be to make libatlas build differently tuned versions of a function. The libatlas tuning is not just about building something with different compiler options. libatlas generates code itself depending on cache characteristics, availability of multiply-and-add, vector instructions, and measurements done during tuning. The build mechanism is currently designed to produce one lib and would require substantial changes to perform the tuning on a per-function basis.
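
As a toy illustration of why such tuning depends on measurements and not just compiler flags, the sketch below times a blocked matrix multiply with several block sizes and keeps the fastest. It is not how libatlas works internally; the real search covers far more parameters and generates code. The function names and the candidate block sizes are hypothetical.

```python
import time


def matmul_blocked(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square blocking."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    row_c, row_a = c[i], a[i]
                    for k in range(kk, min(kk + block, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


def tune_block_size(n, candidates):
    """Time each candidate block size on this machine and return the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        start = time.perf_counter()
        matmul_blocked(a, b, n, block)
        timings[block] = time.perf_counter() - start
    return min(timings, key=timings.get)
```

The winning block size depends on the cache hierarchy of the machine that runs the measurement, which is why results from one CPU generation do not carry over to another and why pre-recorded tuning files matter for distro builds.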

Dimitri John Ledkov (xnox) wrote :

"All the infrastructure is in place for many years." the runtime dirs, and compiling - yes; the debian packaging however will need changes to build library multiple times and ship it in the debs, that is not done.

We do not have a z14, and we would not want to subject our machines to retuning for z13 every time. Plus, I'm not sure it would retune successfully on our builders, as we run builders as overcommitted KVM instances.

Thus we will need proper tuning files for z13 and z14. Ideally for zEC12 as well, for us to integrate tuned builds. And we don't have capacity to create such tuning files due to lack of access to zEC12 or z14 machines.

Dimitri John Ledkov (xnox) wrote :

Note that DEB_BUILD_OPTIONS=custom builds are not suitable to run on our buildds and to ship in the archive for everyone: the package names clash, and the build would be tuned for the buildd rather than for a particular tuning level.

Dimitri John Ledkov (xnox) wrote :

So, I'm expecting you to provide tarballs similar to:

$ ls -latr debian/archdefs/s390x/*.bz2
-rw-r--r-- 1 xnox xnox 7191 Feb 5 15:42 debian/archdefs/s390x/IBMz964.tar.bz2
-rw-r--r-- 1 xnox xnox 7835 Feb 5 15:42 debian/archdefs/s390x/IBMz1264.tar.bz2

but for z13 and z14. Also, please check the IBMz1264.tar.bz2 tarball, in case you can or want to change the performance there.

If there are any additional patches required to cherrypick from upstream, please point them out. And any compiler flags.

For the tuned builds, we can use things like -march=z13 -O3 and so on, if that results in better performance.

For bonus points, you can also work on changing the debian/rules to execute the builds multiple times in separate build directories, but it's probably something quicker for me to hack on, once you have the compiler flags / config options / tuning tarballs done.

Frank Heimes (fheimes) wrote :

Changing to Fix Committed because LP 1814796 got Fix Committed - otherwise this is not reflected in IBM's bugzilla.

Changed in atlas (Ubuntu):
status: New → Fix Released
Changed in ubuntu-z-systems:
status: Triaged → Fix Released