[regression] Wrong EU count for i5-5250U, reducing OpenCL performance

Bug #1800752 reported by Gordon Lack
This bug affects 1 person
Affects            Status         Importance   Assigned to   Milestone
beignet (Ubuntu)   Invalid        Undecided    Unassigned
linux (Ubuntu)     Fix Released   Undecided    Unassigned

Bug Description

I run a BOINC project which uses opencl_intel on my Kubuntu system.

On bionic the tasks ran in 12,000 +/- 200 seconds.
They've been doing that for ~6 months.

I've now updated the system to cosmic, and the tasks are now taking 19,000 +/- 200 seconds.

I have another system (still running Mint 18.3, unchanged) which also runs these tasks, and the timings there have not changed, so the tasks themselves are still equivalent.

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: beignet-opencl-icd 1.3.2-4
ProcVersionSignature: Ubuntu 4.18.0-10.11-generic 4.18.12
Uname: Linux 4.18.0-10-generic x86_64
ApportVersion: 2.20.10-0ubuntu13
Architecture: amd64
CurrentDesktop: KDE
Date: Wed Oct 31 01:17:08 2018
InstallationDate: Installed on 2018-05-09 (174 days ago)
InstallationMedia: Kubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
SourcePackage: beignet
UpgradeStatus: Upgraded to cosmic on 2018-10-26 (4 days ago)
---
ProblemType: Bug
ApportVersion: 2.20.10-0ubuntu13
Architecture: amd64
DistroRelease: Ubuntu 18.10
InstallationDate: Installed on 2018-03-23 (228 days ago)
InstallationMedia: Kubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20180306.1)
Package: linux
PackageArchitecture: amd64
ProcVersionSignature: Ubuntu 4.18.0-10.11-generic 4.18.12
Tags: cosmic
Uname: Linux 4.18.0-10-generic x86_64
UpgradeStatus: Upgraded to cosmic on 2018-10-28 (9 days ago)
UserGroups:

_MarkForUpload: True

Revision history for this message
Gordon Lack (gordon-lack) wrote :
description: updated
Revision history for this message
Rebecca Palmer (rebecca-palmer) wrote :

Was this report sent from the affected system? It says the CPU is an i5-2500K, which is too old for beignet. (It shouldn't crash, but won't be running the computations, so the slowdown is probably from some other package.)

Please post the output of clinfo (from the clinfo package).

Changed in beignet (Ubuntu):
status: New → Incomplete
Revision history for this message
Gordon Lack (gordon-lack) wrote :

>> Was this report sent from the affected system?

No - sorry. I forgot about that....

This is clinfo from the affected system:

Welcome to Ubuntu 18.10 (GNU/Linux 4.18.0-10-generic x86_64)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage

0 packages can be updated.
0 updates are security updates.

Last login: Sun Nov 4 22:49:29 2018 from 192.168.1.10
[nuc]: scp nuc.clinfo gmllaptop:
nuc.clinfo 100% 8371 1.8MB/s 00:00
[nuc]: rm -f nuc.clinfo
[nuc]: exit
logout
Connection to nuc closed.
[gmllaptop]: cat nuc.clinfo
Number of platforms 1
  Platform Name Intel Gen OCL Driver
  Platform Vendor Intel
  Platform Version OpenCL 2.0 beignet 1.3
  Platform Profile FULL_PROFILE
  Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
  Platform Extensions function suffix Intel

  Platform Name Intel Gen OCL Driver
Number of devices 1
  Device Name Intel(R) HD Graphics 6000 BroadWell U-Processor GT3
  Device Vendor Intel
  Device Vendor ID 0x8086
  Device Version OpenCL 1.2 beignet 1.3
  Driver Version 1.3
  Device OpenCL C Version OpenCL C 1.2 beignet 1.3
  Device Type GPU
  Device Profile FULL_PROFILE
  Device Available Yes
  Compiler Available Yes
  Linker Available Yes
  Max compute units 24
  Max clock frequency 1000MHz
  Device Partition (core)
    Max number of sub-devices 1
    Supported partition types None, None, None
  Max work item dimensions 3
  Max work item sizes 512x512x512
  Max work group size 512
  Preferred work group size multiple 16
  Preferred / native vector sizes
    char 16 / 8
    short 8 / 8
    int 4 / 4
    long 2 / 2
    ...


Revision history for this message
Gordon Lack (gordon-lack) wrote :

If it's any help here is the info from the last BOINC startup log under Bionic and the first under Cosmic.

Bionic:
28-Oct-2018 11:12:30 [---] Starting BOINC client version 7.7.0 for x86_64-pc-linux-gnu
28-Oct-2018 11:12:30 [---] This a development version of BOINC and may not function properly
28-Oct-2018 11:12:30 [---] log flags: file_xfer, sched_ops, task
28-Oct-2018 11:12:30 [---] Libraries: libcurl/7.58.0 GnuTLS/3.5.18 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
28-Oct-2018 11:12:30 [---] Data directory: /extra/BOINC/wlib
28-Oct-2018 11:12:31 [---] OpenCL: Intel GPU 0: Intel(R) HD Graphics 6000 BroadWell U-Processor GT3 (driver version 1.3, device version OpenCL 1.2 beignet 1.3, 3931MB, 3931MB available, 384 GFLOPS peak)
28-Oct-2018 11:12:31 [---] Host name: nuc
28-Oct-2018 11:12:31 [---] Processor: 4 GenuineIntel Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz [Family 6 Model 61 Stepping 4]
28-Oct-2018 11:12:31 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap intel_pt xsaveopt dtherm ida arat pln pts flush_l1d
28-Oct-2018 11:12:31 [---] OS: Linux: 4.15.0-38-generic
28-Oct-2018 11:12:31 [---] Memory: 7.68 GB physical, 4.88 GB virtual
28-Oct-2018 11:12:31 [---] Disk: 18.21 GB total, 14.96 GB free

Cosmic:
28-Oct-2018 12:04:26 [---] Starting BOINC client version 7.7.0 for x86_64-pc-linux-gnu
28-Oct-2018 12:04:26 [---] This a development version of BOINC and may not function properly
28-Oct-2018 12:04:26 [---] log flags: file_xfer, sched_ops, task
28-Oct-2018 12:04:26 [---] Libraries: libcurl/7.61.0 GnuTLS/3.6.4 zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.4) nghttp2/1.32.1 librtmp/2.3
28-Oct-2018 12:04:26 [---] Data directory: /extra/BOINC/wlib
28-Oct-2018 12:04:26 [---] OpenCL: Intel GPU 0: Intel(R) HD Graphics 6000 BroadWell U-Processor GT3 (driver version 1.3, device version OpenCL 1.2 beignet 1.3, 3931MB, 3931MB available, 192 GFLOPS peak)
28-Oct-2018 12:04:26 [---] Host name: nuc
28-Oct-2018 12:04:26 [---] Processor: 4 GenuineIntel Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz [Family 6 Model 61 Stepping 4]
28-Oct-2018 12:04:26 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi fl...


Revision history for this message
Gordon Lack (gordon-lack) wrote :

Just noticed that the logs in #4 show a difference in the reported GFLOPS peak.
Bionic says 384, while Cosmic says 192.

Revision history for this message
Rebecca Palmer (rebecca-palmer) wrote :

Well spotted - BOINC "GFLOPS peak" is the theoretical 8 * compute units * max clock (not an actual speed measurement), and matches the 24 compute units clinfo reports, not the 48 compute units this GPU is supposed to have.

Beignet can take this value from an internal hardcoded list (in src/cl_device_id.c, which says 48 for this GPU), but by default it prefers to ask the kernel.

This change should both confirm that that's where the 24 is coming from, and change it back to 48 to allow testing whether this is the source of the slowdown (warning: untested, might crash/hang the system):

--- a/src/intel/intel_driver.c
+++ b/src/intel/intel_driver.c
@@ -975,7 +975,7 @@ unsigned int eu_total;

 /* Prefer driver-queried max compute units if supported */
 if (!drm_intel_get_eu_total(driver->fd, &eu_total))
- device->max_compute_unit = eu_total;
+ printf("kernel says %i compute units, hardcoded says %i\n", eu_total, device->max_compute_unit);
 else if (IS_CHERRYVIEW(device->device_id))
   printf(CHV_CONFIG_WARNING);
 #else

Procedure for recompiling a package with changes:

# requires a deb-src line in /etc/apt/sources.list - if there is a commented one there, uncomment it
sudo apt-get install build-essential
sudo apt-get build-dep beignet
apt-get source beignet
cd beignet*
# make your changes
# the patch name this asks for can be anything, the header can be left as-is
dpkg-source --commit
dpkg-buildpackage -us -uc
sudo dpkg -i ../beignet*.deb

Revision history for this message
Gordon Lack (gordon-lack) wrote :

Looks like there's a bug in dpkg-buildpackage!?!

There's an update_metainfo_xml.py which, from line 1, expects to be run by python3, but
a) it isn't marked as executable.
b) it produces an error when run by dpkg-buildpackage. A complaint about the encoding= parameter, so it looks like it's being run by python2?

Revision history for this message
Gordon Lack (gordon-lack) wrote :

The dh_auto_configure reports this:

-- Found PythonInterp: /usr/bin/python (found version "2.7.15")

I've added a python symlink (to python3) in /usr/local/bin to get the build to run, but there's a bug in the build logic somewhere.

Revision history for this message
Gordon Lack (gordon-lack) wrote :

Built and installed beignet-opencl-icd_1.3.2-4_amd64.deb with the patch.

clinfo now reports (4 times):
   kernel says 24 compute units, hardcoded says 48

Peak GFLOPS reported by BOINC now back at 384.

BOINC running - haven't had sufficient time to check relative processing speed yet.

System hasn't hung....

So that seems to be the issue. I suppose the question now is why did the code get 48 in Bionic, but only get 24 in Cosmic?

And how to report the bug in the build logic for finding python...

Revision history for this message
Rebecca Palmer (rebecca-palmer) wrote :

> I suppose the question now is why did the code get 48 in Bionic, but only get 24 in Cosmic?
Agreed; I'll look into this later.

> And how to report the bug in the build logic for finding python...
Here is enough. (It's probably in beignet, not dpkg-buildpackage, and not new: packages for distribution are built in minimal chroots, so there wouldn't have been a python2 to trigger this bug in official builds.)

Changed in beignet (Ubuntu):
status: Incomplete → In Progress
Revision history for this message
Gordon Lack (gordon-lack) wrote :

>> BOINC running - haven't had sufficient time to check relative processing speed yet.

I can confirm that it is back to "full speed". Jobs are taking ~12,000s again.

I've rebooted to the previous Bionic kernel (still there from the upgrade) and when I run clinfo there I see:
  kernel says 48 compute units, hardcoded says 48
so it looks like a kernel regression between 4.15.0-38-generic and 4.18.0-10-generic.

Revision history for this message
Gordon Lack (gordon-lack) wrote :

FYI: Boiled it down to a simple ioctl() call.

===== Start code =====
/* Read an ioctl value: ask the i915 driver for the GPU's total EU count */

#include <stdio.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include <xf86drm.h>
#include <i915_drm.h>

int main(void) {

    int eu_total = 0;
    drm_i915_getparam_t gp;

    /* The driver writes the result through gp.value */
    gp.value = &eu_total;
    gp.param = I915_PARAM_EU_TOTAL;

    int dru = open("/dev/dri/renderD128", O_RDONLY);
    if (dru < 0) {
        int err = errno;    /* save it: perror() may change errno */
        perror("open renderD128");
        return err;
    }
    int ret = ioctl(dru, DRM_IOCTL_I915_GETPARAM, &gp);
    if (ret < 0) {
        int err = errno;
        perror("ioctl");
        close(dru);
        return err;
    }
    close(dru);

    printf("Got value: %d\n", eu_total);
    return 0;
}
===== End code =====

Compiled with:
  gcc -I /usr/include/drm gmltest.c -o gmltest
(needed the extra -I to get drm.h found by xf86drm.h, which is probably a bug, but I'll ignore that...)
This then reports:
>> root@nuc:/local/users/gml4410/homework# ./gmltest
>> Got value: 24

Revision history for this message
Gordon Lack (gordon-lack) wrote :

And, just for confirmation, with the older Bionic 4.15.0-38-generic kernel I get:

>> root@nuc:/local/users/gml4410/homework# ./gmltest
>> Got value: 48

Revision history for this message
Gordon Lack (gordon-lack) wrote :
Revision history for this message
Rebecca Palmer (rebecca-palmer) wrote :

Further information may be available in the files (readable only by root, the 0 may be different if you have more than one graphics device):

/sys/kernel/debug/dri/0/i915_capabilities
/sys/kernel/debug/dri/0/i915_sseu_status

summary: - Performance degraded in Cosmic
+ [regression] Wrong EU count for i5-5250U, reducing OpenCL performance
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1800752

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Gordon Lack (gordon-lack) wrote :

apport-collect 1800752

produced:
>> ERROR: The python3-launchpadlib package is not installed. This functionality is not available.
which might explain why the info wasn't there...

I've installed it - it might now be collecting data...

Revision history for this message
Gordon Lack (gordon-lack) wrote :

>> Further information may be available in the files (readable only by root,
>> the 0 may be different if you have more than one graphics device):

Info attached (only 0 has those files) for both Cosmic (4.18.0-10-generic) and Bionic (4.15.0-38-generic) kernels. (The change mentioned in #14 seems to have gone into 4.17.)

Revision history for this message
Gordon Lack (gordon-lack) wrote :

Here's the Bionic kernel info (couldn't seem to be able to attach two files to one comment?).

Revision history for this message
Gordon Lack (gordon-lack) wrote :

>> I've installed it - it might now be collecting data...

No, it's not. It fired up the OAuth authorization page, which I authorized (and got the mail confirmation). Now it just sits there....

Revision history for this message
Gordon Lack (gordon-lack) wrote : Dependencies.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Gordon Lack (gordon-lack) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Gordon Lack (gordon-lack) wrote : ProcEnviron.txt

apport information

Revision history for this message
Gordon Lack (gordon-lack) wrote :

OK. So I have to run as root *while logged in to the desktop of the system*. Just being logged in as root (on a system where I "never" use the console...) over the network isn't good enough.

Now done. It claims to have submitted data. I'll change the state to Confirmed.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Gordon Lack (gordon-lack) wrote :

Only affects Cosmic. Bionic is OK.

Revision history for this message
Gordon Lack (gordon-lack) wrote :

I've been looking (briefly) for some docs on the Intel GPU hardware layout.

Came across this:
 https://01.org/linuxgraphics/documentation/hardware-specification-prms
leading to:
 https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol04-configurations_3.pdf

which says that an Intel HD Graphics 6000 set-up (which is what this system has) will have 47* EUs.

*(*) Intel reserves the right to increase the number of EUs on these SKUs.

So perhaps this is a strange oddity that doesn't fit into the mask counting?

Revision history for this message
Rebecca Palmer (rebecca-palmer) wrote :

('invalid' = 'not beignet's bug' - I do agree it is *a* bug)

The kernel code that reads these numbers from the device (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/i915/intel_device_info.c#n434) does explicitly support the 47 EU case (any one of the 48 faulty and disabled), and has nothing obviously wrong with it.

If you want to take this further, my suggested next step is:
- get the latest upstream kernel (https://wiki.ubuntu.com/KernelTeam/GitKernelBuild)
- add #include <linux/printk.h> and lots of debug printk()s to the above broadwell_sseu_info_init function (printk() not printf(), and under 1024 bytes per message, as this is kernel code - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/kernel-hacking/hacking.rst#n234)
- compile (will probably take several hours) and boot into that
- see if the issue still exists, and if it does, check /var/log/kern.log for the debug messages you added

Changed in beignet (Ubuntu):
status: In Progress → Invalid
Revision history for this message
Gordon Lack (gordon-lack) wrote :

Or I could post a comment to the commit mentioned in #14 - see what the author (from Intel) thinks?

Not sure how much printk()s would tell me - the debugfs already shows it ends up with only 4 EUs per sub-slice instead of 8. I'll give it a go, though. But I (might) need signed kernels... I assume it will be possible to do what is required.

Revision history for this message
Rebecca Palmer (rebecca-palmer) wrote :

Additional information (note in particular that Intel consider the graphics development branch at https://cgit.freedesktop.org/drm-tip, not the integration branch at kernel.org, to be the appropriate "latest upstream" to test against):

https://01.org/linuxgraphics/documentation/how-report-bugs

And yes, self-built kernels won't work on hardware/BIOSes that require kernels to be signed.

There may be an easier way than rebuild-with-printk() to get the GPU's raw registers: https://01.org/linuxgraphics/documentation/using-intel-reg-dumper

Revision history for this message
Gordon Lack (gordon-lack) wrote :

Thanks.
I'll add relevant links (both ways) if I can as I report it on that site.
If the code hasn't changed in the graphics development branch I may submit a report now anyway...

Revision history for this message
Gordon Lack (gordon-lack) wrote :
Revision history for this message
Gordon Lack (gordon-lack) wrote :

They've found the bug!!
   https://bugs.freedesktop.org/attachment.cgi?id=142443

Is there a mechanism for getting this patched into the current Cosmic kernels (and, presumably, Disco)?

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Yes. The patch will land in Cosmic and Disco.

Revision history for this message
Gordon Lack (gordon-lack) wrote :

Good.

I've tested that patch in 4.18.0 and can confirm it works.
https://bugs.freedesktop.org/show_bug.cgi?id=108712#c14

Revision history for this message
Gordon Lack (gordon-lack) wrote :

This has been fixed....

Changed in linux (Ubuntu):
status: Confirmed → Fix Released