Performance regression with memcpy on Intel CPU

Bug #1988240 reported by Shantanu Jain
This bug affects 1 person
Affects: glibc (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

# lsb_release -rd
Description: Ubuntu 20.04.4 LTS
Release: 20.04

Reporting a performance regression in libc6-dev==2.31-0ubuntu9.9 when upgrading from libc6-dev==2.31-0ubuntu9.7.

The regression was observed on an Intel Xeon(R) Gold 6248 CPU @ 2.50GHz (Cascade Lake).

We're seeing a 3x slowdown on e.g. the following tiny program and similar slowdowns on important workloads:
```
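#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
    size_t SIZE = (1 << 20);  /* 1 MiB per copy */
    char *src = malloc(SIZE);
    char *dst = malloc(SIZE);
    if (!src || !dst) {
        return 1;
    }

    /* Fill both buffers so every page is touched before timing starts. */
    for (size_t i = 0; i < SIZE; ++i) {
        src[i] = rand() % 256;
        dst[i] = rand() % 256;
    }
    /* Time 10000 copies; clock() measures CPU time. */
    clock_t start = clock();
    for (int i = 0; i < 10000; ++i) {
        memcpy(dst, src, SIZE);
    }
    clock_t end = clock();
    printf("%f\n", (double) (end - start) / CLOCKS_PER_SEC);

    free(src);
    free(dst);
    return 0;
}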
```

This is probably due to the changes resulting from https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1928508

Heitor Alves de Siqueira (halves) wrote :

Thanks for the report, Shantanu!

Have you confirmed whether this is indeed related to the changes from bug 1928508? I've looked into upstream changes to __x86_shared_non_temporal_threshold, and there were no fixes or regression reports after the ones we've backported to Ubuntu Focal. At the time this change was introduced, no regressions on other platforms had been reported upstream or in Ubuntu, so I wonder if we missed your test case.

Would you be able to double-check whether that patch is responsible? Have you seen different performance behavior in recent glibc versions, or other distros with the same glibc version? One could also use different tunable values for __x86_shared_non_temporal_threshold like below:

$ export GLIBC_TUNABLES=glibc.cpu.x86_non_temporal_threshold=$(( 1024*1024*3*4 ))

Changed in glibc (Ubuntu):
status: New → Incomplete
Shantanu Jain (hauntsaninja) wrote :

Thanks for the quick response!

> Would you be able to double-check whether that patch is responsible?
I'll figure out how to build glibc to confirm that it's related to 1928508 and get back to you with definitive confirmation :-)

> Have you seen different performance behavior in recent glibc versions?
Yes, like I mentioned, libc6-dev==2.31-0ubuntu9.7 has >3x better performance than libc6-dev==2.31-0ubuntu9.9.
(I see even better performance on the same hardware with 18.04 using a much older glibc)

> or other distros with the same glibc version.
Not yet, I can try this as well.

> One could also use different tunable values
Thanks for the suggestion! I can confirm that setting `export GLIBC_TUNABLES=glibc.cpu.x86_non_temporal_threshold=$(( 1024*1024*3*4 ))` fixes the performance regression. This also points to 1928508.

Shantanu Jain (hauntsaninja) wrote :

> different tunable values
With the good libc (libc6-dev==2.31-0ubuntu9.7):
I can trigger a performance regression by explicitly setting the tunable threshold to `1024*1024*3/4`. If I explicitly set it to `1024*1024*16*3/4`, I once again have good perf.

> or other distros
With Ubuntu 22.04.1 LTS and libc6 2.35-0ubuntu3.1 I see the performance regression. It seems marginally better than 2.31-0ubuntu9.9, e.g. 2.9x worse perf instead of 3.2x.

With 22.10 devel and libc6 2.35-0ubuntu3, I see a performance regression equal to that with 22.04.1.

With Debian bullseye and libc6 2.31-13+deb11u3, I see the performance regression.

> double-check whether that patch is responsible
Unfortunately, I wasn't able to get glibc to make install :-( Ran into some scary looking "_dl_call_libc_early_init: Assertion `sym != NULL' failed!" errors. Let me know if you have advice on how to build glibc.

Launchpad Janitor (janitor) wrote :

[Expired for glibc (Ubuntu) because there has been no activity for 60 days.]

Changed in glibc (Ubuntu):
status: Incomplete → Expired