OpenSSL CPU detection for AMD Ryzen CPUs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openssl (Ubuntu) | Fix Released | Medium | Eric Desrochers |
Xenial | Fix Released | Medium | Eric Desrochers |
Yakkety | Fix Released | Medium | Eric Desrochers |
Zesty | Fix Released | Medium | Eric Desrochers |
Artful | Fix Released | Medium | Eric Desrochers |
Bug Description
[Impact]
* Context:
AMD added support for the SHA Extensions[1] (CPU flag: sha_ni[2]) to their processors starting with the Ryzen[3] CPU. Note that Ryzen CPUs are 64-bit only (confirmed with an AMD representative). The current OpenSSL version still calls the SSSE3 SHA routine on Ryzen; as a result, a number of extensions are effectively masked on Ryzen and show no improvement.
[1] /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD Ryzen 5 1600 Six-Core Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
[2] - sha_ni: SHA1/SHA256 Instruction Extensions
[3] - https:/
...
All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5]
...
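A minimal sketch, assuming a Linux system whose /proc/cpuinfo has the x86 `flags` line shown in [1], of checking for `sha_ni` from user space. The helper names `has_cpu_flag` and `cpuinfo_has_flag` are hypothetical. The flag has to be matched as a whole space-separated token, since a plain substring search for `sse` would also hit `sse2` or `sse4_1`.

```c
#include <stdio.h>
#include <string.h>

/* Separators between tokens in a /proc/cpuinfo "flags" line. */
static int is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\0';
}

/* Return 1 if `flag` appears as a whole token in `flags`, 0 otherwise. */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts_token = (p == flags) || is_sep(p[-1]);
        int ends_token = is_sep(p[len]);
        if (starts_token && ends_token)
            return 1;
        p += 1; /* partial match such as "sse" inside "sse2": keep scanning */
    }
    return 0;
}

/* Hypothetical helper: look at the first "flags" line of /proc/cpuinfo.
 * Returns 0 if the file cannot be read or the flag is absent. */
int cpuinfo_has_flag(const char *flag)
{
    char line[4096];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            found = has_cpu_flag(line, flag);
            break;
        }
    }
    fclose(f);
    return found;
}
```

On the Ryzen system above, `cpuinfo_has_flag("sha_ni")` would be expected to return 1. This only reports what the kernel exposes; the CPUID check below queries the processor directly.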
* Program that performs the CPUID check:
Reference:
https:/
... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check:
--
int CheckForIntelSh(void) {
int a, b, c, d;
// Look for CPUID.7.0.EBX[29]
// EAX = 7, ECX = 0
a = 7;
c = 0;
asm volatile ("cpuid"
:"=a"(a), "=b"(b), "=c"(c), "=d"(d)
:"a"(a), "c"(c)
);
// Intel® SHA Extensions feature bit is EBX[29]
return ((b >> 29) & 1);
}
--
On a CPU with sha_ni, the function returns 1; otherwise it returns 0.
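The function above reads leaf 7 unconditionally. A slightly more defensive sketch, assuming GCC or Clang on x86 (their `<cpuid.h>` header provides `__get_cpuid_max`, `__get_cpuid` and `__cpuid_count`), first checks that the CPU reports leaf 7 at all, since leaf-7 output is not defined on CPUs whose maximum basic leaf is lower. It also reads the leaf-0 vendor string, which is `AuthenticAMD` on the Ryzen system above; the SHA bit itself is the same EBX[29] on AMD and Intel. The function names are hypothetical, and on non-x86 builds both simply return 0.

```c
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Return 1 if CPUID.(EAX=07H, ECX=0):EBX[29] (SHA) is set, else 0. */
int cpu_has_sha_ext(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Highest supported basic leaf (0 if CPUID is unavailable). */
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx;
    return (ebx >> 29) & 1;
#else
    return 0; /* not an x86 CPU */
#endif
}

/* Copy the 12-byte vendor string ("AuthenticAMD", "GenuineIntel", ...)
 * into buf, which must hold at least 13 bytes. Returns 1 on success. */
int cpu_vendor(char buf[13])
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    (void)eax;
    /* Leaf 0 returns the vendor string in EBX, EDX, ECX order. */
    memcpy(buf, &ebx, 4);
    memcpy(buf + 4, &edx, 4);
    memcpy(buf + 8, &ecx, 4);
    buf[12] = '\0';
    return 1;
#else
    buf[0] = '\0';
    return 0;
#endif
}
```

Because the check relies only on the CPUID bit and not on the vendor string, the same code reports SHA support on both AMD and Intel parts.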
[Test Case]
* Reproducible with the Xenial/Zesty/Artful releases.
* Generate a checksum of a big file (e.g. a 5GB file) with openssl:
$ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
SHA256(
real 0m12.835s
user 0m12.344s
sys 0m0.484s
* OpenSSL speed:
$ openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s
Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(idx,
compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55k
Performance is clearly better with the patch, which takes advantage of the SHA extensions (see the Regression Potential section for results with the patch).
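The figures in the `openssl speed` table follow from the "Doing sha1" lines: blocks hashed times block size, divided by the elapsed seconds and by 1000. For example, 9969152 * 16 / 3.00 / 1000 is about 53168.81, the first entry of the row. A small sketch of that arithmetic (the function names are hypothetical):

```c
#include <stdio.h>

/* Block sizes exercised by `openssl speed sha1` in the runs above. */
static const int block_sizes[5] = {16, 64, 256, 1024, 8192};

/* Throughput in the unit `openssl speed` prints: 1000s of bytes per second. */
double speed_kbytes_per_sec(long long blocks, int block_size, double seconds)
{
    return (double)blocks * block_size / seconds / 1000.0;
}

/* Print one table row from the per-block-size counts and timings. */
void print_speed_row(const char *name, const long long blocks[5], const double secs[5])
{
    printf("%s", name);
    for (int i = 0; i < 5; i++)
        printf(" %.2fk", speed_kbytes_per_sec(blocks[i], block_sizes[i], secs[i]));
    printf("\n");
}
```

Called with the counts {9969152, 8019164, 5254219, 2217067, 347842} and times {3.00, 3.00, 2.99, 3.00, 3.00} from the run above, `print_speed_row("sha1", ...)` reproduces the row `sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55k`.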
[Regression Potential]
* Note: IRC discussion with infinity:
https:/
* Note from IRC discussion with apw and rbasak:
https:/
* It allows OpenSSL to take advantage of the SHA extensions (mostly for performance) now that new AMD CPUs have started to support them.
* The code checks the CPUID bit to determine whether the SHA instructions are available.
* The maintainer's comment confirms that he successfully tested it on Intel CPUs with and without the SHA extensions:
Reference: https:/
"I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it though. Myself I tested on Intel processors, yes, with/without."
* LP reporter comment:
I, slashd, have tested on a Ryzen system (as well as a non-Ryzen AMD system) and on an Intel CPU without the SHA extensions. It reveals a significant performance increase on Ryzen due to the SHA extensions:
(Note that performance remains the same on CPUs without the SHA extensions (AMD/Intel), as expected, since they cannot benefit from them.)
[Tested on a Ryzen CPU]
# Generate a checksum of a big file (e.g. a 5GB file) with openssl
$ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
SHA256(
real 0m3.471s
user 0m2.956s
sys 0m0.516s
# OpenSSL speed
$ openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(idx,
compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k
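To put the two runs side by side, a sketch that computes the speedups from the numbers reported above, assuming the unpatched and patched runs were both taken on the same Ryzen machine. The array contents are copied from the two outputs; the function names are hypothetical.

```c
#include <stdio.h>

/* `openssl speed sha1` results from the two runs above, in 1000s of
 * bytes per second, for 16/64/256/1024/8192-byte blocks. */
static const int sizes[5] = {16, 64, 256, 1024, 8192};
static const double unpatched[5] = {53168.81, 171075.50, 449859.55, 756758.87, 949840.55};
static const double patched[5] = {64436.75, 246697.60, 714675.29, 1361115.48, 1851490.30};

/* How many times higher the patched throughput is. */
double throughput_speedup(double before_k, double after_k)
{
    return after_k / before_k;
}

/* How many times shorter the patched wall-clock ("real") time is. */
double wallclock_speedup(double before_s, double after_s)
{
    return before_s / after_s;
}

void print_speedups(void)
{
    for (int i = 0; i < 5; i++)
        printf("%5d bytes: %.2fx\n", sizes[i],
               throughput_speedup(unpatched[i], patched[i]));
    /* `openssl dgst -sha256` on the 5GB file: 12.835s before, 3.471s after. */
    printf("dgst -sha256: %.2fx\n", wallclock_speedup(12.835, 3.471));
}
```

With these inputs `print_speedups()` reports about 1.21x for 16-byte blocks, rising through 1.44x, 1.59x and 1.80x to 1.95x for 8192-byte blocks, and about 3.70x for the dgst run. The gain grows with block size, as per-call overhead matters less for larger inputs.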
* autopkgtest failures:
{Xenial - openssl}
* Regression in autopkgtest for linux-hwe (ppc64el): test log
* Regression in autopkgtest for linux-hwe (i386): test log
* Regression in autopkgtest for linux-hwe (amd64): test log
* Regression in autopkgtest for linux-gke (amd64): test log
* Regression in autopkgtest for linux-hwe-edge (ppc64el): test log
* Regression in autopkgtest for linux-hwe-edge (i386): test log
* Regression in autopkgtest for linux-hwe-edge (amd64): test log
* Regression in autopkgtest for linux-aws (amd64): test log
In this case all the linux-* regressions are failing because the autopkgtest script: debian/
It compares the 'dpkg-parsechan
if [ "$sver" != "$rver" ]; then
echo "ERROR: running version does not match source package" 1>&2
exit 1
fi
Buildlog error:
autopkgtest [00:45:29]: test ubuntu-
Source Package Version: 4.8.0-54.57~16.04.1
Running Kernel Version: 4.4.0-79.100
ERROR: running version does not match source package
autopkgtest [00:45:32]: test ubuntu-
To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package.
* Regression in autopkgtest for postgresql-9.5 (armhf): test log
==> The autopkgtest regression appears to have been present since "2016-10-17 01:14:36 UTC", so it is neither new nor related to this specific SRU.
http://
* Regression in autopkgtest for linux-hwe (armhf): test log
* Regression in autopkgtest for linux-hwe-edge (armhf): test log
==> ubuntu-
The debian/tests/control file uses the "isolation-machine" restriction, which prevents the test from running in a schroot/container. The test has a better chance of succeeding when run inside QEMU.
* Regression in autopkgtest for nodejs (s390x): test log
==> This test is failing since 'openssl/
{Yakkety - openssl}
* Regression in autopkgtest for mongodb (armhf): test log
==> The autopkgtest regression appears to have been present since "2016-11-17 19:43:07 UTC", so it is neither new nor related to this specific SRU.
http://
* Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9
==> The autopkgtest regression appears to have been present since "2016-10-21 14:20:53 UTC", so it is neither new nor related to this specific SRU.
http://
{Zesty - openssl}
none
[Other Info]
* Debian bug:
https:/
* Upstream PR:
https:/
* Upstream repository:
https:/
* Upstream commits:
1aed5e1 crypto/
## This fix moves extended feature detection past basic feature detection where it belongs.
f8418d8 crypto/
## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
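The commit messages describe the fix as moving extended feature detection (CPUID leaf 7, where the SHA bit lives) past basic feature detection. A simplified C model of that change, not OpenSSL's actual x86_64cpuid.pl code, assuming the original problem was that leaf 7 was only consulted on the Intel-specific path: in the first variant an AMD CPU never reports SHA even when EBX[29] is set, in the second every vendor does. The struct and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical, simplified view of the raw CPUID results a detection
 * routine would see. */
struct cpuid_results {
    char vendor[13];        /* leaf 0 vendor string, e.g. "AuthenticAMD" */
    unsigned int max_leaf;  /* leaf 0 EAX: highest basic leaf */
    unsigned int leaf7_ebx; /* leaf 7, subleaf 0, EBX */
};

#define SHA_BIT (1u << 29)

/* Buggy model: leaf 7 is only read on the Intel path, so extended
 * features (including SHA) are masked on every other vendor. */
unsigned int ext_features_vendor_gated(const struct cpuid_results *r)
{
    if (strcmp(r->vendor, "GenuineIntel") == 0 && r->max_leaf >= 7)
        return r->leaf7_ebx;
    return 0;
}

/* Fixed model: leaf 7 is read after basic detection for every vendor,
 * guarded only by the maximum-leaf check. */
unsigned int ext_features_any_vendor(const struct cpuid_results *r)
{
    return r->max_leaf >= 7 ? r->leaf7_ebx : 0;
}

int has_sha(unsigned int ext_features)
{
    return (ext_features & SHA_BIT) != 0;
}
```

Keeping the leaf-7 read independent of the vendor branch is what lets the existing SHA code path be selected on Ryzen without touching the hashing routines themselves.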
Changed in openssl (Ubuntu): | |
importance: | Undecided → Low |
description: | updated |
tags: | added: sts |
Changed in openssl (Ubuntu): | |
assignee: | nobody → Eric Desrochers (slashd) |
importance: | Low → Medium |
milestone: | none → ubuntu-16.04.2 |
status: | New → Triaged |
Changed in openssl (Ubuntu): | |
assignee: | Eric Desrochers (slashd) → nobody |
description: | updated |
description: | updated |
Changed in openssl (Ubuntu Xenial): | |
status: | New → Triaged |
Changed in openssl (Ubuntu Zesty): | |
status: | New → Triaged |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
Changed in openssl (Ubuntu Xenial): | |
assignee: | nobody → Eric Desrochers (slashd) |
Changed in openssl (Ubuntu Zesty): | |
assignee: | nobody → Eric Desrochers (slashd) |
Changed in openssl (Ubuntu Artful): | |
assignee: | nobody → Eric Desrochers (slashd) |
status: | Triaged → In Progress |
description: | updated |
description: | updated |
Changed in openssl (Ubuntu Zesty): | |
status: | Triaged → In Progress |
Changed in openssl (Ubuntu Xenial): | |
status: | Triaged → In Progress |
importance: | Undecided → Medium |
Changed in openssl (Ubuntu Zesty): | |
importance: | Undecided → Medium |
description: | updated |
description: | updated |
tags: | added: sts-sru |
tags: | added: patch |
Changed in openssl (Ubuntu Artful): | |
status: | In Progress → Triaged |
status: | Triaged → Fix Committed |
Changed in openssl (Ubuntu Yakkety): | |
status: | New → In Progress |
importance: | Undecided → Medium |
assignee: | nobody → Eric Desrochers (slashd) |
description: | updated |
tags: | removed: verification-failed |
tags: | removed: patch |
tags: | removed: ua |
tags: | added: verification-failed |
tags: | removed: verification-failed |
tags: | added: verification-failed |
tags: | added: verification-done removed: verification-failed verification-needed |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
tags: | added: verification-done removed: verification-needed |
Here's some context from a conversation about this bug on the #ubuntu-release channel:
...
[10:01:50] <slashd> hi SRU, I'm currently working on a case (no LP bug yet).... about an OpenSSL bug on new AMD CPU (Ryzen) released last Feb ... where the SHA Extension routine is not called on AMD Ryzen cores. My question is since this look like H/W enablement ... do you think this could be eligible for SRU in stable release such like Xenial ? or this will only be accepted for devel release ? This is a new CPU but Xenial is there for a couple of years still so maybe future Xenial user running Ryzen CPU may benefit on this eventually...
[10:03:20] <apw> slashd, for me some new functionality like that is ok as long as it is very self-contained so easy to review and confirm is only used on the new h/w
[10:03:52] <apw> one of our main goals is to avoid regressions
[10:04:41] <slashd> apw, make sense, thanks for your input
[10:12:24] <rbasak> The SRU policy does explicitly permit hardware enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be involved in mitigating risk and making the final risk decision, FWIW.
[10:16:11] <apw> rbasak, right, it would have to be carefully considered once we can see what the diff actually is
[10:16:34] <slashd> rbasak, apw ack, will communite the info with the proper group
[10:16:50] <apw> with a much greater level of testing and scrutiny than a regular fix only sru
[10:17:12] <slashd> apw, rbasak, FYI I have requested the new CPU from our partner to test in deep
...