Activity log for bug #1674399

Date Who What changed Old value New value Message
2017-03-20 15:46:27 Eric Desrochers bug added bug
2017-03-20 15:49:47 Eric Desrochers openssl (Ubuntu): importance Undecided Low
2017-03-20 15:50:28 Eric Desrochers description I has been brought to my attention that : "CPUID detection in OpenSSL does not properly detect potential optimizations for AMD processors." After further verification on my side : Extended feature flags were not pulled on AMD processors, as result a number of extensions were effectively masked on Ryzen CPUs. It should have been reported for Excavator since it implements AVX2 extension, but apparently nobody noticed ... The GitHub PR: https://github.com/openssl/openssl/pull/2849 It has been brought to my attention that : "CPUID detection in OpenSSL does not properly detect potential optimizations for AMD processors." After further verification on my side : Extended feature flags were not pulled on AMD processors, as result a number of extensions were effectively masked on Ryzen CPUs. It should have been reported for Excavator since it implements AVX2 extension, but apparently nobody noticed ... The GitHub PR: https://github.com/openssl/openssl/pull/2849
2017-03-20 15:50:42 Eric Desrochers tags sts
2017-03-20 17:03:52 David A. Desrosiers openssl (Ubuntu): importance Low Medium
2017-03-20 17:03:52 David A. Desrosiers openssl (Ubuntu): status New Triaged
2017-03-20 17:03:52 David A. Desrosiers openssl (Ubuntu): milestone ubuntu-16.04.2
2017-03-20 17:03:52 David A. Desrosiers openssl (Ubuntu): assignee Eric Desrochers (slashd)
2017-03-20 17:36:22 Eric Desrochers openssl (Ubuntu): milestone ubuntu-16.04.2
2017-03-20 17:36:32 Eric Desrochers nominated for series Ubuntu Xenial
2017-03-20 17:36:32 Eric Desrochers nominated for series Ubuntu Zesty
2017-03-22 10:42:17 Dominique Poulain bug added subscriber Dominique Poulain
2017-03-27 13:38:46 Eric Desrochers openssl (Ubuntu): assignee Eric Desrochers (slashd)
2017-04-21 18:59:57 Eric Desrochers description It has been brought to my attention that : "CPUID detection in OpenSSL does not properly detect potential optimizations for AMD processors." After further verification on my side : Extended feature flags were not pulled on AMD processors, as result a number of extensions were effectively masked on Ryzen CPUs. It should have been reported for Excavator since it implements AVX2 extension, but apparently nobody noticed ... The GitHub PR: https://github.com/openssl/openssl/pull/2849 AMD added support in their processors for SHA Extensions starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine and shows no improvement. It has been brought to my attention that : "CPUID detection in OpenSSL does not properly detect potential optimizations for AMD processors." After further verification on my side : Extended feature flags were not pulled on AMD processors, as result a number of extensions were effectively masked on Ryzen CPUs. It should have been reported for Excavator since it implements AVX2 extension, but apparently nobody noticed ... The GitHub PR: https://github.com/openssl/openssl/pull/2849
2017-04-21 19:01:35 Eric Desrochers description AMD added support in their processors for SHA Extensions starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine and shows no improvement. It has been brought to my attention that : "CPUID detection in OpenSSL does not properly detect potential optimizations for AMD processors." After further verification on my side : Extended feature flags were not pulled on AMD processors, as result a number of extensions were effectively masked on Ryzen CPUs. It should have been reported for Excavator since it implements AVX2 extension, but apparently nobody noticed ... The GitHub PR: https://github.com/openssl/openssl/pull/2849 AMD added support in their processors for SHA Extensions starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. It has been brought to my attention that : "CPUID detection in OpenSSL does not properly detect potential optimizations for AMD processors." After further verification on my side : Extended feature flags were not pulled on AMD processors, as result a number of extensions were effectively masked on Ryzen CPUs. It should have been reported for Excavator since it implements AVX2 extension, but apparently nobody noticed ... The GitHub PR: https://github.com/openssl/openssl/pull/2849
2017-04-21 19:02:00 Eric Desrochers nominated for series Ubuntu Artful
2017-04-21 19:02:00 Eric Desrochers bug task added openssl (Ubuntu Artful)
2017-04-21 20:21:32 Eric Desrochers bug task added openssl (Ubuntu Zesty)
2017-04-21 20:21:40 Eric Desrochers bug task added openssl (Ubuntu Xenial)
2017-04-21 20:21:49 Eric Desrochers openssl (Ubuntu Xenial): status New Triaged
2017-04-21 20:21:51 Eric Desrochers openssl (Ubuntu Zesty): status New Triaged
2017-04-23 16:52:56 Eric Desrochers description AMD added support in their processors for SHA Extensions starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. It has been brought to my attention that : "CPUID detection in OpenSSL does not properly detect potential optimizations for AMD processors." After further verification on my side : Extended feature flags were not pulled on AMD processors, as result a number of extensions were effectively masked on Ryzen CPUs. It should have been reported for Excavator since it implements AVX2 extension, but apparently nobody noticed ... The GitHub PR: https://github.com/openssl/openssl/pull/2849 * Context AMD added support in their processors for SHA Extensions[1] starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. 
[1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() { int a, b, c, d; // Look for CPUID.7.0.EBX[29] // EAX = 7, ECX = 0 a = 7; c = 0; asm volatile ("cpuid" :"=a"(a), "=b"(b), "=c"(c), "=d"(d) :"a"(a), "c"(c) ); // Intel® SHA Extensions feature bit is EBX[29] return ((b >> 29) & 1); } -- * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
2017-04-23 16:53:24 Eric Desrochers description * Context AMD added support in their processors for SHA Extensions[1] starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() { int a, b, c, d; // Look for CPUID.7.0.EBX[29] // EAX = 7, ECX = 0 a = 7; c = 0; asm volatile ("cpuid" :"=a"(a), "=b"(b), "=c"(c), "=d"(d) :"a"(a), "c"(c) ); // Intel® SHA Extensions feature bit is EBX[29] return ((b >> 29) & 1); } -- * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. * Context AMD added support in their processors for SHA Extensions[1] (sha_ni) starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. 
[1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
2017-04-23 16:54:39 Eric Desrochers description * Context AMD added support in their processors for SHA Extensions[1] (sha_ni) starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. * Context AMD added support in their processors for SHA Extensions[1] / CPU flag: sha_ni[2] starting with Ryzen CPU. 
Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
2017-04-23 16:56:42 Eric Desrochers description * Context AMD added support in their processors for SHA Extensions[1] / CPU flag: sha_ni[2] starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. * Context AMD added support in their processors for SHA Extensions[1] / CPU flag: sha_ni[2] starting with Ryzen CPU. 
Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
2017-04-23 16:59:27 Eric Desrochers description * Context AMD added support in their processors for SHA Extensions[1] / CPU flag: sha_ni[2] starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. 
* Context AMD added support in their processors for SHA Extensions[1] / CPU flag: sha_ni[2] starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
2017-04-24 18:12:29 Eric Desrochers description * Context AMD added support in their processors for SHA Extensions[1] / CPU flag: sha_ni[2] starting with Ryzen CPU. Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. 
* Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". 
* Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
2017-04-24 18:14:29 Eric Desrochers description * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". 
* Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... 
* Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. --> This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. --> This fix moves extended feature detection past basic feature detection where it belongs.
2017-04-24 18:15:05 Eric Desrochers description * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". 
* Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. --> This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. --> This fix moves extended feature detection past basic feature detection where it belongs. * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... 
* Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This fix moves extended feature detection past basic feature detection where it belongs.
2017-04-24 18:19:04 Eric Desrochers description * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". 
* Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This fix moves extended feature detection past basic feature detection where it belongs. * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... 
Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This fix moves extended feature detection past basic feature detection where it belongs.
2017-04-24 18:32:40 Eric Desrochers description * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This fix moves extended feature detection past basic feature detection where it belongs. * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. 
[1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. 
## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
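For reference, the CPUID check quoted in the description can also be written without hand-written inline assembly. The following is a minimal sketch, not the code from the Intel article or from OpenSSL: it assumes a GCC or Clang toolchain recent enough to ship `__get_cpuid_count` in `<cpuid.h>`, and the names `sha_bit_from_ebx` and `cpu_has_sha_extensions` are made up for illustration. Unlike the quoted function, it first confirms that CPUID leaf 7 exists before reading EBX bit 29.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang CPUID helpers (assumed available) */
#endif

/* Hypothetical helper: extract CPUID.(EAX=07H, ECX=0):EBX bit 29 (SHA). */
int sha_bit_from_ebx(unsigned int ebx)
{
    return (int)((ebx >> 29) & 1u);
}

/* Hypothetical helper: 1 if the CPU advertises the SHA extensions, else 0. */
int cpu_has_sha_extensions(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid_count returns 0 when the requested leaf is unsupported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return sha_bit_from_ebx(ebx);
#else
    /* Not an x86 build: the SHA extensions discussed here do not apply. */
    return 0;
#endif
}
```

Calling `cpu_has_sha_extensions()` from a small `main` and printing the result should give 1 on a CPU whose /proc/cpuinfo lists sha_ni and 0 otherwise, matching the behaviour described for the quoted function.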
2017-04-24 21:19:50 Eric Desrochers openssl (Ubuntu Xenial): assignee Eric Desrochers (slashd)
2017-04-24 21:19:52 Eric Desrochers openssl (Ubuntu Zesty): assignee Eric Desrochers (slashd)
2017-04-24 21:19:53 Eric Desrochers openssl (Ubuntu Artful): assignee Eric Desrochers (slashd)
2017-04-24 21:20:05 Eric Desrochers openssl (Ubuntu Artful): status Triaged In Progress
2017-04-25 02:35:26 Eric Desrochers bug watch added http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145
2017-04-25 16:16:39 Eric Desrochers description * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. 
[1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() { int a, b, c, d; // Look for CPUID.7.0.EBX[29] // EAX = 7, ECX = 0 a = 7; c = 0; asm volatile ("cpuid" :"=a"(a), "=b"(b), "=c"(c), "=d"(d) :"a"(a), "c"(c) ); // Intel® SHA Extensions feature bit is EBX[29] return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case] * Reproducible with Xenial/Zesty/Artful release. * Generated a checksum of a big file (e.g. 
5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s [Regression Potential] * None expected, it basically allow openssl to take benefit of sha extension potential (mostly performance-wise) if AMD cpu has the capability. * Generated a checksum of a big file (e.g. 5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Original Description] * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. 
[1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. 
## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
2017-04-25 16:21:26 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() { int a, b, c, d; // Look for CPUID.7.0.EBX[29] // EAX = 7, ECX = 0 a = 7; c = 0; asm volatile ("cpuid" :"=a"(a), "=b"(b), "=c"(c), "=d"(d) :"a"(a), "c"(c) ); // Intel® SHA Extensions feature bit is EBX[29] return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case] * Reproducible with Xenial/Zesty/Artful release. * Generated a checksum of a big file (e.g. 5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s [Regression Potential] * None expected, it basically allow openssl to take benefit of sha extension potential (mostly performance-wise) if AMD cpu has the capability. * Generated a checksum of a big file (e.g. 5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Original Description] * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). 
Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". 
* Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... 
Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check:
--
int CheckForIntelShaExtensions() {
   int a, b, c, d;

   // Look for CPUID.7.0.EBX[29]
   // EAX = 7, ECX = 0
   a = 7;
   c = 0;

   asm volatile ("cpuid"
        :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
        :"a"(a), "c"(c)
       );

   // Intel® SHA Extensions feature bit is EBX[29]
   return ((b >> 29) & 1);
}
--
On a CPU with sha_ni, the program returns "1"; otherwise it returns "0".

[Test Case]
 * Reproducible with Xenial/Zesty/Artful release.
 * Generated a checksum of a big file (e.g. 5GB file) with openssl
 $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
 SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8
 real 0m12.835s
 user 0m12.344s
 sys 0m0.484s

Performance is clearly better with the patch, which takes advantage of the SHA extensions.

[Regression Potential]
 * None expected; the change simply allows OpenSSL to take advantage of the SHA extensions (mostly performance-wise) when the AMD CPU has the capability.
 * Generated a checksum of a big file (e.g. 5GB file) with openssl
 $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
 SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8
 real 0m3.471s
 user 0m2.956s
 sys 0m0.516s

[Other Info]
 * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145
 * Upstream Repository : https://github.com/openssl/openssl.git
 * Upstream Commits :
 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection.
 ## This fix moves extended feature detection past basic feature detection where it belongs.
 f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards.
 ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
[Original Description] * Context AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". * Upstream work: - Repository : https://github.com/openssl/openssl.git - Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
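The CPUID check quoted in the description can be sketched a little more defensively. This is an illustration only, not the OpenSSL code: it assumes a GCC or Clang toolchain that ships `<cpuid.h>` with `__get_cpuid_max` and `__cpuid_count`, and the names `sha_bit_from_ebx` and `cpu_has_sha_extensions` are hypothetical. It confirms that basic leaf 7 exists before reading the extended feature flags, which is the ordering the upstream commits are concerned with.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header (toolchain assumption) */
#endif

/* Hypothetical helper: extract the SHA flag, EBX bit 29 of CPUID leaf 7. */
int sha_bit_from_ebx(unsigned int ebx)
{
    return (int)((ebx >> 29) & 1u);
}

/* Hypothetical helper: returns 1 if the CPU reports SHA extensions, else 0. */
int cpu_has_sha_extensions(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Basic detection first: the highest basic leaf must be at least 7. */
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;

    /* Extended feature flags live in CPUID.(EAX=07H, ECX=0). */
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return sha_bit_from_ebx(ebx);
#else
    return 0;   /* not an x86 CPU, so there is no CPUID instruction */
#endif
}
```

A caller could print `cpu_has_sha_extensions()`; on a CPU whose /proc/cpuinfo lists sha_ni the result should be 1, and 0 otherwise.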
2017-04-25 16:28:43 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. 
[1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 
5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * OpenSSL speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55k Performance is clearly better with the patch, which takes advantage of the SHA extension. (See the Regression Potential section for results with the patch.) [Regression Potential] * None expected; the patch simply allows OpenSSL to take advantage of the SHA extension (mostly performance-wise) if the AMD CPU has the capability. * Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
2017-04-25 16:34:02 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() { int a, b, c, d; // Look for CPUID.7.0.EBX[29] // EAX = 7, ECX = 0 a = 7; c = 0; asm volatile ("cpuid" :"=a"(a), "=b"(b), "=c"(c), "=d"(d) :"a"(a), "c"(c) ); // Intel® SHA Extensions feature bit is EBX[29] return ((b >> 29) & 1); } -- On a CPU with sha_ni the program returns "1"; otherwise it returns "0". [Test Case] * Reproducible with Xenial/Zesty/Artful release. * Generated a checksum of a big file (e.g. 5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * OpenSSL speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55k Performance is clearly better with the patch, which takes advantage of the SHA extension. 
(See Regression Potential section for result with patch) [Regression Potential] * Note from irc discussion with apw and rbasak : [10:03:20] <apw> slashd, for me some new functionality like that is ok as long as it is very self-contained so easy to review and confirm is only used on the new h/w [10:03:52] <apw>one of our main goals is to avoid regressions [10:12:24] <rbasak> The SRU policy does explicitly permit hardware enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be involved in mitigating risk and making the final risk decision, FWIW.  * None expected, it basically allow openssl to take benefit of sha extension potential (mostly performance-wise) if AMD cpu has the capability.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
2017-04-25 16:34:53 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... 
All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. 
-I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. (See Regression Potential section for result with patch) [Regression Potential]  * Note from irc discussion with apw and rbasak : [10:03:20] <apw> slashd, for me some new functionality like that is ok as long as it is very self-contained so easy to review and confirm is only used on the new h/w [10:03:52] <apw>one of our main goals is to avoid regressions [10:12:24] <rbasak> The SRU policy does explicitly permit hardware enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be involved in mitigating risk and making the final risk decision, FWIW.  * None expected, it basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting have the capability.  * Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
2017-04-25 16:39:36 Eric Desrochers openssl (Ubuntu Zesty): status Triaged In Progress
2017-04-25 16:39:37 Eric Desrochers openssl (Ubuntu Xenial): status Triaged In Progress
2017-04-25 16:39:40 Eric Desrochers openssl (Ubuntu Xenial): importance Undecided Medium
2017-04-25 16:39:42 Eric Desrochers openssl (Ubuntu Zesty): importance Undecided Medium
2017-04-25 17:21:28 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. 
(See Regression Potential section for result with patch) [Regression Potential]  * Note from irc discussion with apw and rbasak : [10:03:20] <apw> slashd, for me some new functionality like that is ok as long as it is very self-contained so easy to review and confirm is only used on the new h/w [10:03:52] <apw>one of our main goals is to avoid regressions [10:12:24] <rbasak> The SRU policy does explicitly permit hardware enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be involved in mitigating risk and making the final risk decision, FWIW.  * None expected, it basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting have the capability.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... 
All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. 
-I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. (See Regression Potential section for result with patch) [Regression Potential]  * Note from irc discussion with apw and rbasak : [10:03:20] <apw> slashd, for me some new functionality like that is ok as long as it is very self-contained so easy to review and confirm is only used on the new h/w [10:03:52] <apw>one of our main goals is to avoid regressions [10:12:24] <rbasak> The SRU policy does explicitly permit hardware enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be involved in mitigating risk and making the final risk decision, FWIW. Full discussion: https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not.  * Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
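For readers who want to reproduce the CPUID check quoted in the description above without hand-written inline assembly, here is a minimal sketch. It assumes GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid_max` and `__cpuid_count`. The function names `sha_bit_from_ebx` and `cpu_has_sha_extensions` are made up for illustration and are not part of OpenSSL or of the patch. Unlike the quoted function, it first confirms that CPUID leaf 7 exists, and on non-x86 builds it simply reports 0.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang CPUID helpers (assumed toolchain) */
#endif

/* Extract the SHA feature flag, bit 29 of CPUID.(EAX=07H, ECX=0):EBX. */
int sha_bit_from_ebx(unsigned int ebx)
{
    return (int)((ebx >> 29) & 1u);
}

/* Hypothetical helper: returns 1 when the CPU reports the SHA
 * extensions (sha_ni in /proc/cpuinfo), 0 otherwise. */
int cpu_has_sha_extensions(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 7 must be within the highest basic leaf the CPU supports. */
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;

    /* EAX = 7, ECX = 0 selects the structured extended feature flags. */
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return sha_bit_from_ebx(ebx);
#else
    return 0;   /* no CPUID instruction on this architecture */
#endif
}
```

On the Ryzen 5 1600 from the description, `cpu_has_sha_extensions()` would be expected to return 1, since its flags include sha_ni. This only shows what the hardware advertises and says nothing about which code path a given OpenSSL build selects.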
2017-04-26 00:27:19 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. 
(See Regression Potential section for result with patch) [Regression Potential]  * Note from irc discussion with apw and rbasak : [10:03:20] <apw> slashd, for me some new functionality like that is ok as long as it is very self-contained so easy to review and confirm is only used on the new h/w [10:03:52] <apw>one of our main goals is to avoid regressions [10:12:24] <rbasak> The SRU policy does explicitly permit hardware enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be involved in mitigating risk and making the final risk decision, FWIW. Full discussion: https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. 
-I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support for the SHA Extensions[1] (CPU flag: sha_ni[2]) to its processors starting with the Ryzen[3] CPUs. Note that Ryzen CPUs are 64-bit only (confirmed with an AMD representative). The current OpenSSL version on Ryzen still calls the SSSE3 SHA routine, because a number of extensions are effectively masked on Ryzen, and therefore shows no improvement.
[1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to perform the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() { int a, b, c, d; // Look for CPUID.7.0.EBX[29] // EAX = 7, ECX = 0 a = 7; c = 0; asm volatile ("cpuid" :"=a"(a), "=b"(b), "=c"(c), "=d"(d) :"a"(a), "c"(c) ); // Intel® SHA Extensions feature bit is EBX[29] return ((b >> 29) & 1); } -- On a CPU with sha_ni the function returns "1". Otherwise it returns "0". [Test Case] * Reproducible with Xenial/Zesty/Artful release. * Generated a checksum of a big file (e.g.
5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55k Performance is clearly better with the patch, which takes advantage of the SHA extensions. (See the Regression Potential section for results with the patch.) [Regression Potential] * Note from IRC discussion with apw and rbasak : [10:03:20] <apw> slashd, for me some new functionality like that is ok as long as it is very self-contained so easy to review and confirm is only used on the new h/w [10:03:52] <apw>one of our main goals is to avoid regressions [10:12:24] <rbasak> The SRU policy does explicitly permit hardware enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be involved in mitigating risk and making the final risk decision, FWIW.
Full discussion: https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2 * It basically allows openssl to benefit from the SHA extensions (mostly performance-wise) now that new AMD CPUs are starting to have the capability. * The code checks the CPUID bit to determine whether the SHA instructions are available or not. * Generated a checksum of a big file (e.g. 5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection.
## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
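The before/after figures quoted in the description above can be turned into speedup factors with a few divisions. The sketch below only re-uses numbers that appear in the log; the helper names `ratio` and `print_sha1_gains` are invented for illustration. For the 8192-byte sha1 case the throughput gain works out to roughly 1.95x, and the `dgst -sha256` wall-clock time on the 5GB file improves by roughly 3.7x. The two measurements use different hash algorithms, so they are not directly comparable.

```c
#include <stdio.h>

/* Ratio of two positive measurements; returns 0.0 for a non-positive divisor. */
double ratio(double numerator, double denominator)
{
    return denominator > 0.0 ? numerator / denominator : 0.0;
}

/* Hypothetical helper: prints the gain per block size from the
 * "openssl speed sha1" tables quoted in the bug description. */
void print_sha1_gains(void)
{
    static const int block_bytes[] = {16, 64, 256, 1024, 8192};
    /* Throughput in 1000s of bytes per second, without the patch. */
    static const double before_k[] = {53168.81, 171075.50, 449859.55,
                                      756758.87, 949840.55};
    /* Throughput in 1000s of bytes per second, with the patch. */
    static const double after_k[] = {64436.75, 246697.60, 714675.29,
                                     1361115.48, 1851490.30};
    int i;

    for (i = 0; i < 5; i++)
        printf("%5d bytes: %.2fx\n", block_bytes[i],
               ratio(after_k[i], before_k[i]));

    /* Wall-clock "real" time for dgst -sha256: 12.835s before, 3.471s after. */
    printf("dgst -sha256 wall time: %.2fx faster\n", ratio(12.835, 3.471));
}
```

The gain grows with block size, which is consistent with per-call overhead dominating the small-block cases.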
2017-04-26 00:28:13 Eric Desrochers tags sts sts sts-sru
2017-04-26 19:47:19 Eric Desrochers attachment added artful_openssl_lp1674399.debdiff https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/+attachment/4868343/+files/artful_openssl_lp1674399.debdiff
2017-04-26 19:56:55 Eric Desrochers bug added subscriber SRU Verification
2017-04-26 20:26:57 Ubuntu Foundations Team Bug Bot tags sts sts-sru patch sts sts-sru
2017-04-26 21:05:07 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. 
(See Regression Potential section for result with patch) [Regression Potential]  * Note from irc discussion with apw and rbasak : [10:03:20] <apw> slashd, for me some new functionality like that is ok as long as it is very self-contained so easy to review and confirm is only used on the new h/w [10:03:52] <apw>one of our main goals is to avoid regressions [10:12:24] <rbasak> The SRU policy does explicitly permit hardware enablement in an LTS IIRC, though I'd expect ~ubuntu-sru to be involved in mitigating risk and making the final risk decision, FWIW. Full discussion: https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. 
-I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. 
[1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. (See Regression Potential section for result with patch) [Regression Potential] * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not.  * Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
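The CPUID check quoted in the description can also be written without inline assembly. The following is a minimal sketch, assuming GCC or Clang and their `<cpuid.h>` helper `__get_cpuid_count`; the function names `sha_bit_from_ebx` and `cpu_has_sha_extensions` are hypothetical and are not part of OpenSSL or of the patch discussed here.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Extract the SHA feature bit (bit 29) from a CPUID.(EAX=07H, ECX=0) EBX value. */
int sha_bit_from_ebx(unsigned int ebx)
{
    return (int)((ebx >> 29) & 1u);
}

/* Return 1 if the running CPU advertises the SHA extensions, 0 otherwise. */
int cpu_has_sha_extensions(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid_count returns 0 when leaf 7 is not supported at all. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return sha_bit_from_ebx(ebx);
#else
    /* Assumption: on non-x86 targets there is no CPUID, so report "not available". */
    return 0;
#endif
}
```

Calling `cpu_has_sha_extensions()` from a small `main` and printing the result should agree with the presence of `sha_ni` in /proc/cpuinfo on the same machine.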
2017-04-26 21:12:40 Eric Desrochers bug added subscriber Ubuntu Sponsors Team
2017-04-26 21:17:41 Eric Desrochers attachment added zesty_openssl_lp1674399.debdiff https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/+attachment/4868385/+files/zesty_openssl_lp1674399.debdiff
2017-04-26 21:18:18 Eric Desrochers attachment added xenial_openssl_lp1674399.debdiff https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/+attachment/4868386/+files/xenial_openssl_lp1674399.debdiff
2017-04-26 21:19:48 Eric Desrochers bug added subscriber David A. Desrosiers
2017-04-27 15:58:36 Marc Deslauriers openssl (Ubuntu Artful): status In Progress Triaged
2017-04-27 15:58:39 Marc Deslauriers openssl (Ubuntu Artful): status Triaged Fix Committed
2017-04-27 18:25:24 Launchpad Janitor openssl (Ubuntu Artful): status Fix Committed Fix Released
2017-04-27 20:12:44 Eric Desrochers attachment removed zesty_openssl_lp1674399.debdiff https://bugs.launchpad.net/ubuntu/zesty/+source/openssl/+bug/1674399/+attachment/4868385/+files/zesty_openssl_lp1674399.debdiff
2017-04-27 21:11:58 Eric Desrochers nominated for series Ubuntu Yakkety
2017-04-27 21:11:58 Eric Desrochers bug task added openssl (Ubuntu Yakkety)
2017-04-27 21:12:09 Eric Desrochers openssl (Ubuntu Yakkety): status New In Progress
2017-04-27 21:12:12 Eric Desrochers openssl (Ubuntu Yakkety): importance Undecided Medium
2017-04-27 21:12:26 Eric Desrochers openssl (Ubuntu Yakkety): assignee Eric Desrochers (slashd)
2017-04-28 01:12:12 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. 
(See Regression Potential section for result with patch) [Regression Potential] * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... 
All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. 
-I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. (See Regression Potential section for result with patch) [Regression Potential]  * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not. * Maintainer comment proves that he did the successfully tested on Intel with/without SHA extension Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-ryzen) and it does reveal a significant performance increase on Ryzen : (Note that the performance remain the same on non-sha extension AMD CPU, as expected) # Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
2017-04-28 02:27:40 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. 
(See Regression Potential section for result with patch) [Regression Potential]  * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not. * Maintainer comment proves that he did the successfully tested on Intel with/without SHA extension Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-ryzen) and it does reveal a significant performance increase on Ryzen : (Note that the performance remain the same on non-sha extension AMD CPU, as expected) # Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. 
-I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
[Impact]
 * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPUs. Note that Ryzen CPUs come in 64-bit only (confirmed with an AMD representative). On Ryzen, the current OpenSSL version still calls the SSSE3 SHA routine, because a number of extensions are effectively masked there, and so it shows no improvement.
[1] /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD Ryzen 5 1600 Six-Core Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
[2] - sha_ni: SHA1/SHA256 Instruction Extensions
[3] - https://en.wikipedia.org/wiki/Ryzen
... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ...
 * Program to perform the CPUID check:
Reference : https://software.intel.com/en-us/articles/intel-sha-extensions
... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check:
--
int CheckForIntelShaExtensions() {
   int a, b, c, d;

   // Look for CPUID.7.0.EBX[29]
   // EAX = 7, ECX = 0
   a = 7;
   c = 0;

   asm volatile ("cpuid"
        :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
        :"a"(a), "c"(c)
       );

   // Intel® SHA Extensions feature bit is EBX[29]
   return ((b >> 29) & 1);
}
--
On a CPU with sha_ni the function returns "1". Otherwise it returns "0".
[Test Case]
 * Reproducible with the Xenial/Zesty/Artful releases.
 * Generated a checksum of a big file (e.g.
5GB file) with openssl
$ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile
SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8
real 0m12.835s
user 0m12.344s
sys 0m0.484s
 * Openssl speed
$ openssl speed sha1
Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s
Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1             53168.81k   171075.50k   449859.55k   756758.87k   949840.55
Performance is clearly better with the patch, which takes advantage of the SHA extensions. (See the Regression Potential section for results with the patch.)
[Regression Potential]
 * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8
 * Note from IRC discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2
 * It basically allows openssl to take advantage of the SHA extensions (mostly performance-wise) now that new AMD CPUs are starting to have the capability.
 * The code checks the CPUID bit to determine whether the SHA instructions are available or not.
* The maintainer's comment confirms that he successfully tested on Intel with and without the SHA extension. Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and a non-Ryzen AMD system) and on a non-SHA Intel CPU. It reveals a significant performance increase on Ryzen due to the SHA extension : (Note that performance remains the same on CPUs without the SHA extension (AMD/Intel), as expected, since they cannot benefit from it) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g. 5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
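The CPUID check quoted in the description above can also be written without hand-rolled inline assembly. The following is a minimal sketch, assuming a GCC- or Clang-compatible compiler that ships `<cpuid.h>`; the names `reg_bit`, `SHA_EBX_BIT` and `cpu_has_sha_extensions` are hypothetical and are not taken from the OpenSSL patch or from the Intel article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; assumed to be available */
#endif

/* SHA feature flag position in CPUID.(EAX=07H, ECX=0):EBX, per the text above. */
#define SHA_EBX_BIT 29u

/* Hypothetical helper: return bit `bit` (0..31) of a 32-bit register value. */
int reg_bit(uint32_t reg, unsigned int bit)
{
    assert(bit < 32);   /* shifting a 32-bit value by 32 or more is undefined */
    return (int)((reg >> bit) & 1u);
}

/*
 * Hypothetical wrapper: 1 if the CPU reports the SHA extensions, else 0.
 * Returns 0 on non-x86 builds and on CPUs whose highest basic CPUID leaf
 * is below 7, where the extended feature flags do not exist.
 */
int cpu_has_sha_extensions(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (__get_cpuid_max(0, NULL) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);   /* EAX = 7, ECX = 0 */
    return reg_bit(ebx, SHA_EBX_BIT);
#else
    return 0;
#endif
}
```

Checking the maximum supported leaf before reading leaf 7 is a precaution added in this sketch; the quoted function does not do it. Calling `cpu_has_sha_extensions()` should agree with the presence of `sha_ni` in /proc/cpuinfo.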
2017-04-28 02:28:36 Eric Desrochers attachment added zesty_openssl_lp1674399.debdiff https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/+attachment/4868958/+files/zesty_openssl_lp1674399.debdiff
2017-04-28 02:28:56 Eric Desrochers attachment added yakkety_openssl_lp1674399.debdiff https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/+attachment/4868959/+files/yakkety_openssl_lp1674399.debdiff
2017-05-02 17:52:28 Eric Desrochers removed subscriber Ubuntu Sponsors Team
2017-05-04 20:21:37 Brian Murray openssl (Ubuntu Yakkety): status In Progress Fix Committed
2017-05-04 20:21:46 Brian Murray bug added subscriber Ubuntu Stable Release Updates Team
2017-05-04 20:21:52 Brian Murray tags patch sts sts-sru patch sts sts-sru verification-needed
2017-05-04 22:19:37 Eric Desrochers bug watch added https://github.com/openssl/openssl/issues/2848
2017-05-04 22:20:02 Eric Desrochers tags patch sts sts-sru verification-needed patch sts sts-sru verification-done-yakkety
2017-05-08 15:32:14 Łukasz Zemczak openssl (Ubuntu Zesty): status In Progress Fix Committed
2017-05-08 15:32:21 Łukasz Zemczak tags patch sts sts-sru verification-done-yakkety patch sts sts-sru verification-done-yakkety verification-needed
2017-05-08 15:34:43 Łukasz Zemczak openssl (Ubuntu Xenial): status In Progress Fix Committed
2017-05-15 15:43:22 Eric Desrochers tags patch sts sts-sru verification-done-yakkety verification-needed patch sts sts-sru verification-done-xenial verification-done-yakkety verification-done-zesty verification-needed
2017-05-15 15:43:37 Eric Desrochers tags patch sts sts-sru verification-done-xenial verification-done-yakkety verification-done-zesty verification-needed patch sts-sru ua verification-done-xenial verification-done-yakkety verification-done-zesty
2017-05-15 15:50:22 Eric Desrochers tags patch sts-sru ua verification-done-xenial verification-done-yakkety verification-done-zesty patch sts-sru-done ua verification-done-xenial verification-done-yakkety verification-done-zesty
2017-05-19 07:23:34 William Grant tags patch sts-sru-done ua verification-done-xenial verification-done-yakkety verification-done-zesty patch regression-proposed sts-sru-done ua verification-done-xenial verification-done-yakkety verification-done-zesty
2017-05-19 07:26:23 William Grant bug added subscriber William Grant
2017-05-19 12:06:02 Eric Desrochers tags patch regression-proposed sts-sru-done ua verification-done-xenial verification-done-yakkety verification-done-zesty patch regression-proposed sts-sru-done ua verification-failed
2017-05-19 13:54:57 Łukasz Zemczak tags patch regression-proposed sts-sru-done ua verification-failed patch regression-proposed sts-sru-done ua
2017-05-19 13:54:58 Łukasz Zemczak tags patch regression-proposed sts-sru-done ua patch regression-proposed sts-sru-done ua verification-needed
2017-05-24 19:37:38 Ubuntu Foundations Team Bug Bot bug added subscriber Brian Murray
2017-05-24 19:37:40 Ubuntu Foundations Team Bug Bot tags patch regression-proposed sts-sru-done ua verification-needed patch regression-proposed sts-sru-done ua verification-failed verification-needed
2017-05-24 19:47:51 Eric Desrochers tags patch regression-proposed sts-sru-done ua verification-failed verification-needed patch sts-sru-done ua verification-needed
2017-05-25 19:37:55 Ubuntu Foundations Team Bug Bot tags patch sts-sru-done ua verification-needed patch sts-sru-done ua verification-failed verification-needed
2017-05-25 19:57:22 Eric Desrochers tags patch sts-sru-done ua verification-failed verification-needed patch sts-sru-done ua verification-needed
2017-05-25 19:57:28 Eric Desrochers tags patch sts-sru-done ua verification-needed sts-sru-done ua verification-needed
2017-05-25 19:58:03 Eric Desrochers tags sts-sru-done ua verification-needed sts-sru-done verification-needed
2017-05-26 19:39:53 Ubuntu Foundations Team Bug Bot tags sts-sru-done verification-needed sts-sru-done verification-failed verification-needed
2017-05-29 15:26:45 Eric Desrochers tags sts-sru-done verification-failed verification-needed sts-sru-done verification-needed
2017-05-29 19:39:59 Ubuntu Foundations Team Bug Bot tags sts-sru-done verification-needed sts-sru-done verification-failed verification-needed
2017-05-29 21:34:52 William Grant tags sts-sru-done verification-failed verification-needed sts-sru-done verification-done
2017-06-01 16:08:37 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. 
(See Regression Potential section for result with patch) [Regression Potential]  * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not. * Maintainer comment proves that he did the successfully tested on Intel with/without SHA extension Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-ryzen) and non-sha INTEL cpu. It does reveal a significant performance increase on Ryzen due to the sha extension : (Note that the performance remain the same on non-sha extension CPU (AMD/INTEL), as expected since they don't take benefit of the sha extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. 
Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. 
(See Regression Potential section for result with patch) [Regression Potential]  * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not. * Maintainer comment proves that he did the successfully tested on Intel with/without SHA extension Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-ryzen) and non-sha INTEL cpu. It does reveal a significant performance increase on Ryzen due to the sha extension : (Note that the performance remain the same on non-sha extension CPU (AMD/INTEL), as expected since they don't take benefit of the sha extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-hwe (i386): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-hwe (amd64): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-gke (amd64): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-hwe-edge (i386): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-hwe-edge (amd64): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-aws (amd64): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel In this case all the linux-* regressions are failing because the autopkgtest script debian/tests/ubuntu-regression-suite fails if the kernel running during the test is different from the kernel that the kernel team believes it should run against. It compares the 'dpkg-parsechangelog -SVersion' output against the version in /proc/version_signature, and if they don't match it fails.
if [ "$sver" != "$rver" ]; then echo "ERROR: running version does not match source package" 1>&2 exit 1 fi buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. * Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log ==> TIMED OUT, I re-ran the test, now waiting. * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> TIMED OUT, I re-ran the test, now waiting. * Regression in autopkgtest for nodejs (s390x): test log ==> This test has been failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC), prior to this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/m/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
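The summary rows of the `openssl speed sha1` output quoted in the description above can be reproduced from the raw "Doing sha1 ..." counts, which makes the before/after comparison easier to verify. The following is a small sketch of that arithmetic; the function names `throughput_kbytes_per_sec` and `speedup` are hypothetical and are not part of OpenSSL.

```c
#include <assert.h>

/*
 * Throughput in thousands of bytes per second, the unit named in the
 * quoted output ("The 'numbers' are in 1000s of bytes per second").
 *   ops         - number of hash operations completed
 *   block_bytes - size of each hashed block in bytes
 *   seconds     - measured time in seconds
 */
double throughput_kbytes_per_sec(unsigned long long ops,
                                 unsigned int block_bytes,
                                 double seconds)
{
    assert(seconds > 0.0);
    return (double)ops * (double)block_bytes / seconds / 1000.0;
}

/* Ratio of patched to unpatched throughput. */
double speedup(double patched, double unpatched)
{
    assert(unpatched > 0.0);
    return patched / unpatched;
}
```

With the 8192-byte figures from the description, 347842 operations in 3.00s give about 949840.55k without the patch and 678036 operations in 3.00s give about 1851490.30k with it, a ratio of roughly 1.95. The `openssl dgst -sha256` wall-clock times (12.835s versus 3.471s) differ by a factor of about 3.7.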
2017-06-01 16:10:25 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. 
(See Regression Potential section for result with patch) [Regression Potential]  * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not. * Maintainer comment proves that he did the successfully tested on Intel with/without SHA extension Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-ryzen) and non-sha INTEL cpu. It does reveal a significant performance increase on Ryzen due to the sha extension : (Note that the performance remain the same on non-sha extension CPU (AMD/INTEL), as expected since they don't take benefit of the sha extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-hwe (i386): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-hwe (amd64): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-gke (amd64): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-hwe-edge (i386): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-hwe-edge (amd64): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel * Regression in autopkgtest for linux-aws (amd64): test log ==> ERROR: running version does not match source package autopkgtest machine just doesn't run the expected kernel In this case all the linux-* regressions are failling because the autopkgtest script: debian/tests/ubuntu-regression-suite fails if the running kernel during the test is different from the the kernel that kernel team believe it should run against. It compare the 'dpkg-parsechangelog -SVersion' ouput agains the version in /proc/version_signature and if they don't match it fails. 
if [ "$sver" != "$rver" ]; then echo "ERROR: running version does not match source package" 1>&2 exit 1 fi buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. * Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log ==> TIMED OUT, I re-ran the test, now waiting. * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> TIMED OUT, I re-ran the test, now waiting. * Regression in autopkgtest for nodejs (s390x): test log ==> This test is failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC) prior this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/m/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. 
[Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with the Ryzen[3] CPU. Note that Ryzen CPUs are 64-bit only (confirmed with an AMD representative). The current OpenSSL version on Ryzen still calls the SSSE3 routine for SHA: a number of extensions are effectively masked on Ryzen, so it shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program that performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29].
The following C function, using inline assembly, performs the CPUID check:
--
int CheckForIntelShaExtensions() {
   int a, b, c, d;

   // Look for CPUID.7.0.EBX[29]
   // EAX = 7, ECX = 0
   a = 7;
   c = 0;

   asm volatile ("cpuid"
        :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
        :"a"(a), "c"(c)
       );

   // Intel® SHA Extensions feature bit is EBX[29]
   return ((b >> 29) & 1);
}
--
On a CPU with sha_ni the function returns "1"; otherwise it returns "0". [Test Case] * Reproducible with the Xenial/Zesty/Artful releases. * Generated a checksum of a big file (e.g. 5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55k Performance is clearly better with the patch, which takes advantage of the SHA extension.
(See the Regression Potential section for results with the patch) [Regression Potential] * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8 * Note from IRC discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2 * It basically allows openssl to take advantage of the SHA extension (mostly performance-wise) now that new AMD CPUs are starting to have the capability. * The code checks the CPUID bit to determine whether the SHA instructions are available or not. * The maintainer's comment confirms that he successfully tested on Intel with/without the SHA extension. Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-Ryzen) and a non-SHA Intel CPU. It does reveal a significant performance increase on Ryzen due to the SHA extension : (Note that the performance remains the same on CPUs without the SHA extension (AMD/Intel), as expected, since they don't benefit from the SHA extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g.
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log * Regression in autopkgtest for linux-hwe (i386): test log * Regression in autopkgtest for linux-hwe (amd64): test log * Regression in autopkgtest for linux-gke (amd64): test log * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log * Regression in autopkgtest for linux-hwe-edge (i386): test log * Regression in autopkgtest for linux-hwe-edge (amd64): test log * Regression in autopkgtest for linux-aws (amd64): test log In this case all the linux-* regressions are failing because the autopkgtest script: debian/tests/ubuntu-regression-suite fails if the running kernel during the test is different from the the kernel that kernel team believe it should run against. It compares the 'dpkg-parsechangelog -SVersion' output against the version in /proc/version_signature and if they don't match it fails. if [ "$sver" != "$rver" ]; then     echo "ERROR: running version does not match source package" 1>&2     exit 1 fi Buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. * Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log ==> TIMED OUT, I re-ran the test, now waiting. * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> TIMED OUT, I re-ran the test, now waiting. 
* Regression in autopkgtest for nodejs (s390x): test log ==> This test has been failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC), prior to this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/m/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
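The kernel version check described in the autopkgtest notes above can be sketched as a small shell function. This is only a minimal sketch, not the actual debian/tests/ubuntu-regression-suite script: the function name check_kernel_version is hypothetical, and the two versions are passed in as arguments instead of being read from 'dpkg-parsechangelog -SVersion' and /proc/version_signature.

```shell
#!/bin/sh
# Hypothetical helper mirroring the comparison quoted in the description.
# $1 = source package version (the real script reads it with
#      dpkg-parsechangelog -SVersion; here it is an argument)
# $2 = running kernel version (the real script derives it from
#      /proc/version_signature; here it is an argument)
check_kernel_version() {
    sver=$1
    rver=$2
    if [ "$sver" != "$rver" ]; then
        echo "ERROR: running version does not match source package" 1>&2
        return 1
    fi
    return 0
}
```

With the values from the build log, check_kernel_version "4.8.0-54.57~16.04.1" "4.4.0-79.100" prints the error and returns 1, which is why every linux-* test fails on a testbed that runs a different kernel.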
2017-06-01 19:45:53 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with the Ryzen[3] CPU. Note that Ryzen CPUs are 64-bit only (confirmed with an AMD representative). The current OpenSSL version on Ryzen still calls the SSSE3 routine for SHA: a number of extensions are effectively masked on Ryzen, so it shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program that performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29].
The following C function, using inline assembly, performs the CPUID check:
--
int CheckForIntelShaExtensions() {
   int a, b, c, d;

   // Look for CPUID.7.0.EBX[29]
   // EAX = 7, ECX = 0
   a = 7;
   c = 0;

   asm volatile ("cpuid"
        :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
        :"a"(a), "c"(c)
       );

   // Intel® SHA Extensions feature bit is EBX[29]
   return ((b >> 29) & 1);
}
--
On a CPU with sha_ni the function returns "1"; otherwise it returns "0". [Test Case] * Reproducible with the Xenial/Zesty/Artful releases. * Generated a checksum of a big file (e.g. 5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55k Performance is clearly better with the patch, which takes advantage of the SHA extension.
(See the Regression Potential section for results with the patch) [Regression Potential] * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8 * Note from IRC discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2 * It basically allows openssl to take advantage of the SHA extension (mostly performance-wise) now that new AMD CPUs are starting to have the capability. * The code checks the CPUID bit to determine whether the SHA instructions are available or not. * The maintainer's comment confirms that he successfully tested on Intel with/without the SHA extension. Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-Ryzen) and a non-SHA Intel CPU. It does reveal a significant performance increase on Ryzen due to the SHA extension : (Note that the performance remains the same on CPUs without the SHA extension (AMD/Intel), as expected, since they don't benefit from the SHA extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g.
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log * Regression in autopkgtest for linux-hwe (i386): test log * Regression in autopkgtest for linux-hwe (amd64): test log * Regression in autopkgtest for linux-gke (amd64): test log * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log * Regression in autopkgtest for linux-hwe-edge (i386): test log * Regression in autopkgtest for linux-hwe-edge (amd64): test log * Regression in autopkgtest for linux-aws (amd64): test log In this case all the linux-* regressions are failing because the autopkgtest script: debian/tests/ubuntu-regression-suite fails if the running kernel during the test is different from the the kernel that kernel team believe it should run against. It compares the 'dpkg-parsechangelog -SVersion' output against the version in /proc/version_signature and if they don't match it fails. if [ "$sver" != "$rver" ]; then     echo "ERROR: running version does not match source package" 1>&2     exit 1 fi Buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. * Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log ==> TIMED OUT, I re-ran the test, now waiting. * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> TIMED OUT, I re-ran the test, now waiting. 
* Regression in autopkgtest for nodejs (s390x): test log ==> This test has been failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC), prior to this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/m/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with the Ryzen[3] CPU. Note that Ryzen CPUs are 64-bit only (confirmed with an AMD representative). The current OpenSSL version on Ryzen still calls the SSSE3 routine for SHA: a number of extensions are effectively masked on Ryzen, so it shows no improvement.
[1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program that performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check:
--
int CheckForIntelShaExtensions() {
   int a, b, c, d;

   // Look for CPUID.7.0.EBX[29]
   // EAX = 7, ECX = 0
   a = 7;
   c = 0;

   asm volatile ("cpuid"
        :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
        :"a"(a), "c"(c)
       );

   // Intel® SHA Extensions feature bit is EBX[29]
   return ((b >> 29) & 1);
}
--
On a CPU with sha_ni the function returns "1"; otherwise it returns "0". [Test Case] * Reproducible with the Xenial/Zesty/Artful releases. * Generated a checksum of a big file (e.g.
5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55k Performance is clearly better with the patch, which takes advantage of the SHA extension. (See the Regression Potential section for results with the patch) [Regression Potential] * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8 * Note from IRC discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2 * It basically allows openssl to take advantage of the SHA extension (mostly performance-wise) now that new AMD CPUs are starting to have the capability. * The code checks the CPUID bit to determine whether the SHA instructions are available or not.
* The maintainer's comment confirms that he successfully tested on Intel with/without the SHA extension. Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-Ryzen) and a non-SHA Intel CPU. It does reveal a significant performance increase on Ryzen due to the SHA extension : (Note that the performance remains the same on CPUs without the SHA extension (AMD/Intel), as expected, since they don't benefit from the SHA extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g. 5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log * Regression in autopkgtest for linux-hwe (i386): test log * Regression in autopkgtest for linux-hwe (amd64): test log * Regression in autopkgtest for linux-gke (amd64): test log * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log * Regression in autopkgtest for linux-hwe-edge (i386): test log * Regression in autopkgtest for linux-hwe-edge (amd64): test log * Regression in autopkgtest for linux-aws (amd64): test log In this case all the linux-* regressions are failing because the autopkgtest script: debian/tests/ubuntu-regression-suite fails if the running kernel during the test is different from the the kernel that kernel team believe it should run against. It compares the 'dpkg-parsechangelog -SVersion' output against the version in /proc/version_signature and if they don't match it fails. if [ "$sver" != "$rver" ]; then     echo "ERROR: running version does not match source package" 1>&2     exit 1 fi Buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. 
* Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> ubuntu-regression-suite SKIP Test requires machine-level isolation but testbed does not provide that debian/tests/control uses the "isolation-machine" restriction, which prevents the test from running in a schroot/container. * Regression in autopkgtest for nodejs (s390x): test log ==> This test has been failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC), prior to this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/m/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
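The 'openssl speed' figures quoted in the description above are consistent with a simple formula: operations completed, times block size, divided by elapsed seconds, divided by 1000. The sketch below reproduces the 8192-byte sha1 figures from the quoted runs and the resulting speedup. The function names throughput_k and speedup are hypothetical, and the formula is an assumption inferred from the quoted numbers, not taken from the OpenSSL source.

```shell
#!/bin/sh
# Hypothetical helper: throughput in 1000s of bytes per second,
# assuming figure = count * block_size / seconds / 1000.
# $1 = operations completed, $2 = block size in bytes, $3 = elapsed seconds
throughput_k() {
    awk -v n="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", n * b / s / 1000 }'
}

# Hypothetical helper: ratio of the patched figure to the unpatched one.
# $1 = patched figure, $2 = unpatched figure
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

throughput_k 347842 8192 3.00      # without the patch: prints 949840.55
throughput_k 678036 8192 3.00      # with the patch: prints 1851490.30
speedup 1851490.30 949840.55       # prints 1.95
```

On these numbers the patched build processes 8192-byte blocks roughly 1.95 times faster on the Ryzen test system, in line with the drop from 12.835s to 3.471s reported for the sha256 digest of the 5GB file.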
2017-06-01 19:48:01 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with the Ryzen[3] CPU. Note that Ryzen CPUs are 64-bit only (confirmed with an AMD representative). The current OpenSSL version on Ryzen still calls the SSSE3 routine for SHA: a number of extensions are effectively masked on Ryzen, so it shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program that performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29].
The following C function, using inline assembly, performs the CPUID check:
--
int CheckForIntelShaExtensions() {
   int a, b, c, d;

   // Look for CPUID.7.0.EBX[29]
   // EAX = 7, ECX = 0
   a = 7;
   c = 0;

   asm volatile ("cpuid"
        :"=a"(a), "=b"(b), "=c"(c), "=d"(d)
        :"a"(a), "c"(c)
       );

   // Intel® SHA Extensions feature bit is EBX[29]
   return ((b >> 29) & 1);
}
--
On a CPU with sha_ni the function returns "1"; otherwise it returns "0". [Test Case] * Reproducible with the Xenial/Zesty/Artful releases. * Generated a checksum of a big file (e.g. 5GB file) with openssl $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55k Performance is clearly better with the patch, which takes advantage of the SHA extension.
(See the Regression Potential section for results with the patch) [Regression Potential] * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8 * Note from IRC discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2 * It basically allows openssl to take advantage of the SHA extension (mostly performance-wise) now that new AMD CPUs are starting to have the capability. * The code checks the CPUID bit to determine whether the SHA instructions are available or not. * The maintainer's comment confirms that he successfully tested on Intel with/without the SHA extension. Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-Ryzen) and a non-SHA Intel CPU. It does reveal a significant performance increase on Ryzen due to the SHA extension : (Note that the performance remains the same on CPUs without the SHA extension (AMD/Intel), as expected, since they don't benefit from the SHA extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g.
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log * Regression in autopkgtest for linux-hwe (i386): test log * Regression in autopkgtest for linux-hwe (amd64): test log * Regression in autopkgtest for linux-gke (amd64): test log * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log * Regression in autopkgtest for linux-hwe-edge (i386): test log * Regression in autopkgtest for linux-hwe-edge (amd64): test log * Regression in autopkgtest for linux-aws (amd64): test log In this case all the linux-* regressions are failing because the autopkgtest script: debian/tests/ubuntu-regression-suite fails if the running kernel during the test is different from the the kernel that kernel team believe it should run against. It compares the 'dpkg-parsechangelog -SVersion' output against the version in /proc/version_signature and if they don't match it fails. if [ "$sver" != "$rver" ]; then     echo "ERROR: running version does not match source package" 1>&2     exit 1 fi Buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. 
* Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> ubuntu-regression-suite SKIP Test requires machine-level isolation but testbed does not provide that The debian/test/control is using "isolation-machine" restrictions, which prevent the test to run in a schroot/container. * Regression in autopkgtest for nodejs (s390x): test log ==> This test is failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC) prior this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/m/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. 
[1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. (See Regression Potential section for result with patch) [Regression Potential]  * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not. 
* Maintainer comment proves that he did the successfully tested on Intel with/without SHA extension Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-ryzen) and non-sha INTEL cpu. It does reveal a significant performance increase on Ryzen due to the sha extension : (Note that the performance remain the same on non-sha extension CPU (AMD/INTEL), as expected since they don't take benefit of the sha extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log * Regression in autopkgtest for linux-hwe (i386): test log * Regression in autopkgtest for linux-hwe (amd64): test log * Regression in autopkgtest for linux-gke (amd64): test log * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log * Regression in autopkgtest for linux-hwe-edge (i386): test log * Regression in autopkgtest for linux-hwe-edge (amd64): test log * Regression in autopkgtest for linux-aws (amd64): test log In this case all the linux-* regressions are failing because the autopkgtest script: debian/tests/ubuntu-regression-suite fails if the running kernel during the test is different from the the kernel that kernel team believe it should run against. It compares the 'dpkg-parsechangelog -SVersion' output against the version in /proc/version_signature and if they don't match it fails. if [ "$sver" != "$rver" ]; then     echo "ERROR: running version does not match source package" 1>&2     exit 1 fi Buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. 
* Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> ubuntu-regression-suite SKIP Test requires machine-level isolation but testbed does not provide that The debian/test/control is using "isolation-machine" restrictions, which prevent the test to run in a schroot/container. The test will have better chance to succeed if running inside qemu. * Regression in autopkgtest for nodejs (s390x): test log ==> This test is failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC) prior this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/m/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
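The CPUID check quoted in the description above can also be sketched with the compiler's `<cpuid.h>` helpers instead of hand-written inline assembly. This is a minimal sketch, not the code OpenSSL uses: it assumes a GCC or Clang toolchain, and the helper names `cpuid_bit`, `ebx_has_sha` and `cpu_has_sha_extensions` are hypothetical. Unlike the quoted function, it first confirms that leaf 7 is within the maximum basic leaf, since querying an unsupported leaf may return unrelated data.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header (toolchain assumption) */
#endif

/* Hypothetical helper: extract bit `bit` (0..31) from a CPUID register value. */
static int cpuid_bit(uint32_t reg, unsigned bit)
{
    assert(bit < 32);
    return (int)((reg >> bit) & 1u);
}

/* SHA Extensions are reported in CPUID.(EAX=07H, ECX=0):EBX[29]. */
static int ebx_has_sha(uint32_t ebx)
{
    return cpuid_bit(ebx, 29);
}

/* Returns 1 if the SHA bit is set, 0 if it is not, -1 on a non-x86 build. */
static int cpu_has_sha_extensions(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 7 is only meaningful if the maximum basic leaf is at least 7. */
    if (__get_cpuid_max(0, 0) < 7)
        return 0;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return ebx_has_sha(ebx);
#else
    return -1;
#endif
}
```

On a CPU that lists `sha_ni` in /proc/cpuinfo, `cpu_has_sha_extensions()` would be expected to return 1. Keeping the bit extraction in a separate pure function makes it possible to check the logic on machines without the extension.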
2017-06-01 20:19:09 Brian Murray tags sts-sru-done verification-done sts-sru-done verification-needed
2017-06-05 14:48:56 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. 
(See Regression Potential section for result with patch) [Regression Potential]  * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not. * Maintainer comment proves that he did the successfully tested on Intel with/without SHA extension Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-ryzen) and non-sha INTEL cpu. It does reveal a significant performance increase on Ryzen due to the sha extension : (Note that the performance remain the same on non-sha extension CPU (AMD/INTEL), as expected since they don't take benefit of the sha extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log * Regression in autopkgtest for linux-hwe (i386): test log * Regression in autopkgtest for linux-hwe (amd64): test log * Regression in autopkgtest for linux-gke (amd64): test log * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log * Regression in autopkgtest for linux-hwe-edge (i386): test log * Regression in autopkgtest for linux-hwe-edge (amd64): test log * Regression in autopkgtest for linux-aws (amd64): test log In this case all the linux-* regressions are failing because the autopkgtest script: debian/tests/ubuntu-regression-suite fails if the running kernel during the test is different from the the kernel that kernel team believe it should run against. It compares the 'dpkg-parsechangelog -SVersion' output against the version in /proc/version_signature and if they don't match it fails. if [ "$sver" != "$rver" ]; then     echo "ERROR: running version does not match source package" 1>&2     exit 1 fi Buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. 
* Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> ubuntu-regression-suite SKIP Test requires machine-level isolation but testbed does not provide that The debian/test/control is using "isolation-machine" restrictions, which prevent the test to run in a schroot/container. The test will have better chance to succeed if running inside qemu. * Regression in autopkgtest for nodejs (s390x): test log ==> This test is failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC) prior this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> Under investigation http://autopkgtest.ubuntu.com/packages/m/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> Under investigation http://autopkgtest.ubuntu.com/packages/p/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). 
Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". 
[Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. (See Regression Potential section for result with patch) [Regression Potential]  * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. 
* The code check the CPUID bit to determine if the sha instructions are available are not. * Maintainer comment proves that he did the successfully tested on Intel with/without SHA extension Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-ryzen) and non-sha INTEL cpu. It does reveal a significant performance increase on Ryzen due to the sha extension : (Note that the performance remain the same on non-sha extension CPU (AMD/INTEL), as expected since they don't take benefit of the sha extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log * Regression in autopkgtest for linux-hwe (i386): test log * Regression in autopkgtest for linux-hwe (amd64): test log * Regression in autopkgtest for linux-gke (amd64): test log * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log * Regression in autopkgtest for linux-hwe-edge (i386): test log * Regression in autopkgtest for linux-hwe-edge (amd64): test log * Regression in autopkgtest for linux-aws (amd64): test log In this case all the linux-* regressions are failing because the autopkgtest script: debian/tests/ubuntu-regression-suite fails if the running kernel during the test is different from the the kernel that kernel team believe it should run against. It compares the 'dpkg-parsechangelog -SVersion' output against the version in /proc/version_signature and if they don't match it fails. if [ "$sver" != "$rver" ]; then     echo "ERROR: running version does not match source package" 1>&2     exit 1 fi Buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. * Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> The autopkgtest regression since to be there since "2016-10-17 01:14:36 UTC" so it's nothing new nor related to this specific SRU. 
http://autopkgtest.ubuntu.com/packages/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> ubuntu-regression-suite SKIP Test requires machine-level isolation but testbed does not provide that debian/tests/control uses the "isolation-machine" restriction, which prevents the test from running in a schroot/container. The test has a better chance of succeeding when run inside qemu. * Regression in autopkgtest for nodejs (s390x): test log ==> This test has been failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC), prior to this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> The autopkgtest regression seems to have been there since "2016-11-17 19:43:07 UTC", so it is neither new nor related to this specific SRU. http://autopkgtest.ubuntu.com/packages/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> The autopkgtest regression seems to have been there since "2016-10-21 14:20:53 UTC", so it is neither new nor related to this specific SRU. http://autopkgtest.ubuntu.com/packages/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
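The "significant performance increase" claimed in the description can be put into numbers from the figures it quotes. This is a small worked sketch: the helper names are hypothetical, and the inputs are the `real` times of the `openssl dgst -sha256` runs (12.835 s before the patch, 3.471 s after) and the `openssl speed sha1` throughput for 8192-byte blocks (949840.55k before, 1851490.30k after).

```c
#include <assert.h>

/* Hypothetical helper: speedup factor from two wall-clock times in seconds. */
static double speedup_from_times(double before_s, double after_s)
{
    assert(before_s > 0.0 && after_s > 0.0);
    return before_s / after_s;
}

/* Hypothetical helper: speedup factor from two throughput figures
 * expressed in the same unit (here, 1000s of bytes per second). */
static double speedup_from_throughput(double before, double after)
{
    assert(before > 0.0 && after > 0.0);
    return after / before;
}
```

With the quoted values, `speedup_from_times(12.835, 3.471)` gives roughly 3.7 for the SHA-256 checksum of the 5GB file, and `speedup_from_throughput(949840.55, 1851490.30)` gives about 1.95 for SHA-1 on 8192-byte blocks. Both figures come from single runs on one Ryzen system, so they indicate the size of the gain rather than a general benchmark.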
2017-06-05 14:51:53 Eric Desrochers description [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). Current OpenSSL version in Ryzens still calls SHA for SSSE3 routine as result a number of extensions were effectively masked on Ryzen and shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse 4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflusho pt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to performs the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. 
The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On CPU with sha_ni the program return "1". Otherwise it return "0". [Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55 The performance are clearly better when using the patch which take benefit of the sha extension. 
(See Regression Potential section for result with patch) [Regression Potential]  * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allow openssl to take benefit of sha extension potential (mostly performance-wise) now that new AMD cpu starting to have the capability. * The code check the CPUID bit to determine if the sha instructions are available are not. * Maintainer comment proves that he did the successfully tested on Intel with/without SHA extension Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-ryzen) and non-sha INTEL cpu. It does reveal a significant performance increase on Ryzen due to the sha extension : (Note that the performance remain the same on non-sha extension CPU (AMD/INTEL), as expected since they don't take benefit of the sha extension technology) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g. 
5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log * Regression in autopkgtest for linux-hwe (i386): test log * Regression in autopkgtest for linux-hwe (amd64): test log * Regression in autopkgtest for linux-gke (amd64): test log * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log * Regression in autopkgtest for linux-hwe-edge (i386): test log * Regression in autopkgtest for linux-hwe-edge (amd64): test log * Regression in autopkgtest for linux-aws (amd64): test log In this case all the linux-* regressions are failing because the autopkgtest script: debian/tests/ubuntu-regression-suite fails if the running kernel during the test is different from the the kernel that kernel team believe it should run against. It compares the 'dpkg-parsechangelog -SVersion' output against the version in /proc/version_signature and if they don't match it fails. if [ "$sver" != "$rver" ]; then     echo "ERROR: running version does not match source package" 1>&2     exit 1 fi Buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. * Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> The autopkgtest regression since to be there since "2016-10-17 01:14:36 UTC" so it's nothing new nor related to this specific SRU. 
http://autopkgtest.ubuntu.com/packages/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> ubuntu-regression-suite SKIP Test requires machine-level isolation but testbed does not provide that The debian/test/control is using "isolation-machine" restrictions, which prevent the test to run in a schroot/container. The test will have better chance to succeed if running inside qemu. * Regression in autopkgtest for nodejs (s390x): test log ==> This test is failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC) prior this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> The autopkgtest regression since to be there since "2016-11-17 19:43:07 UTC" so it's nothing new nor related to this specific SRU. http://autopkgtest.ubuntu.com/packages/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> The autopkgtest regression since to be there since "2016-10-21 14:20:53 UTC" so it's nothing new nor related to this specific SRU. http://autopkgtest.ubuntu.com/packages/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection. [Impact] * Context: AMD added support in their processors for SHA Extensions[1] (CPU flag: sha_ni[2]) starting with Ryzen[3] CPU. Note that Ryzen CPU come in 64bit only (Confirmed with AMD representative). 
The current OpenSSL version on Ryzen still calls the SSSE3 routine for SHA: a number of extensions are effectively masked on Ryzen, so it shows no improvement. [1] /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 23 model : 1 model name : AMD Ryzen 5 1600 Six-Core Processor flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold [2] - sha_ni: SHA1/SHA256 Instruction Extensions [3] - https://en.wikipedia.org/wiki/Ryzen ... All models support: x87, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA, CVT16/F16C, ABM, BMI1, BMI2, SHA.[5] ... * Program to perform the CPUID check: Reference : https://software.intel.com/en-us/articles/intel-sha-extensions ... Availability of the Intel® SHA Extensions on a particular processor can be determined by checking the SHA CPUID bit in CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29]. The following C function, using inline assembly, performs the CPUID check: -- int CheckForIntelShaExtensions() {    int a, b, c, d;    // Look for CPUID.7.0.EBX[29]    // EAX = 7, ECX = 0    a = 7;    c = 0;    asm volatile ("cpuid"         :"=a"(a), "=b"(b), "=c"(c), "=d"(d)         :"a"(a), "c"(c)        );    // Intel® SHA Extensions feature bit is EBX[29]    return ((b >> 29) & 1); } -- On a CPU with sha_ni the function returns "1". Otherwise it returns "0". 
[Test Case]  * Reproducible with Xenial/Zesty/Artful release.  * Generate a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m12.835s user 0m12.344s sys 0m0.484s * Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 9969152 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 8019164 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 5254219 sha1's in 2.99s Doing sha1 for 3s on 1024 size blocks: 2217067 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 347842 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 53168.81k 171075.50k 449859.55k 756758.87k 949840.55k Performance is clearly better with the patch, which takes advantage of the SHA extension. (See Regression Potential section for result with patch) [Regression Potential]  * Note : IRC discussion with infinity : https://bugs.launchpad.net/ubuntu/xenial/+source/openssl/+bug/1674399/comments/8  * Note from irc discussion with apw and rbasak : https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1674399/comments/2  * It basically allows openssl to take advantage of the SHA extension (mostly performance-wise) now that new AMD CPUs are starting to have the capability. 
* The code checks the CPUID bit to determine whether the SHA instructions are available or not. * The maintainer's comment confirms that he successfully tested on Intel with and without the SHA extension. Reference: https://github.com/openssl/openssl/issues/2848 "I don't have access to Ryzen system, so I didn't test it explicitly on Ryzen. Reporter did confirm it tough. Myself I tested on Intel processors, yes, with/without." * LP reporter comment : I, slashd, have tested on a Ryzen system (and AMD non-ryzen) and non-sha INTEL cpu. It does reveal a significant performance increase on Ryzen due to the sha extension : (Note that the performance remains the same on CPUs without the SHA extension (AMD/INTEL), as expected, since they cannot benefit from it) [Tested on a Ryzen CPU] # Generated a checksum of a big file (e.g. 5GB file) with openssl  $ time /usr/bin/openssl dgst -sha256 /var/tmp/5Gfile SHA256(/var/tmp/5Gfile)= 8d448d81521cbc1bfdc04dd199d448bd3c49374221007bd0846d8d39a70dd4f8 real 0m3.471s user 0m2.956s sys 0m0.516s # Openssl speed $ openssl speed sha1 Doing sha1 for 3s on 16 size blocks: 12081890 sha1's in 3.00s Doing sha1 for 3s on 64 size blocks: 11563950 sha1's in 3.00s Doing sha1 for 3s on 256 size blocks: 8375101 sha1's in 3.00s Doing sha1 for 3s on 1024 size blocks: 3987643 sha1's in 3.00s Doing sha1 for 3s on 8192 size blocks: 678036 sha1's in 3.00s OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. 
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes sha1 64436.75k 246697.60k 714675.29k 1361115.48k 1851490.30k * autopkgtest failure : {Xenial - openssl} * Regression in autopkgtest for linux-hwe (ppc64el): test log * Regression in autopkgtest for linux-hwe (i386): test log * Regression in autopkgtest for linux-hwe (amd64): test log * Regression in autopkgtest for linux-gke (amd64): test log * Regression in autopkgtest for linux-hwe-edge (ppc64el): test log * Regression in autopkgtest for linux-hwe-edge (i386): test log * Regression in autopkgtest for linux-hwe-edge (amd64): test log * Regression in autopkgtest for linux-aws (amd64): test log In this case all the linux-* regressions are failing because the autopkgtest script debian/tests/ubuntu-regression-suite fails if the kernel running during the test is different from the kernel that the kernel team believes it should run against. It compares the 'dpkg-parsechangelog -SVersion' output against the version in /proc/version_signature and fails if they don't match. if [ "$sver" != "$rver" ]; then     echo "ERROR: running version does not match source package" 1>&2     exit 1 fi Buildlog error: autopkgtest [00:45:29]: test ubuntu-regression-suite: [----------------------- Source Package Version: 4.8.0-54.57~16.04.1 Running Kernel Version: 4.4.0-79.100 ERROR: running version does not match source package autopkgtest [00:45:32]: test ubuntu-regression-suite: -----------------------] To avoid this failure, the autopkgtest machine would have to run the exact same kernel version as expected by each linux-* package. * Regression in autopkgtest for postgresql-9.5 (armhf): test log ==> The autopkgtest regression seems to be there since "2016-10-17 01:14:36 UTC" so it's nothing new nor related to this specific SRU. 
http://autopkgtest.ubuntu.com/packages/postgresql-9.5/xenial/armhf * Regression in autopkgtest for linux-hwe (armhf): test log * Regression in autopkgtest for linux-hwe-edge (armhf): test log ==> ubuntu-regression-suite SKIP Test requires machine-level isolation but testbed does not provide that The debian/tests/control file uses the "isolation-machine" restriction, which prevents the test from running in a schroot/container. The test has a better chance of succeeding when run inside qemu. * Regression in autopkgtest for nodejs (s390x): test log ==> This test has been failing since 'openssl/1.0.2g-1ubuntu4.3' (2016-09-16 07:42:14 UTC), prior to this SRU. {Yakkety - openssl} Regression in autopkgtest for mongodb (armhf): test log ==> The autopkgtest regression seems to be there since "2016-11-17 19:43:07 UTC" so it's nothing new nor related to this specific SRU. http://autopkgtest.ubuntu.com/packages/mongodb/yakkety/armhf Regression in autopkgtest for postgresql-9.5 (armhf): test log 1.0.2g-1ubuntu9 ==> The autopkgtest regression seems to be there since "2016-10-21 14:20:53 UTC" so it's nothing new nor related to this specific SRU. http://autopkgtest.ubuntu.com/packages/postgresql-9.5/yakkety/armhf {Zesty - openssl} none [Other Info] * Debian Bug : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=861145 * Upstream PR : https://github.com/openssl/openssl/issues/2848 * Upstream Repository : https://github.com/openssl/openssl.git * Upstream Commits : 1aed5e1 crypto/x86*cpuid.pl: move extended feature detection. ## This fix moves extended feature detection past basic feature detection where it belongs. f8418d8 crypto/x86_64cpuid.pl: move extended feature detection upwards. ## This commit for x86_64cpuid.pl addressed the problem, but messed up processor vendor detection.
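Editor's note on the CPUID check quoted in the description above: the following is a minimal sketch of the same test (CPUID leaf 7, sub-leaf 0, EBX bit 29), assuming a GCC or Clang toolchain that ships <cpuid.h>. The function names ebx_has_sha and cpu_has_sha_ni are illustrative only; they are not part of OpenSSL or of the Intel article. Unlike the quoted function, the sketch first confirms that the CPU reports leaf 7 at all, and it simply returns 0 when built for a non-x86 target.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helpers: __get_cpuid_max, __cpuid_count */
#endif

/* Pure helper (hypothetical name): the SHA feature flag is bit 29 of the
 * EBX value returned by CPUID leaf 7, sub-leaf 0. */
int ebx_has_sha(uint32_t ebx)
{
    return (int)((ebx >> 29) & 1u);
}

/* Hypothetical name: returns 1 if the CPU reports SHA extensions,
 * 0 otherwise (including when compiled for a non-x86 architecture). */
int cpu_has_sha_ni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 is only meaningful if the maximum basic leaf is >= 7. */
    if (__get_cpuid_max(0, 0) < 7)
        return 0;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx;   /* only EBX is needed here */
    return ebx_has_sha(ebx);
#else
    return 0;
#endif
}
```

On a system whose /proc/cpuinfo lists sha_ni, cpu_has_sha_ni() would be expected to return 1. Keeping the bit test in a separate helper lets it be exercised on any architecture without executing CPUID.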
2017-06-05 15:02:40 Eric Desrochers tags sts-sru-done verification-needed sts-sru-done verification-done
2017-06-08 00:54:05 Launchpad Janitor openssl (Ubuntu Xenial): status Fix Committed Fix Released
2017-06-08 00:54:12 Chris Halse Rogers removed subscriber Ubuntu Stable Release Updates Team
2017-06-08 00:55:50 Launchpad Janitor openssl (Ubuntu Yakkety): status Fix Committed Fix Released
2017-06-08 00:57:01 Launchpad Janitor openssl (Ubuntu Zesty): status Fix Committed Fix Released