Activity log for bug #2024019

Date Who What changed Old value New value Message
2023-06-15 18:16:55 Salvatore bug added bug
2023-06-15 18:19:00 Salvatore merge proposal linked https://code.launchpad.net/~dipietro-salvatore/ubuntu/+source/nginx/+git/nginx/+merge/434825
2023-06-15 18:19:27 Salvatore bug added subscriber Ubuntu Sponsors
2023-06-16 16:17:27 Robie Basak tags server-todo
2023-06-16 16:17:37 Robie Basak bug added subscriber Ubuntu Server
2023-06-19 14:43:43 Christian Ehrhardt bug added subscriber Fabio Augusto Miranda Martins
2023-06-19 14:43:46 Christian Ehrhardt removed subscriber Ubuntu Sponsors
2023-06-19 14:44:00 Christian Ehrhardt tags server-todo
2023-06-19 14:47:14 Christian Ehrhardt nginx (Ubuntu): assignee Fabio Augusto Miranda Martins (fabio.martins)
2023-09-25 14:37:31 Fabio Augusto Miranda Martins nginx (Ubuntu): assignee Fabio Augusto Miranda Martins (fabio.martins)
2023-11-27 13:14:44 Athos Ribeiro tags server-triage-discuss
2023-12-06 17:57:41 Athos Ribeiro tags server-triage-discuss
2024-01-30 21:25:42 Fabio Augusto Miranda Martins nginx (Ubuntu): status New In Progress
2024-01-30 21:25:47 Fabio Augusto Miranda Martins nginx (Ubuntu): importance Undecided Low
2024-01-30 21:25:58 Fabio Augusto Miranda Martins nginx (Ubuntu): assignee Mitchell Dzurick (mitchdz)
2024-03-20 21:40:52 Mitchell Dzurick nominated for series Ubuntu Focal
2024-03-20 21:40:52 Mitchell Dzurick bug task added nginx (Ubuntu Focal)
2024-03-20 21:41:36 Mitchell Dzurick bug task added haproxy (Ubuntu)
2024-03-20 21:41:52 Mitchell Dzurick bug task added postgresql-12 (Ubuntu)
2024-03-20 21:42:13 Mitchell Dzurick nginx (Ubuntu): assignee Mitchell Dzurick (mitchdz)
2024-03-20 21:42:15 Mitchell Dzurick haproxy (Ubuntu Focal): assignee Mitchell Dzurick (mitchdz)
2024-03-20 21:42:17 Mitchell Dzurick nginx (Ubuntu Focal): assignee Mitchell Dzurick (mitchdz)
2024-03-20 21:42:19 Mitchell Dzurick postgresql-12 (Ubuntu Focal): assignee Mitchell Dzurick (mitchdz)
2024-03-20 21:43:49 Mitchell Dzurick nginx (Ubuntu): importance Low Undecided
2024-03-20 22:17:03 Mitchell Dzurick nginx (Ubuntu): status In Progress New
2024-04-30 20:04:04 Mitchell Dzurick merge proposal linked https://code.launchpad.net/~mitchdz/ubuntu/+source/nginx/+git/nginx/+merge/465301
2024-04-30 21:32:00 Mitchell Dzurick description

Old value:

Hi,

we recorded a performance regression of more than 30% on Ubuntu Focal for AWS Graviton instances, because the nginx package is not compiled with the "-moutline-atomics" cflag for the arm64 architecture (337548 rps with the default package and 484453 rps using the proposed flag).

As far as I know, only Ubuntu 20.04 is affected, and we are proposing a fix for it (https://code.launchpad.net/~dipietro-salvatore/ubuntu/+source/nginx/+git/nginx/+merge/434825).

Setup to reproduce the issue:
- AWS instance: m6g.metal
- AWS ami: ami-0aa916c7b0be51092
- lsb_release -rd:
  Description: Ubuntu 20.04.6 LTS
  Release: 20.04
- apt-cache policy nginx:
  nginx:
    Installed: (none)
    Candidate: 1.18.0-0ubuntu1.4
    Version table:
       1.18.0-0ubuntu1.4 500
          500 http://us-west-2.ec2.ports.ubuntu.com/ubuntu-ports focal-updates/main arm64 Packages
          500 http://ports.ubuntu.com/ubuntu-ports focal-security/main arm64 Packages
       1.17.10-0ubuntu1 500
          500 http://us-west-2.ec2.ports.ubuntu.com/ubuntu-ports focal/main arm64 Packages

New value:

[ Impact ]

When using nginx on arm64, the CPU may come under heavy load and drop packets when stressed. A compile-time optimization to use hardware acceleration could be included to help reduce CPU utilization on systems that have atomic instructions available.

[ Test Plan ]

The test plan requires a decent amount of setup, which is covered in full at https://github.com/mitchdz/aws-nginx-testbed. In short, the setup involves a focal-based arm64 nginx server, 4 amd64 focal endpoints running a nodejs application, and 1 amd64 focal server to initiate the workloads.

This is an E2E test which covers both functionality and performance improvements.

1) Set up the system as described in https://github.com/mitchdz/aws-nginx-testbed

2) Initiate the workload on the DRV instance
$ ./wrk/wrk -t36 -c512 -d60s http://172.31.33.24:80/ # Private IP of SUT

3) While the workload is running, capture results on the SUT
$ sudo perf record -a -e r6e sleep 20s

4) After the test has run, you should have metrics on both the DRV and SUT systems.

5) Install the new nginx built with -moutline-atomics
$ sudo add-apt-repository ppa:mitchdz/lp2024019-moutline-atomic
$ sudo apt update
$ sudo apt install -y nginx
$ dpkg -s nginx | grep Version
Version: 1.18.0-0ubuntu1.5~focal1

6) Ensure nginx was built with -moutline-atomics (ctrl+f to see it, since this is not a user-friendly wall of text)
$ nginx -V |& grep moutline
configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-v8ZFDO/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -moutline-atomics' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module

7) Re-run the wrk test as shown in steps 2-3

Results:
- without patch:
  - wrk output:
    ```
      ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512 -d60s http://172.31.40.247:80/
      Running 1m test @ http://172.31.40.247:80/
        36 threads and 512 connections
        Thread Stats Avg Stdev Max +/- Stdev
          Latency 3.14ms 2.03ms 211.63ms 72.12%
          Req/Sec 4.67k 0.89k 23.92k 76.44%
        10056815 requests in 1.00m, 5.20GB read
      Requests/sec: 167336.42
      Transfer/sec: 88.57MB
    ```
  - STREX count: Samples: 4M of event 'r6e', Event count (approx.): 364457784
  - SUT CPU utiliz: 100%
- with patch:
  - wrk output:
    ```
      ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512 -d60s http://172.31.40.247:80/
      Running 1m test @ http://172.31.40.247:80/
        36 threads and 512 connections
        Thread Stats Avg Stdev Max +/- Stdev
          Latency 1.02ms 1.04ms 210.87ms 99.52%
          Req/Sec 13.94k 327.01 16.95k 79.69%
        30006090 requests in 1.00m, 15.51GB read
      Requests/sec: 499273.74
      Transfer/sec: 264.26MB
    ```
  - STREX count: No Samples
  - SUT CPU utiliz: 20%

The important metrics here are the perf metrics, where we can see that pre-patch the CPU is under heavy load and 864K STREX events are seen, whereas post-patch the CPU is not under as heavy a load, with only 7 cycle events.

[ Where Problems Could Occur ]

Performance Trade-offs:
* While this will decrease the CPU load, it will increase the utilization of atomic operations.
* Outlining atomics can make debugging more complicated, especially in the case of concurrency debugging.
* This improves application performance through offloading instructions. This can reveal bugs that were not possible before, such as race conditions, deadlocks, or incorrect synchronization.
* This optimization adds a run-time check for the availability of atomic instructions. If atomic instructions are not found, ARMv8.0 compatible code is executed, so overhead will be added to systems that do not have atomic instructions.

[ Other Info ]

* Why is -moutline-atomics not enabled already?

Focal uses gcc-9, which does not enable -moutline-atomics by default. gcc-10 is when it became enabled by default.
https://devdocs.io/gcc~9/aarch64-options vs https://devdocs.io/gcc~10/aarch64-options

gcc-9 changelog showing the addition of -moutline-atomics:
https://gcc.gnu.org/gcc-9/changes.html

Here is the thread with the discussion to enable by default:
https://gcc.gnu.org/pipermail/gcc/2020-April/000490.html

A lengthy discussion also happened in Debian to include this flag by default - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956418
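
As a quick sanity check on the wrk figures quoted in the description, the throughput ratio between the two runs can be worked out from the "Requests/sec:" summary lines. The following is only a minimal sketch: the helper names (`requests_per_sec`, `speedup`) and the two constants are made up for illustration, and the constants contain just the summary lines copied from the results in the description, not the full wrk output.

```python
import re

# Summary lines copied from the wrk results in the description.
WITHOUT_PATCH = "Requests/sec: 167336.42\nTransfer/sec: 88.57MB"
WITH_PATCH = "Requests/sec: 499273.74\nTransfer/sec: 264.26MB"


def requests_per_sec(wrk_output: str) -> float:
    """Return the number on the 'Requests/sec:' line of a wrk summary."""
    match = re.search(r"Requests/sec:\s*([0-9.]+)", wrk_output)
    if match is None:
        raise ValueError("no 'Requests/sec:' line found")
    return float(match.group(1))


def speedup(before: str, after: str) -> float:
    """Ratio of the patched throughput to the unpatched throughput."""
    return requests_per_sec(after) / requests_per_sec(before)


if __name__ == "__main__":
    ratio = speedup(WITHOUT_PATCH, WITH_PATCH)
    print(f"throughput ratio with patch vs without: {ratio:.2f}x")
```

The same helper can be pointed at saved wrk output from any pair of runs, which makes it easier to compare repeated measurements rather than relying on a single run.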
2024-04-30 21:32:09 Mitchell Dzurick haproxy (Ubuntu): status New Invalid
2024-04-30 21:32:18 Mitchell Dzurick haproxy (Ubuntu Focal): status New In Progress
2024-04-30 21:32:29 Mitchell Dzurick haproxy (Ubuntu Focal): status In Progress Triaged
2024-04-30 21:32:33 Mitchell Dzurick nginx (Ubuntu Focal): status New In Progress
2024-04-30 21:32:36 Mitchell Dzurick postgresql-12 (Ubuntu Focal): status New Triaged
2024-04-30 21:32:40 Mitchell Dzurick postgresql-12 (Ubuntu): status New Invalid
2024-04-30 21:32:43 Mitchell Dzurick nginx (Ubuntu): status New Invalid
2024-04-30 21:34:52 Mitchell Dzurick description

Old value:

[ Impact ]

When using nginx on arm64, the CPU may come under heavy load and drop packets when stressed. A compile-time optimization to use hardware acceleration could be included to help reduce CPU utilization on systems that have atomic instructions available.

[ Test Plan ]

The test plan requires a decent amount of setup, which is covered in full at https://github.com/mitchdz/aws-nginx-testbed. In short, the setup involves a focal-based arm64 nginx server, 4 amd64 focal endpoints running a nodejs application, and 1 amd64 focal server to initiate the workloads.

This is an E2E test which covers both functionality and performance improvements.

1) Set up the system as described in https://github.com/mitchdz/aws-nginx-testbed

2) Initiate the workload on the DRV instance
$ ./wrk/wrk -t36 -c512 -d60s http://172.31.33.24:80/ # Private IP of SUT

3) While the workload is running, capture results on the SUT
$ sudo perf record -a -e r6e sleep 20s

4) After the test has run, you should have metrics on both the DRV and SUT systems.

5) Install the new nginx built with -moutline-atomics
$ sudo add-apt-repository ppa:mitchdz/lp2024019-moutline-atomic
$ sudo apt update
$ sudo apt install -y nginx
$ dpkg -s nginx | grep Version
Version: 1.18.0-0ubuntu1.5~focal1

6) Ensure nginx was built with -moutline-atomics (ctrl+f to see it, since this is not a user-friendly wall of text)
$ nginx -V |& grep moutline
configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-v8ZFDO/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -moutline-atomics' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module

7) Re-run the wrk test as shown in steps 2-3

Results:
- without patch:
  - wrk output:
    ```
      ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512 -d60s http://172.31.40.247:80/
      Running 1m test @ http://172.31.40.247:80/
        36 threads and 512 connections
        Thread Stats Avg Stdev Max +/- Stdev
          Latency 3.14ms 2.03ms 211.63ms 72.12%
          Req/Sec 4.67k 0.89k 23.92k 76.44%
        10056815 requests in 1.00m, 5.20GB read
      Requests/sec: 167336.42
      Transfer/sec: 88.57MB
    ```
  - STREX count: Samples: 4M of event 'r6e', Event count (approx.): 364457784
  - SUT CPU utiliz: 100%
- with patch:
  - wrk output:
    ```
      ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512 -d60s http://172.31.40.247:80/
      Running 1m test @ http://172.31.40.247:80/
        36 threads and 512 connections
        Thread Stats Avg Stdev Max +/- Stdev
          Latency 1.02ms 1.04ms 210.87ms 99.52%
          Req/Sec 13.94k 327.01 16.95k 79.69%
        30006090 requests in 1.00m, 15.51GB read
      Requests/sec: 499273.74
      Transfer/sec: 264.26MB
    ```
  - STREX count: No Samples
  - SUT CPU utiliz: 20%

The important metrics here are the perf metrics, where we can see that pre-patch the CPU is under heavy load and 864K STREX events are seen, whereas post-patch the CPU is not under as heavy a load, with only 7 cycle events.

[ Where Problems Could Occur ]

Performance Trade-offs:
* While this will decrease the CPU load, it will increase the utilization of atomic operations.
* Outlining atomics can make debugging more complicated, especially in the case of concurrency debugging.
* This improves application performance through offloading instructions. This can reveal bugs that were not possible before, such as race conditions, deadlocks, or incorrect synchronization.
* This optimization adds a run-time check for the availability of atomic instructions. If atomic instructions are not found, ARMv8.0 compatible code is executed, so overhead will be added to systems that do not have atomic instructions.

[ Other Info ]

* Why is -moutline-atomics not enabled already?

Focal uses gcc-9, which does not enable -moutline-atomics by default. gcc-10 is when it became enabled by default.
https://devdocs.io/gcc~9/aarch64-options vs https://devdocs.io/gcc~10/aarch64-options

gcc-9 changelog showing the addition of -moutline-atomics:
https://gcc.gnu.org/gcc-9/changes.html

Here is the thread with the discussion to enable by default:
https://gcc.gnu.org/pipermail/gcc/2020-April/000490.html

A lengthy discussion also happened in Debian to include this flag by default - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956418

New value:

[ Impact ]

When using nginx on arm64, the CPU may come under heavy load and drop packets when stressed. A compile-time optimization to use hardware acceleration could be included to help reduce CPU utilization on systems that have atomic instructions available.

[ Test Plan ]

The test plan requires a decent amount of setup, which is covered in full at https://github.com/mitchdz/aws-nginx-testbed. In short, the setup involves a focal-based arm64 nginx server, 4 amd64 focal endpoints running a nodejs application, and 1 amd64 focal server to initiate the workloads.

This is an E2E test which covers both functionality and performance improvements.

1) Set up the system as described in https://github.com/mitchdz/aws-nginx-testbed

2) Initiate the workload on the DRV instance
$ ./wrk/wrk -t36 -c512 -d60s http://172.31.33.24:80/ # Private IP of SUT

3) While the workload is running, capture results on the SUT
$ sudo perf record -a -e r6e sleep 20s

4) After the test has run, you should have metrics on both the DRV and SUT systems.

5) Install the new nginx built with -moutline-atomics
$ sudo add-apt-repository ppa:mitchdz/lp2024019-moutline-atomic
$ sudo apt update
$ sudo apt install -y nginx
$ dpkg -s nginx | grep Version
Version: 1.18.0-0ubuntu1.5~focal1

6) Ensure nginx was built with -moutline-atomics (ctrl+f to see it, since this is not a user-friendly wall of text)
$ nginx -V |& grep moutline
configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-v8ZFDO/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -moutline-atomics' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module

7) Re-run the wrk test as shown in steps 2-3

Results:
- without patch:
  - wrk output:
    ```
      ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512 -d60s http://172.31.40.247:80/
      Running 1m test @ http://172.31.40.247:80/
        36 threads and 512 connections
        Thread Stats Avg Stdev Max +/- Stdev
          Latency 3.14ms 2.03ms 211.63ms 72.12%
          Req/Sec 4.67k 0.89k 23.92k 76.44%
        10056815 requests in 1.00m, 5.20GB read
      Requests/sec: 167336.42
      Transfer/sec: 88.57MB
    ```
  - STREX count: Samples: 4M of event 'r6e', Event count (approx.): 364457784
  - SUT CPU utiliz: 100%
- with patch:
  - wrk output:
    ```
      ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512 -d60s http://172.31.40.247:80/
      Running 1m test @ http://172.31.40.247:80/
        36 threads and 512 connections
        Thread Stats Avg Stdev Max +/- Stdev
          Latency 1.02ms 1.04ms 210.87ms 99.52%
          Req/Sec 13.94k 327.01 16.95k 79.69%
        30006090 requests in 1.00m, 15.51GB read
      Requests/sec: 499273.74
      Transfer/sec: 264.26MB
    ```
  - STREX count: No Samples
  - SUT CPU utiliz: 20%

[ Where Problems Could Occur ]

Performance Trade-offs:
* While this will decrease the CPU load, it will increase the utilization of atomic operations.
* Outlining atomics can make debugging more complicated, especially in the case of concurrency debugging.
* This improves application performance through offloading instructions. This can reveal bugs that were not possible before, such as race conditions, deadlocks, or incorrect synchronization.
* This optimization adds a run-time check for the availability of atomic instructions. If atomic instructions are not found, ARMv8.0 compatible code is executed, so overhead will be added to systems that do not have atomic instructions.

[ Other Info ]

* Why is -moutline-atomics not enabled already?

Focal uses gcc-9, which does not enable -moutline-atomics by default. gcc-10 is when it became enabled by default.
https://devdocs.io/gcc~9/aarch64-options vs https://devdocs.io/gcc~10/aarch64-options

gcc-9 changelog showing the addition of -moutline-atomics:
https://gcc.gnu.org/gcc-9/changes.html

Here is the thread with the discussion to enable by default:
https://gcc.gnu.org/pipermail/gcc/2020-April/000490.html

A lengthy discussion also happened in Debian to include this flag by default - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956418
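
The description notes that the optimization relies on a run-time check for atomic instructions. Before testing, it may help to confirm whether a given arm64 machine reports them at all. On Linux, arm64 kernels list an "atomics" entry on the "Features" line of /proc/cpuinfo when the CPU supports these instructions. The sketch below is not part of the test plan: the function name `has_lse_atomics` is hypothetical, and the two sample strings are illustrative feature lists written for this example, not output captured from real hardware.

```python
# Illustrative /proc/cpuinfo fragments (made up for this example).
SAMPLE_WITH_ATOMICS = (
    "processor\t: 0\n"
    "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid\n"
)
SAMPLE_WITHOUT_ATOMICS = (
    "processor\t: 0\n"
    "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid\n"
)


def has_lse_atomics(cpuinfo_text: str) -> bool:
    """Return True if any 'Features' line lists the 'atomics' capability."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Compare whole words so that a longer flag name cannot match by accident.
        if key.strip() == "Features" and "atomics" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as cpuinfo:
            text = cpuinfo.read()
    except OSError:
        # Not Linux, or /proc is unavailable: nothing to report.
        text = ""
    print("atomics reported:", has_lse_atomics(text))
```

On machines that do not report "atomics", the patched build would be expected to take the ARMv8.0 fallback path described in the trade-offs above, so a throughput improvement should not be expected there.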