Add GCC atomic support (-moutline-atomics) for arm64 on Focal

Bug #2024019 reported by Salvatore
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
haproxy (Ubuntu)
Invalid
Undecided
Unassigned
Focal
Triaged
Undecided
Mitchell Dzurick
nginx (Ubuntu)
Invalid
Undecided
Unassigned
Focal
Fix Committed
Undecided
Mitchell Dzurick
postgresql-12 (Ubuntu)
Invalid
Undecided
Unassigned
Focal
Triaged
Undecided
Mitchell Dzurick

Bug Description

[ Impact ]

When using nginx on arm64, the CPU may go under heavy load and drop packets when being stressed.

A compile-time optimization to use hardware acceleration could be included to help alleviate the CPU utilization for systems that have atomic instructions available.

[ Test Plan ]

The test plan requires a decent amount of setup. This setup is covered in full at https://github.com/mitchdz/aws-nginx-testbed.

In short, the setup involves a focal based arm64 nginx server, 4 amd64 focal endpoints running a nodejs applicaiton, and 1 amd64 focal server to initiate the workloads.

This test is an E2E test which tests both functionality and performance improvements.

1) Set up the system as described in https://github.com/mitchdz/aws-nginx-testbed

2) Initiate the workload on the DRV instance
$ ./wrk/wrk -t36 -c512 -d60s http://172.31.33.24:80/ # Private IP of SUT

3) While workload is running, capture results on SUT
$ sudo perf record -a -e r6e sleep 20s

4) After the test is ran, you should have metrics on both the DRV and SUT system.

5) Install new nginx with -moutline atomics
$ sudo add-apt-repository ppa:mitchdz/lp2024019-moutline-atomic
$ sudo apt update
$ Sudo apt install -y nginx
$ dpkg -s nginx | grep Version
Version: 1.18.0-0ubuntu1.5~focal1

6) ensure outline-atomics is installed (ctrl+f to see it, since this is not a user friendly wall of text)
$ nginx -V |& grep moutline
configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-v8ZFDO/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -moutline-atomics' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module

7) re-run wrk test as shown in steps 2-3

Results:
- without patch:
   - wrk output:
    ```
      ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512 -d60s http://172.31.40.247:80/
      Running 1m test @ http://172.31.40.247:80/
        36 threads and 512 connections
        Thread Stats Avg Stdev Max +/- Stdev
          Latency 3.14ms 2.03ms 211.63ms 72.12%
          Req/Sec 4.67k 0.89k 23.92k 76.44%
        10056815 requests in 1.00m, 5.20GB read
      Requests/sec: 167336.42
      Transfer/sec: 88.57MB
    ```
  - STREX count: Samples: 4M of event 'r6e', Event count (approx.): 364457784
  - SUT CPU utiliz: 100%

- with patch:
  - wrk output:
    ```
      ubuntu@ip-172-31-43-203:~/wrk$ ./wrk -t36 -c512 -d60s http://172.31.40.247:80/
      Running 1m test @ http://172.31.40.247:80/
        36 threads and 512 connections
        Thread Stats Avg Stdev Max +/- Stdev
          Latency 1.02ms 1.04ms 210.87ms 99.52%
          Req/Sec 13.94k 327.01 16.95k 79.69%
        30006090 requests in 1.00m, 15.51GB read
      Requests/sec: 499273.74
      Transfer/sec: 264.26MB
    ```
  - STREX count: No Samples
  - SUT CPU utiliz: 20%

Additionally, steps (1) to (7) should be run on an ARMv8.0 system to validate that this does not break those systems and that the performance penalty for these systems is minimal.

[ Where Problems Could Occur ]
Performance Trade-offs:
* While this will decrease the CPU load, it will increase the utilization of atomic operations.
* outlining atomics can make debugging more complicated, especially in the case of concurrency debugging.
* This improves application performance through offloading instructions. This can reveal bugs that were not possible before such as race conditions, deadlocks, or incorrect synchronization.
* This optimization adds a run-time check for the availability of atomic instructions. If atomics instructions are not found, ARMv8.0 compatible code is executed, so overhead will be added to systems that do not have atomic instructions.

[ Other Info ]
* Why is -moutline-atomics not enabled already?
Focal uses gcc-9 which does not enable -moutline-atomics by default. gcc-10 is when it became enabled by default.

https://devdocs.io/gcc~9/aarch64-options
vs
https://devdocs.io/gcc~10/aarch64-options

gcc-9 changelog showing the addition of -moutline-atomics: https://gcc.gnu.org/gcc-9/changes.html

Here is the thread with the discussion to enable by default: https://gcc.gnu.org/pipermail/gcc/2020-April/000490.html

A lengthy discussion also happened in Debian to include this flag by default - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956418

Related branches

Robie Basak (racb)
tags: added: server-todo
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I talked with Fabio (now subscribed), he will track and ensure this is done together with all the other "enable -moutline-atomics" cases.

tags: removed: server-todo
Changed in nginx (Ubuntu):
assignee: nobody → Fabio Augusto Miranda Martins (fabio.martins)
Revision history for this message
Salvatore (dipietro-salvatore) wrote :

Do you have any update regarding this change?

Revision history for this message
Thomas Ward (teward) wrote :

@paelzer if you could update it that'd be great.

@Salvatore Note that @sil2100 made a 'Needs information' request on the pull request from here, and that needs attention, you might see more stuff there about problems, etc. introduced that're blocking this.

Revision history for this message
Fabio Augusto Miranda Martins (fabio.martins) wrote :

This request was previously blocked by the GCC SRU for focal (LP#1994020) which is now released. I'm discussing this internally with Paelzer's team and we'll update this launchpad with the progress.

Changed in nginx (Ubuntu):
assignee: Fabio Augusto Miranda Martins (fabio.martins) → nobody
tags: added: server-triage-discuss
tags: removed: server-triage-discuss
Changed in nginx (Ubuntu):
status: New → In Progress
importance: Undecided → Low
assignee: nobody → Mitchell Dzurick (mitchdz)
Changed in nginx (Ubuntu):
assignee: Mitchell Dzurick (mitchdz) → nobody
Changed in haproxy (Ubuntu Focal):
assignee: nobody → Mitchell Dzurick (mitchdz)
Changed in nginx (Ubuntu Focal):
assignee: nobody → Mitchell Dzurick (mitchdz)
Changed in postgresql-12 (Ubuntu Focal):
assignee: nobody → Mitchell Dzurick (mitchdz)
Changed in nginx (Ubuntu):
importance: Low → Undecided
Changed in nginx (Ubuntu):
status: In Progress → New
description: updated
Changed in haproxy (Ubuntu):
status: New → Invalid
Changed in haproxy (Ubuntu Focal):
status: New → In Progress
status: In Progress → Triaged
Changed in nginx (Ubuntu Focal):
status: New → In Progress
Changed in postgresql-12 (Ubuntu Focal):
status: New → Triaged
Changed in postgresql-12 (Ubuntu):
status: New → Invalid
Changed in nginx (Ubuntu):
status: New → Invalid
description: updated
Revision history for this message
Chris Halse Rogers (raof) wrote :

SRU review: Ok. From digging around I see that in one benchmark there's a 1% performance decrease measured on ARMv8.0 processors, which seems like an acceptable tradeoff for the very significant performance increase on ARMv8.1 processors¹.

Since we *have* this very nice performance test, could we *also* run it on an ARMv8.0 system as a part of validation? That would give us the relevant performance penalty for v8.0 systems - which would be good to verify is not significant - and also check that the code *works* on v8.0 systems!

So, in summary: this looks like an acceptable SRU, as long as we test on both an ARMv8.0 system and an ARMv8.1 system, and the v8.0 system does not significantly regress performance (around 1% seems ok, much more is iffy).

¹: Although it would be *nice* to have some sort of data as to how prevalent these chips are - is this a big win for almost everyone, with only a few chips that don't support v8.1, or a big win only on uncommon chips?

Changed in nginx (Ubuntu Focal):
status: In Progress → Incomplete
Revision history for this message
Mitchell Dzurick (mitchdz) wrote :

Thanks Chris.

I'm setting up a Graviton system (ARMv8.0) in a different AWS region, since my old region doesn't support it. Results coming soon.

Revision history for this message
Mitchell Dzurick (mitchdz) wrote (last edit ):

I did 3 runs pre/post patch and recorded the data in the attached file.

Post-patch means using my PPA from [0].

I had to set up these systems in a different AWS region. I chose eu-west-1 simply because it supports the AWS a1.metal instance type which is the first graviton instance. I set these systems up as described in[1], otherwise the test performed was identical to[2].

Overall I think the degradation in performance is nominal in this setup. Notes I took on the results:

* I saw a 0.72% increase in CPU usage
* The strex events seem relatively the same, the first run post-patch seemed a little high, but that could just be due to inconsistencies due to an e2e test like this, the 2 runs after seem to equalize.

To summarize the results in the txt file:

request/s
pre-patch: (160287+152196+154558)/3 = 155680
post-patch: (154558+162545+161283)/3 = 159462

strex events:
pre-patch: (894+867+883)/3 = 881
post-patch: (913+879+872)/3 = 888

CPU usage (only did one run, explained how this is gathered in txt document)
pre-patch: 98.77%
post-patch: 99.49%

[0] - https://launchpad.net/~mitchdz/+archive/ubuntu/lp2024019-moutline-atomic
[1] - https://github.com/mitchdz/aws-nginx-testbed/blob/main/ARMv8.0-test.bash
[2] - https://github.com/mitchdz/aws-nginx-testbed

Changed in nginx (Ubuntu Focal):
status: Incomplete → In Progress
Revision history for this message
Chris Halse Rogers (raof) wrote :

Excellent, that's very helpful!

I've updated the test plan in the bug to note that once the package is built in -proposed these ARMv8.0 tests should be run against that (just as standard SRU-process), and I'll now accept this into -proposed.

description: updated
Changed in nginx (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Revision history for this message
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Salvatore, or anyone else affected,

Accepted nginx into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nginx/1.18.0-0ubuntu1.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Mitchell Dzurick (mitchdz) wrote :

Attached is the detailed test results. No significant performance regression is seen in ARMv8.0, and a significant improvement in STREX events are seen in ARMv8.1, which makes me happy to verify the proposed package.

Test result summary:

Package tested: nginx 1.18.0-0ubuntu1.5

[ARMv8.0]
STREX EVENTS
* 178 -> 181

CPU Usage
* 99.21 -> 99.64

[ARMv8.1]
STREX EVENTS
* 151 -> 0

CPU Usage
* 100 -> 19.47

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.