liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
liburcu (Ubuntu) | Fix Released | Undecided | Unassigned | |
Bionic | Fix Released | Medium | Matthew Ruffell | |
Bug Description
[Impact]
In Linux 4.3, a new syscall was defined, called "membarrier". This system call was defined specifically for use in userspace-rcu (liburcu) to speed up the fast path / reader side of the library. The original implementation in Linux 4.3 only supported the MEMBARRIER_
MEMBARRIER_
The problem with MEMBARRIER_
In Linux 4.14, this was addressed by adding the MEMBARRIER_
Calls to membarrier with the MEMBARRIER_
Because of this, membarrier calls that use MEMBARRIER_
Since Bionic uses a 4.15 kernel, all kernel requirements are met, and this SRU is to enable support for MEMBARRIER_
This brings the performance of the liburcu library back in line to where it was in Trusty, as this particular user has performance problems upon upgrading from Trusty to Bionic.
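To check the kernel requirement on a given machine, the following is a minimal sketch (not liburcu's actual code) that asks the kernel which membarrier commands it advertises. The syscall numbers (324 on x86_64, 283 on aarch64) and the command bit values are assumptions taken from the kernel UAPI headers as I understand them, and the helper names `membarrier` and `probe_membarrier` are hypothetical. On any other platform the sketch reports that it cannot probe.

```python
import ctypes
import platform

# Command values as assumed from <linux/membarrier.h>.
MEMBARRIER_CMD_QUERY = 0
MEMBARRIER_CMD_SHARED = 1 << 0
MEMBARRIER_CMD_PRIVATE_EXPEDITED = 1 << 3

# Assumed syscall numbers; other architectures are deliberately left out.
_SYSCALL_NR = {"x86_64": 324, "aarch64": 283}


def membarrier(cmd, flags=0):
    """Invoke membarrier(2) directly; return None if it cannot be called or fails."""
    if platform.system() != "Linux":
        return None
    nr = _SYSCALL_NR.get(platform.machine())
    if nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(ctypes.c_long(nr), ctypes.c_int(cmd), ctypes.c_int(flags))
    return None if ret < 0 else ret


def probe_membarrier():
    """Return which commands the kernel advertises, or None if probing is impossible."""
    mask = membarrier(MEMBARRIER_CMD_QUERY)
    if mask is None:
        return None
    return {
        "shared": bool(mask & MEMBARRIER_CMD_SHARED),
        "private_expedited": bool(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED),
    }


if __name__ == "__main__":
    print(probe_membarrier())
```

On a Bionic 4.15 kernel one would expect both entries to be reported as available, though this sketch has not been verified there.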
[Test]
Testing performance is heavily dependent on the application which links against liburcu, and the workload which it executes.
A test package is available in the following ppa:
https:/
For the sake of testing, we can use the benchmarks provided in the liburcu source code. Download a copy of the source code for liburcu either from the repos or from GitHub:
$ pull-lp-source liburcu bionic
# OR
$ git clone https:/
$ git checkout v0.10.1 # version in bionic
Build the code:
$ ./bootstrap
$ ./configure
$ make
Go into the tests/benchmark directory
$ cd tests/benchmark
From there, you can run benchmarks for the four main usages of liburcu: urcu, urcu-bp, urcu-signal and urcu-mb.
On an 8-core machine, using 6 reader threads and 2 writer threads with a 10 second runtime, execute:
$ ./test_urcu 6 2 10
$ ./test_urcu_bp 6 2 10
$ ./test_urcu_signal 6 2 10
$ ./test_urcu_mb 6 2 10
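When comparing two package versions it can help to run the four commands above in one go. The following is a small sketch under the assumption that it is started from the tests/benchmark directory after a successful build; the helper names `benchmark_commands` and `run_all` are hypothetical and not part of liburcu.

```python
import os
import subprocess

# Benchmark binaries named in this bug report.
BENCHMARKS = ["test_urcu", "test_urcu_bp", "test_urcu_signal", "test_urcu_mb"]


def benchmark_commands(readers, writers, seconds, directory="."):
    """Build one argument list per benchmark: <binary> <readers> <writers> <seconds>."""
    return [
        [os.path.join(directory, name), str(readers), str(writers), str(seconds)]
        for name in BENCHMARKS
    ]


def run_all(readers=6, writers=2, seconds=10, directory="."):
    """Run every benchmark that exists and is executable; return its stdout by name."""
    outputs = {}
    for cmd in benchmark_commands(readers, writers, seconds, directory):
        if not os.access(cmd[0], os.X_OK):
            continue  # binary not built, skip it
        proc = subprocess.run(cmd, capture_output=True, text=True)
        outputs[os.path.basename(cmd[0])] = proc.stdout
    return outputs
```

Calling `run_all()` once with each package version installed gives two sets of outputs to compare.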
Results:
$ ./test_urcu 6 2 10
0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops
0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops
$ ./test_urcu_bp 6 2 10
0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops
0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops
$ ./test_urcu_signal 6 2 10
0.10.1-1: 20128392417 reads, 6859 writes, 20128399276 ops
0.10.1-1ubuntu1: 20501430707 reads, 6890 writes, 20501437597 ops
$ ./test_urcu_mb 6 2 10
0.10.1-1: 627996563 reads, 5409563 writes, 633406126 ops
0.10.1-1ubuntu1: 653194752 reads, 4590020 writes, 657784772 ops
The SRU only changes behaviour for urcu and urcu-bp, since they are the only "flavours" of liburcu which the patches change. From a pure ops standpoint:
$ ./test_urcu 6 2 10
17612527935 ops
14989247316 ops
$ ./test_urcu_bp 6 2 10
1179590602 ops
13230930051 ops
We see that, in this particular benchmark workload, test_urcu sees extra performance overhead with MEMBARRIER_
The real winner in this benchmark workload is test_urcu_bp, which sees a roughly 11x performance increase with MEMBARRIER_
Again, these benchmarks are indicative only and are very "random". Performance really depends on the application which links against liburcu and on its workload.
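The ratios behind this comparison can be reproduced from the numbers listed under Results. The sketch below uses only the figures reported in this bug; the variable and function names are illustrative.

```python
# (reads, writes) per package version, copied from the Results section.
RESULTS = {
    "test_urcu": {
        "0.10.1-1": (17612527667, 268),
        "0.10.1-1ubuntu1": (14988437247, 810069),
    },
    "test_urcu_bp": {
        "0.10.1-1": (1177891079, 1699523),
        "0.10.1-1ubuntu1": (13230354737, 575314),
    },
}


def total_ops(reads, writes):
    """Total operations are simply reads plus writes."""
    return reads + writes


def ops_ratio(benchmark):
    """Ratio of patched ops to unpatched ops for one benchmark."""
    before = total_ops(*RESULTS[benchmark]["0.10.1-1"])
    after = total_ops(*RESULTS[benchmark]["0.10.1-1ubuntu1"])
    return after / before


if __name__ == "__main__":
    for name in RESULTS:
        print(f"{name}: {ops_ratio(name):.2f}x")
```

For this single run, test_urcu comes out at about 0.85x (roughly 15% fewer ops) and test_urcu_bp at about 11.2x.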
[Regression Potential]
This SRU changes the behaviour of the following libraries which applications link against: -lurcu and -lurcu-bp. Behaviour is not changed in the rest: -lurcu-qsbr, -lurcu-signal and -lurcu-mb.
On Bionic, liburcu will call the membarrier syscall in urcu and urcu-bp. This does not change. What is changing is the semantics of that syscall, from MEMBARRIER_
I have run the test suite that comes with the Bionic source code, and "make regtest", "make short_bench" and "make long_bench" all pass. These take multiple hours to complete, so it is best to run them on a cloud instance.
If a regression were to occur, applications linked against -lurcu and -lurcu-bp would be affected. The homepage: https:/
[Scope]
The two commits which are being SRU'd are:
commit c0bb9f693f92659
Author: Mathieu Desnoyers <email address hidden>
Date: Thu Dec 21 13:42:23 2017 -0500
Subject: liburcu: Use membarrier private expedited when available
Link: https:/
commit 3745305bf09e782
Author: Mathieu Desnoyers <email address hidden>
Date: Fri Dec 22 10:57:59 2017 -0500
Subject: liburcu-bp: Use membarrier private expedited when available
Link: https:/
Both cherry-pick directly onto 0.10.1 in Bionic and are originally from 0.11.0, meaning that Eoan, Focal and Groovy already have the patches.
[Other]
If you are interested in how the membarrier syscall works, you can read the relevant commits in the Linux kernel:
commit 5b25b13ab08f616
Author: Mathieu Desnoyers <email address hidden>
Date: Fri Sep 11 13:07:39 2015 -0700
Subject: sys_membarrier(): system-wide memory barrier (generic, x86)
Link: https:/
commit 22e4ebb97582283
Author: Mathieu Desnoyers <email address hidden>
Date: Fri Jul 28 16:40:40 2017 -0400
Subject: membarrier: Provide expedited private command
Link: https:/
Additionally, blog posts from LTTng:
https:/
And Phoronix:
https:/
Changed in liburcu (Ubuntu):
status: New → Fix Released
Changed in liburcu (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
tags: added: sts
description: updated
tags: added: sts-sponsor-ddstreet
description: updated
Attached is a debdiff for Bionic