Activity log for bug #1876230

Date Who What changed Old value New value Message
2020-05-01 03:52:58 Matthew Ruffell bug added bug
2020-05-01 03:53:08 Matthew Ruffell nominated for series Ubuntu Bionic
2020-05-01 03:53:08 Matthew Ruffell bug task added liburcu (Ubuntu Bionic)
2020-05-01 03:53:15 Matthew Ruffell liburcu (Ubuntu): status New Fix Released
2020-05-01 03:53:18 Matthew Ruffell liburcu (Ubuntu Bionic): status New In Progress
2020-05-01 03:53:21 Matthew Ruffell liburcu (Ubuntu Bionic): importance Undecided Medium
2020-05-01 03:53:24 Matthew Ruffell liburcu (Ubuntu Bionic): assignee Matthew Ruffell (mruffell)
2020-05-01 03:53:40 Matthew Ruffell tags sts
2020-05-01 03:55:28 Matthew Ruffell description [Impact] In Linux 4.3, a new syscall was defined, called "membarrier". This systemcall was defined specifically for use in userspace-rcu (liburcu) to speed up the fast path / reader side of the library. The original implementation in Linux 4.3 only supported the MEMBARRIER_CMD_SHARED subcommand of the membarrier syscall. MEMBARRIER_CMD_SHARED executes a memory barrier on all threads from all processes running on the system. When it exits, the userspace thread which called it is guaranteed that all running threads share the same world view in regards to userspace addresses which are consumed by readers and writers. The problem with MEMBARRIER_CMD_SHARED is system calls made in this fashion can block, since it deploys a barrier across all threads in a system, and some other threads can be waiting on blocking operations, and take time to reach the barrier. In Linux 4.14, this was addressed by adding the MEMBARRIER_CMD_PRIVATE_EXPEDITED command to the membarrier syscall. It only targets threads which share the same mm as the thread calling the membarrier syscall, aka, threads in the current process, and not all threads / processes in the system. Calls to membarrier with the MEMBARRIER_CMD_PRIVATE_EXPEDITED command are guaranteed non-blocking, due to using inter-processor interrupts to implement memory barriers. Because of this, membarrier calls that use MEMBARRIER_CMD_PRIVATE_EXPEDITED are much faster than those that use MEMBARRIER_CMD_SHARED. Since Bionic uses a 4.15 kernel, all kernel requirements are met, and this SRU is to enable support for MEMBARRIER_CMD_PRIVATE_EXPEDITED in the liburcu package. This brings the performance of the liburcu library back in line to where it was in Trusty, as this particular user has performance problems upon upgrading from Trusty to Bionic. [Test] Testing performance is heavily dependant on the application which links against liburcu, and the workload which it executes. 
For the sake of testing, we can use the benchmarks provided in the liburcu source code. Download a copy of the source code for liburcu either from the repos or from github: $ pull-lp-source liburcu bionic # OR $ git clone https://github.com/urcu/userspace-rcu.git $ git checkout v0.10.1 # version in bionic Build the code: $ ./bootstrap $ ./configure $ make Go into the tests/benchmark directory $ cd tests/benchmark From there, you can run benchmarks for the four main usages of liburcu: urcu, urcu-bp, urcu-signal and urcu-mb. On a 8 core machine, 6 threads for readers and 2 threads for writers, with a 10 second runtime, execute: $ ./test_urcu 6 2 10 $ ./test_urcu_bp 6 2 10 $ ./test_urcu_signal 6 2 10 $ ./test_urcu_mb 6 2 10 Results: ./test_urcu 6 2 10 0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops 0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops $ ./test_urcu_bp 6 2 10 0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops 0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops $ ./test_urcu_signal 6 2 10 0.10.1-1: 20128392417 reads, 6859 writes, 20128399276 ops 0.10.1-1ubuntu1: 20501430707 reads, 6890 writes, 20501437597 ops $ ./test_urcu_mb 6 2 10 0.10.1-1: 627996563 reads, 5409563 writes, 633406126 ops 0.10.1-1ubuntu1: 653194752 reads, 4590020 writes, 657784772 ops The SRU only changes behaviour for urcu and urcu-bp, since they are the only "flavours" of liburcu which the patches change. From a pure ops standpoint: $ ./test_urcu 6 2 10 17612527935 ops 14989247316 ops $ ./test_urcu_bp 6 2 10 1179590602 ops 13230930051 ops We see that this particular benchmark workload, test_urcu sees extra performance overhead with MEMBARRIER_CMD_PRIVATE_EXPEDITED, which is explained by the extra impact that it has on the slowpath, and the extra amount of writes it did during my benchmark. The real winner in this benchmark workload is test_urcu_bp, which sees a 10x performance increase with MEMBARRIER_CMD_PRIVATE_EXPEDITED. 
Some of this may be down to the roughly 3x fewer writes it did during my benchmark. Again, these benchmarks are indicative only and are very "random". Performance is really dependent on the application which links against liburcu and its workload. [Regression Potential] This SRU changes the behaviour of the following libraries which applications link against: -lurcu and -lurcu-bp. Behaviour is not changed in the rest: -lurcu-qsbr, -lurcu-signal and -lurcu-mb. On Bionic, liburcu will call the membarrier syscall in urcu and urcu-bp. This does not change. What is changing is the semantics of that syscall, from MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_PRIVATE_EXPEDITED. The changed code is all run in kernel space and resides in the kernel. These commits simply change the parameters which are supplied to the membarrier syscall from liburcu. I have run the testsuite that comes with the Bionic source code, and "make regtest", "make short_bench" and "make long_bench" pass. It is best to run these on a cloud instance, since they take multiple hours. If a regression were to occur, applications linked against -lurcu and -lurcu-bp would be affected. The homepage, https://liburcu.org/, offers a list of the major applications that use liburcu: Knot DNS, Netsniff-ng, Sheepdog, GlusterFS, gdnsd and LTTng. 
[Other] The two commits which are being SRU'd are: commit c0bb9f693f926595a7cb8b4ce712cef08d9f5d49 Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Thu Dec 21 13:42:23 2017 -0500 Subject: liburcu: Use membarrier private expedited when available Link: https://github.com/urcu/userspace-rcu/commit/c0bb9f693f926595a7cb8b4ce712cef08d9f5d49 commit 3745305bf09e7825e75ee5b5490347ee67c6efdd Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Dec 22 10:57:59 2017 -0500 Subject: liburcu-bp: Use membarrier private expedited when available Link: https://github.com/urcu/userspace-rcu/commit/3745305bf09e7825e75ee5b5490347ee67c6efdd Both cherry pick directly onto 0.10.1 in Bionic, and are originally from 0.11.0, meaning that Eoan, Focal and Groovy already have the patch. If you are interested in how the membarrier syscall works, you can read their commits in the Linux kernel: commit 5b25b13ab08f616efd566347d809b4ece54570d1 Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Sep 11 13:07:39 2015 -0700 Subject: sys_membarrier(): system-wide memory barrier (generic, x86) Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b25b13ab08f616efd566347d809b4ece54570d1 commit 22e4ebb975822833b083533035233d128b30e98f Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Jul 28 16:40:40 2017 -0400 Subject: membarrier: Provide expedited private command Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22e4ebb975822833b083533035233d128b30e98f Additionally, blog posts from LTTng: https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/ And Phoronix: https://www.phoronix.com/scan.php?page=news_item&px=URCU-Membarrier-Performance [Impact] In Linux 4.3, a new syscall was defined, called "membarrier". This systemcall was defined specifically for use in userspace-rcu (liburcu) to speed up the fast path / reader side of the library. 
The original implementation in Linux 4.3 only supported the MEMBARRIER_CMD_SHARED subcommand of the membarrier syscall. MEMBARRIER_CMD_SHARED executes a memory barrier on all threads from all processes running on the system. When it exits, the userspace thread which called it is guaranteed that all running threads share the same world view in regards to userspace addresses which are consumed by readers and writers. The problem with MEMBARRIER_CMD_SHARED is system calls made in this fashion can block, since it deploys a barrier across all threads in a system, and some other threads can be waiting on blocking operations, and take time to reach the barrier. In Linux 4.14, this was addressed by adding the MEMBARRIER_CMD_PRIVATE_EXPEDITED command to the membarrier syscall. It only targets threads which share the same mm as the thread calling the membarrier syscall, aka, threads in the current process, and not all threads / processes in the system. Calls to membarrier with the MEMBARRIER_CMD_PRIVATE_EXPEDITED command are guaranteed non-blocking, due to using inter-processor interrupts to implement memory barriers. Because of this, membarrier calls that use MEMBARRIER_CMD_PRIVATE_EXPEDITED are much faster than those that use MEMBARRIER_CMD_SHARED. Since Bionic uses a 4.15 kernel, all kernel requirements are met, and this SRU is to enable support for MEMBARRIER_CMD_PRIVATE_EXPEDITED in the liburcu package. This brings the performance of the liburcu library back in line to where it was in Trusty, as this particular user has performance problems upon upgrading from Trusty to Bionic. [Test] Testing performance is heavily dependant on the application which links against liburcu, and the workload which it executes. A test package is available in the following ppa: https://launchpad.net/~mruffell/+archive/ubuntu/sf276198-test For the sake of testing, we can use the benchmarks provided in the liburcu source code. 
Download a copy of the source code for liburcu either from the repos or from github: $ pull-lp-source liburcu bionic # OR $ git clone https://github.com/urcu/userspace-rcu.git $ git checkout v0.10.1 # version in bionic Build the code: $ ./bootstrap $ ./configure $ make Go into the tests/benchmark directory $ cd tests/benchmark From there, you can run benchmarks for the four main usages of liburcu: urcu, urcu-bp, urcu-signal and urcu-mb. On a 8 core machine, 6 threads for readers and 2 threads for writers, with a 10 second runtime, execute: $ ./test_urcu 6 2 10 $ ./test_urcu_bp 6 2 10 $ ./test_urcu_signal 6 2 10 $ ./test_urcu_mb 6 2 10 Results: ./test_urcu 6 2 10 0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops 0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops $ ./test_urcu_bp 6 2 10 0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops 0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops $ ./test_urcu_signal 6 2 10 0.10.1-1: 20128392417 reads, 6859 writes, 20128399276 ops 0.10.1-1ubuntu1: 20501430707 reads, 6890 writes, 20501437597 ops $ ./test_urcu_mb 6 2 10 0.10.1-1: 627996563 reads, 5409563 writes, 633406126 ops 0.10.1-1ubuntu1: 653194752 reads, 4590020 writes, 657784772 ops The SRU only changes behaviour for urcu and urcu-bp, since they are the only "flavours" of liburcu which the patches change. From a pure ops standpoint: $ ./test_urcu 6 2 10 17612527935 ops 14989247316 ops $ ./test_urcu_bp 6 2 10 1179590602 ops 13230930051 ops We see that this particular benchmark workload, test_urcu sees extra performance overhead with MEMBARRIER_CMD_PRIVATE_EXPEDITED, which is explained by the extra impact that it has on the slowpath, and the extra amount of writes it did during my benchmark. The real winner in this benchmark workload is test_urcu_bp, which sees a 10x performance increase with MEMBARRIER_CMD_PRIVATE_EXPEDITED. Some of this may be down to the 3x less writes it did during my benchmark. 
Again, these benchmarks are indicative only and are very "random". Performance is really dependent on the application which links against liburcu and its workload. [Regression Potential] This SRU changes the behaviour of the following libraries which applications link against: -lurcu and -lurcu-bp. Behaviour is not changed in the rest: -lurcu-qsbr, -lurcu-signal and -lurcu-mb. On Bionic, liburcu will call the membarrier syscall in urcu and urcu-bp. This does not change. What is changing is the semantics of that syscall, from MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_PRIVATE_EXPEDITED. The changed code is all run in kernel space and resides in the kernel. These commits simply change the parameters which are supplied to the membarrier syscall from liburcu. I have run the testsuite that comes with the Bionic source code, and "make regtest", "make short_bench" and "make long_bench" pass. It is best to run these on a cloud instance, since they take multiple hours. If a regression were to occur, applications linked against -lurcu and -lurcu-bp would be affected. The homepage, https://liburcu.org/, offers a list of the major applications that use liburcu: Knot DNS, Netsniff-ng, Sheepdog, GlusterFS, gdnsd and LTTng. 
[Other] The two commits which are being SRU'd are: commit c0bb9f693f926595a7cb8b4ce712cef08d9f5d49 Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Thu Dec 21 13:42:23 2017 -0500 Subject: liburcu: Use membarrier private expedited when available Link: https://github.com/urcu/userspace-rcu/commit/c0bb9f693f926595a7cb8b4ce712cef08d9f5d49 commit 3745305bf09e7825e75ee5b5490347ee67c6efdd Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Dec 22 10:57:59 2017 -0500 Subject: liburcu-bp: Use membarrier private expedited when available Link: https://github.com/urcu/userspace-rcu/commit/3745305bf09e7825e75ee5b5490347ee67c6efdd Both cherry pick directly onto 0.10.1 in Bionic, and are originally from 0.11.0, meaning that Eoan, Focal and Groovy already have the patch. If you are interested in how the membarrier syscall works, you can read their commits in the Linux kernel: commit 5b25b13ab08f616efd566347d809b4ece54570d1 Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Sep 11 13:07:39 2015 -0700 Subject: sys_membarrier(): system-wide memory barrier (generic, x86) Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b25b13ab08f616efd566347d809b4ece54570d1 commit 22e4ebb975822833b083533035233d128b30e98f Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Jul 28 16:40:40 2017 -0400 Subject: membarrier: Provide expedited private command Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22e4ebb975822833b083533035233d128b30e98f Additionally, blog posts from LTTng: https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/ And Phoronix: https://www.phoronix.com/scan.php?page=news_item&px=URCU-Membarrier-Performance
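The "pure ops" comparison in the description above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the bug report: the read and write counts are copied from the quoted results, while the names RESULTS, total_ops and speedup are hypothetical helpers introduced here for illustration.

```python
# (reads, writes) per package version, copied from the benchmark results
# quoted in the description. Only the two flavours the SRU changes are listed.
RESULTS = {
    "test_urcu": {
        "0.10.1-1": (17612527667, 268),
        "0.10.1-1ubuntu1": (14988437247, 810069),
    },
    "test_urcu_bp": {
        "0.10.1-1": (1177891079, 1699523),
        "0.10.1-1ubuntu1": (13230354737, 575314),
    },
}


def total_ops(reads, writes):
    """Total operations as reported by the benchmark: reads plus writes."""
    return reads + writes


def speedup(bench):
    """Ratio of total ops after the SRU to total ops before it."""
    before = total_ops(*RESULTS[bench]["0.10.1-1"])
    after = total_ops(*RESULTS[bench]["0.10.1-1ubuntu1"])
    return after / before


if __name__ == "__main__":
    for name in RESULTS:
        print(f"{name}: {speedup(name):.2f}x total ops after the SRU")
```

On these figures test_urcu comes out at about 0.85x and test_urcu_bp at about 11.2x, which matches the "10x" order of magnitude quoted in the description. As the description itself stresses, a single 10 second run is indicative only.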
2020-05-01 04:26:58 Matthew Ruffell attachment added liburcu debdiff for Bionic https://bugs.launchpad.net/ubuntu/+source/liburcu/+bug/1876230/+attachment/5364290/+files/lp1876230_bionic.debdiff
2020-05-03 22:57:32 Dominique Poulain bug added subscriber Dominique Poulain
2020-05-05 09:55:06 Dan Streetman bug added subscriber STS Sponsors
2020-05-05 09:55:20 Dan Streetman tags sts sts sts-sponsor-ddstreet
2020-05-05 10:26:19 Dan Streetman description [Impact] In Linux 4.3, a new syscall was defined, called "membarrier". This systemcall was defined specifically for use in userspace-rcu (liburcu) to speed up the fast path / reader side of the library. The original implementation in Linux 4.3 only supported the MEMBARRIER_CMD_SHARED subcommand of the membarrier syscall. MEMBARRIER_CMD_SHARED executes a memory barrier on all threads from all processes running on the system. When it exits, the userspace thread which called it is guaranteed that all running threads share the same world view in regards to userspace addresses which are consumed by readers and writers. The problem with MEMBARRIER_CMD_SHARED is system calls made in this fashion can block, since it deploys a barrier across all threads in a system, and some other threads can be waiting on blocking operations, and take time to reach the barrier. In Linux 4.14, this was addressed by adding the MEMBARRIER_CMD_PRIVATE_EXPEDITED command to the membarrier syscall. It only targets threads which share the same mm as the thread calling the membarrier syscall, aka, threads in the current process, and not all threads / processes in the system. Calls to membarrier with the MEMBARRIER_CMD_PRIVATE_EXPEDITED command are guaranteed non-blocking, due to using inter-processor interrupts to implement memory barriers. Because of this, membarrier calls that use MEMBARRIER_CMD_PRIVATE_EXPEDITED are much faster than those that use MEMBARRIER_CMD_SHARED. Since Bionic uses a 4.15 kernel, all kernel requirements are met, and this SRU is to enable support for MEMBARRIER_CMD_PRIVATE_EXPEDITED in the liburcu package. This brings the performance of the liburcu library back in line to where it was in Trusty, as this particular user has performance problems upon upgrading from Trusty to Bionic. [Test] Testing performance is heavily dependant on the application which links against liburcu, and the workload which it executes. 
A test package is available in the following ppa: https://launchpad.net/~mruffell/+archive/ubuntu/sf276198-test For the sake of testing, we can use the benchmarks provided in the liburcu source code. Download a copy of the source code for liburcu either from the repos or from github: $ pull-lp-source liburcu bionic # OR $ git clone https://github.com/urcu/userspace-rcu.git $ git checkout v0.10.1 # version in bionic Build the code: $ ./bootstrap $ ./configure $ make Go into the tests/benchmark directory $ cd tests/benchmark From there, you can run benchmarks for the four main usages of liburcu: urcu, urcu-bp, urcu-signal and urcu-mb. On a 8 core machine, 6 threads for readers and 2 threads for writers, with a 10 second runtime, execute: $ ./test_urcu 6 2 10 $ ./test_urcu_bp 6 2 10 $ ./test_urcu_signal 6 2 10 $ ./test_urcu_mb 6 2 10 Results: ./test_urcu 6 2 10 0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops 0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops $ ./test_urcu_bp 6 2 10 0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops 0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops $ ./test_urcu_signal 6 2 10 0.10.1-1: 20128392417 reads, 6859 writes, 20128399276 ops 0.10.1-1ubuntu1: 20501430707 reads, 6890 writes, 20501437597 ops $ ./test_urcu_mb 6 2 10 0.10.1-1: 627996563 reads, 5409563 writes, 633406126 ops 0.10.1-1ubuntu1: 653194752 reads, 4590020 writes, 657784772 ops The SRU only changes behaviour for urcu and urcu-bp, since they are the only "flavours" of liburcu which the patches change. From a pure ops standpoint: $ ./test_urcu 6 2 10 17612527935 ops 14989247316 ops $ ./test_urcu_bp 6 2 10 1179590602 ops 13230930051 ops We see that this particular benchmark workload, test_urcu sees extra performance overhead with MEMBARRIER_CMD_PRIVATE_EXPEDITED, which is explained by the extra impact that it has on the slowpath, and the extra amount of writes it did during my benchmark. 
The real winner in this benchmark workload is test_urcu_bp, which sees a 10x performance increase with MEMBARRIER_CMD_PRIVATE_EXPEDITED. Some of this may be down to the roughly 3x fewer writes it did during my benchmark. Again, these benchmarks are indicative only and are very "random". Performance is really dependent on the application which links against liburcu and its workload. [Regression Potential] This SRU changes the behaviour of the following libraries which applications link against: -lurcu and -lurcu-bp. Behaviour is not changed in the rest: -lurcu-qsbr, -lurcu-signal and -lurcu-mb. On Bionic, liburcu will call the membarrier syscall in urcu and urcu-bp. This does not change. What is changing is the semantics of that syscall, from MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_PRIVATE_EXPEDITED. The changed code is all run in kernel space and resides in the kernel. These commits simply change the parameters which are supplied to the membarrier syscall from liburcu. I have run the testsuite that comes with the Bionic source code, and "make regtest", "make short_bench" and "make long_bench" pass. It is best to run these on a cloud instance, since they take multiple hours. If a regression were to occur, applications linked against -lurcu and -lurcu-bp would be affected. The homepage, https://liburcu.org/, offers a list of the major applications that use liburcu: Knot DNS, Netsniff-ng, Sheepdog, GlusterFS, gdnsd and LTTng. 
[Other] The two commits which are being SRU'd are: commit c0bb9f693f926595a7cb8b4ce712cef08d9f5d49 Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Thu Dec 21 13:42:23 2017 -0500 Subject: liburcu: Use membarrier private expedited when available Link: https://github.com/urcu/userspace-rcu/commit/c0bb9f693f926595a7cb8b4ce712cef08d9f5d49 commit 3745305bf09e7825e75ee5b5490347ee67c6efdd Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Dec 22 10:57:59 2017 -0500 Subject: liburcu-bp: Use membarrier private expedited when available Link: https://github.com/urcu/userspace-rcu/commit/3745305bf09e7825e75ee5b5490347ee67c6efdd Both cherry pick directly onto 0.10.1 in Bionic, and are originally from 0.11.0, meaning that Eoan, Focal and Groovy already have the patch. If you are interested in how the membarrier syscall works, you can read their commits in the Linux kernel: commit 5b25b13ab08f616efd566347d809b4ece54570d1 Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Sep 11 13:07:39 2015 -0700 Subject: sys_membarrier(): system-wide memory barrier (generic, x86) Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b25b13ab08f616efd566347d809b4ece54570d1 commit 22e4ebb975822833b083533035233d128b30e98f Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Jul 28 16:40:40 2017 -0400 Subject: membarrier: Provide expedited private command Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22e4ebb975822833b083533035233d128b30e98f Additionally, blog posts from LTTng: https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/ And Phoronix: https://www.phoronix.com/scan.php?page=news_item&px=URCU-Membarrier-Performance [Impact] In Linux 4.3, a new syscall was defined, called "membarrier". This systemcall was defined specifically for use in userspace-rcu (liburcu) to speed up the fast path / reader side of the library. 
The original implementation in Linux 4.3 only supported the MEMBARRIER_CMD_SHARED subcommand of the membarrier syscall. MEMBARRIER_CMD_SHARED executes a memory barrier on all threads from all processes running on the system. When it exits, the userspace thread which called it is guaranteed that all running threads share the same world view in regards to userspace addresses which are consumed by readers and writers. The problem with MEMBARRIER_CMD_SHARED is system calls made in this fashion can block, since it deploys a barrier across all threads in a system, and some other threads can be waiting on blocking operations, and take time to reach the barrier. In Linux 4.14, this was addressed by adding the MEMBARRIER_CMD_PRIVATE_EXPEDITED command to the membarrier syscall. It only targets threads which share the same mm as the thread calling the membarrier syscall, aka, threads in the current process, and not all threads / processes in the system. Calls to membarrier with the MEMBARRIER_CMD_PRIVATE_EXPEDITED command are guaranteed non-blocking, due to using inter-processor interrupts to implement memory barriers. Because of this, membarrier calls that use MEMBARRIER_CMD_PRIVATE_EXPEDITED are much faster than those that use MEMBARRIER_CMD_SHARED. Since Bionic uses a 4.15 kernel, all kernel requirements are met, and this SRU is to enable support for MEMBARRIER_CMD_PRIVATE_EXPEDITED in the liburcu package. This brings the performance of the liburcu library back in line to where it was in Trusty, as this particular user has performance problems upon upgrading from Trusty to Bionic. [Test] Testing performance is heavily dependant on the application which links against liburcu, and the workload which it executes. A test package is available in the following ppa: https://launchpad.net/~mruffell/+archive/ubuntu/sf276198-test For the sake of testing, we can use the benchmarks provided in the liburcu source code. 
Download a copy of the source code for liburcu either from the repos or from github: $ pull-lp-source liburcu bionic # OR $ git clone https://github.com/urcu/userspace-rcu.git $ git checkout v0.10.1 # version in bionic Build the code: $ ./bootstrap $ ./configure $ make Go into the tests/benchmark directory $ cd tests/benchmark From there, you can run benchmarks for the four main usages of liburcu: urcu, urcu-bp, urcu-signal and urcu-mb. On a 8 core machine, 6 threads for readers and 2 threads for writers, with a 10 second runtime, execute: $ ./test_urcu 6 2 10 $ ./test_urcu_bp 6 2 10 $ ./test_urcu_signal 6 2 10 $ ./test_urcu_mb 6 2 10 Results: ./test_urcu 6 2 10 0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops 0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops $ ./test_urcu_bp 6 2 10 0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops 0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops $ ./test_urcu_signal 6 2 10 0.10.1-1: 20128392417 reads, 6859 writes, 20128399276 ops 0.10.1-1ubuntu1: 20501430707 reads, 6890 writes, 20501437597 ops $ ./test_urcu_mb 6 2 10 0.10.1-1: 627996563 reads, 5409563 writes, 633406126 ops 0.10.1-1ubuntu1: 653194752 reads, 4590020 writes, 657784772 ops The SRU only changes behaviour for urcu and urcu-bp, since they are the only "flavours" of liburcu which the patches change. From a pure ops standpoint: $ ./test_urcu 6 2 10 17612527935 ops 14989247316 ops $ ./test_urcu_bp 6 2 10 1179590602 ops 13230930051 ops We see that this particular benchmark workload, test_urcu sees extra performance overhead with MEMBARRIER_CMD_PRIVATE_EXPEDITED, which is explained by the extra impact that it has on the slowpath, and the extra amount of writes it did during my benchmark. The real winner in this benchmark workload is test_urcu_bp, which sees a 10x performance increase with MEMBARRIER_CMD_PRIVATE_EXPEDITED. Some of this may be down to the 3x less writes it did during my benchmark. 
Again, these benchmarks are indicative only and are very "random". Performance is really dependent on the application which links against liburcu and its workload. [Regression Potential] This SRU changes the behaviour of the following libraries which applications link against: -lurcu and -lurcu-bp. Behaviour is not changed in the rest: -lurcu-qsbr, -lurcu-signal and -lurcu-mb. On Bionic, liburcu will call the membarrier syscall in urcu and urcu-bp. This does not change. What is changing is the semantics of that syscall, from MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_PRIVATE_EXPEDITED. The changed code is all run in kernel space and resides in the kernel. These commits simply change the parameters which are supplied to the membarrier syscall from liburcu. I have run the testsuite that comes with the Bionic source code, and "make regtest", "make short_bench" and "make long_bench" pass. It is best to run these on a cloud instance, since they take multiple hours. If a regression were to occur, applications linked against -lurcu and -lurcu-bp would be affected. The homepage, https://liburcu.org/, offers a list of the major applications that use liburcu: Knot DNS, Netsniff-ng, Sheepdog, GlusterFS, gdnsd and LTTng. 
[Scope] The two commits which are being SRU'd are: commit c0bb9f693f926595a7cb8b4ce712cef08d9f5d49 Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Thu Dec 21 13:42:23 2017 -0500 Subject: liburcu: Use membarrier private expedited when available Link: https://github.com/urcu/userspace-rcu/commit/c0bb9f693f926595a7cb8b4ce712cef08d9f5d49 commit 3745305bf09e7825e75ee5b5490347ee67c6efdd Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Dec 22 10:57:59 2017 -0500 Subject: liburcu-bp: Use membarrier private expedited when available Link: https://github.com/urcu/userspace-rcu/commit/3745305bf09e7825e75ee5b5490347ee67c6efdd Both cherry pick directly onto 0.10.1 in Bionic, and are originally from 0.11.0, meaning that Eoan, Focal and Groovy already have the patch. [Other] If you are interested in how the membarrier syscall works, you can read their commits in the Linux kernel: commit 5b25b13ab08f616efd566347d809b4ece54570d1 Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Sep 11 13:07:39 2015 -0700 Subject: sys_membarrier(): system-wide memory barrier (generic, x86) Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b25b13ab08f616efd566347d809b4ece54570d1 commit 22e4ebb975822833b083533035233d128b30e98f Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Fri Jul 28 16:40:40 2017 -0400 Subject: membarrier: Provide expedited private command Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22e4ebb975822833b083533035233d128b30e98f Additionally, blog posts from LTTng: https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/ And Phoronix: https://www.phoronix.com/scan.php?page=news_item&px=URCU-Membarrier-Performance
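For readers unfamiliar with the syscall discussed in the description, the following is a rough sketch of how a userspace program might query the kernel and prefer MEMBARRIER_CMD_PRIVATE_EXPEDITED over MEMBARRIER_CMD_SHARED. It is not liburcu's code. The command values are those in the kernel's include/uapi/linux/membarrier.h; the helper names choose_command and membarrier, the fallback order, and the two-entry syscall number table are assumptions made for illustration.

```python
import ctypes
import os
import platform

# Command values from the kernel's include/uapi/linux/membarrier.h.
MEMBARRIER_CMD_QUERY = 0
MEMBARRIER_CMD_SHARED = 1 << 0
MEMBARRIER_CMD_PRIVATE_EXPEDITED = 1 << 3
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED = 1 << 4

# Syscall numbers differ per architecture. Only two are listed here
# (assumption: extend this table for other architectures as needed).
_NR_MEMBARRIER = {"x86_64": 324, "aarch64": 283}


def choose_command(supported_mask):
    """Pick a barrier command from the bitmask returned by MEMBARRIER_CMD_QUERY.

    Hypothetical policy: use the private expedited command when both it and
    its registration command are supported, otherwise fall back to the
    system-wide shared command, otherwise report that neither is available.
    """
    wanted = (MEMBARRIER_CMD_PRIVATE_EXPEDITED
              | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
    if supported_mask & wanted == wanted:
        return MEMBARRIER_CMD_PRIVATE_EXPEDITED
    if supported_mask & MEMBARRIER_CMD_SHARED:
        return MEMBARRIER_CMD_SHARED
    return None


def membarrier(cmd, flags=0):
    """Thin wrapper around the raw syscall; raises OSError on failure."""
    nr = _NR_MEMBARRIER.get(platform.machine())
    if nr is None:
        raise OSError("membarrier syscall number unknown for this architecture")
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.syscall(ctypes.c_long(nr), ctypes.c_int(cmd), ctypes.c_uint(flags))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


if __name__ == "__main__":
    mask = membarrier(MEMBARRIER_CMD_QUERY)
    cmd = choose_command(mask)
    if cmd == MEMBARRIER_CMD_PRIVATE_EXPEDITED:
        # The process must register once before issuing the expedited command.
        membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
    if cmd is not None:
        membarrier(cmd)
    print("supported mask:", bin(mask), "chosen command:", cmd)
```

The registration step matters: the kernel rejects MEMBARRIER_CMD_PRIVATE_EXPEDITED with EPERM if the calling process has not first issued MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, which is why the sketch checks for both bits before choosing the expedited command.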
2020-05-05 10:46:04 Dan Streetman bug added subscriber Dan Streetman
2020-05-15 07:45:24 Łukasz Zemczak liburcu (Ubuntu Bionic): status In Progress Fix Committed
2020-05-15 07:45:25 Łukasz Zemczak bug added subscriber Ubuntu Stable Release Updates Team
2020-05-15 07:45:27 Łukasz Zemczak bug added subscriber SRU Verification
2020-05-15 07:45:31 Łukasz Zemczak tags sts sts-sponsor-ddstreet sts sts-sponsor-ddstreet verification-needed verification-needed-bionic
2020-05-21 05:51:10 Matthew Ruffell tags sts sts-sponsor-ddstreet verification-needed verification-needed-bionic sts sts-sponsor-ddstreet verification-done-bionic
2020-05-25 08:13:47 Launchpad Janitor liburcu (Ubuntu Bionic): status Fix Committed Fix Released
2020-05-25 08:13:50 Łukasz Zemczak removed subscriber Ubuntu Stable Release Updates Team