2019-08-13 14:02:30 |
Heitor Alves de Siqueira |
bug |
|
|
added bug |
2019-08-13 14:03:39 |
Heitor Alves de Siqueira |
description |
[Impact]
Performance degradation for read/write workloads in bcache devices, occasional system stalls
[Description]
In the latest bcache drivers, there's a sysfs attribute that calculates bucket priority statistics in /sys/fs/bcache/*/cache0/priority_stats. Querying this file has a significant performance impact on tasks running on the same CPU, and also affects read/write performance of the bcache device itself.
This is due to the way the driver calculates the stats: the bcache buckets are locked and iterated over, collecting information about each individual bucket. An array of nbucket elements is constructed and sorted afterwards, which can cause very high CPU contention on larger bcache setups.
From our tests, the sorting step of the priority_stats query causes the most significant performance reduction, as it can hinder tasks that are not even doing any bcache IO. If a task is "unlucky" enough to be scheduled on the same CPU as the sysfs query, its performance will be harshly reduced as both compete for CPU time. We've had users report system stalls of up to ~6s due to this, caused by monitoring tools that query priority_stats periodically (e.g. Prometheus Node Exporter [0]). These system stalls have triggered several other issues, such as ceph-mon re-elections, problems in percona-cluster and general network stalls, so the impact is not isolated to bcache IO workloads.
[0] https://github.com/prometheus/node_exporter
[Test Case]
Note: As the sorting step has the most noticeable performance impact, the test case below pins a workload and the sysfs query to the same CPU. CPU contention issues still occur without any pinning; pinning just removes the scheduling factor of the two landing on different CPUs and affecting other tasks.
1) Start a read/write workload on the bcache device with e.g. fio or dd, pinned to a certain CPU:
# taskset 0x10 dd if=/dev/zero of=/dev/bcache0 bs=4k status=progress
2) Start a sysfs query loop for the priority_stats attribute pinned to the same CPU:
# for i in {1..100000}; do taskset 0x10 cat /sys/fs/bcache/*/cache0/priority_stats
3) Monitor the read/write workload for any performance impact |
[Impact]
Performance degradation for read/write workloads in bcache devices, occasional system stalls
[Description]
In the latest bcache drivers, there's a sysfs attribute that calculates bucket priority statistics in /sys/fs/bcache/*/cache0/priority_stats. Querying this file has a significant performance impact on tasks running on the same CPU, and also affects read/write performance of the bcache device itself.
This is due to the way the driver calculates the stats: the bcache buckets are locked and iterated over, collecting information about each individual bucket. An array of nbucket elements is constructed and sorted afterwards, which can cause very high CPU contention on larger bcache setups.
From our tests, the sorting step of the priority_stats query causes the most significant performance reduction, as it can hinder tasks that are not even doing any bcache IO. If a task is "unlucky" enough to be scheduled on the same CPU as the sysfs query, its performance will be harshly reduced as both compete for CPU time. We've had users report system stalls of up to ~6s due to this, caused by monitoring tools that query priority_stats periodically (e.g. Prometheus Node Exporter [0]). These system stalls have triggered several other issues, such as ceph-mon re-elections, problems in percona-cluster and general network stalls, so the impact is not isolated to bcache IO workloads.
[0] https://github.com/prometheus/node_exporter
[Test Case]
Note: As the sorting step has the most noticeable performance impact, the test case below pins a workload and the sysfs query to the same CPU. CPU contention issues still occur without any pinning; pinning just removes the scheduling factor of the two landing on different CPUs and affecting other tasks.
1) Start a read/write workload on the bcache device with e.g. fio or dd, pinned to a certain CPU:
# taskset 0x10 dd if=/dev/zero of=/dev/bcache0 bs=4k status=progress
2) Start a sysfs query loop for the priority_stats attribute pinned to the same CPU:
# for i in {1..100000}; do taskset 0x10 cat /sys/fs/bcache/*/cache0/priority_stats > /dev/null; done
3) Monitor the read/write workload for any performance impact |
|
2019-08-13 14:21:51 |
Heitor Alves de Siqueira |
attachment added |
|
bcache-results.tar.gz https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840043/+attachment/5282348/+files/bcache-results.tar.gz |
|
2019-08-13 14:30:05 |
Ubuntu Kernel Bot |
linux (Ubuntu): status |
New |
Incomplete |
|
2019-08-13 14:56:23 |
Heitor Alves de Siqueira |
linux (Ubuntu): status |
Incomplete |
Confirmed |
|
2019-08-13 16:16:17 |
Guilherme G. Piccoli |
bug |
|
|
added subscriber Guilherme G. Piccoli |
2019-08-16 10:39:43 |
Heitor Alves de Siqueira |
bug task added |
|
linux |
|
2019-08-16 10:41:30 |
Heitor Alves de Siqueira |
linux: status |
New |
In Progress |
|
2019-08-20 16:29:46 |
Peter Sabaini |
tags |
sts |
canonical-bootstack sts |
|
2019-08-20 16:30:00 |
Peter Sabaini |
bug |
|
|
added subscriber Canonical IS CREs |
2019-08-20 17:11:47 |
Nivedita Singhvi |
bug |
|
|
added subscriber Nivedita Singhvi |
2019-10-10 19:41:16 |
Heitor Alves de Siqueira |
description |
[Impact]
Performance degradation for read/write workloads in bcache devices, occasional system stalls
[Description]
In the latest bcache drivers, there's a sysfs attribute that calculates bucket priority statistics in /sys/fs/bcache/*/cache0/priority_stats. Querying this file has a significant performance impact on tasks running on the same CPU, and also affects read/write performance of the bcache device itself.
This is due to the way the driver calculates the stats: the bcache buckets are locked and iterated over, collecting information about each individual bucket. An array of nbucket elements is constructed and sorted afterwards, which can cause very high CPU contention on larger bcache setups.
From our tests, the sorting step of the priority_stats query causes the most significant performance reduction, as it can hinder tasks that are not even doing any bcache IO. If a task is "unlucky" enough to be scheduled on the same CPU as the sysfs query, its performance will be harshly reduced as both compete for CPU time. We've had users report system stalls of up to ~6s due to this, caused by monitoring tools that query priority_stats periodically (e.g. Prometheus Node Exporter [0]). These system stalls have triggered several other issues, such as ceph-mon re-elections, problems in percona-cluster and general network stalls, so the impact is not isolated to bcache IO workloads.
[0] https://github.com/prometheus/node_exporter
[Test Case]
Note: As the sorting step has the most noticeable performance impact, the test case below pins a workload and the sysfs query to the same CPU. CPU contention issues still occur without any pinning; pinning just removes the scheduling factor of the two landing on different CPUs and affecting other tasks.
1) Start a read/write workload on the bcache device with e.g. fio or dd, pinned to a certain CPU:
# taskset 0x10 dd if=/dev/zero of=/dev/bcache0 bs=4k status=progress
2) Start a sysfs query loop for the priority_stats attribute pinned to the same CPU:
# for i in {1..100000}; do taskset 0x10 cat /sys/fs/bcache/*/cache0/priority_stats > /dev/null; done
3) Monitor the read/write workload for any performance impact |
[Impact]
Querying bcache's priority_stats attribute in sysfs causes severe performance degradation for read/write workloads and occasional system stalls
[Test Case]
Note: As the sorting step has the most noticeable performance impact, the test case below pins a workload and the sysfs query to the same CPU. CPU contention issues still occur without any pinning; pinning just removes the scheduling factor of the two landing on different CPUs and affecting other tasks.
1) Start a read/write workload on the bcache device with e.g. fio or dd, pinned to a certain CPU:
# taskset 0x10 dd if=/dev/zero of=/dev/bcache0 bs=4k status=progress
2) Start a sysfs query loop for the priority_stats attribute pinned to the same CPU:
# for i in {1..100000}; do taskset 0x10 cat /sys/fs/bcache/*/cache0/priority_stats > /dev/null; done
3) Monitor the read/write workload for any performance impact
[Fix]
To fix the CPU contention and its performance impact, a cond_resched() call is introduced in the priority_stats sort comparison function.
[Regression Potential]
Regression potential is low, as the change is confined to the priority_stats sysfs query. In cases where frequent queries to bcache priority_stats take place (e.g. node_exporter), the impact should be more noticeable as those could now take a bit longer to complete. A regression due to this patch would most likely show up as a performance degradation in bcache-focused workloads.
--
[Description]
In the latest bcache drivers, there's a sysfs attribute that calculates bucket priority statistics in /sys/fs/bcache/*/cache0/priority_stats. Querying this file has a significant performance impact on tasks running on the same CPU, and also affects read/write performance of the bcache device itself.
This is due to the way the driver calculates the stats: the bcache buckets are locked and iterated over, collecting information about each individual bucket. An array of nbucket elements is constructed and sorted afterwards, which can cause very high CPU contention on larger bcache setups.
From our tests, the sorting step of the priority_stats query causes the most significant performance reduction, as it can hinder tasks that are not even doing any bcache IO. If a task is "unlucky" enough to be scheduled on the same CPU as the sysfs query, its performance will be harshly reduced as both compete for CPU time. We've had users report system stalls of up to ~6s due to this, caused by monitoring tools that query priority_stats periodically (e.g. Prometheus Node Exporter [0]). These system stalls have triggered several other issues, such as ceph-mon re-elections, problems in percona-cluster and general network stalls, so the impact is not isolated to bcache IO workloads.
An example benchmark can be seen in [1], where the read performance on a bcache device suffered quite heavily (going from ~40k IOPS to ~4k IOPS due to priority_stats). Other comparison charts are found under [2].
[0] https://github.com/prometheus/node_exporter
[1] https://people.canonical.com/~halves/priority_stats/read/4k-iops-2Dsmooth.png
[2] https://people.canonical.com/~halves/priority_stats/ |
|
2019-10-10 20:16:42 |
Heitor Alves de Siqueira |
linux: status |
In Progress |
Fix Committed |
|
2019-10-10 20:20:28 |
Heitor Alves de Siqueira |
linux (Ubuntu): status |
Confirmed |
In Progress |
|
2019-10-17 14:59:45 |
Kleber Sacilotto de Souza |
nominated for series |
|
Ubuntu Eoan |
|
2019-10-17 14:59:45 |
Kleber Sacilotto de Souza |
bug task added |
|
linux (Ubuntu Eoan) |
|
2019-10-17 14:59:45 |
Kleber Sacilotto de Souza |
nominated for series |
|
Ubuntu Disco |
|
2019-10-17 14:59:45 |
Kleber Sacilotto de Souza |
bug task added |
|
linux (Ubuntu Disco) |
|
2019-10-17 14:59:45 |
Kleber Sacilotto de Souza |
nominated for series |
|
Ubuntu Xenial |
|
2019-10-17 14:59:45 |
Kleber Sacilotto de Souza |
bug task added |
|
linux (Ubuntu Xenial) |
|
2019-10-17 14:59:45 |
Kleber Sacilotto de Souza |
nominated for series |
|
Ubuntu Bionic |
|
2019-10-17 14:59:45 |
Kleber Sacilotto de Souza |
bug task added |
|
linux (Ubuntu Bionic) |
|
2019-10-17 15:05:27 |
Kleber Sacilotto de Souza |
linux (Ubuntu Disco): assignee |
|
Heitor Alves de Siqueira (halves) |
|
2019-10-17 15:05:33 |
Kleber Sacilotto de Souza |
linux (Ubuntu Bionic): assignee |
|
Heitor Alves de Siqueira (halves) |
|
2019-10-17 15:05:39 |
Kleber Sacilotto de Souza |
linux (Ubuntu Xenial): assignee |
|
Heitor Alves de Siqueira (halves) |
|
2019-10-17 15:13:14 |
Kleber Sacilotto de Souza |
linux (Ubuntu Xenial): status |
New |
Fix Committed |
|
2019-10-17 15:13:16 |
Kleber Sacilotto de Souza |
linux (Ubuntu Bionic): status |
New |
Fix Committed |
|
2019-10-17 15:13:18 |
Kleber Sacilotto de Souza |
linux (Ubuntu Disco): status |
New |
Fix Committed |
|
2019-10-17 15:13:21 |
Kleber Sacilotto de Souza |
linux (Ubuntu Eoan): status |
In Progress |
Fix Committed |
|
2019-10-22 15:02:17 |
Ubuntu Kernel Bot |
tags |
canonical-bootstack sts |
canonical-bootstack sts verification-needed-disco |
|
2019-10-22 15:47:40 |
Ubuntu Kernel Bot |
tags |
canonical-bootstack sts verification-needed-disco |
canonical-bootstack sts verification-needed-bionic verification-needed-disco |
|
2019-10-22 16:04:32 |
Ubuntu Kernel Bot |
tags |
canonical-bootstack sts verification-needed-bionic verification-needed-disco |
canonical-bootstack sts verification-needed-bionic verification-needed-disco verification-needed-xenial |
|
2019-10-24 16:03:17 |
Ubuntu Kernel Bot |
tags |
canonical-bootstack sts verification-needed-bionic verification-needed-disco verification-needed-xenial |
canonical-bootstack sts verification-needed-bionic verification-needed-disco verification-needed-eoan verification-needed-xenial |
|
2019-10-25 15:56:30 |
Heitor Alves de Siqueira |
tags |
canonical-bootstack sts verification-needed-bionic verification-needed-disco verification-needed-eoan verification-needed-xenial |
canonical-bootstack sts verification-done-xenial verification-needed-bionic verification-needed-disco verification-needed-eoan |
|
2019-10-26 09:21:04 |
Heitor Alves de Siqueira |
tags |
canonical-bootstack sts verification-done-xenial verification-needed-bionic verification-needed-disco verification-needed-eoan |
canonical-bootstack sts verification-done-bionic verification-done-xenial verification-needed-disco verification-needed-eoan |
|
2019-10-26 11:48:31 |
Heitor Alves de Siqueira |
tags |
canonical-bootstack sts verification-done-bionic verification-done-xenial verification-needed-disco verification-needed-eoan |
canonical-bootstack sts verification-done-bionic verification-done-disco verification-done-xenial verification-needed-eoan |
|
2019-10-26 12:51:23 |
Heitor Alves de Siqueira |
tags |
canonical-bootstack sts verification-done-bionic verification-done-disco verification-done-xenial verification-needed-eoan |
canonical-bootstack sts verification-done-bionic verification-done-disco verification-done-eoan verification-done-xenial |
|
2019-11-12 22:18:04 |
Launchpad Janitor |
linux (Ubuntu Eoan): status |
Fix Committed |
Fix Released |
|
2019-11-12 22:18:04 |
Launchpad Janitor |
cve linked |
|
2018-12207 |
|
2019-11-12 22:18:04 |
Launchpad Janitor |
cve linked |
|
2019-0154 |
|
2019-11-12 22:18:04 |
Launchpad Janitor |
cve linked |
|
2019-0155 |
|
2019-11-12 22:18:04 |
Launchpad Janitor |
cve linked |
|
2019-11135 |
|
2019-11-12 22:18:04 |
Launchpad Janitor |
cve linked |
|
2019-15793 |
|
2019-11-12 22:18:04 |
Launchpad Janitor |
cve linked |
|
2019-17666 |
|
2019-11-12 22:21:40 |
Launchpad Janitor |
linux (Ubuntu Disco): status |
Fix Committed |
Fix Released |
|
2019-11-12 22:21:40 |
Launchpad Janitor |
cve linked |
|
2019-15098 |
|
2019-11-12 22:21:40 |
Launchpad Janitor |
cve linked |
|
2019-17052 |
|
2019-11-12 22:21:40 |
Launchpad Janitor |
cve linked |
|
2019-17053 |
|
2019-11-12 22:21:40 |
Launchpad Janitor |
cve linked |
|
2019-17054 |
|
2019-11-12 22:21:40 |
Launchpad Janitor |
cve linked |
|
2019-17055 |
|
2019-11-12 22:21:40 |
Launchpad Janitor |
cve linked |
|
2019-17056 |
|
2019-11-12 22:24:59 |
Launchpad Janitor |
linux (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2019-11-12 22:33:58 |
Launchpad Janitor |
linux (Ubuntu Xenial): status |
Fix Committed |
Fix Released |
|
2019-12-06 15:57:44 |
Launchpad Janitor |
linux (Ubuntu): status |
Fix Committed |
Fix Released |
|
2019-12-06 15:57:44 |
Launchpad Janitor |
cve linked |
|
2019-15794 |
|
2020-07-15 21:11:53 |
Guilherme G. Piccoli |
linux: status |
Fix Committed |
Fix Released |
|
2021-01-27 18:59:35 |
Eric Desrochers |
bug |
|
|
added subscriber Eric Desrochers |