2019-12-06 08:02:19 |
Przemyslaw Hausman |
bug |
|
|
added bug |
2019-12-06 08:08:47 |
Przemyslaw Hausman |
attachment added |
|
perf-report.txt https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855409/+attachment/5310185/+files/perf-report.txt |
|
2019-12-06 08:30:07 |
Ubuntu Kernel Bot |
linux (Ubuntu): status |
New |
Incomplete |
|
2019-12-06 08:30:09 |
Ubuntu Kernel Bot |
tags |
|
bionic |
|
2019-12-09 18:36:21 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Disco |
|
2019-12-09 18:36:21 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Disco) |
|
2019-12-09 18:36:21 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Focal |
|
2019-12-09 18:36:21 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Focal) |
|
2019-12-09 18:36:21 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Bionic |
|
2019-12-09 18:36:21 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Bionic) |
|
2019-12-09 18:36:21 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Xenial |
|
2019-12-09 18:36:21 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Xenial) |
|
2019-12-09 18:36:21 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Eoan |
|
2019-12-09 18:36:21 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Eoan) |
|
2019-12-09 18:36:33 |
Guilherme G. Piccoli |
linux (Ubuntu Xenial): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-12-09 18:36:36 |
Guilherme G. Piccoli |
linux (Ubuntu Bionic): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-12-09 18:36:39 |
Guilherme G. Piccoli |
linux (Ubuntu Disco): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-12-09 18:36:40 |
Guilherme G. Piccoli |
linux (Ubuntu Eoan): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-12-09 18:36:43 |
Guilherme G. Piccoli |
linux (Ubuntu Focal): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-12-09 18:36:59 |
Guilherme G. Piccoli |
linux (Ubuntu Focal): status |
Incomplete |
New |
|
2019-12-09 18:37:05 |
Guilherme G. Piccoli |
linux (Ubuntu Disco): status |
New |
Confirmed |
|
2019-12-09 18:37:09 |
Guilherme G. Piccoli |
linux (Ubuntu Bionic): status |
New |
Confirmed |
|
2019-12-09 18:37:24 |
Guilherme G. Piccoli |
linux (Ubuntu Xenial): status |
New |
Invalid |
|
2019-12-09 19:00:09 |
Ubuntu Kernel Bot |
linux (Ubuntu): status |
New |
Incomplete |
|
2019-12-09 19:00:13 |
Ubuntu Kernel Bot |
linux (Ubuntu Eoan): status |
New |
Incomplete |
|
2019-12-17 17:38:33 |
Guilherme G. Piccoli |
linux (Ubuntu Focal): status |
Incomplete |
Fix Released |
|
2019-12-17 17:38:36 |
Guilherme G. Piccoli |
linux (Ubuntu Eoan): status |
Incomplete |
Fix Released |
|
2019-12-18 14:41:51 |
Guilherme G. Piccoli |
tags |
bionic |
bionic disco sts |
|
2019-12-18 18:47:16 |
Guilherme G. Piccoli |
description |
This bug is similar to #1832082 (bnx2x driver causes 100% CPU load) but applies to the qede driver instead of bnx2x. The symptoms are the same:
With chrony installed and configured with "hwtimestamp *", I observe 100% CPU load on 2 CPU cores.
Running perf report shows that the kernel is busy executing the qede_ptp_task function in the qede driver.
A workaround is to disable "hwtimestamp *" in chrony configuration.
---
$ modinfo qede
filename: /lib/modules/4.15.0-72-generic/kernel/drivers/net/ethernet/qlogic/qede/qede.ko
version: 8.10.10.21
license: GPL
description: QLogic FastLinQ 4xxxx Ethernet Driver
srcversion: D5EC89D815FC81B973EE9F0
alias: pci:v00001077d00008090sv*sd*bc*sc*i*
alias: pci:v00001077d00008070sv*sd*bc*sc*i*
alias: pci:v00001077d00001664sv*sd*bc*sc*i*
alias: pci:v00001077d00001656sv*sd*bc*sc*i*
alias: pci:v00001077d00001654sv*sd*bc*sc*i*
alias: pci:v00001077d00001644sv*sd*bc*sc*i*
alias: pci:v00001077d00001636sv*sd*bc*sc*i*
alias: pci:v00001077d00001666sv*sd*bc*sc*i*
alias: pci:v00001077d00001634sv*sd*bc*sc*i*
depends: ptp,qed
retpoline: Y
intree: Y
name: qede
vermagic: 4.15.0-72-generic SMP mod_unload
signat: PKCS#7
signer:
sig_key:
sig_hashalgo: md4
parm: debug: Default debug msglevel (uint)
$ uname -a
Linux dcn1-clm-inf-1 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ lspci | grep -i ether
19:00.0 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.1 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.2 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.3 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
# perf report snippet:
Children Self Command Shared Object
- 44.76% 0.00% kworker/16:5 [kernel.kallsyms]
ret_from_fork
- kthread
- 44.74% worker_thread
- 44.57% process_one_work
- 42.67% qede_ptp_task
- 38.86% qed_ptp_hw_read_tx_ts
qed_rd
- 3.03% queue_work_on
- 2.06% __queue_work
- 0.68% get_work_pool
- 0.61% radix_tree_lookup
__radix_tree_lookup
0.50% set_work_pool_and_clear_pending |
[Impact]
* The PTP feature in the qede driver is implemented so that, if the NIC firmware takes some time to perform the timestamping, the PTP worker function reschedules itself indefinitely until the value read from a device register is meaningful. With that behavior, if a userspace tool requests a badly configured TX/RX filter (or if the NIC firmware has any other timestamping issue), the function qede_ptp_task() will reschedule itself forever and cause unbounded resource consumption. This manifests as a kworker thread consuming 100% of a CPU.
* The dmesg log will show a message like this: "qede_ptp_tx_ts:533(eno3)]Timestamping in progress"
Also, using perf, the user can observe a stack like the following:
- 44.76% 0.00% kworker/16:5 [kernel.kallsyms]
ret_from_fork
- kthread
- 44.74% worker_thread
- 44.57% process_one_work
- 42.67% qede_ptp_task
- 38.86% qed_ptp_hw_read_tx_ts
qed_rd
- 3.03% queue_work_on
- 2.06% __queue_work
- 0.68% get_work_pool
- 0.61% radix_tree_lookup
__radix_tree_lookup
0.50% set_work_pool_and_clear_pending
* The patch proposed in this SRU request refactors the PTP worker in qede by adding a time limit after which the task no longer reschedules itself and the timestamp procedure fails: 9adebac37e7d ("qede: Handle infinite driver spinning for Tx timestamp.") http://git.kernel.org/linus/9adebac37e7d
Besides fixing the issue, it also adds an ethtool statistic that counts PTP errors.
[Test case]
Using chrony on Bionic, the following steps reproduce the issue:
a) Install chrony on a Bionic system with a working NIC managed by qede;
b) Edit the chrony configuration and add "hwtimestamp *" at the top of its configuration file (/etc/chrony/chrony.conf);
c) Restart the chrony service.
Check dmesg for the "[...]Timestamping in progress" message, and check the overall CPU load with a tool like "top" to observe a kworker thread consuming 100% of a CPU.
[Regression potential]
The patch scope is restricted to the qede PTP handler, and it has been upstream for more than 7 months. If there is any possibility of regression, the worst case would be an issue affecting packet timestamping, not the regular xmit path of the driver. |
|
2020-01-07 12:59:52 |
Stefan Bader |
linux (Ubuntu Bionic): importance |
Undecided |
Medium |
|
2020-01-07 12:59:59 |
Stefan Bader |
linux (Ubuntu Disco): importance |
Undecided |
Medium |
|
2020-01-07 13:07:35 |
Kleber Sacilotto de Souza |
linux (Ubuntu Bionic): status |
Confirmed |
Fix Committed |
|
2020-01-07 13:07:37 |
Kleber Sacilotto de Souza |
linux (Ubuntu Disco): status |
Confirmed |
Fix Committed |
|
2020-01-10 18:03:10 |
Ubuntu Kernel Bot |
tags |
bionic disco sts |
bionic disco sts verification-needed-disco |
|
2020-01-24 13:06:38 |
Guilherme G. Piccoli |
tags |
bionic disco sts verification-needed-disco |
bionic disco sts verification-done-disco |
|
2020-01-27 13:21:23 |
Launchpad Janitor |
linux (Ubuntu Disco): status |
Fix Committed |
Fix Released |
|
2020-01-27 13:21:23 |
Launchpad Janitor |
cve linked |
|
2019-14615 |
|
2020-01-27 13:21:23 |
Launchpad Janitor |
cve linked |
|
2019-18885 |
|
2020-01-27 13:21:23 |
Launchpad Janitor |
cve linked |
|
2019-19050 |
|
2020-01-27 13:21:23 |
Launchpad Janitor |
cve linked |
|
2019-19077 |
|
2020-01-27 13:21:23 |
Launchpad Janitor |
cve linked |
|
2019-19078 |
|
2020-01-27 13:21:23 |
Launchpad Janitor |
cve linked |
|
2019-19082 |
|
2020-01-27 13:21:23 |
Launchpad Janitor |
cve linked |
|
2019-19332 |
|
2020-01-27 13:21:23 |
Launchpad Janitor |
cve linked |
|
2020-7053 |
|
2020-02-03 23:12:21 |
Ubuntu Kernel Bot |
tags |
bionic disco sts verification-done-disco |
bionic disco sts verification-done-disco verification-needed-bionic |
|
2020-02-13 03:17:08 |
Khaled El Mously |
tags |
bionic disco sts verification-done-disco verification-needed-bionic |
bionic disco sts verification-done-bionic verification-done-disco |
|
2020-02-17 10:36:02 |
Launchpad Janitor |
linux (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2020-02-17 10:36:02 |
Launchpad Janitor |
cve linked |
|
2019-20096 |
|
2020-02-17 10:36:02 |
Launchpad Janitor |
cve linked |
|
2019-5108 |
|