qede driver causes 100% CPU load
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Guilherme G. Piccoli |
Xenial | Invalid | Undecided | Guilherme G. Piccoli |
Bionic | Fix Released | Medium | Guilherme G. Piccoli |
Disco | Fix Released | Medium | Guilherme G. Piccoli |
Eoan | Fix Released | Undecided | Guilherme G. Piccoli |
Focal | Fix Released | Undecided | Guilherme G. Piccoli |
Bug Description
[Impact]
* The PTP feature in the qede driver is implemented in such a way that, if the NIC firmware takes some time to perform the timestamping, the PTP worker function reschedules itself indefinitely until the value read from a device register is meaningful. With that behavior, if a userspace tool requests a badly configured TX/RX filter (or if the NIC firmware has any other timestamping issue), the function qede_ptp_task() will reschedule itself forever and cause unbounded resource consumption. This manifests as a kworker thread consuming 100% of a CPU.
* The dmesg log will show a message like this: "qede_ptp_
Also, using perf, the user can observe a stack like the following:
- 44.76% 0.00% kworker/16:5 [kernel.kallsyms]
ret_from_fork
- kthread
- 44.74% worker_thread
- 44.57% process_one_work
- 42.67% qede_ptp_task
- 38.86% qed_ptp_
- 3.03% queue_work_on
- 2.06% __queue_work
0.50% set_work_
* The patch proposed in this SRU request refactors the PTP worker in qede by adding a time limit, after which the task no longer reschedules itself and the timestamp procedure fails: 9adebac37e7d ("qede: Handle infinite driver spinning for Tx timestamp.") http://
Besides fixing the issue, it also adds an ethtool statistic that counts the PTP errors.
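The difference between the old and the patched worker behavior can be illustrated with a small userspace simulation. This is only a sketch and not the driver code: FakeNic, run_worker, the tick-based clock, and the timeout value are all hypothetical names and assumptions made for illustration; the actual time limit and statistic name used by the upstream commit are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeNic:
    """Hypothetical stand-in for the NIC timestamp register.

    ready_after_polls=None models firmware that never produces a
    meaningful timestamp (e.g. after a bad TX/RX filter request).
    """
    ready_after_polls: Optional[int]
    polls: int = 0

    def read_tx_timestamp(self) -> Optional[int]:
        self.polls += 1
        if self.ready_after_polls is not None and self.polls >= self.ready_after_polls:
            return 123456789  # arbitrary placeholder value
        return None  # register content is not meaningful yet


def run_worker(nic: FakeNic,
               timeout_ticks: Optional[int] = None,
               max_ticks: int = 10_000) -> Tuple[Optional[int], int, int]:
    """Simulate a worker that re-queues itself once per tick.

    timeout_ticks=None mimics the unpatched behavior (no time limit);
    max_ticks only exists so the simulation itself terminates.
    Returns (timestamp, ticks_used, error_count).
    """
    ticks = 0
    while ticks < max_ticks:
        ts = nic.read_tx_timestamp()
        if ts is not None:
            return ts, ticks, 0
        if timeout_ticks is not None and ticks >= timeout_ticks:
            # Patched behavior: stop rescheduling and account one error.
            return None, ticks, 1
        ticks += 1  # "reschedule" the work item
    # Simulation cap reached; a real unbounded worker would keep spinning.
    return None, ticks, 0
```

Calling run_worker(FakeNic(None), timeout_ticks=5) gives up after the limit and reports one error, whereas run_worker(FakeNic(None)) keeps polling until the artificial max_ticks cap, which is the analogue of the kworker thread spinning at 100% CPU.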
[Test case]
By using chrony in Bionic, the following steps will reproduce the issue:
a) Install chrony on Bionic in a system with a working NIC managed by qede;
b) Edit the chrony configuration and add "hwtimestamp *" to the top of its conf file;
c) Restart the chrony service.
Check dmesg for the "[...]Timestamping in progress" message and the overall CPU workload using a tool like "top" to observe a kthread consuming 100% of CPU.
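The dmesg check can also be scripted. The helper below is a minimal sketch that assumes the kernel log has already been captured as text (for example, the saved output of dmesg); find_ptp_stall_lines is a hypothetical name, and the only string it relies on is the "Timestamping in progress" fragment quoted above.

```python
from typing import List


def find_ptp_stall_lines(log_text: str,
                         marker: str = "Timestamping in progress") -> List[str]:
    """Return the log lines that contain the PTP stall marker.

    log_text is expected to hold captured kernel log output; the
    function does not run dmesg itself.
    """
    return [line for line in log_text.splitlines() if marker in line]
```

A non-empty result after restarting chrony suggests the issue has been reproduced; the CPU usage of the kworker thread still has to be confirmed separately with a tool like "top".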
[Regression potential]
The patch scope is restricted to the qede PTP handler, and the patch has been upstream for more than 7 months. If there is any possibility of regression, the worst case would be an issue affecting packet timestamping, without interfering with the regular xmit path of the driver.
Changed in linux (Ubuntu Xenial): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
Changed in linux (Ubuntu Bionic): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
Changed in linux (Ubuntu Disco): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
Changed in linux (Ubuntu Eoan): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
Changed in linux (Ubuntu Focal): | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
status: | Incomplete → New |
Changed in linux (Ubuntu Disco): | |
status: | New → Confirmed |
Changed in linux (Ubuntu Bionic): | |
status: | New → Confirmed |
Changed in linux (Ubuntu Xenial): | |
status: | New → Invalid |
Changed in linux (Ubuntu Focal): | |
status: | Incomplete → Fix Released |
Changed in linux (Ubuntu Eoan): | |
status: | Incomplete → Fix Released |
description: | updated |
Changed in linux (Ubuntu Bionic): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Disco): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Bionic): | |
status: | Confirmed → Fix Committed |
Changed in linux (Ubuntu Disco): | |
status: | Confirmed → Fix Committed |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1855409
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.