2019-06-08 10:36:03 |
Przemyslaw Hausman |
bug |
|
|
added bug |
2019-06-08 10:36:18 |
Przemyslaw Hausman |
information type |
Public |
Private |
|
2019-06-19 20:50:13 |
Przemyslaw Hausman |
information type |
Private |
Public |
|
2019-06-19 21:00:06 |
Ubuntu Kernel Bot |
linux (Ubuntu): status |
New |
Incomplete |
|
2019-06-19 21:00:07 |
Ubuntu Kernel Bot |
tags |
|
bionic |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Cosmic |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Cosmic) |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Disco |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Disco) |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Eoan |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Eoan) |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Bionic |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Bionic) |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Xenial |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Xenial) |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
nominated for series |
|
Ubuntu Ff-series |
|
2019-06-21 11:50:57 |
Guilherme G. Piccoli |
bug task added |
|
linux (Ubuntu Ff-series) |
|
2019-06-21 11:51:10 |
Guilherme G. Piccoli |
linux (Ubuntu Eoan): status |
Incomplete |
Confirmed |
|
2019-06-21 11:51:13 |
Guilherme G. Piccoli |
linux (Ubuntu Ff-series): status |
New |
Confirmed |
|
2019-06-21 11:51:15 |
Guilherme G. Piccoli |
linux (Ubuntu Disco): status |
New |
Confirmed |
|
2019-06-21 11:51:17 |
Guilherme G. Piccoli |
linux (Ubuntu Cosmic): status |
New |
Confirmed |
|
2019-06-21 11:51:19 |
Guilherme G. Piccoli |
linux (Ubuntu Bionic): status |
New |
Confirmed |
|
2019-06-21 11:51:21 |
Guilherme G. Piccoli |
linux (Ubuntu Xenial): status |
New |
Confirmed |
|
2019-06-21 11:51:25 |
Guilherme G. Piccoli |
linux (Ubuntu Xenial): importance |
Undecided |
High |
|
2019-06-21 11:51:26 |
Guilherme G. Piccoli |
linux (Ubuntu Bionic): importance |
Undecided |
High |
|
2019-06-21 11:51:28 |
Guilherme G. Piccoli |
linux (Ubuntu Cosmic): importance |
Undecided |
High |
|
2019-06-21 11:51:29 |
Guilherme G. Piccoli |
linux (Ubuntu Disco): importance |
Undecided |
High |
|
2019-06-21 11:51:31 |
Guilherme G. Piccoli |
linux (Ubuntu Eoan): importance |
Undecided |
High |
|
2019-06-21 11:51:32 |
Guilherme G. Piccoli |
linux (Ubuntu Ff-series): importance |
Undecided |
High |
|
2019-06-21 11:51:35 |
Guilherme G. Piccoli |
linux (Ubuntu Xenial): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-06-21 11:51:37 |
Guilherme G. Piccoli |
linux (Ubuntu Bionic): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-06-21 11:51:38 |
Guilherme G. Piccoli |
linux (Ubuntu Cosmic): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-06-21 11:51:40 |
Guilherme G. Piccoli |
linux (Ubuntu Ff-series): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-06-21 11:51:41 |
Guilherme G. Piccoli |
linux (Ubuntu Eoan): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-06-21 11:51:44 |
Guilherme G. Piccoli |
linux (Ubuntu Disco): assignee |
|
Guilherme G. Piccoli (gpiccoli) |
|
2019-06-21 11:52:00 |
Guilherme G. Piccoli |
tags |
bionic |
bnx2x sts |
|
2019-06-21 11:54:14 |
Guilherme G. Piccoli |
description |
For the customer OpenStack deployment we deploy infra nodes on Dell R630 servers. The servers have onboard Broadcom's NetXtreme II BCM57800 NIC (quad port: 2x1G ports, 2x10G ports). For each port in UP state, we observe 100% CPU load. So in total, we observe 4 CPUs with 100% load.
perf report shows the function bnx2x_ptp_task taking up much of the CPU time: https://pastebin.canonical.com/p/kfrpd6Pwh5/
Also, /var/log/syslog contains the following outputs every few seconds:
[1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
So, the problem seems to be in a timestamped TX packet: the driver, for some reason yet to be understood, gets an unexpected value from a register and then, in that same function, reschedules itself to retry the register read. The read returns a bad value again, and so on indefinitely.
This shows up in the system as kthreads using 100% CPU. The message "The device supports only a single outstanding packet to timestamp, this packet will not be timestamped" appears because the driver can only timestamp a single TX packet at a time and, since it is stuck retrying, it cannot accept another packet in this "queue".
The infinite loop appears to be:
static void bnx2x_ptp_task(struct work_struct *work)
{
    struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
    int port = BP_PORT(bp);
    u32 val_seq;
    u64 timestamp, ns;
    struct skb_shared_hwtstamps shhwtstamps;
    /* Read Tx timestamp registers */
    val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
                     NIG_REG_P0_TLLH_PTP_BUF_SEQID);
    if (val_seq & 0x10000) {
        [...]
    } else {
        DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
        /* Reschedule to keep checking for a valid timestamp value */
        schedule_work(&bp->ptp_task);
    }
It appears that val_seq & 0x10000 is never true, so the task constantly reschedules itself immediately. Instrumenting the function shows that it is being called in excess of 100,000 times per second. The REG_RD call does appear to be expensive (as it's a register read from the device) and shows high in the perf report, but that by itself doesn't appear to be the root cause (i.e., it's not hanging forever in the REG_RD).
The cause appears to be that the driver is not prepared to deal with the PTP request never being completed by the hardware. It's unclear why it isn't completing, but regardless, the driver should not loop forever here.
Additional info:
ubuntu@infra-1:~$ uname -a
Linux infra-1 4.15.0-50-generic #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Lin
ubuntu@infra-1:~$ lspci | grep Broadcom
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
01:00.2 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
01:00.3 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
ubuntu@infra-1:~$ lspci -n | grep 01:00
01:00.0 0200: 14e4:168a (rev 10)
01:00.1 0200: 14e4:168a (rev 10)
01:00.2 0200: 14e4:168a (rev 10)
01:00.3 0200: 14e4:168a (rev 10)
ubuntu@infra-1:~/deploy$ sudo lshw -c network
*-network:0
description: Ethernet interface
product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
vendor: Broadcom Inc. and subsidiaries
physical id: 0
bus info: pci@0000:01:00.0
logical name: eno1
version: 10
serial: 42:39:92:e0:66:b6
size: 10Gbit/s
capacity: 10Gbit/s
width: 64 bits
clock: 33MHz
capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 100bt 100bt-fd 1000bt-fd 10000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 phy 1.45 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
resources: irq:79 memory:95000000-957fffff memory:95800000-95ffffff memory:96030000-9603ffff memory:91a00000-91a7ffff
*-network:1
description: Ethernet interface
product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
vendor: Broadcom Inc. and subsidiaries
physical id: 0.1
bus info: pci@0000:01:00.1
logical name: eno2
version: 10
serial: 42:39:92:e0:66:b6
size: 10Gbit/s
capacity: 10Gbit/s
width: 64 bits
clock: 33MHz
capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 100bt 100bt-fd 1000bt-fd 10000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 phy 1.45 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
resources: irq:90 memory:94000000-947fffff memory:94800000-94ffffff memory:96020000-9602ffff memory:91a80000-91afffff
*-network:2
description: Ethernet interface
product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
vendor: Broadcom Inc. and subsidiaries
physical id: 0.2
bus info: pci@0000:01:00.2
logical name: eno3
version: 10
serial: 52:f2:aa:63:a5:3c
size: 1Gbit/s
capacity: 1Gbit/s
width: 64 bits
clock: 33MHz
capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=1Gbit/s
resources: irq:90 memory:93000000-937fffff memory:93800000-93ffffff memory:96010000-9601ffff memory:91b00000-91b7ffff
*-network:3
description: Ethernet interface
product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
vendor: Broadcom Inc. and subsidiaries
physical id: 0.3
bus info: pci@0000:01:00.3
logical name: eno4
version: 10
serial: 52:f2:aa:63:a5:3c
size: 1Gbit/s
capacity: 1Gbit/s
width: 64 bits
clock: 33MHz
capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=1Gbit/s
resources: irq:111 memory:92000000-927fffff memory:92800000-92ffffff memory:96000000-9600ffff memory:91b80000-91bfffff
*-network:0
description: Ethernet interface
physical id: 3
logical name: bond1.1166
serial: 42:39:92:e0:66:b6
capabilities: ethernet physical
configuration: autonegotiation=off broadcast=yes driver=802.1Q VLAN Support driverversion=1.8 duplex=full firmware=N/A link=yes multicast=yes
*-network:1
description: Ethernet interface
physical id: 4
logical name: bond1
serial: 42:39:92:e0:66:b6
capabilities: ethernet physical
configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=3.7.1 duplex=full firmware=2 link=yes master=yes multicast=yes
*-network:2
description: Ethernet interface
physical id: 5
logical name: broam
serial: 36:76:ae:d3:1d:3b
capabilities: ethernet physical
configuration: broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=10.246.65.10 link=yes multicast=yes
*-network:3
description: Ethernet interface
physical id: 6
logical name: brinternal
serial: ce:27:22:0d:8b:d1
capabilities: ethernet physical
configuration: broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=10.246.66.10 link=yes multicast=yes
*-network:4
description: Ethernet interface
physical id: 7
logical name: bond1.1171
serial: 42:39:92:e0:66:b6
capabilities: ethernet physical
configuration: autonegotiation=off broadcast=yes driver=802.1Q VLAN Support driverversion=1.8 duplex=full firmware=N/A link=yes multicast=yes
*-network:5
description: Ethernet interface
physical id: 8
logical name: bond0
serial: 52:f2:aa:63:a5:3c
capabilities: ethernet physical
configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=3.7.1 duplex=full firmware=2 link=yes master=yes multicast=yes
*-network:6
description: Ethernet interface
physical id: 9
logical name: brexternal
serial: 5e:e0:5c:1f:da:01
capabilities: ethernet physical
configuration: broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=10.246.71.10 link=yes multicast=yes
ubuntu@infra-1:~$ modinfo bnx2x
filename: /lib/modules/4.15.0-50-generic/kernel/drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko
firmware: bnx2x/bnx2x-e2-7.13.1.0.fw
firmware: bnx2x/bnx2x-e1h-7.13.1.0.fw
firmware: bnx2x/bnx2x-e1-7.13.1.0.fw
version: 1.712.30-0
license: GPL
description: QLogic BCM57710/57711/57711E/57712/57712_MF/57800/57800_MF/57810/57810_MF/57840/57840_MF Driver
author: Eliezer Tamir
srcversion: 5338D57FE057310DCD66774
alias: pci:v000014E4d0000163Fsv*sd*bc*sc*i*
alias: pci:v000014E4d0000163Esv*sd*bc*sc*i*
alias: pci:v000014E4d0000163Dsv*sd*bc*sc*i*
alias: pci:v00001077d000016ADsv*sd*bc*sc*i*
alias: pci:v000014E4d000016ADsv*sd*bc*sc*i*
alias: pci:v00001077d000016A4sv*sd*bc*sc*i*
alias: pci:v000014E4d000016A4sv*sd*bc*sc*i*
alias: pci:v000014E4d000016ABsv*sd*bc*sc*i*
alias: pci:v000014E4d000016AFsv*sd*bc*sc*i*
alias: pci:v000014E4d000016A2sv*sd*bc*sc*i*
alias: pci:v00001077d000016A1sv*sd*bc*sc*i*
alias: pci:v000014E4d000016A1sv*sd*bc*sc*i*
alias: pci:v000014E4d0000168Dsv*sd*bc*sc*i*
alias: pci:v000014E4d000016AEsv*sd*bc*sc*i*
alias: pci:v000014E4d0000168Esv*sd*bc*sc*i*
alias: pci:v000014E4d000016A9sv*sd*bc*sc*i*
alias: pci:v000014E4d000016A5sv*sd*bc*sc*i*
alias: pci:v000014E4d0000168Asv*sd*bc*sc*i*
alias: pci:v000014E4d0000166Fsv*sd*bc*sc*i*
alias: pci:v000014E4d00001663sv*sd*bc*sc*i*
alias: pci:v000014E4d00001662sv*sd*bc*sc*i*
alias: pci:v000014E4d00001650sv*sd*bc*sc*i*
alias: pci:v000014E4d0000164Fsv*sd*bc*sc*i*
alias: pci:v000014E4d0000164Esv*sd*bc*sc*i*
depends: mdio,libcrc32c,ptp
retpoline: Y
intree: Y
name: bnx2x
vermagic: 4.15.0-50-generic SMP mod_unload
signat: PKCS#7
signer:
sig_key:
sig_hashalgo: md4
parm: num_queues: Set number of queues (default is as a number of CPUs) (int)
parm: disable_tpa: Disable the TPA (LRO) feature (int)
parm: int_mode: Force interrupt mode other than MSI-X (1 INT#x; 2 MSI) (int)
parm: dropless_fc: Pause on exhausted host ring (int)
parm: mrrs: Force Max Read Req Size (0..3) (for debug) (int)
parm: debug: Default debug msglevel (int) |
For the customer OpenStack deployment we deploy infra nodes on Dell R630 servers. The servers have onboard Broadcom's NetXtreme II BCM57800 NIC (quad port: 2x1G ports, 2x10G ports). For each port in UP state, we observe 100% CPU load. So in total, we observe 4 CPUs with 100% load.
perf report shows the function bnx2x_ptp_task taking up much of the CPU time: https://pastebin.canonical.com/p/kfrpd6Pwh5/
Also, /var/log/syslog contains the following outputs every few seconds:
[1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
So, the problem seems to be in a timestamped TX packet: the driver, for some reason yet to be understood, gets an unexpected value from a register and then, in that same function, reschedules itself to retry the register read. The read returns a bad value again, and so on indefinitely.
This shows up in the system as kthreads using 100% CPU. The message "The device supports only a single outstanding packet to timestamp, this packet will not be timestamped" appears because the driver can only timestamp a single TX packet at a time and, since it is stuck retrying, it cannot accept another packet in this "queue".
The infinite loop appears to be:
static void bnx2x_ptp_task(struct work_struct *work)
{
    struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
    int port = BP_PORT(bp);
    u32 val_seq;
    u64 timestamp, ns;
    struct skb_shared_hwtstamps shhwtstamps;
    /* Read Tx timestamp registers */
    val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
                     NIG_REG_P0_TLLH_PTP_BUF_SEQID);
    if (val_seq & 0x10000) {
        [...]
    } else {
        DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
        /* Reschedule to keep checking for a valid timestamp value */
        schedule_work(&bp->ptp_task);
    }
It appears that val_seq & 0x10000 is never true, so the task constantly reschedules itself immediately. Instrumenting the function shows that it is being called in excess of 100,000 times per second. The REG_RD call does appear to be expensive (as it's a register read from the device) and shows high in the perf report, but that by itself doesn't appear to be the root cause (i.e., it's not hanging forever in the REG_RD).
The cause appears to be that the driver is not prepared to deal with the PTP request never being completed by the hardware. It's unclear why it isn't completing, but regardless, the driver should not loop forever here. |
|
2019-06-21 11:54:37 |
Guilherme G. Piccoli |
attachment added |
|
system_details.txt https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+attachment/5272083/+files/system_details.txt |
|
2019-06-21 11:55:06 |
Guilherme G. Piccoli |
bug |
|
|
added subscriber Guilherme G. Piccoli |
2019-07-03 17:57:53 |
Guilherme G. Piccoli |
description |
For the customer OpenStack deployment we deploy infra nodes on Dell R630 servers. The servers have onboard Broadcom's NetXtreme II BCM57800 NIC (quad port: 2x1G ports, 2x10G ports). For each port in UP state, we observe 100% CPU load. So in total, we observe 4 CPUs with 100% load.
perf report shows the function bnx2x_ptp_task taking up much of the CPU time: https://pastebin.canonical.com/p/kfrpd6Pwh5/
Also, /var/log/syslog contains the following outputs every few seconds:
[1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
So, the problem seems to be in a timestamped TX packet: the driver, for some reason yet to be understood, gets an unexpected value from a register and then, in that same function, reschedules itself to retry the register read. The read returns a bad value again, and so on indefinitely.
This shows up in the system as kthreads using 100% CPU. The message "The device supports only a single outstanding packet to timestamp, this packet will not be timestamped" appears because the driver can only timestamp a single TX packet at a time and, since it is stuck retrying, it cannot accept another packet in this "queue".
The infinite loop appears to be:
static void bnx2x_ptp_task(struct work_struct *work)
{
    struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
    int port = BP_PORT(bp);
    u32 val_seq;
    u64 timestamp, ns;
    struct skb_shared_hwtstamps shhwtstamps;
    /* Read Tx timestamp registers */
    val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
                     NIG_REG_P0_TLLH_PTP_BUF_SEQID);
    if (val_seq & 0x10000) {
        [...]
    } else {
        DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
        /* Reschedule to keep checking for a valid timestamp value */
        schedule_work(&bp->ptp_task);
    }
It appears that val_seq & 0x10000 is never true, so the task constantly reschedules itself immediately. Instrumenting the function shows that it is being called in excess of 100,000 times per second. The REG_RD call does appear to be expensive (as it's a register read from the device) and shows high in the perf report, but that by itself doesn't appear to be the root cause (i.e., it's not hanging forever in the REG_RD).
The cause appears to be that the driver is not prepared to deal with the PTP request never being completed by the hardware. It's unclear why it isn't completing, but regardless, the driver should not loop forever here. |
[Impact]
* The PTP feature in the bnx2x driver is implemented in such a way that if the NIC firmware takes some time to perform the timestamping - which is observed as a bad register read in bnx2x_ptp_task() - then the ptp worker function reschedules itself indefinitely until the value read from the register is meaningful. With that behavior, if a userspace tool requests a badly configured RX filter from bnx2x (or if the NIC firmware has any other issue in timestamping), the function bnx2x_ptp_task() will be rescheduled forever and cause unbounded resource consumption. This manifests as a kworker thread consuming 100% of a CPU.
* The dmesg log will show the following message, reporting that other packets are being skipped by the timestamp routine because a packet is stuck in the timestamping "pipeline":
"bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
outstanding packet to timestamp, this packet will not be timestamped"
Also, by using ftrace the user can see that the function bnx2x_ptp_task() is being called very frequently, and by enabling the bnx2x PTP debugging log (ethtool -s <iface> msglvl 16777216) it's possible to observe the following message flooding the kernel log:
"bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp yet"
* The patch proposed in this SRU request is accepted upstream and is available currently (2019-07-03) in David Miller's linux-net tree:
git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
Besides fixing the issue, it also adds an ethtool statistic for accounting the PTP errors and reduces message flooding in case of errors.
[Test case]
Reproducing the problem is not difficult; we've used chrony in Bionic to trigger the problem. The steps are:
a) Install chrony on Bionic on a system with a working NIC managed by bnx2x;
b) Edit chrony configuration and add: "hwtimestamp *" to the top of its conf file;
c) Restart chrony service
Check dmesg for the "[...]single outstanding packet" message and the overall CPU workload using a tool like "top" to observe a kthread consuming 100% of CPU.
[Regression potential]
The patch scope is restricted to bnx2x ptp handler, and was validated by the driver maintainer. If there's any possibility of regressions, we believe the worst would be an issue affecting the packet timestamping, not messing with the regular xmit path for the driver. |
|
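The upstream commit referenced in the description above is the authoritative fix. As a rough illustration of the general idea only (bounding the number of polls instead of rescheduling forever), the following userspace C sketch may help. It is not the driver code: read_seqid_fn, delay_fn, poll_tx_timestamp, MAX_POLLS, the doubling delay and the fake_dev helpers are all hypothetical names and choices made for this example; only the 0x10000 valid-bit mask is taken from the excerpt in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQID_VALID_BIT 0x10000u /* same mask as the val_seq test in the excerpt */
#define MAX_POLLS 10             /* hypothetical retry budget, chosen for the sketch */

/* Hypothetical stand-in for the REG_RD() of the SEQID register. */
typedef uint32_t (*read_seqid_fn)(void *ctx);
/* Hypothetical stand-in for sleeping between polls. */
typedef void (*delay_fn)(void *ctx, unsigned int ms);

struct poll_result {
    bool valid;       /* true if the valid bit was seen within the budget */
    uint32_t val_seq; /* last value read */
    int polls;        /* number of reads performed */
};

/*
 * Poll at most MAX_POLLS times, doubling the delay after each failed read
 * (1, 2, 4, ... ms). If the valid bit never shows up, give up and let the
 * caller drop the pending timestamp request instead of looping forever.
 */
static struct poll_result poll_tx_timestamp(read_seqid_fn rd, delay_fn delay,
                                            void *ctx)
{
    struct poll_result res = { false, 0, 0 };

    assert(rd && delay);
    for (int i = 0; i < MAX_POLLS; i++) {
        res.val_seq = rd(ctx);
        res.polls = i + 1;
        if (res.val_seq & SEQID_VALID_BIT) {
            res.valid = true;
            break;
        }
        delay(ctx, 1u << i);
    }
    return res;
}

/* Fake device for experiments: becomes valid after `ready_after` failed
 * reads, or never if ready_after is negative. */
struct fake_dev {
    int ready_after;
    int reads;
    unsigned int slept_ms;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_dev *d = ctx;

    d->reads++;
    if (d->ready_after >= 0 && d->reads > d->ready_after)
        return SEQID_VALID_BIT | 0x2a;
    return 0;
}

static void fake_delay(void *ctx, unsigned int ms)
{
    struct fake_dev *d = ctx;

    d->slept_ms += ms; /* record the delay instead of actually sleeping */
}
```

Calling poll_tx_timestamp(fake_read, fake_delay, &dev) with a fake_dev whose ready_after is negative models the stuck hardware from this bug: the function returns with valid set to false after MAX_POLLS reads rather than spinning. The caller would then be expected to release the pending packet so that later packets can be timestamped again.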
2019-07-03 19:22:08 |
Guilherme G. Piccoli |
linux (Ubuntu Ff-series): status |
Confirmed |
Fix Committed |
|
2019-07-03 19:22:10 |
Guilherme G. Piccoli |
linux (Ubuntu Eoan): status |
Confirmed |
Fix Committed |
|
2019-07-03 19:22:12 |
Guilherme G. Piccoli |
linux (Ubuntu Disco): status |
Confirmed |
Fix Committed |
|
2019-07-03 19:22:18 |
Guilherme G. Piccoli |
linux (Ubuntu Cosmic): status |
Confirmed |
In Progress |
|
2019-07-03 19:22:20 |
Guilherme G. Piccoli |
linux (Ubuntu Eoan): status |
Fix Committed |
In Progress |
|
2019-07-03 19:22:28 |
Guilherme G. Piccoli |
linux (Ubuntu Eoan): status |
In Progress |
Fix Committed |
|
2019-07-03 19:22:32 |
Guilherme G. Piccoli |
linux (Ubuntu Disco): status |
Fix Committed |
In Progress |
|
2019-07-03 19:22:35 |
Guilherme G. Piccoli |
linux (Ubuntu Bionic): status |
Confirmed |
In Progress |
|
2019-07-03 19:22:37 |
Guilherme G. Piccoli |
linux (Ubuntu Xenial): status |
Confirmed |
In Progress |
|
2019-07-10 07:39:16 |
Stefan Bader |
linux (Ubuntu Cosmic): status |
In Progress |
Won't Fix |
|
2019-07-16 10:45:51 |
Kleber Sacilotto de Souza |
linux (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2019-07-16 10:46:14 |
Kleber Sacilotto de Souza |
linux (Ubuntu Disco): status |
In Progress |
Fix Committed |
|
2019-07-16 10:46:54 |
Kleber Sacilotto de Souza |
linux (Ubuntu Xenial): status |
In Progress |
Fix Committed |
|
2019-07-16 20:15:55 |
Guilherme G. Piccoli |
linux (Ubuntu Ff-series): status |
Fix Committed |
Fix Released |
|
2019-07-16 20:16:40 |
Guilherme G. Piccoli |
linux (Ubuntu Ff-series): status |
Fix Released |
Fix Committed |
|
2019-07-24 20:24:36 |
Brad Figg |
tags |
bnx2x sts |
bnx2x cscc sts |
|
2019-07-25 16:05:15 |
Ubuntu Kernel Bot |
tags |
bnx2x cscc sts |
bnx2x cscc sts verification-needed-disco |
|
2019-07-25 18:32:57 |
Ubuntu Kernel Bot |
tags |
bnx2x cscc sts verification-needed-disco |
bnx2x cscc sts verification-needed-bionic verification-needed-disco |
|
2019-07-30 11:12:08 |
Ubuntu Kernel Bot |
tags |
bnx2x cscc sts verification-needed-bionic verification-needed-disco |
bnx2x cscc sts verification-needed-bionic verification-needed-disco verification-needed-xenial |
|
2019-07-30 15:15:31 |
Guilherme G. Piccoli |
linux (Ubuntu Ff-series): status |
Fix Committed |
Fix Released |
|
2019-07-30 15:16:00 |
Guilherme G. Piccoli |
tags |
bnx2x cscc sts verification-needed-bionic verification-needed-disco verification-needed-xenial |
bnx2x cscc sts verification-done-bionic verification-done-disco verification-done-xenial |
|
2019-08-09 11:38:28 |
Launchpad Janitor |
linux (Ubuntu Eoan): status |
Fix Committed |
Fix Released |
|
2019-08-09 11:38:28 |
Launchpad Janitor |
cve linked |
|
2019-12614 |
|
2019-08-09 11:38:28 |
Launchpad Janitor |
cve linked |
|
2019-13648 |
|
2019-08-12 10:03:09 |
Pedro Guimarães |
bug |
|
|
added subscriber Pedro Guimarães |
2019-08-12 12:14:31 |
Yoshi Kadokawa |
bug |
|
|
added subscriber Yoshi Kadokawa |
2019-08-13 08:59:53 |
Launchpad Janitor |
linux (Ubuntu Disco): status |
Fix Committed |
Fix Released |
|
2019-08-13 08:59:53 |
Launchpad Janitor |
cve linked |
|
2019-1125 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
linux (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2000-1134 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2007-3852 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2008-0525 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2009-0416 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2011-4834 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2015-1838 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2015-7442 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2016-7489 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2018-5383 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2019-10126 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2019-12818 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2019-12819 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2019-12984 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2019-13233 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2019-13272 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2019-2101 |
|
2019-08-13 11:27:47 |
Launchpad Janitor |
cve linked |
|
2019-3846 |
|
2019-08-13 12:04:14 |
Launchpad Janitor |
linux (Ubuntu Xenial): status |
Fix Committed |
Fix Released |
|