2020-12-23 05:31:33 |
Manish Chopra |
bug |
|
|
added bug |
2020-12-23 06:57:00 |
Chris Guiver |
affects |
ubuntu |
linux (Ubuntu) |
|
2020-12-23 07:00:02 |
Chris Guiver |
bug |
|
|
added subscriber Chris Guiver |
2020-12-23 07:00:10 |
Ubuntu Kernel Bot |
linux (Ubuntu): status |
New |
Incomplete |
|
2020-12-23 07:00:11 |
Ubuntu Kernel Bot |
tags |
|
focal |
|
2020-12-23 07:44:39 |
Manish Chopra |
attachment added |
|
wire_traces.zip https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1909062/+attachment/5446099/+files/wire_traces.zip |
|
2020-12-23 07:45:21 |
Manish Chopra |
linux (Ubuntu): status |
Incomplete |
Confirmed |
|
2020-12-25 21:01:51 |
Manish Chopra |
description |
Customer is reporting a problem with QL41xxx and Ubuntu internal DNS server. The issue appeared when the customer updated to the latest Ubuntu kernel 20.04.1 LTS version 5.4.0-52-generic. Issue was not observed with 4.5 ubuntu-linux.
Problem Definition:
Product: PowerEdge R740xd
Serial: C7J90W2
Hostname: xkubmin1r12
OS Version: /etc/os-release shows Ubuntu 18.04.4 LTS, but Booted kernel is the latest Ubuntu 20.04.1 LTS version 5.4.0-52-generic
NIC: 2 dual-port (4) QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller [1077:8070] (rev 02)
Firmware: 15.15.11
Inbox driver qede v8.37.0.20
Completed Detailed Problem Description:
Anything that uses the internal Kubernetes DNS server fails. If an external DNS server is used resolution works for non-Kubernetes IPs.
Customer is experiencing the same issue described in this article.
https://github.com/kubernetes/kubernetes/issues/95365
Customer Impact: Production site
The QLogic Nic 41262 is their main nic for all of their 14G environment thousands of servers. Unclear how many of those hosts are Kubernetes, but the point is they want the QL41000 to work since it is very prevalent in the entire environment.
Below patch recently on upstream fixes this -
[Note that issue was introduced by driver's tunnel offload support which was added in after 4.5 kernel]
commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
qede: fix offload for IPIP tunnel packets
IPIP tunnels packets are unknown to device,
hence these packets are incorrectly parsed and
caused the packet corruption, so disable offlods
for such packets at run time.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Sudarsana Kalluru <skalluru@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20201221145530.7771-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thanks,
Manish |
With QL41xxx and Ubuntu DNS server DNS failures are seen when updated to the latest Ubuntu kernel 20.04.1 LTS version 5.4.0-52-generic. Issue was not observed with 4.5 ubuntu-linux.
Problem Definition:
OS Version: /etc/os-release shows Ubuntu 18.04.4 LTS, but Booted kernel is the latest Ubuntu 20.04.1 LTS version 5.4.0-52-generic
NIC: 2 dual-port (4) QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller [1077:8070] (rev 02)
Inbox driver qede v8.37.0.20
Complete Detailed Problem Description:
Anything that uses the internal Kubernetes DNS server fails. If an external DNS server is used resolution works for non-Kubernetes IPs.
Similar issue is described in this article.
https://github.com/kubernetes/kubernetes/issues/95365
Below patch recently on upstream fixes this -
[Note that issue was introduced by driver's tunnel offload support which was added in after 4.5 kernel]
commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
qede: fix offload for IPIP tunnel packets
IPIP tunnels packets are unknown to device,
hence these packets are incorrectly parsed and
caused the packet corruption, so disable offlods
for such packets at run time.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Sudarsana Kalluru <skalluru@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20201221145530.7771-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thanks,
Manish |
|
2021-01-03 01:47:46 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Groovy |
|
2021-01-03 01:47:46 |
Matthew Ruffell |
bug task added |
|
linux (Ubuntu Groovy) |
|
2021-01-03 01:47:46 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Focal |
|
2021-01-03 01:47:46 |
Matthew Ruffell |
bug task added |
|
linux (Ubuntu Focal) |
|
2021-01-03 02:13:12 |
Matthew Ruffell |
linux (Ubuntu Focal): status |
New |
In Progress |
|
2021-01-03 02:13:14 |
Matthew Ruffell |
linux (Ubuntu Groovy): status |
New |
In Progress |
|
2021-01-03 02:13:17 |
Matthew Ruffell |
linux (Ubuntu Focal): importance |
Undecided |
Medium |
|
2021-01-03 02:13:19 |
Matthew Ruffell |
linux (Ubuntu Groovy): importance |
Undecided |
Medium |
|
2021-01-03 02:13:22 |
Matthew Ruffell |
linux (Ubuntu Focal): assignee |
|
Matthew Ruffell (mruffell) |
|
2021-01-03 02:13:25 |
Matthew Ruffell |
linux (Ubuntu Groovy): assignee |
|
Matthew Ruffell (mruffell) |
|
2021-01-03 02:13:42 |
Matthew Ruffell |
summary |
Ubuntu kernel 5.x QL41xxx NIC (qede driver) Kubernetes internal DNS failure |
qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload |
|
2021-01-03 02:37:37 |
Matthew Ruffell |
description |
With QL41xxx and Ubuntu DNS server DNS failures are seen when updated to the latest Ubuntu kernel 20.04.1 LTS version 5.4.0-52-generic. Issue was not observed with 4.5 ubuntu-linux.
Problem Definition:
OS Version: /etc/os-release shows Ubuntu 18.04.4 LTS, but Booted kernel is the latest Ubuntu 20.04.1 LTS version 5.4.0-52-generic
NIC: 2 dual-port (4) QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller [1077:8070] (rev 02)
Inbox driver qede v8.37.0.20
Complete Detailed Problem Description:
Anything that uses the internal Kubernetes DNS server fails. If an external DNS server is used resolution works for non-Kubernetes IPs.
Similar issue is described in this article.
https://github.com/kubernetes/kubernetes/issues/95365
Below patch recently on upstream fixes this -
[Note that issue was introduced by driver's tunnel offload support which was added in after 4.5 kernel]
commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
qede: fix offload for IPIP tunnel packets
IPIP tunnels packets are unknown to device,
hence these packets are incorrectly parsed and
caused the packet corruption, so disable offlods
for such packets at run time.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Sudarsana Kalluru <skalluru@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20201221145530.7771-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thanks,
Manish |
BugLink: https://bugs.launchpad.net/bugs/1909062
[Impact]
For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series 10/25/40/50GbE Controller, when they upgrade from the 4.15 kernel to the 5.4 kernel, Kubernetes Internal DNS requests will fail, due to these packets getting corrupted.
Kubernetes uses IPIP tunnelled packets for internal DNS resolution, and this particular packet type is not supported for hardware tx checksum offload, and the packets end up corrupted when the qede driver attempts to checksum them.
This only affects internal Kubernetes DNS, as regular DNS lookups to regular external domains will succeed, due to them not using IPIP packet types.
[Fix]
Marvell has developed a fix for the qede driver, which checks the packet type, and if it is IPPROTO_IPIP, then csum offloads are disabled for socket buffers of type IPIP.
commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=5d5647dad259bb416fd5d3d87012760386d97530
This commit is currently in the netdev tree, awaiting merge to mainline. The commit is queued for upstream stable.
[Testcase]
The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part of a Kubernetes cluster.
Firstly, get a list of all devices in the system:
$ sudo ifconfig
Next, set all devices down with:
$ sudo ifconfig <device> down
Next, bring up the QLogic QL41xxx device:
$ sudo ifconfig <qlogic nic device> up
Then, attempt to lookup an internal Kubernetes domain:
$ nslookup <internal kubernetes domain address>
Without the patch, the connection will time out:
;; connection timed out; no servers could be reached
If we look at packet traces with tcpdump, we see it leaves the source, but never arrives at the destination.
There is a test kernel available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
If you install it, then Kubernetes internal DNS lookups will succeed.
[Where problems could occur]
If a regression were to occur, then users of the qede driver would be affected. This is limited to those with QLogic QL41xxx series NICs. The patch explicitly checks for IPIP type packets, so only those particular packets would be affected.
Since IPIP type packets are uncommon, it would not cause a total outage on regression, since most packets are not IPIP tunnelled. It could potentially cause problems for users who frequently handle VPN or Kubernetes internal DNS traffic.
A workaround would be to use ethtool to disable tx csum offload for all packet types, or to revert to an older kernel. |
|
2021-01-03 02:40:10 |
Matthew Ruffell |
tags |
focal |
focal sts |
|
2021-01-04 15:18:04 |
Jerry Clement |
bug |
|
|
added subscriber Jerry Clement |
2021-01-04 19:36:50 |
Terry Rudd |
bug |
|
|
added subscriber Terry Rudd |
2021-01-11 04:39:17 |
Matthew Ruffell |
description |
BugLink: https://bugs.launchpad.net/bugs/1909062
[Impact]
For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series 10/25/40/50GbE Controller, when they upgrade from the 4.15 kernel to the 5.4 kernel, Kubernetes Internal DNS requests will fail, due to these packets getting corrupted.
Kubernetes uses IPIP tunnelled packets for internal DNS resolution, and this particular packet type is not supported for hardware tx checksum offload, and the packets end up corrupted when the qede driver attempts to checksum them.
This only affects internal Kubernetes DNS, as regular DNS lookups to regular external domains will succeed, due to them not using IPIP packet types.
[Fix]
Marvell has developed a fix for the qede driver, which checks the packet type, and if it is IPPROTO_IPIP, then csum offloads are disabled for socket buffers of type IPIP.
commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=5d5647dad259bb416fd5d3d87012760386d97530
This commit is currently in the netdev tree, awaiting merge to mainline. The commit is queued for upstream stable.
[Testcase]
The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part of a Kubernetes cluster.
Firstly, get a list of all devices in the system:
$ sudo ifconfig
Next, set all devices down with:
$ sudo ifconfig <device> down
Next, bring up the QLogic QL41xxx device:
$ sudo ifconfig <qlogic nic device> up
Then, attempt to lookup an internal Kubernetes domain:
$ nslookup <internal kubernetes domain address>
Without the patch, the connection will time out:
;; connection timed out; no servers could be reached
If we look at packet traces with tcpdump, we see it leaves the source, but never arrives at the destination.
There is a test kernel available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
If you install it, then Kubernetes internal DNS lookups will succeed.
[Where problems could occur]
If a regression were to occur, then users of the qede driver would be affected. This is limited to those with QLogic QL41xxx series NICs. The patch explicitly checks for IPIP type packets, so only those particular packets would be affected.
Since IPIP type packets are uncommon, it would not cause a total outage on regression, since most packets are not IPIP tunnelled. It could potentially cause problems for users who frequently handle VPN or Kubernetes internal DNS traffic.
A workaround would be to use ethtool to disable tx csum offload for all packet types, or to revert to an older kernel. |
BugLink: https://bugs.launchpad.net/bugs/1909062
[Impact]
For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series 10/25/40/50GbE Controller, when they upgrade from the 4.15 kernel to the 5.4 kernel, Kubernetes Internal DNS requests will fail, due to these packets getting corrupted.
Kubernetes uses IPIP tunnelled packets for internal DNS resolution, and this particular packet type is not supported for hardware tx checksum offload, and the packets end up corrupted when the qede driver attempts to checksum them.
This only affects internal Kubernetes DNS, as regular DNS lookups to regular external domains will succeed, due to them not using IPIP packet types.
[Fix]
Marvell has developed a fix for the qede driver, which checks the packet type, and if it is IPPROTO_IPIP, then csum offloads are disabled for socket buffers of type IPIP.
commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530
This commit landed in mainline in 5.11-rc3. The commit is queued for upstream stable.
[Testcase]
The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part of a Kubernetes cluster.
Firstly, get a list of all devices in the system:
$ sudo ifconfig
Next, set all devices down with:
$ sudo ifconfig <device> down
Next, bring up the QLogic QL41xxx device:
$ sudo ifconfig <qlogic nic device> up
Then, attempt to lookup an internal Kubernetes domain:
$ nslookup <internal kubernetes domain address>
Without the patch, the connection will time out:
;; connection timed out; no servers could be reached
If we look at packet traces with tcpdump, we see it leaves the source, but never arrives at the destination.
There is a test kernel available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
If you install it, then Kubernetes internal DNS lookups will succeed.
[Where problems could occur]
If a regression were to occur, then users of the qede driver would be affected. This is limited to those with QLogic QL41xxx series NICs. The patch explicitly checks for IPIP type packets, so only those particular packets would be affected.
Since IPIP type packets are uncommon, it would not cause a total outage on regression, since most packets are not IPIP tunnelled. It could potentially cause problems for users who frequently handle VPN or Kubernetes internal DNS traffic.
A workaround would be to use ethtool to disable tx csum offload for all packet types, or to revert to an older kernel. |
|
2021-01-14 21:39:48 |
Matthew Ruffell |
description |
BugLink: https://bugs.launchpad.net/bugs/1909062
[Impact]
For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series 10/25/40/50GbE Controller, when they upgrade from the 4.15 kernel to the 5.4 kernel, Kubernetes Internal DNS requests will fail, due to these packets getting corrupted.
Kubernetes uses IPIP tunnelled packets for internal DNS resolution, and this particular packet type is not supported for hardware tx checksum offload, and the packets end up corrupted when the qede driver attempts to checksum them.
This only affects internal Kubernetes DNS, as regular DNS lookups to regular external domains will succeed, due to them not using IPIP packet types.
[Fix]
Marvell has developed a fix for the qede driver, which checks the packet type, and if it is IPPROTO_IPIP, then csum offloads are disabled for socket buffers of type IPIP.
commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530
This commit landed in mainline in 5.11-rc3. The commit is queued for upstream stable.
[Testcase]
The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part of a Kubernetes cluster.
Firstly, get a list of all devices in the system:
$ sudo ifconfig
Next, set all devices down with:
$ sudo ifconfig <device> down
Next, bring up the QLogic QL41xxx device:
$ sudo ifconfig <qlogic nic device> up
Then, attempt to lookup an internal Kubernetes domain:
$ nslookup <internal kubernetes domain address>
Without the patch, the connection will time out:
;; connection timed out; no servers could be reached
If we look at packet traces with tcpdump, we see it leaves the source, but never arrives at the destination.
There is a test kernel available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
If you install it, then Kubernetes internal DNS lookups will succeed.
[Where problems could occur]
If a regression were to occur, then users of the qede driver would be affected. This is limited to those with QLogic QL41xxx series NICs. The patch explicitly checks for IPIP type packets, so only those particular packets would be affected.
Since IPIP type packets are uncommon, it would not cause a total outage on regression, since most packets are not IPIP tunnelled. It could potentially cause problems for users who frequently handle VPN or Kubernetes internal DNS traffic.
A workaround would be to use ethtool to disable tx csum offload for all packet types, or to revert to an older kernel. |
BugLink: https://bugs.launchpad.net/bugs/1909062
[Impact]
For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series 10/25/40/50GbE Controller, when they upgrade from the 4.15 kernel to the 5.4 kernel, Kubernetes Internal DNS requests will fail, due to these packets getting corrupted.
Kubernetes uses IPIP tunnelled packets for internal DNS resolution, and this particular packet type is not supported for hardware tx checksum offload, and the packets end up corrupted when the qede driver attempts to checksum them.
This only affects internal Kubernetes DNS, as regular DNS lookups to regular external domains will succeed, due to them not using IPIP packet types.
[Fix]
Marvell has developed a fix for the qede driver, which checks the packet type, and if it is IPPROTO_IPIP, then csum offloads are disabled for socket buffers of type IPIP.
commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530
This commit landed in mainline in 5.11-rc3. The commit was accepted into update stable 4.14.215, 4.19.167, 5.4.89 and 5.10.7.
Note, this SRU isn't targeted for Bionic due to tx csum offload support only landing in 5.0 and onward, meaning the 4.15 kernel still works even without this patch. Because of this, Bionic can pick the patch up naturally from upstream stable.
[Testcase]
The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part of a Kubernetes cluster.
Firstly, get a list of all devices in the system:
$ sudo ifconfig
Next, set all devices down with:
$ sudo ifconfig <device> down
Next, bring up the QLogic QL41xxx device:
$ sudo ifconfig <qlogic nic device> up
Then, attempt to lookup an internal Kubernetes domain:
$ nslookup <internal kubernetes domain address>
Without the patch, the connection will time out:
;; connection timed out; no servers could be reached
If we look at packet traces with tcpdump, we see it leaves the source, but never arrives at the destination.
There is a test kernel available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
If you install it, then Kubernetes internal DNS lookups will succeed.
[Where problems could occur]
If a regression were to occur, then users of the qede driver would be affected. This is limited to those with QLogic QL41xxx series NICs. The patch explicitly checks for IPIP type packets, so only those particular packets would be affected.
Since IPIP type packets are uncommon, it would not cause a total outage on regression, since most packets are not IPIP tunnelled. It could potentially cause problems for users who frequently handle VPN or Kubernetes internal DNS traffic.
A workaround would be to use ethtool to disable tx csum offload for all packet types, or to revert to an older kernel. |
|
2021-01-14 21:39:56 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Hirsute |
|
2021-01-14 21:39:56 |
Matthew Ruffell |
bug task added |
|
linux (Ubuntu Hirsute) |
|
2021-01-14 21:40:02 |
Matthew Ruffell |
linux (Ubuntu Hirsute): status |
Confirmed |
In Progress |
|
2021-01-14 21:40:07 |
Matthew Ruffell |
linux (Ubuntu Hirsute): importance |
Undecided |
Medium |
|
2021-01-14 21:40:09 |
Matthew Ruffell |
linux (Ubuntu Hirsute): assignee |
|
Matthew Ruffell (mruffell) |
|
2021-01-14 21:43:04 |
Matthew Ruffell |
description |
BugLink: https://bugs.launchpad.net/bugs/1909062
[Impact]
For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series 10/25/40/50GbE Controller, when they upgrade from the 4.15 kernel to the 5.4 kernel, Kubernetes Internal DNS requests will fail, due to these packets getting corrupted.
Kubernetes uses IPIP tunnelled packets for internal DNS resolution, and this particular packet type is not supported for hardware tx checksum offload, and the packets end up corrupted when the qede driver attempts to checksum them.
This only affects internal Kubernetes DNS, as regular DNS lookups to regular external domains will succeed, due to them not using IPIP packet types.
[Fix]
Marvell has developed a fix for the qede driver, which checks the packet type, and if it is IPPROTO_IPIP, then csum offloads are disabled for socket buffers of type IPIP.
commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530
This commit landed in mainline in 5.11-rc3. The commit was accepted into update stable 4.14.215, 4.19.167, 5.4.89 and 5.10.7.
Note, this SRU isn't targeted for Bionic due to tx csum offload support only landing in 5.0 and onward, meaning the 4.15 kernel still works even without this patch. Because of this, Bionic can pick the patch up naturally from upstream stable.
[Testcase]
The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part of a Kubernetes cluster.
Firstly, get a list of all devices in the system:
$ sudo ifconfig
Next, set all devices down with:
$ sudo ifconfig <device> down
Next, bring up the QLogic QL41xxx device:
$ sudo ifconfig <qlogic nic device> up
Then, attempt to lookup an internal Kubernetes domain:
$ nslookup <internal kubernetes domain address>
Without the patch, the connection will time out:
;; connection timed out; no servers could be reached
If we look at packet traces with tcpdump, we see it leaves the source, but never arrives at the destination.
There is a test kernel available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
If you install it, then Kubernetes internal DNS lookups will succeed.
[Where problems could occur]
If a regression were to occur, then users of the qede driver would be affected. This is limited to those with QLogic QL41xxx series NICs. The patch explicitly checks for IPIP type packets, so only those particular packets would be affected.
Since IPIP type packets are uncommon, it would not cause a total outage on regression, since most packets are not IPIP tunnelled. It could potentially cause problems for users who frequently handle VPN or Kubernetes internal DNS traffic.
A workaround would be to use ethtool to disable tx csum offload for all packet types, or to revert to an older kernel. |
BugLink: https://bugs.launchpad.net/bugs/1909062
[Impact]
For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series 10/25/40/50GbE Controller, when they upgrade from the 4.15 kernel to the 5.4 kernel, Kubernetes Internal DNS requests will fail, due to these packets getting corrupted.
Kubernetes uses IPIP tunnelled packets for internal DNS resolution. The hardware does not support tx checksum offload for this packet type, so the packets end up corrupted when the qede driver attempts to checksum them.
This only affects internal Kubernetes DNS, as regular DNS lookups to regular external domains will succeed, due to them not using IPIP packet types.
[Fix]
Marvell has developed a fix for the qede driver, which checks the packet type, and if it is IPPROTO_IPIP, then csum offloads are disabled for socket buffers of type IPIP.
commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530
This commit landed in mainline in 5.11-rc3. The commit was accepted into upstream stable 4.14.215, 4.19.167, 5.4.89 and 5.10.7.
Note, this SRU isn't targeted for Bionic due to tx csum offload support only landing in 5.0 and onward, meaning the 4.15 kernel still works even without this patch. Because of this, Bionic can pick the patch up naturally from upstream stable.
[Testcase]
The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part of a Kubernetes cluster.
Firstly, get a list of all devices in the system:
$ sudo ifconfig
Next, set all devices down with:
$ sudo ifconfig <device> down
Next, bring up the QLogic QL41xxx device:
$ sudo ifconfig <qlogic nic device> up
Then, attempt to look up an internal Kubernetes domain:
$ nslookup <internal kubernetes domain address>
Without the patch, the connection will time out:
;; connection timed out; no servers could be reached
If we look at packet traces with tcpdump, we see the request leave the source, but it never arrives at the destination.
There is a test kernel available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
If you install it, then Kubernetes internal DNS lookups will succeed.
[Where problems could occur]
If a regression were to occur, then users of the qede driver would be affected. This is limited to those with QLogic QL41xxx series NICs. The patch explicitly checks for IPIP type packets, so only those particular packets would be affected.
Since most packets are not IPIP tunnelled, a regression would not cause a total outage. It could potentially cause problems for users who frequently handle VPN or Kubernetes internal DNS traffic.
A workaround would be to use ethtool to disable tx csum offload for all packet types, or to revert to an older kernel. |
|
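The ethtool workaround and the tcpdump check mentioned in the description above could be sketched roughly as follows. This is only an illustrative sketch, not the procedure used in the SRU verification. The helper names (run, show_offloads, disable_tx_csum, capture_ipip) and the DRY_RUN switch are hypothetical, and the interface name is a placeholder to replace with the actual QL41xxx device. By default the helpers only print the commands; set DRY_RUN=0 and run as root to execute them.

```shell
#!/bin/sh
# Sketch only. Helper names and DRY_RUN are assumptions made for illustration;
# they do not come from the bug report.

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the current offload settings of an interface.
show_offloads() {
    run ethtool -k "$1"
}

# Workaround: disable hardware tx checksum offload for all packet types.
disable_tx_csum() {
    run ethtool -K "$1" tx off
}

# Capture IPIP traffic (IP protocol number 4) on an interface.
capture_ipip() {
    run tcpdump -n -i "$1" ip proto 4
}
```

For example, `disable_tx_csum <qlogic nic device>` prints the ethtool command that would be run. Note that a setting changed with ethtool -K does not persist across reboots, so it is only a stopgap until a fixed kernel is installed.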
2021-01-22 19:50:35 |
Kelsey Steele |
linux (Ubuntu Groovy): status |
In Progress |
Fix Committed |
|
2021-01-22 19:50:38 |
Kelsey Steele |
linux (Ubuntu Focal): status |
In Progress |
Fix Committed |
|
2021-01-22 19:50:50 |
Kelsey Steele |
linux (Ubuntu Hirsute): status |
In Progress |
Fix Committed |
|
2021-01-29 07:39:21 |
Ubuntu Kernel Bot |
tags |
focal sts |
focal sts verification-needed-groovy |
|
2021-02-01 13:02:51 |
Pedro Principeza |
bug |
|
|
added subscriber Pedro Principeza |
2021-02-02 21:21:21 |
Manish Chopra |
tags |
focal sts verification-needed-groovy |
focal sts verification-done-groovy |
|
2021-02-05 10:18:05 |
Ubuntu Kernel Bot |
tags |
focal sts verification-done-groovy |
focal sts verification-done-groovy verification-needed-focal |
|
2021-02-10 22:33:28 |
Matthew Ruffell |
tags |
focal sts verification-done-groovy verification-needed-focal |
focal sts verification-done-focal verification-done-groovy |
|
2021-02-23 16:16:31 |
Launchpad Janitor |
linux (Ubuntu Focal): status |
Fix Committed |
Fix Released |
|
2021-02-23 16:16:31 |
Launchpad Janitor |
cve linked |
|
2020-27777 |
|
2021-02-23 16:16:31 |
Launchpad Janitor |
cve linked |
|
2020-29372 |
|
2021-02-23 16:22:36 |
Launchpad Janitor |
linux (Ubuntu Groovy): status |
Fix Committed |
Fix Released |
|
2021-02-23 16:22:36 |
Launchpad Janitor |
cve linked |
|
2020-28974 |
|
2022-01-26 21:55:51 |
Brian Murray |
linux (Ubuntu Hirsute): status |
Fix Committed |
Won't Fix |
|