DPDK ports get disabled after Open vSwitch restart with Intel XXV710(i40e) and 25G AOC cables

Bug #1940957 reported by Nobuto Murata
Affects            Importance   Assigned to
dpdk (Ubuntu)      Undecided    Unassigned
  Focal            Undecided    Unassigned
  Hirsute          Undecided    Unassigned
  Impish           Undecided    Unassigned

Bug Description

[Impact]

 * Cable detection breaks i40e driver based use cases in some setups

 * An upstream patch that resolves the issue was identified, proposed to
   and accepted for upstream stable, and is backported here alongside the
   19.11.10 update (we do not want to wait for 19.11.11 in December).

[Test Plan]

 * Nobuto has contact with a site that has a setup with the right cables
   and devices to trigger this. He will coordinate the testing of this on
   Focal.
 * For non-Focal this is part of the normal MRE policy for DPDK
   (see bug 1940913) as it is (will be) part of the upstream stable
   releases.

[Where problems could occur]

 * This only affects one driver (i40e); all others are unchanged. With
   that driver, cable detection is adjusted, so the use cases to watch
   for regressions are "establish connection", "restart connection" and
   "setup" rather than, say, "bulk traffic".

[Other Info]

 * The patch is accepted for the in-progress 19.11.11 stable release, so
   from the next MRE on it will be available everywhere (not just in
   Ubuntu).

---

- Ubuntu 20.04 LTS
- dpdk 19.11.7-0ubuntu0.20.04.1
  (we tested it with 19.11.10~rc1, but the problem persists)
- Intel XXV710
- Cisco 25G AOC cables

Patch to backport:
https://git.dpdk.org/dpdk/commit/?id=b1daa3461429e7674206a714c17adca65e9b44b4

[Impact]

DPDK ports in a bond get disabled, and no traffic goes in or out, after an openvswitch restart with the combination above. When that happens the DPDK bond has to be re-created as a workaround, which is not feasible in practice since a service restart basically breaks everything.

    ---- dpdk-bond0 ----
    bond_mode: balance-tcp
    bond may use recirculation: yes, Recirc-ID : 1
    bond-hash-basis: 0
    updelay: 0 ms
    downdelay: 0 ms
    next rebalance: 7267 ms
    lacp_status: configured
    lacp_fallback_ab: false
    active slave mac: 00:00:00:00:00:00(none)
    slave dpdk-7272e20: disabled
      may_enable: false
    slave dpdk-d2cb784: disabled
      may_enable: false
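
The failure state above can be detected from a script. The following is a minimal sketch, not part of the original report: `disabled_members` is a hypothetical helper name, and matching "member" in addition to "slave" is an assumption to cover Open vSwitch releases that print the newer wording.

```shell
#!/bin/sh
# List bond members that `ovs-appctl bond/show` reports as disabled.
# Reads the bond/show text on stdin so it can be run against saved output.
disabled_members() {
    # Lines look like "slave dpdk-7272e20: disabled"; print the port name only.
    # "member" is accepted too (assumption: newer OVS wording).
    awk '($1 == "slave" || $1 == "member") && $3 == "disabled" {
             sub(/:$/, "", $2); print $2
         }'
}

# Example using an excerpt of the output from this report.
sample='---- dpdk-bond0 ----
lacp_status: configured
slave dpdk-7272e20: disabled
  may_enable: false
slave dpdk-d2cb784: disabled
  may_enable: false'
printf '%s\n' "$sample" | disabled_members
```

On an affected host this could be fed from the live command, for example `sudo ovs-appctl bond/show dpdk-bond0 | disabled_members`; empty output would mean that no member is reported as disabled.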

[Test Plan]

1. Configure a DPDK bond with openvswitch, for example as follows.

$ sudo ovs-appctl bond/show dpdk-bond0

    ---- dpdk-bond0 ----
    bond_mode: balance-tcp
    bond may use recirculation: yes, Recirc-ID : 1
    bond-hash-basis: 0
    updelay: 0 ms
    downdelay: 0 ms
    next rebalance: 1691 ms
    lacp_status: negotiated
    lacp_fallback_ab: false
    active slave mac: 40:a6:b7:XX:YY:ZZ(dpdk-d2cb784)
    slave dpdk-7272e20: enabled
      may_enable: true
    slave dpdk-d2cb784: enabled
      active slave
      may_enable: true

2. Apply updated packages

3. Reboot the machine (just to make sure we are not using anything old)

4. Restart the openvswitch

$ sudo systemctl restart openvswitch-switch

5. Confirm the ports are enabled after both step 3 and step 4, and that the port status matches the one in step 1.
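
The comparison in step 5 could be scripted roughly as follows. This is only a sketch under assumptions: `bond_state` and `same_bond_state` are hypothetical helper names, and the choice of which fields count as "the port status" (LACP status, per-member enabled/disabled, may_enable) is mine, not something the report specifies. The "next rebalance" countdown and the active member are deliberately ignored because they can legitimately differ between runs.

```shell
#!/bin/sh
# Reduce `ovs-appctl bond/show` output (on stdin) to the fields that should be
# identical before and after the restart.
bond_state() {
    grep -E 'lacp_status:|^[[:space:]]*(slave|member) |may_enable:' |
        sed 's/^[[:space:]]*//'
}

# Compare two saved outputs; succeeds only if the reduced fields match.
same_bond_state() {
    [ "$(bond_state < "$1")" = "$(bond_state < "$2")" ]
}

# Possible use on the machine under test (file names are placeholders):
#   sudo ovs-appctl bond/show dpdk-bond0 > before.txt
#   sudo systemctl restart openvswitch-switch
#   sudo ovs-appctl bond/show dpdk-bond0 > after.txt
#   same_bond_state before.txt after.txt && echo "bond state unchanged"
```

LACP negotiation may need some time after the restart, so the second snapshot is best taken once the bond has had a chance to settle.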

[Where problems could occur]

The scope of the patch is limited to i40e with two specific cable types, 25G AOC and ACC, so it is unlikely to affect any other combination. Before this patch, 25G AOC/ACC cables were not among the additional PHY types handled by the driver, so the change is unlikely to make things worse.

Christian Ehrhardt  (paelzer) wrote :

Hi Nobuto,
FYI the PPA patch lacks any kind of dep-3 headers [0].
Those can be really useful later on, so adding them is generally recommended.

Unfortunately the diff on LP goes a bit crazy and lists everything since 19.11.1, so it is hard to read.

FYI - I'm currently in the process of preparing 19.11.10 upstream and will then MRE it as an SRU.
So I'd recommend rebasing your changes onto that and testing them there.
If you need a PPA with it to base on, look here (with the RC1) [1]

That said, the patch you backported was not flagged for stable@dpdk - do you by any chance know why?
Maybe you could ping the authors of [2] to see if they still want to nominate it.
If they do, we could make it part of the next upstream stable release, which comes with the benefit of testing by the various companies owning the various HW that DPDK can run on.

[0]: https://dep-team.pages.debian.net/deps/dep3/
[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4656
[2]: https://github.com/DPDK/dpdk/commit/b1daa3461429e7674206a714c17adca65e9b44b4
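
Whether a patch carries the DEP-3 headers mentioned above can be checked mechanically. A minimal sketch, assuming the two fields DEP-3 treats as required (a Description or Subject, and an Origin or Author/From); `has_dep3_headers` is a hypothetical helper name and the check is intentionally shallow.

```shell
#!/bin/sh
# Succeed if a patch file has the required DEP-3 fields in its header:
# Description (or Subject) and Origin (or Author/From).
has_dep3_headers() {
    patch_file=$1
    # Only the header counts: stop at the first "---" line or at the diff.
    header=$(sed -e '/^---/q' -e '/^diff /q' "$patch_file")
    printf '%s\n' "$header" | grep -Eq '^(Description|Subject):' &&
        printf '%s\n' "$header" | grep -Eq '^(Origin|Author|From):'
}

# Possible use inside a source package tree:
#   for p in debian/patches/*.patch; do
#       has_dep3_headers "$p" || echo "missing DEP-3 headers: $p"
#   done
```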

Nobuto Murata (nobuto)
summary: - i40e: support 25G AOC/ACC cables
+ DPDK ports get disabled after Open vSwitch restart with Intel
+ XXV710(i40e) and 25G AOC cables
Nobuto Murata (nobuto)
description: updated
Christian Ehrhardt  (paelzer) wrote :

Testing with 19.11.10 (-rc1) has now confirmed that we really need the referenced patch (19.11.10-rc1 plus the AOC patch worked; the same without the AOC patch did not).

I'm sending that to DPDK stable for general review by the community and will make it part of the planned 19.11.10 MRE update for Ubuntu (see bug 1940913).

Changed in dpdk (Ubuntu):
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

FYI submitted upstream for inclusion in the (next) stable release as:
  http://mails.dpdk.org/archives/stable/2021-September/033280.html

Christian Ehrhardt  (paelzer) wrote :

19.11.10 has now been released (minutes ago) and I got an Ack on the AOC patch, which is queued for 19.11.11, but we can already make it part of our SRU upload.

Christian Ehrhardt  (paelzer) wrote :
Changed in dpdk (Ubuntu Focal):
status: New → Confirmed
Changed in dpdk (Ubuntu Hirsute):
status: New → Confirmed
Changed in dpdk (Ubuntu Impish):
status: Confirmed → Triaged
description: updated
Changed in dpdk (Ubuntu Hirsute):
status: Confirmed → Triaged
Changed in dpdk (Ubuntu Focal):
status: Confirmed → Triaged
Changed in dpdk (Ubuntu Impish):
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dpdk - 20.11.3-0ubuntu1

---------------
dpdk (20.11.3-0ubuntu1) impish; urgency=medium

  * Merge LTS stable release 20.11.3 (LP: #1940913)
    Release notes are available at:
    https://doc.dpdk.org/guides-20.11/rel_notes/release_20_11.html#id1
    - Remove test-catch-coredumps.patch [now part of upstream]
  * d/p/u/lp-1940957-net-i40e-support-25G-AOC-ACC-cables.patch: fix issues
    with 25G AOC cables (LP: #1940957)

 -- Christian Ehrhardt <email address hidden> Tue, 24 Aug 2021 12:28:59 +0200

Changed in dpdk (Ubuntu Impish):
status: In Progress → Fix Released
Christian Ehrhardt  (paelzer) wrote :

FYI
- accepted in both upstream stable branches
- uploaded to impish
- uploaded to Focal/Hirsute-unapproved for SRU

Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Nobuto, or anyone else affected,

Accepted dpdk into hirsute-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/dpdk/20.11.3-0ubuntu0.21.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-hirsute to verification-done-hirsute. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-hirsute. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in dpdk (Ubuntu Hirsute):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-hirsute
Łukasz Zemczak (sil2100) wrote :

Hello Nobuto, or anyone else affected,

Accepted dpdk into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/dpdk/19.11.10-0ubuntu0.20.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
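
When reporting the tested version as requested above, it can help to confirm that the installed package is at least the fixed one. A rough sketch, assuming `dpkg` is available; `has_fix` is a hypothetical helper name, the binary package name `dpdk` is an assumption, and the version string is the Focal one quoted in this comment.

```shell
#!/bin/sh
# Succeed if the installed version is at least the first fixed version.
# Relies on dpkg's own version ordering (so 19.11.10~rc1 sorts before 19.11.10).
has_fix() {
    installed=$1
    fixed=$2
    dpkg --compare-versions "$installed" ge "$fixed"
}

# Possible use on a Focal machine under test:
#   has_fix "$(dpkg-query -W -f='${Version}' dpdk)" 19.11.10-0ubuntu0.20.04.1 \
#       && echo "installed dpdk includes the AOC/ACC cable patch"
```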

Changed in dpdk (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed-focal
Christian Ehrhardt  (paelzer) wrote :

I got this from Nobuto (thanks)

> 1. configure a DPDK bond with openvswitch as follows for example.
>
> $ sudo ovs-appctl bond/show dpdk-bond0
>
> ---- dpdk-bond0 ----
> bond_mode: balance-tcp
> bond may use recirculation: yes, Recirc-ID : 1
> bond-hash-basis: 0
> updelay: 0 ms
> downdelay: 0 ms
> next rebalance: 1691 ms
> lacp_status: negotiated
> lacp_fallback_ab: false
> active slave mac: 40:a6:b7:XX:YY:ZZ(dpdk-d2cb784)
> slave dpdk-7272e20: enabled
>   may_enable: true
> slave dpdk-d2cb784: enabled
>   active slave
>   may_enable: true
>
> 2. Apply updated packages
>
> $ cat <<EOF | sudo tee /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
> # Enable Ubuntu proposed archive
> deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed restricted main multiverse universe
> EOF
>
> $ sudo apt update
> $ sudo apt upgrade
>
> 3. Reboot the machine (just to make sure we are not using anything old)
>
> 4. Restart the openvswitch
>
> $ sudo systemctl restart openvswitch-switch
>
> 5. Confirm ports are enabled after both the step 3. and 4. and the port status matches the one in the step 1.

That verifies the case on Focal, which is thereby fine.
I'll ask if there is a chance to test Hirsute in the same environment.

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Christian Ehrhardt  (paelzer) wrote :

Nobuto will see if they can also do a reduced test in the environment for 21.04, but they surely can't do a full redeployment.

Therefore I am preparing for the case that this won't happen...
Since the patch is already accepted upstream (queued for 19.11.11) and is just picked up early here, the same rules as for the MRE apply. That was/is covered in bug 1940913.
Those tests have now all completed and are good.

While I'd appreciate it if someone could run tests of any complexity in an environment with those cables and add the results - what we have is already sufficient to call it verified under the MRE exception rule.

tags: added: verification-done-hirsute
removed: verification-needed-hirsute
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dpdk - 20.11.3-0ubuntu0.21.04.2

---------------
dpdk (20.11.3-0ubuntu0.21.04.2) hirsute; urgency=medium

  * Skip flaky self-tests to make the tests more reliable (LP: #1939861)
    - d/p/disable_ppc64_autopkgtest_fails.patch: skip known false-positives
    - d/p/disable_armhf_autopkgtest_fails.patch: disable arm failures that do
      not represent regressions
    - d/p/disable_autopkgtest_fails.patch: disable failures that do not
      represent regressions
    - Add disable_lcores_autotest_ppc.patch to fix ppc64el autopkgtest

dpdk (20.11.3-0ubuntu0.21.04.1) hirsute; urgency=medium

  * Merge LTS stable release 20.11.3 (LP: #1940913)
    Release notes are available at:
    https://doc.dpdk.org/guides-20.11/rel_notes/release_20_11.html#id1
    - Remove test-catch-coredumps.patch [now part of upstream]
  * d/p/u/lp-1940957-net-i40e-support-25G-AOC-ACC-cables.patch: fix issues
    with 25G AOC cables (LP: #1940957)

 -- Christian Ehrhardt <email address hidden> Wed, 08 Sep 2021 09:00:48 +0200

Changed in dpdk (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for dpdk has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dpdk - 19.11.10-0ubuntu0.20.04.1

---------------
dpdk (19.11.10-0ubuntu0.20.04.1) focal; urgency=medium

  * Merge the latest upstream stable minor release 19.11.10 (LP: #1940913)
    Release notes available at:
    https://doc.dpdk.org/guides-19.11/rel_notes/release_19_11.html
    - Revert "fix linking back to pre be like 19.11.6 behavior (LP 1920141)"
      [now part of upstream]
  * d/p/u/lp-1940957-net-i40e-support-25G-AOC-ACC-cables.patch: fix issues
    with 25G AOC cables (LP: #1940957)

 -- Christian Ehrhardt <email address hidden> Tue, 24 Aug 2021 11:51:47 +0200

Changed in dpdk (Ubuntu Focal):
status: Fix Committed → Fix Released