Cluster deploy fails if one port of Intel XL710 40G dual-port NIC is allocated for DPDK and the other for SR-IOV

Bug #1583077 reported by Mikhail Chernik on 2016-05-18
This bug affects 1 person
Affects             Status  Importance  Assigned to            Milestone
Fuel for OpenStack          High        Anastasia Balobashina
Mitaka                      High        Anastasia Balobashina
Newton                      High        Anastasia Balobashina

Bug Description

Environment: Hardware Lab, MOS 9.0 ISO 362, Intel XL710 dual-port NIC, PCI ID 8086:1583

Detailed description:
If DPDK and SR-IOV are configured on different ports of the same XL710 NIC, DPDK fails to initialize its port and ovs-vswitchd is terminated during deployment.

ovs-vswitchd output: http://paste.openstack.org/show/497462/

Steps to reproduce:

1. Create a cluster with 1 controller and 1 compute node that has an Intel XL710 dual-port NIC
2. Configure hugepages for the host and for DPDK on the compute node
3. Enable DPDK on one port of the 40G NIC and SR-IOV on the other port of the same NIC
4. Start the deployment

Expected result:
Cluster is deployed

Actual result:
Deployment failed with message 'Failed tasks: Task[netconfig/5] Stopping the deployment process!'

Lab is available for investigation.
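
The failing configuration depends on both ports belonging to one physical NIC. As a rough illustration (not part of Fuel), the two ports of a dual-port adapter normally appear as functions of the same PCI device, so their addresses differ only in the function number. The helper names below are hypothetical, and the check assumes the usual "domain:bus:device.function" address form that `ethtool -i` reports as bus-info.

```python
import re

# PCI address in "domain:bus:device.function" form, e.g. 0000:81:00.1
_PCI_RE = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_pci_address(addr):
    """Split a PCI address into (domain, bus, device, function) integers."""
    match = _PCI_RE.match(addr.strip())
    if match is None:
        raise ValueError("not a PCI address: %r" % addr)
    domain, bus, device, function = match.groups()
    return int(domain, 16), int(bus, 16), int(device, 16), int(function)


def same_physical_nic(addr_a, addr_b):
    """Return True if the addresses are different functions of one PCI device.

    Assumption: each port of the NIC is exposed as a separate PCI function
    of the same domain:bus:device.
    """
    a = parse_pci_address(addr_a)
    b = parse_pci_address(addr_b)
    return a[:3] == b[:3] and a[3] != b[3]


if __name__ == "__main__":
    # Example addresses only; substitute the bus-info values of the two ports.
    print(same_physical_nic("0000:81:00.0", "0000:81:00.1"))
```

A pre-deployment check could call `same_physical_nic` on the DPDK port and the SR-IOV port to flag the combination described in this report.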

Mikhail Chernik (mchernik) wrote :

The issue does not reproduce on a 10G 82599-based dual-port NIC (AOC-STGN-i2S, 8086:10fb) on the same host.

Deploy completed successfully.

More information would be useful. Can you provide the "ethtool -i" output for the interface being used for SR-IOV by the kernel?

It seems like everything points to a firmware bug on this part. If we can first verify the firmware version installed via "ethtool -i" as in the example below which is from my system with 5.0 firmware, then we could update the firmware to see if it resolves the issue.

[root@ahduyck-xeon-server ~]# ethtool -i ens7f0
driver: i40e
version: 1.3.21-k
firmware-version: f5.0.40043 a1.5 n5.02 e2284
bus-info: 0000:81:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

The firmware update is available from:
https://downloadcenter.intel.com/download/24769
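
When the firmware version has to be collected from several nodes, the `ethtool -i` output can be parsed mechanically. The following is a minimal sketch, assuming the output consists of "key: value" lines as in the example above; `parse_ethtool_info` is a hypothetical helper name, and the input must be the command output only, without the shell prompt line.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i` output into a dict of field name -> value.

    Splits each line on the first colon only, so values that contain
    colons themselves (such as bus-info) are kept intact.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Output copied from the example in this comment.
SAMPLE = """\
driver: i40e
version: 1.3.21-k
firmware-version: f5.0.40043 a1.5 n5.02 e2284
bus-info: 0000:81:00.0
supports-statistics: yes
"""

info = parse_ethtool_info(SAMPLE)

if __name__ == "__main__":
    print(info["driver"], info["firmware-version"])
```

The firmware-version string is returned as-is because its layout differs between driver versions, as the outputs in this thread show.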

Mikhail Chernik (mchernik) wrote :

root@node-5:~# ethtool -i ens11f1
driver: i40e
version: 1.3.47
firmware-version: 4.53 0x80001dca 0.0.0
bus-info: 0000:81:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

It is currently the latest version since Intel removed update v5.02.

Mikhail Chernik (mchernik) wrote :

It turns out that we have a NIC with 5.02 firmware:
root@node-2:~# ethtool -i ens11f1
driver: i40e
version: 1.3.47
firmware-version: 5.02 0x80002285 0.0.0
bus-info: 0000:81:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

However, nothing has changed.

Dmitry Klenov (dklenov) on 2016-05-18
tags: added: area-mos
tags: added: area-linux
removed: area-mos
Albert Syriy (asyriy) wrote :

A new version of the i40e driver, 1.5.18, has just been merged for MOS 10.0.
Could you try running the new version of the driver?
http://perestroika-repo-tst.infra.mirantis.net/mos-repos/ubuntu/master

Mikhail Chernik (mchernik) wrote :

As far as I understand, the root cause of this problem is not related to the driver version; however, I will try it. Can you please debuild openvswitch-switch-dpdk with DPDK debug logging?

Albert Syriy (asyriy) wrote :

Yes, you are right!
I found a very similar issue, and fortunately it has been fixed.
The root cause of the issue is the firmware.

Please see the link for details:
http://dpdk.org/dev/patchwork/patch/9631/

The workaround patches the i40e_ethdev.c file, but it is better to update the firmware (rev 2.5) FVL5.

I also checked the latest version of the driver. It does not contain the workaround in the driver code.
So the fix is in the firmware.

Mikhail Chernik (mchernik) wrote :

Request for adding this issue to release notes: https://bugs.launchpad.net/fuel/+bug/1587867
Closed for 9.0.

Related fix proposed to branch: master
Change author: Olena Logvinova <email address hidden>
Review: https://review.fuel-infra.org/22403

tags: added: release-notes

Reviewed: https://review.fuel-infra.org/22403
Submitter: Evgeny Konstantinov <email address hidden>
Branch: master

Commit: 9bfce94c4aec46c05e744432fd307a020d4c62c7
Author: Olena Logvinova <email address hidden>
Date: Wed Jun 22 13:20:42 2016

[RN 9.0] [NFV] [Intel XL710 40] known issues

This patch adds the following bugs and their workarounds
to the RN 9.0 Known issues section:

https://bugs.launchpad.net/fuel/+bug/1583077
https://bugs.launchpad.net/fuel/+bug/1587310

Change-Id: Ie15d0888afd46599a0ed421bd757bf1d471504a5
Related-Bug: #1583077
Related-Bug: #1587310
Closes-Bug: #1587867

tags: added: release-notes-done
removed: release-notes
Changed in fuel:
assignee: MOS Scale (mos-scale) → Anastasia Balobashina (atolochkova)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/411316
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=53f2fe91c9468e03694b8453d5c03fa1dbb8800d
Submitter: Jenkins
Branch: master

commit 53f2fe91c9468e03694b8453d5c03fa1dbb8800d
Author: Anastasiya <email address hidden>
Date: Thu Dec 15 17:22:18 2016 +0400

    Replace dpdk driver to vfio-pci if sriov is enabled

    Change-Id: Ic87b926b4f547f91b2f130830b35fafc195ada92
    Partial-Bug: #1583077
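
The rule named in the commit title can be sketched as follows. This is an illustration only, not the code from the fuel-web change: the function name, its parameters, and the `igb_uio` default are assumptions made for the example.

```python
# Assumption for illustration: the driver used for DPDK ports when
# SR-IOV is not involved. The real default in fuel-web may differ.
DEFAULT_DPDK_DRIVER = "igb_uio"
SRIOV_DPDK_DRIVER = "vfio-pci"


def select_dpdk_driver(dpdk_enabled, sriov_enabled,
                       default_driver=DEFAULT_DPDK_DRIVER):
    """Pick the userspace driver for a DPDK port.

    Returns None when DPDK is off, vfio-pci when SR-IOV is enabled
    (per the commit title), and the default driver otherwise.
    """
    if not dpdk_enabled:
        return None
    if sriov_enabled:
        return SRIOV_DPDK_DRIVER
    return default_driver


if __name__ == "__main__":
    print(select_dpdk_driver(dpdk_enabled=True, sriov_enabled=True))
```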

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
milestone: 10.0 → 11.0

Reviewed: https://review.openstack.org/414536
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=7ea2d025255946325ed9e107c90a821cae52625d
Submitter: Jenkins
Branch: stable/mitaka

commit 7ea2d025255946325ed9e107c90a821cae52625d
Author: Anastasiya <email address hidden>
Date: Thu Dec 15 17:22:18 2016 +0400

    Replace dpdk driver to vfio-pci if sriov is enabled

    Change-Id: Ic87b926b4f547f91b2f130830b35fafc195ada92
    Partial-Bug: #1583077
    (cherry picked from commit 53f2fe91c9468e03694b8453d5c03fa1dbb8800d)

tags: added: in-stable-mitaka

Reviewed: https://review.openstack.org/415196
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=9c984a1ad3c32c6fc9e43da99eba39f6d9c22b29
Submitter: Jenkins
Branch: stable/newton

commit 9c984a1ad3c32c6fc9e43da99eba39f6d9c22b29
Author: Anastasiya <email address hidden>
Date: Thu Dec 15 17:22:18 2016 +0400

    Replace dpdk driver to vfio-pci if sriov is enabled

    Change-Id: Ic87b926b4f547f91b2f130830b35fafc195ada92
    Partial-Bug: #1583077
    (cherry picked from commit 53f2fe91c9468e03694b8453d5c03fa1dbb8800d)

tags: added: in-stable-newton

Related fix proposed to branch: master
Change author: Evgeny Konstantinov <email address hidden>
Review: https://review.fuel-infra.org/30384

Change abandoned by Evgeny Konstantinov <email address hidden> on branch: master
Review: https://review.fuel-infra.org/30384
Reason: will do a new commit

Related fix proposed to branch: master
Change author: Evgeny Konstantinov <email address hidden>
Review: https://review.fuel-infra.org/30429

Reviewed: https://review.fuel-infra.org/30429
Submitter: Mariia Zlatkova <email address hidden>
Branch: master

Commit: fcd84a46e56a2f316ff2cc1f55fa116775fa148b
Author: Evgeny Konstantinov <email address hidden>
Date: Thu Feb 2 14:57:32 2017

Add DPDK SR-IOV resolved issue to relnotes 9.2

Change-Id: I39b36f7a50b97f06bb909cbdbb7ed63703bab72a
Related-Bug: #1583077

Related fix proposed to branch: stable/9.2
Change author: Evgeny Konstantinov <email address hidden>
Review: https://review.fuel-infra.org/30444

Reviewed: https://review.fuel-infra.org/30444
Submitter: Mariia Zlatkova <email address hidden>
Branch: stable/9.2

Commit: 223d9cd4197fddead56e4a066281632a010e4db6
Author: Evgeny Konstantinov <email address hidden>
Date: Thu Feb 2 16:08:03 2017

Add DPDK SR-IOV resolved issue to relnotes 9.2

Change-Id: I39b36f7a50b97f06bb909cbdbb7ed63703bab72a
Related-Bug: #1583077
(cherry picked from commit fcd84a46e56a2f316ff2cc1f55fa116775fa148b)
