Enable basic support for Solarflare 8000 series NIC

Bug #1783152 reported by Mauricio Faria de Oliveira on 2018-07-23
This bug affects 1 person
Affects                      Importance  Assigned to
debian-installer (Ubuntu)    Undecided   Unassigned
  Trusty                     Medium      Mauricio Faria de Oliveira
linux (Ubuntu)               Undecided   Unassigned
  Trusty                     Undecided   Unassigned
  Xenial                     Undecided   Unassigned
linux-lts-xenial (Ubuntu)    Undecided   Unassigned
  Trusty                     Undecided   Unassigned
  Xenial                     Undecided   Unassigned

Bug Description

SRU Justification:

[Impact]

 * Users cannot use Solarflare 8000 series NICs.

 * Servers with only this NIC cannot do netboot.

 * The patchset adds the PCI IDs and a basic fix.

[Test Case]

 * Try to probe/netboot/use a Solarflare 8000
   series NIC.

 * It does not probe on the original kernel,
   but it does probe/netboot/install/stress
   (i.e., basic functionality works) on the
   patched kernel.
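
As a rough aid for the probe check above, this Python sketch lists the network interfaces whose device is bound to a given driver, by reading the device/driver symlinks under /sys/class/net. The function name and the `sysfs_net` parameter are made up for illustration and are not part of any Ubuntu tooling. On the original kernel the expectation is simply that no interface is reported for sfc.

```python
import os


def interfaces_bound_to(driver, sysfs_net="/sys/class/net"):
    """Return interface names whose device/driver symlink points at `driver`."""
    found = []
    if not os.path.isdir(sysfs_net):
        return found
    for ifname in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, ifname, "device", "driver")
        # Virtual interfaces (e.g. lo) have no device/driver link and are skipped.
        if os.path.islink(link) and os.path.basename(os.readlink(link)) == driver:
            found.append(ifname)
    return found


if __name__ == "__main__":
    print(interfaces_bound_to("sfc"))
```

An empty list on a machine that has the NIC installed would suggest the driver did not claim the device.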

[Regression Potential]

 * Users with Solarflare 8000 series NIC might hit
   problems on device probe or due to a new network
   interface coming up, now that the NIC comes up.

 * More specific features of the NIC or advanced
   tuning/setup might not work as expected or run
   into issues.

[Other Info]

 * There are known error messages on device probe.

 * These are benign/non-fatal and will be addressed
   on another SRU cycle.

---

The Trusty HWE kernel from Xenial lacks the PCI ID for the Solarflare 8000 series NIC.
This prevents network installs on servers which only have that NIC.

To get the NIC detected, the link up, and a successful network install,
only 2 commits are required:

dd248f1bc65b sfc: Add PCI ID for Solarflare 8000 series 10/40G NIC
93171b14a545 sfc: make TSO version a per-queue parameter
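
Why a missing PCI ID means no probe at all: a PCI device is only offered to a driver whose ID table has a matching entry. Below is a simplified Python model of that lookup, assuming exact vendor/device matching only (real kernel ID tables carry more fields). 0x1924 is the Solarflare vendor ID; both device IDs are assumptions for illustration and should be checked against commit dd248f1bc65b.

```python
from typing import Iterable, NamedTuple


class PciId(NamedTuple):
    vendor: int
    device: int


SOLARFLARE = 0x1924            # Solarflare PCI vendor ID
ASSUMED_OLDER_NIC = 0x0903     # assumption: an ID already present in the table
ASSUMED_8000_SERIES = 0x0A03   # assumption: verify against the actual commit


def driver_claims(dev: PciId, id_table: Iterable[PciId]) -> bool:
    """Return True if the device matches any entry in the driver's ID table."""
    return any(dev == entry for entry in id_table)


nic = PciId(SOLARFLARE, ASSUMED_8000_SERIES)
table_before = [PciId(SOLARFLARE, ASSUMED_OLDER_NIC)]
table_after = table_before + [nic]

print(driver_claims(nic, table_before))  # False: the driver never probes the NIC
print(driver_claims(nic, table_after))   # True: a probe is attempted
```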

This patchset is undergoing testing, and I will post the patches to the kernel-team mailing list.

---

Some kernel messages are produced, possibly because additional commits are missing,
but they are benign/non-fatal and still allow NIC probing and basic functionality to work.

[ 2.803941] sfc 0000:37:00.0 (unnamed net_device) (uninitialized): Solarflare NIC detected
[ 2.806336] sfc 0000:37:00.0 (unnamed net_device) (uninitialized): Part Number : SFN8042
[ 2.807366] sfc 0000:37:00.0 (unnamed net_device) (uninitialized): MC command 0x4a inlen 8 failed rc=-2 (raw=2) arg=0
[ 2.808052] sfc 0000:37:00.0 (unnamed net_device) (uninitialized): no PTP support
[ 2.808488] sfc 0000:37:00.0 (unnamed net_device) (uninitialized): MC command 0x8f inlen 0 failed rc=-1 (raw=1) arg=0
[ 2.808605] sfc 0000:37:00.0 (unnamed net_device) (uninitialized): failed to allocate PIO buffers (-1)
...
[ 4.037694] sfc 0000:37:00.0 p2p1: link up at 40000Mbps full-duplex (MTU 1500)

The PTP (Precision Time Protocol / IEEE 1588) support is a feature to synchronize clocks
over a computer network with high precision; it is not required for basic functionality
nor for this particular user.

The failure to allocate PIO buffers is non-fatal; see the comments in efx_ef10_dimension_resources() in sfc/ef10.c.
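
To make the two "MC command ... failed" lines easier to read, here is a small Python sketch that pulls them out of a log and maps the rc value to a symbolic errno name. It assumes rc follows the usual kernel convention of a negated Linux errno; the function name is invented for this example, and the sample text is copied from the messages above.

```python
import errno
import re

# Matches e.g. "MC command 0x4a inlen 8 failed rc=-2 (raw=2) arg=0"
MC_FAIL = re.compile(r"MC command (0x[0-9a-fA-F]+) inlen \d+ failed rc=(-?\d+)")


def decode_mc_failures(log_text):
    """Return (command, rc, errno_name) for each failed MC command in the log."""
    results = []
    for match in MC_FAIL.finditer(log_text):
        cmd, rc = match.group(1), int(match.group(2))
        # Assumption: rc is a negated errno, so -rc indexes errno.errorcode.
        results.append((cmd, rc, errno.errorcode.get(-rc, "UNKNOWN")))
    return results


SAMPLE = """\
sfc 0000:37:00.0 (unnamed net_device) (uninitialized): MC command 0x4a inlen 8 failed rc=-2 (raw=2) arg=0
sfc 0000:37:00.0 (unnamed net_device) (uninitialized): no PTP support
sfc 0000:37:00.0 (unnamed net_device) (uninitialized): MC command 0x8f inlen 0 failed rc=-1 (raw=1) arg=0
"""

for cmd, rc, name in decode_mc_failures(SAMPLE):
    print(cmd, rc, name)
```

With the sample above this reports ENOENT for command 0x4a and EPERM for command 0x8f.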

The additional patches to resolve the error messages will be worked on in another SRU cycle.

description: updated

The patches look good from the testing front.
They passed testing over the weekend, Monday, and Tuesday morning, including stress tests.
I'll try to get the list of tests run.

Patches posted to the kernel-team mailing list [1].

[1] https://lists.ubuntu.com/archives/kernel-team/2018-July/094159.html

Changed in linux-lts-xenial (Ubuntu Trusty):
status: New → In Progress
Changed in linux-lts-xenial (Ubuntu Xenial):
status: New → Invalid
Changed in linux (Ubuntu Trusty):
status: New → Invalid
Changed in linux (Ubuntu Xenial):
status: New → In Progress

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1783152

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

Reverting linux's status from Incomplete to New, as the problem has been already diagnosed and fixes submitted.

Changed in linux (Ubuntu):
status: Incomplete → New

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1783152

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

I should read the bot comments more attentively.
Setting bug status to Confirmed.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-134.160

---------------
linux (4.4.0-134.160) xenial; urgency=medium

  * linux: 4.4.0-134.160 -proposed tracker (LP: #1787177)

  * locking sockets broken due to missing AppArmor socket mediation patches
    (LP: #1780227)
    - UBUNTU SAUCE: apparmor: fix apparmor mediating locking non-fs, unix sockets

  * Backport namespaced fscaps to xenial 4.4 (LP: #1778286)
    - Introduce v3 namespaced file capabilities
    - commoncap: move assignment of fs_ns to avoid null pointer dereference
    - capabilities: fix buffer overread on very short xattr
    - commoncap: Handle memory allocation failure.

  * Xenial update to 4.4.140 stable release (LP: #1784409)
    - usb: cdc_acm: Add quirk for Uniden UBC125 scanner
    - USB: serial: cp210x: add CESINEL device ids
    - USB: serial: cp210x: add Silicon Labs IDs for Windows Update
    - n_tty: Fix stall at n_tty_receive_char_special().
    - staging: android: ion: Return an ERR_PTR in ion_map_kernel
    - n_tty: Access echo_* variables carefully.
    - x86/boot: Fix early command-line parsing when matching at end
    - ath10k: fix rfc1042 header retrieval in QCA4019 with eth decap mode
    - i2c: rcar: fix resume by always initializing registers before transfer
    - ipv4: Fix error return value in fib_convert_metrics()
    - kprobes/x86: Do not modify singlestep buffer while resuming
    - nvme-pci: initialize queue memory before interrupts
    - netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain()
    - ARM: dts: imx6q: Use correct SDMA script for SPI5 core
    - ubi: fastmap: Correctly handle interrupted erasures in EBA
    - mm: hugetlb: yield when prepping struct pages
    - tracing: Fix missing return symbol in function_graph output
    - scsi: sg: mitigate read/write abuse
    - s390: Correct register corruption in critical section cleanup
    - drbd: fix access after free
    - cifs: Fix infinite loop when using hard mount option
    - jbd2: don't mark block as modified if the handle is out of credits
    - ext4: make sure bitmaps and the inode table don't overlap with bg
      descriptors
    - ext4: always check block group bounds in ext4_init_block_bitmap()
    - ext4: only look at the bg_flags field if it is valid
    - ext4: verify the depth of extent tree in ext4_find_extent()
    - ext4: include the illegal physical block in the bad map ext4_error msg
    - ext4: clear i_data in ext4_inode_info when removing inline data
    - ext4: add more inode number paranoia checks
    - ext4: add more mount time checks of the superblock
    - ext4: check superblock mapped prior to committing
    - HID: i2c-hid: Fix "incomplete report" noise
    - HID: hiddev: fix potential Spectre v1
    - HID: debug: check length before copy_to_user()
    - x86/mce: Detect local MCEs properly
    - x86/mce: Fix incorrect "Machine check from unknown source" message
    - media: cx25840: Use subdev host data for PLL override
    - mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
    - dm bufio: avoid sleeping while holding the dm_bufio lock
    - dm bufio: drop the lock when doing GFP_NOIO allocation
    - mtd: rawnand: mxc: set spa...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
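
For anyone wanting a quick check of whether a given 4.4 kernel is new enough, here is a minimal Python sketch that compares the ABI number in a `uname -r` style string against 134, the number from the fixed package version above. The function name is invented, and it assumes later 4.4.0 ABI numbers keep the fix; it deliberately refuses kernels outside the 4.4.0 series rather than guessing.

```python
import re

FIXED_ABI = 134  # from "linux - 4.4.0-134.160" in this report


def has_sfc8000_fix(release: str) -> bool:
    """Return True if a 4.4.0-N kernel release string has N >= FIXED_ABI."""
    match = re.match(r"^4\.4\.0-(\d+)", release)
    if match is None:
        raise ValueError("not a 4.4.0 series kernel: %r" % release)
    return int(match.group(1)) >= FIXED_ABI


print(has_sfc8000_fix("4.4.0-134-generic"))
print(has_sfc8000_fix("4.4.0-133-generic"))
```

On a live system the string could come from `platform.release()`.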
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-lts-xenial - 4.4.0-134.160~14.04.1

---------------
linux-lts-xenial (4.4.0-134.160~14.04.1) trusty; urgency=medium

  * linux-lts-xenial: 4.4.0-134.160~14.04.1 -proposed tracker (LP: #1787179)

  * linux: 4.4.0-134.160 -proposed tracker (LP: #1787177)

  * locking sockets broken due to missing AppArmor socket mediation patches
    (LP: #1780227)
    - UBUNTU SAUCE: apparmor: fix apparmor mediating locking non-fs, unix sockets

  * Backport namespaced fscaps to xenial 4.4 (LP: #1778286)
    - Introduce v3 namespaced file capabilities
    - commoncap: move assignment of fs_ns to avoid null pointer dereference
    - capabilities: fix buffer overread on very short xattr
    - commoncap: Handle memory allocation failure.

  * Xenial update to 4.4.140 stable release (LP: #1784409)
    - usb: cdc_acm: Add quirk for Uniden UBC125 scanner
    - USB: serial: cp210x: add CESINEL device ids
    - USB: serial: cp210x: add Silicon Labs IDs for Windows Update
    - n_tty: Fix stall at n_tty_receive_char_special().
    - staging: android: ion: Return an ERR_PTR in ion_map_kernel
    - n_tty: Access echo_* variables carefully.
    - x86/boot: Fix early command-line parsing when matching at end
    - ath10k: fix rfc1042 header retrieval in QCA4019 with eth decap mode
    - i2c: rcar: fix resume by always initializing registers before transfer
    - ipv4: Fix error return value in fib_convert_metrics()
    - kprobes/x86: Do not modify singlestep buffer while resuming
    - nvme-pci: initialize queue memory before interrupts
    - netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain()
    - ARM: dts: imx6q: Use correct SDMA script for SPI5 core
    - ubi: fastmap: Correctly handle interrupted erasures in EBA
    - mm: hugetlb: yield when prepping struct pages
    - tracing: Fix missing return symbol in function_graph output
    - scsi: sg: mitigate read/write abuse
    - s390: Correct register corruption in critical section cleanup
    - drbd: fix access after free
    - cifs: Fix infinite loop when using hard mount option
    - jbd2: don't mark block as modified if the handle is out of credits
    - ext4: make sure bitmaps and the inode table don't overlap with bg
      descriptors
    - ext4: always check block group bounds in ext4_init_block_bitmap()
    - ext4: only look at the bg_flags field if it is valid
    - ext4: verify the depth of extent tree in ext4_find_extent()
    - ext4: include the illegal physical block in the bad map ext4_error msg
    - ext4: clear i_data in ext4_inode_info when removing inline data
    - ext4: add more inode number paranoia checks
    - ext4: add more mount time checks of the superblock
    - ext4: check superblock mapped prior to committing
    - HID: i2c-hid: Fix "incomplete report" noise
    - HID: hiddev: fix potential Spectre v1
    - HID: debug: check length before copy_to_user()
    - x86/mce: Detect local MCEs properly
    - x86/mce: Fix incorrect "Machine check from unknown source" message
    - media: cx25840: Use subdev host data for PLL override
    - mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
    - dm bufio: avoid sleeping while hol...

Changed in linux-lts-xenial (Ubuntu Trusty):
status: In Progress → Fix Released

This is the patch for trusty debian-installer to pick up this new xenial hwe kernel.

It's been tested by the customer with the SF 8000 series NICs on amd64 bare metal (back when using test packages from their private PPA), so the netboot install works on that NIC model.

I built it on PPA for the supported architectures on trusty (amd64/i386, arm64/armhf, ppc64el/powerpc) [1], and used the built netboot images for testing.

I tested for regressions (without that NIC) on the following platforms, with plain and LVM partitioning, per discussion with folks from our server ISO automated testing and server/arm teams.

- amd64 bare-metal & kvm guest
- arm64 qemu guest (see [2])
- ppc64el qemu guest (see [3])

On testing, the installer boots with the new kernel, installs it to the system, and the installed system boots correctly with it.

[1] https://launchpad.net/~mfo/+archive/ubuntu/sf188840di/
[2] https://wiki.ubuntu.com/ARM64/QEMU
[3] https://buggy.link/2018/01/31/ppc64le-on-x86_64-qemu-full-system-emulation.html

no longer affects: debian-installer (Ubuntu Xenial)
Changed in debian-installer (Ubuntu):
status: New → Invalid
Eric Desrochers (slashd) on 2018-08-30
Changed in debian-installer (Ubuntu Trusty):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
tags: added: patch
Eric Desrochers (slashd) wrote :

Sponsored for Trusty.

The 4.4.0-134 kernel does indeed contain the required commits mentioned above:

$ git log ...
67f5d9c sfc: Add PCI ID for Solarflare 8000 series 10/40G NIC
4a76ed3 sfc: make TSO version a per-queue parameter

$ git describe --contains 67f5d9c
Ubuntu-lts-4.4.0-134.160_14.04.1~320

$ git describe --contains 4a76ed3
Ubuntu-lts-4.4.0-134.160_14.04.1~321
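
In the `git describe --contains` output above, the part before the first `~` is the tag that contains the commit, and `~N` counts first-parent generations back from that tag. A small Python sketch for splitting such names follows; the helper names are invented for this example, and only the plain `tag~N` form is turned into a number.

```python
import re

# Tag names cannot contain '~' or '^', so the first one ends the tag.
_NAME = re.compile(r"^(?P<tag>[^~^]+)(?P<path>[~^0-9]*)$")


def split_contains_name(name):
    """Split 'tag~320' into ('tag', '~320')."""
    match = _NAME.match(name.strip())
    if match is None:
        raise ValueError("unexpected name: %r" % name)
    return match.group("tag"), match.group("path")


def generations_back(path):
    """Return N for a plain '~N' suffix, else None (e.g. for '~3^2~1')."""
    match = re.fullmatch(r"~(\d+)", path)
    return int(match.group(1)) if match else None


tag, path = split_contains_name("Ubuntu-lts-4.4.0-134.160_14.04.1~320")
print(tag, generations_back(path))
```

Both commits resolve to the same tag, one generation apart, which is consistent with both being in the 4.4.0-134.160~14.04.1 kernel.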

Hello Mauricio, or anyone else affected,

Accepted debian-installer into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/debian-installer/20101020ubuntu318.44 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in debian-installer (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-trusty
Changed in linux-lts-xenial (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Eric Desrochers (slashd) wrote :

The FTBFS on the Trusty amd64 arch is under investigation:
[Build #15318223] amd64 build of debian-installer 20101020ubuntu318.44 in ubuntu trusty PROPOSED

Eric Desrochers (slashd) wrote :

According to the SRU team, it may be related to:
https://bugs.launchpad.net/ubuntu/+source/shim/+bug/1708245

cyphermox will need to be involved.

@slashd thanks for the pointers.

Yesterday/today I worked on understanding and providing a workaround for this,
but a more complete solution (more updates to secure-boot related packages)
is already in the works (in cyphermox's PPAs) and should be rolled out soon,
according to him.

Once that hits trusty-proposed, the d-i build should work.

The secure boot packages on trusty-proposed are now correct (LP: #1708245),
and the d-i rebuild on amd64 is in progress (verified successfully on PPA).

The debian-installer images in trusty-proposed work correctly; changing verification tags to done.

I tested on the following platforms, with plain and LVM partitioning (details in comment #10).
- amd64 bare-metal & qemu-kvm guest
- i386 qemu-kvm guest
- arm64 qemu guest
- ppc64el qemu guest

The installer boots with the new kernel, installs it to the system, and the installed system boots correctly with it.

The NIC enablement has been validated previously by the customer with the updated kernel package used in these d-i images.

tags: added: verification-done-trusty
removed: verification-needed-trusty

The verification of the Stable Release Update for debian-installer has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package debian-installer - 20101020ubuntu318.44

---------------
debian-installer (20101020ubuntu318.44) trusty; urgency=medium

  * Move lts-xenial kernels to 4.4.0-134 (LP: #1783152)

 -- Mauricio Faria de Oliveira <email address hidden> Thu, 23 Aug 2018 18:38:36 -0300

Changed in debian-installer (Ubuntu Trusty):
status: Fix Committed → Fix Released
no longer affects: linux-lts-xenial (Ubuntu Precise)
no longer affects: linux (Ubuntu Precise)
no longer affects: debian-installer (Ubuntu Precise)