Fix several bugs in RDMA/hns driver

Bug #1770974 reported by dann frazier on 2018-05-13
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | | Undecided | dann frazier |
Bionic | | Undecided | dann frazier |

Bug Description

[Impact]
Several fixes and features for the RDMA/hns driver for HiSilicon network adapters have recently landed upstream. These include fixes for crashes and endianness issues, among others.

[Test Case]
Server side:
ubuntu@d06-1:~$ ib_write_bw -n 5 -d hns_2

************************************
* Waiting for client to connect... *
************************************
hr_qp->port_num= 0x1
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port : OFF Device : hns_2
 Number of qps : 1 Transport type : IB
 Connection type : RC Using SRQ : OFF
 CQ Moderation : 5
 Mtu : 1024[B]
 Link type : Ethernet
 GID index : 2
 Max inline data : 0[B]
 rdma_cm QPs : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x000c PSN 0x368fa8 RKey 0x000200 VAddr 0x00ffffbe390000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:228:68:237
 remote address: LID 0000 QPN 0x000c PSN 0x19f3aa RKey 0x000200 VAddr 0x00ffff83380000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:228:68:192
---------------------------------------------------------------------------------------
 #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
 65536 5 108.08 108.08 0.001729
---------------------------------------------------------------------------------------

Client side:
ubuntu@d06-2:~$ ib_write_bw -n 5 -d hns_2 10.228.68.237
hr_qp->port_num= 0x1
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port : OFF Device : hns_2
 Number of qps : 1 Transport type : IB
 Connection type : RC Using SRQ : OFF
 TX depth : 5
 CQ Moderation : 5
 Mtu : 1024[B]
 Link type : Ethernet
 GID index : 2
 Max inline data : 0[B]
 rdma_cm QPs : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x000c PSN 0x19f3aa RKey 0x000200 VAddr 0x00ffff83380000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:228:68:192
 remote address: LID 0000 QPN 0x000c PSN 0x368fa8 RKey 0x000200 VAddr 0x00ffffbe390000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:228:68:237
---------------------------------------------------------------------------------------
 #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
 65536 5 108.08 108.08 0.001729
---------------------------------------------------------------------------------------
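
The MsgRate column in the output above can be cross-checked against the bandwidth column. The following is a minimal sketch, assuming that "MB" in the ib_write_bw output means 2^20 bytes (an assumption that happens to match the numbers above; the report does not state it). The function name msg_rate_mpps is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive the message rate (Mpps) from the
# reported bandwidth and message size.
# Assumption: "MB" in the ib_write_bw output means 2^20 bytes.
msg_rate_mpps() {
    bw_mb_per_sec=$1   # value from the "BW average[MB/sec]" column
    msg_bytes=$2       # value from the "#bytes" column
    awk -v bw="$bw_mb_per_sec" -v sz="$msg_bytes" \
        'BEGIN { printf "%.6f\n", bw * 1048576 / sz / 1000000 }'
}

# Values taken from the test case output above.
msg_rate_mpps 108.08 65536
```

With the values from the run above (108.08 MB/sec at 65536 bytes per message), this works out to about 1729 messages per second, consistent with the reported 0.001729 Mpps.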

[Regression Risk]
TL;DR: this driver isn't usable today, so changes to it carry negligible regression risk.

These patches are localized to the RDMA/hns driver. This driver is for hardware in the hip07/hip08 SoCs, which Ubuntu supports in the D05 and D06 servers, respectively. D05 firmware intentionally disables this feature by not exposing its ACPI ID. The driver therefore never finds the device on that platform, so there is no regression risk there.

D06 *does* enable this device in firmware. However, the current bionic kernel crashes when loading the base ethernet driver (hns3 - LP: #1768670) on this platform, so this feature is also currently unusable there.
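
Whether the driver has found a device on a given platform can be checked from sysfs. The following is a rough sketch, assuming RDMA devices are registered under /sys/class/infiniband and that hns devices are named with an hns_ prefix (as with hns_2 in the test case). The function name list_hns_devices is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list hns RDMA devices registered with the kernel.
# Assumption: devices appear under /sys/class/infiniband with an
# "hns_" prefix, as with hns_2 in the test case.
# The directory is a parameter so the check can be tried on any machine.
list_hns_devices() {
    dir=${1:-/sys/class/infiniband}
    for dev in "$dir"/hns_*; do
        # If nothing matches, the glob stays literal and -e fails.
        [ -e "$dev" ] && basename "$dev"
    done
    return 0
}

list_hns_devices
```

On a D05, where firmware does not expose the ACPI ID, this would be expected to print nothing. On a D06 with a working kernel it would be expected to list names such as hns_2.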

dann frazier (dannf) on 2018-05-13
Changed in linux (Ubuntu):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu):
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu Bionic):
assignee: nobody → dann frazier (dannf)
dann frazier (dannf) on 2018-06-07
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
dann frazier (dannf) wrote :

= Verification =
== Server Side ==
ubuntu@d06-1:~$ cat /proc/version
Linux version 4.15.0-24-generic (buildd@bos02-arm64-008) (gcc version 7.3.0 (Ubuntu/Linaro 7.3.0-16ubuntu3)) #26-Ubuntu SMP Wed Jun 13 08:44:37 UTC 2018
ubuntu@d06-1:~$ ib_write_bw -n 5 -d hns_2

************************************
* Waiting for client to connect... *
************************************
hr_qp->port_num= 0x1
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port : OFF Device : hns_2
 Number of qps : 1 Transport type : IB
 Connection type : RC Using SRQ : OFF
 CQ Moderation : 5
 Mtu : 1024[B]
 Link type : Ethernet
 GID index : 2
 Max inline data : 0[B]
 rdma_cm QPs : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x000c PSN 0x11b76 RKey 0x000200 VAddr 0x00ffff9d943000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:228:68:53
 remote address: LID 0000 QPN 0x000c PSN 0xca0ca0 RKey 0x000200 VAddr 0x00ffff96905000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:228:68:74
---------------------------------------------------------------------------------------
 #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
 65536 5 108.06 108.06 0.001729
---------------------------------------------------------------------------------------

== Client Side ==
ubuntu@d06-2:~$ ib_write_bw -n 5 -d hns_2 10.228.68.53
hr_qp->port_num= 0x1
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port : OFF Device : hns_2
 Number of qps : 1 Transport type : IB
 Connection type : RC Using SRQ : OFF
 TX depth : 5
 CQ Moderation : 5
 Mtu : 1024[B]
 Link type : Ethernet
 GID index : 2
 Max inline data : 0[B]
 rdma_cm QPs : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x000c PSN 0xca0ca0 RKey 0x000200 VAddr 0x00ffff96905000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:228:68:74
 remote address: LID 0000 QPN 0x000c PSN 0x11b76 RKey 0x000200 VAddr 0x00ffff9d943000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:228:68:53
---------------------------------------------------------------------------------------
 #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
 65536 5 108.06 108.06 0.001729
---------------------------------------------------------------------------------------
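
When repeating this verification, the bandwidth figure can be pulled out of saved output rather than read by eye. This is a minimal sketch, assuming the results row directly follows the "#bytes" header row as in the logs above. The function name bw_average is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the "BW average" value from ib_write_bw output.
# Assumption: the results row directly follows the "#bytes" header row.
# Reads the named file(s), or standard input when none is given.
bw_average() {
    awk '/#bytes/ { getline; print $4; exit }' "$@"
}

# Demo using the header and results rows from the run above.
printf '%s\n' \
    ' #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]' \
    ' 65536 5 108.06 108.06 0.001729' | bw_average
```

The fourth field of the results row is the average bandwidth, since the header labels "BW peak[MB/sec]" and "BW average[MB/sec]" each contain a space but correspond to one value apiece.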

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-24.26

---------------
linux (4.15.0-24.26) bionic; urgency=medium

  * linux: 4.15.0-24.26 -proposed tracker (LP: #1776338)

  * Bionic update: upstream stable patchset 2018-06-06 (LP: #1775483)
    - drm: bridge: dw-hdmi: Fix overflow workaround for Amlogic Meson GX SoCs
    - i40e: Fix attach VF to VM issue
    - tpm: cmd_ready command can be issued only after granting locality
    - tpm: tpm-interface: fix tpm_transmit/_cmd kdoc
    - tpm: add retry logic
    - Revert "ath10k: send (re)assoc peer command when NSS changed"
    - bonding: do not set slave_dev npinfo before slave_enable_netpoll in
      bond_enslave
    - ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
    - ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pkts
    - KEYS: DNS: limit the length of option strings
    - l2tp: check sockaddr length in pppol2tp_connect()
    - net: validate attribute sizes in neigh_dump_table()
    - llc: delete timers synchronously in llc_sk_free()
    - tcp: don't read out-of-bounds opsize
    - net: af_packet: fix race in PACKET_{R|T}X_RING
    - tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets
    - net: fix deadlock while clearing neighbor proxy table
    - team: avoid adding twice the same option to the event list
    - net/smc: fix shutdown in state SMC_LISTEN
    - team: fix netconsole setup over team
    - packet: fix bitfield update race
    - tipc: add policy for TIPC_NLA_NET_ADDR
    - pppoe: check sockaddr length in pppoe_connect()
    - vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi
    - amd-xgbe: Add pre/post auto-negotiation phy hooks
    - sctp: do not check port in sctp_inet6_cmp_addr
    - amd-xgbe: Improve KR auto-negotiation and training
    - strparser: Do not call mod_delayed_work with a timeout of LONG_MAX
    - amd-xgbe: Only use the SFP supported transceiver signals
    - strparser: Fix incorrect strp->need_bytes value.
    - net: sched: ife: signal not finding metaid
    - tcp: clear tp->packets_out when purging write queue
    - net: sched: ife: handle malformed tlv length
    - net: sched: ife: check on metadata length
    - llc: hold llc_sap before release_sock()
    - llc: fix NULL pointer deref for SOCK_ZAPPED
    - net: ethernet: ti: cpsw: fix tx vlan priority mapping
    - virtio_net: split out ctrl buffer
    - virtio_net: fix adding vids on big-endian
    - KVM: s390: force bp isolation for VSIE
    - s390: correct module section names for expoline code revert
    - microblaze: Setup dependencies for ASM optimized lib functions
    - commoncap: Handle memory allocation failure.
    - scsi: mptsas: Disable WRITE SAME
    - cdrom: information leak in cdrom_ioctl_media_changed()
    - m68k/mac: Don't remap SWIM MMIO region
    - block/swim: Check drive type
    - block/swim: Don't log an error message for an invalid ioctl
    - block/swim: Remove extra put_disk() call from error path
    - block/swim: Rename macros to avoid inconsistent inverted logic
    - block/swim: Select appropriate drive on device open
    - block/swim: Fix array bounds check
    - block/swim: Fix IO error at end of medium
    -...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-29.31

---------------
linux (4.15.0-29.31) bionic; urgency=medium

  * linux: 4.15.0-29.31 -proposed tracker (LP: #1782173)

  * [SRU Bionic][Cosmic] kernel panic in ipmi_ssif at msg_done_handler
    (LP: #1777716)
    - ipmi_ssif: Fix kernel panic at msg_done_handler

  * Update to ocxl driver for 18.04.1 (LP: #1775786)
    - misc: ocxl: use put_device() instead of device_unregister()
    - powerpc: Add TIDR CPU feature for POWER9
    - powerpc: Use TIDR CPU feature to control TIDR allocation
    - powerpc: use task_pid_nr() for TID allocation
    - ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
    - ocxl: Expose the thread_id needed for wait on POWER9
    - ocxl: Add an IOCTL so userspace knows what OCXL features are available
    - ocxl: Document new OCXL IOCTLs
    - ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()

  * Critical upstream bugfix missing in Ubuntu 18.04 - frequent Xorg crash after
    suspend (LP: #1776887)
    - ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL

  * Hard LOCKUP observed on stressing Ubuntu 18 04 (LP: #1777194)
    - powerpc: use NMI IPI for smp_send_stop
    - powerpc: Fix smp_send_stop NMI IPI handling

  * IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on
    CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n (LP: #1773964)
    - rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops

  * [Regression] EXT4-fs error (device sda2): ext4_validate_block_bitmap:383:
    comm stress-ng: bg 4705: bad block bitmap checksum (LP: #1781709)
    - SAUCE: Revert "UBUNTU: SAUCE: ext4: fix ext4_validate_inode_bitmap: comm
      stress-ng: Corrupt inode bitmap"
    - SAUCE: ext4: check for allocation block validity with block group locked

linux (4.15.0-28.30) bionic; urgency=medium

  * linux: 4.15.0-28.30 -proposed tracker (LP: #1781433)

  * Cannot set MTU higher than 1500 in Xen instance (LP: #1781413)
    - xen-netfront: Fix mismatched rtnl_unlock
    - xen-netfront: Update features after registering netdev

linux (4.15.0-27.29) bionic; urgency=medium

  * linux: 4.15.0-27.29 -proposed tracker (LP: #1781062)

  * [Regression] EXT4-fs error (device sda1): ext4_validate_inode_bitmap:99:
    comm stress-ng: Corrupt inode bitmap (LP: #1780137)
    - SAUCE: ext4: fix ext4_validate_inode_bitmap: comm stress-ng: Corrupt inode
      bitmap

linux (4.15.0-26.28) bionic; urgency=medium

  * linux: 4.15.0-26.28 -proposed tracker (LP: #1780112)

  * failure to boot with linux-image-4.15.0-24-generic (LP: #1779827) // Cloud-
    init causes potentially huge boot delays with 4.15 kernels (LP: #1780062)
    - random: Make getrandom() ready earlier

linux (4.15.0-25.27) bionic; urgency=medium

  * linux: 4.15.0-25.27 -proposed tracker (LP: #1779354)

  * hisi_sas_v3_hw: internal task abort: timeout and not done. (LP: #1777736)
    - scsi: hisi_sas: Update a couple of register settings for v3 hw

  * hisi_sas: Add missing PHY spinlock init (LP: #1777734)
    - scsi: hisi_sas: Add missing PHY spinlock init

  * hisi_sas: improve read performance by pre-allocating slot DMA buffers
    (LP: #1777727)
    - scsi: hisi_sas: use dma_zalloc_cohe...

Changed in linux (Ubuntu):
status: In Progress → Fix Released