Bug #1692538 “Ubuntu 16.04.02: ibmveth: Support to enable LSO/CS...” : Zesty (17.04) : Bugs : linux package : Ubuntu

bugproxy (bugproxy) on 2017-05-22

tags:	added: architecture-ppc64le bugnameltc-154875 severity-critical targetmilestone-inin16042
Changed in ubuntu:
assignee:	nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects:	ubuntu → linux (Ubuntu)

Joseph Salisbury (jsalisbury) on 2017-05-25

tags:

added: kernel-da-key

Manoj Iyer (manjo) on 2017-06-01

tags:

added: ubuntu-16.04

Launchpad Janitor (janitor) wrote on 2017-06-02:

#1

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status:	New → Confirmed

Frank Heimes (fheimes) on 2017-06-03

Changed in ubuntu-power-systems:
status:	New → Confirmed

Manoj Iyer (manjo) on 2017-07-19

Changed in linux (Ubuntu):
importance:	Undecided → Critical
Changed in ubuntu-power-systems:
importance:	Undecided → Critical
Changed in linux (Ubuntu):
assignee:	Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)

Joseph Salisbury (jsalisbury) wrote on 2017-07-20:

#2

I built Xenial, Zesty and Artful test kernels with commit 66aa0678efc2. The test kernels can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1692538/

Can you test these kernels and see if they resolve this bug?

Thanks in advance!

Frank Heimes (fheimes) on 2017-07-21

Changed in ubuntu-power-systems:
assignee:	nobody → Canonical Kernel Team (canonical-kernel-team)

Manoj Iyer (manjo) on 2017-07-24

tags:

added: triage-g

bugproxy (bugproxy) wrote on 2017-07-25: Comment bridged from LTC Bugzilla

#3

------- Comment From <email address hidden> 2017-07-25 14:52 EDT-------
It looks like the directory had Artful and Zesty. I didn't see anything for Xenial, but Zesty/Artful works.

Joseph Salisbury (jsalisbury) wrote on 2017-07-27:

#4

I had a build failure with Xenial. There are some prereq commits required that I am working on identifying now.

Joseph Salisbury (jsalisbury) on 2017-07-30

Changed in linux (Ubuntu Zesty):
importance:	Undecided → Critical
Changed in linux (Ubuntu Xenial):
importance:	Undecided → Critical
Changed in linux (Ubuntu Zesty):
status:	New → In Progress
Changed in linux (Ubuntu Xenial):
status:	New → In Progress
Changed in linux (Ubuntu Artful):
status:	Confirmed → In Progress
assignee:	Canonical Kernel Team (canonical-kernel-team) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
assignee:	nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
assignee:	nobody → Joseph Salisbury (jsalisbury)

Seth Forshee (sforshee) on 2017-07-31

Changed in linux (Ubuntu Artful):
status:	In Progress → Fix Committed

Stefan Bader (smb) on 2017-08-11

Changed in linux (Ubuntu Zesty):
status:	In Progress → Fix Committed

Launchpad Janitor (janitor) wrote on 2017-08-16:

#5

This bug was fixed in the package linux - 4.12.0-11.12

---------------
linux (4.12.0-11.12) artful; urgency=low

  * linux: 4.12.0-11.12 -proposed tracker (LP: #1709929)

  * CVE-2017-1000111
    - packet: fix tp_reserve race in packet_set_ring

  * CVE-2017-1000112
    - udp: consistently apply ufo or fragmentation

  * Please only recommend or suggest initramfs-tools | linux-initramfs-tool for
    kernels able to boot without initramfs (LP: #1700972)
    - Revert "UBUNTU: [Debian] Don't depend on initramfs-tools"
    - [Debian] Don't depend on initramfs-tools

  * Miscellaneous Ubuntu changes
    - SAUCE: (noup) Update spl to 0.6.5.11-ubuntu1, zfs to 0.6.5.11-1ubuntu3
    - SAUCE: powerpc: Always initialize input array when calling epapr_hypercall()

  * Miscellaneous upstream changes
    - selftests: typo correction for memory-hotplug test
    - selftests: check hot-pluggagble memory for memory-hotplug test
    - selftests: check percentage range for memory-hotplug test
    - selftests: add missing test name in memory-hotplug test
    - selftests: fix memory-hotplug test

-- Seth Forshee <email address hidden> Thu, 10 Aug 2017 13:37:00 -0500

Changed in linux (Ubuntu Artful):
status:	Fix Committed → Fix Released

Kleber Sacilotto de Souza (kleber-souza) wrote on 2017-08-16:

#6

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags:

added: verification-needed-zesty

bugproxy (bugproxy) wrote on 2017-08-16:

#7

------- Comment From <email address hidden> 2017-08-16 16:14 EDT-------
I added the tag: verification-done-zesty

tags:

added: verification-done-zesty
removed: verification-needed-zesty

Joseph Salisbury (jsalisbury) wrote on 2017-08-23:

#8

There is now a Xenial test kernel, which has a backport of commit 66aa0678efc2. The test kernels can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1692538/xenial

Can you test these kernels and see if they resolve this bug?

Thanks in advance!

bugproxy (bugproxy) wrote on 2017-08-24:

#9

------- Comment From <email address hidden> 2017-08-24 16:07 EDT-------
Works for us.

tags:

added: verification-done-xenial

Joseph Salisbury (jsalisbury) on 2017-08-25

description:

updated

Launchpad Janitor (janitor) wrote on 2017-08-28:

#10

This bug was fixed in the package linux - 4.10.0-33.37

---------------
linux (4.10.0-33.37) zesty; urgency=low

* linux: 4.10.0-33.37 -proposed tracker (LP: #1709303)

* CVE-2017-1000112
    - Revert "udp: consistently apply ufo or fragmentation"
    - udp: consistently apply ufo or fragmentation

* CVE-2017-1000111
    - Revert "net-packet: fix race in packet_set_ring on PACKET_RESERVE"
    - packet: fix tp_reserve race in packet_set_ring

* ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on
    (LP: #1673564)
    - irqchip/gic-v3: Add missing system register definitions
    - arm64: KVM: Do not use stack-protector to compile EL2 code
    - KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2
      registers
    - KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
    - arm64: Add a facility to turn an ESR syndrome into a sysreg encoding
    - KVM: arm/arm64: vgic-v3: Add accessors for the ICH_APxRn_EL2 registers
    - KVM: arm64: Make kvm_condition_valid32() accessible from EL2
    - KVM: arm64: vgic-v3: Add hook to handle guest GICv3 sysreg accesses at EL2
    - KVM: arm64: vgic-v3: Add ICV_BPR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGRPEN1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IAR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_EOIR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_AP1Rn_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_HPPIR1_EL1 handler
    - KVM: arm64: vgic-v3: Enable trapping of Group-1 system registers
    - KVM: arm64: Enable GICv3 Group-1 sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Add ICV_BPR0_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGNREN0_EL1 handler
    - KVM: arm64: vgic-v3: Add misc Group-0 handlers
    - KVM: arm64: vgic-v3: Enable trapping of Group-0 system registers
    - KVM: arm64: Enable GICv3 Group-0 sysreg trapping via command-line
    - arm64: Add MIDR values for Cavium cn83XX SoCs
    - [Config] CONFIG_CAVIUM_ERRATUM_30115=y
    - arm64: Add workaround for Cavium Thunder erratum 30115
    - KVM: arm64: vgic-v3: Add ICV_DIR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_RPR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_CTLR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_PMR_EL1 handler
    - KVM: arm64: Enable GICv3 common sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Log which GICv3 system registers are trapped
    - arm64: KVM: Make unexpected reads from WO registers inject an undef
    - KVM: arm64: Log an error if trapping a read-from-write-only GICv3 access
    - KVM: arm64: Log an error if trapping a write-to-read-only GICv3 access

* ibmvscsis: Do not send aborted task response (LP: #1689365)
    - target: Fix unknown fabric callback queue-full errors
    - ibmvscsis: Do not send aborted task response
    - ibmvscsis: Clear left-over abort_cmd pointers
    - ibmvscsis: Fix the incorrect req_lim_delta

* hisi_sas performance improvements (LP: #1708734)
    - scsi: hisi_sas: define hisi_sas_device.device_id as int
    - scsi: hisi_sas: optimise the usage of hisi_hba.lock
    - scsi: hisi_sas: relocate sata_done_v2_hw()
    - scsi: hisi_sas: optimise DMA slot memory

* hisi_sas driver reports mistakes timed out task for internal abort
    (LP: #1708730)
    - scsi: hisi_sas: fix timeout check in hisi_sas_internal_task_abort()

* scsi: hisi_sas: add null check before indirect pointer dereference
    (LP: #1708714)
    - scsi: hisi_sas: add null check before indirect pointer dereference

* [LTCTest][Opal][FW860.20] HMI recoverable errors failed to recover and
    system goes to dump state. (LP: #1684054)
    - powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y

* Set CONFIG_SATA_HIGHBANK=y on armhf (LP: #1703430)
    - [Config] CONFIG_SATA_HIGHBANK=y

* Adt tests of src:linux time out often on armhf lxc containers (LP: #1705495)
    - [Packaging] tests -- reduce rebuild test to one flavour

* support Hip07/08 I2C controller (LP: #1708293)
    - ACPI / APD: Add clock frequency for Hisilicon Hip07/08 I2C controller
    - i2c: designware: Add ACPI HID for Hisilicon Hip07/08 I2C controller

* Mute key LED does not work on HP ProBook 440 (LP: #1705586)
    - ALSA: hda - Add HP ZBook 15u G3 Conexant CX20724 GPIO mute leds
    - ALSA: hda - Add mute led support for HP ProBook 440 G4

* Hisilicon D05 onboard fibre NIC link indicator LEDs don't work
    (LP: #1704903)
    - net: hns: add acpi function of xge led control

* zesty unable to handle kernel NULL pointer dereference (LP: #1680904)
    - drm/i915: Do not drop pagetables when empty

* hns: use after free in hns_nic_net_xmit_hw (LP: #1704885)
    - net: hns: Fix a skb used after free bug

* [ARM64] config EDAC_GHES=y depends on EDAC_MM_EDAC=y (LP: #1706141)
    - [Config] set EDAC_MM_EDAC=y for ARM64

* [Hyper-V] hv_netvsc: Exclude non-TCP port numbers from vRSS hashing
    (LP: #1690174)
    - hv_netvsc: Exclude non-TCP port numbers from vRSS hashing

* ath10k doesn't report full RSSI information (LP: #1706531)
    - ath10k: add per chain RSSI reporting

* ideapad_laptop don't support v310-14isk (LP: #1705378)
    - platform/x86: ideapad-laptop: Add several models to no_hw_rfkill

* hns: ethtool selftest crashes system (LP: #1705712)
    - net/hns:bugfix of ethtool -t phy self_test

* ath9k freezes suspend resume Ubuntu 17.04 (LP: #1697027)
    - ath9k: fix an invalid pointer dereference in ath9k_rng_stop()

* xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2
    comp_code 13 (LP: #1667750)
    - xhci: Bad Ethernet performance plugged in ASM1042A host

* Migrating KSM page causes the VM lock up as the KSM page merging list is too
    large (LP: #1680513)
    - ksm: introduce ksm_max_page_sharing per page deduplication limit
    - ksm: fix use after free with merge_across_nodes = 0
    - ksm: cleanup stable_node chain collapse case
    - ksm: swap the two output parameters of chain/chain_prune
    - ksm: optimize refile of stable_node_dup at the head of the chain

* Change CONFIG_IBMVETH to module (LP: #1704479)
    - [Config] CONFIG_IBMVETH=m

* CVE-2017-7487
    - ipx: call ipxitf_put() in ioctl error path

* Hotkeys on new Thinkpad systems aren't working (LP: #1705169)
    - platform/x86: thinkpad_acpi: guard generic hotkey case
    - platform/x86: thinkpad_acpi: add mapping for new hotkeys

* misleading kernel warning skb_warn_bad_offload during checksum calculation
    (LP: #1705447)
    - net: reduce skb_warn_bad_offload() noise

* Ubuntu 16.04.02: ibmveth: Support to enable LSO/CSO for Trunk VEA
    (LP: #1692538)
    - ibmveth: Support to enable LSO/CSO for Trunk VEA.

* bonding: stack dump when unregistering a netdev (LP: #1704102)
    - bonding: avoid NETDEV_CHANGEMTU event when unregistering slave

* Ubuntu 16.04 IOB Error when the Mustang board rebooted (LP: #1693673)
    - drivers: net: xgene: Fix redundant prefetch buffer cleanup

* Ubuntu16.04: NVMe 4K+T10 DIF/DIX format returns I/O error on dd with split
    op (LP: #1689946)
    - blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split
      op

* linux >= 4.2: bonding 802.3ad does not work with 5G, 25G and 50G link speeds
    (LP: #1697892)
    - bonding: add 802.3ad support for 25G speeds
    - bonding: fix 802.3ad support for 5G and 50G speeds

* [SRU][Zesty] arm64: Add support for handling memory corruption
    (LP: #1696852)
    - arm64: mm: Update perf accounting to handle poison faults
    - arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
    - arm64: kconfig: allow support for memory failure handling
    - arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling

* [SRU][Zesty] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64
    (LP: #1696570)
    - acpi: apei: read ack upon ghes record consumption
    - ras: acpi/apei: cper: add support for generic data v3 structure
    - cper: add timestamp print to CPER status printing
    - efi: parse ARM processor error
    - arm64: exception: handle Synchronous External Abort
    - acpi: apei: handle SEA notification type for ARMv8
    - acpi: apei: panic OS with fatal error status block
    - efi: print unrecognized CPER section
    - ras: acpi / apei: generate trace event for unrecognized CPER section
    - trace, ras: add ARM processor error trace event
    - ras: mark stub functions as 'inline'
    - arm/arm64: KVM: add guest SEA support
    - acpi: apei: check for pending errors when probing GHES entries
    - [Config] CONFIG_ACPI_APEI_SEA=y

-- Stefan Bader <stefan.bader@canonical.com>  Fri, 11 Aug 2017 11:40:30 +0200

Changed in linux (Ubuntu Zesty):
status:	Fix Committed → Fix Released

bugproxy (bugproxy) on 2017-09-12

tags:

added: targetmilestone-inin16043
removed: targetmilestone-inin16042

Manoj Iyer (manjo) on 2017-09-18

Changed in ubuntu-power-systems:
status:	Confirmed → In Progress
status:	In Progress → Fix Committed

Andrew Cloke (andrew-cloke) on 2017-11-06

Changed in ubuntu-power-systems:
status:	Fix Committed → In Progress

Manoj Iyer (manjo) wrote on 2017-11-06:

#11

Needs testing for Xenial.

bugproxy (bugproxy) wrote on 2017-11-13:

#12

------- Comment From <email address hidden> 2017-11-13 09:53 EDT-------
Tested on Xenial, looks good.

Joseph Salisbury (jsalisbury) wrote on 2017-11-16:

#13

Our support team has encountered a case where ibmveth + openvswitch + bnx2x has
led to some issues, which IBM should probably be aware of before
turning on large segments in more places.

Here's a summary from support for that issue:

==========

[Issue: we see a firmware assertion from an IBM branded bnx2x card.
Decoding the dump with the help of upstream shows that the assert is
caused by a packet with GSO on and gso_size > ~9700 bytes being passed
to the card. We traced the packets through the system, and came up
with this root cause. The system uses ibmveth to talk to AIX LPARs, a
bnx2x network card to talk to the world, and Open vSwitch to tie them
together. There is no VIOS involvement - the card is attached to the
Linux partition.]

The packets causing the issue come through the ibmveth interface -
from the AIX LPAR. The veth protocol is 'special' - communication
between LPARs on the same chassis can use very large (64k) frames to
reduce overhead. Normal networks cannot handle such large packets, so
traditionally, the VIOS partition would signal to the AIX partitions
that it was 'special', and AIX would send regular, ethernet-sized
packets to VIOS, which VIOS would then send out.

This signalling between VIOS and AIX is done in a way that is not
standards-compliant, and so was never made part of Linux. Instead, the
Linux driver has always understood large frames and passed them up the
network stack.

In some cases (e.g. with TCP), multiple TCP segments are coalesced
into one large packet. In Linux, this goes through the generic receive
offload code, using a similar mechanism to GSO. These segments can be
very large which presents as a very large MSS (maximum segment size)
or gso_size.
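
To make the relationship between the sender's segment size and gso_size concrete, here is a deliberately simplified Python model of coalescing. It is not the kernel's generic receive offload code: the `coalesce` helper is hypothetical, and it assumes gso_size is simply the length of the first segment in the run.

```python
from typing import List, Tuple


def coalesce(segments: List[bytes]) -> Tuple[bytes, int]:
    """Merge in-order TCP payloads into one large packet.

    Returns (merged payload, gso_size). In this simplified model,
    gso_size is the length of the first segment, i.e. the size of the
    segments the sender chose to transmit.
    """
    if not segments:
        raise ValueError("need at least one segment")
    gso_size = len(segments[0])
    return b"".join(segments), gso_size


# Ethernet-sized segments yield a small gso_size...
_, small = coalesce([b"a" * 1448] * 4)
# ...while the very large frames allowed between LPARs yield a large one.
_, large = coalesce([b"b" * 32000] * 2)
```

In this model the merged packet inherits whatever segment size the sender used, which is why large frames from the AIX LPAR show up as a large gso_size.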

Normally, the large packet is simply passed to whatever network
application on Linux is going to consume it, and everything is OK.

However, in this case, the packets go through Open vSwitch, and are
then passed to the bnx2x driver. The bnx2x driver/hardware supports
TSO and GSO, but with a restriction: the maximum segment size is
limited to around 9700 bytes. Normally this is more than adequate as
jumbo frames are limited to 9000 bytes. However, if a large packet
with large (>9700 byte) TCP segments arrives through ibmveth, and is
passed to bnx2x, the hardware will panic.
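
As a rough illustration of the restriction described above, the following Python sketch models the kind of check that could be made before handing a packet to hardware for segmentation. The 9700-byte constant is only the approximate figure quoted in this comment, and `Packet` and `device_can_segment` are hypothetical names, not part of bnx2x or the kernel.

```python
from dataclasses import dataclass

# Assumption: the comment only says "around 9700 bytes"; the exact
# hardware limit is not given, so this value is illustrative.
ASSUMED_MAX_GSO_SIZE = 9700


@dataclass
class Packet:
    payload_len: int  # total TCP payload bytes in the coalesced packet
    gso_size: int     # per-segment size requested; 0 means "not a GSO packet"


def device_can_segment(pkt: Packet,
                       max_gso_size: int = ASSUMED_MAX_GSO_SIZE) -> bool:
    """Return True if the packet's segment size is within the device limit."""
    if pkt.gso_size == 0:
        # Not a GSO packet, so the segment-size limit does not apply.
        return True
    return pkt.gso_size <= max_gso_size


ethernet_sized = Packet(payload_len=64000, gso_size=1448)
veth_sized = Packet(payload_len=64000, gso_size=32000)
```

Here `ethernet_sized` passes the check and `veth_sized` does not, which mirrors the failing case traced in the dump.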

Turning off TSO prevents the crash as the kernel resegments the data
and assembles the packets in software. This has a performance cost.
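
The software path can be pictured with a minimal Python sketch of resegmentation. The `resegment` helper is hypothetical and ignores headers and checksums; the 1448-byte MSS is an assumed value for a 1500-byte MTU with TCP timestamps, not a figure from this report.

```python
from typing import List


def resegment(payload: bytes, mss: int) -> List[bytes]:
    """Split one large payload into segments of at most `mss` bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


# A 64000-byte payload split for an assumed 1448-byte MSS.
segments = resegment(b"x" * 64000, 1448)
```

Every segment has to be cut and sent individually by the CPU instead of the card, which is where the performance cost comes from.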

Clearly at the very least, bnx2x should not crash in this case, and I
am working towards a patch for that.

However, this still leaves us with some issues. The only thing the
bnx2x driver can sensibly do is drop the packet, which will prevent
the crash. However, there will still be issues with large packets:
when they are dropped, the other side will eventually realise that the
data is missing and ask for a retransmit, but the retransmit might
also be too big - there's no way of signalling back to the AIX LPAR
that it should reduce the MSS. Even if the data eventually gets
through there will be a latency/throughput/performance hit.

==========

Seeing as IBM seems to be in active development in this area - indeed
this code explicitly deals with ibm...