Bug #1723127 “Intel i40e PF reset due to incorrect MDD detection...” : Bugs : linux package : Ubuntu

Dan Streetman (ddstreet) on 2017-10-12

Changed in linux (Ubuntu):
status:	New → In Progress
importance:	Undecided → Medium
assignee:	nobody → Dan Streetman (ddstreet)

Revision history for this message

Dan Streetman (ddstreet) wrote on 2017-10-12:

#1

continuing conversation from previous (fix released) bug.

@bjozet, it would help a lot if you could test with the hwe 4.10 kernel and let me know whether it fails too, or seems to be fixed there. If it works, I can review the changes and possibly find something, and/or work with you on a bisect.

Revision history for this message

Björn Zettergren (bjozet) wrote on 2017-10-12:

#2

We've been using hwe-edge 4.11 for almost 24 hours without problems. We'll test the regular hwe 4.10 also if you think that narrows the bisect.

Revision history for this message

Dan Streetman (ddstreet) wrote on 2017-10-12:

#3

> We'll test the regular hwe 4.10 also if you think that narrows the bisect.

Yes please, it will help to look only between 4.4 and 4.10. Thanks!

Changed in linux (Ubuntu Xenial):
importance:	Undecided → Medium
status:	New → In Progress
assignee:	nobody → Dan Streetman (ddstreet)

Revision history for this message

Björn Zettergren (bjozet) wrote on 2017-10-13:

#4

As of now, we've been running HWE 4.10 for a little more than 16 hours with no problems so far. Previously we'd hit the problem within the hour.

There is, however, one new log message that we haven't seen before, with neither the 1.4.x nor the 2.0.x driver. It might be unrelated; we can't see any particular performance issues in any of our monitoring/graphs. The message is:

TCP: bond0.5: Driver has suspect GRO implementation, TCP performance may be compromised.

How do we proceed? :-)

Revision history for this message

Dan Streetman (ddstreet) wrote on 2017-10-13:

#5

> How do we proceed? :-)

one bug at a time, please. Since this NIC's MDD (Malicious Driver Detection) event doesn't indicate what it disliked, I can't tell whether that message is related to the MDD events, but I suspect not, especially if you haven't seen it on the kernels where you did get MDD events.

Since the Ubuntu 4.4.0 kernel isn't an ancestor of the Ubuntu 4.10.0 kernel, a bisect would need to start at their merge base anyway (the mainline 4.4 kernel); and since there are no changes to the i40e driver between mainline 4.10 and Ubuntu 4.10.0, the bisect will be a lot easier if we switch over to the mainline kernel series.

Are you able to test various kernel versions during the bisect process? It may take a while, and at each step it's important to determine for certain whether the kernel is 'good' or 'bad'; an incorrect evaluation at any step leads to an incorrect endpoint.

If you are able to help with a kernel bisect by testing, can you test each of these kernels:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-wily/

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/

I expect v4.4 to be 'bad' (encounter the MDD event) and v4.10 to be 'good' (no MDD event), based on your evaluation of the Ubuntu kernels built on those versions. If they turn out bad/good as expected, we can start the bisection between them.
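Because the older kernel is expected to be 'bad' and the newer one 'good', this bisect searches for the commit that fixed the problem, which inverts git's usual good/bad meaning. A minimal sketch, assuming a mainline kernel git checkout with the v4.4 and v4.10 tags and git 2.7 or later (for custom bisect terms); the helper name start_fix_bisect is hypothetical:

```shell
# Hypothetical helper: start a bisect that searches for the commit that FIXED
# a bug, run from inside a kernel git checkout.
#   $1 = a revision known to be broken (e.g. v4.4)
#   $2 = a revision known to be fixed  (e.g. v4.10)
start_fix_bisect() {
  local broken_rev=$1 fixed_rev=$2
  # With custom terms, "old" describes older commits (broken) and "new"
  # describes newer ones (fixed). Argument order: the new rev, then old rev(s).
  git bisect start --term-old=broken --term-new=fixed "$fixed_rev" "$broken_rev"
}
```

After building, booting and testing each kernel that git checks out, mark it with git bisect broken or git bisect fixed; git bisect reset ends the session. Explicit terms make it harder to mark a step the wrong way round, which would send the bisect to the wrong endpoint.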

Revision history for this message

Björn Zettergren (bjozet) wrote on 2017-10-18:

#6

redacted_i40e_syslog.txt (6.6 KiB, text/plain)

> one bug at a time, please.

Absolutely! I only mentioned the "GRO implementation" message because I wondered whether it might be related. I should have searched for it beforehand; that would have told me it wasn't.

I've tested the v4.4-wily kernel from the first link (4.4.0-040400-generic), and it failed miserably right after the machine came online. I'm attaching a redacted syslog with the relevant messages. You'll notice that the i40e driver (1.3.x) complains that the firmware is too new, which might be a problem(?), but there's also this message just before the "TX driver issue detected" one:

i40e 0000:02:00.1: FD filter programming failed due to incorrect filter parameters

See the attached file for more details.

We're currently running the second kernel, v4.10 (4.10.0-041000-generic), and it's running fine so far, but the machine has only been up for 30 minutes. I'll let it run for 24 hours and report back tomorrow, or as soon as the status changes, if at all.

Revision history for this message

Björn Zettergren (bjozet) wrote on 2017-10-19:

#7

Kernel v4.10 (4.10.0-041000-generic) has been running fine, without any issues, for 24 hours. I'd say it's OK, as you suspected.

Revision history for this message

Dan Streetman (ddstreet) wrote on 2017-10-27:

#8

Sorry for the delay.

So we have 2 options on how to continue debugging here:

1. We can try a traditional git bisect. This involves testing various kernel builds to eventually narrow the issue down to a specific commit that fixed it. It's a long-ish process, depending on how long testing each build takes, and it's critical that the 'good' or 'bad' verdict at each step is correct; otherwise the bisect ends at the wrong commit. At each step I build a new kernel, and you test with it until it fails or until you've tested long enough to be sure that build is 'good'. With hard-to-reproduce problems like this, bisecting can be tough: if a build doesn't fail for a long time, that doesn't necessarily mean it's "good"; it may just not have failed yet, in which case the bisect ends at the wrong commit, which doesn't help with figuring out how to fix anything.

2. Intel has provided me some undocumented commands that will allow controlling what MDD events the nic triggers on. I can provide those instructions, and you can test with each MDD event bit set individually, until the problem reproduces - then we know exactly which MDD source triggered the event, which should help identify what the driver did to cause the MDD event. This way has a much better chance of finding the specific problem, but the downside is you'll need to run undocumented commands with your hardware. I believe there should not be any risk in doing that since the info came from Intel, but I can't personally verify it, as I don't currently have access to this specific NIC.

If you're willing to try #2, I'll add the specific commands/instructions and you can get started testing. Otherwise if you would prefer not to run the undocumented commands, I can start a kernel bisect.

Revision history for this message

Björn Zettergren (bjozet) wrote on 2017-11-01:

#9

No worries, we're not in a hurry.

I'd say we go with option #2. Please provide information on how to proceed, and how to undo any changes we test :)

Revision history for this message

Dan Streetman (ddstreet) wrote on 2017-11-07:

#10

> I'd say we go with option #2. Please provide information on how to proceed, and how to
> undo any changes we test :)

ok, so first: these instructions may cause the card to hang, in which case the system may need to be rebooted or the driver reloaded. The changes can be undone by resetting the card, i.e. by rebooting or reloading the driver.

Also please note these instructions are ONLY FOR i40e NICs!

The process here is to clear all the nic's hardware asserts, and then enable each of them one-by-one and try to reproduce the MDD event. That way, when it reproduces, we know exactly which hw assert triggered it.

First, find your NIC's PCI address, e.g.:

$ ethtool -i NIC | grep bus-info

Then (as root) cd to "/sys/kernel/debug/i40e/BUSID" (replace BUSID with your NIC's actual PCI address). You should see a "command" file there.
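The two lookup steps above can be combined in a small shell sketch. The function names are hypothetical, it assumes ethtool's "bus-info: ..." output line, and if /sys/kernel/debug is empty, debugfs may first need to be mounted (mount -t debugfs none /sys/kernel/debug):

```shell
# Hypothetical helpers: locate the i40e debugfs directory for a NIC.

# Print the PCI address from `ethtool -i` output read on stdin.
bus_info_from_ethtool() {
  awk -F': ' '$1 == "bus-info" { print $2; exit }'
}

# Print /sys/kernel/debug/i40e/<bus-info> for interface $1.
i40e_debugfs_dir() {
  local nic=$1 bus
  bus=$(ethtool -i "$nic" | bus_info_from_ethtool)
  if [ -z "$bus" ]; then
    echo "no bus-info found for $nic" >&2
    return 1
  fi
  printf '/sys/kernel/debug/i40e/%s\n' "$bus"
}
```

For example, as root: cd "$(i40e_debugfs_dir eth2)", where eth2 stands in for your interface name.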

Now zero out the registers:

$ echo write 0xe648c 0 > command
$ echo write 0x442f4 0 > command

Then, set a single bit; starting with 0x1 on the first register:

$ echo write 0xe648c 0x1 > command

Do normal testing. There are 3 possibilities at this step:

a) you test long enough to be sure the problem was avoided
b) your system and/or nic hangs due to an "uncaught" MDD event
c) you reproduce the problem, and see the TX error and PF reset
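For outcome (c), one way to check a test run is to count the PF reset messages per PCI function in the kernel log. A sketch, assuming the message looks like "i40e 0000:02:00.1: TX driver issue detected, PF reset issued"; count_pf_resets is a hypothetical name:

```shell
# Hypothetical helper: count i40e "TX driver issue detected, PF reset issued"
# messages per PCI function in kernel log text read on stdin.
count_pf_resets() {
  grep -oE 'i40e [0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]: TX driver issue detected, PF reset issued' |
    awk '{ dev = $2; sub(/:$/, "", dev); n[dev]++ }
         END { for (d in n) print d, n[d] }' |
    LC_ALL=C sort
}
```

For example, dmesg | count_pf_resets; no output means no PF reset was logged in that text.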

For either (a) or (b), that means this bit isn't the one we're looking for, so move to the next bit:

$ echo write 0xe648c 0 > command
$ echo write 0x442f4 0 > command
$ echo write 0xe648c 0x2 > command

Then retest. Replace "0x2" with the next bit each time you test. Note that this sets individual bits, so the sequence to test is (in hex) 1, 2, 4, 8, 10, 20, 40, 80, 100, etc. This is a 32-bit register, so the highest bit to test is 0x80000000. If you test all bits in register 0xe648c without reproducing the problem, move on to register 0x442f4 and test it bit-by-bit, again starting at 0x1. According to what I've been told by Intel, you should be able to reproduce the problem with one of the bits set in one of these two registers.
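The bit-walk can be scripted. A bash sketch, assuming cmd_file is the debugfs "command" file located earlier; bit_walk_plan and apply_step are hypothetical helpers that only wrap the echo commands given here:

```shell
# Registers in the order the instructions test them: first, then second.
REGISTERS=(0xe648c 0x442f4)

# Print every "register single-bit-value" pair, in test order (64 lines).
bit_walk_plan() {
  local reg i
  for reg in "${REGISTERS[@]}"; do
    for ((i = 0; i < 32; i++)); do
      printf '%s 0x%x\n' "$reg" $((1 << i))
    done
  done
}

# Clear both registers, then set exactly one bit in one register.
#   $1 = path to the i40e debugfs "command" file, $2 = register, $3 = bit
apply_step() {
  local cmd_file=$1 reg=$2 bit=$3 r
  for r in "${REGISTERS[@]}"; do
    echo "write $r 0" > "$cmd_file" || return 1
  done
  echo "write $reg $bit" > "$cmd_file"
}
```

For example, apply_step /sys/kernel/debug/i40e/0000:02:00.1/command 0xe648c 0x2 issues the same three writes as the commands above (the bus ID is only an example). Each step still needs the manual test period and a judgment of outcome (a), (b) or (c), so the sketch deliberately does not loop over the plan automatically.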

As you set each bit, you should get output in your dmesg and/or syslog or kern.log, indicating the current value of the registers, e.g.:

write: 0xe648c = 0x1

You can also manually read the registers at any time with:

$ echo read 0xe648c > command
$ echo read 0x442f4 > command

you should see the results in dmesg/logs, e.g.:

read: 0xe648c = 0x1

Once (or if) you do reproduce the problem, note the values of both registers (i.e. which bit was set) and report them here. I'll check with Intel to find out what problem that specific bit indicates.
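To collect the values to report, here is a sketch that extracts the most recent logged value of each register from kernel log text. It assumes lines containing "read: ADDR = VALUE" or "write: ADDR = VALUE" as shown above; since the real lines may carry a device prefix or zero-padded hex, it normalizes both numbers. The function name is hypothetical:

```shell
# Hypothetical helper: print the last logged value of each register
# ("register value", one per line) from kernel log text on stdin.
last_register_values() {
  grep -oE '(read|write): 0x[0-9a-fA-F]+ = 0x[0-9a-fA-F]+' |
    while read -r _ addr _ val; do
      # $((...)) parses the 0x-prefixed hex; printf re-emits it unpadded.
      printf '0x%x 0x%x\n' "$((addr))" "$((val))"
    done |
    awk '{ last[$1] = $2 } END { for (r in last) print r, last[r] }' |
    LC_ALL=C sort
}
```

For example, after echo read 0xe648c > command and echo read 0x442f4 > command, run dmesg | last_register_values.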

Thanks!

Revision history for this message

Björn Zettergren (bjozet) wrote on 2017-12-06:

#11

Sorry for the delay, I've not forgotten about this, just been swamped with other things. Will hopefully have time to do the tests next week.

Revision history for this message

Stefan Kooman (stefan-n1) wrote on 2018-01-24:

#12

Hi there. I can confirm this problem still exists in newest kernels and with the latest intel drivers as of today:

Jan 19 16:05:19 osd9 kernel: [511271.581413] i40e 0000:02:00.1: TX driver issue detected, PF reset issued
Jan 19 16:09:08 osd9 kernel: [511500.919380] i40e 0000:02:00.0: TX driver issue detected, PF reset issued

driver: i40e-2.4.3 (and xenial / 4.13 shipped driver: 2.1.14-k)
kernel: 4.13.0-25-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 12:16:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux. Kernel loaded with nopti noibrs noibpb (Meltdown/Spectre mitigations disabled).

We can trigger the issue with high load (benchmarking Ceph cluster with fio: 4 clients, 8 threads, iodepth 256, 100% random write, 64K block size).

Only when we use relatively large block size (64K) do we hit this problem. With 4K blocks we do not hit this issue. We haven't tested large random reads (that test is still to be done).
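A fio command along the lines of the workload described above could serve as a reproducer. This is a sketch with several assumptions not stated in the report: libaio with direct I/O against a scratch file or block device (the original may have used a different engine, e.g. fio's rbd engine), 8 jobs per client to stand in for "8 threads", a 10G default size and an arbitrary 600-second runtime. run_fio_repro and the FIO/SIZE overrides are hypothetical:

```shell
# Hypothetical reproducer modeled on the reported workload; run on each client.
#   $1 = block size (e.g. 64k or 4k), $2 = scratch file or block device
# FIO may be set to another command (e.g. echo) for a dry run.
run_fio_repro() {
  local bs=$1 target=$2
  "${FIO:-fio}" --name=mdd-repro --filename="$target" --size="${SIZE:-10G}" \
    --ioengine=libaio --direct=1 --rw=randwrite --bs="$bs" \
    --iodepth=256 --numjobs=8 --time_based --runtime=600 --group_reporting
}
```

For example, run_fio_repro 64k /mnt/ceph-test/fio.dat on each of the 4 clients (the path is a placeholder), then repeat with 4k to compare against the block size that did not trigger the reset.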

When using an openvswitch port-channel (as we do) with jumbo frames, the port-channel does not come back online after the reset. rmmod i40e / modprobe i40e does the trick, though.

Revision history for this message

Dan Streetman (ddstreet) wrote on 2018-02-21:

#13

Hello,

can anyone still experiencing this on the 4.4 kernel please test with the kernel from this PPA:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1723127
Test kernel version is 4.4.0-112.135+hf1723127v20180206b2

If anyone would like to test with the 4.13 kernel please let me know and I can build it with the recent upstream patch (248de22e638f10bd5bfc7624a357f940f66ba137) that may finally fix this.

Revision history for this message

Dan Streetman (ddstreet) wrote on 2018-03-20:

#14

As mentioned, upstream commit 248de22e638f10bd5bfc7624a357f940f66ba137 ("i40e/i40evf: Account for frags split over multiple descriptors in check linearize") appears to finally fix this. This commit is already included in bionic, but is required in artful and earlier.

In xenial, the commit 5c4654daf2e2f25dfbd7fa572c59937ea6d4198b ("i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K") is also required.
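Whether a given kernel tree already carries both fixes can be checked by commit subject, which also catches cherry-picked backports that have different SHAs. A sketch, assuming a git checkout of the kernel source; check_i40e_fixes is a hypothetical name, and because --grep searches the whole commit message it can report a false match (e.g. a revert that quotes the subject):

```shell
# Subjects of the two upstream fixes named above.
FIX_SUBJECTS=(
  "i40e/i40evf: Account for frags split over multiple descriptors in check linearize"
  "i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K"
)

# Hypothetical helper: report which fix subjects appear in the history of $1
# (default HEAD). Returns non-zero if any subject is missing.
check_i40e_fixes() {
  local rev=${1:-HEAD} subject missing=0
  for subject in "${FIX_SUBJECTS[@]}"; do
    if [ -n "$(git log -1 --format=%H -F --grep="$subject" "$rev")" ]; then
      echo "present: $subject"
    else
      echo "missing: $subject"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_i40e_fixes HEAD in a xenial kernel tree; per the comment above, xenial needs both, while artful needs the first.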

Changed in linux (Ubuntu Artful):
assignee:	nobody → Dan Streetman (ddstreet)
importance:	Undecided → Medium
status:	New → Incomplete
status:	Incomplete → In Progress
Changed in linux (Ubuntu Bionic):
status:	In Progress → Fix Released
Changed in linux (Ubuntu Trusty):
status:	New → Won't Fix

Dan Streetman (ddstreet) on 2018-03-21

description:

updated

Kleber Sacilotto de Souza (kleber-souza) on 2018-04-03

Changed in linux (Ubuntu Xenial):
status:	In Progress → Fix Committed
Changed in linux (Ubuntu Artful):
status:	In Progress → Fix Committed

Revision history for this message

Brad Figg (brad-figg) wrote on 2018-04-10:

#15

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags:	added: verification-needed-xenial
tags:	added: verification-needed-artful

Revision history for this message

Brad Figg (brad-figg) wrote on 2018-04-10:

#16

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message

Dan Streetman (ddstreet) wrote on 2018-04-10:

#17

Due to the nature of this bug, being very difficult to reproduce, real verification could take weeks instead of only days. However, one reporter has been running with a test kernel I built here
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1723127

which is the base 4.4.0-112 kernel plus the two patches from this bug. In their testing, running for 6 weeks now, the problem has not reproduced and they have seen no other issues. Of course, that test kernel doesn't have all the other patches that the -proposed kernel has, but that testing is likely the best verification we can get for this particular bug. I have also asked the same reporter to switch their testing from my test kernel over to the -proposed kernel, and to report any unexpected issues they see. If they do report any regression, I'll communicate that here.

Based on that justification, I'll mark this bug as verified.

tags:

added: verification-done-artful verification-done-xenial
removed: verification-needed-artful verification-needed-xenial

Revision history for this message

Launchpad Janitor (janitor) wrote on 2018-04-23:

#18

This bug was fixed in the package linux - 4.4.0-121.145

---------------
linux (4.4.0-121.145) xenial; urgency=medium

* linux: 4.4.0-121.145 -proposed tracker (LP: #1763687)

* Ubuntu-4.4.0-120.144 fails to boot on arm64* hardware (LP: #1763644)
    - [Config] arm64: disable BPF_JIT_ALWAYS_ON

linux (4.4.0-120.144) xenial; urgency=medium

* linux: 4.4.0-120.144 -proposed tracker (LP: #1761438)

* intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-
    image-4.13.0-37-generic) (LP: #1759920) // CVE-2017-5715 (Spectre v2 Intel)
    - Revert "x86/mm: Only set IBPB when the new thread cannot ptrace current
      thread"
    - x86/speculation: Use Indirect Branch Prediction Barrier in context switch

* DKMS driver builds fail with: Cannot use CONFIG_STACK_VALIDATION=y, please
    install libelf-dev, libelf-devel or elfutils-libelf-devel (LP: #1760876)
    - [Packaging] include the retpoline extractor in the headers

* retpoline hints: primary infrastructure and initial hints (LP: #1758856)
    - [Packaging] retpoline-extract: flag *0xNNN(%reg) branches
    - x86/speculation, objtool: Annotate indirect calls/jumps for objtool
    - x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32bit
    - x86/paravirt, objtool: Annotate indirect calls
    - x86/asm: Stop depending on ptrace.h in alternative.h
    - [Packaging] retpoline -- add safe usage hint support
    - [Packaging] retpoline-check -- only report additions
    - [Packaging] retpoline -- widen indirect call/jmp detection
    - [Packaging] retpoline -- elide %rip relative indirections
    - [Packaging] retpoline -- clear hint information from packages
    - SAUCE: modpost: add discard to non-allocatable whitelist
    - KVM: x86: Make indirect calls in emulator speculation safe
    - KVM: VMX: Make indirect call speculation safe
    - x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
    - SAUCE: early/late -- annotate indirect calls in early/late initialisation
      code
    - SAUCE: vga_set_mode -- avoid jump tables
    - [Config] retpoline -- switch to new format
    - [Packaging] final-checks -- remove check for empty retpoline files

* Xenial update to 4.4.117 stable release (LP: #1756860)
    - IB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH ports
    - PM / devfreq: Propagate error from devfreq_add_device()
    - s390: fix handling of -1 in set{,fs}[gu]id16 syscalls
    - ARM: dts: STi: Add gpio polarity for "hdmi,hpd-gpio" property
    - arm: spear600: Add missing interrupt-parent of rtc
    - arm: spear13xx: Fix dmas cells
    - arm: spear13xx: Fix spics gpio controller's warning
    - ALSA: seq: Fix regression by incorrect ioctl_mutex usages
    - KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(),
      by always inlining iterator helper methods
    - x86/cpu: Change type of x86_cache_size variable to unsigned int
    - drm/radeon: adjust tested variable
    - rtc-opal: Fix handling of firmware error codes, prevent busy loops
    - ext4: save error to disk in __ext4_grp_locked_error()
    - ext4: correct documentation for grpid mount option
    - mm: hide a #warning for COMPILE_TEST
    - video: fbdev: atmel_lcdfb: fix display-timings lookup
    - console/dummy: leave .con_font_get set to NULL
    - rtlwifi: rtl8821ae: Fix connection lost problem correctly
    - Btrfs: fix deadlock in run_delalloc_nocow
    - Btrfs: fix crash due to not cleaning up tree log block's dirty bits
    - Btrfs: fix unexpected -EEXIST when creating new inode
    - ALSA: hda - Fix headset mic detection problem for two Dell machines
    - ALSA: usb-audio: Fix UAC2 get_ctl request with a RANGE attribute
    - ALSA: hda/realtek: PCI quirk for Fujitsu U7x7
    - ALSA: usb-audio: add implicit fb quirk for Behringer UFX1204
    - ALSA: seq: Fix racy pool initializations
    - mvpp2: fix multicast address filter
    - dm: correctly handle chained bios in dec_pending()
    - x86: fix build warnign with 32-bit PAE
    - vfs: don't do RCU lookup of empty pathnames
    - ARM: pxa/tosa-bt: add MODULE_LICENSE tag
    - ARM: dts: s5pv210: add interrupt-parent for ohci
    - media: r820t: fix r820t_write_reg for KASAN
    - Linux 4.4.117

* zfs system process hung on container stop/delete (LP: #1754584)
    - SAUCE: (noup) zfs to 0.6.5.6-0ubuntu19
    - SAUCE: Fix non-prefaulted page deadlock (LP: #1754584)

* apparmor: fix bad __initdata tagging on, apparmor_initialized (LP: #1758471)
    - SAUCE: apparmor: fix bad __initdata tagging on, apparmor_initialized

* Xenial update to 4.4.116 stable release (LP: #1756121)
    - powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
    - powerpc/64: Fix flush_(d|i)cache_range() called from modules
    - powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
    - powerpc: Simplify module TOC handling
    - ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
    - usbip: vhci_hcd: clear just the USB_PORT_STAT_POWER bit
    - usbip: fix 3eee23c3ec14 tcp_socket address still in the status file
    - net: cdc_ncm: initialize drvflags before usage
    - ASoC: simple-card: Fix misleading error message
    - ASoC: rsnd: don't call free_irq() on Parent SSI
    - ASoC: rsnd: avoid duplicate free_irq()
    - drm: rcar-du: Use the VBK interrupt for vblank events
    - drm: rcar-du: Fix race condition when disabling planes at CRTC stop
    - x86/asm: Fix inline asm call constraints for GCC 4.4
    - ip6mr: fix stale iterator
    - net: igmp: add a missing rcu locking section
    - qlcnic: fix deadlock bug
    - r8169: fix RTL8168EP take too long to complete driver initialization.
    - tcp: release sk_frag.page in tcp_disconnect
    - vhost_net: stop device during reset owner
    - media: soc_camera: soc_scale_crop: add missing
      MODULE_DESCRIPTION/AUTHOR/LICENSE
    - KEYS: encrypted: fix buffer overread in valid_master_desc()
    - don't put symlink bodies in pagecache into highmem
    - crypto: tcrypt - fix S/G table for test_aead_speed()
    - x86/microcode: Do the family check first
    - powerpc/pseries: include linux/types.h in asm/hvcall.h
    - cifs: Fix missing put_xid in cifs_file_strict_mmap
    - cifs: Fix autonegotiate security settings mismatch
    - CIFS: zero sensitive data when freeing
    - dmaengine: dmatest: fix container_of member in dmatest_callback
    - x86/kaiser: fix build error with KASAN && !FUNCTION_GRAPH_TRACER
    - kaiser: fix compile error without vsyscall
    - netfilter: nf_queue: Make the queue_handler pernet
    - posix-timer: Properly check sigevent->sigev_notify
    - usb: gadget: uvc: Missing files for configfs interface
    - sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()
    - sched/rt: Up the root domain ref count when passing it around via IPIs
    - media: dvb-usb-v2: lmedm04: Improve logic checking of warm start
    - media: dvb-usb-v2: lmedm04: move ts2020 attach to dm04_lme2510_tuner
    - mtd: cfi: convert inline functions to macros
    - mtd: nand: brcmnand: Disable prefetch by default
    - mtd: nand: Fix nand_do_read_oob() return value
    - mtd: nand: sunxi: Fix ECC strength choice
    - ubi: block: Fix locking for idr_alloc/idr_remove
    - nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds
    - NFS: Add a cond_resched() to nfs_commit_release_pages()
    - NFS: commit direct writes even if they fail partially
    - NFS: reject request for id_legacy key without auxdata
    - kernfs: fix regression in kernfs_fop_write caused by wrong type
    - ahci: Annotate PCI ids for mobile Intel chipsets as such
    - ahci: Add PCI ids for Intel Bay Trail, Cherry Trail and Apollo Lake AHCI
    - ahci: Add Intel Cannon Lake PCH-H PCI ID
    - crypto: hash - introduce crypto_hash_alg_has_setkey()
    - crypto: cryptd - pass through absence of ->setkey()
    - crypto: poly1305 - remove ->setkey() method
    - nsfs: mark dentry with DCACHE_RCUACCESS
    - media: v4l2-ioctl.c: don't copy back the result for -ENOTTY
    - vb2: V4L2_BUF_FLAG_DONE is set after DQBUF
    - media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF
    - media: v4l2-compat-ioctl32.c: fix the indentation
    - media: v4l2-compat-ioctl32.c: move 'helper' functions to
      __get/put_v4l2_format32
    - media: v4l2-compat-ioctl32.c: avoid sizeof(type)
    - media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32
    - media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer
    - media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs
    - media: v4l2-compat-ioctl32: Copy v4l2_window->global_alpha
    - media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32
    - media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type
    - media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors
    - media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic
    - crypto: caam - fix endless loop when DECO acquire fails
    - arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
    - KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
    - watchdog: imx2_wdt: restore previous timeout after suspend+resume
    - media: ts2020: avoid integer overflows on 32 bit machines
    - media: cxusb, dib0700: ignore XC2028_I2C_FLUSH
    - kernel/async.c: revert "async: simplify lowest_in_progress()"
    - HID: quirks: Fix keyboard + touchpad on Toshiba Click Mini not working
    - Bluetooth: btsdio: Do not bind to non-removable BCM43341
    - Bluetooth: btusb: Restore QCA Rome suspend/resume fix with a "rewritten"
      version
    - signal/openrisc: Fix do_unaligned_access to send the proper signal
    - signal/sh: Ensure si_signo is initialized in do_divide_error
    - alpha: fix crash if pthread_create races with signal delivery
    - alpha: fix reboot on Avanti platform
    - xtensa: fix futex_atomic_cmpxchg_inatomic
    - EDAC, octeon: Fix an uninitialized variable warning
    - pktcdvd: Fix pkt_setup_dev() error path
    - btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker
    - ACPI: sbshc: remove raw pointer from printk() message
    - ovl: fix failure to fsync lower dir
    - mn10300/misalignment: Use SIGSEGV SEGV_MAPERR to report a failed user copy
    - ftrace: Remove incorrect setting of glob search field
    - Linux 4.4.116

* Xenial update to 4.4.116 stable release (LP: #1756121) // CVE-2017-5754
    - Revert "UBUNTU: SAUCE: UBUNTU: [Config] Disable CONFIG_PPC_DEBUG_RFI"
    - Revert "UBUNTU: SAUCE: rfi-flush: Fix some RFI conversions in the KVM code"
    - Revert "UBUNTU: SAUCE: rfi-flush: Fix the 32-bit KVM build"
    - Revert "UBUNTU: SAUCE: rfi-flush: Fallback flush add load dependency"
    - Revert "UBUNTU: SAUCE: rfi-flush: Use rfi-flush in printks"
    - Revert "UBUNTU: SAUCE: rfi-flush: Add no_rfi_flush and nopti comandline
      options"
    - Revert "UBUNTU: SAUCE: rfi-flush: Refactor the macros so the nops are
      defined once"
    - Revert "UBUNTU: SAUCE: rfi-flush: Fix HRFI_TO_UNKNOWN"
    - Revert "UBUNTU: SAUCE: rfi-flush: Fix the fallback flush to actually
      activate"
    - Revert "UBUNTU: SAUCE: rfi-flush: Rework pseries logic to be more cautious"
    - Revert "UBUNTU: SAUCE: rfi-flush: Rework powernv logic to be more cautious"
    - Revert "UBUNTU: SAUCE: rfi-flush: Add barriers to the fallback L1D flushing"
    - Revert "UBUNTU: SAUCE: Fix compilation errors for arch/powerpc/lib/feature-
      fixups.c"
    - Revert "UBUNTU: SAUCE: Remove setup.h include file otherwise compilation
      complains about missing header file."
    - Revert "UBUNTU: SAUCE: powerpc/asm: Allow including ppc_asm.h in asm files"
    - Revert "UBUNTU: SAUCE: rfi-flush: Add speculation barrier before ori 30,30,0
      flush"
    - Revert "UBUNTU: SAUCE: rfi-flush: Allow HV to advertise multiple flush
      types"
    - Revert "UBUNTU: SAUCE: rfi-flush: Support more than one flush type at once"
    - Revert "UBUNTU: SAUCE: rfi-flush: Expand the RFI section to two nop slots"
    - Revert "UBUNTU: SAUCE: rfi-flush: Push the instruction selection down to the
      patching routine"
    - Revert "UBUNTU: SAUCE: rfi-flush: Make l1d_flush_type bit flags"
    - Revert "UBUNTU: SAUCE: rfi-flush: Implement congruence-first fallback flush"
    - Revert "UBUNTU: SAUCE: KVM: Revert the implementation of
      H_GET_CPU_CHARACTERISTICS"
    - Revert "UBUNTU: SAUCE: rfi-flush: kvmppc_skip_(H)interrupt returns to host"
    - Revert "UBUNTU: SAUCE: Fixup rfid in kvmppc_skip_Hinterrupt should be hrfid"
    - Revert "UBUNTU: SAUCE: rfi-flush: Add HRFI_TO_UNKNOWN and use it in denorm"
    - Revert "UBUNTU: SAUCE: rfi-flush: Make DEBUG_RFI a CONFIG option"
    - Revert "UBUNTU: SAUCE: powerpc: Secure memory rfi flush"
    - powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
    - powerpc/64: Add macros for annotating the destination of rfid/hrfid
    - powerpc/64s: Simple RFI macro conversions
    - powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
    - powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
    - powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
    - powerpc/64s: Add support for RFI flush of L1-D cache
    - powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
    - powerpc/pseries: Query hypervisor for RFI flush settings
    - powerpc/powernv: Check device-tree for RFI flush settings
    - powerpc/64s: Wire up cpu_show_meltdown()
    - powerpc/64s: Allow control of RFI flush via debugfs

* Intel i40e PF reset due to incorrect MDD detection (continues...)
    (LP: #1723127)
    - i40e/i40evf: Account for frags split over multiple descriptors in check
      linearize
    - i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K

* Xenial update to 4.4.115 stable release (LP: #1755509)
    - x86: bpf_jit: small optimization in emit_bpf_tail_call()
    - bpf: fix bpf_tail_call() x64 JIT
    - [Config] CONFIG_BPF_JIT_ALWAYS_ON=y
    - bpf: introduce BPF_JIT_ALWAYS_ON config
    - bpf: arsh is not supported in 32 bit alu thus reject it
    - bpf: avoid false sharing of map refcount with max_entries
    - bpf: fix divides by zero
    - bpf: fix 32-bit divide by zero
    - bpf: reject stores into ctx via st and xadd
    - x86/pti: Make unpoison of pgd for trusted boot work for real
    - kaiser: fix intel_bts perf crashes
    - ALSA: seq: Make ioctls race-free
    - crypto: aesni - handle zero length dst buffer
    - crypto: af_alg - whitelist mask and type
    - power: reset: zx-reboot: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
    - gpio: iop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
    - gpio: ath79: add missing MODULE_DESCRIPTION/LICENSE
    - mtd: nand: denali_pci: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
    - igb: Free IRQs when device is hotplugged
    - KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure
    - KVM: x86: Don't re-execute instruction when not passing CR2 value
    - KVM: X86: Fix operand/address-size during instruction decoding
    - KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
    - KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
    - KVM: x86: ioapic: Preserve read-only values in the redirection table
    - ACPI / bus: Leave modalias empty for devices which are not present
    - cpufreq: Add Loongson machine dependencies
    - bcache: check return value of register_shrinker
    - drm/amdgpu: Fix SDMA load/unload sequence on HWS disabled mode
    - drm/amdkfd: Fix SDMA ring buffer size calculation
    - drm/amdkfd: Fix SDMA oversubsription handling
    - openvswitch: fix the incorrect flow action alloc size
    - mac80211: fix the update of path metric for RANN frame
    - btrfs: fix deadlock when writing out space cache
    - KVM: VMX: Fix rflags cache during vCPU reset
    - xen-netfront: remove warning when unloading module
    - nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)
    - nfsd: Ensure we check stateid validity in the seqid operation checks
    - grace: replace BUG_ON by WARN_ONCE in exit_net hook
    - nfsd: check for use of the closed special stateid
    - lockd: fix "list_add double add" caused by legacy signal interface
    - hwmon: (pmbus) Use 64bit math for DIRECT format values
    - powerpc/ppc64el -- Remove ll_temac module from 64-bit builds
    - net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
    - quota: Check for register_shrinker() failure.
    - SUNRPC: Allow connect to return EHOSTUNREACH
    - kmemleak: add scheduling point to kmemleak_scan()
    - drm/omap: Fix error handling path in 'omap_dmm_probe()'
    - xfs: ubsan fixes
    - scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path
    - scsi: ufs: ufshcd: fix potential NULL pointer dereference in
      ufshcd_config_vreg
    - media: usbtv: add a new usbid
    - usb: gadget: don't dereference g until after it has been null checked
    - staging: rtl8188eu: Fix incorrect response to SIOCGIWESSID
    - usb: option: Add support for FS040U modem
    - USB: serial: pl2303: new device id for Chilitag
    - USB: cdc-acm: Do not log urb submission errors on disconnect
    - CDC-ACM: apply quirk for card reader
    - USB: serial: io_edgeport: fix possible sleep-in-atomic
    - usbip: prevent bind loops on devices attached to vhci_hcd
    - usbip: list: don't list devices attached to vhci_hcd
    - USB: serial: simple: add Motorola Tetra driver
    - usb: f_fs: Prevent gadget unbind if it is already unbound
    - usb: uas: unconditionally bring back host after reset
    - selinux: general protection fault in sock_has_perm
    - serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS
    - spi: imx: do not access registers while clocks disabled
    - Linux 4.4.115

* retpoline: ignore %cs:0xNNN constant indirections (LP: #1752655)
    - [Packaging] retpoline -- elide %cs:0xNNNN constants on i386

-- Kleber Sacilotto de Souza <kleber.souza@canonical.com>  Fri, 13 Apr 2018 14:42:14 +0200

Changed in linux (Ubuntu Xenial):
status:	Fix Committed → Fix Released

Launchpad Janitor (janitor) wrote on 2018-04-23:

#20

This bug was fixed in the package linux - 4.13.0-39.44

---------------
linux (4.13.0-39.44) artful; urgency=medium

  * linux: 4.13.0-39.44 -proposed tracker (LP: #1761456)

  * intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-
    image-4.13.0-37-generic) (LP: #1759920) // CVE-2017-5715 (Spectre v2
    Intel) // CVE-2017-5754
    - x86/mm: Reinitialize TLB state on hotplug and resume

  * intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-
    image-4.13.0-37-generic) (LP: #1759920) // CVE-2017-5715 (Spectre v2 Intel)
    - Revert "x86/mm: Only set IBPB when the new thread cannot ptrace current
      thread"
    - x86/speculation: Use Indirect Branch Prediction Barrier in context switch

  * DKMS driver builds fail with: Cannot use CONFIG_STACK_VALIDATION=y, please
    install libelf-dev, libelf-devel or elfutils-libelf-devel (LP: #1760876)
    - [Packaging] include the retpoline extractor in the headers

  * retpoline hints: primary infrastructure and initial hints (LP: #1758856)
    - [Packaging] retpoline-extract: flag *0xNNN(%reg) branches
    - x86/speculation, objtool: Annotate indirect calls/jumps for objtool
    - x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32bit
    - x86/paravirt, objtool: Annotate indirect calls
    - [Packaging] retpoline -- add safe usage hint support
    - [Packaging] retpoline-check -- only report additions
    - [Packaging] retpoline -- widen indirect call/jmp detection
    - [Packaging] retpoline -- elide %rip relative indirections
    - [Packaging] retpoline -- clear hint information from packages
    - KVM: x86: Make indirect calls in emulator speculation safe
    - KVM: VMX: Make indirect call speculation safe
    - x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
    - SAUCE: early/late -- annotate indirect calls in early/late initialisation
      code
    - SAUCE: vga_set_mode -- avoid jump tables
    - [Config] retpoline -- switch to new format
    - [Packaging] retpoline hints -- handle missing files when RETPOLINE not
      enabled
    - [Packaging] final-checks -- remove check for empty retpoline files

  * retpoline: ignore %cs:0xNNN constant indirections (LP: #1752655)
    - [Packaging] retpoline -- elide %cs:0xNNNN constants on i386

  * zfs system process hung on container stop/delete (LP: #1754584)
    - SAUCE: Fix non-prefaulted page deadlock (LP: #1754584)

  * zfs-linux 0.6.5.11-1ubuntu5 ADT test failure with linux 4.15.0-1.2
    (LP: #1737761)
    - SAUCE: (noup) Update zfs to 0.6.5.11-1ubuntu3.2

  * AT_BASE_PLATFORM in AUXV is absent on kernels available on Ubuntu 17.10
    (LP: #1759312)
    - powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features

  * btrfs and tar sparse truncate archives (LP: #1757565)
    - Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
    - Btrfs: fix reported number of inode blocks after buffered append writes

  * efifb broken on ThunderX-based Gigabyte nodes (LP: #1758375)
    - drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it

  * Intel i40e PF reset due to incorrect MDD detection (continues...)
    (LP: #1723127)
    - i40e/i40ev...

Intel i40e PF reset due to incorrect MDD detection (continues...)

Affects	Status	Importance	Assigned to
linux (Ubuntu)	Fix Released	Medium	Dan Streetman
Trusty	Won't Fix	Undecided	Unassigned
Xenial	Fix Released	Medium	Dan Streetman
Artful	Fix Released	Medium	Dan Streetman
Bionic	Fix Released	Medium	Dan Streetman
