e1000e msix interrupts broken in linux-image-4.15.0-15-generic

Bug #1764892 reported by Zheng Cui on 2018-04-17
This bug affects 1 person
Affects: linux (Ubuntu); Importance: Medium; Assigned to: Joseph Salisbury
Affects: Bionic; Importance: Medium; Assigned to: Joseph Salisbury

Bug Description

== SRU Justification ==
Linux kernel 4.15 has introduced a bug in e1000e msix interrupt drivers,
which violates the e1000e specification. Specifically, the driver
configures auto-clearing of the "OTHER" interrupt types, and the "OTHER"
interrupt handler expects to see an uncleared interrupt source for the
"OTHER" types; consequently, the link state change interrupts are not
identified by the driver, and thus the virtual E1000e device doesn't
function correctly inside VMware VMs.

This patch Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
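
The mismatch described above can be illustrated with a toy model. This is only a sketch, not the e1000e driver or real hardware: the ToyNic class, the simulate() helper, and the bit values are hypothetical and chosen for illustration. It models a device that clears the "OTHER" causes automatically when they are listed in the auto-clear mask, and a handler that identifies a link state change by reading the cause register.

```python
# Toy model only: NOT the e1000e driver and NOT real hardware behaviour.
# All names and bit values here are illustrative assumptions.
ICR_LSC = 0x00000004    # link status change cause (assumed value)
ICR_OTHER = 0x01000000  # "Other" cause summary bit (assumed value)


class ToyNic:
    """Hypothetical device with a cause register (ICR) and an auto-clear mask (EIAC)."""

    def __init__(self, eiac):
        self.eiac = eiac
        self.icr = 0

    def raise_link_change(self):
        """Latch a link state change, then deliver the "Other" vector."""
        self.icr |= ICR_LSC | ICR_OTHER
        if self.eiac & ICR_OTHER:
            # Modelled auto-clear: the causes vanish before software reads ICR.
            self.icr &= ~(ICR_LSC | ICR_OTHER)

    def read_icr(self):
        """Read-to-clear access, as an interrupt handler would perform it."""
        value, self.icr = self.icr, 0
        return value


def other_handler_sees_link_change(nic):
    """Model of a handler that identifies the event by inspecting ICR."""
    return bool(nic.read_icr() & ICR_LSC)


def simulate(eiac):
    """Return True if the handler notices the link change for this EIAC mask."""
    nic = ToyNic(eiac)
    nic.raise_link_change()
    return other_handler_sees_link_change(nic)
```

In this model, simulate(ICR_OTHER) misses the link change, while simulate(0) sees it. That corresponds to the idea behind the fix named below: take "Other" out of the auto-clear mask so the handler can still observe the cause.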

== Fix ==
745d0bd3af99 ("e1000e: Remove Other from EIAC")

== Regression Potential ==
Low. Fixes an existing regression and limited to e1000e driver.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

== Original Bug Description ==
Hi Ubuntu folks,

Linux kernel 4.15 has introduced a bug in e1000e msix interrupt drivers, which violates the e1000e specification. Specifically, the driver configures auto-clearing of the "OTHER" interrupt types, and the "OTHER" interrupt handler expects to see an uncleared interrupt source for the "OTHER" types; consequently, the link state change interrupts are not identified by the driver, and thus the virtual E1000e device doesn't function correctly inside VMware VMs.

I have verified that Linux kernel 4.16.2 fixes the issue, and our on-prem QE has verified that 4.16-rc functions correctly inside VMware VMs.

Could you please backport the fix from linux-4.16 into Ubuntu 18.04, which will be frozen in 2 days? Here is the change history:

https://lkml.org/lkml/2018/3/25/248

Benjamin Poirier (7):
e1000e: Remove Other from EIAC
Partial revert "e1000e: Avoid receiver overrun interrupt bursts"
e1000e: Fix queue interrupt re-raising in Other interrupt
e1000e: Avoid missed interrupts following ICR read
e1000e: Fix check_for_link return value with autoneg off
Revert "e1000e: Separate signaling for link check/link up"
e1000e: Fix link check race condition

Thanks,

-zheng

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1764892

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Zheng Cui (egdirf) on 2018-04-18
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
status: Confirmed → Triaged
tags: added: bionic kernel-da-key
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 745d0bd3af99ccc8c5f5822f808cd133eadad6ac. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1764892

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!

Zheng Cui (egdirf) wrote :

Thanks Joseph!

I have asked our QE to verify the kernel you provided. I will get back to you once I am informed of the results.

Thanks again,

-zheng

Zheng Cui (egdirf) wrote :

Hi Joseph,

Our QE has verified that the sandbox works well. Please see below:

<VMware>

I've verified the kernel with the following steps:

- Configure only with e1000e driver and reboot 3 times
- Configure with both e1000e driver and vmxnet3 driver and reboot 3 times
- Configure only with vmxnet3 driver and reboot 3 times
- Reconfigure the driver from vmxnet3 to e1000e and reboot 3 times

After each reboot, I checked the output of "ifconfig", and everything worked as expected.

</VMware>
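
For anyone repeating the reboot checks quoted above, the per-reboot inspection could also be scripted. The following is a minimal sketch, not what the QE team ran: it reads the standard Linux sysfs network attributes instead of parsing "ifconfig" output, and the function names are hypothetical.

```python
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Return driver name, operstate and carrier for one interface.

    Fields that cannot be read are reported as None.
    """
    base = Path(sysfs_root) / iface
    state = {"iface": iface, "driver": None, "operstate": None, "carrier": None}

    # The "driver" entry is a symlink whose target directory is named
    # after the bound driver (for example e1000e or vmxnet3).
    driver_link = base / "device" / "driver"
    if driver_link.exists():
        state["driver"] = driver_link.resolve().name

    try:
        state["operstate"] = (base / "operstate").read_text().strip()
    except OSError:
        pass
    try:
        state["carrier"] = (base / "carrier").read_text().strip() == "1"
    except OSError:
        # Reading carrier fails while the interface is administratively down.
        pass
    return state


def link_is_up(state):
    """True only if the kernel reports the link as up with carrier present."""
    return state["operstate"] == "up" and state["carrier"] is True
```

After each reboot, calling read_link_state() with the name of the e1000e interface in the VM and passing the result to link_is_up() gives a quick pass/fail signal. The sysfs_root parameter exists only so the function can be exercised against a fake directory tree.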

Thanks,

-zheng

Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Zheng Cui (egdirf) wrote :

Hi Joseph,

Would update #5 be sufficient for the verification? If so, we should be able to update the tag to verification-done-bionic. Otherwise, let me know what we should do next.

Thanks,

-zheng

Zheng Cui (egdirf) wrote :

I have posted the verification request to our QE team and will update here once I hear anything.

Zheng Cui (egdirf) wrote :

Hi,

Our team has verified that Ubuntu 18.04 + "proposed" solves the problem and, as far as I am concerned, we can mark the tag as "verification-done-bionic":

<VMware>

Hi Zheng,

Similar to update #44, I have verified that the bug has been fixed in Ubuntu 18.04 + "proposed" with the following steps:

- Configure only with e1000e driver and reboot 3 times
- Configure with both e1000e driver and vmxnet3 driver and reboot 3 times
- Configure only with vmxnet3 driver and reboot 3 times
- Reconfigure the driver from vmxnet3 to e1000e and reboot 3 times

After each reboot, I checked the output of "ifconfig", and everything worked as expected.

</VMware>

Thanks,

-zheng

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-23.25

---------------
linux (4.15.0-23.25) bionic; urgency=medium

  * linux: 4.15.0-23.25 -proposed tracker (LP: #1772927)

  * arm64 SDEI support needs trampoline code for KPTI (LP: #1768630)
    - arm64: mmu: add the entry trampolines start/end section markers into
      sections.h
    - arm64: sdei: Add trampoline code for remapping the kernel

  * Some PCIe errors not surfaced through rasdaemon (LP: #1769730)
    - ACPI: APEI: handle PCIe AER errors in separate function
    - ACPI: APEI: call into AER handling regardless of severity

  * qla2xxx: Fix page fault at kmem_cache_alloc_node() (LP: #1770003)
    - scsi: qla2xxx: Fix session cleanup for N2N
    - scsi: qla2xxx: Remove unused argument from qlt_schedule_sess_for_deletion()
    - scsi: qla2xxx: Serialize session deletion by using work_lock
    - scsi: qla2xxx: Serialize session free in qlt_free_session_done
    - scsi: qla2xxx: Don't call dma_free_coherent with IRQ disabled.
    - scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
    - scsi: qla2xxx: Prevent relogin trigger from sending too many commands
    - scsi: qla2xxx: Fix double free bug after firmware timeout
    - scsi: qla2xxx: Fixup locking for session deletion

  * Several hisi_sas bug fixes (LP: #1768974)
    - scsi: hisi_sas: dt-bindings: add an property of signal attenuation
    - scsi: hisi_sas: support the property of signal attenuation for v2 hw
    - scsi: hisi_sas: fix the issue of link rate inconsistency
    - scsi: hisi_sas: fix the issue of setting linkrate register
    - scsi: hisi_sas: increase timer expire of internal abort task
    - scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req
    - scsi: hisi_sas: fix return value of hisi_sas_task_prep()
    - scsi: hisi_sas: Code cleanup and minor bug fixes

  * [bionic] machine stuck and bonding not working well when nvmet_rdma module
    is loaded (LP: #1764982)
    - nvmet-rdma: Don't flush system_wq by default during remove_one
    - nvme-rdma: Don't flush delete_wq by default during remove_one

  * Warnings/hang during error handling of SATA disks on SAS controller
    (LP: #1768971)
    - scsi: libsas: defer ata device eh commands to libata

  * Hotplugging a SATA disk into a SAS controller may cause crash (LP: #1768948)
    - ata: do not schedule hot plug if it is a sas host

  * ISST-LTE:pKVM:Ubuntu1804: rcu_sched self-detected stall on CPU follow by CPU
    ATTEMPT TO RE-ENTER FIRMWARE! (LP: #1767927)
    - powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
    - powerpc/64s: return more carefully from sreset NMI
    - powerpc/64s: sreset panic if there is no debugger or crash dump handlers

  * fsnotify: Fix fsnotify_mark_connector race (LP: #1765564)
    - fsnotify: Fix fsnotify_mark_connector race

  * Hang on network interface removal in Xen virtual machine (LP: #1771620)
    - xen-netfront: Fix hang on device removal

  * HiSilicon HNS NIC names are truncated in /proc/interrupts (LP: #1765977)
    - net: hns: Avoid action name truncation

  * Ubuntu 18.04 kernel crashed while in degraded mode (LP: #1770849)
    - SAUCE: powerpc/perf: Fix memory allocation for...
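
To tell whether a running Bionic kernel is at or past the fixed package version 4.15.0-23.25 named above, the ABI number in the kernel release string can be compared against 23. This is a minimal sketch. It assumes the usual Ubuntu release format "4.15.0-N-flavour", relies only on the version stated in this report, and uses a hypothetical function name. Any other release string is reported as unknown rather than guessed.

```python
import re

FIXED_ABI = 23  # ABI number taken from the package version 4.15.0-23.25

# Matches Ubuntu release strings such as "4.15.0-15-generic".
_RELEASE_RE = re.compile(r"^4\.15\.0-(\d+)(?:-|$)")


def bionic_kernel_has_fix(release):
    """Return True or False for Ubuntu 4.15.0-N release strings, else None."""
    match = _RELEASE_RE.match(release)
    if match is None:
        return None  # not a Bionic 4.15.0-N kernel; this check cannot tell
    return int(match.group(1)) >= FIXED_ABI
```

On a live system the argument would typically be platform.release() from the Python standard library, which returns the same string as "uname -r".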

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released