Unable to handle kernel NULL pointer dereference at isci_task_abort_task

Bug #1726519 reported by Brendan on 2017-10-23
This bug affects 14 people
Affects          Importance   Assigned to
linux (Ubuntu)   High         Joseph Salisbury
Artful           High         Joseph Salisbury
Bionic           High         Joseph Salisbury

Bug Description

So I just upgraded from zesty zapus to artful aardvark. At boot, right after I enter my drive encryption password, the kernel panics with the above message.

It doesn't even get far enough along in the boot process for syslog to log this panic, so the only info I have is a photo of the panic.

In short, I can't boot using the latest artful aardvark kernel, and I have to boot with the latest zesty zapus kernel.

:~$ lsb_release -rd
Description: Ubuntu 17.10
Release: 17.10

linux-image-4.13.0-16-generic

I expect this isn't normal and I should be able to boot with the new kernel, which I can't.

ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: linux-image-4.13.0-16-generic 4.13.0-16.19
ProcVersionSignature: Ubuntu 4.10.0-37.41-generic 4.10.17
Uname: Linux 4.10.0-37-generic x86_64
NonfreeKernelModules: nvidia_uvm nvidia_drm nvidia_modeset nvidia
ApportVersion: 2.20.7-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: lost 2071 F.... pulseaudio
 /dev/snd/controlC2: lost 2071 F.... pulseaudio
 /dev/snd/controlC0: lost 2071 F.... pulseaudio
CurrentDesktop: XFCE
Date: Mon Oct 23 12:52:20 2017
HibernationDevice: RESUME=UUID=cf59c168-54d0-45b9-b633-240bd76bbaa6
InstallationDate: Installed on 2016-11-01 (355 days ago)
InstallationMedia: Xubuntu 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
MachineType: LENOVO 11361Q0
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.10.0-37-generic root=/dev/mapper/xubuntu--vg-root ro quiet
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-37-generic N/A
 linux-backports-modules-4.10.0-37-generic N/A
 linux-firmware 1.169
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to artful on 2017-10-23 (0 days ago)
dmi.bios.date: 09/29/2016
dmi.bios.vendor: LENOVO
dmi.bios.version: A3KT57AUS
dmi.board.name: LENOVO
dmi.board.vendor: LENOVO
dmi.board.version: NO DPK
dmi.chassis.asset.tag: 573921
dmi.chassis.type: 7
dmi.chassis.vendor: LENOVO
dmi.chassis.version: NONE
dmi.modalias: dmi:bvnLENOVO:bvrA3KT57AUS:bd09/29/2016:svnLENOVO:pn11361Q0:pvrThinkStationC30:rvnLENOVO:rnLENOVO:rvrNODPK:cvnLENOVO:ct7:cvrNONE:
dmi.product.name: 11361Q0
dmi.product.version: ThinkStation C30
dmi.sys.vendor: LENOVO


This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.14 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc6

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: performing-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Brendan (lostincynicism) wrote :

So I installed the mainline kernel, and the same issue happened. I couldn't get a picture as it was truncated due to resolution being lowered during boot.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a bisect to figure out what commit caused this regression. We need to identify the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the first kernel version that exhibits this bug:

v4.11 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11/
v4.12 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/
v4.13-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc1/

You don't have to test every kernel, just up until the kernel that first has this bug.

Thanks in advance!

Brendan (lostincynicism) wrote :

Will do. This is my work computer so it may take me a day or so to complete this.

Brendan (lostincynicism) wrote :

I lied. Just was able to get this testing in.

4.11 - Works
4.12 - Failed to boot with the same panic
4.13 - Didn't test since 4.12 failed

Brendan (lostincynicism) wrote :

Same panic with 4.12-rc1

Changed in linux (Ubuntu Artful):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: Confirmed → In Progress
Changed in linux (Ubuntu Artful):
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v4.11 final and v4.12-rc1. The kernel bisect will require testing of about 7-10 test kernels.
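
As a rough illustration of where an estimate like this comes from: a bisect halves the range of candidate commits with each test kernel, so the number of builds grows with the base-2 logarithm of the range size. The sketch below is only that arithmetic. The function name `bisect_steps` and the commit counts are hypothetical and are not taken from this bug, and a real `git bisect` over merge-heavy history can need a few more or fewer steps.

```python
def bisect_steps(candidate_commits: int) -> int:
    """Worst-case number of test builds needed to isolate the first bad
    commit among `candidate_commits` candidates on a linear history.

    Equivalent to ceil(log2(n)), computed with integer arithmetic to
    avoid floating-point rounding.
    """
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (candidate_commits - 1).bit_length()


# Hypothetical range sizes, chosen only to show the scaling.
for n in (128, 1024):
    print(n, "candidates ->", bisect_steps(n), "test kernels")
```

With these illustrative sizes, 128 candidates need 7 test kernels and 1024 need 10.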

I built the first test kernel, up to the following commit:
221656e7c4ce342b99c31eca96c1cbb6d1dce45f

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Brendan (lostincynicism) wrote :

Linux lawl 4.11.0-041100-generic #201710251811 SMP Wed Oct 25 18:14:00 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Works

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
c6a677c6f37bb7abc85ba7e3465e82b9f7eb1d91

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Brendan (lostincynicism) wrote :

That most recent one seems to be the culprit, hit the same panic.

Joseph Salisbury (jsalisbury) wrote :

Throughout the bisect process, you will find that some kernels have the bug (bad) and some kernels do not have the bug (good). Eventually the bisect will report the first bad commit, which introduced the regression.
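
The good/bad narrowing described above can be sketched as a binary search. This is a simplified model under stated assumptions: the history is treated as a linear list ordered oldest to newest, the commit names are made up, and `is_bad` stands in for "build this kernel, boot it, and see whether it panics". Real `git bisect` works on a commit graph with merges, so it is not literally this loop.

```python
from typing import Callable, Sequence, Tuple


def find_first_bad(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Return (first bad commit, number of tests run).

    Assumes `commits` is ordered oldest to newest, the last entry is
    known bad, and everything before the first entry is known good.
    """
    lo, hi = 0, len(commits) - 1  # first bad commit lies in commits[lo..hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid        # bad: the culprit is this commit or an older one
        else:
            lo = mid + 1    # good: the culprit is strictly newer
    return commits[lo], tests


# Hypothetical history of 16 commits where "c11" introduced the regression.
history = [f"c{i}" for i in range(16)]
culprit, tests = find_first_bad(history, lambda c: int(c[1:]) >= 11)
print(culprit, tests)
```

Each answer of "works" or "panics" discards half of the remaining range, which is why every test kernel in this thread is built only after the previous result is reported.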

I built the next test kernel, up to the following commit:
e579dde654fc2c6b0d3e4b77a9a4b2d2405c510e

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Brendan (lostincynicism) wrote :

Gotcha.

That one also caused the panic

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
a96480723c287c502b02659f4b347aecaa651ea1

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Brendan (lostincynicism) wrote :

So I got into a situation where the backup kernel I had was removed via apt-get autoclean. Now I can't boot into any kernel, even recovery. I'll have to reinstall, but then I can try the next kernel.

Brendan (lostincynicism) wrote :

That last one works:

Linux lawl 4.11.0-041100-generic #201710271957 SMP Fri Oct 27 19:58:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Brendan (lostincynicism) wrote :

Just to verify, I did go back and try the 4.14 kernel again, and the panic still occurs on the fresh install.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
4ac4d584886a4f47f8ff3bca0f32ff9a2987d3e5

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Brendan (lostincynicism) wrote :

That one caused the same panic

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
2bd80401743568ced7d303b008ae5298ce77e695

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Brendan (lostincynicism) wrote :

That one works

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
1cd7fabc82eb06c834956113ff287f8848811fb8

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Brendan (lostincynicism) wrote :

That one is broken

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
0ddee50e3f22964864b1a5f3cc632dd306ed1060

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Brendan (lostincynicism) wrote :

That one works

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
8e8c9d01c5ea33e0d21f13264a9caeed255526d1

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Brendan (lostincynicism) wrote :

That one is broken

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
5765d180fd23dc7d76109a0cbb082df6e5cfa67a

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Stefan Priebe (s-priebe) wrote :

Hello,

I'm jumping in, as I have the same problem as Brendan. I bisected this myself. The offending commit is this one:

909657615d9b3ce709be4fd95b9a9e8c8c7c2be6 is the first bad commit
commit 909657615d9b3ce709be4fd95b9a9e8c8c7c2be6
Author: Christoph Hellwig <email address hidden>
Date: Thu Apr 6 15:36:32 2017 +0200

    scsi: libsas: allow async aborts

    We now first try to call ->eh_abort_handler from a work queue, but libsas
    was always failing that for no good reason. Allow async aborts.

    Reviewed-by: Johannes Thumshirn <email address hidden>
    Reviewed-by: Hannes Reinecke <email address hidden>
    Signed-off-by: Christoph Hellwig <email address hidden>
    Signed-off-by: Martin K. Petersen <email address hidden>

:040000 040000 b15f1eb57bf74667aefb7a304fa09bf58386eaf2 64df540499763db44c7e1b6ec9a55ffa9b2ebedc M drivers

Joseph Salisbury (jsalisbury) wrote :

Thanks for the info and the bisect work, Stefan!

I built an Artful test kernel with a revert of commit 909657615d9b. This test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1726519

Can you test that kernel and report back if it has the bug or not?

With this test kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Stefan Priebe (s-priebe) wrote :

I'm not using Ubuntu ;-) I just found this report using Google, so I can't test. This must be done by Brendan (lostincynicism).

Brendan (lostincynicism) wrote :

That last kernel works! :)

Joseph Salisbury (jsalisbury) wrote :

Thanks for the testing, Brendan. Can you run one more test of the latest mainline kernel? Then I will ping the upstream patch author.

The 4.14 kernel is available from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14/

Thanks!

Josh G (joshin) wrote :

I'm affected by this too - using Proxmox 5.1 which uses Ubuntu's kernel.

If I could get or compile the ZFS modules for 4.14, I could test kernels. Let me see if I can track those down, or build them myself.

Brendan (lostincynicism) wrote :

That kernel still caused the panic

Hi Christoph,

A kernel bug report was opened against Ubuntu [0].  After a kernel
bisect, it was found that reverting the following commit resolved this bug:

909657615d9b ("scsi: libsas: allow async aborts")

The regression was introduced as of v4.12-rc1, and it still exists in
4.14 mainline.

I was hoping to get your feedback, since you are the patch author.  Do
you think gathering any additional data will help diagnose this issue,
or would it be best to submit a revert request?

Thanks,

Joe

[0] http://pad.lv/1726519

Joseph Salisbury (jsalisbury) wrote :

Comment #40 was the message sent upstream.

I have the same issue. 4.13.0-17 always hits this oops on boot. I still had 4.10.0-32 on disk, and that is what I'm using for now.

I have the same issue. I hoped it was fixed in kernel 4.14.12 (since it is the LTS kernel which patches Spectre and Meltdown), but it is also broken.

I also get the error mentioned above.

Can this please be fixed for good? please?

I attach a screenshot of a server failing to boot just because of this bug...

The following upstream patch seems like a likely fix:

https://patchwork.kernel.org/patch/10154587/

Otherwise, reverting the buggy commit also seems to solve the issue as a temporary measure.

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with the patch posted here:
https://marc.info/?l=linux-scsi&m=151557324907914

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1742630

Can you test this kernel and see if it resolves this bug?

Joseph Salisbury (jsalisbury) wrote :

This bug and bug 1742630 are probably duplicates.

Jamie Baxter (himay) wrote :

Hi Joseph,

I know I'm new to this thread (it took a while to find a match for the symptom), but I've been experiencing this issue as well (first on Manjaro, now on Xubuntu; both have problems with 4.13 but are fine with 4.9/4.10). I just installed your test kernel (lp1742630) on my Xubuntu 16.04 installation. The panic is gone, but so is the offending SAS controller, according to the dmesg output.

Attaching the dmesg (and some system info if needed) outputs, but if you need additional information just say the word.

Marcelo Cerri (mhcerri) on 2018-01-15
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed

In which kernel is the fix present? Do you know if it will be backported to 4.14, since it is the LTS kernel?

I don't see this patch in 4.14.13.

Do you know when it will be out in 4.14, or in which 4.15.x release it will come out?

Changed in linux (Ubuntu Artful):
status: Fix Committed → Incomplete
Changed in linux (Ubuntu Artful):
status: Incomplete → Fix Committed
Seth Forshee (sforshee) on 2018-01-19
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

Thanks a lot to Joseph for finding this bug.

I don't fully understand the flow by which these patches get upstreamed into LTS kernels (like 4.14, 4.9, etc.).

I checked Torvalds's git repository and 4.14.14 still doesn't have this patch.

So I applied the patch mentioned by Joseph myself and rebuilt the 4.14.14 kernel.

You can find the .deb files here:
http://jaraberrocal.readmyblog.org/webdav/Public/

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-31.34

---------------
linux (4.13.0-31.34) artful; urgency=low

  * linux: 4.13.0-31.34 -proposed tracker (LP: #1744294)

  [ Stefan Bader ]
  * CVE-2017-5715 // CVE-2017-5753
    - SAUCE: s390: improve cpu alternative handling for gmb and nobp
    - SAUCE: s390: print messages for gmb and nobp
    - [Config] KERNEL_NOBP=y

linux (4.13.0-30.33) artful; urgency=low

  * linux: 4.13.0-30.33 -proposed tracker (LP: #1743412)

  * Do not duplicate changelog entries assigned to more than one bug or CVE
    (LP: #1743383)
    - [Packaging] git-ubuntu-log -- handle multiple bugs/cves better

  * Unable to handle kernel NULL pointer dereference at isci_task_abort_task
    (LP: #1726519)
    - Revert "scsi: libsas: allow async aborts"

  * CVE-2017-5715 // CVE-2017-5753
    - SAUCE: x86/microcode: Extend post microcode reload to support IBPB feature
      -- repair missmerge
    - Revert "x86/svm: Add code to clear registers on VM exit"
    - kvm: vmx: Scrub hardware GPRs at VM-exit

linux (4.13.0-29.32) artful; urgency=low

  * linux: 4.13.0-29.32 -proposed tracker (LP: #1742722)

  * CVE-2017-5754
    - Revert "x86/cpu: Implement CPU vulnerabilites sysfs functions"
    - Revert "sysfs/cpu: Fix typos in vulnerability documentation"
    - Revert "sysfs/cpu: Add vulnerability folder"
    - Revert "UBUNTU: [Config] updateconfigs to enable
      GENERIC_CPU_VULNERABILITIES"

linux (4.13.0-28.31) artful; urgency=low

  * CVE-2017-5753
    - SAUCE: x86/kvm: Fix stuff_RSB() for 32-bit

  * CVE-2017-5715
    - SAUCE: x86/kvm: Fix stuff_RSB() for 32-bit

linux (4.13.0-27.30) artful; urgency=low

  [ Andy Whitcroft ]
  * CVE-2017-5753
    - locking/barriers: introduce new memory barrier gmb()
    - bpf: prevent speculative execution in eBPF interpreter
    - x86, bpf, jit: prevent speculative execution when JIT is enabled
    - uvcvideo: prevent speculative execution
    - carl9170: prevent speculative execution
    - p54: prevent speculative execution
    - qla2xxx: prevent speculative execution
    - cw1200: prevent speculative execution
    - Thermal/int340x: prevent speculative execution
    - userns: prevent speculative execution
    - ipv6: prevent speculative execution
    - fs: prevent speculative execution
    - net: mpls: prevent speculative execution
    - udf: prevent speculative execution
    - x86/feature: Enable the x86 feature to control Speculation
    - x86/feature: Report presence of IBPB and IBRS control
    - x86/enter: MACROS to set/clear IBRS and set IBPB
    - x86/enter: Use IBRS on syscall and interrupts
    - x86/idle: Disable IBRS entering idle and enable it on wakeup
    - x86/idle: Disable IBRS when offlining cpu and re-enable on wakeup
    - x86/mm: Set IBPB upon context switch
    - x86/mm: Only set IBPB when the new thread cannot ptrace current thread
    - x86/entry: Stuff RSB for entry to kernel for non-SMEP platform
    - x86/kvm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD to kvm
    - x86/kvm: Set IBPB when switching VM
    - x86/kvm: Toggle IBRS on VM entry and exit
    - x86/kvm: Pad RSB on VM transition
    - x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL fea...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Brendan (lostincynicism) wrote :

Thanks for all your support, Joseph. I plan to donate to Ubuntu as a result of your continued support on this issue.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-32.35

---------------
linux (4.13.0-32.35) artful; urgency=low

  * CVE-2017-5715 // CVE-2017-5753
    - SAUCE: x86/entry: Fix up retpoline assembler labels

 -- Stefan Bader <email address hidden> Tue, 23 Jan 2018 09:13:39 +0100

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released