xenial kernel crash on HP BL460c G7 (qla24xx problem?)

Bug #1554003 reported by Attila Zsiros on 2016-03-07
This bug affects 2 people
Affects          Status   Importance   Assigned to        Milestone
linux (Ubuntu)            Critical     Joseph Salisbury
  Xenial                  Critical     Joseph Salisbury

Bug Description

Dear Members,

The latest Xenial kernel (linux-image-generic-4.4.0-8-generic) crashes on boot. The same problem occurs with 4.4.0-9 and 4.4.0-10.

[ 1.654419] qla2xxx [0000:06:00.1]-0034:3: MSI-X: Unsupported ISP 2432 SSVID/SSDID (0x103C,0x1705).
[ 1.685273] hid-generic 0003:03F0:7029.0001: input,hidraw0: USB HID v1.01 Keyboard [HP Virtual Keyboard ] on usb-0000:01:00.4-1/input0
[ 1.685455] input: HP Virtual Keyboard as /devices/pci0000:00/0000:00:1c.4/0000:01:00.4/usb6/6-1/6-1:1.1/0003:03F0:7029.0002/input/inpu
[ 1.997314] [<ffffffff8181684b>] do_IRQ+0x4b/0xd0
[ 1.997379] [<ffffffff81814942>] common_interrupt+0x82/0x82
[ 1.997441] <EOI>
[ 1.997336] hpsa 0000:0c:00.0: scsi 1:3:0:0: added RAID HP
[ 1.997491] [<ffffffff816ac22d>] ? cpuidle_enter_state+0x12d/0x270
[ 1.997609] [<ffffffff816ac3a7>] cpuidle_enter+0x17/0x20
[ 1.997674] [<ffffffff810c1402>] call_cpuidle+0x32/0x60
hpsa 0000:0c:00.0: scsi 1:1:0:0: added Direct-Access HP
[ 1.997737] [<ffffffff816ac383>] ? cpuidle_select+0x13/0x20
[ 1.997800] [<ffffffff810c1696>] cpu_startup_entry+0x266/0x320
[ 1.997866] [<ffffffff8180798c>] rest_init+0x7c/0x80
[ 1.997930] [<ffffffff81f59011>] start_kernel+0x481/0x4a2
[ 1.997994] [<ffffffff81f58120>] ? early_idt_handler_array+0x120/0x120
[ 1.998058] [<ffffffff81f58339>] x86_64_start_reservations+0x2a/0x2c
[ 1.998123] [<ffffffff81f58485>] x86_64_start_kernel+0x14a/0x16d
[ 1.998186] Code: 00 48 89 45 d0 31 c0 48 8b 47 58 a8 02 0f 84 cf 00 00 00 48 8b 46 50 49 89 fd 49 89 f4 65 8b 15 06 34 e7 3f 4c 8b b7 10 02 00 00 <39> 50 50 74 11 89 50 50 48 8b 46 50 8b 40 50 41 89 86 58 8b 00
[ 2.000922] RIP [<ffffffffc0196d79>] qla24xx_process_response_queue+0x49/0x4a0 [qla2xxx]
[ 2.001050] RSP <ffff880c0b803dc8>
[ 2.001109] CR2: 0000000000000050
[ 2.001170] ---[ end trace 20a247b9d60f8e00 ]---
[ 2.001232] Kernel panic - not syncing: Fatal exception in interrupt
[ 2.001813] Kernel Offset: disabled
[ 2.001873] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
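The oops above can be read by hand: in the "Code:" line the byte in angle brackets marks the start of the trapping instruction, and CR2 holds the address that faulted. The following is a minimal Python sketch written for this report, not a kernel or distribution tool; decode_cmp_disp8 and faulting_base are hypothetical helper names. It handles only the one encoding seen here: opcode 0x39 (CMP r/m32, r32) with a ModRM byte selecting a register plus a sign-extended 8-bit displacement, with no REX prefix and no SIB byte.

```python
# Register names indexed by the 3-bit ModRM reg / rm fields (no REX prefix).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_cmp_disp8(code: bytes) -> tuple[str, str, int]:
    """Decode 'CMP r/m32, r32' of the form [base + disp8] (opcode 0x39).

    Returns (source register, base register, displacement). Any other
    encoding is rejected rather than guessed at.
    """
    opcode, modrm, disp8 = code[0], code[1], code[2]
    if opcode != 0x39:
        raise ValueError("not CMP r/m32, r32")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [base + disp8] without a SIB byte is handled")
    disp = disp8 - 256 if disp8 >= 0x80 else disp8  # disp8 is sign-extended
    return REG32[reg], REG64[rm], disp


def faulting_base(cr2: int, disp: int) -> int:
    """Effective address = base + disp, so base = CR2 - disp."""
    return cr2 - disp


reg, base, disp = decode_cmp_disp8(bytes([0x39, 0x50, 0x50]))
print(f"cmp %{reg},{disp:#x}(%{base})")  # AT&T operand order, as in objdump
print(hex(faulting_base(0x50, disp)))
```

For the bytes 39 50 50 this prints cmp %edx,0x50(%rax), and subtracting the 0x50 displacement from CR2 = 0x50 leaves 0x0, so %rax held NULL when qla24xx_process_response_queue+0x49 executed. The earlier bytes 48 8b 46 50 (mov 0x50(%rsi),%rax) suggest the NULL pointer was loaded from offset 0x50 of the structure passed as the function's second argument; this is an inference from the oops alone.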

Best regards Attila
---
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/dsp3', '/dev/dsp1', '/dev/dsp2', '/dev/dsp', '/dev/sequencer'] failed with exit code 1:
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-02-03 (32 days ago)
InstallationMedia: Ubuntu-Server 14.04.3 LTS "Trusty Tahr" - Beta amd64 (20150805)
MachineType: HP ProLiant BL460c G7
NonfreeKernelModules: raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear be2iscsi qla2xxx iscsi_boot_sysfs libiscsi ipmi_si hpsa scsi_transport_fc scsi_transport_iscsi ipmi_msghandler 8021q garp stp mrp llc ghash_clmulni_intel hid_generic be2net aesni_intel aes_x86_64 ablk_helper usbhid cryptd lrw vxlan gf128mul ip6_udp_tunnel glue_helper usb_storage hid udp_tunnel
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_US:en
 TERM=bterm
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/sh
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=linux priority=low vga=788 initrd=initrd.gz ---
ProcVersionSignature: Ubuntu 4.2.0-16.19-generic 4.2.3
RelatedPackageVersions:
 linux-restricted-modules-4.2.0-16-generic N/A
 linux-backports-modules-4.2.0-16-generic N/A
 linux-firmware 1.156
RfKill: Error: [Errno 2] No such file or directory
Tags: xenial
Uname: Linux 4.2.0-16-generic x86_64
UpgradeStatus: Upgraded to xenial on 2016-02-04 (31 days ago)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 05/05/2011
dmi.bios.vendor: HP
dmi.bios.version: I27
dmi.chassis.type: 28
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrI27:bd05/05/2011:svnHP:pnProLiantBL460cG7:pvr:cvnHP:ct28:cvr:
dmi.product.name: ProLiant BL460c G7
dmi.sys.vendor: HP


Attila Zsiros (zsirmo) on 2016-03-07
affects: linux (Ubuntu) → linux-meta (Ubuntu)
Brad Figg (brad-figg) on 2016-03-07
affects: linux-meta (Ubuntu) → linux (Ubuntu)

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1554003

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

apport information

tags: added: apport-collected xenial
description: updated

Attila Zsiros (zsirmo) on 2016-03-07
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Critical
Attila Zsiros (zsirmo) wrote :

I booted the newer kernel (4.4.0-11) with the boot parameter modprobe.blacklist=qla2xxx.

Then I loaded the module manually with modprobe:

root@kvm2:~# modprobe qla2xxx
[ 247.431349] qla2xxx [0000:06:00.0]-0034:3: MSI-X: Unsupported ISP 2432 SSVID/SSDID (0x103C,0x1705).
[ 247.807564] qla2xxx [0000:06:00.1]-0034:4: MSI-X: Unsupported ISP 2432 SSVID/SSDID (0x103C,0x1705).
[ 248.200048] [<ffffffff8101bd5a>] oops_end+0xca/0xd0
[ 248.200115] [<ffffffff8106a485>] no_context+0x135/0x380
[ 248.200181] [<ffffffff8106a750>] __bad_area_nosemaphore+0x80/0x1f0
[ 248.200249] [<ffffffff8106a8d3>] bad_area_nosemaphore+0x13/0x20
[ 248.200316] [<ffffffff8106abab>] __do_page_fault+0xcb/0x440
[ 248.200382] [<ffffffff8106af42>] do_page_fault+0x22/0x30
[ 248.200449] [<ffffffff8182baf8>] page_fault+0x28/0x30
[ 248.200530] [<ffffffffc072de29>] ? qla24xx_process_response_queue+0x49/0x4a0 [qla2xxx]
[ 248.200615] [<ffffffff810ca0fe>] ? cpuacct_charge+0x4e/0x60
[ 248.200681] [<ffffffff810c0a77>] ? dequeue_rt_stack+0xc7/0x230
[ 248.200749] [<ffffffff810c008f>] ? update_curr_rt+0x13f/0x1d0
[ 248.200830] [<ffffffffc0730581>] qla24xx_intr_handler+0x101/0x300 [qla2xxx]
[ 248.200899] [<ffffffff81824db9>] ? __schedule+0x389/0xae0
[ 248.200965] [<ffffffff810dbfa0>] ? irq_finalize_oneshot.part.35+0xd0/0xd0
[ 248.201034] [<ffffffff810dbfc9>] irq_forced_thread_fn+0x29/0x70
[ 248.201101] [<ffffffff810dc328>] irq_thread+0x138/0x1b0
[ 248.201167] [<ffffffff810dc140>] ? wake_threads_waitq+0x30/0x30
[ 248.201234] [<ffffffff810dc1f0>] ? irq_thread_dtor+0xb0/0xb0
[ 248.201301] [<ffffffff810a0448>] kthread+0xd8/0xf0
[ 248.201367] [<ffffffff810a0370>] ? kthread_worker_fn+0x170/0x170
[ 248.201435] [<ffffffff81829d0f>] ret_from_fork+0x3f/0x70
[ 248.201502] [<ffffffff810a0370>] ? kthread_worker_fn+0x170/0x170
[ 248.201569] ---[ end trace ba42d5cb2a09b7a9 ]---
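The workaround above relies on kmod reading modprobe.blacklist= from the kernel command line as a comma-separated list of module names. Blacklisting only suppresses automatic, alias-based loading, which is why an explicit modprobe qla2xxx still loaded the driver and reproduced the crash. A minimal Python sketch for checking which modules a given command line blacklists; blacklisted_modules is a hypothetical helper, it takes the command line as a string (on a running system it could be read from /proc/cmdline), and it deliberately ignores quoting and kmod's dash/underscore name normalization.

```python
def blacklisted_modules(cmdline: str) -> set[str]:
    """Collect module names from every modprobe.blacklist= token.

    Simplified: tokens are split on whitespace, so quoted values are not
    handled, and dash/underscore variants of a name are not normalized.
    """
    modules: set[str] = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "modprobe.blacklist":
            # The value is a comma-separated list; skip empty entries.
            modules.update(name for name in value.split(",") if name)
    return modules


# Hypothetical command line resembling the workaround used in this report.
print(sorted(blacklisted_modules("ro quiet modprobe.blacklist=qla2xxx")))
```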

Attila Zsiros (zsirmo) wrote :

I tried building the vanilla 4.4.4 kernel from kernel.org, using make-kpkg.
This kernel works perfectly.

Attached dmesg and config.

Tim Gardner (timg-tpi) on 2016-03-08
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: Confirmed → In Progress
Tim Gardner (timg-tpi) wrote :

Attila - Please try the test kernel at http://people.canonical.com/~rtg/lp1554003/1/. Be sure to install both linux-image and linux-image-extra. This kernel has the following patches applied:

Revert "qla2xxx: Fix warning reported by static checker"
Revert "qla2xxx: Fix TMR ABORT interaction issue between qla2xxx and TCM"
Revert "qla2xxx: Use ATIO type to send correct tmr response"
Revert "qla2xxx: use TARGET_SCF_USE_CPUID flag to indiate CPU Affinity"

Reverting all qla2xxx patches since v4.4 isn't really an option since it would impact bug #1541456. If this test kernel doesn't work for you, then it would be helpful to know what was the last version that did work. The post v4.4 qla2xxx updates were first released in linux 4.4.0-4.19, so I'd start with a version prior to that.

Attila Zsiros (zsirmo) wrote :

Tim

I tried kernel 4.4.0-12 (Ubuntu branch). The same problem occurred as with the previous versions.

My tests:
Ubuntu branch 4.4.0-8 to 4.4.0-12: crashes
kernel.org 4.4.4: boots without problems
kernel.org 4.5-rc7: crashes, the same as the Ubuntu branch

tags: added: kernel-da-key
Tim Gardner (timg-tpi) wrote :

Does 4.4.0-7 work ?

Attila Zsiros (zsirmo) wrote :

4.4.0-7 crashes.

I checked older kernels:
4.4.0-3: good
4.4.0-4: crashes

tags: added: performing-bisect
Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between 4.4.0-3 and 4.4.0-4. The kernel bisect will require testing of about 6 test kernels.
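The estimate of about six test kernels follows from bisection halving the candidate range with every test: for N candidate commits between a known-good and a known-bad kernel, at most ceil(log2 N) builds are needed, so six tests cover up to 64 commits. The thread does not say how many commits separate 4.4.0-3 and 4.4.0-4, so the range below is hypothetical. A minimal Python sketch of the search, conceptually the halving that git bisect performs (git picks midpoints on the commit graph rather than a flat list); first_bad and expected_steps are hypothetical names, and is_bad stands in for building a test kernel, booting it, and reporting whether it crashes.

```python
import math
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> tuple[int, int]:
    """Return (index of the first bad commit, number of kernels tested).

    commits is ordered oldest to newest and excludes the known-good kernel;
    its last entry is the known-bad kernel, so it is never tested again.
    Assumes a single good-to-bad transition within the range.
    """
    lo, hi = 0, len(commits) - 1  # the first bad commit lies in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # build, boot, and report one test kernel
        if is_bad(commits[mid]):
            hi = mid  # mid crashes: the first bad commit is mid or earlier
        else:
            lo = mid + 1  # mid boots: the first bad commit comes after mid
    return lo, tests


def expected_steps(n_commits: int) -> int:
    """Worst-case number of test kernels for n_commits candidates."""
    return math.ceil(math.log2(n_commits)) if n_commits > 1 else 0


# Hypothetical range of 64 commits in which commit c39 introduces the crash.
commits = [f"c{i}" for i in range(64)]
index, tests = first_bad(commits, lambda c: int(c[1:]) >= 39)
print(commits[index], tests)  # c39 6
print(expected_steps(64))  # 6
```

Because each verdict discards half of the remaining range, a 64-commit range always takes exactly six tests here, and one more commit (65) would push the worst case to seven.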

I built the first test kernel, up to the following commit:
b1aa887dffc9cb81d82b72bdf2af6c9e4b5a0294

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1554003

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Just confirming this on an IBM HS21 with the nightly from 14-Mar-2016 07:01 @ http://cdimage.ubuntu.com/ubuntu-server/daily/current/xenial-server-amd64.iso
IBM Part Number 26R0892

Attila Zsiros (zsirmo) wrote :

I tried the 4.4.0-4.18~lp1554003Commitb1aa887df kernel; it is good!

Attached dmesg.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
17c4da325caf67876458a05baa833a35931f3133

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1554003

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Attila Zsiros (zsirmo) wrote :

4.4.1-040401-generic #201603161907 crashed.
Screenshot attached.

Attila Zsiros (zsirmo) wrote :

More screenshots.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
fa87c4229357b70340ca35411a1c9013be535e11

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1554003

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Attila Zsiros (zsirmo) wrote :

lp1554003Commitfa87c422 is good, dmesg attached.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
bdddded9ad074a8be28e0a3097598238a02c47c6

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1554003

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Patrick Domack (patrickdk) wrote :

I am also having this issue.

Ubuntu 4.4.0-4.18~lp1554003Commitbdddded9-generic 4.4.1

is good for me.

Attila Zsiros (zsirmo) wrote :

same, good!

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
d3b9bcd9c7d8497eda30f5172503c871501e77b4

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1554003

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Patrick Domack (patrickdk) wrote :

lp1554003Commitd3b9bcd9c works fine, no panics.

Attila Zsiros (zsirmo) wrote :

same, good!

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
b11f66aebbf8369707ddde3a9a7bb01b6cb41d58

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1554003

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Attila Zsiros (zsirmo) wrote :

 b11f66aebbf8369707ddde3a9a7bb01b6cb41d58 is good!

Tim Gardner (timg-tpi) on 2016-03-28
Changed in linux (Ubuntu Xenial):
assignee: Tim Gardner (timg-tpi) → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

The bisect reported the following commit as the first bad commit:

commit cdb898c52d1dfad4b4800b83a58b3fe5d352edde
Author: Quinn Tran <email address hidden>
Date: Thu Dec 17 14:57:05 2015 -0500

    qla2xxx: Add irq affinity notification

I'll build a test kernel with a revert of this commit for testing.

Joseph Salisbury (jsalisbury) wrote :

I built a Xenial test kernel with a revert of commit cdb898c5. However, it also required three other commits:

    5327c7d qla2xxx: use TARGET_SCF_USE_CPUID flag to indiate CPU Affinity
    9095ada target/transport: add flag to indicate CPU Affinity is observed
    fb3269b qla2xxx: Add selective command queuing

The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1554003

Can you test that kernel and report back if it has the bug or not?

Attila Zsiros (zsirmo) wrote :

Revert build is good! Dmesg attached.

Patrick Domack (patrickdk) wrote :

Works fine here also.

Joseph Salisbury (jsalisbury) wrote :

Upstream sent a patch that may fix this bug without needing to revert any commits. I built a Xenial test kernel with this patch, which can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1554003/

Can folks affected by this bug test this kernel and see if it fixes the bug?

Patrick Domack (patrickdk) wrote :

Works for me.

[ 98.715393] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 8.07.00.26-k.
[ 98.715608] qla2xxx [0000:10:00.0]-001d: : Found an ISP2432 irq 18 iobase 0xffffc90003612000.
[ 98.716407] qla2xxx [0000:10:00.0]-0034:1: MSI-X: Unsupported ISP 2432 SSVID/SSDID (0x103C,0x1705).
[ 99.112157] scsi host1: qla2xxx
[ 99.112872] qla2xxx [0000:10:00.0]-00fb:1: QLogic QMH2462 - PCI-Express Dual Channel 4Gb Fibre Channel Mezzanine HBA.
[ 99.112883] qla2xxx [0000:10:00.0]-00fc:1: ISP2432: PCIe (2.5GT/s x4) @ 0000:10:00.0 hdma+ host#=1 fw=8.03.00 (9496).
[ 99.113074] qla2xxx [0000:10:00.1]-001d: : Found an ISP2432 irq 19 iobase 0xffffc9000361e000.
[ 99.113264] qla2xxx [0000:10:00.1]-0034:2: MSI-X: Unsupported ISP 2432 SSVID/SSDID (0x103C,0x1705).
[ 99.500141] scsi host2: qla2xxx
[ 99.500825] qla2xxx [0000:10:00.1]-00fb:2: QLogic QMH2462 - PCI-Express Dual Channel 4Gb Fibre Channel Mezzanine HBA.
[ 99.500836] qla2xxx [0000:10:00.1]-00fc:2: ISP2432: PCIe (2.5GT/s x4) @ 0000:10:00.1 hdma+ host#=2 fw=8.03.00 (9496).
[ 100.717278] qla2xxx [0000:10:00.0]-500a:1: LOOP UP detected (4 Gbps).
[ 101.105997] qla2xxx [0000:10:00.1]-500a:2: LOOP UP detected (4 Gbps).

Attila Zsiros (zsirmo) wrote :

lp1554003V2UpstreamPatch is good, dmesg attached.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-21.37

---------------
linux (4.4.0-21.37) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1571791

  * linux: MokSBState is ignored (LP: #1571691)
    - SAUCE: (noup) MODSIGN: Import certificates from UEFI Secure Boot
    - SAUCE: (noup) efi: Disable secure boot if shim is in insecure mode
    - SAUCE: (noup) Display MOKSBState when disabled

linux (4.4.0-20.36) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1571069

  * sysfs mount failure during stateful lxd snapshots (LP: #1570906)
    - SAUCE: kernfs: Do not match superblock in another user namespace when
      mounting

  * Kernel Panic in Ubuntu 16.04 netboot installer (LP: #1570441)
    - x86/topology: Fix logical package mapping
    - x86/topology: Fix Intel HT disable
    - x86/topology: Use total_cpus not nr_cpu_ids for logical packages
    - xen/apic: Provide Xen-specific version of cpu_present_to_apicid APIC op
    - x86/topology: Fix AMD core count

  * [regression]: Failed to call clock_adjtime(): Invalid argument
    (LP: #1566465)
    - ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO

linux (4.4.0-19.35) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1570348

  * CVE-2016-2847 (LP: #1554260)
    - pipe: limit the per-user amount of pages allocated in pipes

  * xenial kernel crash on HP BL460c G7 (qla24xx problem?) (LP: #1554003)
    - SAUCE: (noup) qla2xxx: Add irq affinity notification V2

  * arm64: guest hangs when ntpd is running (LP: #1549494)
    - SAUCE: (noup) KVM: arm/arm64: Handle forward time correction gracefully

  * linux: Enforce signed module loading when UEFI secure boot (LP: #1566221)
    - [Config] CONFIG_EFI_SECURE_BOOT_SIG_ENFORCE=y

  * s390/cpumf: Fix lpp detection (LP: #1555344)
    - s390/facilities: use stfl mnemonic instead of insn magic
    - s390/facilities: always use lowcore's stfle field for storing facility bits
    - s390/cpumf: Fix lpp detection

  * s390x kernel image needs weightwatchers (LP: #1536245)
    - [Config] s390x: Use compressed kernel bzImage

  * Surelock GA2 SP1: surelock02p05: Not seeing sgX devices for LUNs after
    upgrading to Ubuntu 16.04 (LP: #1567581)
    - Revert "UBUNTU: SAUCE: (noup) powerpc/pci: Assign fixed PHB number based on
      device-tree properties"

  * Backport upstream bugfixes to ubuntu-16.04 (LP: #1555765)
    - cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
    - Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"
    - cpufreq: powernv: Add sysfs attributes to show throttle stats

  * systemd-modules-load.service: Failing due to missing module 'ib_iser' (LP: #1566468)
    - [Config] Add ib_iser to generic inclusion list

  * thunderx nic performance improvements (LP: #1567093)
    - net: thunderx: Set recevie buffer page usage count in bulk
    - net: thunderx: Adjust nicvf structure to reduce cache misses

  * fixes for thunderx nic in multiqueue mode (LP: #1567091)
    - net: thunderx: Fix for multiqset not configured upon interface toggle
    - net: thunderx: Fix for HW TSO not enabled for secondary qsets
    - net: thund...


Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Released