Call trace observed while disabling PD on Ubuntu 16.04.4

Bug #1781578 reported by Murthy Bhat
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

I am working on the smartpqi Linux driver for Microsemi.

We have a storage controller management application for managing the storage behind our controller. One of its options is to disable a physical device (PD).
During a disable-PD operation, the Disable PD command timed out. I then observed the call trace below in the dmesg log.
Please note that we were running an I/O stress test on the PD that was being disabled.

Please let me know if there are any known issues specific to the Ubuntu kernel in this area.
If this is a known issue specific to the Ubuntu 16.04.4 release, has it already been addressed in the latest Ubuntu release?

Thanks in advance!

Sincerely,
Murthy Bhat

Call trace:
smartpqi 0000:92:00.0: removed 5:0:2:0 30000d1701cf2001 Direct-Access ATA ST1000NX0303 AIO+ qd=32
[ 1199.201495] Buffer I/O error on dev sdk, logical block 244190645, async page read
[ 1343.006515] sdn:
[ 1358.264503] INFO: task kworker/173:1:1620 blocked for more than 120 seconds.
[ 1358.271550] Tainted: G OE 4.13.0-36-generic #40~16.04.1-Ubuntu
[ 1358.278966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1358.286806] kworker/173:1 D 0 1620 2 0x00000000
[ 1358.286832] Workqueue: events pqi_rescan_worker [smartpqi]
[ 1358.286836] Call trace:
[ 1358.286844] [<ffff000008086068>] __switch_to+0x90/0xa8
[ 1358.286855] [<ffff000008a761c4>] __schedule+0x35c/0x8b0
[ 1358.286858] [<ffff000008a7674c>] schedule+0x34/0x90
[ 1358.286865] [<ffff0000084ccaa8>] blk_mq_freeze_queue_wait+0x70/0xd8
[ 1358.286869] [<ffff0000084cd230>] blk_mq_freeze_queue+0x28/0x38
[ 1358.286873] [<ffff0000084cd498>] blk_freeze_queue+0x20/0x30
[ 1358.286878] [<ffff0000084bd830>] blk_cleanup_queue+0x78/0x128
[ 1358.286883] [<ffff000008764f3c>] __scsi_remove_device+0x64/0x128
[ 1358.286884] [<ffff000008765030>] scsi_remove_device+0x30/0x48
[ 1358.286887] [<ffff000008765228>] scsi_remove_target+0x188/0x1c8
[ 1358.286899] [<ffff000000bcbc08>] sas_rphy_remove+0x78/0x98 [scsi_transport_sas]
[ 1358.286907] [<ffff000000bcdbd8>] sas_port_delete+0x38/0x160 [scsi_transport_sas]
[ 1358.286918] [<ffff00000141a32c>] pqi_free_sas_port+0x4c/0x88 [smartpqi]
[ 1358.286926] [<ffff00000141a6fc>] pqi_remove_sas_device+0x24/0x38 [smartpqi]
[ 1358.286933] [<ffff000001413b80>] pqi_update_device_list+0x520/0x818 [smartpqi]
[ 1358.286939] [<ffff000001415af8>] pqi_scan_scsi_devices+0x420/0x980 [smartpqi]
[ 1358.286946] [<ffff0000014160b4>] pqi_rescan_worker+0x24/0x30 [smartpqi]
[ 1358.286955] [<ffff0000080f3dfc>] process_one_work+0x14c/0x420
[ 1358.286957] [<ffff0000080f41fc>] worker_thread+0x12c/0x470
[ 1358.286961] [<ffff0000080fae80>] kthread+0x108/0x138
[ 1358.286963] [<ffff0000080838e0>] ret_from_fork+0x10/0x30
[ 1358.287045] INFO: task pain:6788 blocked for more than 120 seconds.
[ 1358.293326] Tainted: G OE 4.13.0-36-generic #40~16.04.1-Ubuntu
[ 1358.300730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1358.308567] pain D 0 6788 6633 0x00000000
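
For anyone triaging a longer dmesg capture, the hung-task reports above can be pulled out mechanically. The following is only a minimal sketch: the function name `find_hung_tasks` and the regular expression are illustrative assumptions based on the two "INFO: task ... blocked for more than" lines in this trace, not part of any kernel or Ubuntu tool.

```python
import re

# Pattern modelled on the lines in this report, for example:
# [ 1358.264503] INFO: task kworker/173:1:1620 blocked for more than 120 seconds.
# The task name may itself contain ':' (kworker/173:1), so the name group is
# greedy and the PID is taken as the last ':<digits>' before " blocked".
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return (task name, pid, timeout in seconds) for each hung-task report."""
    hits = []
    for line in dmesg_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


# The two report lines copied from the trace above.
sample = """\
[ 1358.264503] INFO: task kworker/173:1:1620 blocked for more than 120 seconds.
[ 1358.287045] INFO: task pain:6788 blocked for more than 120 seconds.
"""

for name, pid, secs in find_hung_tasks(sample):
    print(f"{name} (pid {pid}) blocked for more than {secs}s")
```

In this trace the first blocked task is the smartpqi rescan worker, waiting in blk_mq_freeze_queue_wait while the device is being removed.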

Tags: artful
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1781578

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: artful
Murthy Bhat (bhatmurt) wrote :

Due to the nature of the defect, we could not run the apport-collect command.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc5

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Murthy Bhat (bhatmurt) wrote :

Hi,
As you suggested, we upgraded the kernel as described at https://wiki.ubuntu.com/Kernel/MainlineBuilds?action=show&redirect=KernelMainlineBuilds and the OS boots the updated kernel properly on our Cavium server.
However, due to some issue with the aarch64 kernel packages, the 'build' directory is missing under /lib/modules/<kver>/ after the update, so we are unable to build our smartpqi driver to retest this.

Please suggest how to upgrade to the latest upstream kernel on aarch64 so that we can build our smartpqi storage driver.

We downloaded the following packages from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc5 and installed them.

linux-headers-4.18.0-041800rc5_4.18.0-041800rc5.201807152130_all.deb
linux-headers-4.18.0-041800rc5-generic_4.18.0-041800rc5.201807152130_arm64.deb
linux-image-4.18.0-041800rc5-generic_4.18.0-041800rc5.201807152130_arm64.deb
linux-modules-4.18.0-041800rc5-generic_4.18.0-041800rc5.201807152130_arm64.deb

Regards/Ram
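
A quick way to confirm the symptom described in this comment is to check whether /lib/modules/<kver>/build resolves to a directory for the running kernel. This is a minimal sketch only; the helper name `headers_build_dir` and its `modules_root` parameter are hypothetical and exist purely for illustration.

```python
import os
import platform


def headers_build_dir(kver=None, modules_root="/lib/modules"):
    """Return (path, exists) for the 'build' directory of a kernel version.

    kver defaults to the running kernel release, as `uname -r` would report.
    os.path.isdir follows symlinks, so a dangling 'build' symlink counts as
    missing, which is what an out-of-tree module build would also run into.
    """
    kver = kver or platform.release()
    path = os.path.join(modules_root, kver, "build")
    return path, os.path.isdir(path)


if __name__ == "__main__":
    path, exists = headers_build_dir()
    print(f"{path}: {'present' if exists else 'missing'}")
```

If the path is reported as missing, the headers needed to build an external module are not available at the location the comment describes.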

Murthy Bhat (bhatmurt) wrote :

FYI,

This issue was reported by our test team on the Ubuntu 16.04.4 LTS Hardware Enablement (HWE) kernel on a Cavium server.

Regards/Murthy Bhat

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Murthy Bhat (bhatmurt) wrote :

Hi,

Can someone help resolve this issue? It is not seen on the GA release.

Regards/Murthy Bhat

dann frazier (dannf) wrote :

The smartpqi driver is included in the linux-modules deb. If you are trying to build/use an out-of-tree smartpqi driver (which your taint flags suggest), please be clear about that. It could quite likely be a bug in that out-of-tree driver vs. a bug with Ubuntu.
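
For reference, the taint flags in the trace read "G OE". The sketch below decodes them with a deliberately partial table of well-known flag letters; the table and the function name `decode_taint` are illustrative assumptions, and the kernel's own tainted-kernels documentation remains the authoritative list.

```python
# Partial table of kernel taint flag letters (not the complete set).
TAINT_FLAGS = {
    "G": "no proprietary module loaded (all loaded modules are GPL-compatible)",
    "P": "proprietary module loaded",
    "O": "externally built (out-of-tree) module loaded",
    "E": "unsigned module loaded",
}


def decode_taint(flags):
    """Return (letter, meaning) pairs for a taint string such as 'G OE'."""
    decoded = []
    for letter in flags:
        if letter.isspace():
            continue  # unset flag positions are printed as spaces
        decoded.append((letter, TAINT_FLAGS.get(letter, "not in this partial table")))
    return decoded


for letter, meaning in decode_taint("G OE"):
    print(f"{letter}: {meaning}")
```

The "O" and "E" letters are what point to an externally built, unsigned module being loaded on the reporting system.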
