user space process hangs when reading partition table of disk

Bug #1740309 reported by Andreas Pokorny on 2017-12-27
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

User-space processes that read partition tables, such as parted, fdisk, or update-grub2, freeze on my system, which has three disks:
/dev/sda: GPT disk with EFI, swap, and Ubuntu artful partitions
/dev/sdb: old-style (DOS) partition table with one ext4 partition
/dev/sdc: GPT disk with an EFI partition and Windows 10

For example, if I run as root:
# fdisk -l
Disk /dev/sda: 232,9 GiB, 250059350016 bytes, 488397168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 56FF6AB0-076F-483C-890F-EC4B4FAF6313

Device Start End Sectors Size Type
/dev/sda1 2048 2000895 1998848 976M EFI System
/dev/sda2 2000896 10000383 7999488 3,8G Linux swap
/dev/sda3 10000384 488396799 478396416 228,1G Linux filesystem

Disk /dev/sdb: 489,1 GiB, 525112713216 bytes, 1025610768 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x2f11f9c4

Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 1025609727 1025607680 489G 83 Linux
<...process hangs here forever...>

With gdb I stepped through fdisk -l and saw that the call to open('/dev/sdc', ..) never returns.
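
A minimal sketch for checking which device node blocks in open() without freezing the calling shell, assuming Python 3 on Linux. The helper name probe_open and the 5-second default timeout are illustrative choices, not part of any existing tool. The open() runs in a child process; if that child is stuck in uninterruptible sleep inside the kernel, kill() will not take effect until the call returns, so the sketch reports the hang and deliberately does not wait for the child.

```python
import subprocess
import sys

# Code executed by the child: open the path read-only, then close it again.
_CHILD_CODE = "import os, sys; os.close(os.open(sys.argv[1], os.O_RDONLY))"


def probe_open(path, timeout=5.0):
    """Try to open `path` read-only in a child process.

    Returns "ok" if the open succeeded, "error" if it failed
    (missing node, no permission, ...), and "hang" if the child
    did not finish within `timeout` seconds.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD_CODE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a task blocked inside the kernel ignores
        # signals until the blocking call returns.
        child.kill()
        return "hang"
    return "ok" if returncode == 0 else "error"
```

Going by the output above, calling probe_open("/dev/sdc") as root on the affected system would presumably report "hang", while /dev/sda and /dev/sdb would report "ok".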
--
ApportVersion: 2.20.7-0ubuntu3.6
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/pcmC0D3p: andreas 1807 F...m pulseaudio
 /dev/snd/controlC0: andreas 1807 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 17.10
HibernationDevice: RESUME=UUID=8a7a6077-0a3e-4641-8f15-bd6f09826510
InstallationDate: Installed on 2017-12-26 (1 days ago)
InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20171018)
IwConfig:
 lo no wireless extensions.

 enp8s0 no wireless extensions.

 enp12s0 no wireless extensions.
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
Package: linux-image-4.13.0-21-generic 4.13.0-21.24
PackageArchitecture: amd64
ProcFB:
 0 radeondrmfb
 1 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-21-generic.efi.signed root=UUID=4f18ad67-a4f0-4e03-b061-66615751538e ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 4.13.0-21.24-generic 4.13.13
RelatedPackageVersions:
 linux-restricted-modules-4.13.0-21-generic N/A
 linux-backports-modules-4.13.0-21-generic N/A
 linux-firmware 1.169.1
RfKill:

Tags: artful
Uname: Linux 4.13.0-21-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 12/09/2013
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P1.80
dmi.board.name: EP2C602
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP1.80:bd12/09/2013:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnEP2C602:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

summary: - user space process when reading partition table from disk
+ user space process hangs when reading partition table from disk
summary: - user space process hangs when reading partition table from disk
+ user space process hangs when reading partition table of disk

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1740309

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: artful

apport information

tags: added: apport-collected
description: updated

description: updated

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
description: updated
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.15 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc6/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete

The problem still exists with 4.15.0-041500rc6-generic

Earlier versions were also affected. I initially suspected faulty hardware and, for months, did not care about the contents of the disks. I only returned to the problem when I finally replaced the hard disk. The problem occurred at least as far back as 4.12.

Besides 4.15.0-041500rc6-generic and 4.13 the problem occurs on:

4.12.14-041214-generic #201709200843 SMP Wed Sep 20 12:46:23 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

but it did not occur on:
4.4.109-0404109-generic #201801022112 SMP Tue Jan 2 21:13:34 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
4.10.0-041000-generic #201702191831 SMP Sun Feb 19 23:33:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
4.11.12-041112-generic #201707210350 SMP Fri Jul 21 07:53:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I will jump through the 4.12 builds tomorrow.

I made a mistake: the problem also occurs on 4.12.0-041200-generic #201707022031.
So the regression was introduced somewhere between 4.11.12 and 4.12.0.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

Please find the first v4.12-rc* kernel that has this problem.

It already happens with: 4.12.0-041200rc1-generic #201705131731

Kai-Heng Feng (kaihengfeng) wrote :

A bisection between v4.11 and v4.12-rc1 is needed.

Here are the steps:
$ sudo apt build-dep linux
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git bisect start
$ git bisect good v4.11
$ git bisect bad v4.12-rc1
$ make localmodconfig
$ make -j`nproc` deb-pkg
Install the newly built kernel.
If the issue still happens,
$ git bisect bad
Otherwise,
$ git bisect good
Repeat from "make -j`nproc` deb-pkg" until you find the commit that causes the regression.
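
The build-and-test loop above is a binary search over the commits between the known-good and known-bad tags, so the number of kernels to build grows roughly with log2 of the number of commits in that range. A simplified sketch of the idea in Python, assuming a linear history (real kernel history is a graph, which git bisect handles itself). The names first_bad and is_bad are hypothetical; is_bad stands in for "build, install, boot, and check whether fdisk -l hangs".

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of tests run).

    `commits` is ordered oldest to newest; commits[0] is known good
    and commits[-1] is known bad, as with `git bisect good/bad`.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression is after mid
    return commits[bad], tests


# Toy run: 1000 numbered commits, regression introduced at commit 637.
culprit, tests = first_bad(list(range(1000)), lambda c: c >= 637)
print(culprit, tests)
```

Each test halves the remaining range, which is why even a full merge window can be narrowed to a single commit in a manageable number of builds.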

It looks like this change caused it. It also seems to cause a null pointer error on startup, which I had thought was unrelated.

909657615d9b3ce709be4fd95b9a9e8c8c7c2be6 is the first bad commit
commit 909657615d9b3ce709be4fd95b9a9e8c8c7c2be6
Author: Christoph Hellwig <email address hidden>
Date: Thu Apr 6 15:36:32 2017 +0200

    scsi: libsas: allow async aborts

    We now first try to call ->eh_abort_handler from a work queue, but libsas
    was always failing that for no good reason. Allow async aborts.

    Reviewed-by: Johannes Thumshirn <email address hidden>
    Reviewed-by: Hannes Reinecke <email address hidden>
    Signed-off-by: Christoph Hellwig <email address hidden>
    Signed-off-by: Martin K. Petersen <email address hidden>

:040000 040000 b15f1eb57bf74667aefb7a304fa09bf58386eaf2 64df540499763db44c7e1b6ec9a55ffa9b2ebedc M drivers

The change removes an early exit that would protect a call to lldd_abort_task...

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 9bd55bce83af..ee6b39a1db69 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -491,9 +491,6 @@ int sas_eh_abort_handler(struct scsi_cmnd *cmd)
        struct Scsi_Host *host = cmd->device->host;
        struct sas_internal *i = to_sas_internal(host->transportt);

-       if (current != host->ehandler)
-               return FAILED;
-
        if (!i->dft->lldd_abort_task)
                return FAILED;
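
To make the effect of the removed early exit concrete, here is a toy model in Python, not kernel code. It only mirrors the two checks visible in the hunk above. Everything else is an assumption for illustration: the Host class, the with_guard switch, the thread labels "scsi_eh_0" and "kworker", and the reduction of the driver hook's result to a boolean.

```python
FAILED, SUCCESS = "FAILED", "SUCCESS"


class Host:
    """Toy stand-in for the host plus its driver callbacks (assumption)."""

    def __init__(self, ehandler, lldd_abort_task=None):
        self.ehandler = ehandler                # the error-handler thread
        self.lldd_abort_task = lldd_abort_task  # low-level driver hook


def sas_eh_abort_handler_model(current, host, task, with_guard):
    # Early exit removed by the commit: only the error-handler thread
    # was allowed to go on and reach the driver hook.
    if with_guard and current != host.ehandler:
        return FAILED
    if host.lldd_abort_task is None:
        return FAILED
    # Assumption: the hook's result is simplified to True/False here.
    return SUCCESS if host.lldd_abort_task(task) else FAILED


calls = []
host = Host("scsi_eh_0", lambda task: calls.append(task) or True)

# Before the commit: an abort from a work queue never reaches the driver.
print(sas_eh_abort_handler_model("kworker", host, "task", with_guard=True), calls)
# After the commit: the same abort now calls lldd_abort_task.
print(sas_eh_abort_handler_model("kworker", host, "task", with_guard=False), calls)
```

In this model the only behavioural difference is that the driver's abort hook becomes reachable from outside the error-handler thread, which matches the observation above; whether that is what makes open() hang is not established here.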
