Boot wedges on 4.10.0-22 but not on -21

Bug #1696665 reported by Chris West
This bug affects 4 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  High        Unassigned

Bug Description

After a kernel update, the machine wedges during late boot.

Symptoms:

 * lightdm never starts, because services that must start before it, such as gpu-manager, are wedged
 * gpu-manager uses 100% cpu
 * sudo / ip / anything that uses netlink hangs forever and can't be ^C'd
 * can't reboot; processes like network-manager are wedged and unkillable
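
Processes that hang and ignore ^C are typically in uninterruptible sleep (state D), which is the state SysRq's blocked-state dump reports. A minimal sketch for listing such processes from /proc, assuming a shell is still usable on the wedged machine; the function names `parse_state` and `blocked_tasks` are made up for this example:

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, tail = stat_line.rsplit(")", 1)
    pid, comm = head.split(" (", 1)
    state = tail.split()[0]
    return int(pid), comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_state(fh.read())
        except (OSError, ValueError):
            continue  # the process exited, or the line was not parseable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `blocked_tasks()` a few seconds apart and comparing the results separates tasks that are briefly waiting on I/O from ones that are stuck for good.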

I'm running a relatively standard Zesty install, with a couple of extra packages (e.g. official Docker). Laptop is a generic unbranded desktop-replacement from https://pcspecialist.co.uk/.

My guess at the problem is these blocked kernel threads:
[ 39.105233] sysrq: SysRq : Show Blocked State
[ 39.105248] task PC stack pid father
[ 39.105265] kworker/0:1 D 0 67 2 0x00000000
[ 39.105270] Workqueue: kec_query acpi_ec_event_processor
[ 39.105271] Call Trace:
[ 39.105273] __schedule+0x233/0x6f0
[ 39.105274] schedule+0x36/0x80
[ 39.105275] schedule_timeout+0x22a/0x3f0
[ 39.105276] ? del_timer_sync+0x48/0x50
[ 39.105277] ? schedule_timeout+0x1e7/0x3f0
[ 39.105277] __down_timeout+0x7d/0xd0
[ 39.105278] down_timeout+0x4c/0x60
[ 39.105280] acpi_os_wait_semaphore+0x56/0x70
[ 39.105281] acpi_ut_acquire_mutex+0x46/0x80
[ 39.105282] acpi_ns_get_node+0x28/0x58
[ 39.105283] acpi_ns_evaluate+0x52/0x252
[ 39.105284] acpi_evaluate_object+0x148/0x258
[ 39.105285] acpi_ec_event_processor+0x6e/0xa0
[ 39.105286] process_one_work+0x1fc/0x4b0
[ 39.105287] worker_thread+0x4b/0x500
[ 39.105288] kthread+0x109/0x140
[ 39.105289] ? process_one_work+0x4b0/0x4b0
[ 39.105290] ? kthread_create_on_node+0x60/0x60
[ 39.105291] ret_from_fork+0x2c/0x40

Googlin' this stack trace gets basically nothing.
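
For comparing dumps across boots or kernels, the blocked-state output can be reduced to one record per task. This is a rough sketch written against the line layout shown above only; the regular expressions and the `parse_blocked` name are assumptions for this example, and other kernel versions may format the dump differently:

```python
import re

PREFIX = r"^\[\s*[\d.]+\]\s+"

# Task header, e.g. "[ 39.105265] kworker/0:1 D 0 67 2 0x00000000"
TASK_RE = re.compile(
    PREFIX + r"(?P<name>\S+)\s+(?P<state>[A-Z])\s+\d+\s+"
    r"(?P<pid>\d+)\s+(?P<father>\d+)\s+0x[0-9a-f]+$"
)
# Workqueue line, e.g. "[ 39.105270] Workqueue: kec_query acpi_ec_event_processor"
WQ_RE = re.compile(PREFIX + r"Workqueue:\s+(?P<wq>.+)$")
# Stack frame, e.g. "[ 39.105273] __schedule+0x233/0x6f0"; a leading "? "
# marks a frame the unwinder is not sure about.
FRAME_RE = re.compile(
    PREFIX + r"(?P<unsure>\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+$"
)


def parse_blocked(dump):
    """Return a list of dicts: name, pid, workqueue, frames (reliable only)."""
    tasks = []
    for raw in dump.splitlines():
        line = raw.strip()
        m = TASK_RE.match(line)
        if m:
            tasks.append({"name": m.group("name"), "pid": int(m.group("pid")),
                          "workqueue": None, "frames": []})
            continue
        if not tasks:
            continue  # still in the dump header
        m = WQ_RE.match(line)
        if m:
            tasks[-1]["workqueue"] = m.group("wq")
            continue
        m = FRAME_RE.match(line)
        if m and not m.group("unsure"):
            tasks[-1]["frames"].append(m.group("sym"))
    return tasks
```

Dropping the "?" frames keeps only the call chain the unwinder was confident about, which for the trace above runs from `__schedule` down through `acpi_ut_acquire_mutex` to `ret_from_fork`.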

I have had no other problems (no weird hangs, ...) with this machine on previous kernels. The binary nVidia driver does cause issues, but I don't use it.

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-4.10.0-22-generic 4.10.0-22.24
ProcVersionSignature: Ubuntu 4.10.0-21.23-generic 4.10.11
Uname: Linux 4.10.0-21-generic x86_64
ApportVersion: 2.20.4-0ubuntu4.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: faux 2840 F.... pulseaudio
Date: Thu Jun 8 08:16:01 2017
HibernationDevice: RESUME=UUID=90cfcad3-decf-4ee3-acee-22da8e067a01
InstallationDate: Installed on 2017-02-24 (103 days ago)
InstallationMedia: Ubuntu 16.10 "Yakkety Yak" - Release amd64 (20161012.2)
MachineType: Notebook N85_N87,HJ,HJ1,HK1
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.10.0-21-generic.efi.signed root=/dev/mapper/ubuntu--vg-root ro
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-21-generic N/A
 linux-backports-modules-4.10.0-21-generic N/A
 linux-firmware 1.164.1
SourcePackage: linux
UpgradeStatus: Upgraded to zesty on 2017-04-26 (42 days ago)
dmi.bios.date: 01/23/2017
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1.05.07
dmi.board.asset.tag: Tag 12345
dmi.board.name: N85_N87,HJ,HJ1,HK1
dmi.board.vendor: Notebook
dmi.board.version: Not Applicable
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: Notebook
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1.05.07:bd01/23/2017:svnNotebook:pnN85_N87,HJ,HJ1,HK1:pvrNotApplicable:rvnNotebook:rnN85_N87,HJ,HJ1,HK1:rvrNotApplicable:cvnNotebook:ct10:cvrN/A:
dmi.product.name: N85_N87,HJ,HJ1,HK1
dmi.product.version: Not Applicable
dmi.sys.vendor: Notebook

Chris West (faux) wrote :
Changed in linux (Ubuntu):
importance: Undecided → High

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Chris West (faux) wrote :

I tried the Ubuntu mainline builds. The bug happens on mainline 4.10.12 (and 4.10.13, 4.10.15, 4.10.17, ...), but not on 4.10.11. This tallies well with the Ubuntu versions, where 4.10.11 comes before -22 and 4.10.12 comes after.

The failure on 4.10.12-mainline is attached.

Linux version 4.10.12-041012-generic (kernel@gloin) (gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12) ) #201704210512 SMP

Chris West (faux) wrote :

Bisecting. I'm testing each commit with a single boot only, because the hang appears 100% reproducible. Famous last words.

# bad: [0077e0558cc53950b2d5cd9458641443161ca7db] x86/signals: Fix lower/upper bound reporting in compat siginfo
# good: [c0cf63ef356bab81f71fd9e3a00a9c731f9ec680] UBUNTU: SAUCE: PCI: Apply the new generic I/O management on PCI IO hosts

~25 commits remaining.

Chris West (faux) wrote :

It's somewhere in:
* e8f09b159634 - (refs/bisect/bad) drm/fb-helper: Allow var->x/yres(_virtual) < fb->width/height again (3 weeks ago) <Michel Dänzer>
* 62dd80f54582 - drm/etnaviv: fix missing unlock on error in etnaviv_gpu_submit() (3 weeks ago) <Wei Yongjun>
* 1611347ef50a - (HEAD) drm/nouveau: initial support (display-only) for GP107 (3 weeks ago) <Ben Skeggs>
* e0cc39c8a55c - (refs/bisect/good-e0cc39c8a55cd1cb51260678910d9bb1042cf5cd) drm/nouveau/kms/nv50: fix double dma_fence_put() when destroying plane state (3 weeks ago) <Ben Skeggs>

Rather predictably, the machine has a GP107 card: a mobile 1050.
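
The narrowing that git bisect performs is a binary search over the commit range. The sketch below reproduces it for the four commits listed above. `first_bad` and `hangs_on_boot` are hypothetical names, and treating the GP107 commit as the first bad one is an assumption based on the hardware match, not a completed bisect result:

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    commits is ordered oldest first; commits[0] is known good and
    commits[-1] is known bad. Returns (first bad commit, tests run).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


# Oldest first; hashes taken from the list above.
COMMITS = ["e0cc39c8a55c", "1611347ef50a", "62dd80f54582", "e8f09b159634"]

# Assumed outcome for illustration: the hang starts with the GP107 commit,
# so that commit and everything after it would fail to boot.
ASSUMED_BAD = {"1611347ef50a", "62dd80f54582", "e8f09b159634"}


def hangs_on_boot(commit):
    """Stand-in for building the kernel at `commit` and booting it once."""
    return commit in ASSUMED_BAD


if __name__ == "__main__":
    print(first_bad(COMMITS, hangs_on_boot))
```

Each test halves the candidate range, so a range of about 25 commits needs roughly five boots, which is why a single boot per step is tempting when the hang reproduces every time.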

Blacklisting nouveau fixes the hang (copied from SO).

Append to /etc/modprobe.d/blacklist.conf:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

$ sudo update-initramfs -u
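
After rebooting, it is worth confirming that the blacklist took effect. A small sketch that checks /proc/modules, where the first whitespace-separated field of each line is the module name; `loaded_modules` and `nouveau_loaded` are names invented for this example:

```python
def loaded_modules(modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {line.split()[0] for line in modules_text.splitlines() if line.strip()}


def nouveau_loaded(path="/proc/modules"):
    """True if the nouveau module is currently loaded."""
    with open(path) as fh:
        return "nouveau" in loaded_modules(fh.read())
```

If `nouveau_loaded()` still returns True after the reboot, the initramfs may not have been regenerated for the kernel that was actually booted.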
