Precise only boots ~1 in 10 times with the NVIDIA driver enabled

Bug #1185340 reported by Karl Trygve Kalleberg
This bug affects 1 person
Affects: nvidia-graphics-drivers (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Release: Ubuntu 12.04.2 LTS
Package: nvidia-current 304.88-0ubuntu0.0.2
Kernel: 3.2.0-44-generic

I installed Precise from scratch on a ThinkPad T420s with NVIDIA Optimus. The BIOS is configured to "Discrete graphics", i.e. Optimus is turned off and the NVIDIA card is the only video card visible from Linux.

On bootup, approximately 9 times out of 10, the machine locks up after GRUB, before I get the disk decryption password prompt. In about 1 out of 10 boots, I get to the password prompt, and the machine works as expected afterwards.

If I set the acpi=off kernel parameter, booting always seems to work, but only one CPU core is activated.

When booting in recovery mode, I notice that the lockup usually happens in this stretch of kernel initialization:

[ 2.026423] sd 0:0:0:0: [sda] Attached SCSI disk
[ 2.036151] usb 1-1: new high-speed USB device number 2 using ehci_hcd
[ 2.168956] hub 1-1:1.0: USB hub found
[ 2.169240] hub 1-1:1.0: 6 ports detected
[ 2.279887] usb 2-1: new high-speed USB device number 2 using ehci_hcd
[ 2.351789] ata2: SATA link down (SStatus 0 SControl 300)
[ 2.412641] hub 2-1:1.0: USB hub found
[ 2.412923] hub 2-1:1.0: 8 ports detected
[ 2.483714] usb 1-1.3: new full-speed USB device number 3 using ehci_hcd
[ 2.647636] usb 1-1.6: new high-speed USB device number 4 using ehci_hcd
[ 2.679364] ata5: SATA link down (SStatus 0 SControl 300)
[ 2.680465] Freeing unused kernel memory: 924k freed
[ 2.680629] Write protecting the kernel read-only data: 12288k
[ 2.683768] Freeing unused kernel memory: 1596k freed
[ 2.686126] Freeing unused kernel memory: 1188k freed
[ 2.697599] udevd[108]: starting version 175
[ 2.712918] e1000e: Intel(R) PRO/1000 Network Driver - 1.5.1-k
[ 2.713006] e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
[ 2.713122] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 2.713219] e1000e 0000:00:19.0: setting latency timer to 64
[ 2.713326] e1000e 0000:00:19.0: irq 48 for MSI/MSI-X
[ 2.716222] sdhci: Secure Digital Host Controller Interface driver
[ 2.716312] sdhci: Copyright(c) Pierre Ossman
[ 2.716828] sdhci-pci 0000:05:00.0: SDHCI controller found [1180:e823] (rev 4)
[ 2.717097] sdhci-pci 0000:05:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 2.717524] sdhci-pci 0000:05:00.0: setting latency timer to 64
[ 2.717545] mmc0: no vmmc regulator found
[ 2.717789] Registered led device: mmc0::
[ 2.719020] mmc0: SDHCI controller on PCI [0000:05:00.0] using DMA
[ 2.729653] acpi device:0a: registered as cooling_device4
[ 2.729856] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:09/LNXVIDEO:01/input/input4
[ 2.730052] ACPI: Video Device [VID1] (multi-head: yes rom: yes post: no)
[ 2.815419] usb 2-1.4: new high-speed USB device number 3 using ehci_hcd
[ 2.902710] e1000e 0000:00:19.0: eth0: (PCI Express:2.5GT/s:Width x1) f0:de:f1:c2:bb:67
[ 2.902821] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
[ 2.902941] e1000e 0000:00:19.0: eth0: MAC: 10, PHY: 11, PBA No: 1000FF-0FF
[ 8.462440] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[ 8.839798] udevd[400]: starting version 175

After observing a few hundred boot lockups, it appears that the lockup can happen after the USB subsystem has started its initialization, after the e1000e network card initialization, or after the MMC initialization. For all I know, these things run in parallel, and somebody (NVIDIA?) is not holding the appropriate locks, thus interfering with the other initialization code.
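To compare many captured boots, the last messages before a stall could be located mechanically. The following is a minimal sketch, assuming each boot log has been saved as text in the same "[ seconds] message" format as the excerpt above; the function names are hypothetical. Note that in a successful boot, the largest gap may simply be the time spent waiting at the password prompt.

```python
import re

# Matches kernel log lines such as "[ 2.026423] sd 0:0:0:0: [sda] Attached SCSI disk".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_kernel_log(lines):
    """Return (timestamp_seconds, message) for each line with a numeric timestamp."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def largest_gaps(entries, top=3):
    """Return the `top` largest gaps as (gap_seconds, message_before, message_after)."""
    gaps = [
        (later[0] - earlier[0], earlier[1], later[1])
        for earlier, later in zip(entries, entries[1:])
    ]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]


# A few lines taken from the excerpt above.
sample = [
    "[ 2.902821] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection",
    "[ 2.902941] e1000e 0000:00:19.0: eth0: MAC: 10, PHY: 11, PBA No: 1000FF-0FF",
    "[ 8.462440] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)",
    "[ 8.839798] udevd[400]: starting version 175",
]

for gap, before, after in largest_gaps(parse_kernel_log(sample), top=1):
    print(f"{gap:.3f}s between:\n  {before}\n  {after}")
```

Running this over logs from several locked-up boots would show whether the final message before the stall is consistently from USB, e1000e, or MMC initialization.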

The lockup mostly results in a state where the kernel complains about USB reset errors approximately every second:

[ timestamp] hub 1-1:1.0: hub_port_status failed (err = -100)
[ timestamp] hub 1-1.5:1.0: hub_port_status failed (err = -100)
[ timestamp+1sec ] hub 1-1:1.0: cannot reset port 6 (err = -110)
[ timestamp+2sec ] hub 1-1:1.0: cannot reset port 6 (err = -110)
.. etc
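For reference, the error numbers in these messages are negated Linux errno values. The sketch below tallies them per hub and attaches symbolic names; it assumes the generic Linux errno numbering (100 is ENETDOWN, 110 is ETIMEDOUT), and the function name is hypothetical. It only decodes the numbers and says nothing about the underlying cause.

```python
import re
from collections import Counter

# Symbolic names for the two codes seen above, per the generic Linux errno table.
# Hardcoded here (an assumption of Linux numbering) rather than read from the host OS.
LINUX_ERRNO_NAMES = {100: "ENETDOWN", 110: "ETIMEDOUT"}

# Matches e.g. "hub 1-1:1.0: cannot reset port 6 (err = -110)".
HUB_ERR_RE = re.compile(r"hub (\S+): .*\(err = (-?\d+)\)")


def summarize_hub_errors(lines):
    """Count (hub, errno_name) pairs found in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = HUB_ERR_RE.search(line)
        if match:
            hub = match.group(1)
            code = abs(int(match.group(2)))
            name = LINUX_ERRNO_NAMES.get(code, f"errno {code}")
            counts[(hub, name)] += 1
    return counts


sample = [
    "[ timestamp] hub 1-1:1.0: hub_port_status failed (err = -100)",
    "[ timestamp] hub 1-1.5:1.0: hub_port_status failed (err = -100)",
    "[ timestamp+1sec ] hub 1-1:1.0: cannot reset port 6 (err = -110)",
    "[ timestamp+2sec ] hub 1-1:1.0: cannot reset port 6 (err = -110)",
]

for (hub, name), count in sorted(summarize_hub_errors(sample).items()):
    print(f"hub {hub}: {name} x{count}")
```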

This problem is not really new; I've experienced it for a long while, but the likelihood of a successful boot seems to have dropped a lot (from 1 in 2-3 to 1 in 10) after updating to kernel 3.2.0-44 and the Ubuntu driver package nvidia-current 304.88-0ubuntu0.0.2.

(If I boot the machine using the Intel graphics card, this never happens.)

Tags: precise
bugbot (bugbot)
tags: added: precise