psb_gfx boot hang on Atom N2600 (GMA3600 Cedarview)

Bug #944929 reported by Jim Klein on 2012-03-02
This bug affects 6 people
Affects: linux (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

psb_gfx causes a kernel stall at bootup immediately after the framebuffer initializes. The screen goes black and the system becomes unresponsive. The usual suspects (disabling VT switching, console=tty1, nosplash, disabling plymouth, hard-setting the framebuffer mode) don't solve the problem. The problem also exists in mainline and vanilla upstream kernels with gma500_gfx, although gma500_gfx on 3.3-rc5 gets slightly further: it redraws the screen text after the framebuffer initializes, at a rate of about 1 line every 3 seconds. If you wait long enough, the kernel reports stalls as follows (on 3.3-rc5):

ioremap error for 0x3f675000-0x3f678000, requested 0x10, got 0x0
gma500 0000:00:02.0: GTT PCI BAR not initialized.
gma500 0000:00:02.0: GATT PCI BAR not initialized.
INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 2, t=15029 jiffies)

dmesg doesn't offer much insight:

Feb 17 09:33:25 system-2124d8 kernel: [ 176.194626] [drm] Initialized drm 1.1.0 20060810
Feb 17 09:33:25 system-2124d8 kernel: [ 176.226288] gma500 0000:00:02.0: setting latency timer to 64
Feb 17 09:33:25 system-2124d8 kernel: [ 176.228773] gma500 0000:00:02.0: irq 46 for MSI/MSI-X
Feb 17 09:33:25 system-2124d8 kernel: [ 176.229052] ioremap error for 0x3f675000-0x3f678000, requested 0x10, got 0x0
Feb 17 09:33:25 system-2124d8 kernel: [ 176.229415] gma500 0000:00:02.0: GTT PCI BAR not initialized.
Feb 17 09:33:25 system-2124d8 kernel: [ 176.229489] gma500 0000:00:02.0: GATT PCI BAR not initialized.
Feb 17 09:33:25 system-2124d8 kernel: [ 176.229556] Stolen memory information
Feb 17 09:33:25 system-2124d8 kernel: [ 176.229560] base in RAM: 0x3f800000
Feb 17 09:33:25 system-2124d8 kernel: [ 176.229565] size: 7932K, calculated by (GTT RAM base) - (Stolen base), seems wrong
Feb 17 09:33:25 system-2124d8 kernel: [ 176.229571] the correct size should be: 8M(dvmt mode=3)
Feb 17 09:33:25 system-2124d8 kernel: [ 176.232523] Set up 1983 stolen pages starting at 0x3f800000, GTT offset 0K
Feb 17 09:33:25 system-2124d8 kernel: [ 176.232700] [drm] SGX core id = 0x00000000
Feb 17 09:33:25 system-2124d8 kernel: [ 176.232705] [drm] SGX core rev major = 0x00, minor = 0x00
Feb 17 09:33:25 system-2124d8 kernel: [ 176.232710] [drm] SGX core rev maintenance = 0x00, designer = 0x00
Feb 17 09:33:25 system-2124d8 kernel: [ 176.251286] acpi device:1d: registered as cooling_device4
Feb 17 09:33:25 system-2124d8 kernel: [ 176.252745] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input10
Feb 17 09:33:25 system-2124d8 kernel: [ 176.253191] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
Feb 17 09:33:25 system-2124d8 kernel: [ 176.253375] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
Feb 17 09:33:25 system-2124d8 kernel: [ 176.253396] [drm] No driver support for vblank timestamp query.
Feb 17 09:33:26 system-2124d8 kernel: [ 177.064883] gma500 0000:00:02.0: allocated 1024x600 fb
Feb 17 09:33:26 system-2124d8 kernel: [ 177.065254] fbcon: psbfb (fb0) is primary device
Feb 17 09:34:57 system-2124d8 kernel: [ 227.887149] Console: switching to colour frame buffer device 128x37
Feb 17 09:34:57 system-2124d8 kernel: [ 267.849207] fb0: psbfb frame buffer device
Feb 17 09:34:57 system-2124d8 kernel: [ 267.849221] drm: registered panic notifier
Feb 17 09:34:57 system-2124d8 kernel: [ 267.849594] [drm] Initialized gma500 1.0.0 2011-06-06 for 0000:00:02.0 on minor 0
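
The stolen-memory complaint in the log above comes down to simple arithmetic: 1983 pages against the 8M expected for dvmt mode 3. A small sketch of that check, assuming 4 KiB pages as on i386 (the function name is made up for illustration and is not part of the driver):

```shell
# Compare the stolen-memory size implied by a page count with the
# size that is expected to be reserved.
# Assumption: 4 KiB pages, as on i386.
stolen_summary() {
    pages=$1
    expected_mib=$2
    page_kib=4
    actual_kib=$((pages * page_kib))
    expected_kib=$((expected_mib * 1024))
    shortfall_kib=$((expected_kib - actual_kib))
    echo "actual=${actual_kib}K expected=${expected_kib}K shortfall=${shortfall_kib}K ($((shortfall_kib / page_kib)) pages)"
}

# Values taken from the log: "Set up 1983 stolen pages", "8M(dvmt mode=3)"
stolen_summary 1983 8
```

With the values from the log this prints `actual=7932K expected=8192K shortfall=260K (65 pages)`, which matches the 7932K the driver reports. It says nothing about why the 65 pages are missing.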

drm.debug looks like this before the stall:

[ 3.902363] Refined TSC clocksource calibration: 1595.999 MHz.
[ 3.902515] Switching to clocksource tsc
[ 4.392088] gma500 0000:00:02.0: allocated 1024x600 fb
[ 4.392357] fbcon: psbfb (fb0) is primary device
[ 44.751817] Console: switching to colour frame buffer device 128x37
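
The stall shows up as a jump of roughly 40 seconds between the fbcon line and the console switch. One way to spot such jumps in a longer log is to look for the largest gap between consecutive kernel timestamps. This is only a sketch: `largest_gap` is a hypothetical helper, and it assumes the usual `[ seconds.micros]` timestamp prefix is present on each line.

```shell
# Print the largest gap (in seconds) between consecutive kernel log
# timestamps read from stdin, followed by the line that came after it.
# Lines without a "[ 123.456789]" style timestamp are ignored.
largest_gap() {
    awk '
        match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the brackets; "+ 0" forces a numeric conversion.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen && ts - prev > max) { max = ts - prev; line = $0 }
            prev = ts
            seen = 1
        }
        END { if (seen) printf "%.3f s before: %s\n", max, line }
    '
}

# Usage: dmesg | largest_gap
```

It also works on syslog-style lines such as the ones from /var/log above, because it picks up the first bracketed timestamp on each line.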

Likely to be a big problem, as N2600 netbooks/devices are beginning to flood into the market.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-17-generic-pae 3.2.0-17.27
ProcVersionSignature: Ubuntu 3.2.0-17.27-generic-pae 3.2.6
Uname: Linux 3.2.0-17-generic-pae i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC271X Analog [ALC271X Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
ApportVersion: 1.93-0ubuntu2
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC271X Analog [ALC271X Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: user 1592 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0x46100000 irq 45'
   Mixer name : 'Realtek ALC271X'
   Components : 'HDA:80862880,80860101,00100000 HDA:10ec0269,1025058f,00100100'
   Controls : 20
   Simple ctrls : 11
Date: Fri Mar 2 08:05:43 2012
HibernationDevice: RESUME=UUID=1aec9a0a-14e2-4f24-8e2b-fdaf4efe1552
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Beta i386 (20120301)
MachineType: Gateway LT40
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-17-generic-pae root=UUID=9b5d2d21-d044-4abf-ad4c-6ff3c2eedf58 ro quiet
PulseSinks:
 Error: command ['pacmd', 'list-sinks'] failed with exit code 1: Home directory /home/user not ours.
 No PulseAudio daemon running, or not running as session daemon.
PulseSources:
 Error: command ['pacmd', 'list-sources'] failed with exit code 1: Home directory /home/user not ours.
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-17-generic-pae N/A
 linux-backports-modules-3.2.0-17-generic-pae N/A
 linux-firmware 1.71
SourcePackage: linux
StagingDrivers: rts_pstor
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/12/2012
dmi.bios.vendor: Insyde Corp.
dmi.bios.version: V1.04
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: SJE01_CT
dmi.board.vendor: Gateway
dmi.board.version: Base Board Version
dmi.chassis.type: 10
dmi.chassis.vendor: Chassis Manufacturer
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnInsydeCorp.:bvrV1.04:bd01/12/2012:svnGateway:pnLT40:pvrV1.04:rvnGateway:rnSJE01_CT:rvrBaseBoardVersion:cvnChassisManufacturer:ct10:cvrChassisVersion:
dmi.product.name: LT40
dmi.product.version: V1.04
dmi.sys.vendor: Gateway

Jim Klein (jklein) wrote :
Brad Figg (brad-figg) on 2012-03-02
Changed in linux (Ubuntu):
status: New → Confirmed
Jim Klein (jklein) wrote :

Also, the standard installer won't work because of the hang. The only way to get installed is to use the alternate installer, then boot from something else and blacklist psb_gfx.
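
For anyone needing the same workaround, blacklisting can be sketched roughly as follows. It assumes the installed system's root filesystem is mounted somewhere reachable from the rescue environment (the /mnt in the usage line is only an example), and `blacklist_module` is a hypothetical helper name. If the module is also packed into the initramfs, regenerating the initramfs may be needed as well.

```shell
# Write a modprobe blacklist entry for a module.
#   $1: module name
#   $2: optional root of the installed system (empty means the running system)
# Prints the path of the file it wrote.
blacklist_module() {
    module=$1
    root=${2:-}
    conf="$root/etc/modprobe.d/blacklist-$module.conf"
    mkdir -p "$root/etc/modprobe.d" &&
        printf 'blacklist %s\n' "$module" > "$conf" &&
        printf '%s\n' "$conf"
}

# Usage (as root, with the installed system mounted at /mnt):
#   blacklist_module psb_gfx /mnt
```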

Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report at bugzilla.kernel.org [1]? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

If you are comfortable with opening a bug upstream, it would be great if you could report the upstream bug number back in this bug report. That will allow us to link this bug to the upstream report.

[1] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
importance: Undecided → Medium
importance: Medium → High
Jim Klein (jklein) wrote :

We now have this mostly worked out. Alan put together the attached patch upstream, and the gma500_gfx driver now loads properly. Only remaining problem appears to be resume from suspend - still looking into that. Works pretty well with the new xorg modesetting (fallback) driver at http://xorg.freedesktop.org/archive/individual/driver/ - no disappearing cursor, external ports are visible, etc. New xorg driver is a vast improvement over traditional fallback and should be part of Precise.

Jim Klein (jklein) wrote :

Tested against Gateway/Acer and HP so far, but suspect will work with all of them.

tags: added: patch
Jim Klein (jklein) wrote :

Resume hang is the xorg-modesetting driver (drat!). fbdev/vesa work, but just barely. Gonna need a good xorg driver for this one.

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-18.28
Jim Klein (jklein) on 2012-03-06
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Jim Klein (jklein) wrote :

FYI, I applied the above patch to the psb_gfx driver in staging on the default 3.2.0-18 kernel for precise and can confirm that it resolves the issue.

Jim Klein (jklein) wrote :

Scratch that - it loads, but there are still issues. Looks like a backport of the 3.3 version will be necessary.

Joseph Salisbury (jsalisbury) wrote :

@Jim

You commented that a backport of the 3.3 version will be necessary. Is there a patch available for v3.3 that resolves this bug, which is different than the patch in comment #4?

tags: added: kernel-da-key kernel-key
Jim Klein (jklein) wrote :

The patch above is for 3.3. As of today it has been applied upstream. I attempted to backport the driver myself, but alas I am not a hardware guy. I can get it to build and load, but it causes a kernel oops when trying to set up the framebuffer (psb_gtt_free_range in framebuffer.c, if you're interested).

Joseph Salisbury (jsalisbury) wrote :

Yes, I see the patch in Linus' mainline tree with commit:
055bf38d3d6069707e2d555cffdde629b8404ff2

I'll see if this has been queued up for linux-stable yet.

Joseph Salisbury (jsalisbury) wrote :

I didn't see <email address hidden> on the Cc list for the patch. I replied to the patch submission on LKML to see if it will be submitted to the stable kernel.

In the meantime, can you test the v3.3-rc6 kernel[0] to confirm it resolves the bug?

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3-rc6-precise/

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Jim Klein (jklein) wrote :

I'll poke at it some more, but would probably disable the cedarview build option (for now) on 3.2 (CONFIG_DRM_PSB_CDV). Device works with vesafb and fbdev, just suffers from no access to external ports and a little pointer flicker.
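
To see whether a given kernel was built with the Cedarview option, the build config can be inspected. The sketch below assumes the config is available as a plain text file (on Ubuntu typically /boot/config-<version>); `config_state` is a hypothetical helper name.

```shell
# Report how a kernel option is set in a given config file.
# Prints the value after "OPTION=" (usually y or m), or "not set" when
# there is no such assignment.
config_state() {
    option=$1
    file=$2
    value=$(sed -n "s/^$option=\(.*\)\$/\1/p" "$file" | head -n 1)
    printf '%s\n' "${value:-not set}"
}

# Usage: config_state CONFIG_DRM_PSB_CDV "/boot/config-$(uname -r)"
```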

Hi Jim,

Based on comment #8 it sounds like backporting upstream commit 055bf38d3d6069707e2d555cffdde629b8404ff2 to the Precise Ubuntu kernel at least got you past the boot hang issue:

  commit 055bf38d3d6069707e2d555cffdde629b8404ff2
  Author: Alan Cox <email address hidden>
  Date: Mon Mar 5 14:22:16 2012 +0000

    drm, gma500: Fix Cedarview boot failures in 3.3-rc

Assuming this is correct, I've built a Precise test kernel with commit 055bf38d3d6069707e2d555cffdde629b8404ff2 backported. I've then disabled CONFIG_STUB_POULSBO so that the psb_gfx driver is loaded by default for devices which are supported by both drivers (this is more for bug 899244). However, if this patch helps you out, I want to get users from bug 899244 to test as well, to ensure there are no nasty side effects for others. If you could please test this kernel for me, it would be much appreciated.

http://people.canonical.com/~ogasawara/lp944929/i386/

If this at least gets you past the boot issue, I can investigate providing a full backported driver via our linux-backports-modules package. Thanks.

Jim Klein (jklein) wrote :

Thanks Leann, but I tried that already (see comment 11). Yours is doing the same thing as mine, i.e. throwing a kernel oops "Unable to handle kernel NULL pointer dereference" from psb_driver_load->psb_fbdev_init. Right now I'm trying to track down the null pointer.

Jim Klein (jklein) wrote :

OK, apply this patch to 3.2.0-18.28 and Cedarview will work, and by work I mean it loads OK, creates the framebuffer OK, and works in X with the fbdev driver. It will still need a proper X driver for full operation, and to expose any bugs I might have missed.

Robert Hooker (sarvatt) wrote :

This looks like a mashup of many commits from 3.3, which normally would be a huge maintenance burden, but these changes are limited to Cedarview-specific files (the code is duplicated out into each specific machine generation), so it won't impact the other gma500-using drivers that do work properly. The only changes not specific to Cedarview are:

diff --git a/drivers/staging/gma500/framebuffer.c b/drivers/staging/gma500/framebuffer.c
index 3f39a37..07c1af8 100644
--- a/drivers/staging/gma500/framebuffer.c
+++ b/drivers/staging/gma500/framebuffer.c
@@ -244,7 +244,6 @@ static struct fb_ops psbfb_roll_ops = {
        .fb_imageblit = cfb_imageblit,
        .fb_pan_display = psbfb_pan,
        .fb_mmap = psbfb_mmap,
-       .fb_sync = psbfb_sync,
        .fb_ioctl = psbfb_ioctl,
 };

which is covered by this changelog note saying it's safe to do on other GPUs:

 - fix a bug where gtt roll mode called psbfb_sync, which tries to sync
   the 2D engine. On other devices it is harmless as the 2D engine is
   present but not in use when in gtt roll mode, on Cedartrail it causes
   a hang

as well as converting 2 dev_err to dev_dbg to make them less verbose. Given that GMA500 most likely won't receive updates in 3.2.x stable releases because it moved out of staging and changed quite a bit, the burden seems minimal.

Tim Gardner (timg-tpi) wrote :

If these are indeed a mashup of 3.3 patches, then I need provenance for each patch involved, e.g., the list of commits and their respective Signed-off-by signatures, etc.
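
One rough way to collect that kind of provenance is to reduce `git log` output to a commit ID plus its sign-offs. The sketch below assumes git's default log format (a `commit <sha>` line followed by an indented message); `summarise_signoffs` is a hypothetical helper, and the range and path in the usage line are placeholders to be filled in.

```shell
# Reduce default-format "git log" output on stdin to one line per
# commit ID, each followed by its indented Signed-off-by lines.
summarise_signoffs() {
    awk '
        /^commit [0-9a-f]+/  { print $2 }
        /^ *Signed-off-by:/  { sub(/^ */, "  "); print }
    '
}

# Usage: git log <range> -- <path> | summarise_signoffs
```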

tags: removed: kernel-key

Asus EEE PC 1225c with Intel GMA3600, based on Intel Atom N2600. Ubuntu 12.10 Beta 2 does not boot on this netbook. I get a black screen when Xorg starts. I can't switch to tty1-6 and I can't see the Xorg logs.

thunderamur (thunderamur) wrote :

No version of Ubuntu 12.10 starts on Intel Atom N2800 (gma3650), either as a live session or after installation. The installation itself completes without trouble.

tags: added: bios-outdated-v1.09 bot-stop-nagging needs-upstream-testing
removed: kernel-request-3.2.0-18.28

Jim Klein, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

If reproducible, could you also please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc5

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired