System hard locks with Lucid

Bug #535572 reported by Ancoron Luziferis
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm currently using Kubuntu Lucid Alpha 3 with the latest updates applied and the more recent xorg drivers from the xorg-edgers PPA on Launchpad. This machine has an AMD/ATI Radeon HD4770 (RV740), and only the 2.6.33 kernel has the initialization fix needed to run this system at all (vesa could be used otherwise, but that is a waste of resources).

The system comes up in UMS only (KMS is still broken with this card). It runs fine for a while and then hard locks at some point.

There is nothing in the logs. If I weren't running the clock on my Logitech G15 keyboard's LCD, I wouldn't know when it locked up.

I monitor the system to rule out failing hardware.

Besides those hard locks, I discovered some other oddities:

- The KVM kernel modules (kvm, kvm-amd) don't load at boot time (not even when forcing them through /etc/modules)
- g15daemon starts but then just vanishes; the LCD is black after that
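A quick way to narrow down the KVM oddity is to check whether the CPU advertises AMD-V (the `svm` flag) and whether `kvm_amd` actually ended up in the module list. The sketch below assumes a POSIX shell with grep; `cpu_has_svm` and `kvm_amd_loaded` are hypothetical helper names, and both accept a file argument so they can also be run against saved copies such as the ProcCpuinfo.txt and ProcModules.txt apport attachments.

```sh
#!/bin/sh
# Hypothetical helpers (not part of any package) for checking why kvm/kvm-amd
# are missing. Each takes an optional file so saved copies can be inspected.

# Succeeds if the CPU advertises AMD-V, i.e. the whole word "svm" appears.
cpu_has_svm() {
    grep -qw svm "${1:-/proc/cpuinfo}"
}

# Succeeds if the kvm_amd module is loaded (module names use underscores here).
kvm_amd_loaded() {
    grep -q '^kvm_amd ' "${1:-/proc/modules}"
}
```

If `cpu_has_svm` succeeds but `kvm_amd_loaded` fails, running `sudo modprobe kvm-amd` followed by `dmesg | tail` usually shows the reason; a "kvm: disabled by bios" line means SVM is switched off in the firmware setup, which the `svm` flag alone does not reveal.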

Hardware:
- AMD Phenom II X4 955 (3.2GHz)
- 2x 2GB DDR3 RAM (1333MHz)
- Gigabyte GA-790XTA-UD4 motherboard
- AMD/ATI Radeon HD4770 (RV740)
- Dell WFP3007 30" LCD (2560x1600)
- 2x 500GB (Hitachi Deskstar T7K500, no RAID)

I've even updated the BIOS in the hope that it would fix the issue.

$ lsb_release -rd
Description: Ubuntu lucid (development branch)
Release: 10.04
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/dsp', '/dev/snd/by-path', '/dev/snd/controlC1', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D3p', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1c', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D2c', '/dev/snd/seq', '/dev/snd/timer', '/dev/sequencer2', '/dev/sequencer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xfe024000 irq 16'
   Mixer name : 'Realtek ALC889'
   Components : 'HDA:10ec0889,1458a102,00100004'
   Controls : 41
   Simple ctrls : 23
Card1.Amixer.info:
 Card hw:1 'HDMI'/'HDA ATI HDMI at 0xfdffc000 irq 19'
   Mixer name : 'ATI R6xx HDMI'
   Components : 'HDA:1002aa01,00aa0100,00100100'
   Controls : 4
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [off]
DistroRelease: Ubuntu 10.04
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=81ddb40a-ad96-4059-8bba-e18c013d5c48
InstallationMedia: Kubuntu 10.04 "Lucid Lynx" - Alpha amd64 (20100225)
MachineType: Gigabyte Technology Co., Ltd. GA-790XTA-UD4
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-22-generic root=UUID=aac3cf7c-104c-4af2-930a-b848744a59c0 ro crashkernel=384M-2G:64M,2G-:128M quiet splash radeon.modeset=0
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-22.33-generic 2.6.32.11+drm33.2
Regression: No
RelatedPackageVersions: linux-firmware 1.34
Reproducible: Yes
RfKill:

Tags: lucid filesystem needs-upstream-testing
Uname: Linux 2.6.32-22-generic x86_64
UserGroups:

dmi.bios.date: 12/14/2009
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F3a
dmi.board.name: GA-790XTA-UD4
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF3a:bd12/14/2009:svnGigabyteTechnologyCo.,Ltd.:pnGA-790XTA-UD4:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-790XTA-UD4:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: GA-790XTA-UD4
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Revision history for this message
Ancoron Luziferis (ancoron) wrote :
Revision history for this message
Ancoron Luziferis (ancoron) wrote :
Revision history for this message
Ancoron Luziferis (ancoron) wrote :
Revision history for this message
Ancoron Luziferis (ancoron) wrote :
Revision history for this message
Ancoron Luziferis (ancoron) wrote :
Revision history for this message
Ancoron Luziferis (ancoron) wrote :

This shows that the last cron job ran before the hard lock occurred at Mar 10 01:38 (the time at which the G15's LCD clock froze).

Revision history for this message
Ancoron Luziferis (ancoron) wrote :

...and nothing suspicious in the daemon.log

tags: added: kernel-series-unknown
Revision history for this message
Ancoron Luziferis (ancoron) wrote :

I switched over to the Lucid kernel 2.6.32-16 as I saw that the RV740 initialization fix was backported.

With that kernel the system hard locks too!

Something else strange happens with both kernels that might be related:

After the system has been idle for a while, it stops updating the screen whenever the mouse isn't moving and I'm not typing. It even freezes (visually) while playing a video that doesn't consume much CPU time.

I first thought this might be a graphics driver issue, but then noticed that the monitoring graphs do not "jump" when I wake the frozen screen up again by typing or moving the mouse. So the whole chain, from gathering the data up to displaying it in an applet, stops. It even happens when running something in a terminal:

$ while [ 0 ]; do oldtime=$time; time=`date +'%s.%N'`; echo "($time - $oldtime - 1) * 1000" | bc -l | xargs printf "[$time] missed %1.0f ms\n"; sleep 1; done
[1268384756.719304336] missed 1268384755 ms
[1268384760.956376208] missed 3237 ms
[1268384761.969785663] missed 13 ms
[1268384762.989047073] missed 19 ms
[1268384764.008941132] missed 20 ms
[1268384765.027888570] missed 19 ms
[1268384766.047799420] missed 20 ms
[1268384767.067276339] missed 19 ms
[1268384768.085794982] missed 19 ms
[1268384769.105963537] missed 20 ms
[1268384774.316241301] missed 4210 ms
[1268384775.333372425] missed 17 ms
[1268384776.352990352] missed 20 ms
[1268384777.372331300] missed 19 ms
[1268384778.391534041] missed 19 ms
[1268384779.411323055] missed 20 ms
[1268384780.430166883] missed 19 ms
[1268384790.774276584] missed 9344 ms
[1268384791.795092061] missed 21 ms
[1268384794.283809211] missed 1489 ms
[1268384795.303478736] missed 20 ms
[1268384797.004297712] missed 701 ms
[1268384798.019775137] missed 15 ms
[1268384799.038297034] missed 19 ms
[1268384800.056771282] missed 18 ms
[1268384801.937260241] missed 880 ms
[1268384802.954018928] missed 17 ms
[1268384821.527875748] missed 17574 ms
[1268384822.544634286] missed 17 ms
[1268384823.564795546] missed 20 ms
[1268384824.583885167] missed 19 ms
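Each "missed" value is the wall-clock gap between two consecutive `date +'%s.%N'` samples minus the expected 1 s of `sleep`, converted to milliseconds. The sketch below reproduces that arithmetic as a hypothetical `missed_ms` function, using awk instead of the bc/printf pipeline of the one-liner (awk works in double precision, which is ample for millisecond resolution here).

```sh
#!/bin/sh
# missed_ms PREV CUR  (hypothetical helper)
# PREV and CUR are "seconds.nanoseconds" strings as printed by date +'%s.%N'.
# Prints (CUR - PREV - 1) * 1000 rounded to whole milliseconds, i.e. how much
# longer than the intended 1 s the loop iteration took.
missed_ms() {
    awk -v prev="$1" -v cur="$2" 'BEGIN { printf "%.0f\n", (cur - prev - 1) * 1000 }'
}
```

For example, `missed_ms 1268384769.105963537 1268384774.316241301` prints 4210, matching the 4210 ms gap in the log above.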

This does not occur when the system is under heavy load. I just ran a test with iozone:

$ iozone -e -i0 -i1 -i2 -i8 -s1g -r128k -t2 -F /tmp/ioz1.tmp /tmp/ioz2.tmp

During the test no freeze occurs as long as there is heavy disk activity or CPU usage, although there are some lags in between:

[1268385201.647888700] missed 376064 ms
[1268385202.667718263] missed 20 ms
[1268385203.686453300] missed 19 ms
[1268385204.705197525] missed 19 ms
[1268385205.723523933] missed 18 ms
[1268385206.741247016] missed 18 ms
[1268385207.759106707] missed 18 ms
[1268385208.777791422] missed 19 ms
[1268385209.796496832] missed 19 ms
[1268385210.820466739] missed 24 ms
[1268385211.837558364] missed 17 ms
[1268385212.858761885] missed 21 ms
[1268385214.097645350] missed 239 ms
[1268385215.115377250] missed 18 ms
[1268385216.134211222] missed 19 ms
[1268385217.154853358] missed 21 ms
[1268385218.175325240] missed 20 ms
[1268385219.195318476] missed 20 ms
[1268385220.215492131] missed 20 ms
[1268385221.234281344] missed 19 ms
[1268385222.256452693] missed 22 ms
[1268385223.276511999...


Revision history for this message
Ancoron Luziferis (ancoron) wrote :

$ uname -a
Linux ladydeath 2.6.32-16-generic #25-Ubuntu SMP Tue Mar 9 16:33:12 UTC 2010 x86_64 GNU/Linux

$ cat /proc/version_signature
Ubuntu 2.6.32-16.25-generic

Revision history for this message
Ancoron Luziferis (ancoron) wrote :

I'm currently using XFCE4 to see whether the hard locks depend on something KDE runs (though I doubt it). At least the system ran for about 10 hours sitting at the KDM login screen, with nobody logged in, and it didn't lock up.

Usually the system locks up within a few hours whether I'm actually using it or not.

P.S.: The time lags I captured previously don't affect the Logitech G15 keyboard clock (which is also driven in software, by g15daemon), so when I come back to the machine I can immediately see whether the system has locked up.

Revision history for this message
Ancoron Luziferis (ancoron) wrote :

Another interesting thing: even when I'm working in a terminal, fine-tuning a long command line and, e.g., holding down one of the arrow keys, the screen updates stop, and even the key events stop.

Revision history for this message
Ancoron Luziferis (ancoron) wrote :

Strange... I experience those visual freezes all the time, but I have now been using XFCE4 for more than 24 hours, doing the same things as with KDE4 (even the same applications, e.g. gwenview, dolphin, kopete, krdc, ...), and haven't had a hard lock yet.

I'll update once more and switch back to KDE4 to see if it is solved. This makes me think it has to do with KDE4's power management (PowerDevil), as PowerDevil is disabled when another power manager (xfce4-power-manager) is in place. So if the freeze still occurs, I'll try whether disabling PowerDevil resolves the issue.

In that case, although the freeze may be triggered by PowerDevil, it may originate at some lower level.

Revision history for this message
Ancoron Luziferis (ancoron) wrote :

I just got the hard lock again with XFCE4. So it's not related to some PowerDevil thing.

Revision history for this message
Ancoron Luziferis (ancoron) wrote :

Strange...

No freeze with Cool'n'Quiet disabled in the BIOS, although now I don't have CPU frequency scaling available, so the system is (of course) much louder when idling than it needs to be.

Revision history for this message
Ancoron Luziferis (ancoron) wrote :

Well, the machine now seems to run a little longer than before, but one update and a reboot later the hard lock is back, even with C&Q disabled and with a "fix" for the big time lags ("acpi_skip_timer_override" on the kernel command line).

This time it locked up within minutes while working at it.

Could someone kindly suggest something else to try besides what I've already done?

Ancoron
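The reporter added "acpi_skip_timer_override" to the kernel command line. On a GRUB 2 based install such as Lucid, a persistent parameter is normally appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `sudo update-grub` and a reboot. To confirm after rebooting that the parameter really reached the kernel, a check along the lines of the hypothetical `has_boot_param` function below can be used; it is a sketch assuming a POSIX shell, and it also accepts a saved command line as a file.

```sh
#!/bin/sh
# has_boot_param NAME [FILE]  (hypothetical helper)
# Succeeds if NAME appears on the kernel command line either as a bare flag
# ("quiet") or as "NAME=value" ("radeon.modeset=0"). FILE defaults to
# /proc/cmdline; NAME is matched literally, not as a pattern.
has_boot_param() {
    for word in $(cat "${2:-/proc/cmdline}"); do
        case "$word" in
            "$1" | "$1"=*) return 0 ;;
        esac
    done
    return 1
}
```

For example, `has_boot_param acpi_skip_timer_override && echo active` prints "active" only if the running kernel was booted with that option.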

tags: added: kernel-bug
removed: kernel-series-unknown
tags: added: kernel-series-unknown
summary: - System hard locks with 2.6.33 mainline
+ System hard locks with Lucid
Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Hi Ancoron,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 535572

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Ancoron Luziferis (ancoron) wrote :

For some time now the system has been running fairly stably. I enabled Cool'n'Quiet in the BIOS but disabled C1E support.

If I enable C1E support, I immediately get those big lags back (independent of the kernel: I tested 2.6.32 up to 2.6.34), and I presume the machine would lock up again after some time. With C1E disabled, my script produces the following, much nicer output:

...
[2010-05-28 07:05:21.500] missed 27 ms
[2010-05-28 07:05:22.528] missed 28 ms
[2010-05-28 07:05:23.555] missed 27 ms
[2010-05-28 07:05:24.583] missed 28 ms
[2010-05-28 07:05:25.610] missed 27 ms
[2010-05-28 07:05:26.637] missed 27 ms
[2010-05-28 07:05:27.665] missed 28 ms
[2010-05-28 07:05:28.693] missed 27 ms
[2010-05-28 07:05:29.720] missed 27 ms
[2010-05-28 07:05:30.747] missed 27 ms
[2010-05-28 07:05:31.774] missed 27 ms
[2010-05-28 07:05:32.802] missed 28 ms
[2010-05-28 07:05:33.829] missed 27 ms
[2010-05-28 07:05:34.858] missed 29 ms
[2010-05-28 07:05:35.885] missed 27 ms
[2010-05-28 07:05:36.913] missed 27 ms
...

So the "lags" can now be considered stable (26-29 ms). But C1E support is valuable for power management, so this is still an issue and probably will be for Ubuntu 10.10.
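To compare runs with C1E enabled and disabled without reading the logs line by line, the output of the ticks script can be summarized. The sketch below is a hypothetical `summarize_lags` function (awk-based; the 100 ms default threshold is an arbitrary assumption). It skips the first sample, which has no valid predecessor, and reports the number of samples, the minimum and maximum lag, and how many samples exceeded the threshold.

```sh
#!/bin/sh
# summarize_lags LOGFILE [THRESHOLD_MS]  (hypothetical helper)
# Reads lines such as "[2010-05-28 07:05:21.500] missed 27 ms" and prints
# the sample count, min/max lag and how many lags exceeded the threshold.
summarize_lags() {
    awk -v limit="${2:-100}" '
        BEGIN { limit += 0 }
        / missed -?[0-9]+ ms$/ {
            # the first sample has no valid predecessor, so ignore it
            if (!skipped) { skipped = 1; next }
            ms = $(NF - 1) + 0
            if (n == 0 || ms < min) min = ms
            if (n == 0 || ms > max) max = ms
            if (ms > limit) over++
            n++
        }
        END { printf "samples=%d min=%d max=%d over_%dms=%d\n", n, min, max, limit, over }
    ' "$1"
}
```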

Changed in linux (Ubuntu):
status: Expired → Incomplete
tags: removed: kj-expired
Revision history for this message
Ancoron Luziferis (ancoron) wrote : AlsaDevices.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Ancoron Luziferis (ancoron) wrote : AplayDevices.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : ArecordDevices.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : BootDmesg.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : Card0.Amixer.values.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : Card0.Codecs.codec.0.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : Card1.Codecs.codec.0.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : IwConfig.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : Lspci.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : Lsusb.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : PciMultimedia.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : ProcModules.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : UdevDb.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : UdevLog.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote : WifiSyslog.txt

apport information

Revision history for this message
Ancoron Luziferis (ancoron) wrote :

The apport-collect data above is from the Lucid kernel running with C1E enabled.

Revision history for this message
Charlie Daly (cdaly) wrote :

I just got a hard lock in Lucid. I am not sure if it's related. I booted from another disk (same system otherwise) so that logs would be unaffected.

uname -a
Linux lucid-8g 2.6.32-22-generic #36-Ubuntu SMP Thu Jun 3 22:02:19 UTC 2010 i686 GNU/Linux

The computer is a Toshiba Satellite T110 with 4 GB of RAM. It had been working fine for about a week. The setup was fairly standard except that I have a LUKS-encrypted SDHC card mounted as /home.
At the time I also had a LUKS-encrypted USB disk attached and was copying files. All file activity seemed to stop and the system froze: the clock stopped updating, and I couldn't get a text console using Ctrl+Alt+F1.

Revision history for this message
Charlie Daly (cdaly) wrote :
Revision history for this message
Charlie Daly (cdaly) wrote :
Revision history for this message
Ancoron Luziferis (ancoron) wrote :

Hi Charlie,

Your hardware setup seems to differ a lot from mine. I presume the problem I reported here is AMD-related (the CPU, or some newer hardware on the board whose support was incomplete in kernel 2.6.32).

I'm currently using 2.6.35-rc3, and since 2.6.34 I haven't experienced any more freezes.

However, to verify that you have the same problem I had, please put the following into a script and run it in a terminal (let it run until you experience the freeze again, then upload the resulting log here):

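#!/bin/sh

# Log file named after the start time, e.g. ticks-2010-06-20-12-00-00.log
logfile="ticks-`date +'%F-%H-%M-%S'`.log"

# time starts at 0, so the first logged value is meaningless
time=0

# Every second, log how many ms longer than the expected 1 s the last iteration took
while true; do
 oldtime=$time
 time=`date +'%s.%N'`
 # split the timestamp into whole seconds and the first 3 fractional digits (ms)
 sec=`echo "$time" | sed -e 's/^\([0-9]\+\)\..*$/\1/g'`
 ms=`echo "$time" | sed -e 's/^.*\.\([0-9]\{3\}\).*$/\1/g'`
 pretty="`date --date "1970-01-01 $sec sec" +'%F %T'`.$ms"
 echo "($time - $oldtime - 1) * 1000" | bc -l | xargs printf "[$pretty] missed %1.0f ms\n" | tee -a "$logfile"
 sleep 1
done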

Revision history for this message
Charlie Daly (cdaly) wrote :

Hi Ancoron,
I guess you're right that it is unrelated. For one thing, my freeze was a one-off; I haven't managed to reproduce it. I use Ubuntu a lot, but I'm not really au fait with the kernel and therefore am not sure how to find out more about the bug in order to report it properly.

Assuming that it was file system related and it affected the kernel, it is likely to be a file system driver that is at fault.

In any event, I ran your script for about 90 minutes and had no ill effects.

Revision history for this message
Ancoron Luziferis (ancoron) wrote :

If your freeze happened during file system activity, was the system swapping heavily?

I ask because we have another bug here: #561210

Although its summary says otherwise, the main cause of those freezes seems to be heavy swapping, so your freeze could be related to that too.

To test that, you could make your system consume all its memory (until it is already swapping a few MiB) and then start a file copy job like the one you ran before. If the system then freezes, the mouse pointer stutters, or the system becomes less responsive, chances are good that it is your issue too.
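To tell whether the machine is actually swapping while reproducing this, `vmstat 1` (watch the si/so columns) is one option. The hypothetical helper below is a minimal alternative sketch that reads /proc/meminfo (where values are given in kB) and prints the amount of swap in use, in MiB.

```sh
#!/bin/sh
# swap_used_mib [FILE]  (hypothetical helper)
# Prints SwapTotal - SwapFree from /proc/meminfo (or FILE), in whole MiB.
swap_used_mib() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { printf "%d\n", (total - free) / 1024 }' "${1:-/proc/meminfo}"
}
```

Running it in a loop while the copy job runs, e.g. `while true; do swap_used_mib; sleep 5; done`, shows whether swap usage keeps growing.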

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired