Unusable Slowness In 2.6.38-8

Reported by Chad Hogg on 2011-06-06
This bug affects 5 people
Affects          Status    Importance   Assigned to   Milestone
Linux            Invalid   High
linux (Ubuntu)             Medium       Unassigned

Bug Description

After upgrading from 10.10 to 11.04, I found that my system ran several orders of magnitude more slowly. For example, as part of my debugging process I removed and reinstalled the proprietary nVidia drivers. Running `sudo aptitude remove nvidia-current` in a gnome-terminal required 15 minutes of wall-clock time. During this, I had a second terminal open with `top` running, and it showed that throughout this 15-minute ordeal aptitude or one of its subprocesses (dpkg, man-db, etc.) was pegging the CPU at 90% or greater utilization. Starting a web browser to file this report took about 10 minutes, then another 2-3 just to render the simple page. Characters appear in this box multiple seconds after I type them. Etc.

This is the case with the Unity shell and with Ubuntu Classic without visual effects (or whatever it is called). It happens with nvidia, nv, and nouveau. However, if I boot into my old kernel (2.6.35-28), then I get the same responsive system that I had been used to.

I was able to find very few similar-sounding reports on the web, and all of them are from people using the same family of hardware: the N51 line of laptops by ASUS. See, for example, http://ubuntuforums.org/showthread.php?t=1743324

I imagine it will be rather difficult to debug a hardware-specific problem without having access to that hardware, and I can get by using the old kernel, but I will be happy to perform any tests or provide any further information that might be deemed useful.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image (not installed)
ProcVersionSignature: Ubuntu 2.6.38-8.42-generic 2.6.38.2
Uname: Linux 2.6.38-8-generic x86_64
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC269 Analog [ALC269 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: chad 1545 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xfbff8000 irq 50'
   Mixer name : 'Realtek ALC269'
   Components : 'HDA:10ec0269,10431593,00100004'
   Controls : 13
   Simple ctrls : 8
Card1.Amixer.info:
 Card hw:1 'NVidia'/'HDA NVidia at 0xfde7c000 irq 16'
   Mixer name : 'Nvidia GPU 0a HDMI/DP'
   Components : 'HDA:10de000a,10de0101,00100100'
   Controls : 16
   Simple ctrls : 4
Date: Mon Jun 6 05:30:36 2011
HibernationDevice: RESUME=UUID=b13a5503-fdf3-47a9-b949-bb74412a45f6
MachineType: ASUSTeK Computer Inc. N51Vn
ProcEnviron:
 LANGUAGE=en_US:en
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: root=UUID=9fff8217-4b9c-4bf3-92b9-8285350e3feb ro quiet splash
RelatedPackageVersions:
 linux-restricted-modules-2.6.38-8-generic N/A
 linux-backports-modules-2.6.38-8-generic N/A
 linux-firmware 1.52
SourcePackage: linux
UpgradeStatus: Upgraded to natty on 2011-06-05 (0 days ago)
WpaSupplicantLog:

dmi.bios.date: 06/12/2009
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 211
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: N51Vn
dmi.board.vendor: ASUSTeK Computer Inc.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK Computer Inc.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr211:bd06/12/2009:svnASUSTeKComputerInc.:pnN51Vn:pvr1.0:rvnASUSTeKComputerInc.:rnN51Vn:rvr1.0:cvnASUSTeKComputerInc.:ct10:cvr1.0:
dmi.product.name: N51Vn
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK Computer Inc.

Brad Figg (brad-figg) on 2011-06-06
Changed in linux (Ubuntu):
status: New → Confirmed
Chad Hogg (chadhogg) wrote :

I ran a test of my own devising, in which I wrote a trivial C program that calculates the square of a number using a quadratic algorithm, built it, and ran it under both the old and new kernels. Here are the results:

[2.6.35-28 kernel]
chad@siga-asus:~/temp/testing$ time ./a.out 20000
The square of 20000, calculated very slowly, is 400000000

real 0m1.045s
user 0m1.040s
sys 0m0.000s

[2.6.38-8 kernel]
chad@siga-asus:~/temp/testing$ time ./a.out 20000
The square of 20000, calculated very slowly, is 400000000

real 0m53.264s
user 0m49.130s
sys 0m3.100s

Since the "real" time is roughly the sum of "user" and "sys", this seems to confirm that the CPU is the problem, rather than memory or I/O transfers. More interestingly, the "user" time dominates the "sys" time, which seems to indicate that it is not just system calls that are running at a glacial pace.
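
The source of the test program was not posted. As a rough illustration only, here is a Python sketch of the same idea (the original was written in C; the function names `slow_square` and `timed` are hypothetical). It computes a square with two nested loops and reports wall-clock, user, and system time, the same three figures that `time` prints.

```python
import os
import time


def slow_square(n):
    """Compute n*n by counting one unit at a time (deliberately quadratic)."""
    total = 0
    for _ in range(n):
        for _ in range(n):
            total += 1
    return total


def timed(func, *args):
    """Run func(*args) and return (result, real, user, sys) in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    result = func(*args)
    wall_after = time.perf_counter()
    cpu_after = os.times()
    real = wall_after - wall_before
    user = cpu_after.user - cpu_before.user
    sys_time = cpu_after.system - cpu_before.system
    return result, real, user, sys_time


if __name__ == "__main__":
    n = 2000  # far smaller than 20000, because pure Python is much slower than C
    result, real, user, sys_time = timed(slow_square, n)
    print(f"The square of {n}, calculated very slowly, is {result}")
    print(f"real {real:.3f}s  user {user:.3f}s  sys {sys_time:.3f}s")
```

When "user" plus "sys" accounts for nearly all of "real", the process spent its time computing rather than waiting. In the figures posted above, 49.130 s + 3.100 s = 52.230 s of the 53.264 s elapsed under 2.6.38-8, and the run took roughly 51 times as long as under 2.6.35-28.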

Chad Hogg (chadhogg) wrote :

I installed the 2.6.38-02063808.201106040910_amd64 mainline kernel following instructions from https://wiki.ubuntu.com/Kernel/MainlineBuilds/, and found that the bug persists there as well. Following that, I performed a manual binary-ish search between the newest kernel known to work and the oldest kernel known to have this behavior.

v2.6.38.8-natty (2.6.38-02063808.201106040910) fails
v2.6.37.1-natty (2.6.37-02063701.201102181135) fails
v2.6.36.1-natty (2.6.36-02063601.201011231330) fails
v2.6.35-maverick (2.6.35-020635) succeeds

So I would be inclined to start looking at changes that were made between the 2.6.35 kernel and the 2.6.36 kernel, except for one thing: the 2.6.36-* and later mainline kernels were configured with natty in mind, while the 2.6.35-* kernels were configured with maverick in mind. I suspect that the change in configuration is more likely to blame for my problems than a change in the kernel itself.

Is there a way I can apply the maverick configuration to the 2.6.36 kernel or the natty configuration to the 2.6.35 kernel to test this?
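
The manual search described in this comment amounts to a binary search over an ordered list of builds. A minimal sketch, assuming a strictly linear ordering and a hypothetical `is_bad` callback that stands in for booting a kernel and checking whether it is slow:

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad() is true.

    Assumes versions are ordered oldest to newest, the first one is good,
    the last one is bad, and the bug never goes away once introduced.
    """
    lo, hi = 0, len(versions) - 1  # versions[lo] is good, versions[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Results as reported in this comment, oldest first (True means "fails").
reported = {
    "v2.6.35-maverick": False,
    "v2.6.36.1-natty": True,
    "v2.6.37.1-natty": True,
    "v2.6.38.8-natty": True,
}
versions = list(reported)
print(first_bad(versions, lambda v: reported[v]))  # v2.6.36.1-natty
```

The sketch only narrows the problem down to a pair of adjacent builds; it cannot tell a code change from a configuration change, which is the open question raised above.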

John Johansen (jjohansen) wrote :

I have a bisect kernel for you to test. Whether or not this one succeeds, there will be several more kernels to test to narrow down the problem.

Using
  dpkg -i
please install the test kernel from
  http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.1_amd64.deb

and report back whether or not this fixes the bug for you.

Chad Hogg (chadhogg) wrote :

That kernel panics at boot time. It does not write anything to /var/log, so I am attempting to transcribe the last part of the screendump here:

RIP [<ffffffff8114c76e>] kmem_cache_alloc+0x6e/0x140
 RSP <ffff88012db6fe08>
---[ end trace 0005c10720098185 ]---
general protection fault: 0000 [#8] SMP
last sysfs file: /sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input2/name
CPU 0
Modules linked in: nouveau ttm drm_kms_helper i2c_algo_bit parport_pc ppdev snd_hda_codec_nvhdmi snd_hda_codec_realtek snd_hda_intel joydev snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi uvcvideo snd_seq_midi_event snd_seq r852 snd_timer videodev sm_common btusb v4l1_compat v4l2_compat_ioctl32 snd_seq_device bluetooth nand nand_ids nand_ecc mtd psmouse serio_raw video output snd asus_laptop sparse_keymap soundcore snd_page_alloc lp parport ahci firewire_ohci sdhci_pci sdhci libahci firewire_core crc_itu_t atl1c

Pid: 910, comm: upstart-socket- Tainted: G D 2.6.39-2-generic #7 N51Vn /N51Vn
RIP: 0010:[<ffffffff8114c76e>] [<ffffffff8114c76e>] kmem_cache_alloc+0x6e/0x140
RSP: 0018:ffff88012db95d68 EFLAGS: 00010002
RAX: ffff88000a011d50 RBX: 68355737741a6fdc RCX: ffffffff8115d15a
RDX: 0000000000000000 RSI: 00000000000080d0 RDI: ffffffff81a847b0
RBP: ffff88012db95da8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000002 R11: 0000000000000000 R12: ffffffff8a847b0
R13: 00000000000080d0 R14: 00000000000080d0 R15: 0000000000000246
FS: 00007fcade6c1720(0000) GS:ffff88000a000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcadd929090 CR3: 00000012d16d000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000400
Process upstart-socket- (pid: 910, threadinfo ffff88012db94000, task ffff88012e2fc4d0)
Stack:
 ffff88012e2fc4d0 ffffffff8115d15a 000000009d662d853 0000000000000001
<0> ffff88012dbfe900 ffff88012db95e28 ffff88013875b00 ffff88012e2fc4d0
<0> ffff88012db95dc8 ffffffff8115d15a 0000000000008000 0000000000000000
Call Trace:
 [<ffffffff8115d15a>] ? get_empty_filp+0x7a/0x170
 [<ffffffff8115d15a>] get_empty_filp+0x7a/0x170
 [<ffffffff8116a343>] do_filp_open+0x163/0x5f0
 [<ffffffff8110682a>] ? unlock_page+0x2a/0x40
 [<ffffffff815a25de>] ? _raw_spin_lock+0xe/0x20
 [<ffffffff811755ea>] ? alloc_fd+0x10a/0x150
 [<ffffffff81159c15>] do_sys_open+0x65/0x120
 [<ffffffff81159d10>] sys_open+0x20/0x30
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 89 c7 fa 66 0f 1f 44 00 00 65 48 8b 14 25 80 eb 00 00 49 8b 04 24 48 8d 04 02 48 8b 18 48 85 db 0f 84 7e 00 00 00 49 63 54 24 18 <48> 8b 14 13 48 89 10 4c 89 ff 57 9d 0f 1f 44 00 00 48 85 db 75
RIP [<ffffffff8114c76e>] kmem_cache_alloc+0x6e/0x140
 RSP <ffff88012db95d68>
---[ end trace 005c10720098186 ]---
Call Trace:
 [<ffffffff8159f68f>] panic+0x91/0x1a1

Chad Hogg (chadhogg) wrote :

I am pleased to report that this one boots and does *NOT* exhibit the bug.

Chad Hogg (chadhogg) wrote :

The third test kernel is bad (boots, but exhibits the bug).

Chad Hogg (chadhogg) wrote :

Test #4 is GOOD. (Well, except that neither it nor test #2 seem to recognize my wireless networking hardware.)

John Johansen (jjohansen) wrote :

5th test kernel (sorry these are taking so long; there are a lot of build failures and git bisect skips)
  http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.5_amd64.deb

Chad Hogg (chadhogg) wrote :

Test #5 is GOOD. (With the same caveat.) I understand completely; if most commits did not need to be skipped, I would have continued doing this myself. (Are there no consequences to breaking the build other than the piles of dead kittens?)

John Johansen (jjohansen) wrote :

No real consequences; it just takes longer, as you have to kick off a build, let it fail, bisect skip, and then start up a new build again. As for your wireless not working, that really isn't a surprise, as I haven't been building the firmware package to match the kernel; it's an extra step I was hoping to avoid.

 6th test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.6_amd64.deb

Chad Hogg (chadhogg) wrote :

Test #6 does not panic, but it does not boot either. Rather, it seems to stop in its tracks after "CE: hpet increased min_delta_ns to 37968 nseclock script 1"

Chad Hogg (chadhogg) wrote :

Test #7 is GOOD.

Chad Hogg (chadhogg) wrote :

Test #8 is GOOD.

Chad Hogg (chadhogg) wrote :

Test #9 is GOOD.

Chad Hogg (chadhogg) wrote :

Test #10 is GOOD.

Chad Hogg (chadhogg) wrote :

Test #11 fails to boot, in the same way that test #6 did.

Chad Hogg (chadhogg) wrote :

Test #12 is GOOD.

Chad Hogg (chadhogg) wrote :

Test #13 is GOOD. If it's not too much trouble, with future ones would you mind posting what revision they are based on?

John Johansen (jjohansen) wrote :

14th test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.14_amd64.deb

it is based on cc7e7d38e9170719f780dd16312eef216714ad35
13th was based on 9ceb4c99f3f117dba16487d7c06790f0238726f8
Last Bad commit that you have tested f6f94e2ab1b33f0082ac22d71f66385a60d8157f (2.6.36)

Chad Hogg (chadhogg) wrote :

Test #14 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.15_amd64.deb

based on 055a1b8c9927bc587f293020a54c6cd8e24dfac0

Chad Hogg (chadhogg) wrote :

Test #15 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.16_amd64.deb

based on e46924d246e028c94689087db0699438343a344e

Chad Hogg (chadhogg) wrote :

Kernel #16 is untestable, like #6 and #11 before it.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.17_amd64.deb

based on e7ee762cf074b0fd8eec483d0cef8fdbf0d04b81

Chad Hogg (chadhogg) wrote :

Kernel #17 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.18_amd64.deb

based on 2c48a7d615b82e030196e8b61ab0c7933be16dff

Chad Hogg (chadhogg) wrote :

Kernel #18 is GOOD.

Chad Hogg (chadhogg) wrote :

Actually, I think there was something wrong in your last bisection. Prior to kernel #18, we knew the problem was introduced no earlier than e7ee762cf074b0fd8eec483d0cef8fdbf0d04b81 (Sep 24, 2010) and no later than f6f94e2ab1b33f0082ac22d71f66385a60d8157f (Oct 20, 2010). Your post claims that kernel #18 is from commit 2c48a7d615b82e030196e8b61ab0c7933be16dff (Jul 27, 2010), which we already know to be prior to the introduction of the bug.

John Johansen (jjohansen) wrote :

Not necessarily; bisects can do some funny things with merges. If a merge falls in the range, the set of patches in the merge can be against earlier kernels. E.g., I have bisected between 2.6.35 and 2.6.36 and run into patches committed against 2.6.33. It can be really confusing when you run into it the first time (well, and it really still is confusing at times).
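
A toy model may make this less confusing. The sketch below uses a made-up six-commit history (the commit names and dates are hypothetical, not taken from the kernel tree) in which a side branch based on an older kernel is merged late. Bisect candidates are, roughly, the commits reachable from the bad commit but not from the good one, so the old-dated side-branch commit still has to be considered.

```python
# Hypothetical history: commit -> parents. Names and dates are made up.
parents = {
    "old_base": [],
    "A": ["old_base"],  # known good
    "S": ["old_base"],  # side-branch commit based on the older kernel
    "B": ["A"],         # mainline commit after the good point
    "M": ["B", "S"],    # merge of the side branch into mainline
    "C": ["M"],         # known bad
}
dates = {
    "old_base": "2010-02-01",
    "S": "2010-07-01",
    "A": "2010-09-01",
    "B": "2010-10-01",
    "M": "2010-10-10",
    "C": "2010-10-20",
}


def ancestors(commit, parents):
    """Return the set containing commit and everything reachable from it."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def bisect_candidates(good, bad, parents):
    """Commits in the history of bad that are not in the history of good."""
    return ancestors(bad, parents) - ancestors(good, parents)


candidates = bisect_candidates("A", "C", parents)
older = sorted(c for c in candidates if dates[c] < dates["A"])
print(older)  # ['S']
```

Commit S is dated before the known-good commit A, yet A does not contain it, so a good result for A says nothing about S.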

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.19_amd64.deb

based on c9fbdd5f131440981b124883656ea21fb12cde4a

Chad Hogg (chadhogg) wrote :

Kernel #19 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.20_amd64.deb

based on 5d4abf93ea3192cc666430225a29a4978c97c57d

Chad Hogg (chadhogg) wrote :

Kernel #20 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.21_amd64.deb

based on 4a73a43741489a652588460e72be959e60bcb9ec

Chad Hogg (chadhogg) wrote :

Kernel #21 is untestable, like #6, #11, and #16 before it.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.22_amd64.deb

based on 018378c55b03f88ff513aba4e0e93b8d4a9cf241

Chad Hogg (chadhogg) wrote :

Kernel #22 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.23_amd64.deb

based on 6e17b0276452912cb13445e5ea552b599984675f

Chad Hogg (chadhogg) wrote :

Kernel #23 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.24_amd64.deb

based on 231ded903f2f30b2bcdbf6672c7197187e3bb1ee

Chad Hogg (chadhogg) wrote :

Kernel #24 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.25_amd64.deb

based on d299eadc098743ea0cfbf9502fb04abf1d39ce36

Chad Hogg (chadhogg) wrote :

Kernel #25 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.26_amd64.deb

based on b83f920e179101a54721e5ab1d6c3edfb9d4bcbb

Chad Hogg (chadhogg) wrote :

Kernel #26 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.27_amd64.deb

based on a0d069ea2c7b81a453d258c7f60e1f61a3fcbd9f

Chad Hogg (chadhogg) wrote :

Kernel #27 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.28_amd64.deb

based on d9a73c00161f3eaa4c8c035c62f45afd1549e38a

Chad Hogg (chadhogg) wrote :

Kernel #28 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.29_amd64.deb

based on 216d7cdd3b060518a2d4faf584eb15ef5af862b6

Chad Hogg (chadhogg) wrote :

Kernel #29 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.30_amd64.deb

based on 30c278192f9ab06125fb042f6e46763e0fd7140a

Chad Hogg (chadhogg) wrote :

Kernel #30 is BAD. Maybe we are actually getting somewhere!

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.31_amd64.deb

based on 0c6532e4e3b0c8bd18dd0a5cc1894a1944997cc6

Chad Hogg (chadhogg) wrote :

Kernel #31 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.32_amd64.deb

based on 9a725995e88fd3fd79daf99819c51d676ba37ad9

Chad Hogg (chadhogg) wrote :

Kernel #32 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.33_amd64.deb

based on 22a57f5896df218356bae6203dfaf04bcfd6c88c

Chad Hogg (chadhogg) wrote :

Kernel #33 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.34_amd64.deb

based on b4973ae9dac3397499f5576c591d5c5bf51c68c6

Chad Hogg (chadhogg) wrote :

Kernel #34 is GOOD.

John Johansen (jjohansen) wrote :

next test kernel
   http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.35_amd64.deb

based on 969a6e521730153380ad7781095f503c040b684c

Chad Hogg (chadhogg) wrote :

Kernel #35 is GOOD.

I suppose that constant branching and merging has some major benefits in development, but it certainly makes this difficult. If there were a linear series of commits, 18 tests would have been sufficient to find an offender among the entire 235277 commits that I am seeing in the tree. Any guess as to how many this is going to require? I started a bisection to follow along once you started posting hashes, and according to it between kernel #17 and kernel #35 we have only decreased the number of revisions left to test from 3419 to 2683.
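
The estimate of 18 tests is the base-2 logarithm of the commit count, rounded up. A minimal sketch, assuming an idealised linear history in which every commit builds and boots (the helper name `bisect_steps` is hypothetical):

```python
import math


def bisect_steps(n_commits):
    """Tests needed to isolate one commit among n_commits in a linear history."""
    return math.ceil(math.log2(n_commits))


for n in (235277, 3419, 2683):
    print(n, bisect_steps(n))
# 235277 18
# 3419 12
# 2683 12
```

By this measure, going from 3419 to 2683 remaining revisions does not lower the ideal step count at all, which is one way to quantify how little the last 18 kernels narrowed the search.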

John Johansen (jjohansen) wrote :

From a developer POV I actually don't like merge commits; they are a PITA. Where merges are nice is for the tree maintainers, as they don't have to deal with the individual commit series but track/deal with merge entries. The deeper you go into the tree maintainer hierarchy, the more important it becomes (basically it is what lets Linus scale).

As for how much longer: considering that bisect keeps lying and saying there are about 12 steps left while the number of commits is not being halved, I have no idea how many more it will go through. This has been the longest, most painful bisect I have ever done, not only in terms of the number of builds we have cycled through, but also the number of build failures. For almost every build there have been multiple commits that fail to build. This just shouldn't be happening with upstream commits.

So I took a guess and moved the head manually for this next one. Bisect is reporting 706 commits (~10 steps), and if bisect only steps a little, I'll do it again.

next test kernel
  http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.36_amd64.deb

based on 62bdb288bf464862a2801b2e53aadc6c4d100fab

Chad Hogg (chadhogg) wrote :

I have only more frustration for you: kernel #36 is untestable (same as #6, #11, #16, #21). Back when I was trying this myself, I recall reading "but git may eventually be unable to tell the first bad commit among a bad commit and one or more skipped commits" in the documentation. Perhaps that is why it seems to be making very poor choices of what commit to test next.
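
The limitation quoted from the documentation can be shown with a simplified model. This is not git's actual algorithm, only a sketch over a hypothetical linear history in which each tested commit is marked "good", "bad", or "skip" (untestable):

```python
def first_bad_candidates(history, status):
    """Return the commits that could still be the first bad one.

    history: commits ordered oldest to newest (linear history assumed).
    status:  commit -> "good", "bad", or "skip" for the commits tested so far.
    Assumes every good commit comes before every bad commit.
    """
    last_good = max(i for i, c in enumerate(history) if status.get(c) == "good")
    first_bad = min(i for i, c in enumerate(history) if status.get(c) == "bad")
    return history[last_good + 1 : first_bad + 1]


# Hypothetical results: c4 and c5 would not boot, so they were skipped.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
status = {
    "c0": "good", "c3": "good",
    "c4": "skip", "c5": "skip",
    "c6": "bad", "c7": "bad",
}
print(first_bad_candidates(history, status))  # ['c4', 'c5', 'c6']
```

Every commit in the history has been either tested or skipped, yet three candidates remain: the bug could have arrived in c6 or in either of the untestable commits before it.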

When I asked, long ago, about consequences for breaking the build, I meant that in any project I've worked on (all much smaller than this), if someone committed non-building code, that was the end of their privileges to do so without supervision. As many bad commits as there are in this project, I don't know what to think.

John Johansen (jjohansen) wrote :

It is a possibility that bisecting won't be able to find the bad commit, but we aren't there yet. There are still hundreds of commits to search through. As for the kernel, the rule is that you don't break the build. There are occasions when it happens, often because it builds under one config and then fails under another, but I have never seen so many failures before and I hate to guess why that is.

next test kernel
  http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.37_amd64.deb

based on a4099ae79d04ecf31bd0fc5aa4c1472b6fa7993a

Chad Hogg (chadhogg) wrote :

Kernel #37 is also untestable.

John Johansen (jjohansen) wrote :

next test kernel
  http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.38_amd64.deb

based on cb655d0f3d57c23db51b981648e452988c0223f9

Chad Hogg (chadhogg) wrote :

Kernel #38 panics at boot.

John Johansen (jjohansen) wrote :

All right, hopefully this will be more stable; it's -rc6.

next test kernel
  http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.39_amd64.deb

based on 899611ee7d373e5eeda08e9a8632684e1ebbbf00

Chad Hogg (chadhogg) wrote :

You might think that, but you would be incorrect. Kernel #39 is untestable.

John Johansen (jjohansen) wrote :

thanks, 2 more test kernels, one based on rc3 and the other is a rebuild and upload of #35 (last good kernel) as a sanity check if rc3 is failing

  rc3 http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.rc3_amd64.deb
  #35 http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.35_amd64.deb

Chad Hogg (chadhogg) wrote :

Neither rc3 nor the rebuild of kernel #35 is actually at that location on the webserver.

John Johansen (jjohansen) wrote :

hrmm sorry about that, I am not sure what happened. Try again now.

Chad Hogg (chadhogg) wrote :

rc3 has a brand new failure mode: early in the boot process the screen goes completely black and stays that way permanently. I can attach a dmesg log from it if you would like. /me weeps. The rebuild of kernel #35 is still GOOD.

Chad Hogg (chadhogg) wrote :

Every one of them fails to boot on my system.

John Johansen (jjohansen) wrote :

All right, if you are game, I would like to try chasing down where it starts failing to boot and where it works again, and if we are lucky maybe we can also track down the bug we are looking for.

so might as well test rc7 and rc8 too
rc7 http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.rc7_amd64.deb

rc8 http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.rc8_amd64.deb

next test kernel
  http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.40_amd64.deb

based on 2f2c779583e9646097b57599f8efeb8eca7bd654

After this the plan is to bisect both sides, between the last good commit - rc1, and rc8 - last bad commit

Chad Hogg (chadhogg) wrote :

Kernel #40 does not boot. Release candidates 7 and 8 are not in your www directory.

John Johansen (jjohansen) wrote :

Sorry about that, I think I purged them accidentally when removing some of the older test kernels, rc7 and rc8 are there now, as well as a new test kernel

  http://people.canonical.com/~jj/linux-image-2.6.39-2-generic_2.6.39-2.7~lp793437.41_amd64.deb

based on 3c09e2647b5e1f1f9fd383971468823c2505e1b0

We are narrowing in on where you start getting failures to boot. If this kernel boots, we know the failure comes from merge
  2f2c779583e9646097b57599f8efeb8eca7bd654
else it's in the 7 commits between 969a6e521730153380ad7781095f503c040b684c..3c09e2647b5e1f1f9fd383971468823c2505e1b0

Chad Hogg (chadhogg) wrote :

Neither release candidate boots, but kernel #41 does (and is GOOD).

Chad Hogg (chadhogg) wrote :

So ... have we given up?

Nate Cornell (nathandcornell) wrote :

This is still an issue in the 11.10 beta.

Chad Hogg (chadhogg) wrote :

Unfortunately, I am no longer able to participate in testing. After losing John Johansen's interest, I overwrote my Ubuntu installation with Debian Stable. Whether this will be an issue for them as well once they adopt a more modern kernel remains to be seen.

It most likely will; I tried Fedora 15 and it had the exact same problem.

I also tried compiling the latest 3.1-rc4 kernel from kernel.org in the vain hope that whatever was causing this could have accidentally fixed itself in a newer merge, but that's not how it works; it still had the problem.

About this bug: jjohansen stated in #92 that if that kernel worked then the offending merge was 2f2c779583e9646097b57599f8efeb8eca7bd654, and chadhogg stated in his reply that the kernel _did_ work. So, if I'm not misinterpreting this, don't we already know which merge was causing it?

I do have an N51Vn laptop available for testing if anyone wants to keep trying. I fear I lack the knowledge and skill to do it myself, but I can follow directions.

Chad Hogg (chadhogg) wrote :

He was referring to merge 2f2c779583e9646097b57599f8efeb8eca7bd654 being the last one before a series of kernels that would not even successfully boot on my hardware (and presumably yours as well). That is useful information, but not that useful since whatever was broken in that series has already been fixed. As for when the extreme slowness began, we were still quite far from finding that.

Nate Cornell (nathandcornell) wrote :

I am with Carles; although I am inexperienced, I would rather do all I can to help resolve this bug than to stand idle while the Linux Kernel moves forward without me.
Thanks for continuing to monitor this bug, Chad, despite having abandoned Ubuntu.

I am not sure what I can do to contribute aside from installing and testing kernels, but I want to keep this thread alive until this bug is addressed and resolved.

Any developers who can lend a hand by picking up where Mr. Johansen left off, we could sure use the help!

will H. (a757482) wrote :

Hello! Thanks to Chad and John for your work in trying to fix this problem. I'm not sure whether those of you who are trying to fix it will find the following information useful, but for the rest of you trying to get to a usable system, there is a temporary workaround. When you load up the live CD (or just boot into the OS if you already have it installed), put the computer into standby mode (sleep). When the system wakes up, its speed and responsiveness are back to what one would expect of a normal working system, until you restart the computer. My suggestion, if you are using a live CD for whatever reason, is to put the computer into standby mode after you have selected the "Try Ubuntu" choice, and let all the components load first.

I sincerely hope the importance of this bug is elevated, as it is a complete show stopper, from personal and anecdotal evidence. Since this bug has to do with the kernel I cannot install ANY up-to-date distribution with this kernel. Since I am trying to recover data from a 3 TB drive, I MUST use a recent kernel as they are the only ones that can handle large partitions. Again thanks to all of you who have attempted to help fix this situation.

Cheers,

Will

Your best bet is probably reporting the bug upstream; with your new data on the problem, someone might take an interest in it. I'm not knowledgeable on the matter at all, but your workaround seems to suggest something amiss with the ACPI support (most likely a hardware bug, though), and I remember there being quite a stir about that around the time this problem started.

Just to be sure: do you have an ASUS N51 laptop, and how do you trigger this "stand-by mode"?

If I have understood correctly, your workaround involves loading the system, putting it in sleep mode and then starting it up again?

Will H (dudemanguybob) wrote :

Hello Carles,

This is Will from post 100. My friend let me use his email address, but I suppose I should start using my own, hence the different account. I have an n51vn laptop, and yes I meant sleep mode. Your last statement is exactly what I do to put the system in a usable state. Thanks a bunch for trying to help in this situation.

Cheers,

Will

I will test this workaround as soon as I manage to find the other laptop. It is a bit of a shame that the workaround pretty much involves never turning the machine off, but for a portable computer I guess it is not so bad, depending on how much power it draws in sleep mode.

You shouldn't thank me; I really can't help you. Having found this workaround, you already know more about the problem than I do.

If reporting the bug upstream does not work, you could try testing with another distribution (Fedora, to name one) and reporting the bug in their bug tracker if it also shows the same symptoms (it did the last time I tried, I think in August). Then you could also cross-reference the new report with this one.

It will not fix the problem but it may attract the eyeballs of people with enough knowledge to fix it.

Will H (dudemanguybob) wrote :

I agree with the previous posters that the problem stems from the kernel. I have tried the latest Fedora, Linux Mint, and Debian distros. Fedora and Linux Mint (since it is derived from Ubuntu) both exhibit the same symptoms, but Debian did not (I believe because it doesn't use a bleeding-edge kernel).

I am pretty inexperienced with the dev side of Linux in general. Mostly what I know is how to use a search engine to find forum threads about the problems I have. So, I know what reporting upstream means, but I don't know where to do it. Quite honestly, this is the first time that I've even posted on a forum about a Linux problem, and I can count the number of times I have posted on forums in general on my fingers.

Can you direct me where to go to report this bug upstream? Thanks and Merry Christmas!

Cheers,

Will

I would usually tell you to post it in kernel.org's bugzilla; however, I don't think this is an option because it has been down since the website was hacked earlier this year.

So I guess the closest thing would be to send a mail to the Linux kernel mailing list. Without a proper bisect and without much technical insight into the problem (we don't even know which version was the first to show the issue; we guess it was somewhere around the 2.6.36 release), it is very likely that the message will be ignored, but it doesn't hurt to try.

Other than that, not all distributions compile kernels in the same way, so there might very well be a distribution that by chance does not have this problem. However, a lot of distributions out there nowadays are pretty much repackaged Ubuntu/Debian, so I would not really count on it. Testing and reporting the bug in their respective bug trackers might help, though; as a rule of thumb, the more awareness there is about a problem, the better.

Nate Cornell (nathandcornell) wrote :

Reported this bug in kernel.org's bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=42725

Changed in linux:
importance: Unknown → High
status: Unknown → Incomplete
emuse (goemusic) wrote :

Hi,

Just to give an update: I'm not an Ubuntu user, but I have exactly the same problem with an ASUS N51 (Core 2 Duo) and kernels >= 2.6.38. I've been following kernels up to 3.2.7 and the bug still persists. I was pleased to find the workaround mentioned here: going into standby mode and back does restore normal CPU speed!

Also, with ACPI turned off, the CPU runs at normal speed.

The slowdown appears at some rather late stage during bootup; bootup speed with systemd is almost normal.

Thanks to the reporters

Frank

ASUS N51 laptops compatibility workaround:

1. Enter the BIOS.
2. Select IDE Configuration.
3. Select SATA Operation Mode.
4. Set SATA Operation Mode to "Compatible".
5. Exit the BIOS, saving changes.
6. Install as normal.

Works for all Linux and BSD distros.

Kevin A. (d876037) wrote :

@Ken #108

That method did not work for me (yes, I do have an Asus N51vn). I don't know the details of the wake-up process, but I don't see why waking up from sleep would change the SATA Operation Mode and make the system work properly again. The other commenters stated that putting the computer to sleep and waking it up returned the system to normal operation for that session.

The workaround described in message #108 does not work for me either. Has it worked for anyone?

Nate Cornell (nathandcornell) wrote :

No, Ken's solution in #108 failed to resolve it for me as well.
I wasn't surprised, however; it's pretty clear this is an ACPI problem.

tags: added: needs-upstream-testing
tags: added: regression-release

Chad Hogg, thank you for reporting this and helping make Ubuntu better. Natty reached EOL in October 2012.
Please see this document for currently supported Ubuntu releases:
https://wiki.ubuntu.com/Releases

We were wondering if this is still an issue in a supported release? If so, could you please test for this with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the kernel in the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested and remove the tag:
needs-upstream-testing

This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the text:
needs-upstream-testing

If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested.

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested.

If you are unable to test the mainline kernel, please comment as to why specifically you were unable to test it and add the following tags:
kernel-unable-to-test-upstream
kernel-unable-to-test-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested.
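
The tag naming scheme described above can be sketched as a small helper. This is only an illustration: the function name `upstream_tags` and the outcome keys ("fixed", "exists", "unable") are made up here and are not part of Launchpad or any Ubuntu tool. The tags themselves still have to be added by hand on the bug page.

```python
def upstream_tags(outcome, version):
    """Build the pair of tags requested above for a mainline kernel test.

    `outcome` is a hypothetical key chosen for this sketch:
    "fixed", "exists", or "unable".
    `version` is the kernel version that was tested.
    """
    prefixes = {
        "fixed": "kernel-fixed-upstream",
        "exists": "kernel-bug-exists-upstream",
        "unable": "kernel-unable-to-test-upstream",
    }
    prefix = prefixes[outcome]  # raises KeyError for an unknown outcome
    # One generic tag plus one tag carrying the tested version number.
    return [prefix, f"{prefix}-{version}"]


if __name__ == "__main__":
    for tag in upstream_tags("exists", "3.7.0-030700rc1"):
        print(tag)
```

With the version string from the tag that was later added to this report, the helper produces `kernel-bug-exists-upstream` and `kernel-bug-exists-upstream-3.7.0-030700rc1`.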

Please let us know your results. Thank you for your understanding.

Helpful Bug Reporting Tips:
https://help.ubuntu.com/community/ReportingBugs

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Nate Cornell (nathandcornell) wrote :

As noted by Chad in Comment #96, he has switched to a different distro.
The only progress made on this bug since May has been within the Linux Kernel Bug Tracker: http://bugzilla.kernel.org/show_bug.cgi?id=42725

I tested 12.10 as you suggested, but the problem remains.
The latest generic kernel image also has this problem.

It appears to be an issue with /sys/class/thermal/thermal_zone0/trip_point_1_temp reading '0' on the Intel Core 2 Duo P8700 (2.53 GHz) processor, which causes the processor to be constantly throttled.
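
For anyone who wants to check their own machine for the same symptom, here is a minimal sketch that scans the thermal sysfs tree for trip points reading 0. It assumes the usual layout of `/sys/class/thermal/thermal_zone*/trip_point_*_temp` files containing a single integer in millidegrees Celsius. The function name `find_zero_trip_points` is invented for this example, and a zero trip point is only a hint, not proof, that a machine is hitting this bug.

```python
from pathlib import Path


def find_zero_trip_points(base="/sys/class/thermal"):
    """Return (path, value) pairs for trip points whose temperature reads 0."""
    suspicious = []
    for trip_file in sorted(Path(base).glob("thermal_zone*/trip_point_*_temp")):
        try:
            value = int(trip_file.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or non-numeric entries are skipped rather than guessed at.
            continue
        if value == 0:
            suspicious.append((str(trip_file), value))
    return suspicious


if __name__ == "__main__":
    hits = find_zero_trip_points()
    if not hits:
        print("No trip points reading 0 were found.")
    for path, value in hits:
        print(f"{path} = {value}")
```

The base directory is a parameter so the function can be pointed at a copy of the sysfs tree, for example one attached to a bug report, instead of the live system.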

Nate Cornell, if you have a bug in Ubuntu, could you please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see the Ubuntu Bug Control and Ubuntu Bug Squad article:
https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue

and Ubuntu Community article:
https://help.ubuntu.com/community/ReportingBugs

When opening up the new report, please feel free to subscribe me to it. Thank you for your understanding.

tags: added: kernel-bug-exists-upstream-3.7.0-030700rc1
removed: needs-upstream-testing
Nate Cornell (nathandcornell) wrote :

This same bug still affects currently supported distributions.
As I noted above, I tested 12.04 and the latest mainline kernel; both have the same problem.

I am not reporting a different issue; I was merely trying to give you an update on this issue, since it has been over a year since anyone from Canonical has shown any interest in this bug. I just thought it would be helpful for you to have the latest information.
This is the same defect. I have the exact same hardware as Chad, who abandoned Ubuntu because of this bug, so I am following up instead.

I am unclear why you want a new bug report for this bug; shouldn't we just try to resolve this one?

tags: added: needs-upstream-testing
removed: kernel-bug-exists-upstream-3.7.0-030700rc1
Chad Hogg (chadhogg) wrote :

As Nate pointed out, I no longer have an Ubuntu installation to test with, and I do not have a spare partition available for one either. I could test some live CD's, but since Nate has exactly the same issue on exactly the same hardware and has already tested both the current release and the upstream kernel, I'm not going to waste my time or CDs unless you need me to.

Chad Hogg, thank you for your comments. Regarding them https://bugs.launchpad.net/ubuntu/+source/linux/+bug/793437/comments/116 :
>"As Nate pointed out, I no longer have an Ubuntu installation to test with, and I do not have a spare partition available for one either. I could test some live CD's, but since Nate has exactly the same issue on exactly the same hardware and has already tested both the current release and the upstream kernel"

This is not how bug reporting works. For more on this please see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/793437/comments/114.

>", I'm not going to waste my time or CDs unless you need me to."

If you do not have an Ubuntu installation, or are unwilling to test this on the hardware you originally reported with, then you are welcome to mark this report Status Invalid.

Thank you for your understanding.

Chad Hogg (chadhogg) wrote :

OK, I've changed the status to Invalid. Not because it is not a bug, and not because it was not confirmed by several people, one of whom is willing to keep testing it, but because I personally cannot take the time to test it right now. It would have been nice if I had not lost the interest of the developer who was working on it a year ago, but perhaps if Nate Cornell makes a separate bug report you can work with him.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in linux:
status: Incomplete → Invalid