Kernels after version 3.8.0-27 do not start

Bug #1237392 reported by Guus Bonnema on 2013-10-09
This bug affects 1 person
Affects            Importance  Assigned to
linux (Ubuntu)     High        Joseph Salisbury
  Raring           High        Joseph Salisbury
  Saucy            High        Joseph Salisbury
  Trusty           High        Joseph Salisbury

Bug Description

Every Ubuntu kernel update after 3.8.0-27 results in a blank screen, with no sign of startup and no response to the keyboard. I can only boot with kernel 3.8.0-27-generic; none of the later versions (up to and including 3.8.0-31-generic) produce a running system. The display also shows no messages at all during startup (after GRUB).

I checked the log files with grep -n "Linux version 3.8.0-" * and found only 3.8.0-27 logged, nothing for the other kernels. I am at a loss as to how to continue.
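The grep above can only find kernels that got far enough to mount the root filesystem and write to /var/log; as the reporter works out further down, the later kernels never reach the LVM root volume, so they cannot leave any trace there. A rough Python equivalent of the same check, which also lists every 3.8.0-N version that did log a boot banner, might look like this. The function names and the /var/log default are illustrative assumptions, and, like the original grep, it does not look inside compressed rotated logs.

```python
import re
from pathlib import Path

# Same pattern as the grep in the report, capturing the ABI number (e.g. "3.8.0-27").
BANNER = re.compile(r"Linux version (3\.8\.0-\d+)")


def booted_kernels(lines):
    """Return the distinct 3.8.0-N versions whose boot banner appears in the lines."""
    found = set()
    for line in lines:
        match = BANNER.search(line)
        if match:
            found.add(match.group(1))
    # Sort numerically by the ABI number after the last "-".
    return sorted(found, key=lambda v: int(v.rsplit("-", 1)[1]))


def scan_log_dir(directory="/var/log"):
    """Read every plain file directly inside `directory` and report logged kernels."""
    lines = []
    for path in Path(directory).glob("*"):
        if not path.is_file():
            continue
        try:
            lines.extend(path.read_text(errors="replace").splitlines())
        except OSError:
            continue  # unreadable file (e.g. permissions); skip it
    return booted_kernels(lines)
```

Run scan_log_dir() as root; on this machine it would be expected to list only the kernels that actually booted (3.8.0-19 and 3.8.0-27).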

I am running the most recent version of Ubuntu Server.

Two days ago I reinstalled Ubuntu, with essentially the same result. The kernel from the installation media, 3.8.0-19, works fine; the updated kernel 3.8.0-31 does not.

The following is the situation:

    When I let the machine boot on its own, I get no output at all.
    When I manually tell Ubuntu to start the most recent kernel, I get two lines:

"Loading Linux-3.8.0-31-generic ..." "Loading initial ramdisk ..."

That's it.

    When I manually start the recovery kernel, I get a lot of output. Eventually it drops to a shell and reports that it could not access the LVM volume /dev/mapper/Server-Root. This is why I never see anything in my messages or dmesg logs on disk.

The problem seems to be with the Areca RAID controller. The SCSI error handler (eh) keeps resetting the bus until it finally gives up and drops to a shell. A while later the system goes berserk with messages about the SCSI bus not responding.

Since I cannot reach my hard disk, is there a way to find out what is wrong? Why does everything work up to and including kernel 3.8.0-27 and break after that? Did the Areca software change in some way?

P.S. When I start kernel version 3.8.0-19, everything works fine.

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: ubuntu-release-upgrader-core 1:0.192.13
ProcVersionSignature: Ubuntu 3.8.0-19.30-generic 3.8.8
Uname: Linux 3.8.0-19-generic x86_64
ApportVersion: 2.9.2-0ubuntu8.3
Architecture: amd64
CrashDB: ubuntu
Date: Wed Oct 9 14:53:03 2013
InstallationDate: Installed on 2013-10-07 (1 days ago)
InstallationMedia: Ubuntu-Server 13.04 "Raring Ringtail" - Release amd64 (20130423.1)
MarkForUpload: True
PackageArchitecture: all
ProcEnviron:
 LANGUAGE=en_US:en
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: ubuntu-release-upgrader
Symptom: release-upgrade
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Guus Bonnema (gbonnema) wrote :
tags: added: regression-update
affects: ubuntu-release-upgrader (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1237392

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Guus Bonnema (gbonnema) wrote :

As stated, the recovery kernel drops me to a shell. The prompt is (initramfs), and that environment does not include apport-collect, so I cannot run this command.

Sorry. I will set the status to "Confirmed" as requested in the previous note.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.12 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example because it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.12-rc4-saucy/
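The tagging request above boils down to a three-way decision. One subtlety in this bug: the failure is itself a hang at boot, so a mainline kernel that hangs with the same Areca SCSI bus resets counts as 'kernel-bug-exists-upstream', while 'kernel-unable-to-test-upstream' is meant for a kernel that cannot be tested for some unrelated reason. A minimal sketch of that decision (the function and its parameters are hypothetical):

```python
def upstream_test_tag(tested: bool, bug_present: bool) -> str:
    """Pick the Launchpad tag for a mainline-kernel test result.

    tested: False if the kernel could not be evaluated for reasons unrelated
            to this bug; a boot hang with the same symptoms still counts as tested.
    bug_present: True if the original symptoms were reproduced.
    """
    if not tested:
        return "kernel-unable-to-test-upstream"
    if bug_present:
        return "kernel-bug-exists-upstream"
    return "kernel-fixed-upstream"
```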

tags: added: performing-bisect
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

We can perform a kernel bisect if the latest mainline kernel still exhibits the bug.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Guus Bonnema (gbonnema) wrote :

I installed the kernel you proposed. It shows much the same behaviour, so I guess the bug is not fixed upstream.
I cannot set the status to "kernel-bug-exists-upstream" because it is not one of the status options, so I changed the status to Confirmed.

Hopefully you can set it properly.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can you test the following two upstream kernels:

3.8.13.4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8.13.4-raring/
3.8.13.5: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8.13.5-raring/

We can bisect between these two versions if 3.8.13.4 is good and 3.8.13.5 is bad.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Guus Bonnema (gbonnema) wrote :

Right, done that. I'm not sure how you arrived at these two kernels, but you are absolutely right!
3.8.13.4 works: I just booted it, logged on and rebooted. No problem there.
3.8.13.5 had the same symptoms as the non-working kernels. I checked with the recovery kernel and got largely the same output; it had the same problem.

So, what do we do now?

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I was able to determine those upstream versions by looking at the Saucy changelog. The Ubuntu 3.8.0-27 kernel was based on upstream 3.8.13.4, and we know that did not have the bug. The Ubuntu 3.8.0-28 kernel was based on upstream 3.8.13.5, so it was just a matter of testing and confirming.

The next step is to perform a kernel bisect to identify the exact commit that introduced this regression. It will require testing about six kernels. After I build each one, I'll post a link here for you to download and test; depending on whether that kernel has the bug, I'll build the next one. Eventually this narrows down to a single commit.

I'll post a link shortly with the first test kernel.
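The estimate of "about six kernels" follows from the way a bisect halves the candidate range at every step: n candidate commits need at most about ceil(log2 n) test kernels, so six corresponds to a range on the order of 2^6 = 64 commits. A minimal sketch of the search over a linear list of commits, assuming the bug, once introduced, is present in every later commit (real git bisect also copes with merge history, which this ignores):

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> Tuple[T, int]:
    """Return (first bad commit, number of test kernels built).

    Assumes commits are in history order, everything before commits[0] is good
    (the known-good release) and commits[-1] is known to be bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad, all before lo are good
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # build, install and boot one test kernel
        if is_bad(commits[mid]):
            hi = mid  # bug already present: culprit is at or before mid
        else:
            lo = mid + 1  # boots fine: culprit comes after mid
    return commits[lo], tests


# Hypothetical range of 64 commits with the regression introduced at index 40.
print(first_bad(list(range(64)), lambda c: c >= 40))  # (40, 6)
```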

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

There seems to be only one commit for the aacraid driver introduced between 3.8.13.4 and 3.8.13.5:

commit b10e3e058cfd2711674eb8516fc8260d507393c3
Author: Mahesh Rajashekhara <email address hidden>
Date: Tue Jun 18 17:02:07 2013 +0530

    [SCSI] aacraid: Fix for arrays are going offline in the system. System hangs

    commit c5bebd829dd95602c15f8da8cc50fa938b5e0254 upstream.

Before we bisect, I built a test kernel with this commit reverted. Can you test this kernel and see if it resolves this bug? It can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1237392/

You will need to install both the linux-image and linux-image-extra .deb packages.
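Each test-kernel directory holds several .deb files, but only the linux-image and linux-image-extra packages are needed; the headers and tools packages can be skipped, as later comments confirm. A small helper that picks them out of a directory listing, assuming Ubuntu's usual package-name prefixes (the file names in the test data are made up for illustration):

```python
import re

# linux-image-<version>... or linux-image-extra-<version>...; headers and tools are skipped.
WANTED = re.compile(r"^linux-image(-extra)?-[0-9]")


def debs_to_install(filenames):
    """Return the linux-image and linux-image-extra .deb files, sorted by name."""
    return sorted(f for f in filenames if f.endswith(".deb") and WANTED.match(f))
```

The resulting file names can then be passed to sudo dpkg -i.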

Revision history for this message
Guus Bonnema (gbonnema) wrote :

I installed the kernel, but it shows the same errors. That's odd, if this is the only commit related to the Areca RAID controller.

What I notice is that the messages are mostly from "arcmsr2" like: "arcmsr2: executing hw bus reset ...." and
"arcmsr: scsi bus reset eh returns with success" and also: "Areca Raid Controller2: F/W v1.46 ...(etc)..."

Could the problem be caused by a different commit, since the driver names differ (aacraid vs. arcmsr)?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v3.8.13.4 final and v3.8.13.5. The kernel bisect will require testing of about 6 test kernels.

I built the first test kernel, up to the following commit:
20910cf880512c21d5eea62a229ea15e1a9256f1

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1237392

Can you test that kernel and report back if it has the bug or not. I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Guus Bonnema (gbonnema) wrote :

Not sure what went wrong. First I compiled the kernel, and I saw that its name was the same as an existing kernel. It didn't work: same error. However, to be sure, I uninstalled all the handmade kernels using Synaptic, and now I cannot compile the kernel.

Also, compared with the other kernels, one .deb seems to be missing: they all had three files, two header packages (one amd64, one all) and one kernel image.

The error message I get is simply:

"
Errors were encountered while processing:
 linux-headers-3.8.13-03081304-generic
"

Do I need the 3rd header file?

I want to be sure this kernel really does not work: since 3.8.13.4 works, it would be odd if this one didn't (which would mean this patch is to blame, I guess).

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

For comment #12, you would only need to install the linux-image .deb package. No need to install the headers.

Revision history for this message
Guus Bonnema (gbonnema) wrote :

Thanks, did that. The kernel does not work. It shows the same problems.

Does that mean this is the offending patch? Do we need more installs to find the commit that did it?

Let me know.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

You mention that you compiled the kernel. The test kernel in comment #12 is already compiled and would need to be installed with dpkg. The bisect will require testing of several kernels. Some of these kernels will have the bug and some will not. After you test kernels, I will feed the result back into the bisect, which will tell us which kernel to build next.

Revision history for this message
Guus Bonnema (gbonnema) wrote :

I must choose my words more carefully: I ran dpkg -i *.deb (without the header files this time).
This kernel did not work. Before I compile a new kernel, I remove the old one.

Let me know.

Revision history for this message
Guus Bonnema (gbonnema) wrote :

Sorry, I said "compile" again; I mean dpkg -i.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for the update. During the bisect we will have some kernels that will exhibit the bug and some that will not. Eventually it will point us to the commit that introduced this. I'll update the bisect with your results and build the next test kernel.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
b25f2c8b46e9362a633a71f6c1a73cc0af279f2b

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1237392

Can you test that kernel and report back if it has the bug or not. I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Guus Bonnema (gbonnema) wrote :

Well, it works! The kernel starts, I can log on and do stuff! Very encouraging.

Let me know when you have the next kernel to do according to the bisect.

And thank you for your effort

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
58ac02df571bce9739a4aa5e116f061d871399e3

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1237392

Can you test that kernel and report back if it has the bug or not. I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Guus Bonnema (gbonnema) wrote :

Nope, didn't work. Looking forward to your next kernel.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
b10e3e058cfd2711674eb8516fc8260d507393c3

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1237392

Can you test that kernel and report back if it has the bug or not. I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Guus Bonnema (gbonnema) wrote :

Nope, not working. Same as before.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
3c5559ae2080ba6006d055160cc1aea934b3dda3

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1237392

Can you test that kernel and report back if it has the bug or not. I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Guus Bonnema (gbonnema) wrote :

Right. This one runs ok! I will just keep it running and await your next kernel.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
197f396ad5532223f777f829d35007fd191d3cf0

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1237392

Can you test that kernel and report back if it has the bug or not. I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Guus Bonnema (gbonnema) wrote :

Indeed the kernel runs again. Are we getting closer to the culprit?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Yes, we are getting close. Only a couple more test kernels and we should know the bad commit.

I built the next test kernel, up to the following commit:
d588489e9162cb5af0defed2edefd1d1037c4905

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1237392

Can you test that kernel and report back if it has the bug or not. I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Guus Bonnema (gbonnema) wrote :

This one shows the error.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
c8c65e5ac4768d1ecf8af37cd0a4fcdc17d41a7d

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1237392

Can you test that kernel and report back if it has the bug or not. I will build the next test kernel based on your test results.

Thanks in advance

Revision history for this message
Guus Bonnema (gbonnema) wrote :

It works! I can't help feeling victorious only when it works, though I think I understand how a bisect works: it's like a binary search, right? Anyway, it works!

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

The bisect pointed to the following commit as the cause of the regression:

Author: Martin K. Petersen <email address hidden>
Date: Thu Jun 6 22:15:55 2013 -0400

    [SCSI] sd: Update WRITE SAME heuristics

I'll build a test kernel with this commit reverted and post a link shortly.
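The eight verdicts reported in this thread can be collected in one place so the search can be repeated. As a sketch, assuming a checkout of the same 3.8.13.y stable tree used for this bisect and that its release tags are named v3.8.13.4 and v3.8.13.5 (both assumptions), the script below prints git bisect commands that should walk the same path and stop at the stable backport of the WRITE SAME commit. The thread only gives that commit's upstream ID, not its stable-tree hash.

```python
from typing import List, Tuple

# Test results reported in this thread, in the order the kernels were built.
# True means the kernel showed the bug ("bad"); False means it booted ("good").
VERDICTS: List[Tuple[str, bool]] = [
    ("20910cf880512c21d5eea62a229ea15e1a9256f1", True),
    ("b25f2c8b46e9362a633a71f6c1a73cc0af279f2b", False),
    ("58ac02df571bce9739a4aa5e116f061d871399e3", True),
    ("b10e3e058cfd2711674eb8516fc8260d507393c3", True),
    ("3c5559ae2080ba6006d055160cc1aea934b3dda3", False),
    ("197f396ad5532223f777f829d35007fd191d3cf0", False),
    ("d588489e9162cb5af0defed2edefd1d1037c4905", True),
    ("c8c65e5ac4768d1ecf8af37cd0a4fcdc17d41a7d", False),
]


def bisect_commands(good_ref: str, bad_ref: str, verdicts: List[Tuple[str, bool]]) -> List[str]:
    """Turn recorded verdicts into git bisect commands that repeat the search."""
    commands = [f"git bisect start {bad_ref} {good_ref}"]  # git takes <bad> before <good>
    for sha, has_bug in verdicts:
        commands.append(f"git bisect {'bad' if has_bug else 'good'} {sha}")
    return commands


if __name__ == "__main__":
    # The tag names are assumptions about the tree that was bisected.
    print("\n".join(bisect_commands("v3.8.13.4", "v3.8.13.5", VERDICTS)))
```

Note that b10e3e05 (the aacraid commit) was itself marked bad, which fits the earlier result that reverting it did not help: the regression had already been introduced before it.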

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

For reference, this is upstream commit 66c28f97120e8a621afd5aa7a31c4b85c547d33d.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 66c28f97 reverted.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1237392

Can you test that kernel and report back if it has the bug or not.

Changed in linux (Ubuntu Raring):
status: New → In Progress
Changed in linux (Ubuntu Saucy):
status: New → In Progress
Changed in linux (Ubuntu Raring):
importance: Undecided → High
Changed in linux (Ubuntu Saucy):
importance: Undecided → High
Changed in linux (Ubuntu Raring):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Saucy):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Revision history for this message
Guus Bonnema (gbonnema) wrote :

Hi Joseph,

I installed the image files and tested and it runs.

However, as before, I could not install everything: the tools package gave a problem.
I'm not sure whether it matters to you, so I've copied the last part of the output;
you can see it seems to be a dependency error.

COPY:
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-3.11.0-13-generic
Found initrd image: /boot/initrd.img-3.11.0-13-generic
Found linux image: /boot/vmlinuz-3.8.13-03081304-generic
Found initrd image: /boot/initrd.img-3.8.13-03081304-generic
Found linux image: /boot/vmlinuz-3.8.0-32-generic
Found initrd image: /boot/initrd.img-3.8.0-32-generic
Found linux image: /boot/vmlinuz-3.8.0-31-generic
Found initrd image: /boot/initrd.img-3.8.0-31-generic
Found linux image: /boot/vmlinuz-3.8.0-19-generic
Found initrd image: /boot/initrd.img-3.8.0-19-generic
Found memtest86+ image: /boot/memtest86+.bin
done
dpkg: dependency problems prevent configuration of linux-tools-3.11.0-13-generic:
 linux-tools-3.11.0-13-generic depends on linux-tools-3.11.0-13; however:
  Package linux-tools-3.11.0-13 is not installed.

dpkg: error processing linux-tools-3.11.0-13-generic (--install):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 linux-tools-3.11.0-13-generic
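Because dpkg -i handles each package named on the command line separately, a failure on linux-tools (or on linux-headers, as in the earlier error) does not by itself mean the kernel image failed to install; the grub.cfg output above already lists the new image. A sketch that pulls the failing package names out of dpkg's closing summary and checks whether any kernel image is among them, assuming the message format shown in this thread:

```python
def failed_packages(dpkg_output: str) -> list:
    """Return the package names listed after dpkg's 'Errors were encountered' line."""
    failed = []
    in_summary = False
    for line in dpkg_output.splitlines():
        if line.startswith("Errors were encountered while processing:"):
            in_summary = True
            continue
        if in_summary:
            # dpkg indents each failed package name with a single space.
            if line.startswith(" ") and line.strip():
                failed.append(line.strip())
            else:
                in_summary = False
    return failed


def image_install_ok(dpkg_output: str) -> bool:
    """True if none of the failed packages is a linux-image package."""
    return not any(p.startswith("linux-image") for p in failed_packages(dpkg_output))
```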

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

No need to install the tools .deb, only the linux-image and linux-image-extra .debs.

Just to confirm, this kernel resolves the bug?

Revision history for this message
Guus Bonnema (gbonnema) wrote :

Yep, bug resolved. Thank you very much for your effort and time.

Do you know which Ubuntu update it will be in?

Changed in linux (Ubuntu Trusty):
status: Confirmed → In Progress
tags: added: saucy trusty
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Upstream reports a new patch, which is in the 3.13 kernel:
https://lkml.org/lkml/2013/10/29/588

I built a test kernel with this patch, which can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1237392

Can you give this kernel a test and see if it resolves this bug?

Revision history for this message
Guus Bonnema (gbonnema) wrote :

Hi Joseph,

It works!

Thanks for giving me the detailed information. So the firmware on my RAID card is actually old.
Is this something I could easily update, or is it best to stay with the current version now that the patch has been made?

Thanks!

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Well, the patch is not currently in the mainline kernel. I'll ask if it can make it into the 3.12 kernel. You could also upgrade your firmware. There are some details here:

http://www.areca.com.tw/support/main.htm

Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Released
Changed in linux (Ubuntu Raring):
status: In Progress → Fix Released
Changed in linux (Ubuntu Saucy):
status: In Progress → Fix Released