Ubuntu 15.10 crashes with RAID-10

Bug #1516547 reported by Kenny Lindberg
This bug affects 1 person
Affects          Status    Importance  Assigned to       Milestone
linux (Ubuntu)   Expired   High        Unassigned
Wily             Invalid   High        Joseph Salisbury

Bug Description

When installing Ubuntu 15.10 onto a newly created RAID-10 array, the system crashes partway through the installation. Pressing Alt+F4 shows the output seen in the attached screenshots.

It looks like kernel 4.2 has some issues with RAID-10, as we can install Ubuntu 15.04 (with kernel 3.19) without problems. If we upgrade to kernel 4.2 in Ubuntu 15.04, we get the same errors.

We have no problems when installing without RAID, or with RAID-0 or RAID-1. We've also tried different hardware and different disk types and sizes, but the system still crashes.

Kenny Lindberg (kennyl) wrote :
Kenny Lindberg (kennyl) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1516547/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → linux (Ubuntu)
tags: added: wily
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1516547

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.3 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.3-wily/

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Kenny Lindberg (kennyl)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Kenny Lindberg (kennyl) wrote :

We've narrowed the issue down to the RAID rebuild process.

Turning down the RAID rebuild speed with the command "echo 1 > /proc/sys/dev/raid/speed_limit_max" lets the installation succeed, but this only postpones the crash until later, once the installed OS is running.
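
The workaround above can be sketched in Python as a pair of helpers that read and write the md resync limits. This is only an illustrative sketch: the function names and the `sysctl_dir` parameter are assumptions introduced here so the code can be pointed at a scratch directory, and writing to the real /proc path requires root. The kernel interprets these values as KiB/s.

```python
from pathlib import Path

# Location of the md resync speed limits on Linux (values are in KiB/s).
RAID_SYSCTL_DIR = Path("/proc/sys/dev/raid")


def read_speed_limits(sysctl_dir: Path = RAID_SYSCTL_DIR) -> dict:
    """Return the current min/max md resync speed limits as integers."""
    return {
        name: int((sysctl_dir / name).read_text().strip())
        for name in ("speed_limit_min", "speed_limit_max")
    }


def set_speed_limit_max(value: int, sysctl_dir: Path = RAID_SYSCTL_DIR) -> None:
    """Write a new speed_limit_max, like `echo VALUE > .../speed_limit_max`."""
    if value < 0:
        raise ValueError("speed limit must be non-negative")
    (sysctl_dir / "speed_limit_max").write_text(f"{value}\n")
```

Calling `set_speed_limit_max(1)` as root would mirror the `echo 1 > /proc/sys/dev/raid/speed_limit_max` command from the comment.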

We've also noticed that an installation without RAID takes about 15 minutes, while with RAID it can take up to one hour, as the RAID rebuild seems to slow the install process down. We didn't experience this in Ubuntu 15.04.

It's not possible to see the errors from the screenshots in syslog, probably because the system is unable to respond and read from or write to the disk. Also, when trying to log in from the terminal, the server just hangs after the username is entered.

We'll try to test with kernel 4.3.

Kenny Lindberg (kennyl) wrote :
Kenny Lindberg (kennyl) wrote :

The issue persists when using kernel 4.3.0-040300-generic.

tags: added: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a bisect to figure out what commit caused this regression. We need to identify the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the first kernel version that exhibits this bug:

4.0 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-vivid/
4.1 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.1-wily/
4.2 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2-wily/

You don't have to test every kernel, just up until the kernel that first has this bug. We can then narrow down further by testing some release candidates.

Thanks in advance!

tags: added: performing-bisect
Kenny Lindberg (kennyl) wrote :

The issue seems to start with kernel 4.2, as we haven't been able to reproduce the problem with kernel 4.1.

Joseph Salisbury (jsalisbury) wrote :

Next, we need to figure out which release candidate introduced this bug. Once we find that, we can bisect down to the exact commit.

Can you test the following kernels and post back? We are looking for the first kernel version that exhibits this bug.

Can you next test the v4.2-rc5 kernel? It can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2-rc5-unstable/

If that kernel exhibits the bug, we would want to test rc2. If it does not have the bug, we would want to test rc6 or rc7. Basically, we want to know the last good rc and the first bad rc.
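
The halving strategy described above amounts to a binary search over the ordered release candidates. A minimal Python sketch, assuming the bug stays present in every kernel after the one that introduced it; `first_bad`, `candidates`, and the `is_bad` callback (which stands in for a manual install-and-observe test) are hypothetical names used only for illustration:

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is true.

    Assumes versions are ordered oldest to newest, the last one is known
    bad, and once the bug appears every later version also has it.
    """
    lo, hi = 0, len(versions) - 1  # hi always points at a known-bad version
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid        # bug present: the first bad version is mid or earlier
        else:
            lo = mid + 1    # bug absent: the first bad version is later
    return versions[lo]


# Illustrative candidate list: the release candidates named in the thread plus 4.2 final.
candidates = [f"v4.2-rc{n}" for n in range(1, 8)] + ["v4.2"]
```

For instance, `first_bad(candidates, lambda v: candidates.index(v) >= 2)` models a made-up case where the bug first appears in rc3, and it reaches that answer after three test runs (rc4, rc2, rc3).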

Kenny Lindberg (kennyl) wrote :

We've tested kernel v4.2-rc5-unstable which exhibits the same bug. We're now testing downwards through the rc kernels.

While testing v4.2-rc5-unstable we managed to get some logs from `dmesg` just before the crash. See attachment.

Kenny Lindberg (kennyl) wrote :
Kenny Lindberg (kennyl) wrote :

Our testing shows that all kernels from v4.2-rc1 to v4.2-rc5 have this bug.

We can test downwards further but from which kernel should we continue? v4.1-rc8-unstable?

While testing on different hardware, we've found that this bug doesn't seem to appear on systems with only SSDs (uptime is now 5 days).

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Wily):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Wily):
importance: Undecided → High
status: New → In Progress
Joseph Salisbury (jsalisbury) wrote :

Thanks for the testing. The results would indicate the bug was introduced in v4.2-rc1 since 4.1 final did not exhibit the bug.

I started a kernel bisect between v4.1 and v4.2-rc1.
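
To give a sense of how many test kernels such a bisect involves: each good/bad verdict roughly halves the remaining commit range, so the number of rounds grows with the base-2 logarithm of the range size. A small Python sketch; the function name is hypothetical and the 12,000-commit figure in the note below is an assumed round number, not a measured count for the v4.1 to v4.2-rc1 range:

```python
def bisect_steps(commit_count: int) -> int:
    """Approximate number of test builds needed to isolate one commit.

    Each verdict roughly halves the candidate range, so about
    ceil(log2(n)) rounds are needed for n candidate commits.
    """
    if commit_count < 1:
        raise ValueError("commit_count must be at least 1")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) without float rounding.
    return (commit_count - 1).bit_length()
```

With an assumed range of 12,000 commits, `bisect_steps(12000)` gives 14. Real `git bisect` runs can differ by a step or so because kernel history is not linear.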

I built the first test kernel, up to the following commit:
4570a37169d4b44d316f40b2ccc681dc93fedc7b

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenny Lindberg (kennyl) wrote :

Kernel tested. This one seems to be stable, as the server has been running for 24 hours now and the RAID has rebuilt without a crash.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
9b289c2610c8d0e595c19d4dca318bb43b891841

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenny Lindberg (kennyl) wrote :

Kernel tested. Also seems to be stable.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
c0c3a718e3ab2430a52a60d614b109e5e48e83e2

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenny Lindberg (kennyl) wrote :

Kernel tested. Seems to be stable.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
7adf12b87f45a77d364464018fb8e9e1ac875152

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenny Lindberg (kennyl) wrote :

Kernel tested. Seems to be stable.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
a7ba4bf5e7ff6bfe83e41c748b77b49297c1b5d9

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenny Lindberg (kennyl) wrote :

Kernel tested. Seems to be stable.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
d033ed9eeafc3bf33ce2de286ea2fb2c63e1c183

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenny Lindberg (kennyl) wrote :

Kernel tested. Seems to be stable.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
1b3618b60a487fa219c5381a9c983a00c40e6477

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenneth Østrup (kennetho) wrote :

Hi Joseph. I am taking over testing for Kenny. I have tested your latest test kernel and successfully reproduced the fault right after the new kernel booted. I am not even able to limit the rebuild speed before the problem appears.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
b9df84fd7c05cc300d6d14f022b8a00773ebcf8c

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenneth Østrup (kennetho) wrote :

Hi again Joseph. This kernel is working as expected. No crashes so far, RAID rebuild complete.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
5c65e7be4cbb015e39759275746f31bb6fa74f77

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenneth Østrup (kennetho) wrote :

Hi again Joseph. This kernel is working as expected. No crashes so far (running for 9 days now), RAID rebuild complete.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
22a093b2fb52fb656658a32adc80c24ddc200ca4

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenneth Østrup (kennetho) wrote :

Hi again Joseph. This kernel is working as expected. No crashes so far, RAID rebuild complete.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
14a6f1989dae9445d4532941bdd6bbad84f4

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Christoffer Stokbaek (stokbaek) wrote :

Hey Joseph. This kernel is working as expected. No crashes so far, RAID rebuild complete.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
59fd132340b3e37b83179d2fcb673980035edf62

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Christoffer Stokbaek (stokbaek) wrote :

Hey Joseph. This kernel is working as expected. No crashes so far, RAID rebuild complete.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
b0996ae48285364710bce812e70ce67771ea6ef7

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Christoffer Stokbaek (stokbaek) wrote :

Hey Joseph. This kernel is working as expected. No crashes so far, RAID rebuild complete.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
a88464a8b0ffb2f8dfb69d3ab982169578b50f22

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1516547

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Kenny Lindberg (kennyl) wrote :

Hi Joseph.

Sorry for the late reply.

Ubuntu 16.04, which uses kernel 4.4, has been tested without hitting these errors, and Ubuntu 15.10 reaches end-of-life in July 2016.

Therefore we've decided to stop our testing, as this seems to require a much deeper investigation.

Thank you for all the help.

Changed in linux (Ubuntu Wily):
status: In Progress → Incomplete
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
dino99 (9d9) wrote :

Closed as per comment #42.

Changed in linux (Ubuntu Wily):
status: Incomplete → Invalid