Kernel crash on Dell Latitude 5530 (tg3 driver involved)

Bug #1428111 reported by Andreas Herr
This bug affects 1 person
Affects          Status        Importance  Assigned to       Milestone
linux (Ubuntu)   Fix Released  High        Joseph Salisbury
  Vivid          Fix Released  High        Joseph Salisbury

Bug Description

After upgrading from Ubuntu 14.10 to 15.04, the system no longer starts. After disabling the splash screen, I get lots of error messages on the console and the system is unresponsive. Kernel 3.16 runs without problems, as does 3.19 on another system without a Broadcom LAN adapter. A "screenshot" is attached. The error occurs before I am asked for the LUKS encryption password, so it is very likely still in the initrd stage.

The system is a Dell Latitude 5530 (Core i5, 16 GB RAM) running Ubuntu 15.04 (AMD64), upgraded from 14.10, with all available updates installed as of 2015-03-04.

lspci.log and dmesg.log were taken with the "old" 3.16 kernel, as I was not able to boot 3.19.

Andreas Herr (herr) wrote :
Brad Figg (brad-figg)
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1428111

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.0 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-rc2-vivid/

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Andreas Herr (herr) wrote :

Due to the nature of this bug, I am not able to provide logs. I will try the upstream kernel and report back.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Andreas Herr (herr) wrote :

Kernel 4.0.0-040000rc2-generic #201503031836 is working (so far). The system boots normally, and there are no suspicious entries in the log files.

tags: added: kernel-fixed-upstream
tags: added: performing-bisect
Joseph Salisbury (jsalisbury) wrote :

Thanks for testing!

There are two commits in v4.0 that may be the fix:
d0af71a tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()
20d14a5 tg3: move init/deinit from open/close to probe/remove

I built a Vivid test kernel with a cherry-pick of these two commits. It can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1428111/

Can you test this kernel and see if it resolves this bug?

Note that you will need to install both the linux-image and linux-image-extra .deb packages.

tags: added: vivid
Andreas Herr (herr) wrote :

linux-image-3.19.0-7-generic_3.19.0-7.7~lp1428111v1_amd64.deb fixes the bug. Thank you very much for the quick fix.

Joseph Salisbury (jsalisbury) wrote :

Thanks again for testing.

We next need to figure out which of these commits is the fix. I have a feeling it is:

d0af71a tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()
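
The commit title describes a locking contract: tg3_init_one() should hold tp->lock before it calls tg3_halt(). The Python sketch below is a user-space analogy only, not the tg3 code. The class FakeNic and the functions halt(), init_one_buggy() and init_one_fixed() are hypothetical, and the assumption that the callee briefly drops and re-takes the lock is made purely to show why calling it without the lock goes wrong.

```python
import threading


class FakeNic:
    """Hypothetical stand-in for a driver's private state (not tg3 code)."""

    def __init__(self) -> None:
        self.lock = threading.Lock()  # plays the role of tp->lock
        self.running = True

    def halt(self) -> None:
        # Assumed contract: the caller must already hold self.lock.
        # To model a callee that briefly drops the lock while it waits,
        # release it and take it back. Releasing a lock that is not held
        # raises RuntimeError in Python, exposing the contract violation.
        self.lock.release()
        try:
            pass  # e.g. wait for the hardware to quiesce
        finally:
            self.lock.acquire()
        self.running = False


def init_one_buggy(nic: FakeNic) -> None:
    # Calls halt() without taking the lock first.
    nic.halt()


def init_one_fixed(nic: FakeNic) -> None:
    # Mirrors the commit title: hold the lock around the halt call.
    with nic.lock:
        nic.halt()


nic = FakeNic()
try:
    init_one_buggy(nic)
except RuntimeError as err:
    print("buggy path failed:", err)
init_one_fixed(nic)
print("fixed path, running =", nic.running)
```

In the analogy, the unlocked call fails loudly; in a kernel, an equivalent violation may instead corrupt state or hang, which is consistent with (but not proof of) the symptoms reported here.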

I built a Vivid test kernel with a cherry-pick of just this one commit. It can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1428111/

Can you test this kernel and see if it resolves this bug? If it does, I'll request this commit in upstream Stable and send a SRU request for the Ubuntu kernels.

Andreas Herr (herr) wrote :

Your feeling was right. This patch solves the problem.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing, Andreas! I'll send a request to have this commit included in the upstream stable releases as well as request it in Vivid.

Since we are unsure which commit causes the regression, can you also test the latest upstream 3.16 stable kernel to see if the commit that causes the regression came down from stable? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.16.7-ckt7-utopic/

Changed in linux (Ubuntu Vivid):
status: Confirmed → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
tags: removed: performing-bisect
Andreas Herr (herr) wrote :

3.16.7-031607-generic is also working

Joseph Salisbury (jsalisbury) wrote :

Can you run a couple more tests against the following two kernels:

Upstream 3.18: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18.8-vivid/
Upstream 3.19: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.19-vivid/

Andreas Herr (herr) wrote :

3.19-vivid (3.19.0-031900-generic) has the bug, 3.18 (3.18.8-031808-generic) works fine.

Joseph Salisbury (jsalisbury) wrote :

Thanks again for testing, Andreas. This means only the 3.19 kernel needs the patch and not 3.18 and earlier.
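
The version tests in this thread narrow the problem down by hand. As a bookkeeping sketch (not a bisection tool), the Python below records the reported boot results in release order and derives the two windows they imply: the regression appeared between the 3.18.8 and 3.19 builds, and the fix is present by 4.0-rc2, where the candidate tg3 commits were found. The helper name regression_window() is hypothetical.

```python
from typing import List, Optional, Tuple

Window = Optional[Tuple[str, str]]

# Boot results for the kernels tested in this bug, in release order
# (True = boots normally, False = hangs during early boot).
REPORTED: List[Tuple[str, bool]] = [
    ("3.16.7", True),
    ("3.18.8", True),
    ("3.19.0", False),
    ("4.0.0-rc2", True),
]


def regression_window(results: List[Tuple[str, bool]]) -> Tuple[Window, Window]:
    """Return (introduced, fixed) windows around the first failing build.

    introduced = (last good build, first bad build)
    fixed      = (last bad build of that run, next good build), or None
    """
    first_bad = next((i for i, (_, ok) in enumerate(results) if not ok), None)
    if first_bad is None or first_bad == 0:
        return None, None  # no failure, or no known-good build before it
    introduced = (results[first_bad - 1][0], results[first_bad][0])

    # Skip over any consecutive failing builds to find where it works again.
    last_bad = first_bad
    while last_bad + 1 < len(results) and not results[last_bad + 1][1]:
        last_bad += 1
    fixed = None
    if last_bad + 1 < len(results):
        fixed = (results[last_bad][0], results[last_bad + 1][0])
    return introduced, fixed


introduced, fixed = regression_window(REPORTED)
print("regression introduced between", *introduced)
print("fix present between", *fixed)
```

This only records which tested builds boot; pinning down the exact offending commit would still require a real bisection between the two versions in the first window.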

Changed in linux (Ubuntu Vivid):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-8.8

---------------
linux (3.19.0-8.8) vivid; urgency=low

  [ Andy Whitcroft ]

  * ubuntu: vbox -- elide the new symlinks and reconstruct on clean:
    - LP: #1426113
  * rebase to stable v3.19.1

  [ John Johansen ]

  * SAUCE: (no-up): apparmor: fix mediation of fs unix sockets
    - LP: #1408833

  [ Leann Ogasawara ]

  * Release Tracking Bug
    - LP: #1429940

  [ Upstream Kernel Changes ]

  * xen: correct bug in p2m list initialization
  * net/mlx5_core: Fix configuration of log_uar_page_sz
    - LP: #1419938
  * tpm/ibmvtpm: Additional LE support for tpm_ibmvtpm_send
    - LP: #1420575
  * net/mlx4_core: Maintain a persistent memory for mlx4 device
    - LP: #1422481
  * net/mlx4_core: Set device configuration data to be persistent across
    reset
    - LP: #1422481
  * net/mlx4_core: Refactor the catas flow to work per device
    - LP: #1422481
  * net/mlx4_core: Enhance the catas flow to support device reset
    - LP: #1422481
  * net/mlx4_core: Activate reset flow upon fatal command cases
    - LP: #1422481
  * net/mlx4_core: Manage interface state for Reset flow cases
    - LP: #1422481
  * net/mlx4_core: Handle AER flow properly
    - LP: #1422481
  * net/mlx4_core: Enable device recovery flow with SRIOV
    - LP: #1422481
  * net/mlx4_core: Reset flow activation upon SRIOV fatal command cases
    - LP: #1422481
  * tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()
    - LP: #1428111
  * rebase to v3.19.1
    - LP: #1410704
    - LP: #1411193
    - LP: #1400215
 -- Leann Ogasawara <email address hidden> Mon, 09 Mar 2015 10:08:29 -0700

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released