Bug #1301496 “kernel crash: Unable to handle kernel paging request for data” : Bugs : linux package : Ubuntu

Scott Moser (smoser) wrote on 2014-04-02:

#1

trace from 3.13.0-8-generic (6.1 KiB, text/plain)

Scott Moser (smoser) wrote on 2014-04-02:

#2

trace from 3.13.0-19-generic (6.3 KiB, text/plain)

Scott Moser (smoser) wrote on 2014-04-02: AudioDevicesInUse.txt

#3

AudioDevicesInUse.txt (1.2 KiB, text/plain)

apport information

tags:	added: apport-collected trusty uec-images
description:	updated

Scott Moser (smoser) wrote on 2014-04-02: BootDmesg.txt

#4

BootDmesg.txt (18.9 KiB, text/plain)

apport information

Scott Moser (smoser) wrote on 2014-04-02: CurrentDmesg.txt

#5

CurrentDmesg.txt (10.7 KiB, text/plain)

apport information

Scott Moser (smoser) wrote on 2014-04-02: IwConfig.txt

#6

IwConfig.txt (175 bytes, text/plain)

apport information

Scott Moser (smoser) wrote on 2014-04-02: ProcCpuinfo.txt

#7

ProcCpuinfo.txt (346 bytes, text/plain)

apport information

Scott Moser (smoser) wrote on 2014-04-02: ProcInterrupts.txt

#8

ProcInterrupts.txt (599 bytes, text/plain)

apport information

Scott Moser (smoser) wrote on 2014-04-02: ProcModules.txt

#9

ProcModules.txt (1.3 KiB, text/plain)

apport information

Scott Moser (smoser) wrote on 2014-04-02: UdevDb.txt

#10

UdevDb.txt (143.8 KiB, text/plain)

apport information

Scott Moser (smoser) wrote on 2014-04-02: UdevLog.txt

#11

UdevLog.txt (421.1 KiB, text/plain)

apport information

Jorge Castro (jorge) wrote on 2014-04-02:

#12

I was doing a juju deploy when this happened. I got a segfault when running "juju status" and then the terminal/ssh connection froze almost immediately after.

Brad Figg (brad-figg) wrote on 2014-04-02: Missing required logs.

#13

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1301496

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Joseph Salisbury (jsalisbury) on 2014-04-02

Changed in linux (Ubuntu):
importance:	Undecided → High
tags:	added: kernel-da-key ppc64el
Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Joseph Salisbury (jsalisbury) wrote on 2014-04-02:

#14

Can you see if this issue also happens on the 3.13.0-21 kernel? It can be downloaded from:

https://launchpad.net/ubuntu/trusty/+source/linux/3.13.0-21.43

The ppc64el image can be directly downloaded from:
https://launchpad.net/ubuntu/+source/linux/3.13.0-21.43/+build/5866502

Matt Bruzek (mbruzek) wrote on 2014-04-03:

#15

The /var/log/dmesg file from wolfe-01. (16.5 KiB, text/plain)

We just experienced the same problem on wolfe-01 today. We were deploying charms with juju and noticed that juju status did not return the right output.

The kernel that was running is:

Linux wolfe-01 3.13.0-21-generic #43-Ubuntu SMP Mon Mar 31 22:54:04 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

I got the syslog and the dmesg file off the server and will attach them to this report.

Matt Bruzek (mbruzek) wrote on 2014-04-03:

#16

The /var/log/syslog file from wolfe-01. (595.0 KiB, text/plain)

Anton Blanchard (anton-samba) wrote on 2014-04-06:

#17

Lots going on here. First, looking at the syslog file from Matt, I notice a lot of:

Apr 3 20:57:45 wolfe-01 kernel: [ 4062.074422] jujud[1929]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 0000000000000000

Looks like we smashed our stack. This seems to be a separate issue because we continue on even after these failures.

At some point dpkg-query starts SEGVing:

Apr 3 20:57:54 wolfe-01 kernel: [ 4071.263070] dpkg-query[20115]: unhandled signal 11 at 0000000000010028 nip 0000000010003380 lr 0000000010002968 code 30001
Apr 3 20:57:57 wolfe-01 kernel: [ 4074.437029] dpkg-query[20208]: unhandled signal 11 at 0000000000010028 nip 0000000010003380 lr 0000000010002968 code 30001
Apr 3 20:58:00 wolfe-01 kernel: [ 4077.612284] dpkg-query[20291]: unhandled signal 11 at 0000000000010028 nip 0000000010003380 lr 0000000010002968 code 30001

Joseph Salisbury (jsalisbury) wrote on 2014-04-07:

#18

Was there a prior Trusty kernel version that did not exhibit this bug?

tags:	added: kernel-key

Steve Langasek (vorlon) wrote on 2014-04-07:

#19

fwiw, I've investigated the dpkg segfaults, and seen the following:

$ gdb dpkg
GNU gdb (Ubuntu 7.7-0ubuntu3) 7.7
[...]
Reading symbols from dpkg...Reading symbols from /usr/lib/debug//usr/bin/dpkg...done.
done.
(gdb) run -l
Starting program: /usr/bin/dpkg -l

Program received signal SIGSEGV, Segmentation fault.
filesdbinit () at ../../src/filesdb.c:571
571 ../../src/filesdb.c: No such file or directory.
(gdb) print bins
$1 = {0x0 <repeats 9441 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 6942 times>}
(gdb)

On a healthy system, this looks like:

(gdb) break filesdbinit
Breakpoint 2 at 0x10003338: file ../../src/filesdb.c, line 565.
(gdb) print bins
$12 = {0x0 <repeats 131072 times>}
(gdb)

Note that bins is an array of pointers.

(gdb) print sizeof(bins[0])
$6 = 8
(gdb)

So once every 8192 elements, there's a wrong bit in the array; 8192*8 is 64k of memory.

This could be a bug in any of the kernel, qemu, or the underlying host. Note that after a reboot of wolfe, the VMs are reported to be stable again for the past 72 hours (!). So it's possible this points to a bug with the host OS/kernel.

There is a second P7 system, postal, which has been exhibiting the same kinds of problems as wolfe. Adam can speak to this in more detail, and facilitate any necessary diagnostics on postal.

Adam Conrad (adconrad) wrote on 2014-04-08:

#20

For what it's worth, the stability offered by a reboot was short-lived, and wolfe's gone back to hating its users.

Andy Whitcroft (apw) wrote on 2014-04-08:

#21

As this seems to be reproducible, we might want to spin up one of the affected machines with a 4K kernel and see if that avoids the issue. Of course, as we have seen with other bugs, assumptions in the client s/w may be to blame.

Andy Whitcroft (apw) wrote on 2014-04-09:

#22

Got hold of one of these machines in this "everything is exploding" state. Used the test program below to dump out the static variables and obtain the alignment of the corruption. (This program does not manipulate this data, which eliminates a bug in dpkg as the cause.) Note that the corruption is at the start of the page (and, although mostly elided here, repeats on each page thereafter):

===
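#include <stdio.h>

/* 1 MiB of zero-initialised static data (16 x 64 KiB). */
static char b[65536 * 16];

int main(void)
{
        size_t p;

        /* Print the base address so offsets can be mapped to pages. */
        printf("%08lx\n", (unsigned long)b);
        for (p = 0; p < sizeof(b); p++) {
                /* b is never written, so any non-zero byte is corruption. */
                if (b[p]) {
                        printf("%d != 0 @ %zu [%08lx]\n", b[p], p, (unsigned long)&b[p]);
                }
        }
        return 0;
}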
===
10011068
68 != 0 @ 61336 [10020000]
20 != 0 @ 61340 [10020004]
2 != 0 @ 61342 [10020006]
1 != 0 @ 61344 [10020008]
75 != 0 @ 61348 [1002000c]
3 != 0 @ 61349 [1002000d]
2 != 0 @ 61352 [10020010]
8 != 0 @ 61353 [10020011]
[...]
68 != 0 @ 126872 [10030000]
20 != 0 @ 126876 [10030004]
2 != 0 @ 126878 [10030006]
1 != 0 @ 126880 [10030008]
75 != 0 @ 126884 [1003000c]
3 != 0 @ 126885 [1003000d]
2 != 0 @ 126888 [10030010]
8 != 0 @ 126889 [10030011]
[...]
===

I also dumped the corruption in full in a more readable form. I would note that this seems to contain 'lo' and 'eth0', as if it were networking related:

===
000000    0044    0000    0014    0002    0001    0000    034b    0000
         D  \0  \0  \0 024  \0 002  \0 001  \0  \0  \0   K 003  \0  \0
000010    0802    fe80    0001    0000    0008    0001    007f    0100
       002  \b 200 376 001  \0  \0  \0  \b  \0 001  \0 177  \0  \0 001
000020    0008    0002    007f    0100    0007    0003    6f6c    0000
        \b  \0 002  \0 177  \0  \0 001  \a  \0 003  \0   l   o  \0  \0
000030    0014    0006    ffff    ffff    ffff    ffff    bb72    0054
       024  \0 006  \0 377 377 377 377 377 377 377 377   r 273   T  \0
000040    bb72    0054    0050    0000    0014    0002    0001    0000
         r 273   T  \0   P  \0  \0  \0 024  \0 002  \0 001  \0  \0  \0
000050    034b    0000    1802    0080    000c    0000    0008    0001
         K 003  \0  \0 002 030 200  \0  \f  \0  \0  \0  \b  \0 001  \0
000060    000a    8203    0008    0002    000a    8203    0008    0004
        \n  \0 003 202  \b  \0 002  \0  \n  \0 003 202  \b  \0 004  \0
000070    000a    ff03    0009    0003    7465    3068    0000    0000
        \n  \0 003 377  \t  \0 003  \0   e   t   h   0  \0  \0  \0  \0
000080    0014    0006    ffff    ffff    ffff    ffff    bcc8    0054
       024  \0 006  \0 377 377 377 377 377 377 377 377 310 274   T  \0
000090    bcc8    0054    0000    0000    0000    0000    0000    0000
       310 274   T  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000a0    0000    0000    0000    0000    0000    0000    0000    0000
        \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
010000    0044    0000    0014    0002    0001    0000    034b    0000
===

I should note at this point that this differs from the corruption as seen by @vorlon which showed a single bit change in each page.

Andy Whitcroft (apw) wrote on 2014-04-09:

#23

000000    0044    0000
          LEN     TYPE
000000                    0014    0002    0001    0000    034b    0000
                          LEN     TYPE
000010    0802    fe80    0001    0000    0008    0001    007f    0100
                                          LEN     TYPE    127.0.0.1
000020    0008    0002    007f    0100    0007    0003    6f6c    0000
          LEN     TYPE    127.0.0.1       LEN     TYPE    lo
000030    0014    0006    ffff    ffff    ffff    ffff    bb72    0054
          LEN     TYPE
000040    bb72    0054

000040                    0050    0000
                          LEN     TYPE
000040                                    0014    0002    0001    0000
                                          LEN     TYPE
000050    034b    0000    1802    0080    000c    0000    0008    0001
                                                          LEN     TYPE
000060    000a    8203    0008    0002    000a    8203    0008    0004
          10.0.3.130      LEN     TYPE    10.0.3.130      LEN     TYPE
000070    000a    ff03    0009    0003    7465    3068    0000    0000
          10.0.3.255      LEN     TYPE    eth0

000080    0014    0006    ffff    ffff    ffff    ffff    bcc8    0054
          LEN     TYPE
000090    bcc8    0054

This looks a little bit like the sort of contents we might expect to see dumped from a call to GETADDRS
against PF_UNSPEC which calls out to all of the inet{,6}_fill_ifaddr() handlers, though nested:

[IFA_LOCAL(fe80...), IFA_ADDRESS(127.0.0.1), IFA_LOCAL(127.0.0.1), IFA_LABEL(lo),IFA_CACHEINFO(...)]
[IFA_LOCAL(00008000...??), IFA_ADDRESS(10.0.3.130), IFA_LOCAL(10.0.3.130), IFA_BROADCAST(10.0.3.255), IFA_LABEL(eth0)]

Where:

  IFA_LOCAL (16 bytes, ipv6 or 4 bytes, ipv4)
  IFA_ADDRESS (4 bytes, ipv4)
  IFA_BROADCAST (4 bytes, ipv4)
  IFA_CACHEINFO (16 bytes)

Scott Moser (smoser) wrote on 2014-04-10:

#24

Regarding the 4k page kernel: this just happened on wolfe-02, running (I believe) a 4k page kernel.
$ grep CONFIG_PPC.*.*PAGES /boot/config-3.13.0-8-generic 
CONFIG_PPC_4K_PAGES=y
# CONFIG_PPC_64K_PAGES is not set

wolfe-02 login: [241848.101690] Unable to handle kernel paging request for data at address 0x2001400000044
[241848.112613] Faulting instruction address: 0xc000000000954b60
[241848.112704] Oops: Kernel access of bad area, sig: 11 [#1]
[241848.112777] SMP NR_CPUS=2048 NUMA pSeries
[241848.112871] Modules linked in: btrfs xor raid6_pq libcrc32c veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables dm_crypt
[241848.113353] CPU: 1 PID: 10355 Comm: kworker/u4:0 Not tainted 3.13.0-8-generic #28-Ubuntu
[241848.113465] Workqueue: netns .cleanup_net
[241848.113540] task: c0000002192621f0 ti: c000000253eec000 task.ti: c000000253eec000
[241848.113655] NIP: c000000000954b60 LR: c000000000954b68 CTR: c000000000954b00
[241848.113768] REGS: c000000253eef760 TRAP: 0300   Not tainted  (3.13.0-8-generic)
[241848.113871] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44002024  XER: 00000000
[241848.114117] CFAR: c0000000001d2820 DAR: 0002001400000044 DSISR: 40000000 SOFTE: 1
GPR00: c000000000954b68 c000000253eef9e0 c0000000010b0dd0 0002001400000044
GPR04: f00000000b7698d8 c00000034674d900 c000000000954b68 c0000003fe023508
GPR08: 0000000000010000 c0000002536a0000 000000000000000e 0000000000000001
GPR12: 0000000044002028 c00000000fe80300 c0000000000c3f00 c000000363207c40
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000001 c000000000f630fc
GPR24: 0000000000000001 fffffffffffffef7 0000000000000000 c000000000f58638
GPR28: 0000000000000001 c0000003506f3900 0000000000002000 0000000000000000
[241848.115585] NIP [c000000000954b60] .tcp_net_metrics_exit+0x60/0x110
[241848.115675] LR [c000000000954b68] .tcp_net_metrics_exit+0x68/0x110
[241848.115762] Call Trace:
[241848.115805] [c000000253eef9e0] [c000000000954b68] .tcp_net_metrics_exit+0x68/0x110 (unreliable)
[241848.115945] [c000000253eefa70] [c0000000008cc49c] .ops_exit_list.isra.2+0x6c/0xd0
[241848.116078] [c000000253eefb00] [c0000000008ccef0] .cleanup_net+0x150/0x250
[241848.116198] [c000000253eefbc0] [c0000000000b9e28] .process_one_work+0x1a8/0x4d0
[241848.116320] [c000000253eefc60] [c0000000000baaf0] .worker_thread+0x180/0x4a0
[241848.116429] [c000000253eefd30] [c0000000000c4010] .kthread+0x110/0x130
[241848.116538] [c000000253eefe30] [c00000000000a160] .ret_from_kernel_thread+0x5c/0x7c
[241848.116659] Instruction dump:
[241848.116731] 7d295030 2f890000 e93d0288 419e0058 3bc00000 3b800001 60000000 60420000
[241848.116920] 7bc81f24 7c69402a 2fa30000 419e0024 <ebe30000> 4b8b809d 60000000 2fbf0000
[241848.117115] ---[ end trace 531dcfc8ed4b2948 ]---
[241848.124367]
[241848.129853] Unable to handle kernel paging request for data at address 0xffffffffffffffd8
[241848.129968] Faulting instruction address: 0xc0000000000c49c0
[241848.130056] Oops: Kernel access of bad area, sig: 11 [#2]
[241848.130128] SMP NR_CPUS=2048 NUMA pSeries
[241848.130220] Modules linked in: btrfs xor raid6_pq libcrc32c veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables dm_crypt
[241848.130695] CPU: 0 PID: 10355 Comm: kworker/u4:0 Tainted: G      D      3.13.0-8-generic #28-Ubuntu
[241848.130833] task: c0000002192621f0 ti: c000000253eec000 task.ti: c000000253eec000
[241848.130938] NIP: c0000000000c49c0 LR: c0000000000bb278 CTR: c0000000000ded80
[241848.131042] REGS: c000000253eeeed0 TRAP: 0300   Tainted: G      D       (3.13.0-8-generic)
[241848.131145] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 42004028  XER: 00000000
[241848.131391] CFAR: 00003fff90fa7250 DAR: ffffffffffffffd8 DSISR: 40000000 SOFTE: 0
GPR00: c0000000000bb278 c000000253eef150 c0000000010b0dd0 c0000002192621f0
GPR04: 0000000000000000 0000000000000001 0000000000000800 0000000000000000
GPR08: 0000000000000001 0000000000000000 0000000000000001 0000000000000018
GPR12: 0000000022004088 c00000000fe80000 000000000075a000 0000000000000001
GPR16: 0000000000000000 c000000000eee780 c000000000096c48 c00000000110df34
GPR20: c000000000eee780 0000000000000001 0000000000000000 0000000000000004
GPR24: c000000000eee780 c000000001109a60 c000000253eec000 c000000000eee780
GPR28: c000000219262690 0000000000000000 c0000002192621f0 c0000002192621f0
[241848.132789] NIP [c0000000000c49c0] .kthread_data+0x20/0x40
[241848.132861] LR [c0000000000bb278] .wq_worker_sleeping+0x28/0xf0
[241848.132947] Call Trace:
[241848.132988] [c000000253eef150] [c0000002192632d4] 0xc0000002192632d4 (unreliable)
[241848.133109] [c000000253eef1d0] [c0000000000bb278] .wq_worker_sleeping+0x28/0xf0
[241848.133231] [c000000253eef260] [c000000000a14a7c] .__schedule+0x65c/0x8c0
[241848.133338] [c000000253eef4e0] [c000000000096c48] .do_exit+0x738/0xb30
[241848.133443] [c000000253eef5d0] [c000000000021c60] .die+0x2f0/0x450
[241848.133550] [c000000253eef670] [c0000000000474e0] .bad_page_fault+0xe0/0x130
[241848.133656] [c000000253eef6f0] [c000000000009284] handle_page_fault+0x2c/0x30
[241848.133780] --- Exception: 300 at .tcp_net_metrics_exit+0x60/0x110
[241848.133780]     LR = .tcp_net_metrics_exit+0x68/0x110
[241848.133938] [c000000253eefa70] [c0000000008cc49c] .ops_exit_list.isra.2+0x6c/0xd0
[241848.134057] [c000000253eefb00] [c0000000008ccef0] .cleanup_net+0x150/0x250
[241848.134161] [c000000253eefbc0] [c0000000000b9e28] .process_one_work+0x1a8/0x4d0
[241848.134280] [c000000253eefc60] [c0000000000baaf0] .worker_thread+0x180/0x4a0
[241848.134385] [c000000253eefd30] [c0000000000c4010] .kthread+0x110/0x130
[241848.134489] [c000000253eefe30] [c00000000000a160] .ret_from_kernel_thread+0x5c/0x7c
[241848.134607] Instruction dump:
[241848.134677] 4e800020 60000000 60000000 60420000 7c0802a6 fbe1fff8 f8010010 f821ff81
[241848.134850] 7c7f1b78 60000000 60000000 e93f0458 <e869ffd8> 38210080 e8010010 ebe1fff8
[241848.135026] ---[ end trace 531dcfc8ed4b2949 ]---
[241848.140660]
[241848.140706] Fixing recursive fault but reboot is needed!

Joseph Salisbury (jsalisbury) on 2014-04-15

tags:	removed: kernel-key

Joseph Salisbury (jsalisbury) wrote on 2014-06-13: Test with newer development kernel (3.13.0-24.46)

#25

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

With the recent release of this Ubuntu version, we would like to confirm whether this bug is still present. Please test again with the newer kernel and indicate in the bug whether this issue still exists.

You can update to the latest development kernel by simply running the following commands in a terminal window:

sudo apt-get update
sudo apt-get dist-upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status:	Confirmed → Incomplete
tags:	added: kernel-request-3.13.0-24.46

Launchpad Janitor (janitor) wrote on 2014-08-13:

#26

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status:	Incomplete → Expired
