Occasional kernel error while running commands in LXC

Bug #1073238 reported by Tim
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

I run LXC containers on a mounted LVM snapshot.
I run some commands within the container (both these commands and the container's init commands are susceptible to this issue).
I then stop the container with lxc-stop, delete and recreate the snapshot, and repeat the process.
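The cycle above can be sketched as a small dry-run harness. This is a minimal sketch, not the reporter's actual script: the volume group, origin LV, snapshot name and size, mount point and container name are all hypothetical placeholders, and the workload run inside the container is left unspecified because the report does not describe it.

```python
import subprocess
from typing import List

Argv = List[str]


def iteration_plan(vg: str, origin: str, snap: str, mountpoint: str,
                   container: str, size: str = "1G") -> List[Argv]:
    """Commands for one snapshot/container cycle (all names are placeholders)."""
    return [
        # Recreate the LVM snapshot of the origin logical volume.
        ["lvcreate", "--snapshot", "--size", size, "--name", snap, f"{vg}/{origin}"],
        # Mount it where the container's rootfs is assumed to live.
        ["mount", f"/dev/{vg}/{snap}", mountpoint],
        # Start the container in the background (this runs its init commands).
        ["lxc-start", "-n", container, "-d"],
        # The workload run inside the container is not specified in the report.
        # Stop the container, then unmount and delete the snapshot.
        ["lxc-stop", "-n", container],
        ["umount", mountpoint],
        ["lvremove", "-f", f"{vg}/{snap}"],
    ]


def run_iterations(plan: List[Argv], iterations: int, dry_run: bool = True) -> None:
    """Repeat the cycle; with dry_run=True only print the commands."""
    for i in range(iterations):
        for argv in plan:
            if dry_run:
                print(f"[{i}] " + " ".join(argv))
            else:
                # Requires root and a real LVM/LXC setup; stops on the first failure.
                subprocess.run(argv, check=True)
```

Running with the default `dry_run=True` only prints the planned commands, so the sequence can be checked before it is pointed at real volumes.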

Refer to my LXC bug posting for more information:
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1071910

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image-3.5.0-17-generic 3.5.0-17.28
ProcVersionSignature: Ubuntu 3.5.0-17.28-generic 3.5.5
Uname: Linux 3.5.0-17-generic x86_64
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Oct 30 15:16 seq
 crw-rw---T 1 root audio 116, 33 Oct 30 15:16 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.6.1-0ubuntu3
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Tue Oct 30 16:19:32 2012
Ec2AMI: ami-3d4ff254
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1a
Ec2InstanceType: m1.small
Ec2Kernel: aki-825ea7eb
Ec2Ramdisk: unavailable
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-17-generic N/A
 linux-backports-modules-3.5.0-17-generic N/A
 linux-firmware 1.95
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to quantal on 2012-10-22 (7 days ago)

Revision history for this message
Tim (iceczd) wrote :
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1073238

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Tim (iceczd) wrote :

Confirmed; the logs were collected with apport-collect 1073238.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Tim (iceczd) wrote :

The error appears in WifiSyslog.txt (excerpt below). Please let me know if there is anything else I can provide to help solve this issue:

Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252907] ------------[ cut here ]------------
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252921] kernel BUG at /build/buildd/linux-3.5.0/arch/x86/mm/fault.c:396!
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252926] invalid opcode: 0000 [#1] SMP
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252932] CPU 0
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252934] Modules linked in: veth dm_snapshot xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables bridge stp llc isofs microcode acpiphp
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252958]
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252960] Pid: 8140, comm: cat Not tainted 3.5.0-17-generic #28-Ubuntu
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252966] RIP: e030:[<ffffffff8168533f>] [<ffffffff8168533f>] vmalloc_fault+0x11f/0x208
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252979] RSP: e02b:ffff880002f1d9b8 EFLAGS: 00010046
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252983] RAX: ffff880026caeff8 RBX: ffffe8ffffc00ac8 RCX: 0000000000000000
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252988] RDX: 00003ffffffff000 RSI: ffff880000000ff8 RDI: 0000000000000000
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252993] RBP: ffff880002f1d9d8 R08: ffff880017c6ae70 R09: 00007f7b4d46e000
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252998] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880066231e88
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253003] R13: ffff880026caeff8 R14: ffff880000000ff8 R15: 0000000000000002
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253012] FS: 00007f7b4d68c700(0000) GS:ffff88006a000000(0000) knlGS:0000000000000000
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253017] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253021] CR2: ffffe8ffffc00ac8 CR3: 0000000066231000 CR4: 0000000000002660
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253027] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253038] Process cat (pid: 8140, threadinfo ffff880002f1c000, task ffff88002470dc00)
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253044] Stack:
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253046] ffffe8ffffc00ac8 0000000000000029 ffff880002f1daf8 0000000000000000
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253055] ffff880002f1dae8 ffffffff816858f9 0000000000000657 ffffffff812e79e1
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253064] ffff88002470dc00 0000000000000060 ffff880055ecdd1c ffff88005636b540
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253072] Call Trace:
Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253078] [<ffffffff816858f9>] ...

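The key fields of an oops like the one above can be pulled out mechanically, which makes it easier to compare traces across kernels and iterations. This is a minimal Python sketch; the function and class names and the regular expressions are hypothetical, and the vmalloc bounds assume the x86_64 4-level-paging layout documented for kernels of this era (ffffc90000000000 to ffffe8ffffffffff).

```python
import re
from dataclasses import dataclass
from typing import Optional

# x86_64 vmalloc/ioremap area for 4-level paging kernels of this era
# (Documentation/x86/x86_64/mm.txt: ffffc90000000000 - ffffe8ffffffffff).
VMALLOC_START = 0xFFFFC90000000000
VMALLOC_END = 0xFFFFE8FFFFFFFFFF

BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
RIP_RE = re.compile(r"RIP: [0-9a-f]+:\[<[0-9a-f]+>\]\s+\[<[0-9a-f]+>\]\s+(\w+)\+")
CR2_RE = re.compile(r"CR2: ([0-9a-f]{16})")


@dataclass
class OopsSummary:
    bug_file: Optional[str]         # source file from "kernel BUG at FILE:LINE!"
    bug_line: Optional[int]         # line number from the same message
    rip_function: Optional[str]     # function named on the RIP line
    cr2: Optional[int]              # faulting address reported in CR2
    cr2_in_vmalloc: Optional[bool]  # whether CR2 lies in the vmalloc area


def summarize_oops(log: str) -> OopsSummary:
    """Extract the key fields of the first oops found in a syslog excerpt."""
    bug = BUG_RE.search(log)
    rip = RIP_RE.search(log)
    cr2_match = CR2_RE.search(log)
    cr2 = int(cr2_match.group(1), 16) if cr2_match else None
    return OopsSummary(
        bug_file=bug.group(1) if bug else None,
        bug_line=int(bug.group(2)) if bug else None,
        rip_function=rip.group(1) if rip else None,
        cr2=cr2,
        cr2_in_vmalloc=(VMALLOC_START <= cr2 <= VMALLOC_END) if cr2 is not None else None,
    )
```

Applied to the 3.5.0 excerpt, it reports the BUG at fault.c line 396 inside vmalloc_fault, with CR2 (ffffe8ffffc00ac8) inside the vmalloc area; the 3.5.7 trace below gives the same function and line, with CR2 ffffe8ffffc00a98.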

Revision history for this message
Tim (iceczd) wrote :

I believe this issue was fixed somewhere between kernel v3.5.0 and v3.5.7: I just upgraded to 3.5.7 and ran my test over 120 iterations without hitting it.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Tim (iceczd) wrote :

Then I rebooted, and it occurred on the first iteration:

Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992470] ------------[ cut here ]------------
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992483] kernel BUG at /home/apw/COD/linux/arch/x86/mm/fault.c:396!
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992489] invalid opcode: 0000 [#1] SMP
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992495] CPU 0
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992497] Modules linked in: veth dm_snapshot xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables bridge stp llc isofs microcode acpiphp
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992524]
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992527] Pid: 1233, comm: telnet Not tainted 3.5.7-030507-generic #201210130556
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992534] RIP: e030:[<ffffffff8169d8e4>] [<ffffffff8169d8e4>] vmalloc_fault+0x114/0x1cf
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992550] RSP: e02b:ffff880045525978 EFLAGS: 00010046
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992554] RAX: ffff88006639cff8 RBX: ffffe8ffffc00a98 RCX: 0000000000000000
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992559] RDX: ffff880000000000 RSI: ffffe8ffffc00a98 RDI: 000000065fb22067
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992564] RBP: ffff880045525998 R08: ffff8800028fb370 R09: 0000000000000001
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992569] R10: 000000000098967f R11: 0000000000000001 R12: ffff88006639cff8
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992574] R13: ffffffff81c0be88 R14: ffff880000000ff8 R15: ffff880045525ab8
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992585] FS: 00007f7203243740(0000) GS:ffff88006a000000(0000) knlGS:0000000000000000
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992591] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992595] CR2: ffffe8ffffc00a98 CR3: 000000004551e000 CR4: 0000000000002660
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992600] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992606] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992612] Process telnet (pid: 1233, threadinfo ffff880045524000, task ffff8800327e8000)
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992617] Stack:
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992620] 0000000000000029 0000000000000000 0000000000000001 0000000000000060
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992629] ffff880045525aa8 ffffffff8169dee0 ffff8800455259e8 ffff8800327e8000
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992637] 0000000000000002 ffffe8ffffc00a98 0000000000000002 0000000000000004
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992645] Call Trace:
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992652] [<ffffffff8169dee0>] do_page_fault+0x3c0/0x520
Oct 30 19:57:31 ip-10-72-206-25 kernel: [40516179.992661] [<ffffffff8119a387>] ? do_select+0x537/0x5c0
Oct 30 19:57:31 ip-10-7...


Changed in linux (Ubuntu):
status: Fix Released → New
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1073238

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Tim (iceczd) wrote :

The issue has been previously confirmed using:
apport-collect 1073238

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can you test the 3.5.7 kernel located at:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5.7-quantal/

If that fixes the bug, then the fix will make its way into 12.10 through the normal stable release update process.

tags: added: vmalloc-fault
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: needs-upstream-testing
Revision history for this message
Tim (iceczd) wrote :

That is the kernel I tested after the initial error in 3.5.0. I am testing 3.6.3 at the moment, and then I will test 3.7. The kernel BUG from the syslog for 3.5.7 is here:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073238/comments/6

Revision history for this message
Tim (iceczd) wrote :

Tested 3.6.3 with over 400 iterations without encountering this issue.

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6.3-quantal/

I am getting an occasional LVM hang when removing my volumes, but that isn't a kernel issue, so I'll look for a solution elsewhere.

Revision history for this message
Tim (iceczd) wrote :

NOTE: 3.6.3 doesn't have the memory cgroup built into the kernel, so I had to disable the memory cgroup part of my configuration.

Revision history for this message
Tim (iceczd) wrote :

There appears to be a reasonable approach, discussed here, of building the memory cgroup into the kernel but leaving it disabled by default to avoid the overhead it causes:
https://bugs.archlinux.org/task/32057

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Revision history for this message
Andy Whitcroft (apw) wrote :

Also noted at Red Hat: https://bugzilla.redhat.com/show_bug.cgi?id=914737

Potential patch: https://lkml.org/lkml/2013/2/16/167, which seems to be working its way upstream and is CC'd to stable. We should expect to receive it in time via that route if nothing else happens.
