'Kernel bug - invalid opcode: 0000 [#1] SMP' is reported at 'Preparing linux-image-extra-3.10.0-0-generic' stage of multi-lvm installations of amd64 saucy server

Bug #1195710 reported by Para Siva on 2013-06-28
This bug affects 2 people
Affects          Status   Importance   Assigned to   Milestone
linux (Ubuntu)            High         Unassigned
   Saucy                  High         Unassigned

Bug Description

The following error is reported during multi-lvm saucy server installations of the 20130628 images. It occurs on both amd64 and i386 installations. It does not happen on every run, but it does happen in the majority of runs.

Please also see comment #7 for more information.

Jun 28 09:25:47 kernel: [ 242.360845] Kernel BUG at ffffffff8125d351 [verbose debug info unavailable]
Jun 28 09:25:47 kernel: [ 242.360848] invalid opcode: 0000 [#1] SMP
Jun 28 09:25:47 kernel: [ 242.360851] Modules linked in: squashfs(F) xfs(F) reiserfs jfs btrfs(F) xor(F) zlib_deflate(F) raid6_pq(F) libcrc32c(F) ext2(F) virtio_balloon(F) nls_utf8(F) isofs(F) usb_storage(F) vga16fb(F) vgastate(F) floppy(F)
Jun 28 09:25:47 kernel: [ 242.360865] CPU: 0 PID: 20274 Comm: dpkg Tainted: GF 3.10.0-0-generic #7-Ubuntu
Jun 28 09:25:47 kernel: [ 242.360867] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
Jun 28 09:25:47 kernel: [ 242.360869] task: ffff88001c552ee0 ti: ffff880012c7a000 task.ti: ffff880012c7a000
Jun 28 09:25:47 kernel: [ 242.360871] RIP: 0010:[<ffffffff8125d351>] [<ffffffff8125d351>] ext4_mb_release_group_pa+0x111/0x120
Jun 28 09:25:47 kernel: [ 242.360894] RSP: 0018:ffff880012c7ba80 EFLAGS: 00010202
Jun 28 09:25:47 kernel: [ 242.360896] RAX: 0000000000000013 RBX: ffff88000fe82d00 RCX: 0000000000000000
Jun 28 09:25:47 kernel: [ 242.360897] RDX: 0000000000001fff RSI: 0000000000027fff RDI: 0000000000000001
Jun 28 09:25:47 kernel: [ 242.360899] RBP: ffff880012c7bab0 R08: 0000000000002000 R09: ffff880012c7ba80
Jun 28 09:25:47 kernel: [ 242.360900] R10: ffff880012c7ba84 R11: 0000000000000000 R12: ffff88001582a800
Jun 28 09:25:47 kernel: [ 242.360902] R13: ffff88000fe82d00 R14: ffff880012c7bae8 R15: 0000000000000014
Jun 28 09:25:47 kernel: [ 242.360904] FS: 00007fb1de042800(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000
Jun 28 09:25:47 kernel: [ 242.360905] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 28 09:25:47 kernel: [ 242.360907] CR2: 00007fb1ddeb8000 CR3: 0000000014b9a000 CR4: 00000000000006f0
Jun 28 09:25:47 kernel: [ 242.360913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 28 09:25:47 kernel: [ 242.360918] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 28 09:25:47 kernel: [ 242.360919] Stack:
Jun 28 09:25:47 kernel: [ 242.360920] 00001fff00000013 ffff88000fe82d00 ffff880012c7bab8 ffff88000fe82d20
Jun 28 09:25:47 kernel: [ 242.360923] ffff88001582a800 0000000000000014 ffff880012c7bb48 ffffffff8125d5ba
Jun 28 09:25:47 kernel: [ 242.360926] ffff880012c7bae8 ffff88001bd6a500 ffff880012c7bae8 ffff88000fe82d20
Jun 28 09:25:47 kernel: [ 242.360928] Call Trace:
Jun 28 09:25:47 kernel: [ 242.360934] [<ffffffff8125d5ba>] ext4_mb_discard_lg_preallocations+0x25a/0x350
Jun 28 09:25:47 kernel: [ 242.360937] [<ffffffff8125db68>] ext4_mb_release_context+0x4b8/0x590
Jun 28 09:25:47 kernel: [ 242.360941] [<ffffffff812608e4>] ext4_mb_new_blocks+0x304/0x540
Jun 28 09:25:47 kernel: [ 242.360958] [<ffffffff8113c668>] ? release_pages+0x1d8/0x210
Jun 28 09:25:47 kernel: [ 242.360961] [<ffffffff812570ed>] ext4_ext_map_blocks+0x64d/0xf60
Jun 28 09:25:47 kernel: [ 242.360967] [<ffffffff8122df05>] ext4_map_blocks+0x295/0x4a0
Jun 28 09:25:47 kernel: [ 242.360971] [<ffffffff8123086e>] mpage_da_map_and_submit+0x14e/0x420
Jun 28 09:25:47 kernel: [ 242.360974] [<ffffffff8123134e>] ? ext4_da_writepages+0x2de/0x5d0
Jun 28 09:25:47 kernel: [ 242.360977] [<ffffffff8123139d>] ext4_da_writepages+0x32d/0x5d0
Jun 28 09:25:47 kernel: [ 242.360986] [<ffffffff8113b00e>] do_writepages+0x1e/0x40
Jun 28 09:25:47 kernel: [ 242.360989] [<ffffffff81130cb9>] __filemap_fdatawrite_range+0x59/0x60
Jun 28 09:25:47 kernel: [ 242.360991] [<ffffffff81130d83>] filemap_fdatawrite_range+0x13/0x20
Jun 28 09:25:47 kernel: [ 242.361000] [<ffffffff811c2f8a>] SyS_sync_file_range+0x14a/0x160
Jun 28 09:25:47 kernel: [ 242.361007] [<ffffffff816de62f>] tracesys+0xe1/0xe6
Jun 28 09:25:47 kernel: [ 242.361009] Code: c3 10 31 d2 45 89 e9 45 89 f0 44 89 f9 4c 89 e6 ff d0 48 8b 03 48 85 c0 75 e0 eb 88 41 8b 7d 4c 85 ff 75 07 31 c9 e9 46 ff ff ff <0f> 0b 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
Jun 28 09:25:47 kernel: [ 242.361032] RIP [<ffffffff8125d351>] ext4_mb_release_group_pa+0x111/0x120
Jun 28 09:25:47 kernel: [ 242.361035] RSP <ffff880012c7ba80>
Jun 28 09:25:47 kernel: [ 242.361039] ---[ end trace e4be50d51f7681ce ]---
Jun 28 09:25:47 in-target: E
Jun 28 09:25:47 in-target: :
Jun 28 09:25:47 in-target: Sub-process /usr/bin/dpkg received a segmentation fault.
Jun 28 09:25:47 in-target:
Jun 28 09:25:47 in-target: debconf (developer): <-- STOP
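For readers comparing traces: on x86 the kernel's BUG() macro compiles to the two-byte UD2 instruction (0f 0b), which raises an invalid-opcode exception. That is why the trace reads "invalid opcode: 0000" rather than reporting a page fault. In the Code: line, the byte in angle brackets is the one at RIP, and here it is <0f> 0b. The minimal sketch below pulls the symbol+offset/size and the marked bytes out of such lines. It assumes the log format shown in this report, and the helper names parse_symbol and faulting_bytes are hypothetical, not part of any kernel tool.

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE" as printed in x86 oops RIP/EIP lines.
SYM_RE = re.compile(r"([A-Za-z0-9_.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

# x86 UD2 opcode, emitted by BUG(); executing it raises "invalid opcode".
UD2 = [0x0F, 0x0B]


def parse_symbol(line):
    """Return (function, offset, size) from an oops line, or None."""
    m = SYM_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the <..>-marked byte of a Code: line."""
    if "Code:" not in code_line:
        return None
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            marked = [tok[1:-1]] + tokens[i + 1:i + count]
            return [int(b, 16) for b in marked]
    return None
```

Applied to the i386 trace in a later comment, parse_symbol gives ("ext4_mb_release_group_pa", 0x10c, 0x110), so both architectures hit the BUG near the end of the same function.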

Steps to reproduce:
1. Install utah:
   sudo apt-add-repository -y ppa:utah/stable
   sudo apt-get update
   sudo apt-get install utah

2. Then run:

sudo -u utah -i run_utah_tests.py -i /var/cache/utah/iso/saucy-server-amd64.iso -p lp:ubuntu-test-cases/server/preseeds/multi-lvm.preseed lp:ubuntu-test-cases/server/runlists/multi-lvm.run -x /etc/utah/bridged-network-vm.xml
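Because the failure is intermittent, a failure rate over several runs is more informative than a single run. The following is a minimal sketch for repeating the command from step 2. It assumes, without having verified it, that run_utah_tests.py exits with a non-zero status when the installation fails. The names count_failures and UTAH_CMD are hypothetical.

```python
import subprocess
import sys

# The command from step 2, split into argv form.
UTAH_CMD = [
    "sudo", "-u", "utah", "-i", "run_utah_tests.py",
    "-i", "/var/cache/utah/iso/saucy-server-amd64.iso",
    "-p", "lp:ubuntu-test-cases/server/preseeds/multi-lvm.preseed",
    "lp:ubuntu-test-cases/server/runlists/multi-lvm.run",
    "-x", "/etc/utah/bridged-network-vm.xml",
]


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited non-zero.

    Assumption: a non-zero exit status means the run hit the failure.
    """
    failures = 0
    for i in range(runs):
        returncode = subprocess.run(cmd).returncode
        if returncode != 0:
            failures += 1
        # Per-run progress goes to stderr so stdout stays with the child.
        print(f"run {i + 1}/{runs}: exit status {returncode}", file=sys.stderr)
    return failures
```

For example, count_failures(UTAH_CMD, 7) mirrors the seven attempts described in a later comment.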

Note: "multi-lvm" presumably refers to an installation with multiple partitions on LVM. See the preseed file at http://bazaar.launchpad.net/~ubuntu-server-dev/ubuntu-test-cases/server-tests-raring/view/head:/preseeds/multi-lvm.preseed for the relevant entries.

Attached is the console output saved in a txt file.

The impacted smoke test is:
https://jenkins.qa.ubuntu.com/view/Saucy/view/Smoke%20Testing/job/saucy-server-amd64-smoke-multi-lvm/54/

Para Siva (psivaa) wrote :
Para Siva (psivaa) wrote :
description: updated
Para Siva (psivaa) wrote :

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1195710

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: saucy
Para Siva (psivaa) wrote :

Could not collect further logs, because the bug was encountered during the installation.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Para Siva (psivaa) on 2013-06-28
description: updated
Para Siva (psivaa) on 2013-06-28
description: updated
Para Siva (psivaa) wrote :

The stack trace for the i386 failure is:
Jun 28 13:06:55 kernel: [ 196.197710] Kernel BUG at c12094bc [verbose debug info unavailable]
Jun 28 13:06:55 kernel: [ 196.197713] invalid opcode: 0000 [#1] SMP
Jun 28 13:06:55 kernel: [ 196.197716] Modules linked in: squashfs(F) xfs(F) reiserfs jfs btrfs(F) xor(F) zlib_deflate(F) raid6_pq(F) libcrc32c(F) ext2(F) virtio_balloon(F) nls_utf8(F) isofs(F) usb_storage(F) vga16fb(F) vgastate(F) floppy(F)
Jun 28 13:06:55 kernel: [ 196.197727] CPU: 0 PID: 31522 Comm: in-target Tainted: GF 3.10.0-0-generic #7-Ubuntu
Jun 28 13:06:55 kernel: [ 196.197729] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
Jun 28 13:06:55 kernel: [ 196.197731] task: ddec4d40 ti: d36aa000 task.ti: d36aa000
Jun 28 13:06:55 kernel: [ 196.197734] EIP: 0060:[<c12094bc>] EFLAGS: 00010202 CPU: 0
Jun 28 13:06:55 kernel: [ 196.197751] EIP is at ext4_mb_release_group_pa+0x10c/0x110
Jun 28 13:06:55 kernel: [ 196.197753] EAX: 00000001 EBX: d6a1c580 ECX: 00000015 EDX: 00000000
Jun 28 13:06:55 kernel: [ 196.197754] ESI: d3fba240 EDI: d3fba240 EBP: d36abb30 ESP: d36abb04
Jun 28 13:06:55 kernel: [ 196.197756] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jun 28 13:06:55 kernel: [ 196.197758] CR0: 80050033 CR2: b76140f2 CR3: 019f7000 CR4: 000006b0
Jun 28 13:06:55 kernel: [ 196.197765] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jun 28 13:06:55 kernel: [ 196.197768] DR6: ffff0ff0 DR7: 00000400
Jun 28 13:06:55 kernel: [ 196.197770] Stack:
Jun 28 13:06:55 kernel: [ 196.197771] d36abb1c d36abb20 d36abb30 c1208922 d36abb74 ddadf000 00000015 00001fff
Jun 28 13:06:55 kernel: [ 196.197775] d6a1c580 d3fba240 ddadf000 d36abba0 c12096af d4c09230 00000000 00000000
Jun 28 13:06:55 kernel: [ 196.197779] dd2987e0 ddadf400 dd121040 dd2987e0 d36abb74 00000016 d3fba250 d3fba3a8
Jun 28 13:06:55 kernel: [ 196.197783] Call Trace:
Jun 28 13:06:55 kernel: [ 196.197788] [<c1208922>] ? ext4_mb_load_buddy+0x212/0x2f0
Jun 28 13:06:55 kernel: [ 196.197792] [<c12096af>] ext4_mb_discard_lg_preallocations+0x1ef/0x2d0
Jun 28 13:06:55 kernel: [ 196.197795] [<c1209ba6>] ext4_mb_release_context+0x416/0x4f0
Jun 28 13:06:55 kernel: [ 196.197799] [<c120c749>] ext4_mb_new_blocks+0x309/0x560
Jun 28 13:06:55 kernel: [ 196.197802] [<c12034de>] ext4_ext_map_blocks+0x9ee/0xf10
Jun 28 13:06:55 kernel: [ 196.197808] [<c1223db9>] ? jbd2_journal_get_write_access+0x29/0x30
Jun 28 13:06:55 kernel: [ 196.197814] [<c11dd819>] ? ext4_mark_iloc_dirty+0x3c9/0x5a0
Jun 28 13:06:55 kernel: [ 196.197818] [<c11db555>] ext4_map_blocks+0x2b5/0x4d0
Jun 28 13:06:55 kernel: [ 196.197830] [<c1108b82>] ? find_get_pages_tag+0xb2/0x150
Jun 28 13:06:55 kernel: [ 196.197834] [<c11ddd40>] mpage_da_map_and_submit+0x120/0x5c0
Jun 28 13:06:55 kernel: [ 196.197838] [<c12048bd>] ? __ext4_journal_start_sb+0x6d/0x130
Jun 28 13:06:55 kernel: [ 196.197841] [<c11de8a8>] ext4_da_writepages+0x2b8/0x550
Jun 28 13:06:55 kernel: [ 196.197845] [<c1111d9a>] do_writepages+0x1a/0x40
Jun 28 13:06:55 kernel: [ 196.197848] [<c1109814>] __filemap_fdatawrite_range+0x54/0x60
Jun 28 13:06:55 kernel: [ 196.197850] [<c11098eb>] filemap_flush+0x2b/0x30
Jun 28 13:06:55 kernel:...


Para Siva (psivaa) wrote :

So this appears to happen (on both i386 and amd64) only in multi-lvm installations, i.e. those using:
d-i partman-auto/choose_recipe \
       select Separate /home, /usr, /var, and /tmp partitions
I tried a number of single-lvm test runs and other server installations but could not see the issue in any of them. The relevant preseed entry in the single-lvm test installations is "d-i partman-auto/choose_recipe select All files in one partition (recommended for new users)".

One other difference I notice between the two preseeds is that multi-lvm (where the bug occurs) has the following entry:
d-i base-installer/kernel/override-image string linux-generic-pae
whereas single lvm (where the failure does not occur) has:
d-i base-installer/kernel/override-image string linux-server

Incidentally, linux-generic-pae is present only in the failing multi-lvm preseeds. No other server tests use this override, and they seem to be working fine. I wonder whether that could be the cause.

Andy Whitcroft (apw) wrote :

In saucy, both linux-server and linux-generic-pae should be metapackages that trigger an install of linux-generic, so I would expect those overrides to mean the same thing.

Para Siva (psivaa) on 2013-06-28
description: updated
Changed in linux (Ubuntu):
importance: Undecided → Medium
importance: Medium → High
tags: added: kernel-da-key
Para Siva (psivaa) wrote :

This issue is still occurring with 3.10.0-2-generic as well.

I (blindly) ran the test with 40 GB of disk space to see whether the issue would go away, but it did not. However, when I increased the memory from 512 MB to 1 GB, the issue did not occur in any of my 7 attempts. I am not sure whether that narrows the issue down, but I thought it worth trying.

Para Siva (psivaa) wrote :

The preseed file used for this installation

Andy Whitcroft (apw) wrote :

I have been able to reproduce this locally in a KVM instance. This does appear to be an ext4-level issue, though quite why it occurs only in the more complex case I am not sure. Investigation continues.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.10.0-2.11

---------------
linux (3.10.0-2.11) saucy; urgency=low

  [ Andy Whitcroft ]

  * [Config] enforce CONFIG_DEBUG_INFO
  * SAUCE: intel_pstate -- toggle default to disable
    - LP: #1188647
  * [Packaging] add accellerators for binary-udeb
  * SAUCE: ext4: fix ext4_get_group_number() at cluster boundaries
    - LP: #1195710

  [ John Johansen ]

  * SAUCE: (no-up) apparmor: fix apparmor module status for none root users
    - LP: #1199912

  [ Leann Ogasawara ]

  * d-i: Add qlcnic to nic-modules
    - LP: #1196597

  [ Tim Gardner ]

  * [Debian] Prepare to build using arch specific compiler
  * Build armhf using gcc-4.7
 -- Tim Gardner <email address hidden> Thu, 11 Jul 2013 08:56:44 -0600

Changed in linux (Ubuntu Saucy):
status: Confirmed → Fix Released
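The changelog entry names ext4_get_group_number(), which maps a block number to the block group containing it. What follows is background only, not a reconstruction of the patch. The standard ext2/3/4 mapping is group = (block - s_first_data_block) / s_blocks_per_group, where s_first_data_block and s_blocks_per_group are superblock fields. The minimal Python sketch below implements that arithmetic. It uses hypothetical helper names and assumes two things: the default group size of 8 * block size blocks, and a cluster size equal to the block size (no bigalloc).

```python
def first_data_block(block_size):
    # The superblock sits at byte offset 1024. With 1 KiB blocks that is block 1,
    # so s_first_data_block is 1; with larger blocks it falls inside block 0.
    return 1 if block_size == 1024 else 0


def blocks_per_group(block_size):
    # Default layout: one bitmap block (8 bits per byte) tracks 8 * block_size blocks.
    return 8 * block_size


def group_of(block, block_size):
    """Block group containing `block` (cluster size assumed equal to block size)."""
    return (block - first_data_block(block_size)) // blocks_per_group(block_size)


def group_of_shift(block, block_size):
    # 8 * block_size is a power of two, so the division can be a right shift.
    shift = (block_size.bit_length() - 1) + 3
    return (block - first_data_block(block_size)) >> shift
```

On a 1 KiB-block filesystem, group 0 covers blocks 1 to 8192 and group 1 starts at block 8193. Dropping the s_first_data_block term would move every boundary by one block, so block 8192 would compute as 8192 >> 13 = 1 instead of group 0. This report does not say whether that effect, or cluster handling as the "cluster boundaries" wording suggests, is exactly what the SAUCE patch corrects.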