3.0.0-23.38-virtual kernel regression kills EC2 instances

Bug #1026690 reported by Ben Howard
This bug affects 1 person
Affects          Status         Importance   Assigned to    Milestone
linux (Ubuntu)   Fix Released   Critical     Unassigned
Oneiric          Fix Released   Critical     Stefan Bader

Bug Description

Oneiric instances that upgrade to the latest kernel, and new builds based on the 3.0.0-23.38-virtual kernel, crash on boot and do not come up. The console log shows that the instance hangs after boot.

See:
https://jenkins.qa.ubuntu.com/view/ec2%20AMI%20Testing/view/Overview/job/oneiric-server-ec2-daily/

Example console log:
[ 0.636857] registered taskstats version 1
[ 0.637111] BUG: unable to handle kernel paging request at e4963dd8
[ 0.637120] IP: [<c0394a70>] atomic64_read_cx8+0x4/0xc
[ 0.637133] *pdpt = 0000000024964027 *pde = 0000000001fd8067 *pte = 8000000024963061
[ 0.637144] Oops: 0003 [#1] SMP
[ 0.637150] Modules linked in:
[ 0.637156]
[ 0.637160] Pid: 38, comm: modprobe Not tainted 3.0.0-23-virtual #38-Ubuntu
[ 0.637169] EIP: 0061:[<c0394a70>] EFLAGS: 00010246 CPU: 0
[ 0.637175] EIP is at atomic64_read_cx8+0x4/0xc
[ 0.637180] EAX: 00000000 EBX: 00000000 ECX: e4963dd8 EDX: e4963dd8
[ 0.637186] ESI: b7712000 EDI: e4963dd8 EBP: e497fd4c ESP: e497fd10
[ 0.637193] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: e021
[ 0.637198] Process modprobe (pid: 38, ti=e497e000 task=e4908cc0 task.ti=e497e000)
[ 0.637206] Stack:
[ 0.637209] c02080a7 e497fd28 c0106fae 00000001 b7711fff e497fd90 e4965160 e4960010
[ 0.637222] b7711fff c01f2750 b7712000 b7710000 00000000 e4965160 b7712000 e497fd74
[ 0.637234] c0208679 b7712000 00000000 e48ff400 b7710000 e497fd90 e48ff400 e4965108
[ 0.637247] Call Trace:
[ 0.637255] [<c02080a7>] ? unmap_page_range+0x167/0x220
[ 0.637263] [<c0106fae>] ? __raw_callee_save_xen_restore_fl+0x6/0x8
[ 0.637273] [<c01f2750>] ? pagevec_move_tail+0x30/0x30
[ 0.637279] [<c0208679>] unmap_vmas+0x99/0x110
[ 0.637285] [<c020c37e>] unmap_region+0x7e/0xe0
[ 0.637292] [<c020c13d>] ? detach_vmas_to_be_unmapped+0x7d/0xc0
[ 0.637299] [<c020d1fe>] do_munmap+0x1ae/0x200
[ 0.640002] [<c0275f73>] ? do_mmap+0x63/0x70
[ 0.640002] [<c0650725>] elf_map+0xa7/0xe6
[ 0.640002] [<c0650947>] load_elf_interp+0x1e3/0x309
[ 0.640002] [<c0276670>] load_elf_binary+0x6f0/0xad0
[ 0.640002] [<c03965e2>] ? _copy_from_user+0x42/0x60
[ 0.640002] [<c023a8fd>] search_binary_handler+0xad/0x2c0
[ 0.640002] [<c0275f80>] ? do_mmap+0x70/0x70
[ 0.640002] [<c023b98f>] do_execve_common+0x1ef/0x270
[ 0.640002] [<c023ba27>] do_execve+0x17/0x20
[ 0.640002] [<c01119c7>] sys_execve+0x37/0x70
[ 0.640002] [<c06664ee>] ptregs_execve+0x12/0x18
[ 0.640002] [<c065f594>] ? syscall_call+0x7/0xb
[ 0.640002] [<c01b00d8>] ? kallsyms_symbol_next+0x48/0x80
[ 0.640002] [<c010d580>] ? kernel_execve+0x20/0x30
[ 0.640002] [<c0168125>] ? ____call_usermodehelper+0xd5/0x100
[ 0.640002] [<c0168050>] ? __call_usermodehelper+0x90/0x90
[ 0.640002] [<c06669be>] ? kernel_thread_helper+0x6/0x10
[ 0.640002] Code: 53 ff 0f 45 c2 eb ee 8d 74 26 00 55 89 e5 83 ec 04 8d 45 14 8b 4d 10 89 04 24 8b 55 0c 8b 45 08 e8 b6 ff ff ff c9 c3 89 d8 89 ca <3e> 0f c7 09 c3 8d 76 00 0f c7 0e 75 fb c3 66 90 89 d8 89 ca 3e
[ 0.640002] EIP: [<c0394a70>] atomic64_read_cx8+0x4/0xc SS:ESP e021:e497fd10
[ 0.640002] CR2: 00000000e4963dd8
[ 0.640002] ---[ end trace edcb400dc294c6aa ]---
[ 0.792952] blkfront: xvda1: barrier or flush: disabled
[ 0.794662] Setting capacity to 16777216
[ 0.794675] xvda1: detected capacity change from 0 to 8589934592
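
One way to read the oops above is to decode the "Oops: 0003" error code and the "*pte" value by hand. The sketch below does that using the generic x86 page-fault error-code bits and the PAE page-table-entry flag bits; the function names are made up for illustration, and the interpretation is not something stated in the report itself. Under those assumptions, the fault is a kernel-mode write to a page that is present but mapped read-only, which seems consistent with atomic64_read_cx8 using cmpxchg8b (the `0f c7 09` bytes marked in the Code: line), an instruction that needs write access to its memory operand.

```python
# Minimal sketch: decode the x86 page-fault error code and PAE PTE flags
# taken from the console log. Helper names are hypothetical.

def decode_fault_error_code(code: int) -> dict:
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x01),  # 0 would mean page not present
        "write_access": bool(code & 0x02),          # 0 would mean read access
        "user_mode": bool(code & 0x04),             # 0 means kernel mode
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }


def decode_pte_flags(pte: int) -> dict:
    """Decode a few flag bits of a 64-bit PAE page-table entry."""
    return {
        "present": bool(pte & (1 << 0)),
        "writable": bool(pte & (1 << 1)),
        "user": bool(pte & (1 << 2)),
        "accessed": bool(pte & (1 << 5)),
        "dirty": bool(pte & (1 << 6)),
        "no_execute": bool(pte & (1 << 63)),
    }


# Values copied from the log: "Oops: 0003" and "*pte = 8000000024963061".
error = decode_fault_error_code(0x0003)
pte = decode_pte_flags(0x8000000024963061)

for name, value in error.items():
    print(f"error.{name} = {value}")
for name, value in pte.items():
    print(f"pte.{name} = {value}")
```

The faulting address in CR2 (e4963dd8) lies in the same page frame as the pte value (physical 24963000), so the access appears to be aimed at a page-table page, which a Xen PV guest is not allowed to write directly.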

Brad Figg (brad-figg)
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Scott Moser (smoser)
tags: added: ec2-images regression-release
Kate Stewart (kate.stewart) wrote :

Clean up tagging and series targeting to reflect that the scope is Oneiric.

Changed in linux (Ubuntu Oneiric):
status: New → Confirmed
importance: Undecided → Critical
tags: added: oneiric
Andy Whitcroft (apw)
Changed in linux (Ubuntu Oneiric):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
tags: added: kernel-da-key
Stefan Bader (smb) wrote :

This looks much like something that was discussed on the stable list for 3.2. Looking there, we have the following commit (which for Precise comes directly after the commit that introduced the problem; neither has been released yet). Natty has neither patch.

commit 6e60d7d7667a47b1b8760a45d95547799f4df2c5
Author: Andrea Arcangeli <email address hidden>
Date: Wed Jun 20 12:52:57 2012 -0700
    thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
    BugLink: http://bugs.launchpad.net/bugs/1022747
    commit e4eed03fd06578571c01d4f1478c874bb432c815 upstream.

commit 8c35b3d703285b89e5508cf74c786177ffabf216
Author: Andrea Arcangeli <email address hidden>
Date: Tue May 29 15:06:49 2012 -0700
    mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race conditi
    BugLink: http://bugs.launchpad.net/bugs/1022747
    commit 26c191788f18129af0eb32a358cdaea0c7479626 upstream.

Stefan Bader (smb) wrote :

For 3.0.y upstream, only the "mm: pmd_read_atomic: fix 32bit..." patch has been picked and released; the other patch is missing there.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Ben Howard (darkmuggle-deactivatedaccount) wrote :

Resync of S3 EC2 Mirror started 18:07 UTC. Package should be removed within ~30 minutes.

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Oneiric):
status: Confirmed → Fix Committed
Luis Henriques (henrix) wrote :

From comments in bug #1026730 and IRC chat, I'm tagging this bug as verified in Oneiric.

tags: added: verification-done-oneiric
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.0.0-23.39

---------------
linux (3.0.0-23.39) oneiric-proposed; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1026730

  [ Upstream Kernel Changes ]

  * thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
    - LP: #1026690
 -- Luis Henriques <email address hidden> Thu, 19 Jul 2012 10:54:38 -0700
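
For anyone checking an Oneiric instance against this changelog, the sketch below compares a kernel package version string with the affected (3.0.0-23.38) and fixed (3.0.0-23.39) versions named in this report. It is a simplification that assumes purely numeric `upstream-abi.upload` strings; it is not a full dpkg version comparison, and the helper names are made up for illustration.

```python
# Minimal sketch: compare Oneiric linux package versions numerically.
# Assumes versions look like "3.0.0-23.39" (digits, dots, one hyphen).

def parse_kernel_version(version: str) -> tuple:
    """Split '3.0.0-23.39' into (3, 0, 0, 23, 39)."""
    upstream, _, abi_upload = version.partition("-")
    parts = upstream.split(".") + abi_upload.split(".")
    return tuple(int(part) for part in parts)


AFFECTED = parse_kernel_version("3.0.0-23.38")  # version named in this bug
FIXED = parse_kernel_version("3.0.0-23.39")     # version from the changelog


def is_affected(installed: str) -> bool:
    """True only for the exact package version this bug was filed against."""
    return parse_kernel_version(installed) == AFFECTED


def has_fix(installed: str) -> bool:
    """True if the version is at or after the fixed package version."""
    return parse_kernel_version(installed) >= FIXED


for candidate in ("3.0.0-23.38", "3.0.0-23.39"):
    print(candidate, "affected:", is_affected(candidate), "fixed:", has_fix(candidate))
```

Note that `has_fix` returning False does not by itself mean an instance is broken: versions older than 3.0.0-23.38 predate the regression according to this report.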

Changed in linux (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
