ami-6836dc01 8.04 32 bit AMI kernel lock bug
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Invalid | Medium | Unassigned |
Hardy | Fix Released | Undecided | Stefan Bader |
Bug Description
SRU Justification:
Impact: On i386, PGDs are kept in a linked list. Two fields of struct page are (mis-)used for this: index holds the forward link, and private serves as the backward pointer by being assigned a pointer to the index field of the previous struct page. The main problem was that the list_add and list_del operations were accidentally performed twice. The second operation then accesses struct pages that, after the first list operation, are innocent bystanders.
Fix: This is a bit more than is needed to fix the bug itself, but it brings our code into a shape that more closely resembles upstream (in fact the only upstream is 2.6.18, but that code did not do the double list access).
Testcase: Running a 32-bit domU (on a 64-bit Hardy dom0, though that should not matter) with the xen kernel and starting a lot of processes (as the aslr qa regression test does) crashes quite soon, because the destructor of a PTE (which incidentally is stored in index) is suddenly overwritten.
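The list mechanics described under Impact can be modelled in user space. The following is a minimal sketch, not the kernel code: `struct page` is reduced to the two borrowed fields, `FAKE_DTOR` and `double_del_clobbers_dtor` are hypothetical names, and the exact sequence of operations is an assumed scenario chosen to show how a repeated `pgd_list_del` can write through a stale back pointer into a page that has since been reused.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct page: only the two
 * fields the PGD list borrows are modelled. */
struct page {
    uintptr_t index;    /* forward link: address of the next page, or 0 */
    uintptr_t private;  /* back link: address of the slot that points at us */
};

/* List head; holds the address of the first page, or 0 when empty. */
static uintptr_t pgd_list;

static void pgd_list_add(struct page *page)
{
    struct page *first = (struct page *)pgd_list;

    page->index = pgd_list;
    if (first)
        first->private = (uintptr_t)&page->index;
    pgd_list = (uintptr_t)page;
    page->private = (uintptr_t)&pgd_list;
}

static void pgd_list_del(struct page *page)
{
    struct page *next = (struct page *)page->index;
    uintptr_t *pprev = (uintptr_t *)page->private;

    *pprev = (uintptr_t)next;   /* unlink: the predecessor's slot skips us */
    if (next)
        next->private = (uintptr_t)pprev;
}

/* Hypothetical marker standing in for a PTE page destructor pointer. */
#define FAKE_DTOR ((uintptr_t)0xd70d70)

/* Returns 1 if a duplicated pgd_list_del() clobbers an unrelated page. */
static int double_del_clobbers_dtor(void)
{
    struct page a = {0, 0}, b = {0, 0};

    pgd_list = 0;
    pgd_list_add(&a);
    pgd_list_add(&b);           /* list: b -> a, a.private == &b.index */

    pgd_list_del(&a);           /* legitimate removal of a */
    pgd_list_del(&b);           /* legitimate removal of b */
    assert(pgd_list == 0);

    b.index = FAKE_DTOR;        /* b is reused; index now holds a "destructor" */

    pgd_list_del(&a);           /* accidental second removal of a */
    return b.index != FAKE_DTOR;
}
```

Calling `double_del_clobbers_dtor()` from a small `main` shows the effect: the second delete of `a` still trusts `a.private`, which points at `b.index`, so the value stored there after `b` was reused is replaced by the stale forward link.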
---
For months we have been working around a bug in ami-6836dc01, but
it does not seem to have been reported anywhere. Is this a known issue?
When we use ruby/puppet (from the Canonical repo) on an instance with
this AMI (e.g. a c1.medium), or in some cases when running Java
applications, the instance locks up.
Our work-around is to use kernel 2.6.27-22-xen instead. The person
who created the fixed AMI used this method:
- launch instance of ami-7e28ca17 (instance #1)
- modprobe loop on instance #1
- copy up creds, jdk and ec2-ami-tools to /dev/shm on instance #1
- launch instance of ami-69d73000 (canonical-...) to grab kernel modules from (instance #2)
- tar.gz /lib/modules/
- scp to instance #1 and untar in /lib/modules
- rm -rf the old /lib/modules/
- edit quick-bundle script on instance #1 to hard-code AKI to
aki-20c12649, ARI to ari-21c12648 (the AKI and ARI from instance #2).
- hard-coded manifest name, bucket to whatever.
- run pre-clean script on instance #1
- run quick-bundle script on instance #1
The console output from a locked instance is attached.
affects: | ubuntu → linux-meta (Ubuntu) |
Changed in linux-meta (Ubuntu): | |
importance: | Undecided → Medium |
affects: | linux-meta (Ubuntu) → linux (Ubuntu) |
Changed in linux (Ubuntu): | |
assignee: | nobody → Stefan Bader (stefan-bader-canonical) |
status: | New → In Progress |
Changed in linux (Ubuntu Hardy): | |
assignee: | nobody → Stefan Bader (stefan-bader-canonical) |
status: | New → In Progress |
Changed in linux (Ubuntu): | |
status: | In Progress → Invalid |
assignee: | Stefan Bader (stefan-bader-canonical) → nobody |
Changed in linux (Ubuntu Hardy): | |
status: | In Progress → Fix Committed |
description: | updated |
description: | updated |
Hi,
Thank you for taking the time to open a bug report.
Just for clarity, the issue here is with:
us-east-1 ami-7e28ca17 canonical ubuntu-hardy-8.04-i386-server-20091130
And the bug opener has found that they can fix the issue by using the jaunty kernels, which were never officially released by Canonical. (aki-20c12649 canonical-beta-us/vmlinuz-2.6.27-22-xen-i386-us.manifest.xml).
The AMI itself for hardy is not the newest release, but the kernel it uses is the newest released kernel for hardy (ubuntu-kernels-us/ubuntu-hardy-i386-linux-image-2.6.24-10-xen-v-2.6.24-10-kernel.img.manifest.xml).
To help us debug the issue, could you please:
a.) give any information you have on how you can reproduce this issue
b.) try to reproduce on one of the hardy daily build AMIs. The latest daily build image of hardy uses a substantially newer kernel. I would suggest trying to reproduce on:
us-east-1 ami-22cc3e4b canonical ubuntu-hardy-daily-i386-server-20110314
which uses kernel:
us-east-1 aki-7e15e617 canonical ubuntu-hardy-i386-linux-image-2.6.24-28-xen-v-2.6.24-28.86-kernel