Ubuntu 10.04 freeze on boot when using more than 578M in Xenserver / Xen

Bug #790747 reported by Davim on 2011-05-31
This bug affects 3 people
Affects          Status        Importance  Assigned to  Milestone
linux (Ubuntu)   Invalid       Undecided   Unassigned
Lucid            Fix Released  High        Unassigned

Bug Description

Binary package hint: linux-image-2.6.35-29-generic-pae

Hello,
I'm using Ubuntu 10.04.2 LTS as a guest on a Citrix XenServer pool.
After upgrading the kernel to 2.6.35-29-generic-pae, released a few days ago, some of my VMs stopped booting.
If I reduce the VM memory to 512M it boots, but if I give the VM 1024M or 2048M of RAM all I get is a black screen and the VM won't boot.

The VM was working fine with 2048M of RAM before the reboot.

Davim (davim) on 2011-05-31
description: updated
Davim (davim) wrote :

Just made a new test to confirm the bug.

I've created a new VM with 1024M of RAM and used Ubuntu 10.04 net install.
The VM installs fine but does not even boot for the first time. If I reduce the memory to 512M, the VM boots into the newly installed OS.

Davim (davim) on 2011-06-01
Changed in linux (Ubuntu):
status: New → Confirmed
Davim (davim) on 2011-06-01
summary: - Ubuntu 10.04 freeze on boot when using more than 512M
+ Ubuntu 10.04 freeze on boot when using more than 512M in Xenserver
Davim (davim) on 2011-06-01
tags: added: regression-update

This is very likely the result of an upstream stable patch. The problem was discovered late in testing, and unfortunately the kernel was released despite that finding. The following patch is suspected to break Xen:

commit 5490ee42c5725aa3e32634c19f70913e0e634d0c
Author: Stefano Stabellini <email address hidden>
Date: Fri Feb 18 11:32:40 2011 +0000

    xen: set max_pfn_mapped to the last pfn mapped

We are trying to resolve this as quickly as possible by reverting that patch, and we are still looking for an acceptable long-term solution for the issue the patch tried to address.

tags: added: kernel-server lucid
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Lucid):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → High
status: New → In Progress
Davim (davim) wrote :

Ok, what bug did that patch affect? What was it supposed to fix?

Stefan Bader (smb) wrote :

The patch came in as part of an upstream stable update, so there was no specific LP bug addressed. It was supposed to fix some boot problems on 64-bit (related to the highmap cleanup). I still need to understand things better to know whether this is just a quite bad side effect or whether the change is plainly wrong for kernel versions before around 2.6.38.

Davim (davim) wrote :

Ok, thanks for the reply.

Please let me know if you need me for any testing.

Mathieu Mitchell (mat128) wrote :

This bug also affects 2.6.32-32 for Lucid.
It is also easily reproduced using Debian Squeeze + Xen 4.0.

Here is a list of what we tested:
Ubuntu Lucid VM on 2.6.32-31, 2G RAM --> Works
Ubuntu Lucid VM on 2.6.32-32, 2G RAM --> Fails
Ubuntu Lucid VM on 2.6.32-32, 500MB --> Works

Steps to reproduce:

- Install Debian Squeeze on your host server
- Install Xen 4 according to Debian's documentation
- Use xen-tools (xen-create-image) to set up an Ubuntu Lucid virtual machine
- Boot with more than 512MB of RAM; it should fail.
- Boot with less than 512MB of RAM (e.g. 500MB); it should work.

summary: - Ubuntu 10.04 freeze on boot when using more than 512M in Xenserver
+ Ubuntu 10.04 freeze on boot when using more than 512M in Xenserver / Xen

The value for my setup is not 512MB.

Here are my test results using an Ubuntu Lucid VM on 2.6.32-32:
512M OK
576M OK
578M OK
579M BAD
580M BAD
584M BAD
592M BAD
608M BAD
640M BAD
768M BAD
1024M BAD
2048M BAD
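
For anyone repeating this kind of sweep, here is a minimal sketch that reads a result list in the "<size>M OK/BAD" layout used above and reports where the boot failure starts. The names RESULTS, parse_results and boundary are made up for this illustration and are not part of any Xen or Ubuntu tooling.

```python
# Results copied from the comment above: memory size and boot outcome.
RESULTS = """\
512M OK
576M OK
578M OK
579M BAD
580M BAD
584M BAD
592M BAD
608M BAD
640M BAD
768M BAD
1024M BAD
2048M BAD
"""


def parse_results(text):
    """Turn '<size>M OK|BAD' lines into sorted (size_in_mb, booted) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        size, outcome = line.split()
        rows.append((int(size.rstrip("M")), outcome == "OK"))
    return sorted(rows)


def boundary(rows):
    """Return (largest size that booted, smallest size that failed)."""
    largest_ok = max(mb for mb, ok in rows if ok)
    smallest_bad = min(mb for mb, ok in rows if not ok)
    return largest_ok, smallest_bad


largest_ok, smallest_bad = boundary(parse_results(RESULTS))
print(f"largest OK: {largest_ok}M, smallest BAD: {smallest_bad}M")
```

With the numbers above this reports 578M as the largest size that booted and 579M as the smallest that failed. It only summarises results you already collected; it does not start or probe any VM.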

Mathieu Mitchell (mat128) wrote :

Exact same results (579MB = no boot) on a different distro (custom built) for the host and a freshly installed Ubuntu Lucid on 2.6.32-32.

summary: - Ubuntu 10.04 freeze on boot when using more than 512M in Xenserver / Xen
+ Ubuntu 10.04 freeze on boot when using more than 578M in Xenserver / Xen
Mathieu Mitchell (mat128) wrote :

Looks highly related to Debian bug report 621072: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=621072

Bastian over there reverted commit 67e87f0a1c5cbc750f81ebf6a128e8ff6f4376cc (https://patchwork.kernel.org/patch/252321/) and it worked for them.

Stefan Bader (smb) wrote :

@Mathieu, it is sort of related: the whole thing is about cleanup_highmap suddenly poking at page table mappings it did not touch before (on 64-bit) and Xen trying to work around that in a way that changes behaviour on 32-bit. We will likely see a partial revert of that upstream. But the patch you mentioned was never added to stable and is not in any 10.04 kernel.

I don't think we really need to change max_pfn_mapped behaviour for 2.6.32-2.6.35, since the change to cleanup_highmap was reverted from both longterm trees and so Xen does not need to adapt for that. If longterm ever picks up cleanup_highmap, we will need the Xen patch we reverted, the patch Mathieu pointed out, and the partial revert, which I do not see upstream yet.

I just checked and any 2.6.32-33.64 kernel or higher for Lucid should work again. Can someone confirm this?

Stefan Bader (smb) wrote :

For Maverick (10.10) this should be fixed with 2.6.35-30.52 or higher.
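
A rough way to check a guest against the versions mentioned in this thread (2.6.32-33.64 for Lucid, 2.6.35-30.52 for Maverick) is sketched below. It assumes the package version has the plain "<upstream>-<abi>.<upload>" shape seen here, such as 2.6.32-33.66, and the names FIXED_IN and has_revert are invented for this example.

```python
import re

# Minimum (ABI, upload) numbers that should contain the revert, per this thread.
FIXED_IN = {
    "2.6.32": (33, 64),  # Lucid (10.04)
    "2.6.35": (30, 52),  # Maverick (10.10)
}


def has_revert(package_version):
    """Return True if a version like '2.6.32-33.66' is at or past the fix.

    Assumption: the version is exactly '<upstream>-<abi>.<upload>'; anything
    else (suffixes, other kernel series) raises ValueError instead of guessing.
    """
    match = re.fullmatch(r"(\d+\.\d+\.\d+)-(\d+)\.(\d+)", package_version)
    if match is None:
        raise ValueError(f"unrecognised version format: {package_version!r}")
    upstream = match.group(1)
    if upstream not in FIXED_IN:
        raise ValueError(f"no fix information for series {upstream}")
    abi_and_upload = (int(match.group(2)), int(match.group(3)))
    # Tuples compare element by element, so the ABI number is checked first.
    return abi_and_upload >= FIXED_IN[upstream]


print(has_revert("2.6.32-33.66"))
```

The version string to pass in is the one shown in the version column of dpkg -l for the installed linux-image package.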

Mathieu Mitchell (mat128) wrote :

I can confirm that the following kernel boots fine with 2048M allocated to the virtual machine:

# uname -a
Linux testlucid 2.6.32-33-generic-pae #66-Ubuntu SMP Tue Jun 7 20:02:23 UTC 2011 i686 GNU/Linux
# dpkg -l | grep linux-image-2.6.32-33
ii linux-image-2.6.32-33-generic-pae 2.6.32-33.66 Linux kernel image for version 2.6.32 on x86
# free -m
             total       used       free     shared    buffers     cached
Mem:          2009         56       1952          0         14         17
-/+ buffers/cache:         24       1985
Swap:         1023          0       1023

Stefan Bader (smb) wrote :

Revert of upstream patch scheduled to be in Ubuntu-2.6.32-33.64 or later.

Changed in linux (Ubuntu Lucid):
status: In Progress → Fix Committed
Stefan Bader (smb) wrote :

Setting manually to released.

Changed in linux (Ubuntu Lucid):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
status: Fix Committed → Fix Released