Comment 15 for bug 1111470

Konrad Rzeszutek Wilk (konrad-wilk) wrote : Re: Precise kernel not bootable under Xen - alloc_l1_table

Stefan, so your gut feeling about the 228000 was right.

The E820_UNUSABLE regions are memory that "can" be used (and if you look in 'xen_memory_setup' that is how we set aside the memory for the balloon region - look for the 'type = E820_UNUSABLE'). So by that logic, an E820_UNUSABLE region should get the same treatment as the rest of the E820_RAM memory.

So this "hack"
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 8971a26..e8172bf 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -396,6 +396,7 @@ char * __init xen_memory_setup(void)
      extra_pages);
  i = 0;
  while (i < memmap.nr_entries) {
+ bool fix_unusable = true;
   u64 addr = map[i].addr;
   u64 size = map[i].size;
   u32 type = map[i].type;
@@ -407,9 +408,16 @@ char * __init xen_memory_setup(void)
     size = min(size, (u64)extra_pages * PAGE_SIZE);
     extra_pages -= size / PAGE_SIZE;
     xen_add_extra_mem(addr, size);
- } else
+ } else {
     type = E820_UNUSABLE;
+ fix_unusable = false;
+ }
   }
+ /*
+ * Not sure about this.
+ */
+ if (type == E820_UNUSABLE && fix_unusable)
+ type = E820_RESERVED;

   xen_align_and_add_e820_region(addr, size, type);

would potentially fix it. But I am not sure about the other cases:
 a) Is it OK to ignore E820_UNUSABLE altogether as provided by the hypervisor? Are there legitimate reasons for the BIOS to mark those as E820_UNUSABLE? Perhaps memory hotplug? (Jinsong from Intel could help answer that).
 b) Other? Perhaps the fix is in the hypervisor, by clipping said memory completely out of the E820? In other words, as if it had run with 'mem=X' and were oblivious to the non-MTRR region. But what if the MTRR region lies smack in the middle of other regions (like https://lkml.org/lkml/2012/8/24/474)? The choice there would be to remove the E820 entry completely (but then we would think it is a PCI region, which might be OK or not - but this reminds me of http://lists.xen.org/archives/html/xen-devel/2011-02/msg01238.html where "gaps" are considered as I/O regions and could end up with the intel-agp trying to use it as its "flush" region).
 c) Just leave it as is and document that users should use 'dom0_mem=max'? Perhaps we should codify it then? So if we detect the MTRR invalid regions we automatically set the dom0 maxpages as if 'dom0_mem=max:<up to MTRR>' was done? In reality that is what the hypervisor is doing - it will ignore those regions altogether - it is our misfortune that we treat the region as if it was RAM (which actually is the right *thing*).
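
To make (c) a bit more concrete, here is a rough user-space sketch (not kernel code, and not a proposed patch) of how the "<up to MTRR>" limit could be found from an E820 map. The struct layout, the helper name lowest_unusable_pfn and the NO_UNUSABLE_PFN sentinel are all made up for illustration; only the E820_* type values match the kernel's. Note that it returns a pfn boundary, not an amount of RAM, so any real maxpages clamp would still have to account for the holes below that boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* E820 type values as defined by the Linux kernel. */
#define E820_RAM      1
#define E820_RESERVED 2
#define E820_UNUSABLE 5

/* Sentinel (hypothetical): the map contains no E820_UNUSABLE entry. */
#define NO_UNUSABLE_PFN UINT64_MAX

/* Simplified stand-in for an E820 entry, for illustration only. */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Hypothetical helper: return the page frame number at which the
 * lowest E820_UNUSABLE region starts, or NO_UNUSABLE_PFN if there
 * is none. Everything below that pfn is what option (c) would let
 * dom0 use.
 */
static uint64_t lowest_unusable_pfn(const struct e820_entry *map, size_t n)
{
    uint64_t lowest = UINT64_MAX;
    size_t i;

    assert(map != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (map[i].type == E820_UNUSABLE && map[i].addr < lowest)
            lowest = map[i].addr;
    }

    return lowest == UINT64_MAX ? NO_UNUSABLE_PFN : lowest >> PAGE_SHIFT;
}
```

With an E820_UNUSABLE entry starting at 0x228000000 the helper would return pfn 0x228000, which is the kind of boundary the automatic 'dom0_mem=max:' setting would be derived from.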

Thoughts?