Crash in dom0 when accessing clipped RAM

Bug #1111470 reported by Bob Ball
This bug affects 1 person
Affects          Status      Importance  Assigned to   Milestone
xen (Ubuntu)     Incomplete  Medium      Stefan Bader
  Precise        Won't Fix   Medium      Unassigned
  Quantal        Won't Fix   Medium      Unassigned

Bug Description

The Precise kernel 3.2.0-36 does not boot under Xen; upgrading to the Quantal kernel (linux-image-3.5.0-22-generic) resolves the issue. Serial output from the Precise kernel is:

(XEN) Xen version 4.1.2 (Ubuntu 4.1.2-2ubuntu2.5) (<email address hidden>) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) Tue Jan 8 14:07:12 UTC 2013
(XEN) Bootloader: GRUB 1.99-21ubuntu3.7
(XEN) Command line: placeholder loglvl=all guest_loglvl=all com1=115200,8n1 console=com1,vga noreboot
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN) Found 2 MBR signatures
(XEN) Found 2 EDD information structures
(XEN) WARNING: MTRRs do not cover all of memory.
(XEN) Truncating RAM from 9109504kB to 9043968kB
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000008f000 (usable)
(XEN) 000000000008f000 - 00000000000a0000 (reserved)
(XEN) 00000000000e0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000ce674000 (usable)
(XEN) 00000000ce674000 - 00000000ce6cc000 (ACPI NVS)
(XEN) 00000000ce6cc000 - 00000000cf5fb000 (usable)
(XEN) 00000000cf5fb000 - 00000000cf608000 (reserved)
(XEN) 00000000cf608000 - 00000000cf6a6000 (usable)
(XEN) 00000000cf6a6000 - 00000000cf6ab000 (ACPI data)
(XEN) 00000000cf6ab000 - 00000000cf6f2000 (ACPI NVS)
(XEN) 00000000cf6f2000 - 00000000cf6f3000 (usable)
(XEN) 00000000cf6f3000 - 00000000cf6ff000 (ACPI data)
(XEN) 00000000cf6ff000 - 00000000cf700000 (usable)
(XEN) 00000000cf700000 - 00000000d0000000 (reserved)
(XEN) 00000000fff00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000228000000 (usable)
(XEN) 0000000228000000 - 000000022c000000 (unusable)
(XEN) ACPI: RSDP 000FE020, 0014 (r0 INTEL )
(XEN) ACPI: RSDT CF6FD038, 004C (r1 INTEL DG965SS 685 1000013)
(XEN) ACPI: FACP CF6FC000, 0074 (r1 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: DSDT CF6F7000, 40E9 (r1 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: FACS CF6AB000, 0040
(XEN) ACPI: APIC CF6F6000, 0078 (r1 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: WDDT CF6F5000, 0040 (r1 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: MCFG CF6F4000, 003C (r1 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: ASF! CF6F3000, 00A6 (r32 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: SSDT CF6AA000, 01BC (r1 INTEL CpuPm 685 MSFT 1000013)
(XEN) ACPI: SSDT CF6A9000, 0175 (r1 INTEL Cpu0Ist 685 MSFT 1000013)
(XEN) ACPI: SSDT CF6A8000, 0175 (r1 INTEL Cpu1Ist 685 MSFT 1000013)
(XEN) ACPI: SSDT CF6A7000, 0175 (r1 INTEL Cpu2Ist 685 MSFT 1000013)
(XEN) ACPI: SSDT CF6A6000, 0175 (r1 INTEL Cpu3Ist 685 MSFT 1000013)
(XEN) System RAM: 8053MB (8247112kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-0000000228000000
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000fe200
(XEN) DMI 2.4 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x408
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[404,0], pm1x_evt[400,0]
(XEN) ACPI: wakeup_vec[cf6ab00c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 6:15 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
(XEN) Processor #1 6:15 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) PCI: MCFG configuration 0: base f0000000 segment 0 buses 0 - 127
(XEN) PCI: Not using MMCONFIG.
(XEN) Table is not found!
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) IRQ limits: 24 GSI, 376 MSI/MSI-X
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2131.258 MHz processor.
(XEN) Initing memory sharing.
(XEN) mce_intel.c:1162: MCA Capability: BCAST 1 SER 0 CMCI 0 firstbank 1 extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) I/O virtualisation disabled
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) Platform timer is 3.579MHz ACPI PM Timer
(XEN) Allocated console ring of 16 KiB.
(XEN) VMX: Supported advanced features:
(XEN) - APIC TPR shadow
(XEN) - MSR direct-access bitmap
(XEN) HVM: ASIDs disabled.
(XEN) HVM: VMX enabled
(XEN) Brought up 2 CPUs
(XEN) CPUIDLE: disabled due to no HPET. Force enable with 'cpuidle'.
(XEN) ACPI sleep modes: S3
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x2060000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 0000000218000000->000000021c000000 (1978550 pages to be allocated)
(XEN) Init. ramdisk: 00000002258bc000->0000000227fff200
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff81000000->ffffffff82060000
(XEN) Init. ramdisk: ffffffff82060000->ffffffff847a3200
(XEN) Phys-Mach map: ffffffff847a4000->ffffffff856effd0
(XEN) Start info: ffffffff856f0000->ffffffff856f04b4
(XEN) Page tables: ffffffff856f1000->ffffffff85720000
(XEN) Boot stack: ffffffff85720000->ffffffff85721000
(XEN) TOTAL: ffffffff80000000->ffffffff85800000
(XEN) ENTRY ADDRESS: ffffffff81cfc200
(XEN) Dom0 has maximum 2 VCPUs
(XEN) Scrubbing Free RAM: .done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 216kB init memory.
mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:825:d0 Non-privileged (0) attempt to map I/O space 00228000
(XEN) mm.c:1222:d0 Failure in alloc_l1_table: entry 0
(XEN) mm.c:2177:d0 Error while validating mfn 81de (pfn 1e97d8) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:2985:d0 Error while pinning mfn 81de
(XEN) traps.c:451:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.1.2 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff816415e1>]
(XEN) RFLAGS: 0000000000000282 EM: 1 CONTEXT: pv guest
(XEN) rax: 00000000ffffffea rbx: 00000000001e97d8 rcx: 0000000000000007
(XEN) rdx: 0000000000000000 rsi: 0000000000000001 rdi: ffffffff81c01c30
(XEN) rbp: ffffffff81c01c48 rsp: ffffffff81c01be8 r8: 00003ffffffff000
(XEN) r9: ffff880000000000 r10: 0000000000007ff0 r11: 0000000000000000
(XEN) r12: 0000000228000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000140 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000000219c05000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff81c01be8:
(XEN) 0000000000000007 0000000000000000 ffffffff816415e1 000000010000e030
(XEN) 0000000000010082 ffffffff81c01c28 000000000000e02b ffffffff816415dd
(XEN) ffffffff81c01c58 0000000100000000 00000000000081de 00000000001e97d8
(XEN) ffffffff81c01c68 ffffffff81d008bf 00000001e97d8000 ffffffffff4b8a00
(XEN) ffffffff81c01d08 ffffffff8163f25d 00000000cf4b9000 ffffffffff4f8000
(XEN) ffff8801e97d8000 80000000000001e3 0000000228200000 ffffffffff4b8000
(XEN) 8000000000000163 000000022c000000 0000000000000140 0000000000000000
(XEN) 0000000000000000 00000001e97d8000 ffffffff81c01ce8 ffffffffff4b8000
(XEN) 0000000228000000 0000000000000000 0000000000000000 0000000000000008
(XEN) ffffffff81c01d98 ffffffff8163f3d0 ffffffff81c01d68 8000000000000163
(XEN) ffff88022c000000 0000000000000000 000000022c000000 ffffffffff478000
(XEN) 0000000000000000 000000022c000000 0000000000000000 000000022c000000
(XEN) ffffffff81c01d78 ffff88022c000000 ffff88022c000000 ffffffffff478000
(XEN) 0000000000000000 000000022c000000 ffffffff81c01e08 ffffffff8163f626
(XEN) 0000000000022000 ffff880228000000 ffff880228000000 0000000000000000
(XEN) 0000000000001000 00000001e97fa000 ffffffff81c01e08 0000000000000000
(XEN) ffffffff81c01e48 ffffffff81c01e48 ffffffff81c01e48 ffffffff81a2bf39
(XEN) ffffffff81c01ed8 ffffffff816240ca 000000205d303030 000000022c000000
(XEN) 0000000000000001 0000000228000000 000000022c000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Tags: quantal
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1111470

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: quantal
Bob Ball (bob-ball) wrote : Re: Precise kernel not bootable under Xen - alloc_l1_table

Cannot generate log files since the server does not boot.

description: updated
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Bob Ball (bob-ball) wrote :

Set to confirmed as per automated instructions

Bob Ball (bob-ball)
description: updated
Stefan Bader (smb) wrote :

The serial log only covers the hypervisor start (the crash happens at about the point where the dom0 kernel would start), so it's not clear which Precise kernel version is failing. Also, is there a previous Precise kernel version that did work (which could be selected through GRUB)? Or was that the initial install?

Bob Ball (bob-ball) wrote :

I've updated the description to clarify that it is kernel 3.2.0-36 which is failing (sorry, I tried to set this as the package but it switched to the source package and therefore lost the version information). A colleague reports that 3.2.0-15 is successful.

description: updated
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
status: Incomplete → Confirmed
Stefan Bader (smb) wrote :

3.2.0-15 would be a version even prior to the Precise release. I cannot even find a tag with that version. For a 3.5.0 version number it would also be pre-release (though that is more plausible).
I had no problems booting my test machine with that kernel, but that machine is AMD based, so it's not quite proof.

That 00228000 address from the I/O space mapping error looks related to the boundary between the last two E820 ranges. There is also the warning about the MTRRs not covering all of memory and RAM being truncated in some way... It all seems to point to some odd memory setup issue. Maybe you could try adding "dom0_mem=512M,max:512M" as a test...
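The arithmetic behind that observation can be checked directly. A minimal C sketch, assuming 4 KiB pages (a page shift of 12) and reading the `00228000` in the `mm.c:825` message as a frame number; the helper names are illustrative and are not Xen or Linux functions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86 4 KiB pages. */
#define PAGE_SHIFT 12

/* Convert a page frame number into a physical byte address. */
uint64_t frame_to_addr(uint64_t frame)
{
    return frame << PAGE_SHIFT;
}

/* Convert a size in kB (as printed by Xen's "Truncating RAM" line) to bytes. */
uint64_t kb_to_bytes(uint64_t kb)
{
    return kb * 1024;
}
```

Frame 0x228000 works out to address 0x228000000, the first page of the `0000000228000000 - 000000022c000000 (unusable)` entry. The two sizes in `Truncating RAM from 9109504kB to 9043968kB` are exactly the end addresses 0x22c000000 and 0x228000000, so the 64 MiB Xen dropped is that same range.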

Bob Ball (bob-ball) wrote :

I know that 3.5.0-36 is a dev kernel - I followed http://packages.qa.ubuntu.com/qatracker/milestones/223/builds/25321/downloads to test it just to try and debug where the issue was.

Very curiously, setting an explicit dom0 memory size does allow the machine to boot. That was a surprise to me!

I don't know if my colleague is using an explicit dom0 memory allocation - if he is then it's very possible that it's not even a regression and all current/historic kernels may have required this flag. I'll check this tomorrow when my colleague is back.

Mike McClurg (mike-mcclurg) wrote :

I'm a colleague of "the colleague" that Bob mentioned. I just logged into "the colleague's" computer, and he's got dom0_mem=2048M in his Xen command line:

GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=2048M console=com1 com1=38400,8n1 loglvl=all guest_loglvl=all"

He may have set this up before the first time he booted Xen. I've used Xen and Precise before, without setting dom0_mem, and I haven't run into this problem myself. I can't recall the last 3.2 kernel I used that worked, though. Bob, I suppose we could just install the last 10 or so and see if we can track down the fail.

Konrad Rzeszutek Wilk (konrad-wilk) wrote :

To get a better idea of what is wrong, can you run a test for me please? That is, boot the kernel+Xen without the 'dom0_mem' flag so that it crashes, but make sure that the Linux command line has "console=hvc0 earlyprintk=xen debug loglevel=8" and, for extra measure, also append "sync_console console=com1 com1=38400,8n1 loglvl=all guest_loglvl=all" to the Xen command line.

That should pinpoint where we fail.

Lastly, Stefan, is the kernel-debuginfo available somewhere easily? I am mighty curious what RIP ffffffff816415e1 is. I figure it has to be pin_pagetable_pfn or xen_alloc_pte_init, and that the bug is somewhere in the E820 parsing, where 228000 gets included as RAM.

Also, it would be interesting to see whether adding 'e820-mtrr-clip e820-verbose' to the Xen command line resolves this as well.

Stefan Bader (smb) wrote :

Bob, I was mentioning the versions not so much because it might be a development version but because I suspect there is some confusion between 3.2.0 and 3.5.0 and their ABI versions (the number after the '-'). Sometimes it is better to be very pedantic when asking, to avoid more confusion. ;-)

Bob, Mike, yeah, I had changed my test machine from using dom0_mem to using all memory, and it still worked. I somewhat suspect that this could be related to how this machine (or class of machines) sets up its BIOS tables.

Konrad, usually those would be available at http://ddebs.ubuntu.com/pool/main/l/linux/. Unfortunately we, err, have a slight problem of occasionally losing them: since they are big, there is a cleanup task at work, and that is sometimes a bit over-eager. I did a compile with 3.2.0-36.57 (it seems that while we weren't looking, 3.2.0-37.58 became "current"). Here is the relevant snippet from System.map:

ffffffff81641543 t set_page_prot
ffffffff81641581 t xen_remap_domain_mfn_range.part.21
ffffffff8164158c t pin_pagetable_pfn
ffffffff816415e5 t p2m_top_index.part.3

So it seems to be pin_pagetable_pfn. I will attach the whole map too. I am hesitating a bit over the other files since they are so big, but if you need anything, let me know.
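Mapping the crash RIP onto that System.map excerpt is a "last symbol whose address is not above RIP" lookup. A minimal C sketch over just the four symbols quoted above (a real lookup would load the whole, sorted System.map); `find_symbol` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sym {
    uint64_t addr;
    const char *name;
};

/* Excerpt from the System.map above; entries must be sorted by address. */
static const struct sym symtab[] = {
    { 0xffffffff81641543ULL, "set_page_prot" },
    { 0xffffffff81641581ULL, "xen_remap_domain_mfn_range.part.21" },
    { 0xffffffff8164158cULL, "pin_pagetable_pfn" },
    { 0xffffffff816415e5ULL, "p2m_top_index.part.3" },
};

/*
 * Binary search for the last symbol whose address is <= ip.
 * Returns NULL if ip lies before the first symbol; otherwise stores
 * ip minus the symbol address in *offset.
 */
const char *find_symbol(uint64_t ip, uint64_t *offset)
{
    size_t lo = 0, hi = sizeof(symtab) / sizeof(symtab[0]);

    while (lo < hi) {              /* find the first entry with addr > ip */
        size_t mid = lo + (hi - lo) / 2;
        if (symtab[mid].addr <= ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    *offset = ip - symtab[lo - 1].addr;
    return symtab[lo - 1].name;
}
```

For RIP ffffffff816415e1 this yields pin_pagetable_pfn+0x55. The "invalid opcode" trap [#6] in the Xen log is consistent with a BUG() after a failed pin request, and rax (0xffffffea, i.e. -22 as a 32-bit value) would be -EINVAL if it holds that request's return code; both readings are inferences, not confirmed by the thread.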

Stefan Bader (smb) wrote :
Bob Ball (bob-ball) wrote :

Correction from previous (now hidden) comment - sorry, pasted wrong log.

Added the arguments requested by Konrad. If there are other kernels you'd like us to try to pinpoint which kernel introduced the issue, please let me know.

(XEN) Xen version 4.1.2 (Ubuntu 4.1.2-2ubuntu2.5) (<email address hidden>) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) Tue Jan 8 14:07:12 UTC 2013
(XEN) Console output is synchronous.
(XEN) Bootloader: GRUB 1.99-21ubuntu3.7
(XEN) Command line: placeholder loglvl=all guest_loglvl=all com1=115200,8n1 console=com1,vga noreboot sync_console
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN) Found 2 MBR signatures
(XEN) Found 2 EDD information structures
(XEN) WARNING: MTRRs do not cover all of memory.
(XEN) Truncating RAM from 9109504kB to 9043968kB
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000008f000 (usable)
(XEN) 000000000008f000 - 00000000000a0000 (reserved)
(XEN) 00000000000e0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000ce674000 (usable)
(XEN) 00000000ce674000 - 00000000ce6cc000 (ACPI NVS)
(XEN) 00000000ce6cc000 - 00000000cf5fb000 (usable)
(XEN) 00000000cf5fb000 - 00000000cf608000 (reserved)
(XEN) 00000000cf608000 - 00000000cf6a6000 (usable)
(XEN) 00000000cf6a6000 - 00000000cf6ab000 (ACPI data)
(XEN) 00000000cf6ab000 - 00000000cf6f2000 (ACPI NVS)
(XEN) 00000000cf6f2000 - 00000000cf6f3000 (usable)
(XEN) 00000000cf6f3000 - 00000000cf6ff000 (ACPI data)
(XEN) 00000000cf6ff000 - 00000000cf700000 (usable)
(XEN) 00000000cf700000 - 00000000d0000000 (reserved)
(XEN) 00000000fff00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000228000000 (usable)
(XEN) 0000000228000000 - 000000022c000000 (unusable)
(XEN) ACPI: RSDP 000FE020, 0014 (r0 INTEL )
(XEN) ACPI: RSDT CF6FD038, 004C (r1 INTEL DG965SS 685 1000013)
(XEN) ACPI: FACP CF6FC000, 0074 (r1 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: DSDT CF6F7000, 40E9 (r1 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: FACS CF6AB000, 0040
(XEN) ACPI: APIC CF6F6000, 0078 (r1 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: WDDT CF6F5000, 0040 (r1 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: MCFG CF6F4000, 003C (r1 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: ASF! CF6F3000, 00A6 (r32 INTEL DG965SS 685 MSFT 1000013)
(XEN) ACPI: SSDT CF6AA000, 01BC (r1 INTEL CpuPm 685 MSFT 1000013)
(XEN) ACPI: SSDT CF6A9000, 0175 (r1 INTEL Cpu0Ist 685 MSFT 1000013)
(XEN) ACPI: SSDT CF6A8000, 0175 (r1 INTEL Cpu1Ist 685 MSFT 1000013)
(XEN) ACPI: SSDT CF6A7000, 0175 (r1 INTEL Cpu2Ist 685 MSFT 1000013)
(XEN) ACPI: SSDT CF6A6000, 0175 (r1 INTEL Cpu3Ist 685 MSFT 1000013)
(XEN) System RAM: 8053MB (8247112kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-0000000228000000
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000fe200
(XEN) DMI 2.4 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x408
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[404,0], pm1x_evt[400,0]
(XEN) ACPI: wak...

Stefan Bader (smb) wrote :

Oh, right, so it tries to actually map the last range (which is marked unusable in E820):

(XEN) 0000000228000000 - 000000022c000000 (unusable)
...
[ 0.000000] init_memory_mapping: 0000000228000000-000000022c000000
[ 0.000000] 0228000000 - 022c000000 page 4k
[ 0.000000] kernel direct mapping tables up to 22c000000 @ 1e97d8000-1e97fa000

Now why would it do that...
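Stated as a check: a range passed to init_memory_mapping() here should lie entirely inside usable E820 RAM, and 0x228000000-0x22c000000 does not. A minimal C sketch using the upper part of the Xen-e820 map printed above (type codes follow the conventional E820 numbering: 1 usable, 2 reserved, 5 unusable); `range_is_ram` is an illustrative helper, not a kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { E820_RAM = 1, E820_RESERVED = 2, E820_UNUSABLE = 5 };

struct e820_range {
    uint64_t start, end;   /* half-open: [start, end) */
    int type;
};

/* Tail of the Xen-e820 map from the serial log (lower entries omitted). */
static const struct e820_range xen_map[] = {
    { 0x00000000cf700000ULL, 0x00000000d0000000ULL, E820_RESERVED },
    { 0x00000000fff00000ULL, 0x0000000100000000ULL, E820_RESERVED },
    { 0x0000000100000000ULL, 0x0000000228000000ULL, E820_RAM },
    { 0x0000000228000000ULL, 0x000000022c000000ULL, E820_UNUSABLE },
};

/*
 * Return 1 if every byte of [start, end) is covered by E820_RAM entries.
 * Assumes the map is sorted and non-overlapping.
 */
int range_is_ram(uint64_t start, uint64_t end)
{
    size_t i;

    for (i = 0; i < sizeof(xen_map) / sizeof(xen_map[0]) && start < end; i++) {
        const struct e820_range *r = &xen_map[i];

        if (r->end <= start)
            continue;              /* entry lies entirely below start */
        if (r->start > start || r->type != E820_RAM)
            return 0;              /* gap or non-RAM entry at start */
        start = r->end;            /* consume the RAM part */
    }
    return start >= end;
}
```

The range from the dmesg excerpt fails this check, and it is the range whose page tables (at 0x1e97d8000, i.e. pfn 1e97d8 in the Xen error) Xen then refuses to pin.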

Konrad Rzeszutek Wilk (konrad-wilk) wrote :

Stefan, so your gut feeling about the 228000 was right.

The E820_UNUSABLE regions are memory that "can" be used (if you look in 'xen_memory_setup', that is how we set aside the memory for the balloon region; look for 'type = E820_UNUSABLE'). So by that logic, an E820_UNUSABLE region should get the same treatment as the rest of the E820_RAM memory.

So this "hack"
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 8971a26..e8172bf 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -396,6 +396,7 @@ char * __init xen_memory_setup(void)
 		extra_pages);
 	i = 0;
 	while (i < memmap.nr_entries) {
+		bool fix_unusable = true;
 		u64 addr = map[i].addr;
 		u64 size = map[i].size;
 		u32 type = map[i].type;
@@ -407,9 +408,16 @@ char * __init xen_memory_setup(void)
 				size = min(size, (u64)extra_pages * PAGE_SIZE);
 				extra_pages -= size / PAGE_SIZE;
 				xen_add_extra_mem(addr, size);
-			} else
+			} else {
 				type = E820_UNUSABLE;
+				fix_unusable = false;
+			}
 		}
+		/*
+		 * Not sure about this.
+		 */
+		if (type == E820_UNUSABLE && fix_unusable)
+			type = E820_RESERVED;
 
 		xen_align_and_add_e820_region(addr, size, type);

would potentially fix it. But I am not sure about the other cases:
 a) Is it OK to ignore E820_UNUSABLE altogether as provided by the hypervisor? Are there legitimate reasons for the BIOS to mark ranges as E820_UNUSABLE? Perhaps memory hotplug? (Jinsong from Intel could help answer that.)
 b) Other? Perhaps the fix belongs in the hypervisor, clipping said memory completely out of the E820? In other words, as if it had run with 'mem=X' and were oblivious to the non-MTRR region. But what if the MTRR region lies smack in the middle of other regions (like https://lkml.org/lkml/2012/8/24/474)? The choice there would be to remove the E820 entry completely (but then we would think it is a PCI region, which might or might not be OK; this reminds me of http://lists.xen.org/archives/html/xen-devel/2011-02/msg01238.html, where "gaps" are considered I/O regions and could end up with intel-agp trying to use one as its "flush" region).
 c) Just leave it as is and document that users should use 'dom0_mem=max'? Perhaps we should codify it then? That is, if we detect the invalid MTRR regions, we automatically set the dom0 maxpages as if 'dom0_mem=max:<up to MTRR>' was given. In reality that is what the hypervisor is doing (it ignores those regions altogether); it is our misfortune that we treat the region as if it were RAM (which is actually the right *thing*).

Thoughts?
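To reason about what the hack changes, the type decision in that loop can be isolated into a pure function. This is a simplified model, assuming each entry has already been cut at mem_end (the real loop also trims sizes, consumes extra_pages and splits entries); `dom0_region_type` and its parameters are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Conventional E820 type codes. */
enum { E820_RAM = 1, E820_RESERVED = 2, E820_UNUSABLE = 5 };

/*
 * Simplified model of the per-entry type decision in xen_memory_setup(),
 * with or without the proposed hack.
 *   type       - entry type as reported by the hypervisor
 *   below_end  - true if the entry starts below mem_end
 *   have_extra - true if extra_pages remain (balloon memory)
 */
int dom0_region_type(int type, bool below_end, bool have_extra, bool apply_hack)
{
    bool fix_unusable = true;

    if (type == E820_RAM && !below_end && !have_extra) {
        /* RAM dom0 will not populate: Linux itself marks it unusable. */
        type = E820_UNUSABLE;
        fix_unusable = false;
    }
    /* The hack: an UNUSABLE entry that came from the hypervisor becomes RESERVED. */
    if (apply_hack && type == E820_UNUSABLE && fix_unusable)
        type = E820_RESERVED;
    return type;
}
```

The model makes the trade-off visible: only hypervisor-supplied unusable entries change type, while the ones Linux creates for the balloon keep their meaning.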

Stefan Bader (smb) wrote :

So I found the following definition about the E820 address ranges in the ACPI 3.0a spec (section 14, table 14-1):

AddressRangeUnusable: "This range of address contains memory in which errors have been detected. This range must not be used by the OSPM."

That rather sounds like it may not be good to go there. I must admit I cannot recall having seen "unusable" before, only "reserved". But who really remembers those things...

Bob Ball (bob-ball) wrote :

Do you want me to try the above "hack" as a patch to confirm that it works? I wasn't sure from the comments if this is a proposed fix or not!

Stefan Bader (smb) wrote :

Ok, I found the messages about truncating memory in the hypervisor code. When that happens, as Konrad mentioned, the range can be set to unusable by the hypervisor. It would be good to have a glimpse at the dmesg from a non-Xen boot with the same kernel on the same machine, just to see the E820 layout there; that should also show the MTRR coverage.

Stefan Bader (smb) wrote :

Bob, sorry, seems you commented while I was typing up mine. I would wait with the hack. We are both not sure whether this is the right way to go.

Bob Ball (bob-ball) wrote :

Interesting bit of the normal dmesg boot looks to be:

[ 0.000000] mtrr_cleanup: can not find optimal value
[ 0.000000] please specify mtrr_gran_size/mtrr_chunk_size
[ 0.000000] e820: update [mem 0xcf700000-0xffffffff] usable ==> reserved
[ 0.000000] e820: update [mem 0x228000000-0x22bffffff] usable ==> reserved
[ 0.000000] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 64MB of RAM.
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: at /build/buildd/linux-lts-quantal-3.5.0/arch/x86/kernel/cpu/mtrr/cleanup.c:971 mtrr_trim_uncached_memory+0x299/0x2c0()
[ 0.000000] Hardware name:
[ 0.000000] Modules linked in:
[ 0.000000] Pid: 0, comm: swapper Not tainted 3.5.0-22-generic #34~precise1-Ubuntu
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff81052c9f>] warn_slowpath_common+0x7f/0xc0
[ 0.000000] [<ffffffff81052cfa>] warn_slowpath_null+0x1a/0x20
[ 0.000000] [<ffffffff81d002e7>] mtrr_trim_uncached_memory+0x299/0x2c0
[ 0.000000] [<ffffffff81cfa02b>] setup_arch+0x499/0x821
[ 0.000000] [<ffffffff81685ba6>] ? printk+0x61/0x63
[ 0.000000] [<ffffffff81cf3954>] start_kernel+0xd4/0x3d2
[ 0.000000] [<ffffffff81cf3397>] x86_64_start_reservations+0x131/0x135
[ 0.000000] [<ffffffff81cf3120>] ? early_idt_handlers+0x120/0x120
[ 0.000000] [<ffffffff81cf3468>] x86_64_start_kernel+0xcd/0xdc
[ 0.000000] ---[ end trace 056f32e7b63342ac ]---

I've attached the full dmesg in case there are other things that you want to see.

Stefan Bader (smb) wrote :

Oh yes, so this machine has a rather complicated usable/reserved pattern, which cannot be represented with only the 8 variable MTRR registers. The Linux kernel seems to give up and make it reserved, while Xen tries to use it because it gets marked unusable, which is actually used to mean "usable for guest memory" somehow...

Hm, Konrad, I wonder: Xen detects the same problem and sets the range to unusable, but then "unusable" seems to be used with a different meaning. Should those cases maybe use reserved, too?
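One way to see why eight variable MTRRs can run out: each variable MTRR describes a naturally aligned, power-of-two sized block, so an arbitrary range has to be split into such blocks. A minimal C sketch of a greedy split that uses only write-back blocks; real firmware usually combines a large write-back range with uncached "hole" ranges, so its actual register usage will differ, and the helper names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that is <= x (x must be non-zero). */
uint64_t largest_pow2_le(uint64_t x)
{
    uint64_t p = 1;

    while (p <= x / 2)
        p <<= 1;
    return p;
}

/*
 * Count the naturally aligned power-of-two blocks that a greedy split of
 * [start, end) needs; each block would take one variable MTRR.
 */
int count_mtrr_blocks(uint64_t start, uint64_t end)
{
    int n = 0;

    while (start < end) {
        uint64_t size = largest_pow2_le(end - start);

        if (start != 0) {
            uint64_t align = start & (~start + 1);  /* lowest set bit */
            if (align < size)
                size = align;
        }
        start += size;
        n++;
    }
    return n;
}
```

Under this naive scheme, the usable RAM below 4 GiB on this machine (0 to 0xcf700000) alone takes nine blocks, while 0x100000000-0x228000000 takes three and extending it to 0x22c000000 costs a fourth. It only illustrates the pressure on the registers, not what this BIOS actually programmed.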

Stefan Bader (smb) wrote :

Ok, let's try this modified hypervisor (http://people.canonical.com/~smb/lp1111470/). For now I was a bit conservative and only changed things so that the clipped area (not covered by MTRRs) is removed from the E820 map instead of having its type changed to unusable. If that works for the affected machine, I will take the discussion upstream.
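The behavioural difference is what happens to RAM above the MTRR-covered limit: it is dropped from the map handed to dom0 rather than kept as an "unusable" entry. A minimal C sketch of such a clip, assuming a sorted map of half-open ranges; `e820_clip` is a hypothetical helper, not the code in the test build:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { E820_RAM = 1, E820_RESERVED = 2, E820_UNUSABLE = 5 };

struct e820_range {
    uint64_t start, end;   /* half-open: [start, end) */
    int type;
};

/*
 * Remove every part of a RAM entry at or above limit. Entries that become
 * empty are deleted; non-RAM entries are left alone. Returns the new
 * number of entries.
 */
size_t e820_clip(struct e820_range *map, size_t n, uint64_t limit)
{
    size_t in, out = 0;

    for (in = 0; in < n; in++) {
        struct e820_range r = map[in];

        if (r.type == E820_RAM && r.end > limit)
            r.end = r.start > limit ? r.start : limit;  /* truncate */
        if (r.end > r.start)
            map[out++] = r;                             /* keep non-empty */
    }
    return out;
}
```

With a limit of 0x228000000, a RAM entry ending at 0x22c000000 is cut back to 0x228000000 and no entry for the clipped 64 MiB remains, so dom0 has nothing to map there.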

Stefan Bader (smb) wrote :

Any news?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Bob Ball (bob-ball) wrote :

I can confirm it's bootable with the patch.

Many thanks Stefan, and sorry for the delay in responding to the last message!

Stefan Bader (smb) wrote :

Thanks Bob. Now let's see how the proposal goes back and forth.

affects: linux (Ubuntu) → xen (Ubuntu)
Changed in xen (Ubuntu):
status: Incomplete → Triaged
Changed in xen (Ubuntu Precise):
importance: Undecided → Medium
Changed in xen (Ubuntu Quantal):
importance: Undecided → Medium
Changed in xen (Ubuntu Precise):
status: New → Triaged
Changed in xen (Ubuntu Quantal):
status: New → Triaged
Changed in xen (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
Stefan Bader (smb)
summary: - Precise kernel not bootable under Xen - alloc_l1_table
+ Crash in dom0 when accessing clipped RAM
Stefan Bader (smb) wrote :

In upstream discussion this seems to be seen as something that should rather get fixed in the kernel and not the hypervisor. I will see what ideas come up there and get back with an updated change to test. Stay tuned...

Stefan Bader (smb) wrote :

One thing for clarification: when using the unpatched xen-hypervisor and a recent kernel (like v3.8) from http://kernel.ubuntu.com/~kernel-ppa/mainline/, with no dom0_mem option, does the affected machine still crash the same way?

Bob Ball (bob-ball) wrote :

We've already identified that a quantal development kernel (linux-image-3.5.0-22-generic) works with no dom0_mem option, so I imagine that the latest kernel would also work - is there a particular version that you'd like me to try?

If Xen upstream thinks the kernel should be fixed, then the question is whether we want to isolate and backport the fix that is already in the Quantal development kernel.

Stefan Bader (smb) wrote :

That question can clearly be answered with yes. The slight confusion was that, throughout the comments, it was not clear to me whether the Quantal kernel was indeed used without or with the dom0_mem option. I took Mike's comment to mean it was only *with* this option. Personally I find the use of "unusable" for used memory slightly confusing, but then that would not be the issue.

So if Quantal works even without that option, could you attach the dmesg of a Xen boot on that machine? The difference in output will probably help to find out which changes make it work.

Bob Ball (bob-ball) wrote :

Okay - I wasn't sure that it was so clear! I thought there was a plan to backport the quantal kernel to precise as the LTS, but anyway!

See attached for the xen dmesg and host dmesg from a successful boot with original Xen and quantal kernel.

Bob Ball (bob-ball) wrote :

And the xen dmesg

Stefan Bader (smb) wrote :

The LTS backport kernels are mainly hardware enablement options. But the Precise kernel is still a supported option, so if possible we want bugs fixed (if possible == if not requiring half of the next kernel backported).

OK, so that range appears in the 1-1 mappings but does not seem to be freed, and max_pfn later ends up below that range. Not sure yet what exactly that means, but it at least gives a better idea of where to look.

Stefan Bader (smb) wrote :

Hm, one change to think of would be 2e2fb75475c2fc74c98100f1468c8195fee49f3b
  xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM

That went in with v3.5-rc3. A final proof usually requires a bisect. To narrow down the attempts, you could try various kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/. If it really was the change above, then v3.5-rc2 would crash and -rc3 would boot. Usually, though, one is not so lucky and the fix is in an -rc1, where most things land... :)

Bob Ball (bob-ball) wrote :

Unfortunately -rc1, -rc2 and -rc3 all boot on my machine... Sorry!

Stefan Bader (smb) wrote :

So the fix is somewhere before that... which still leaves v3.3 and v3.4 at least...

Bob Ball (bob-ball) wrote :

3.3 and 3.4 both boot... Am trying some earlier kernels

Bob Ball (bob-ball) wrote :

From http://kernel.ubuntu.com/~kernel-ppa/mainline/ v3-3-precise boots but v3-2-39-precise does not.

Stefan Bader (smb) wrote :

Ok, so it is something between 3.2 and 3.3 that obviously was not considered stable material. As I said, it usually turns out that 3.3-rc1 is ok, too. I'd love to be wrong there but...

Stefan Bader (smb) wrote :

I went through the changes between 3.2 and 3.3, but nothing really jumped out from the descriptions. So I fear the only way to isolate it will be a bisect. That will take some time, since there are usually about 10-12 iterations if 3.3-rc1 is the first one working; it might be a bit less if it is a later rc.
Bob, if you have the time to try kernels, let me know (after finding the first good 3.3-rc) and I will prepare further kernels.
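The 10-12 estimate follows from bisection halving the candidate range on every test boot: isolating one commit among N candidates takes about ceil(log2(N)) boots. A minimal C sketch; `bisect_steps` is an illustrative helper, and the commit counts used with it are examples, not the actual size of the v3.2..v3.3-rc1 range:

```c
#include <assert.h>
#include <stdint.h>

/* Test boots needed to isolate one commit among n candidates: ceil(log2(n)). */
int bisect_steps(uint64_t n)
{
    int steps = 0;
    uint64_t span = 1;

    while (span < n) {   /* double the span that `steps` boots can resolve */
        span <<= 1;
        steps++;
    }
    return steps;
}
```

So 10 to 12 iterations correspond to roughly one to four thousand commits left in the range; a later first-good -rc shrinks that range and saves a step or two.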

Stefan Bader (smb)
Changed in xen (Ubuntu):
status: Triaged → Incomplete
Rolf Leggewie (r0lf) wrote :

quantal has seen the end of its life and is no longer receiving any updates. Marking the quantal task for this ticket as "Won't Fix".

Changed in xen (Ubuntu Quantal):
status: Triaged → Won't Fix
Steve Langasek (vorlon) wrote :

The Precise Pangolin has reached end of life, so this bug will not be fixed for that release

Changed in xen (Ubuntu Precise):
status: Triaged → Won't Fix