APIC Latency causing BSOD.

Bug #692570 reported by La' Maze Johnson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Invalid
Undecided
Unassigned

Bug Description

I have a Proxmox Server with the following specs:

Version:

pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-28
pve-kernel-2.6.32-4-pve: 2.6.32-28
qemu-server: 1.1-25
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-9
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-2
ksm-control-daemon: 1.0-4

VM Configuration:

name: TS64
ide2: none,media=cdrom
bootdisk: ide0
ostype: w2k3
ide0: data:vm-104-disk-1
memory: 10240
sockets: 1
vlan0: virtio=C6:4C:B3:BB:AD:67
onboot: 1
cores: 4
boot: cad
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1

CPU 2x Xeon Quad Core E5620 2.4GHZ Processors:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
cpu MHz : 2400.323
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 9
cpu cores : 4
apicid : 19
initial apicid : 19
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 4800.19
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

Performance:

CPU BOGOMIPS: 76803.21
REGEX/SECOND: 850066
HD SIZE: 33.96 GB (/dev/mapper/pve-root)
BUFFERED READS: 333.03 MB/sec
AVERAGE SEEK TIME: 6.10 ms
FSYNCS/SECOND: 2948.85
DNS EXT: 131.42 ms
DNS INT: 1.28 ms

I've been successfully running 2 Windows 2003 32-Bit Standard Edition Servers on this server for over a month now. Both were migrations from actual physical servers. However, I've continued to receive random crashes on a Windows 2003 64-bit standard edition running terminal services, which was a fresh install. The server runs fine for hours under a decent load (20 users) and then will crash with a 3B bug check code (System_Service_Exception). I opened a ticket with Microsoft and submitted multiple memory dumps and their engineers suggested the following:

Dump Analyses Result:
===================

What happened is that the OS initiated an APIC /software interrupt. This is handled by the APIC in real hardware. In your Virtual Environment case where you are using Linux based VM – Proxmox, the VM implementation somehow has to make it happen on a virtual machine with the same latency in the virtual APIC. The problem is that there is a latency between when it was initiated and when it happened.

Below are the details for understanding the process or concept of APIC interrupts:

What the Local APIC Is
The Local APIC (LAPIC) is a circuit that is part of the CPU chip. It contains these basic elements:
A mechanism for generating
1. interrupts
2. A mechanism for accepting interrupts
3. A timer

If you have a multiprocessor system, the APIC's are wired together so they can communicate. So the LAPIC on CPU 0 can communicate with the LAPIC on CPU 1, etc.

What the IO APIC Is

This is a separate chip that is wired to the Local APIC's so it can forward interrupts on to the CPU chips. It is programmed similar to the 8259's but has more flexibility.
It is wired to the same bus as the Local APIC's so it can communicate with them.

Note:- In our scenario, it’s all Virtualized interrupts or calls because of hypervisor in picture and thus we need to contact the VM application vendor to get a check of this latency issue in APIC interrupt.
------------------------------------------------End of Message----------------------------------

Their engineers are saying that there is a latency issue with APIC. I'm not exactly sure how this can be corrected. Is this a known issue and is their a solution to this problem. I love Proxmox, but my main reason for using it was to upgrade my terminal server to better hardware, while leveraging it for other virtual machines as well. The forum administrator for Proxmox, suggested I bring this issue to your attention.

Revision history for this message
Thomas Huth (th-huth) wrote :

Since there hasn't been an answer to this within the last years, it looks like nobody here knows about Proxmox problems. So let's close this ticket...

Changed in qemu:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Related questions

Remote bug watches

Bug watches keep track of this bug in other bug trackers.