Bug #854829 “[Oneiric] Xen HVM: transmit timeouts on emulated 81...” : Bugs : xen package : Ubuntu

Stefan Bader (smb) on 2011-09-20

Changed in xen (Ubuntu):
assignee:	nobody → Stefan Bader (stefan-bader-canonical)
importance:	Undecided → Medium
status:	New → Confirmed

Revision history for this message

Stefan Bader (smb) wrote on 2011-09-23:

#1

Testing various emulated devices shows:
- ne2k_pci gets the tx problems almost immediatly
- 8139cp on any slightly higher traffic
- e1000 seems to work

When pirq's are not used, 8139cp also works again. So there seems to be a subtle bug in the delivery of those.

Dave Walker (davewalker) on 2011-09-27

tags:

added: server-o-ro

Revision history for this message

Stefan Bader (smb) wrote on 2011-10-04:

#2

Download full text (3.3 KiB)

Replicating some information that was sent to the mailing list:

It took quite a bit of time but at least I got some hopefully useful information
now. So in general, whenever an interrupt is asserted,
the hypervisor runs through this:

__hvm_pci_intx_assert:
  when assert count was 0 before incrementing
    call assert_gsi
      call send_guest_pirq (when hvm uses pirq)

In the send_guest_pirq chain is a call to evtchn_set_pending which tests as one
of the first actions whether evtchn_pending in the shared_info is set. If that
is the case the call immediately returns with 1.

Adding printks to call_assert_gsi, I noticed that
- When things stop working, the last call to send_guest_pirq returned 1.
- But not every time the return code is one, the stall happens.
- e1000 also has cases where send_guest_pirq returns 1 but they happen much
less often (than using the 8139cp).

Usually every intx_assert has a intx_deassert call that follows. when the stall
occurs, this does not happen. Right here I got some troubles to understand where
this intx_deassert is actually triggered. With an added WARN_ON the stack traces
seem odd, like this:

(XEN) [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130
(XEN) [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60
(XEN) [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60
(XEN) [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220
(XEN) [<ffff82c48017dba6>] copy_from_user+0x26/0x90
(XEN) [<ffff82c4801f9446>] do_iret+0xb6/0x1a0
(XEN) [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d

Not really sure how one gets from do_update_descriptor to do_hvm_op and the only
thing in there which does the deassert is some irq level setting.

Actually the guest does not really do much do EOI (which I had been assuming).
But since domain_pirq_to_irq maps to 0 for emuirqs, the call to
PHYSDEVOP_irq_status_query will hit the following and not set the flag for
needing EOI.

        irq_status_query.flags = 0;
        if ( is_hvm_domain(v->domain) &&
             domain_pirq_to_irq(v->domain, irq) <= 0 )
        {
            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
            break;
        }

So all the guest is doing is to clear evtchn_pending in the pirq EOI function. I
fail to understand what actually is doing the hvm_pci_intx_deassert calls but
the way the fasteoi code in the guest looks to be working, there seems to be
some gap between calling the handler and the eoi function... So from what I see,
I would assume the following:

dom0 domU
- intx_assert (count 0->1)
- send_guest_pirq = 0
  (evtchn_pending = 1)
                                         - upcall starts fasteoi handler
- something does intx_deassert
  (count 1->0)
- intx_assert (count 0->1)
- send_guest_pirq = 1
  (evtchn_pending still set)
                                         - handler->eoi sets evtchn to 0 but
                                           otherwise does nothing
- there is no intx_deassert, so even
  when another intx_assert would happen
  (which does not seem to be the case)
  no further send_guest_pirq would be
  called.

Unfortunately I do miss some details on the inner wo...

Replicating some information that was sent to the mailing list:

It took quite a bit of time but at least I got some hopefully useful information
now. So in general, whenever an interrupt is asserted,
the hypervisor runs through this:

__hvm_pci_intx_assert:
  when assert count was 0 before incrementing
    call assert_gsi
      call send_guest_pirq (when hvm uses pirq)

In the send_guest_pirq chain is a call to evtchn_set_pending which tests as one
of the first actions whether evtchn_pending in the shared_info is set. If that
is the case the call immediately returns with 1.

Adding printks to call_assert_gsi, I noticed that
- When things stop working, the last call to send_guest_pirq returned 1.
- But not every time the return code is one, the stall happens.
- e1000 also has cases where send_guest_pirq returns 1 but they happen much
  less often (than using the 8139cp).

Usually every intx_assert has a intx_deassert call that follows. when the stall
occurs, this does not happen. Right here I got some troubles to understand where
this intx_deassert is actually triggered. With an added WARN_ON the stack traces
seem odd, like this:

(XEN)    [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130
(XEN)    [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60
(XEN)    [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60
(XEN)    [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220
(XEN)    [<ffff82c48017dba6>] copy_from_user+0x26/0x90
(XEN)    [<ffff82c4801f9446>] do_iret+0xb6/0x1a0
(XEN)    [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d

Not really sure how one gets from do_update_descriptor to do_hvm_op and the only
thing in there which does the deassert is some irq level setting.

Actually the guest does not really do much do EOI (which I had been assuming).
But since domain_pirq_to_irq maps to 0 for emuirqs, the call to
PHYSDEVOP_irq_status_query will hit the following and not set the flag for
needing EOI.

irq_status_query.flags = 0;
        if ( is_hvm_domain(v->domain) &&
             domain_pirq_to_irq(v->domain, irq) <= 0 )
        {
            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
            break;
        }

So all the guest is doing is to clear evtchn_pending in the pirq EOI function. I
fail to understand what actually is doing the hvm_pci_intx_deassert calls but
the way the fasteoi code in the guest looks to be working, there seems to be
some gap between calling the handler and the eoi function... So from what I see,
I would assume the following:

dom0                                     domU
- intx_assert (count 0->1)
- send_guest_pirq = 0
  (evtchn_pending = 1)
                                         - upcall starts fasteoi handler
- something does intx_deassert
  (count 1->0)
- intx_assert (count 0->1)
- send_guest_pirq = 1
  (evtchn_pending still set)
                                         - handler->eoi sets evtchn to 0 but
                                           otherwise does nothing
- there is no intx_deassert, so even
  when another intx_assert would happen
  (which does not seem to be the case)
  no further send_guest_pirq would be
  called.

Unfortunately I do miss some details on the inner working here. Generally I
wonder whether not setting the needsEOI flag for those pirqs just is the
problem. But it also could be intentional...

Revision history for this message

Stefan Bader (smb) wrote on 2011-10-04:

#3

Stefano Stabellini replied:

Thanks for the very detailed analysis. It seems to me that the problem is that if the interrupt is a level
triggered interrupt when the guest issues an EOI we should be reinjecting the interrupt again if it has been issued a second time in the meantime. However this doesn't happen if the interrupt has been remapped onto an even channel. In that case the guest is not even going to issue an EOI at all.
So I wrote a patch to force the guest to issue EOIs even on remapped irqs; in the hypercall handler we check whether we need to reinject the interrupt and if that is the case we set the corresponding event channel pending.
Could you please try the patch I appended? I haven't been able to reproduce your problem so I am not really sure if it works.

diff -r e042fb60e0ee xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c Thu Sep 29 11:23:01 2011 +0000
+++ b/xen/arch/x86/physdev.c Fri Sep 30 14:01:46 2011 +0000
@@ -11,6 +11,7 @@
#include <asm/current.h>
#include <asm/io_apic.h>
#include <asm/msi.h>
+#include <asm/hvm/irq.h>
#include <asm/hypercall.h>
#include <public/xen.h>
#include <public/physdev.h>
@@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
+ if ( is_hvm_domain(v->domain) &&
+ domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
+ {
+ struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
+ int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
+
+ /* if this is a level irq and count > 0, send another
+ * notification */
+ if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
+ && hvm_irq->gsi_assert_count[gsi] )
+ send_guest_pirq(v->domain, pirq);
+ }
         spin_unlock(&v->domain->event_lock);
         ret = 0;
         break;
@@ -327,12 +340,6 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( (irq < 0) || (irq >= v->domain->nr_pirqs) )
             break;
         irq_status_query.flags = 0;
- if ( is_hvm_domain(v->domain) &&
- domain_pirq_to_irq(v->domain, irq) <= 0 )
- {
- ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
- break;
- }

/*
* Even edge-triggered or message-based IRQs can need masking from

Stefano Stabellini replied:

Thanks for the very detailed analysis. It seems to me that the problem is that if the interrupt is a level
triggered interrupt when the guest issues an EOI we should be reinjecting the interrupt again if it has been issued a second time in the meantime. However this doesn't happen if the interrupt has been remapped onto an even channel. In that case the guest is not even going to issue an EOI at all.
So I wrote a patch to force the guest to issue EOIs even on remapped irqs; in the hypercall handler we check whether we need to reinject the interrupt and if that is the case we set the corresponding event channel pending.
Could you please try the patch I appended? I haven't been able to reproduce your problem so I am not really sure if it works.

diff -r e042fb60e0ee xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Thu Sep 29 11:23:01 2011 +0000
+++ b/xen/arch/x86/physdev.c	Fri Sep 30 14:01:46 2011 +0000
@@ -11,6 +11,7 @@
 #include <asm/current.h>
 #include <asm/io_apic.h>
 #include <asm/msi.h>
+#include <asm/hvm/irq.h>
 #include <asm/hypercall.h>
 #include <public/xen.h>
 #include <public/physdev.h>
@@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
+        if ( is_hvm_domain(v->domain) &&
+                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
+        {
+            struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
+            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
+
+            /* if this is a level irq and count > 0, send another
+             * notification */ 
+            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
+                    && hvm_irq->gsi_assert_count[gsi] )
+                send_guest_pirq(v->domain, pirq);
+        }
         spin_unlock(&v->domain->event_lock);
         ret = 0;
         break;
@@ -327,12 +340,6 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( (irq < 0) || (irq >= v->domain->nr_pirqs) )
             break;
         irq_status_query.flags = 0;
-        if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
-        {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
-            break;
-        }
 
         /*
          * Even edge-triggered or message-based IRQs can need masking from

Revision history for this message

Stefan Bader (smb) wrote on 2011-10-04:

#4

The patch needed some changes because the code we use in Oneiric is 4.1.1 and is different here. Tested the adapted patch and did not see the network stalls anymore.

Revision history for this message

Stefan Bader (smb) wrote on 2011-10-04:

#5

Minimal fix against Xen 4.1.1 Edit (2.2 KiB, text/plain)

This is the (tested) minimal fix for the xen 4.1.1 hypervisor. The problem with the more complete patch against 4.2 is that it needs to keep the number of "lost" interrupts per pirq. It does that by adding an element to the pirq struct (which does not exist in 4.1.1). So for the backport of that one would need to find a good place to keep those counters. Not sure whether for example the domain info could be extended without causing some bad interaction. I may try that or we just go with the minimal approach (which does not fix a related gap in handling passthrough devices). At least that fixes the more obvious issues.

Revision history for this message

Ubuntu Foundations Team Bug Bot (crichton) wrote on 2011-10-04:

#6

The attachment "Minimal fix against Xen 4.1.1" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-sponsors please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags:

added: patch

Revision history for this message

Stefan Bader (smb) wrote on 2011-10-06:

#7

Backport version of upstream patch Edit (6.0 KiB, text/plain)

For completeness this is the backported version of the Xen 4.2 patch. But since it is more complex, the recommendation would be to go with the first (simpler) patch.

Stefan Bader (smb) on 2011-10-06

Changed in xen (Ubuntu):
assignee:	Stefan Bader (stefan-bader-canonical) → Chuck Short (zulcss)
status:	Confirmed → Triaged

Revision history for this message

Martin Pitt (pitti) wrote on 2011-10-11: Please test proposed package

#8

Hello Stefan, or anyone else affected,

Accepted xen into oneiric-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in xen (Ubuntu Oneiric):
status:	Triaged → Fix Committed
tags:	added: verification-needed

Revision history for this message

Stefan Bader (smb) wrote on 2011-10-26:

#9

Tested the proposed package and it does give a working 8139cp connection in HVM mode.

tags:

added: verification-done
removed: verification-needed

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-10-27:

#10

This bug was fixed in the package xen - 4.1.1-2ubuntu4.1

---------------
xen (4.1.1-2ubuntu4.1) oneiric-proposed; urgency=low

  * debian/patches/xen-pirq-resubmit-irq.patch: Retrigger
    pirq events when asserted while processing. Thanks to Stefan Bader
    (LP: #854829)
-- Chuck Short <email address hidden> Mon, 10 Oct 2011 19:30:09 -0400

Changed in xen (Ubuntu Oneiric):
status:	Fix Committed → Fix Released

Launchpad Janitor (janitor) on 2011-11-03

Changed in xen (Ubuntu Precise):
status:	New → Fix Released

Ubuntu
xen package

[Oneiric] Xen HVM: transmit timeouts on emulated 8139cp device

Bug Description

Related branches

Other bug subscribers

Patches

Remote bug watches

	Status	Importance	Assigned to
xen (Ubuntu)	Fix Released	Medium	Chuck Short
Oneiric	Fix Released	Medium	Chuck Short
Precise	Fix Released	Undecided	Unassigned

Ubuntuxen package

[Oneiric] Xen HVM: transmit timeouts on emulated 8139cp device

Bug Description

Related branches

Other bug subscribers

Patches

Remote bug watches

Ubuntu
xen package