[Hyper-V] x86/ioapic: Disable interrupts when re-routing legacy IRQs

Bug #1508593 reported by Joshua R. Poulson
This bug affects 2 people
Affects         Status        Importance  Assigned to       Milestone
linux (Ubuntu)  Fix Released  High        Joseph Salisbury
Trusty          Invalid       High        Joseph Salisbury
Vivid           Invalid       High        Joseph Salisbury
Wily            Fix Released  High        Joseph Salisbury

Bug Description

A sporadic hang with consequent crash is observed when booting Hyper-V Gen1 guests...

Sauce request for upstream submission:

https://lkml.org/lkml/2015/10/15/673

From Vitaly Kuznetsov <>
Subject [PATCH] x86/ioapic: Disable interrupts when re-routing legacy IRQs
Date Thu, 15 Oct 2015 19:42:23 +0200

A sporadic hang with consequent crash is observed when booting Hyper-V Gen1
guests:

 Call Trace:
  <IRQ>
  [<ffffffff810ab68d>] ? trace_hardirqs_off+0xd/0x10
  [<ffffffff8107b616>] queue_work_on+0x46/0x90
  [<ffffffff81365696>] ? add_interrupt_randomness+0x176/0x1d0
  ...
  <EOI>
  [<ffffffff81471ddb>] ? _raw_spin_unlock_irqrestore+0x3b/0x60
  [<ffffffff810c295e>] __irq_put_desc_unlock+0x1e/0x40
  [<ffffffff810c5c35>] irq_modify_status+0xb5/0xd0
  [<ffffffff8104adbb>] mp_register_handler+0x4b/0x70
  [<ffffffff8104c55a>] mp_irqdomain_alloc+0x1ea/0x2a0
  [<ffffffff810c7f10>] irq_domain_alloc_irqs_recursive+0x40/0xa0
  [<ffffffff810c860c>] __irq_domain_alloc_irqs+0x13c/0x2b0
  [<ffffffff8104b070>] alloc_isa_irq_from_domain.isra.1+0xc0/0xe0
  [<ffffffff8104bfa5>] mp_map_pin_to_irq+0x165/0x2d0
  [<ffffffff8104c157>] pin_2_irq+0x47/0x80
  [<ffffffff81744253>] setup_IO_APIC+0xfe/0x802
  ...
  [<ffffffff814631c0>] ? rest_init+0x140/0x140

The issue is easily reproducible with simple instrumentation: if an
mdelay(10) is placed between the mp_setup_entry() and mp_register_handler()
calls in mp_irqdomain_alloc(), a Hyper-V guest always fails to boot when
re-routing IRQ0. The issue seems to be caused by the fact that we don't
disable interrupts while programming the IOAPIC for legacy IRQs, so IRQ0 can
actually fire in the middle of the update. Wrap the manipulations of legacy
IRQs in local_irq_save()/local_irq_restore().

Cc: Thomas Gleixner <email address hidden>
Cc: Ingo Molnar <email address hidden>
Cc: "H. Peter Anvin" <email address hidden>
Cc: Jiang Liu <email address hidden>
Cc: Yinghai Lu <email address hidden>
Cc: K. Y. Srinivasan <email address hidden>
Signed-off-by: Vitaly Kuznetsov <email address hidden>
---
It may make sense to have interrupts disabled for non-legacy IRQs as well
but I'm unaware of any bugs with them at this moment.
---
 arch/x86/kernel/apic/io_apic.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 5c60bb1..9aac777 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2907,6 +2907,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
  struct irq_data *irq_data;
  struct mp_chip_data *data;
  struct irq_alloc_info *info = arg;
+ unsigned long flags = 0;

  if (!info || nr_irqs > 1)
   return -EINVAL;
@@ -2939,11 +2940,16 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,

  cfg = irqd_cfg(irq_data);
  add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin);
+
+ if (virq < nr_legacy_irqs())
+  local_irq_save(flags);
  if (info->ioapic_entry)
   mp_setup_entry(cfg, data, info->ioapic_entry);
  mp_register_handler(virq, data->trigger);
- if (virq < nr_legacy_irqs())
+ if (virq < nr_legacy_irqs()) {
   legacy_pic->mask(virq);
+  local_irq_restore(flags);
+ }

  apic_printk(APIC_VERBOSE, KERN_DEBUG
       "IOAPIC[%d]: Set routing entry (%d-%d -> 0x%x -> IRQ %d Mode:%i Active:%i Dest:%d)\n",
--
2.4.3
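
For readers who want to see the suspected race window in isolation, here is a minimal userspace sketch. It is a toy model, not kernel code: the `toy_*` names and both structs are hypothetical, and it only illustrates the ordering problem the commit message describes (an interrupt arriving between the two setup steps), not the actual crash path. It also leaves out the `legacy_pic->mask(virq)` call that the real patch performs before restoring interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_cpu {
    bool irqs_enabled;  /* models the CPU interrupt-enable flag */
    bool irq_pending;   /* an IRQ0 tick is waiting to be delivered */
};

struct toy_irq {
    bool entry_programmed;    /* models the effect of mp_setup_entry() */
    bool handler_registered;  /* models the effect of mp_register_handler() */
    int  bad_deliveries;      /* IRQs delivered while only half set up */
};

/* Deliver a pending interrupt, but only if interrupts are enabled. */
static void toy_deliver_pending(struct toy_cpu *cpu, struct toy_irq *irq)
{
    if (!cpu->irq_pending || !cpu->irqs_enabled)
        return;
    cpu->irq_pending = false;
    /* Delivery between the two setup steps is the problematic case. */
    if (irq->entry_programmed != irq->handler_registered)
        irq->bad_deliveries++;
}

/* Returns how many interrupts were delivered in the middle of the update. */
int toy_reroute(bool disable_irqs)
{
    struct toy_cpu cpu = { .irqs_enabled = true, .irq_pending = false };
    struct toy_irq irq = { 0 };
    bool saved = cpu.irqs_enabled;

    if (disable_irqs)
        cpu.irqs_enabled = false;       /* models local_irq_save() */

    irq.entry_programmed = true;        /* step 1: routing entry written */
    cpu.irq_pending = true;             /* a timer tick arrives in the window */
    toy_deliver_pending(&cpu, &irq);
    irq.handler_registered = true;      /* step 2: handler installed */

    if (disable_irqs)
        cpu.irqs_enabled = saved;       /* models local_irq_restore() */
    toy_deliver_pending(&cpu, &irq);    /* deferred tick is delivered now */

    assert(!cpu.irq_pending);           /* the tick is never lost, only deferred */
    return irq.bad_deliveries;
}
```

In this model `toy_reroute(false)` reports one delivery inside the window, while `toy_reroute(true)` reports none because the tick is held back until both steps have completed.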

Joshua R. Poulson (jrp) wrote :

Argh, this should be in linux-kernel, not systemd.

Changed in systemd (Ubuntu):
status: New → Invalid
no longer affects: systemd (Ubuntu)
Joshua R. Poulson (jrp)
Changed in linux-kernel (Ubuntu):
status: New → Confirmed
affects: linux-kernel (Ubuntu) → linux (Ubuntu)
penalvch (penalvch)
Changed in linux (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
tags: added: kernel-hyper-v
tags: added: trusty vivid wily
Changed in linux (Ubuntu Trusty):
status: New → In Progress
Changed in linux (Ubuntu Vivid):
status: New → In Progress
Changed in linux (Ubuntu Wily):
status: New → In Progress
Changed in linux (Ubuntu Trusty):
importance: Undecided → High
Changed in linux (Ubuntu Vivid):
importance: Undecided → High
Changed in linux (Ubuntu Wily):
importance: Undecided → High
Changed in linux (Ubuntu Trusty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Vivid):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Wily):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built a Wily test kernel with a cherry-pick of the following commit, which is now in upstream as of v4.3-rc7:

c0ff971 x86/ioapic: Disable interrupts when re-routing legacy IRQs

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1508593/wily/

Can you test this kernel and see if it resolves this bug?

Also, do you know whether this commit is needed in Vivid and Trusty as well? If it is, each release will require several prerequisite commits.

Changed in linux (Ubuntu Vivid):
status: In Progress → Incomplete
Changed in linux (Ubuntu Wily):
status: In Progress → Incomplete
status: Incomplete → In Progress
Changed in linux (Ubuntu Trusty):
status: In Progress → Incomplete
Joshua R. Poulson (jrp) wrote :

Yes, we need this for Vivid and Trusty. Thanks!

Changed in linux (Ubuntu Vivid):
status: Incomplete → In Progress
Changed in linux (Ubuntu Trusty):
status: Incomplete → In Progress
Paula Crismaru (pcrismaru) wrote :

I tested kernel #2 on Wily. I attached 3 legacy network adapters and all of them got IP addresses, so the changes are ok.

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Wily):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-wily
Joshua R. Poulson (jrp) wrote :

We'll test the kernel in -proposed right away. Thanks!

Joseph Salisbury (jsalisbury) wrote :

Have you had a chance to test -proposed yet, so we can mark the bug as 'verification-done-wily'?

Joshua R. Poulson (jrp) wrote :

In progress. Sorry, there were a lot of -proposed requests this week.

Changed in linux (Ubuntu):
status: Triaged → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Chris Valean (cvalean)
tags: added: verification-done-wily
removed: verification-needed-wily
Joseph Salisbury (jsalisbury) wrote :

After working on the backport to Vivid, I'm not sure whether the patch (commit c0ff971) requested in this bug is applicable to Vivid and Trusty.

Commit c0ff971 disables interrupts when re-routing legacy IRQs. However, the functions affected by commit c0ff971 were not added until v4.2-rc1 by commit:
'49c7e60 x86/irq: Implement callbacks to enable hierarchical irqdomains on IOAPICs'

Hierarchical irqdomains were also not added until v4.2-rc1, by commit:
'b5dc8e6 x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors'

The reproducer listed in the description cannot be attempted, because mp_setup_entry() and mp_register_handler() don't exist in the 3.19 or 3.13 kernels; they were added in v4.2-rc1 by commit 49c7e60.

Is there a way to confirm and test that commit c0ff971 is needed in Vivid and Trusty?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.2.0-21.25

---------------
linux (4.2.0-21.25) wily; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1522108

  [ Upstream Kernel Changes ]

  * staging/dgnc: fix info leak in ioctl
    - LP: #1509565
    - CVE-2015-7885
  * [media] media/vivid-osd: fix info leak in ioctl
    - LP: #1509564
    - CVE-2015-7884
  * KEYS: Fix race between key destruction and finding a keyring by name
    - LP: #1508856
    - CVE-2015-7872
  * KEYS: Fix crash when attempt to garbage collect an uninstantiated
    keyring
    - LP: #1508856
    - CVE-2015-7872
  * KEYS: Don't permit request_key() to construct a new keyring
    - LP: #1508856
    - CVE-2015-7872
  * isdn_ppp: Add checks for allocation failure in isdn_ppp_open()
    - LP: #1508329
    - CVE-2015-7799
  * ppp, slip: Validate VJ compression slot parameters completely
    - LP: #1508329
    - CVE-2015-7799

linux (4.2.0-20.24) wily; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1521753

  [ Andy Whitcroft ]

  * [Tests] gcc-multilib does not exist on ppc64el
    - LP: #1515541

  [ Joseph Salisbury ]

  * SAUCE: scsi_sysfs: protect against double execution of
    __scsi_remove_device()
    - LP: #1509029

  [ Manoj Kumar ]

  * SAUCE: (noup) cxlflash: Fix to escalate LINK_RESET also on port 1
    - LP: #1513583

  [ Matthew R. Ochs ]

  * SAUCE: (noup) cxlflash: Fix to avoid virtual LUN failover failure
    - LP: #1513583

  [ Oren Givon ]

  * SAUCE: (noup) iwlwifi: Add new PCI IDs for the 8260 series
    - LP: #1517375

  [ Seth Forshee ]

  * [Config] CONFIG_DRM_AMDGPU_CIK=n
    - LP: #1510405

  [ Upstream Kernel Changes ]

  * net/mlx5e: Disable VLAN filter in promiscuous mode
    - LP: #1514861
  * drivers: net: xgene: fix RGMII 10/100Mb mode
    - LP: #1433290
  * HID: rmi: Disable scanning if the device is not a wake source
    - LP: #1515503
  * HID: rmi: Set F01 interrupt enable register when not set
    - LP: #1515503
  * net/mlx5e: Ethtool link speed setting fixes
    - LP: #1517919
  * scsi_scan: don't dump trace when scsi_prep_async_scan() is called twice
    - LP: #1517942
  * x86/ioapic: Disable interrupts when re-routing legacy IRQs
    - LP: #1508593
  * xhci: Workaround to get Intel xHCI reset working more reliably
  * megaraid_sas: Do not use PAGE_SIZE for max_sectors
    - LP: #1475166
  * net: usb: cdc_ether: add Dell DW5580 as a mobile broadband adapter
    - LP: #1513847
  * KVM: svm: unconditionally intercept #DB
    - LP: #1520184
    - CVE-2015-8104

 -- Luis Henriques <email address hidden> Wed, 02 Dec 2015 17:30:58 +0000

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Trusty):
status: In Progress → Invalid
Changed in linux (Ubuntu Vivid):
status: In Progress → Invalid
Changed in linux (Ubuntu):
status: In Progress → Fix Released