No network connectivity for new domU's (No available IRQ to bind to: increase NR_DYNIRQS)

Bug #341846 reported by Wido den Hollander
Affects            Status        Importance  Assigned to  Milestone
xen-3.2 (Fedora)   Fix Released  High
xen-3.2 (Ubuntu)   New           Undecided   Unassigned
xen-meta (Ubuntu)  New           Undecided   Unassigned

Bug Description

I have an Ubuntu 8.04 server running Xen 3.2

Linux vps-pool-01.xen.pcextreme.nl 2.6.24-22-xen #1 SMP Mon Nov 24 21:35:54 UTC 2008 x86_64 GNU/Linux

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=8.04
DISTRIB_CODENAME=hardy
DISTRIB_DESCRIPTION="Ubuntu 8.04.2"

The machine is pretty powerful:
* Dual CPU Quad Core Xeon
* 64GB FB-Dimm DDR2
* Dual Intel Corporation 80003ES2LAN Gbit Ethernet

There are 78 domUs running on the machine and they run fine, but when I start domU number 79, it boots fine yet its network does not come up.

The network looks like this:
* eth0: untagged VLAN connected to the private LAN
* eth0.711: tagged VLAN for management
* eth0.710 <> vlanbr710: connectivity for domUs

A snippet of "brctl show":

bridge name bridge id STP enabled interfaces
vlanbr710 8000.feffffffffff no eth0.710
                                                        vps01
                                                        vps02
                                                        vps33
                                                        vps04
                                                        vps05

What caught my attention is that the vif of the newest domU drops all its traffic:

vps78 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:31755 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

When I run tcpdump inside the domU I see no traffic coming in, and as you can see above, the RX/TX byte counters stay at 0.0 B.

None of the new domUs I create have network access, but the already running domUs keep working fine.

I found a post on the Xen mailing list: http://www.nabble.com/XEN-3.1:-critical-bug:-vif-init-failure-after-creating-15-17-VMs-(XENBUS:-Timeout-connecting-to-device:-device-vif)-td11571794.html

According to that post, the bug seemed to be fixed in the upstream code.

Since this machine is in production at the moment, I can't reboot it or build new kernels.

Does anyone know how to fix this?

Revision history for this message
In , Bryn (bryn-redhat-bugs) wrote :

Description of problem:
The number of Xen domains that can be started is determined in part by the
number of available dynamic IRQs and the number of IRQs used by each guest. This
is limited by the compile-time constant NR_DYNIRQS:

#define NR_DYNIRQS 256

When this number is exceeded, find_unbound_irq() will fail and panic the system:

static int find_unbound_irq(void)
{
    int irq;

    /* Only allocate from dynirq range */
    for (irq = DYNIRQ_BASE; irq < NR_IRQS; irq++)
        if (irq_bindcount[irq] == 0)
            break;

    if (irq == NR_IRQS)
        panic("No available IRQ to bind to: increase NR_IRQS!\n");

    return irq;
}

With typical guests needing a minimum of two interrupts, this places an upper
bound on the number of guests that can be created.
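
To make that bound concrete, here is a back-of-the-envelope calculation (a hedged sketch: the 3-IRQs-per-CPU and 1-per-device budget is quoted from the RHEL release notes later in this thread, and the host numbers match the original report):

#include <stdio.h>

int main(void)
{
    const int nr_dynirqs    = 256;               /* compile-time NR_DYNIRQS */
    const int physical_cpus = 8;                 /* dual quad-core Xeon from the report */
    const int dom0_irqs     = 3 * physical_cpus; /* 3 dynamic IRQs per physical CPU */
    const int irqs_per_domu = 3;                 /* e.g. 2 vbds + 1 vif per guest */

    /* Dynamic IRQs left for guests, divided by the per-guest cost. */
    printf("rough guest ceiling: %d\n",
           (nr_dynirqs - dom0_irqs) / irqs_per_domu); /* prints 77 */
    return 0;
}

That lands right around the 78-79 guests where the original reporter hit the wall.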

Version-Release number of selected component (if applicable):
2.6.18-86.el5xen

How reproducible:
100%

Steps to Reproduce:
1. Boot a xen dom0
2. Configure a large number of guests
3. Start booting guests one at a time

Actual results:
Eventually (assuming sufficient memory / I/O resources are available) the Dom0
guest will panic:

Kernel panic - not syncing: No available IRQ to bind to: increase NR_IRQS!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Expected results:
No panic.

Additional info:

Revision history for this message
In , Bryn (bryn-redhat-bugs) wrote :
Revision history for this message
In , Chris (chris-redhat-bugs) wrote :

OK. I briefly took a look at this. Upstream xen has since changed it so that
if you run out of IRQs, you don't panic; this was put into xen-unstable c/s
12790. I think we should definitely take that patch.

In the thread mentioned in Comment #2, Keir said that it would be nice to
allocate IRQs dynamically, make it a config option, or have a boot option that
you could pass to increase the number. I think allocating the IRQs dynamically
is going to be a non-starter, since it would likely require changes to the IRQ
code that Xen shares with the bare-metal kernel. So that leaves us with two options:

1. Add a boot-time option that allows you to increase the number of IRQs.

2. Just increase NR_DYNIRQS

I like option 2, since it is a better user experience, but we can consider 1 as
well.

Chris Lalancette
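
For reference, here is a minimal sketch of what the non-panicking allocator looks like, reconstructed from the description of xen-unstable c/s 12790 above rather than copied from the changeset (the warning text matches the message that shows up later in this report; the -ENOSPC return value is an assumption):

static int find_unbound_irq(void)
{
    int irq;

    /* Only allocate from dynirq range */
    for (irq = DYNIRQ_BASE; irq < NR_IRQS; irq++)
        if (irq_bindcount[irq] == 0)
            break;

    if (irq == NR_IRQS) {
        /* Warn and fail the bind instead of panicking dom0. */
        printk(KERN_WARNING
               "No available IRQ to bind to: increase NR_DYNIRQS.\n");
        return -ENOSPC;
    }

    return irq;
}

Every caller then has to check for a negative return value, which is exactly where the flaw discussed further down crept in.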

Revision history for this message
In , Don (don-redhat-bugs) wrote :

adding to RHEL5.2 release notes updates:

<quote>
    * dom0 has a system-wide IRQ (interrupt request line) limit of 256, which is
consumed as follows:
          o 3 per physical CPU.
          o 1 per guest device (i.e. NIC or block device)

      When the IRQ limit is reached, the system will crash. As such, check your
IRQ consumption to make sure that the number of guests you create (and their
respective block devices) do not exhaust the IRQ limit.
</quote>

please advise if any further revisions are required. thanks!

Revision history for this message
In , Issue (issue-redhat-bugs) wrote :

----- Additional Comments From <email address hidden> 2008-04-28 11:09 EDT -------
Should there be something added to let the user know how to check their used
Dynamic IRQs? I worry that without this the user might not know how to determine
the number of Dynamic IRQs they have used. I ran this in dom0 on a blade with 2
guests running:

[root@host ~]# grep Dynamic-irq /proc/interrupts | wc -l
30

This event sent from IssueTracker by jkachuck
 issue 173656

Revision history for this message
In , Don (don-redhat-bugs) wrote :

thanks, appending to note:

<quote>
To determine how many IRQs you are currently consuming, run the command grep
Dynamic-irq /proc/interrupts | wc -l.
</quote>

please advise if any further revisions are required. thanks!

Revision history for this message
In , RHEL (rhel-redhat-bugs) wrote :

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.

Revision history for this message
In , Ryan (ryan-redhat-bugs) wrote :

Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes.

This Release Note is currently located in the Known Issues section.

Revision history for this message
In , Ryan (ryan-redhat-bugs) wrote :

Release note added. If any revisions are required, please set the
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Revision history for this message
In , Bill (bill-redhat-bugs) wrote :

I have a test image that has the fix for the panic as well as an increase in the number of IRQs (256 more). Unfortunately, the increase breaks the kernel ABI, and some further work is needed to see whether that can be overcome. While this is being looked at, could you please test this kernel to see if it solves the crash issue and how many guests can be started with this change? The image is people.redhat.com/bburns/kernel-xen-2.6.18-103.el5IRQFIX.x86_64.rpm

Thanks.

Revision history for this message
In , Don (don-redhat-bugs) wrote :

in kernel-2.6.18-113.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Revision history for this message
In , Bill (bill-redhat-bugs) wrote :

Putting this back to assigned. Testing of the pre-beta kernels has shown that the fix for the crash when exhausting the IRQs was not effective. It added the error-path logic, but there was a flaw in the implementation: it used unsigned variables and compared them for < 0. Upstream has fixed this, and it's a small incremental change to incorporate the fix.
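
The flaw is the classic signed/unsigned pitfall. A small user-space illustration (hypothetical names, not the actual RHEL patch):

#include <errno.h>
#include <stdio.h>

/* Stand-in for the allocator: returns a negative errno on exhaustion. */
static int find_unbound_irq(void)
{
    return -ENOSPC;
}

int main(void)
{
    unsigned int bad_irq = find_unbound_irq(); /* -ENOSPC wraps to a huge value */
    int good_irq = find_unbound_irq();

    if (bad_irq < 0)  /* always false: an unsigned value is never < 0 */
        printf("error path (never reached)\n");

    if (good_irq < 0) /* correct: the signed comparison catches the error */
        printf("error path taken: %d\n", good_irq);

    return 0;
}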

Revision history for this message
In , Bill (bill-redhat-bugs) wrote :

Created attachment 321335
Posted patch.

Patch to fix checking for negative IRQ return values.

Revision history for this message
In , Don (don-redhat-bugs) wrote :

in kernel-2.6.18-121.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Revision history for this message
In , Bill (bill-redhat-bugs) wrote :

Release note updated. If any revisions are required, please set the
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,9 +1,2 @@
 (all architectures)
-dom0 has a system-wide IRQ (interrupt request line) limit of 256, which is consumed as follows:
+When the Dynamic IRQs available for guest virtual machines were exhausted, the domain 0 kernel would crash. This patch fixed the crash condition and also greatly increased the number of available IRQs for x86_64 platforms.
- * 3 per physical CPU.
- * 1 per guest device (i.e. NIC or block device)
-
-When the IRQ limit is reached, the system will crash. As such, check your IRQ consumption to make sure that the number of guests you create (and their respective block devices) do not exhaust the IRQ limit.
-
-To determine how many IRQs you are currently consuming, run the command grep Dynamic-irq /proc/interrupts | wc -l.

Revision history for this message
In , James (james-redhat-bugs) wrote :

With much help from rharper, I finally have a small, ramdisk-based guest config suitable for creating many guest instances on my RHEL5.3 beta (2.6.18-121.el5xen #1 SMP Mon Oct 27 22:03:03 EDT 2008 x86_64) system:

  [root@elm3c13 xen]# cat /xen/disk2/etc-xen-share/test1
  name = "test1"
  maxmem = 64
  memory = 64
  vcpus = 1
  kernel = "/etc/xen/vmlinuz-autobench-xen"
  root = "/dev/xvda"
  extra = "console=xvc0"
  on_poweroff = "destroy"
  on_reboot = "restart"
  on_crash = "preserve"
  vif = [ '' ]
  disk = [ 'tap:aio:/etc/xen/initrd-1.1-i386.img,xvda,r' ]

When I tried to create a bunch of guests based upon this config, I ran into 'Error: (12, 'Cannot allocate memory')' messages at the 89th guest, well before IRQs were exhausted ('grep Dynamic-irq /proc/interrupts | wc -l' reports only 202, and I had 26 even before creating the first guest). I also saw some '(XEN) Cannot handle page request order 0!' messages on the console while these failures were occurring.

The system has plenty of free memory (MemTotal: 33554432 kB; MemFree: 32273780 kB), so this error is confusing. Am I doing something wrong?

Revision history for this message
In , James (james-redhat-bugs) wrote :

(In reply to comment #27)
> my RHEL5.3 beta
> (2.6.18-121.el5xen #1 SMP Mon Oct 27 22:03:03 EDT 2008 x86_64) system:

Er, I meant to say snap1.

Revision history for this message
In , James (james-redhat-bugs) wrote :

Oh, a couple of other data points I forgot to mention. My first attempt to work around this issue was to reduce maxmem and memory from 64 to 32. But the system failed in exactly the same way, and still at the 89th guest.

Then I thought I might perhaps consume IRQs more quickly by allocating more CPUs per guest. But bumping vcpus from 1 to 4 caused the system to hit the 'Cannot allocate memory' failure even earlier -- at the 68th rather than the 89th guest instance.

Revision history for this message
In , Bill (bill-redhat-bugs) wrote :

Yes, it's unlikely that you will be able to exhaust the IRQs since the patch increased them by quite a large margin. It is assumed that with the IRQ limit out of the way the next limitation would be hit. Please file a separate bug report with the details. I think for verification of this bug, getting past the 70 or so guests that used to fail is sufficient. Thanks for the testing!

Revision history for this message
In , Ryan (ryan-redhat-bugs) wrote :

Release note updated. If any revisions are required, please set the
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,2 +1 @@
-(all architectures)
-When the Dynamic IRQs available for guest virtual machines were exhausted, the domain 0 kernel would crash. This patch fixed the crash condition and also greatly increased the number of available IRQs for x86_64 platforms.
+When the Dynamic IRQs available for guest virtual machines were exhausted, the dom0 kernel would crash. In this update, the crash condition has been fixed, and the number of available IRQs has been increased, which resolves this issue.

Revision history for this message
In , James (james-redhat-bugs) wrote :

For the sake of completeness, I tried my test scenario on a plain vanilla RHEL5.2 Xen installation. As I'd hoped, I saw the following:

  Kernel panic - not syncing: No available IRQ to bind to: increase NR_IRQS!

   (XEN) Domain 0 crashed: rebooting machine in 5 seconds.

This happens at the 116th guest, which attempts to use the 257th IRQ.

I then repeated the experiment, but first installed the -119 kernel -- that was the last test kernel provided prior to moving my testing to RHEL5.3. With that setup, I observed that I can allocate at least 256 guests with at least 512 IRQs without crashing. If desired, I can rerun that setup to the next power of 2 to see what happens.

In both of the RHEL5.2 cases, I see the following errors starting at the 100th guest:

  tap tap-312-51712: 2 getting info
  blk_tap: Error initialising /dev/xen/blktap - No more devices
  blk_tap: Error initialising /dev/xen/blktap - No more devices
  <last msg repeats about 8 times per guest-creation attempt>

The guests get created, but eventually get marked 'crashed'.

Bottom line is that it seems that we used to be able to get 99 usable guests with RHEL5.2, whereas with RHEL5.3, we can only get 88, at least based upon this particular guest config. Not saying that's a problem -- just providing an FYI.

Unless there are requests for further tests, I think this bug can be closed. I'll open a separate bug to track the 'Cannot allocate memory' issue.

Revision history for this message
In , Bill (bill-redhat-bugs) wrote :

Thanks for the testing. Please do open the new BZ to track the memory issue. Between the existing testing and my forcing IRQ exhaustion via a kernel hack, I am confident this issue is all set.

Revision history for this message
In , Chris (chris-redhat-bugs) wrote :

(In reply to comment #33)
> In both of the RHEL5.2 cases, I see the following errors starting at the 100th
> guest:
>
> tap tap-312-51712: 2 getting info
> blk_tap: Error initialising /dev/xen/blktap - No more devices
> blk_tap: Error initialising /dev/xen/blktap - No more devices
> <last msg repeats about 8 times per guest-creation attempt>

If I remember correctly, you are running into blktap limitations here. There is a hard-coded 100 disk limit currently in blktap, so you get the "No more devices" message when you try to add more disks and it doesn't find any more room in the array. You'll probably have better luck using LVM backed guests, since there is no such limitation there.
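
A hedged sketch of the kind of hard-coded table being described (the names and shape are illustrative, not blktap's actual code):

#include <errno.h>
#include <stddef.h>

#define MAX_TAP_DEV 100           /* the hard-coded disk limit mentioned above */

static void *tapfds[MAX_TAP_DEV]; /* one slot per blktap-backed disk */

/* Hand out a free minor number; once all slots are taken, every
 * further attach fails, surfacing as "No more devices". */
static int allocate_tap_minor(void *dev)
{
    int i;

    for (i = 0; i < MAX_TAP_DEV; i++) {
        if (tapfds[i] == NULL) {
            tapfds[i] = dev;
            return i;
        }
    }
    return -ENODEV;
}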

If we really want to support more blktap disks (and we probably do), we should open up another bug to up that limit in blktap (but this will have to be for later releases).

Chris Lalancette

Revision history for this message
In , errata-xmlrpc (errata-xmlrpc-redhat-bugs) wrote :

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

Revision history for this message
Wido den Hollander (wido) wrote :

What I forgot to mention:

root@vps-pool-01:~# xm network-list vps78
Idx BE MAC Addr. handle state evt-ch tx-/rx-ring-ref BE-path
0 0 00:16:3e:12:de:e4 0 6 8 768 /769 /local/domain/0/backend/vif/247/0
root@vps-pool-01:~# xm network-list vps79
Idx BE MAC Addr. handle state evt-ch tx-/rx-ring-ref BE-path
0 0 00:16:3e:82:f1:e5 0 4 8 768 /769 /local/domain/0/backend/vif/233/0
root@vps-pool-01:~#

vps78 is NOT working and vps79 IS working.

The difference I can find is the "state": on working domUs the state is "4", while on vps78 the state is "6".
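
For reference, those state numbers come from the XenbusState enumeration in Xen's public io/xenbus.h header (reproduced here from memory, so treat the exact spelling as an assumption):

enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4, /* the working domUs, e.g. vps79 */
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6  /* vps78's stuck vif */
};

State 4 is a fully connected backend, while state 6 means the device was torn down.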

Revision history for this message
Wido den Hollander (wido) wrote :

I have done some searching and found that the problem is caused by netif_map in Xen.

In my "dmesg" in found this:

vif vif-263-0: 28 mapping shared-frames 768/769 port 8

I set up a (software) replica of the running setup, but I was not able to reproduce the problem; I started 86 domUs and then ran out of memory, so I couldn't create more domUs.

Revision history for this message
Wido den Hollander (wido) wrote :

After some more searching on the net I was told this is an issue with the grant tables in Xen:

"The problem you're facing seems to be the grant tables in Xen.
Probably the entries are too much or something got wrong.
Printing debug message in Xen would help you in that case."

At the moment I am trying to find some information in the debug messages, but there does not seem to be much information regarding the vifs.

I am pushing my boss to let me buy a second server which is exactly the same as the one already running, so I can create an exact replica of the setup.

Revision history for this message
Wido den Hollander (wido) wrote :

I kept searching and found that when I remove the swap device (xvda2) from my domU, the network works OK.

My config looks like:
kernel = '/boot/vmlinuz-2.6.24-22-xen'
ramdisk = '/boot/initrd.img-2.6.24-22-xen'
memory = 512
vcpus = 1

root = '/dev/xvda1 ro'

disk = [
    'phy:/dev/xen-domains-root/vps78-root,xvda1,w',
    'phy:/dev/xen-domains-swap/vps78-swap,xvda2,w'
]

name = 'vps78'

vif = [ 'mac=00:16:3e:12:de:e4,vifname=vps78,bridge=vlanbr710' ]

on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'

extra='xencons=tty1 rootflags=quota'

This does NOT work; it results in the "vif mapping shared-frames" message.

But when I remove xvda2 from the config file, the domU boots fine and the network works too.

When I later attach xvda2, I get this:

root@vps-pool-01:~# xm block-attach vps78 /dev/xen-domains-swap/vps78-swap xvda2 w
root@vps-pool-01:~# dmesg
[2540234.810635] blkback: ring-ref 299, event-channel 8, protocol 1 (x86_64-abi)
[2540234.810813] vbd vbd-280-51714: 28 mapping ring-ref 299 port 8
root@vps-pool-01:~#

So it seems I run into a limit on the number of mappings that are possible?

Revision history for this message
Wido den Hollander (wido) wrote :

I have been able to reproduce the bug and found out what goes wrong.

After creating 79 VMs, each with 2 vbds (a root disk and swap) and one vif, I ran into the problem again.

One error caught my attention (and is only mentioned once!):

[ 628.040958] No available IRQ to bind to: increase NR_DYNIRQS.

After some searching I found:
* https://bugzilla.redhat.com/show_bug.cgi?id=442736
* http://www.nabble.com/Unable-to-start-more-than-103-VMs-td11420853.html

I downloaded the source of "linux-image-2.6.24-23-xen" and modified "debian/binary-custom.d/xen/patchset/001-xen-base.patch" at line 87148.

Here you should change NR_DYNIRQS from 256 to 1024.
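
The patched line then reads as follows (only the constant and its old and new values come from this report; the comment is mine):

#define NR_DYNIRQS 1024 /* raised from the stock 256 */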

When you have a VM with 3 "devices" (in my case 2x vbd, 1x vif), you reserve 3 dynamic IRQs per domU.

79 * 3 = 237 IRQs

The default limit of 256 is reached since the dom0 also reserves some IRQs.

You can find out how many IRQs you are using with:

grep Dynamic-irq /proc/interrupts | wc -l

I have now been able to allocate over 256 IRQs and start 90 VMs without any trouble.

I think Ubuntu should raise the limit from 256 to 512 or 1024, OR make it an option which can be passed to the kernel on boot (as suggested in the Red Hat bug report).

Changed in xen-3.2:
status: Unknown → Fix Released
Revision history for this message
uv ag (groovydude7) wrote :

Hi,

I am running Ubuntu 8.04 LTS. I wanted to see if this NR_DYNIRQS bug is fixed in the latest kernel update.

I want to be able to scale beyond 70-80 VMs on a production system and want to know whether this will work on my system.
(I use 4 IRQs per domU.)

The version of the kernel that I currently have is 2.6.24-21, and when I downloaded the kernel sources I could see that
debian/binary-custom.d/xen/patchset/001-xen-base.patch sets NR_DYNIRQS to 256 (which limits the number of domUs significantly).

I believe the latest kernel available through Ubuntu update is 2.6.24-24-xen for this version of Ubuntu. Is this bug fixed in that version of the kernel? Should I update to it?

thanks a lot!

Revision history for this message
uv ag (groovydude7) wrote :

*BUMP*

Hi All,

I am running Ubuntu 8.04 LTS (with Xen 3.2.rc1). I want to be able to scale beyond 70-80 VMs on a production system. I already downloaded the latest kernel (version 2.6.24-24) and changed NR_DYNIRQS in "debian/binary-custom.d/xen/patchset/001-xen-base.patch" to 1024 instead of 256 (there was only one place where this change was required, I believe). I rebuilt my kernel, but I am still running into the issue that once I have used about 230 IRQs, the next domU doesn't start.

I keep getting the following messages in any domU I start up after this. (I have 51 domUs running right now; each has two network interfaces and seems to use a total of 4 IRQs.)

[ 5.338606] XENBUS: Waiting for devices to initialise: 295s...290s...285s...280s...275s...270s...265s...260s...255s...250s...245s...240s...235s...230s...225s...220s...215s...210s...205s...200s...195s...190s...185s...180s...175s...170s...165s...160s...155s...150s...145s...140s...135s...130s...125s...120s...115s...110s...105s...100s...95s...90s...85s...80s...75s...70s...65s...60s...55s...50s...45s...40s...35s...30s...25s...20s...15s...10s...5s...0s...
[ 300.317849] XENBUS: Timeout connecting to device: device/vbd/769 (local state 3, remote state 2)
[ 300.317854] XENBUS: Device not ready: device/vbd/769
[ 300.317911] XENBUS: Timeout connecting to device: device/vbd/770 (local state 3, remote state 2)
[ 300.317914] XENBUS: Device not ready: device/vbd/770
[ 300.317916] XENBUS: Device with no driver: device/console/0
[ 300.317930] Magic number: 1:252:3141
[ 300.318088] Freeing unused kernel memory: 212k freed

check root= bootarg cat /proc/cmdline
        or missing modules, devices: cat /proc/modules ls /dev
ALERT! /dev/hda2 does not exist. Dropping to a shell!

BusyBox v1.1.3 (Debian 1:1.1.3-5ubuntu12) Built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

>> grep Dynamic-irq /proc/interrupts | wc -l returns 230

Any help or advice is very much appreciated...

Changed in xen-3.2 (Fedora):
importance: Unknown → High