Too many open files when large number of routers on a host

Bug #1737866 reported by Xav Paice
This bug affects 5 people
Affects                               Status         Importance  Assigned to  Milestone
OpenStack Neutron Open vSwitch Charm  Invalid        Undecided   Unassigned
Ubuntu Cloud Archive                  Fix Released   Undecided   Unassigned
  Mitaka                              Fix Committed  Medium      James Page
  Ocata                               Fix Released   Medium      James Page
  Pike                                Fix Released   Medium      James Page
openvswitch (Ubuntu)                  Fix Released   Medium      Unassigned
  Xenial                              Fix Released   Medium      James Page
  Artful                              Won't Fix      Medium      Unassigned
  Bionic                              Fix Released   Medium      Unassigned

Bug Description

[Impact]
OpenStack environments running large numbers of routers and DHCP agents on a single host can hit the NOFILE limit in OVS, which breaks virtual networking.

[Test Case]
Deploy an OpenStack environment; create a large number of virtual networks and routers.
OVS will start to error with 'Too many open files'.

[Regression Potential]
Minimal - we're just increasing the NOFILE limit via the systemd service definition.

[Original Bug Report]
When there are a large number of routers and dhcp agents on a host, we see a syslog error repeated:

"hostname ovs-vswitchd: ovs|1762125|netlink_socket|ERR|fcntl: Too many open files"

If I check the number of filehandles owned by the pid for "ovs-vswitchd unix:/var/run/openvswitch/db.sock" I see close to/at 65535 files.

If I then run the following to double the limit, we see (in our case) the count rise to more than 80000:

prlimit -p $pid --nofile=131070

We need to be able to:
- monitor via NRPE whether the process is running short on filehandles
- configure the limit so we have the option to not run out.

Currently, if I restart the process, we'll lose this setting.

Needless to say, openvswitch running out of filehandles causes all manner of problems for services which use it.
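
The monitoring need described above can be sketched as a small NRPE-style check. This is only an illustration under stated assumptions: the function names (count_open_fds, soft_nofile_limit, check_fd_usage) and the 80/90 percent thresholds are hypothetical, and it reads Linux /proc directly, so it generally has to run as root to inspect a process owned by another user. Exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Sketch only: compare a process's open file descriptors with its soft
# "Max open files" limit. Helper names and thresholds are hypothetical.

# Print the number of file descriptors currently open by PID ($1).
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print the soft "Max open files" limit of PID ($1).
soft_nofile_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# check_fd_usage PID [WARN_PCT] [CRIT_PCT]
check_fd_usage() {
    pid=$1
    warn=${2:-80}   # assumed default warning threshold, in percent
    crit=${3:-90}   # assumed default critical threshold, in percent
    used=$(count_open_fds "$pid")
    limit=$(soft_nofile_limit "$pid" 2>/dev/null)
    case $limit in
        ''|0|*[!0-9]*)
            echo "UNKNOWN: cannot read limits for pid $pid"
            return 3 ;;
    esac
    pct=$((used * 100 / limit))
    if [ "$pct" -ge "$crit" ]; then
        echo "CRITICAL: $used/$limit file descriptors in use ($pct%)"
        return 2
    elif [ "$pct" -ge "$warn" ]; then
        echo "WARNING: $used/$limit file descriptors in use ($pct%)"
        return 1
    fi
    echo "OK: $used/$limit file descriptors in use ($pct%)"
    return 0
}
```

On an affected host this might be called as check_fd_usage "$(pidof ovs-vswitchd)", assuming pidof returns a single PID.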

James Page (james-page) wrote :

This needs fixing in the systemd units for ovs; raising packaging tasks.

Changed in openvswitch (Ubuntu):
status: New → Triaged
milestone: none → ubuntu-18.04
importance: Undecided → Medium
Changed in charm-neutron-openvswitch:
status: New → Invalid
Changed in openvswitch (Ubuntu Artful):
status: New → Triaged
Changed in openvswitch (Ubuntu Xenial):
status: New → Triaged
Changed in openvswitch (Ubuntu Artful):
importance: Undecided → Medium
Changed in openvswitch (Ubuntu Xenial):
importance: Undecided → Medium
Alvaro Uria (aluria) wrote :

Hi, this is affecting a production environment running xenial+ocata. Could we have an ETA for the systemd config to land in the Xenial packages?

Thanks much!

James Page (james-page) wrote :

I've uploaded fixes to bionic; will do SRUs for Xenial->Artful once that's into the release pocket and the current 2.8.1 ovs stable release is through.

Changed in openvswitch (Ubuntu Bionic):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openvswitch - 2.8.1-0ubuntu3

---------------
openvswitch (2.8.1-0ubuntu3) bionic; urgency=medium

  * Updates to systemd configuration:
    - Move to distinct units for ovsdb-server and ovs-vswitchd.
  * Drop obsolete upstart configuration file.
  * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
  * d/control: Bump minimum debhelper version to 10, drop BD on
    dh-systemd.
  * d/p/dpif-kernel-gre-mtu-workaround.patch,
    d/p/dpif-netlink-rtnl-Use-65000-instead-of-65535-as-tunnel-MTU.patch:
    Cherry pick in-flight fixes for workaround to correctly set MTU
    of GRE devices via netlink (LP: #1742505).

 -- James Page <email address hidden> Thu, 18 Jan 2018 15:26:41 +0200

Changed in openvswitch (Ubuntu Bionic):
status: Fix Committed → Fix Released
Xav Paice (xavpaice) wrote :

Any update on when we might land an SRU for Xenial?

Xav Paice (xavpaice) wrote :

Subscribed field-high because we have an active environment (possibly more) that is affected by this on Xenial/Ocata, and we really need that SRU released.

James Page (james-page)
Changed in openvswitch (Ubuntu Artful):
status: Triaged → Won't Fix
James Page (james-page) wrote :

As restarts of OVS are service impacting, I was holding off on uploading this fix until we had point releases for OVS - which we now do.

Changed in openvswitch (Ubuntu Xenial):
status: Triaged → In Progress
Changed in cloud-archive:
status: New → Fix Released
description: updated
Changed in openvswitch (Ubuntu Xenial):
assignee: nobody → James Page (james-page)
James Page (james-page) wrote : Please test proposed package

Hello Xav, or anyone else affected,

Accepted openvswitch into ocata-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ocata-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ocata-needed to verification-ocata-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ocata-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ocata-needed
James Page (james-page) wrote :

Uploaded to xenial for SRU team review.

Robie Basak (racb) wrote :

> Subscribed field-high because we have an active environment (possibly more) that is affected by this on Xenial/Ocata, and we really need that SRU released.

Are you aware that systemd supports drop-in overrides for individual configuration items? So until this SRU is released, you could work around on a production system by dropping in the right file in /etc that will make exactly the same functional change the SRU will. See systemd.unit(5) for details.
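
A drop-in of the kind described above might look like the following sketch. The helper name write_nofile_dropin and the file name nofile.conf are made up for illustration; the unit name and the value 1048576 are the ones that appear in the Xenial verification output elsewhere in this report, and other releases may use different unit names.

```shell
#!/bin/sh
# Sketch only: write a systemd drop-in that raises LimitNOFILE for a unit.
# write_nofile_dropin and nofile.conf are hypothetical names.

# write_nofile_dropin UNIT LIMIT [ROOT]
# ROOT defaults to /etc/systemd/system, where systemd looks for
# administrator drop-ins under UNIT.d/*.conf.
write_nofile_dropin() {
    unit=$1
    limit=$2
    root=${3:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/nofile.conf"
}

# Example, as root. Restarting OVS is service-impacting, so plan for it:
#   write_nofile_dropin openvswitch-nonetwork.service 1048576
#   systemctl daemon-reload
#   systemctl restart openvswitch-nonetwork.service
```

The new limit only applies to processes started after the reload, which is why a restart of the unit is still needed.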

Robie Basak (racb) wrote :

Hello Xav, or anyone else affected,

Accepted openvswitch into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/openvswitch/2.5.5-0ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in openvswitch (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
tags: added: sts-sru-needed
Hua Zhang (zhhuabj) wrote :

I have verified openvswitch-switch=2.5.5-0ubuntu0.16.04.1; it looks good to me.

root@16.04:/tmp/ovs/openvswitch-2.5.5$ grep -r '1048576' ./debian/
./debian/changelog: * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
./debian/openvswitch-switch.openvswitch-nonetwork.service:LimitNOFILE=1048576

root@16.04:~$ grep -r '1048576' /lib/systemd/system/openvswitch*
/lib/systemd/system/openvswitch-nonetwork.service:LimitNOFILE=1048576

root@node1:~# grep -r '1048576' /proc/`pidof ovsdb-server`/limits
Max open files 1048576 1048576 files

root@node1:~# grep -r '1048576' /proc/`pidof ovs-vswitchd`/limits
Max open files 1048576 1048576 files

Hua Zhang (zhhuabj)
tags: added: verification-done-xenial
removed: verification-needed-xenial
James Page (james-page) wrote :

Hello Xav, or anyone else affected,

Accepted openvswitch into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
Edward Hope-Morley (hopem) wrote :

It looks like the upload to pike-proposed didn't update this LP bug, but it is definitely there:

openvswitch (2.8.4-0ubuntu0.17.10.1) xenial; urgency=medium

  * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
  * New upstream point release (LP: #1787519):
    - d/p/s390x-stp-timeout.patch: Dropped, equivalent
      change upstream.

 -- James Page <email address hidden> Fri, 17 Aug 2018 08:01:11 +0100

tags: added: verification-pike-needed
James Page (james-page) wrote :

Regression testing for xenial + UCA/pike-proposed

======
Totals
======
Ran: 94 tests in 362.4202 sec.
 - Passed: 89
 - Skipped: 5
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 432.8592 sec.

$ apt-cache policy openvswitch-switch
openvswitch-switch:
  Installed: 2.8.4-0ubuntu0.17.10.1
  Candidate: 2.8.4-0ubuntu0.17.10.1
  Version table:
 *** 2.8.4-0ubuntu0.17.10.1 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-proposed/pike/main amd64 Packages
        100 /var/lib/dpkg/status
     2.5.4-0ubuntu0.16.04.1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
     2.5.2-0ubuntu0.16.04.2 500
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
     2.5.0-0ubuntu1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

tags: added: verification-pike-done
removed: verification-pike-needed
James Page (james-page) wrote :

xenial + UCA/ocata-proposed

======
Totals
======
Ran: 94 tests in 585.1451 sec.
 - Passed: 88
 - Skipped: 6
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 892.7394 sec.

$ apt-cache policy openvswitch-switch
openvswitch-switch:
  Installed: 2.6.3-0ubuntu0.17.04.1~cloud0
  Candidate: 2.6.3-0ubuntu0.17.04.1~cloud0
  Version table:
 *** 2.6.3-0ubuntu0.17.04.1~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-proposed/ocata/main amd64 Packages
        100 /var/lib/dpkg/status
     2.5.4-0ubuntu0.16.04.1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
     2.5.2-0ubuntu0.16.04.2 500
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
     2.5.0-0ubuntu1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

James Page (james-page) wrote : Update Released

The verification of the Stable Release Update for openvswitch has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package openvswitch - 2.8.4-0ubuntu0.17.10.1
---------------

 openvswitch (2.8.4-0ubuntu0.17.10.1) xenial; urgency=medium
 .
   * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
   * New upstream point release (LP: #1787519):
     - d/p/s390x-stp-timeout.patch: Dropped, equivalent
       change upstream.

James Page (james-page) wrote :

The verification of the Stable Release Update for openvswitch has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package openvswitch - 2.6.3-0ubuntu0.17.04.1~cloud0
---------------

 openvswitch (2.6.3-0ubuntu0.17.04.1~cloud0) xenial; urgency=medium
 .
   * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
   * d/watch: Fix watchfile for upstream website changes.
   * New upstream point release (LP: #1787599).
     - d/p/CVE-2017-9214.patch,CVE-2017-9264.patch,CVE-2017-9265.patch:
       Dropped, included in upstream release.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openvswitch - 2.5.5-0ubuntu0.16.04.1

---------------
openvswitch (2.5.5-0ubuntu0.16.04.1) xenial; urgency=medium

  * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
  * d/watch: Update for upstream website changes.
  * New upstream point release (LP: #1788103).
  * d/p/CVE-2017-9214.patch: Dropped, included upstream.

 -- James Page <email address hidden> Wed, 22 Aug 2018 09:36:55 +0100

Changed in openvswitch (Ubuntu Xenial):
status: Fix Committed → Fix Released
James Page (james-page) wrote :

The verification of the Stable Release Update for openvswitch has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package openvswitch - 2.5.5-0ubuntu0.16.04.1~cloud0
---------------

 openvswitch (2.5.5-0ubuntu0.16.04.1~cloud0) trusty-mitaka; urgency=medium
 .
   * New upstream release for the Ubuntu Cloud Archive.
 .
 openvswitch (2.5.5-0ubuntu0.16.04.1) xenial; urgency=medium
 .
   * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
   * d/watch: Update for upstream website changes.
   * New upstream point release (LP: #1788103).
   * d/p/CVE-2017-9214.patch: Dropped, included upstream.

Hua Zhang (zhhuabj) wrote :

Verification failed for trusty-proposed/mitaka/main 2.5.5-0ubuntu0.16.04.1~cloud0, since trusty doesn't use systemd by default.

root@trusty:/tmp/openvswitch-2.5.5# grep -r '1048576' ./debian/
./debian/changelog: * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
./debian/openvswitch-switch.openvswitch-nonetwork.service:LimitNOFILE=1048576

root@trusty:~# grep -r '1048576' /lib/systemd/system/openvswitch*
/lib/systemd/system/openvswitch-nonetwork.service:LimitNOFILE=1048576

root@trusty:~# grep -r 'Max open files' /proc/`pidof ovsdb-server`/limits
Max open files 1024 4096 files

root@trusty:~# grep -r 'Max open files' /proc/`pidof ovs-vswitchd`/limits
Max open files 65535 65535 files

tags: added: verification-mitaka-failed
removed: verification-mitaka-needed
Hua Zhang (zhhuabj) wrote :

Successfully verified xenial-proposed/ocata/main 2.6.3-0ubuntu0.17.04.1~cloud0.

root@xenial:~# grep -r '1048576' /lib/systemd/system/openvswitch*
/lib/systemd/system/openvswitch-nonetwork.service:LimitNOFILE=1048576
root@xenial:~# grep -r 'Max open files' /proc/`pidof ovsdb-server`/limits
Max open files 1048576 1048576 files
root@xenial:~# grep -r 'Max open files' /proc/`pidof ovs-vswitchd`/limits
Max open files 1048576 1048576 files

tags: added: verification-ocata-done
removed: verification-ocata-needed
Corey Bryant (corey.bryant) wrote :

I've moved mitaka back to Triaged since this is only fixed so far in the systemd init file.

Corey Bryant (corey.bryant) wrote :

Note that this is fixed in xenial (mitaka) but not in trusty (mitaka) because trusty uses upstart.

Edward Hope-Morley (hopem) wrote :

Re the trusty-mitaka failure: going by the comment at http://upstart.ubuntu.com/wiki/Stanzas#limit , it looks like this should fix it: https://pastebin.ubuntu.com/p/PMmQTNQsxZ/

tags: added: sts-sru-done
removed: sts-sru-needed
James Page (james-page) wrote :

I've committed the trusty/mitaka nofiles fix for the upstart unit to the git repository for OVS.

Brian Murray (brian-murray) wrote : Please test proposed package

Hello Xav, or anyone else affected,

Accepted openvswitch into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/openvswitch/2.5.9-0ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in openvswitch (Ubuntu Xenial):
status: Fix Released → Fix Committed
tags: added: verification-needed-xenial
removed: verification-done-xenial
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (openvswitch/2.5.9-0ubuntu0.16.04.1)

All autopkgtests for the newly accepted openvswitch (2.5.9-0ubuntu0.16.04.1) for xenial have finished running.
The following regressions have been reported in tests triggered by the package:

neutron/2:8.4.0-0ubuntu7.5 (ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/xenial/update_excuses.html#openvswitch

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello Xav, or anyone else affected,

Accepted openvswitch into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
removed: verification-mitaka-failed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openvswitch - 2.5.9-0ubuntu0.16.04.2

---------------
openvswitch (2.5.9-0ubuntu0.16.04.2) xenial-security; urgency=medium

  * SECURITY UPDATE: buffer overflow decoding malformed packets in lldp
    - debian/patches/CVE-2015-8011.patch: check lengths in lib/lldp/lldp.c.
    - CVE-2015-8011
  * SECURITY UPDATE: Externally triggered memory leak in lldp
    - debian/patches/CVE-2020-27827.patch: properly free memory in
      lib/lldp/lldp.c.
    - CVE-2020-27827

 -- Marc Deslauriers <email address hidden> Fri, 08 Jan 2021 07:30:54 -0500

Changed in openvswitch (Ubuntu Xenial):
status: Fix Committed → Fix Released