Too many open files when large number of routers on a host

Bug #1737866 reported by Xav Paice on 2017-12-13
This bug affects 5 people
Affects                               Importance  Assigned to
OpenStack neutron-openvswitch charm   Undecided   Unassigned
Ubuntu Cloud Archive                  Undecided   Unassigned
  Mitaka                              Medium      James Page
  Ocata                               Medium      James Page
  Pike                                Medium      James Page
openvswitch (Ubuntu)                  Medium      Unassigned
  Xenial                              Medium      James Page
  Artful                              Medium      Unassigned
  Bionic                              Medium      Unassigned

Bug Description

[Impact]
OpenStack environments running large numbers of routers and DHCP agents on a single host can hit the NOFILE limit in OVS, which breaks virtual networking.

[Test Case]
Deploy an OpenStack environment and create a large number of virtual networks and routers.
OVS will start to log 'Too many open files' errors.

[Regression Potential]
Minimal - we're just increasing the NOFILE limit via the systemd service definition.

[Original Bug Report]
When there are a large number of routers and dhcp agents on a host, we see a syslog error repeated:

"hostname ovs-vswitchd: ovs|1762125|netlink_socket|ERR|fcntl: Too many open files"

If I check the number of filehandles owned by the pid for "ovs-vswitchd unix:/var/run/openvswitch/db.sock" I see close to/at 65535 files.

If I then run the following, we double the limit and (in our case) saw the count rise to >80000:

prlimit -p $pid --nofile=131070
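
A minimal sketch of the inspection described above, assuming a Linux host with /proc mounted and root access (needed to read another user's /proc/<pid>/fd). The helper names fd_count, nofile_soft_limit and report_fds are made up for illustration and are not part of OVS or its packaging.

```shell
#!/bin/sh
# Show how many file descriptors a process has open next to its soft NOFILE limit.

fd_count() {
    # Each entry in /proc/<pid>/fd is one open descriptor.
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

nofile_soft_limit() {
    # In /proc/<pid>/limits the "Max open files" row reads:
    #   Max open files  <soft>  <hard>  files
    # so the soft limit is the fourth whitespace-separated field.
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

report_fds() {
    pid=$1
    echo "pid=$pid open=$(fd_count "$pid") soft_limit=$(nofile_soft_limit "$pid")"
}

# Inspect ovs-vswitchd if it is running on this host; do nothing otherwise.
if pid=$(pidof -s ovs-vswitchd 2>/dev/null); then
    report_fds "$pid"
fi
```

Running this before and after the prlimit call should show the soft limit change while the open count keeps growing.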

We need to be able to:
- monitor via nrpe, if the process is running short on filehandles
- configure the limit so we have the option to not run out.
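
The monitoring half of this request could look roughly like the sketch below. It follows the usual Nagios plugin convention (return 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN); the function names and the percentage thresholds are assumptions for illustration, not an existing NRPE check.

```shell
#!/bin/sh
# NRPE-style helpers: compare a process's open descriptors with its soft NOFILE limit.

# classify_fd_usage OPEN LIMIT WARN_PCT CRIT_PCT
# Prints a status line and returns the matching Nagios status code.
classify_fd_usage() {
    open=$1 limit=$2 warn=$3 crit=$4
    pct=$((open * 100 / limit))   # integer percentage, rounded down
    if [ "$pct" -ge "$crit" ]; then
        echo "CRITICAL: $open/$limit file descriptors in use (${pct}%)"
        return 2
    elif [ "$pct" -ge "$warn" ]; then
        echo "WARNING: $open/$limit file descriptors in use (${pct}%)"
        return 1
    fi
    echo "OK: $open/$limit file descriptors in use (${pct}%)"
    return 0
}

# check_pid_fds PID WARN_PCT CRIT_PCT
# Reads the live values from /proc (needs root for other users' processes).
check_pid_fds() {
    pid=$1 warn=$2 crit=$3
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits" 2>/dev/null)
    if [ -z "$limit" ]; then
        echo "UNKNOWN: cannot read limits for pid $pid"
        return 3
    fi
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    classify_fd_usage "$open" "$limit" "$warn" "$crit"
}
```

A wrapper script would call something like check_pid_fds "$(pidof -s ovs-vswitchd)" 80 90 and exit with its return code; the 80 and 90 percent thresholds are placeholders to tune per site.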

Currently, if I restart the process, we'll lose this setting.

Needless to say, openvswitch running out of filehandles causes all manner of problems for services which use it.

James Page (james-page) wrote :

This needs fixing in the systemd units for ovs; raising packaging tasks.

Changed in openvswitch (Ubuntu):
status: New → Triaged
milestone: none → ubuntu-18.04
importance: Undecided → Medium
Changed in charm-neutron-openvswitch:
status: New → Invalid
Changed in openvswitch (Ubuntu Artful):
status: New → Triaged
Changed in openvswitch (Ubuntu Xenial):
status: New → Triaged
Changed in openvswitch (Ubuntu Artful):
importance: Undecided → Medium
Changed in openvswitch (Ubuntu Xenial):
importance: Undecided → Medium
Alvaro Uría (aluria) wrote :

Hi, this is affecting a production environment running xenial+ocata. Could we have an ETA for the systemd config to land in the Xenial packages?

Thanks much!

James Page (james-page) wrote :

I've uploaded fixes to bionic; will do SRUs for Xenial->Artful once that's into the release pocket and the current 2.8.1 ovs stable release is through.

Changed in openvswitch (Ubuntu Bionic):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openvswitch - 2.8.1-0ubuntu3

---------------
openvswitch (2.8.1-0ubuntu3) bionic; urgency=medium

  * Updates to systemd configuration:
    - Move to distinct units for ovsdb-server and ovs-vswitchd.
  * Drop obsolete upstart configuration file.
  * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
  * d/control: Bump minimum debhelper version to 10, drop BD on
    dh-systemd.
  * d/p/dpif-kernel-gre-mtu-workaround.patch,
    d/p/dpif-netlink-rtnl-Use-65000-instead-of-65535-as-tunnel-MTU.patch:
    Cherry pick in-flight fixes for workaround to correctly set MTU
    of GRE devices via netlink (LP: #1742505).

 -- James Page <email address hidden> Thu, 18 Jan 2018 15:26:41 +0200

Changed in openvswitch (Ubuntu Bionic):
status: Fix Committed → Fix Released
Xav Paice (xavpaice) wrote :

Any update on when we might land an SRU for Xenial?

Xav Paice (xavpaice) wrote :

Subscribed field-high because we have an active environment (more?) that are affected by this using Xenial/Ocata, and we really need that SRU released.

James Page (james-page) on 2018-08-17
Changed in openvswitch (Ubuntu Artful):
status: Triaged → Won't Fix
James Page (james-page) wrote :

As restarts of OVS are service impacting, I was holding off on uploading this fix until we had point releases for OVS - which we now do.

Changed in openvswitch (Ubuntu Xenial):
status: Triaged → In Progress
Changed in cloud-archive:
status: New → Fix Released
description: updated
Changed in openvswitch (Ubuntu Xenial):
assignee: nobody → James Page (james-page)

Hello Xav, or anyone else affected,

Accepted openvswitch into ocata-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ocata-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ocata-needed to verification-ocata-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ocata-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ocata-needed
James Page (james-page) wrote :

uploaded to xenial for SRU team review

Robie Basak (racb) wrote :

> Subscribed field-high because we have an active environment (more?) that are affected by this using Xenial/Ocata, and we really need that SRU released.

Are you aware that systemd supports drop-in overrides for individual configuration items? So until this SRU is released, you could work around on a production system by dropping in the right file in /etc that will make exactly the same functional change the SRU will. See systemd.unit(5) for details.
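
As a rough sketch of that workaround: the helper below writes a drop-in that sets LimitNOFILE for one unit. The unit name openvswitch-nonetwork.service and the value 1048576 are taken from the packaged fix quoted in this bug; the function name, the drop-in file name nofile.conf and the optional root prefix (for dry runs) are assumptions for illustration.

```shell
#!/bin/sh
# write_nofile_dropin UNIT LIMIT [ROOT]
# Creates ROOT/etc/systemd/system/UNIT.d/nofile.conf and prints its path.
# ROOT defaults to empty, i.e. the real /etc; pass a scratch directory to try it safely.
write_nofile_dropin() {
    unit=$1 limit=$2 root=${3:-}
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir"
    # A drop-in only needs the section header and the setting being overridden.
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/nofile.conf"
    echo "$dir/nofile.conf"
}
```

On a real host this would be run as root, for example write_nofile_dropin openvswitch-nonetwork.service 1048576, followed by systemctl daemon-reload. The new limit only applies once the OVS daemons are restarted, which is service impacting, so plan that restart accordingly.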

Robie Basak (racb) wrote :

Hello Xav, or anyone else affected,

Accepted openvswitch into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/openvswitch/2.5.5-0ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in openvswitch (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
tags: added: sts-sru-needed
Hua Zhang (zhhuabj) wrote :

I have verified openvswitch-switch=2.5.5-0ubuntu0.16.04.1, it looks good to me.

root@16.04:/tmp/ovs/openvswitch-2.5.5$ grep -r '1048576' ./debian/
./debian/changelog: * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
./debian/openvswitch-switch.openvswitch-nonetwork.service:LimitNOFILE=1048576

root@16.04:~$ grep -r '1048576' /lib/systemd/system/openvswitch*
/lib/systemd/system/openvswitch-nonetwork.service:LimitNOFILE=1048576

root@node1:~# grep -r '1048576' /proc/`pidof ovsdb-server`/limits
Max open files 1048576 1048576 files

root@node1:~# grep -r '1048576' /proc/`pidof ovs-vswitchd`/limits
Max open files 1048576 1048576 files

Hua Zhang (zhhuabj) on 2018-09-19
tags: added: verification-done-xenial
removed: verification-needed-xenial
James Page (james-page) wrote :

Hello Xav, or anyone else affected,

Accepted openvswitch into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
Edward Hope-Morley (hopem) wrote :

looks like the upload to pike proposed didn't update this lp but it is defo there:

openvswitch (2.8.4-0ubuntu0.17.10.1) xenial; urgency=medium

  * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
  * New upstream point release (LP: #1787519):
    - d/p/s390x-stp-timeout.patch: Dropped, equivalent
      change upstream.

 -- James Page <email address hidden> Fri, 17 Aug 2018 08:01:11 +0100

tags: added: verification-pike-needed
James Page (james-page) wrote :

Regression testing for xenial + UCA/pike-proposed

======
Totals
======
Ran: 94 tests in 362.4202 sec.
 - Passed: 89
 - Skipped: 5
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 432.8592 sec.

$ apt-cache policy openvswitch-switch
openvswitch-switch:
  Installed: 2.8.4-0ubuntu0.17.10.1
  Candidate: 2.8.4-0ubuntu0.17.10.1
  Version table:
 *** 2.8.4-0ubuntu0.17.10.1 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-proposed/pike/main amd64 Packages
        100 /var/lib/dpkg/status
     2.5.4-0ubuntu0.16.04.1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
     2.5.2-0ubuntu0.16.04.2 500
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
     2.5.0-0ubuntu1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

tags: added: verification-pike-done
removed: verification-pike-needed
James Page (james-page) wrote :

xenial + UCA/ocata-proposed

======
Totals
======
Ran: 94 tests in 585.1451 sec.
 - Passed: 88
 - Skipped: 6
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 892.7394 sec.

$ apt-cache policy openvswitch-switch
openvswitch-switch:
  Installed: 2.6.3-0ubuntu0.17.04.1~cloud0
  Candidate: 2.6.3-0ubuntu0.17.04.1~cloud0
  Version table:
 *** 2.6.3-0ubuntu0.17.04.1~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-proposed/ocata/main amd64 Packages
        100 /var/lib/dpkg/status
     2.5.4-0ubuntu0.16.04.1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
     2.5.2-0ubuntu0.16.04.2 500
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
     2.5.0-0ubuntu1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

The verification of the Stable Release Update for openvswitch has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package openvswitch - 2.8.4-0ubuntu0.17.10.1
---------------

 openvswitch (2.8.4-0ubuntu0.17.10.1) xenial; urgency=medium
 .
   * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
   * New upstream point release (LP: #1787519):
     - d/p/s390x-stp-timeout.patch: Dropped, equivalent
       change upstream.

James Page (james-page) wrote :

The verification of the Stable Release Update for openvswitch has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package openvswitch - 2.6.3-0ubuntu0.17.04.1~cloud0
---------------

 openvswitch (2.6.3-0ubuntu0.17.04.1~cloud0) xenial; urgency=medium
 .
   * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
   * d/watch: Fix watchfile for upstream website changes.
   * New upstream point release (LP: #1787599).
     - d/p/CVE-2017-9214.patch,CVE-2017-9264.patch,CVE-2017-9265.patch:
       Dropped, included in upstream release.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openvswitch - 2.5.5-0ubuntu0.16.04.1

---------------
openvswitch (2.5.5-0ubuntu0.16.04.1) xenial; urgency=medium

  * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
  * d/watch: Update for upstream website changes.
  * New upstream point release (LP: #1788103).
  * d/p/CVE-2017-9214.patch: Dropped, included upstream.

 -- James Page <email address hidden> Wed, 22 Aug 2018 09:36:55 +0100

Changed in openvswitch (Ubuntu Xenial):
status: Fix Committed → Fix Released
James Page (james-page) wrote :

The verification of the Stable Release Update for openvswitch has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package openvswitch - 2.5.5-0ubuntu0.16.04.1~cloud0
---------------

 openvswitch (2.5.5-0ubuntu0.16.04.1~cloud0) trusty-mitaka; urgency=medium
 .
   * New upstream release for the Ubuntu Cloud Archive.
 .
 openvswitch (2.5.5-0ubuntu0.16.04.1) xenial; urgency=medium
 .
   * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
   * d/watch: Update for upstream website changes.
   * New upstream point release (LP: #1788103).
   * d/p/CVE-2017-9214.patch: Dropped, included upstream.

Hua Zhang (zhhuabj) wrote :

Failed to verify trusty-proposed/mitaka/main 2.5.5-0ubuntu0.16.04.1~cloud0, since trusty doesn't use systemd by default.

root@trusty:/tmp/openvswitch-2.5.5# grep -r '1048576' ./debian/
./debian/changelog: * Bump nofiles to 1048576 for ovs daemons (LP: #1737866).
./debian/openvswitch-switch.openvswitch-nonetwork.service:LimitNOFILE=1048576

root@trusty:~# grep -r '1048576' /lib/systemd/system/openvswitch*
/lib/systemd/system/openvswitch-nonetwork.service:LimitNOFILE=1048576

root@trusty:~# grep -r 'Max open files' /proc/`pidof ovsdb-server`/limits
Max open files 1024 4096 files

root@trusty:~# grep -r 'Max open files' /proc/`pidof ovs-vswitchd`/limits
Max open files 65535 65535 files

tags: added: verification-mitaka-failed
removed: verification-mitaka-needed
Hua Zhang (zhhuabj) wrote :

Successfully verified xenial-proposed/ocata/main 2.6.3-0ubuntu0.17.04.1~cloud0

root@xenial:~# grep -r '1048576' /lib/systemd/system/openvswitch*
/lib/systemd/system/openvswitch-nonetwork.service:LimitNOFILE=1048576
root@xenial:~# grep -r 'Max open files' /proc/`pidof ovsdb-server`/limits
Max open files 1048576 1048576 files
root@xenial:~# grep -r 'Max open files' /proc/`pidof ovs-vswitchd`/limits
Max open files 1048576 1048576 files

tags: added: verification-ocata-done
removed: verification-ocata-needed
Corey Bryant (corey.bryant) wrote :

I've moved mitaka back to Triaged since this is only fixed so far in the systemd init file.

Corey Bryant (corey.bryant) wrote :

Note that this is fixed in xenial (mitaka) but not in trusty (mitaka) because trusty uses upstart.

Edward Hope-Morley (hopem) wrote :

Re the trusty-mitaka failure, from comment in http://upstart.ubuntu.com/wiki/Stanzas#limit it looks like this should fix it - https://pastebin.ubuntu.com/p/PMmQTNQsxZ/
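
The content of that pastebin is not reproduced here. For reference, an upstart limit stanza has the form "limit RESOURCE SOFT HARD", so an equivalent of the systemd change would look something like the sketch below. The function name is made up, and the job name openvswitch-switch in the comment is an assumption about the trusty packaging, not something confirmed in this bug.

```shell
#!/bin/sh
# upstart_nofile_stanza LIMIT
# Prints an upstart stanza that sets both the soft and hard NOFILE limit to LIMIT.
upstart_nofile_stanza() {
    echo "limit nofile $1 $1"
}

# Hypothetical use on trusty (as root), assuming the job is /etc/init/openvswitch-switch.conf:
#   upstart_nofile_stanza 1048576 >> /etc/init/openvswitch-switch.override
upstart_nofile_stanza 1048576
```

As with the systemd unit, the limit would only take effect after the job is restarted.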

tags: added: sts-sru-done
removed: sts-sru-needed