dnsmasq hits inotify max_user_instances limit in busy neutron deployments

Bug #1593041 reported by Narinder Gupta
This bug affects 4 people
Affects                                Status    Importance  Assigned to  Milestone
OpenStack Neutron Gateway Charm        Triaged   Medium      Unassigned
OpenStack Neutron Open vSwitch Charm   Triaged   Medium      Unassigned

Bug Description

I deployed OpenStack HA with Open vSwitch. While creating instances, we found that only a few of them received an IP address from the DHCP agent, and after some time the agent started failing with the following error:

dnsmasq: failed to create inotify: Too many open files

2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 1346711a-66b8-4d24-81b8-e9d7a2e28248.
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/dhcp/agent.py", line 112, in call_driver
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs)
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 210, in enable
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent self.spawn_process()
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 424, in spawn_process
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent self._spawn_or_reload_process(reload_with_HUP=False)
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 438, in _spawn_or_reload_process
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent pm.enable(reload_cfg=reload_with_HUP)
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/external_process.py", line 92, in enable
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent run_as_root=self.run_as_root)
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 927, in execute
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent log_fail_as_error=log_fail_as_error, **kwargs)
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 140, in execute
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent raise RuntimeError(msg)
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent RuntimeError: Exit code: 5; Stdin: ; Stdout: ; Stderr:
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent dnsmasq: failed to create inotify: Too many open files
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent
2016-06-16 02:57:22.712 35607 ERROR neutron.agent.dhcp.agent

root@rack-6-m1:/var/log/neutron# pastebinit neutron-dhcp-agent.log
http://paste.ubuntu.com/17389205/

pastebinit ../juju/unit-neutron-gateway-0.log
http://paste.ubuntu.com/17389255/

When I tried the same bundle with trusty, it worked fine.
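
A quick way to gauge whether the per-user inotify instance limit is the bottleneck (a rough sketch; each DHCP-enabled network spawns one dnsmasq, and each dnsmasq holds at least one inotify instance) is to compare the dnsmasq count against the limit on the affected node:

  # number of dnsmasq processes on this node
  pgrep -c dnsmasq
  # current per-user inotify instance limit
  cat /proc/sys/fs/inotify/max_user_instances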

Revision history for this message
Narinder Gupta (narindergupta) wrote :

After making this change it is OK. How can I do this using a charm, and which charm should the change go into?

Revision history for this message
Narinder Gupta (narindergupta) wrote :

echo 256 > /proc/sys/fs/inotify/max_user_instances

Revision history for this message
Robin Cernin (rcernin) wrote :

Note that this won't persist across reboots, and you will hit the same issue again.

The following applies the setting at runtime and, since sysctl -w echoes the key/value pair to stdout, also appends it to /etc/sysctl.conf so it persists across reboots:

sysctl -w fs.inotify.max_user_instances=256 >> /etc/sysctl.conf
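
An equivalent approach (a sketch; the drop-in file name here is illustrative) is a file under /etc/sysctl.d/, which keeps the change out of /etc/sysctl.conf:

  echo 'fs.inotify.max_user_instances = 256' | sudo tee /etc/sysctl.d/60-inotify-instances.conf
  # re-read all sysctl configuration files and apply the new value
  sudo sysctl --system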

David Lawson (deej)
tags: added: sts
tags: added: canonical-is
Revision history for this message
James Page (james-page) wrote :

The neutron-gateway charm can also do this via configuration:

  juju config neutron-gateway sysctl="{ fs.inotify.max_user_instances: 256 }"

The charm persists this to disk, so it survives reboots.
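
To confirm the value actually landed on the units, something like the following should work (assuming Juju 2.x run syntax):

  juju run --application neutron-gateway 'sysctl fs.inotify.max_user_instances'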

no longer affects: charms
Revision history for this message
James Page (james-page) wrote :

Checking one of my own installations:

$ cat /proc/sys/fs/inotify/max_user_instances
1024

so I'm not quite sure how a default gets set here.

Revision history for this message
James Page (james-page) wrote :

and another:

$ cat /proc/sys/fs/inotify/max_user_instances
128

Revision history for this message
James Page (james-page) wrote :

Increasing this by default in the charm sounds sensible to me either way.

Revision history for this message
James Page (james-page) wrote :

That first install has up-to-date LXD:

ubuntu@juju-b88663-nova-lxd-demo-8:/etc/sysctl.d$ cat 10-lxd-inotify.conf
# Increase the user inotify instance limit to allow for about
# 100 containers to run before the limit is hit again
fs.inotify.max_user_instances = 1024

The second one does not, so I think increasing this value during charm installation makes a lot of sense.
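
For reference, a quick way to see which drop-in (if any) raises the limit on a given host (illustrative):

  grep -r max_user_instances /etc/sysctl.d/ /etc/sysctl.conf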

Changed in charm-neutron-gateway:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 18.02
summary: - DHCP agent does not give ip address
+ dnsmasq hits inotify max_user_instances limit in busy neutron
+ deployments
Changed in charm-neutron-openvswitch:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 18.02
Revision history for this message
James Page (james-page) wrote :

Supporting change to charmhelpers:

  https://github.com/juju/charm-helpers/pull/15
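
In the meantime, operators can raise the limit themselves via the existing sysctl config option; a sketch, assuming neutron-openvswitch exposes the same sysctl key as neutron-gateway:

  juju config neutron-gateway sysctl="{ fs.inotify.max_user_instances: 1024 }"
  juju config neutron-openvswitch sysctl="{ fs.inotify.max_user_instances: 1024 }"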

Revision history for this message
Nobuto Murata (nobuto) wrote :

FWIW, the bump of fs.inotify.max_user_instances was shipped in lxd 2.0.10-0ubuntu1~16.04.2 for xenial on 2017-08-31.
https://launchpad.net/ubuntu/+source/lxd/2.0.10-0ubuntu1~16.04.2
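
A quick check (illustrative) for whether a xenial host already carries that bump:

  # dpkg exits 0 if the installed lxd is >= the fixed version
  dpkg --compare-versions "$(dpkg-query -W -f='${Version}' lxd)" ge 2.0.10-0ubuntu1~16.04.2 \
      && echo 'lxd already sets fs.inotify.max_user_instances=1024'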

Ryan Beisner (1chb1n)
Changed in charm-neutron-gateway:
milestone: 18.02 → 18.05
Changed in charm-neutron-openvswitch:
milestone: 18.02 → 18.05
David Ames (thedac)
Changed in charm-neutron-gateway:
milestone: 18.05 → 18.08
Changed in charm-neutron-openvswitch:
milestone: 18.05 → 18.08
James Page (james-page)
Changed in charm-neutron-gateway:
milestone: 18.08 → 18.11
Changed in charm-neutron-openvswitch:
milestone: 18.08 → 18.11
tags: added: canonical-bootstack
tags: added: cpe-onsite
David Ames (thedac)
Changed in charm-neutron-gateway:
milestone: 18.11 → 19.04
Changed in charm-neutron-openvswitch:
milestone: 18.11 → 19.04
David Ames (thedac)
Changed in charm-neutron-gateway:
milestone: 19.04 → 19.07
Changed in charm-neutron-openvswitch:
milestone: 19.04 → 19.07
David Ames (thedac)
Changed in charm-neutron-gateway:
milestone: 19.07 → 19.10
Changed in charm-neutron-openvswitch:
milestone: 19.07 → 19.10
David Ames (thedac)
Changed in charm-neutron-gateway:
milestone: 19.10 → 20.01
Changed in charm-neutron-openvswitch:
milestone: 19.10 → 20.01
James Page (james-page)
Changed in charm-neutron-gateway:
milestone: 20.01 → 20.05
Changed in charm-neutron-openvswitch:
milestone: 20.01 → 20.05
David Ames (thedac)
Changed in charm-neutron-gateway:
milestone: 20.05 → 20.08
Changed in charm-neutron-openvswitch:
milestone: 20.05 → 20.08
James Page (james-page)
Changed in charm-neutron-gateway:
milestone: 20.08 → none
Changed in charm-neutron-openvswitch:
milestone: 20.08 → none