Please increase the network neighbour L2 tables

Bug #1427893 reported by Bjoern
This bug affects 3 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack-Ansible | Fix Released | Wishlist | Kevin Carter | |
| Juno | Fix Released | Wishlist | Kevin Carter | |
| Kilo | Fix Released | Wishlist | Kevin Carter | |
| Trunk | Fix Released | Wishlist | Kevin Carter | |

Bug Description

On neutron agent containers we should increase the network neighbour table thresholds to the following values:

sysctl.conf :
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 4096
net.ipv4.neigh.default.gc_thresh3 = 8192

For background, see https://www.mirantis.com/blog/improving-dhcp-performance-openstack/, which I found useful.

tags: added: juno-backport-potential
summary: - Please increate the network neighbour L2 tables
+ Please increase the network neighbour L2 tables
Revision history for this message
Kevin Carter (kevin-carter) wrote :

Current defaults on Ubuntu 14.04.01

``` bash
root@578134-compute07:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"

root@578134-compute07:~# uname -a
Linux 578134-compute07 3.16.0-34-generic #47~14.04.1-Ubuntu SMP Fri Apr 10 17:49:16 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

root@578134-compute07:~# sysctl -a | grep gc_thre
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024
net.ipv4.route.gc_thresh = -1
net.ipv4.xfrm4_gc_thresh = 32768
net.ipv6.neigh.default.gc_thresh1 = 128
net.ipv6.neigh.default.gc_thresh2 = 512
net.ipv6.neigh.default.gc_thresh3 = 1024
net.ipv6.route.gc_thresh = 1024
net.ipv6.xfrm6_gc_thresh = 32768
```
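The same thresholds can also be read directly from procfs instead of grepping `sysctl -a`. The following is a minimal sketch, not part of the original report: the function name `read_gc_thresholds` and its `proc_root` parameter are hypothetical, and it only assumes the standard `/proc/sys/net/<family>/neigh/default/` layout that backs the sysctl keys shown above.

```python
from pathlib import Path


def read_gc_thresholds(family="ipv4", proc_root="/proc/sys"):
    """Return gc_thresh1..3 of the default neighbour table as integers.

    `family` is "ipv4" or "ipv6". `proc_root` is a hypothetical parameter
    added so the function can be pointed at a test directory.
    """
    base = Path(proc_root) / "net" / family / "neigh" / "default"
    return {
        name: int((base / name).read_text().strip())
        for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3")
    }
```

Calling `read_gc_thresholds("ipv4")` inside a container or network namespace reports the values in effect there, which may differ from the host's.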

Revision history for this message
Evan Callicoat (diopter) wrote :

I'd like more information on why you believe you need to support a minimum of 1024 MAC addresses in any particular network namespace's neighbor table, let alone a hard limit of 8192 MACs. Each Neutron network in a neutron-agents container lives inside a network namespace, and each namespace has its own MAC garbage collection tunables. The only namespace where multiple Neutron networks could be seen is a router namespace with multiple networks plugged into that router. Even so, a 'hard' limit of 8192 MACs is a very large collection of networks hooked into any given router; 8 /22s would be that size, for instance.

More to the point though, I'm not sure I agree that this should be upped substantially in the base OSAD project. Instead, I feel that the deployer should make use of the sysctl framework to apply per-container adjustments as they see fit if their network density requires it.
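The "8 /22s" figure in the comment above can be checked with a few lines of arithmetic. This is an illustrative sketch only: the `10.0.x.0/22` prefixes are made-up examples, and the count includes each network's network and broadcast addresses, so the number of usable host addresses is slightly lower.

```python
import ipaddress


def total_addresses(prefixes):
    """Sum the address counts of the given CIDR prefixes."""
    return sum(ipaddress.ip_network(p).num_addresses for p in prefixes)


# Eight hypothetical, non-overlapping /22 networks: 10.0.0.0/22 ... 10.0.28.0/22
networks = [f"10.0.{i * 4}.0/22" for i in range(8)]

# Each /22 holds 2**(32 - 22) = 1024 addresses, so eight of them hold 8192.
print(total_addresses(networks))  # prints 8192
```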

Changed in openstack-ansible:
importance: Low → Wishlist
Revision history for this message
James Denton (james-denton) wrote :

Because Neutron with l2pop implements an ARP proxy when creating VXLAN interfaces, the ip neighbor table on all nodes (especially infra nodes) can grow beyond the default thresholds. This manifests itself as 'no buffer space available' when Neutron attempts to issue 'ip neigh add' commands. The thresholds had to be increased to avoid these failures.

Relevant Neutron bug: https://bugs.launchpad.net/neutron/+bug/1450696
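One way to see how close a node is to the limits described above is to count the entries reported by `ip neigh show` and compare the count with the three thresholds. The sketch below is an assumption-laden illustration, not something taken from Neutron or OSAD: the function names and the returned labels are hypothetical, and it treats every non-empty output line as one table entry.

```python
def count_neighbour_entries(ip_neigh_output):
    """Count entries in captured `ip neigh show` text (one entry per line)."""
    return sum(1 for line in ip_neigh_output.splitlines() if line.strip())


def classify_usage(entries, thresh1, thresh2, thresh3):
    """Describe where `entries` sits relative to gc_thresh1..3.

    The labels are illustrative. gc_thresh3 is the hard maximum: once the
    table reaches it, new entries can be refused, which is the situation
    the comment above describes.
    """
    if entries >= thresh3:
        return "at hard limit"
    if entries > thresh2:
        return "above soft limit"
    if entries >= thresh1:
        return "gc eligible"
    return "below gc_thresh1"
```

For example, with the Ubuntu 14.04 defaults of 128/512/1024, `classify_usage(1024, 128, 512, 1024)` returns `"at hard limit"`.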

Revision history for this message
Kevin Carter (kevin-carter) wrote :

I've re-opened this issue because it is something that we can make configurable. Since this has been an issue in production, I think we need to expose some of the options to make tuning possible.

no longer affects: openstack-ansible/trunk
Changed in openstack-ansible:
status: New → Won't Fix
status: Won't Fix → Confirmed
assignee: Evan Callicoat (apsu-2) → nobody
importance: Low → Medium
importance: Medium → Low
Revision history for this message
Serge van Ginderachter (svg) wrote :

After several weeks of troubleshooting instability in OSAD in general, and more specifically with networking, it seems we bumped into this issue as well.

As of kilo, AFAICS, this can easily be configured by adding those sysctl settings when redefining the default list defined in:

playbooks/roles/openstack_hosts/defaults/main.yml:openstack_kernel_options:

Changed in openstack-ansible:
milestone: none → 11.0.5
milestone: 11.0.5 → none
importance: Low → Medium
Changed in openstack-ansible:
importance: Medium → Wishlist
Changed in openstack-ansible:
assignee: nobody → Kevin Carter (kevin-carter)
Changed in openstack-ansible:
status: Confirmed → In Progress
Revision history for this message
Darren Birkett (darren-birkett) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (kilo)

Fix proposed to branch: kilo
Review: https://review.openstack.org/200179

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/196699
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=b95827f35167ed845e66b4f7084ac95a2ff0d8aa
Submitter: Jenkins
Branch: master

commit b95827f35167ed845e66b4f7084ac95a2ff0d8aa
Author: kevin <email address hidden>
Date: Mon Jun 29 09:53:56 2015 -0500

    Added openstack_kernel options for gc_thresh

    This commit adds to the openstack_kernel_options to set the
    "net.ipv4/6.neigh.default.gc_thresh*" values according to how much ram
    is available on the box.

    How this is being defined:
    The change brings with it a filter to find the closest power of 2 from the
    amount of ram discovered on the target host. If facts are disabled when the
    role is called a default value of 1024 will be used. The `set_gc_val`, when
    computed, has a max value of 8192. For both ipv4/6, thresh1 is half the
    `set_gc_val`, thresh2 is the computed value, and finally both
    thresh3/route.gc_thresh are double the `set_gc_val`.

    The changes here should provide for a more scalable neutron networking
    environment by default while also ensuring that the values computed are sane.
    Additionally, should the user want to define their own values they can do so by
    simply overriding the `set_gc_val`.

    Change-Id: Ic5fd7ebdac009fa1472aeb0b0666f9b2611a31d7
    Closes-Bug: #1427893
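The computation described in this commit message can be sketched as follows. This is a reading of the message, not the merged filter: the function names are hypothetical, RAM is assumed to be given in megabytes, and "closest power of 2" is assumed to mean rounding up to the next power of two.

```python
def next_power_of_two(value):
    """Smallest power of two >= value (assumed meaning of 'closest')."""
    return 1 << (max(int(value), 1) - 1).bit_length()


def gc_thresholds(mem_total_mb=None, default=1024, cap=8192):
    """Derive the gc_thresh values from RAM as the commit message describes.

    mem_total_mb=None models the 'facts are disabled' case, where the
    default of 1024 is used. The computed value is capped at 8192.
    """
    if mem_total_mb is None:
        set_gc_val = default
    else:
        set_gc_val = min(next_power_of_two(mem_total_mb), cap)
    return {
        "gc_thresh1": set_gc_val // 2,  # half of set_gc_val
        "gc_thresh2": set_gc_val,       # the computed value itself
        "gc_thresh3": set_gc_val * 2,   # double; route.gc_thresh gets the same
    }
```

Under these assumptions a host with 2000 MB of RAM gets 1024/2048/4096, while any host at or above 8192 MB hits the cap and gets 4096/8192/16384.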

Changed in openstack-ansible:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (kilo)

Reviewed: https://review.openstack.org/200179
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=d7e93724e34802601ff868943ef9d602132d9b6c
Submitter: Jenkins
Branch: kilo

commit d7e93724e34802601ff868943ef9d602132d9b6c
Author: kevin <email address hidden>
Date: Mon Jun 29 09:53:56 2015 -0500

    Added openstack_kernel options for gc_thresh

    This commit adds to the openstack_kernel_options to set the
    "net.ipv4/6.neigh.default.gc_thresh*" values according to how much ram
    is available on the box.

    How this is being defined:
    The change brings with it a filter to find the closest power of 2 from the
    amount of ram discovered on the target host. If facts are disabled when the
    role is called a default value of 1024 will be used. The `set_gc_val`, when
    computed, has a max value of 8192. For both ipv4/6, thresh1 is half the
    `set_gc_val`, thresh2 is the computed value, and finally both
    thresh3/route.gc_thresh are double the `set_gc_val`.

    The changes here should provide for a more scalable neutron networking
    environment by default while also ensuring that the values computed are sane.
    Additionally, should the user want to define their own values they can do so by
    simply overriding the `set_gc_val`.

    Change-Id: Ic5fd7ebdac009fa1472aeb0b0666f9b2611a31d7
    Closes-Bug: #1427893
    (cherry picked from commit b95827f35167ed845e66b4f7084ac95a2ff0d8aa)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (juno)

Reviewed: https://review.openstack.org/203871
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=e15512636018d946acb13c7605a315201a80d119
Submitter: Jenkins
Branch: juno

commit e15512636018d946acb13c7605a315201a80d119
Author: kevin <email address hidden>
Date: Mon Jul 20 18:48:59 2015 -0500

    Added openstack_kernel options for gc_thresh

    This commit adds to the openstack_kernel_options to set the
    "net.ipv4/6.neigh.default.gc_thresh*" values according to the set_gc_val
    which has a default value of 8192

    The changes here should provide for a more scalable neutron networking
    environment by default while also ensuring that the values computed are sane.
    Additionally, should the user want to define their own `set_gc_val` they can do
    so by simply overriding the `set_gc_val` variable.

    Change-Id: Ic5fd7ebdac009fa1472aeb0b0666f9b2611a31d7
    Closes-Bug: #1427893

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.11

This issue was fixed in the openstack/openstack-ansible 11.2.11 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 11.2.12

This issue was fixed in the openstack/openstack-ansible 11.2.12 release.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.14

This issue was fixed in the openstack/openstack-ansible 11.2.14 release.
