Neighbour table overflow problem with Neutron

Bug #1844349 reported by Pierre Riteau
This bug affects 3 people
Affects        Status        Importance  Assigned to        Milestone
kolla-ansible  Fix Released  Medium      Pierre Riteau
Train          Fix Released  Medium      Radosław Piliszek
Ussuri         Fix Released  Medium      Pierre Riteau

Bug Description

While redeploying an existing OpenStack Queens cloud with Kolla Ansible, on a deployment with a large number of networks and instances, we discovered that some default sysctl values were too low on network nodes: connectivity from the network nodes failed shortly after neutron_l3_agent was deployed. Note that the network nodes were configured in dvr_snat mode; this may or may not be part of the problem.

We saw messages in the kernel log like:

    net_ratelimit: 26 callbacks suppressed

Ping also failed with:

    ping: sendmsg: Invalid argument

This problem was discussed in detail by Charms and some other projects:

* https://bugs.launchpad.net/charm-nova-compute/+bug/1780348
* https://bugs.launchpad.net/fuel/+bug/1488938
* https://bugs.launchpad.net/tripleo/+bug/1690087
* https://opendev.org/openstack/tripleo-heat-templates/commit/1651a1805a16212299fe0a91aebb2a91ed39bc6e
* https://github.com/crowbar/crowbar-openstack/commit/3dd21ea62ac152e40bfdfee4b8e25a528c82a79f
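
The symptoms above are consistent with the kernel neighbour table reaching its hard limit. As a rough diagnostic, the following minimal sketch compares the number of ARP entries with the configured thresholds. The function names and the 0.9 margin are hypothetical choices made for illustration; the paths are the standard Linux procfs locations. Note that /proc/net/arp only lists entries for the current network namespace, so entries held in Neutron router namespaces are not counted and the result may understate actual usage.

```python
from pathlib import Path

# Standard Linux procfs locations (IPv4 neighbour table defaults).
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")
ARP_FILE = Path("/proc/net/arp")
THRESHOLD_NAMES = ("gc_thresh1", "gc_thresh2", "gc_thresh3")


def read_gc_thresholds(neigh_dir=NEIGH_DIR):
    """Return the three garbage-collection thresholds as a dict of ints."""
    return {
        name: int((Path(neigh_dir) / name).read_text().strip())
        for name in THRESHOLD_NAMES
    }


def count_arp_entries(arp_text):
    """Count entries in text formatted like /proc/net/arp.

    The first non-empty line is a column header and is not counted.
    """
    lines = [line for line in arp_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def near_overflow(entries, gc_thresh3, margin=0.9):
    """True when the entry count is at or above `margin` times gc_thresh3.

    gc_thresh3 is the hard maximum number of entries (see arp(7));
    the 0.9 margin is an arbitrary illustrative choice.
    """
    return entries >= margin * gc_thresh3


def report():
    """Print a one-line summary for the current network namespace."""
    thresholds = read_gc_thresholds()
    entries = count_arp_entries(ARP_FILE.read_text())
    state = "near limit" if near_overflow(entries, thresholds["gc_thresh3"]) else "ok"
    print(f"entries={entries} thresholds={thresholds} state={state}")
```

Calling `report()` on a network node prints the current entry count next to the thresholds; running it inside each router namespace would be needed to get a fuller picture.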

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/682664

Changed in kolla-ansible:
assignee: nobody → Pierre Riteau (priteau)
status: New → In Progress
Mark Goddard (mgoddard)
Changed in kolla-ansible:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/682664
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=4234cc4b5b3ca7172c9d184ed3f145b955f11163
Submitter: Zuul
Branch: master

commit 4234cc4b5b3ca7172c9d184ed3f145b955f11163
Author: Pierre Riteau <email address hidden>
Date: Wed Nov 27 16:32:47 2019 +0100

    [neutron] Adjust neighbour table thresholds

    When clouds have a large number of hosts, the default size of the ARP
    cache is too small. The cache can overflow, which means that the system
    has no way to reach some IP addresses.

    Increasing threshold limits addresses the situation, in a reasonably
    safe way (the maximum impact is 5MB or so of additional RAM used).

    More context on this issue:

    * http://man7.org/linux/man-pages/man7/arp.7.html
    * https://bugs.launchpad.net/charm-nova-compute/+bug/1780348
    * https://bugs.launchpad.net/fuel/+bug/1488938
    * https://bugs.launchpad.net/tripleo/+bug/1690087
    * https://github.com/crowbar/crowbar-openstack/commit/0583a0c94996df6b784229e8a534f955eaca85bc
    * https://github.com/crowbar/crowbar-openstack/commit/3dd21ea62ac152e40bfdfee4b8e25a528c82a79f
    * https://opendev.org/openstack/tripleo-heat-templates/commit/1651a1805a16212299fe0a91aebb2a91ed39bc6e

    Change-Id: I60c871e8eb9f2c086818ff077987f2390930800c
    Closes-Bug: #1844349
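
To illustrate the kind of setting the commit message describes, here is a minimal sketch that renders sysctl.conf-style lines for the three neighbour table thresholds documented in arp(7). The helper name and the numeric values are placeholders chosen for the example; they are not the values applied by this fix, which are not stated in this report.

```python
# Threshold names as documented in arp(7):
#   gc_thresh1: below this many entries, the garbage collector does not run
#   gc_thresh2: soft maximum, may be exceeded briefly before collection
#   gc_thresh3: hard maximum number of entries
KEYS = ("gc_thresh1", "gc_thresh2", "gc_thresh3")

# Placeholder values for illustration only, not taken from the fix.
EXAMPLE_THRESHOLDS = {"gc_thresh1": 1024, "gc_thresh2": 4096, "gc_thresh3": 8192}


def render_sysctl(thresholds, families=("ipv4", "ipv6")):
    """Render sysctl.conf-style lines for each address family."""
    values = [thresholds[key] for key in KEYS]
    if values != sorted(values):
        raise ValueError("expected gc_thresh1 <= gc_thresh2 <= gc_thresh3")
    return "\n".join(
        f"net.{family}.neigh.default.{key} = {thresholds[key]}"
        for family in families
        for key in KEYS
    )


print(render_sysctl(EXAMPLE_THRESHOLDS))
```

The output could be placed in a file under /etc/sysctl.d/ and loaded with `sysctl --system`; suitable values depend on the number of neighbours the network nodes need to track.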

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/699189

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/699190

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/699191

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/train)

Reviewed: https://review.opendev.org/699189
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=eeff8e011a70bf1cef15c7db281d3d2c5f9ad6f4
Submitter: Zuul
Branch: stable/train

commit eeff8e011a70bf1cef15c7db281d3d2c5f9ad6f4
Author: Pierre Riteau <email address hidden>
Date: Wed Nov 27 16:32:47 2019 +0100

    [neutron] Adjust neighbour table thresholds

    (cherry picked from commit 4234cc4b5b3ca7172c9d184ed3f145b955f11163)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.opendev.org/699191
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=44b9cc3b5b49d6bdbd3c59d92e8303203cab038b
Submitter: Zuul
Branch: stable/rocky

commit 44b9cc3b5b49d6bdbd3c59d92e8303203cab038b
Author: Pierre Riteau <email address hidden>
Date: Wed Nov 27 16:32:47 2019 +0100

    [neutron] Adjust neighbour table thresholds

    (cherry picked from commit 4234cc4b5b3ca7172c9d184ed3f145b955f11163)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/stein)

Reviewed: https://review.opendev.org/699190
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=3563497a4fb44d37eaf07f4953c7b58bb14778d5
Submitter: Zuul
Branch: stable/stein

commit 3563497a4fb44d37eaf07f4953c7b58bb14778d5
Author: Pierre Riteau <email address hidden>
Date: Wed Nov 27 16:32:47 2019 +0100

    [neutron] Adjust neighbour table thresholds

    (cherry picked from commit 4234cc4b5b3ca7172c9d184ed3f145b955f11163)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.2.0

This issue was fixed in the openstack/kolla-ansible 7.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 8.1.0

This issue was fixed in the openstack/kolla-ansible 8.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 9.0.1

This issue was fixed in the openstack/kolla-ansible 9.0.1 release.
