With LinuxBridge/VXLAN ARP proxy, ip neigh replace fails due to ARP entry limits

Bug #1450696 reported by James Denton
This bug affects 4 people
Affects: neutron
Status: Confirmed
Importance: Medium
Assigned to: yalei wang

Bug Description

In an environment with over 600 instances, we observed failures by the LinuxBridge agent w/ l2pop on the network nodes to add neighbor (arp) entries when booting instances. The lack of an ARP entry resulted in the qrouter namespaces being unable to communicate with the instances, as their ARP request was not proxied and was dropped. The 'ip neigh replace' command could be seen failing within the log with a 'RTNETLINK answers: No buffer space available' message. To resolve this, we increased the gc_thresh sysctl parameters from their defaults.

To demonstrate, we booted four instances:

infra03_neutron_agents_container-68756ad0:~# nova list
+--------------------------------------+------------------+--------+-------------+------------------------------------+
| ID | Name | Status | Power State | Networks |
+--------------------------------------+------------------+--------+-------------+------------------------------------+
| 0b5678f8-fbaf-475c-908b-fab2300b76e7 | 20150430-JD-RAX1 | ACTIVE | Running | management-network=10.87.80.39 |
| be2ecc51-cf2b-469d-b768-d262ad2debe9 | 20150430-JD-RAX2 | ACTIVE | Running | management-network=10.87.80.40 |
| a41b432c-1704-47c4-aa37-22ecde422a73 | 20150430-JD-RAX3 | ACTIVE | Running | management-network=10.87.80.41 |
| b2c4a80c-06ac-42e6-9ed3-06875a0f1c98 | 20150430-JD-RAX4 | ACTIVE | Running | management-network=10.87.80.42 |
+--------------------------------------+------------------+--------+-------------+------------------------------------+

Three of the four 'ip neigh replace' commands failed on one of the infra nodes running an L3 agent. Coincidentally, it was the node hosting the router for the respective tenant network:

2015-04-30 19:06:01.835 748 ERROR neutron.agent.linux.utils [req-85a689c4-5056-4b00-a181-06f0a4a51a90 None]
Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'neigh', 'replace', '10.87.80.40', 'lladdr', 'fa:16:3e:42:8d:28', 'dev', 'vxlan-17', 'nud', 'permanent']
Exit code: 2
Stdout: ''
Stderr: 'RTNETLINK answers: No buffer space available\n'
2015-04-30 19:06:08.825 748 INFO neutron.agent.securitygroups_rpc [req-f4034bb3-f15c-4911-b676-bfce60123979 None] Security group member updated [u'dd6ae41a-165b-4f3c-8ffd-ef6e66e64f1e']
2015-04-30 19:06:21.800 748 ERROR neutron.agent.linux.utils [req-4f8da54b-fbe5-469d-ab4a-1ff1eeb20a9c None]
Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'neigh', 'replace', '10.87.80.41', 'lladdr', 'fa:16:3e:31:6b:d6', 'dev', 'vxlan-17', 'nud', 'permanent']
Exit code: 2
Stdout: ''
Stderr: 'RTNETLINK answers: No buffer space available\n'
2015-04-30 19:06:34.585 748 INFO neutron.agent.securitygroups_rpc [req-645ef3f8-e481-4e58-a95b-0a5f9562d4af None] Security group member updated [u'dd6ae41a-165b-4f3c-8ffd-ef6e66e64f1e']
2015-04-30 19:06:44.641 748 ERROR neutron.agent.linux.utils [req-aa4b819e-3351-483b-b45c-db330c5b039f None]
Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'neigh', 'replace', '10.87.80.42', 'lladdr', 'fa:16:3e:11:51:60', 'dev', 'vxlan-17', 'nud', 'permanent']
Exit code: 2
Stdout: ''
Stderr: 'RTNETLINK answers: No buffer space available\n'

The failure was verified by the lack of a permanent ARP entry on the infra node for the three instances above:

root@infra01_neutron_agents_container-4c850328:~# arp -an | grep 10.87.80.39
? (10.87.80.39) at fa:16:3e:3e:4d:30 [ether] PERM on vxlan-17
root@infra01_neutron_agents_container-4c850328:~# arp -an | grep 10.87.80.40
root@infra01_neutron_agents_container-4c850328:~# arp -an | grep 10.87.80.41
root@infra01_neutron_agents_container-4c850328:~# arp -an | grep 10.87.80.42
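
The manual check above could be scripted. The following is only a minimal sketch, assuming the `arp -an` line format shown in this report; the function name `has_permanent_entry` and the regular expression are illustrative and are not part of neutron.

```python
import re

# Matches lines such as:
#   ? (10.87.80.39) at fa:16:3e:3e:4d:30 [ether] PERM on vxlan-17
# The layout is assumed from the 'arp -an' output pasted in this report.
ARP_LINE = re.compile(
    r"\((?P<ip>[\d.]+)\) at (?P<mac>\S+) \[\w+\]\s+(?P<flags>.*?)\s*on (?P<dev>\S+)"
)


def has_permanent_entry(arp_output, ip, dev):
    """Return True if arp_output holds a PERM entry for ip on dev."""
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if not match:
            continue  # incomplete entries and unrelated lines are skipped
        if (
            match["ip"] == ip
            and match["dev"] == dev
            and "PERM" in match["flags"].split()
        ):
            return True
    return False
```

Called with the text printed by `arp -an`, one IP address, and the device name `vxlan-17`, it reports whether the expected permanent entry is present for that instance.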

We increased the gc_thresh sysctl parameters from their defaults:

FROM:
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

TO:
sysctl -w net.ipv4.neigh.default.gc_thresh1=1024
sysctl -w net.ipv4.neigh.default.gc_thresh2=4096
sysctl -w net.ipv4.neigh.default.gc_thresh3=8192

They may not be ideal values, but nonetheless, increasing those values allowed subsequent instances to be booted without issue.
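
To see how close a node is to the limit before entries start failing, the current thresholds can be compared with the number of neighbor entries. The sketch below is an illustration only: the helper names are hypothetical, the thresholds are read from the standard `/proc/sys/net/ipv4/neigh/default/` files, and the entry count is assumed to come from the text printed by `ip -4 neigh show` (one entry per line). Whether permanent entries count toward the thresholds may depend on the kernel version, so treat the result as a rough indicator.

```python
from pathlib import Path

# Standard procfs location behind the net.ipv4.neigh.default.* sysctls.
NEIGH_DEFAULT = Path("/proc/sys/net/ipv4/neigh/default")
THRESHOLDS = ("gc_thresh1", "gc_thresh2", "gc_thresh3")


def read_thresholds(base=NEIGH_DEFAULT):
    """Read the three gc_thresh values as integers (Linux only)."""
    return {name: int((base / name).read_text().strip()) for name in THRESHOLDS}


def count_entries(ip_neigh_output):
    """Count non-empty lines, assuming one neighbor entry per line."""
    return sum(1 for line in ip_neigh_output.splitlines() if line.strip())


def headroom(entry_count, thresholds):
    """Entries left before gc_thresh3, the hard maximum, is reached."""
    return thresholds["gc_thresh3"] - entry_count
```

A small or negative headroom suggests that further 'ip neigh replace' calls are likely to fail in the way shown in the log above.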

Kevin Benton (kevinbenton) wrote :

Thanks for the report. What do you think would be good behavior here? Have the agent auto adjust or should we just have better logging?

James Denton (james-denton) wrote :

Hi Kevin. I don't really know the ideal behavior here, as I'm not sure what the side effects of increasing the thresholds are. We have the benefit of increasing the ARP entry ceiling, but what sort of impact is there on memory and lookup times as you scale out? The nodes have to have an ARP entry for basically every port on a VXLAN network. Is there a point where this could get out of hand? If so, where is that limit?

If increasing the thresholds doesn't have much of an impact, then it would be nice if the agent were to tune them automatically to ensure no loss of connectivity and log it accordingly. Otherwise, maybe it's better to document it in the deployment guide and update the agent to log the error and point the operator to a possible solution.

Eugene Nikanorov (enikanorov) wrote :

Can we adjust the size of the cache on error? Say, with an exponential algorithm, doubling the existing limit each time.
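
One way to picture this suggestion is the sketch below. It is not an existing neutron change: `grow_thresholds`, `add_with_growth`, the `cap` value, and the two callbacks are all hypothetical. `add_entry` is assumed to run the 'ip neigh replace' command and return its stderr (empty on success), and `apply_thresholds` is assumed to write the new gc_thresh values, for example via sysctl.

```python
NO_BUFFER_MSG = "No buffer space available"


def grow_thresholds(thresholds, cap=65536):
    """Double every threshold, never exceeding cap (an arbitrary ceiling)."""
    return {name: min(value * 2, cap) for name, value in thresholds.items()}


def add_with_growth(add_entry, thresholds, apply_thresholds, max_rounds=3, cap=65536):
    """Retry add_entry, doubling the thresholds after each 'no buffer' error.

    Returns the thresholds in effect when the entry was finally added.
    """
    for attempt in range(max_rounds + 1):
        stderr = add_entry()
        if not stderr:
            return thresholds  # entry added
        if NO_BUFFER_MSG not in stderr:
            raise RuntimeError(stderr)  # unrelated failure, do not resize
        grown = grow_thresholds(thresholds, cap)
        if attempt == max_rounds or grown == thresholds:
            break  # out of retries, or already at the ceiling
        apply_thresholds(grown)
        thresholds = grown
    raise RuntimeError("neighbor table still full after raising gc_thresh limits")
```

The ceiling and the retry limit are there so the table cannot grow without bound, which is the kind of side effect raised earlier in this thread.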

Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
Bjoern (bjoern-t) wrote :
yalei wang (yalei-wang) wrote :

Could we use 'ebtables' to send back the ARP reply?

yalei wang (yalei-wang)
Changed in neutron:
assignee: nobody → yalei wang (yalei-wang)