When hugepages is set vm.max_map_count is not automatically adjusted
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
falkor | Fix Released | High | Chris Glass | 0.13
dpdk (Ubuntu) | Fix Released | Medium | Unassigned |
nova-compute (Juju Charms Collection) | Fix Released | High | Liam Young | 15.10
openvswitch-dpdk (Ubuntu) | Won't Fix | Undecided | Unassigned |
Bug Description
When hugepages is set, the kernel parameter vm.max_map_count should be at least 2 * vm.nr_hugepages, but it is currently not increased dynamically.
This minimum seems to come from https:/
"While most applications need less than a thousand maps, certain
programs, particularly malloc debuggers, may consume lots of them,
e.g., up to one or two maps per allocation."
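As a rough sketch of the check this implies (only the default hugepage size is counted here; the sysctl names are real, the rest is illustrative):

    # Compare the current limit with the suggested minimum of 2 * nr_hugepages.
    nr_hugepages=$(sysctl -n vm.nr_hugepages)
    max_map_count=$(sysctl -n vm.max_map_count)
    if [ "$max_map_count" -lt $(( 2 * nr_hugepages )) ]; then
        echo "vm.max_map_count ($max_map_count) is below 2*nr_hugepages ($(( 2 * nr_hugepages )))"
    fi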
Related branches
- James Page: Approve on 2015-10-20
  Diff: 33 lines (+15/-0), 2 files modified:
    charmhelpers/core/hugepage.py (+2/-0)
    tests/core/test_hugepage.py (+13/-0)
- Marco Ceppi (community): Approve on 2015-10-23
  David Ames: Approve on 2015-10-22
  Diff: 546 lines (+245/-26), 8 files modified:
    hooks/charmhelpers/contrib/openstack/amulet/deployment.py (+100/-1)
    hooks/charmhelpers/contrib/openstack/amulet/utils.py (+25/-3)
    hooks/charmhelpers/contrib/openstack/context.py (+10/-9)
    hooks/charmhelpers/contrib/openstack/utils.py (+4/-1)
    hooks/charmhelpers/core/host.py (+12/-1)
    hooks/charmhelpers/core/hugepage.py (+2/-0)
    tests/charmhelpers/contrib/openstack/amulet/deployment.py (+67/-8)
    tests/charmhelpers/contrib/openstack/amulet/utils.py (+25/-3)
- David Ames: Approve on 2015-10-22
  Diff: 546 lines (+245/-26), 8 files modified:
    hooks/charmhelpers/contrib/openstack/amulet/deployment.py (+100/-1)
    hooks/charmhelpers/contrib/openstack/amulet/utils.py (+25/-3)
    hooks/charmhelpers/contrib/openstack/context.py (+10/-9)
    hooks/charmhelpers/contrib/openstack/utils.py (+4/-1)
    hooks/charmhelpers/core/host.py (+12/-1)
    hooks/charmhelpers/core/hugepage.py (+2/-0)
    tests/charmhelpers/contrib/openstack/amulet/deployment.py (+67/-8)
    tests/charmhelpers/contrib/openstack/amulet/utils.py (+25/-3)
Changed in nova-compute (Juju Charms Collection):
  status: New → In Progress
  importance: Undecided → High
  assignee: nobody → Liam Young (gnuoy)
  description: updated
Changed in nova-compute (Juju Charms Collection):
  status: In Progress → Fix Released
  milestone: none → 15.10
Changed in falkor:
  milestone: none → 0.13
  assignee: nobody → Chris Glass (tribaal)
  importance: Undecided → High
  status: New → Confirmed
Changed in falkor:
  status: Confirmed → Fix Committed
Changed in falkor:
  status: Fix Committed → Fix Released
Atsuko Ito (yottatsa) wrote: #1
Launchpad Janitor (janitor) wrote: #2
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in dpdk (Ubuntu):
  status: New → Confirmed
Changed in openvswitch-dpdk (Ubuntu):
  status: New → Confirmed
Christian Ehrhardt (paelzer) wrote: #4
After a discussion with gnuoy I picked this up for the DPDK init scripts, which can be used to set hugepages properly for DPDK.
I still consider the reasoning rather unclear as to why exactly 2*#hp + padding is "correct".
According to our discussion it seems to be derived only from "e.g., up to one or two maps per allocation."
If anybody has more, such as an example that breaks, and could share it, that would be great.
Without that it is hard to quantify whether "2*#hp + padding" would be correct for 1G hugepages as well.
Changed in dpdk (Ubuntu):
  status: Confirmed → Triaged
  importance: Undecided → Low
Christian Ehrhardt (paelzer) wrote: #5
The comment dates back to 2004-04-1 http://
I doubt that anybody thought about 1G hugepages back then.
Reading the referenced doc over and over again, I also realized they are referring to 2*allocations, not 2*#hugepages.
The only other references I found were:
- some forums and howtos that set it to a very high number for high-memory systems ("high memory" depending on the time of the post, e.g. 64G in one example, which today is normal for servers)
- the hugepage.py charmhelper, which got it from this bug
- a DPDK issue with a lot of hugepages http://
The latter is the only source close to what we discuss here.
Around rte_eal_
In fact those can be limited via -m / --socket-mem or whatever EAL parameter you prefer.
But let's assume up to #hugepages.
And there it does a mapping of hpi->hugepage_sz.
So it does up to 2 mappings for each hugepage, no matter what the size is.
And the padding adds the normal system limit on top, as the application and DPDK do more than just handle the hugepages.
OK, with that summarized, it makes sense to me now.
I hope that helps the next person who comes by to understand it as well.
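To make that concrete, a purely illustrative calculation (the page count is invented; 65530 is the stock vm.max_map_count default):

    # Illustration only: N hugepages of any size -> up to 2*N DPDK mappings,
    # plus the stock default to cover everything else the process maps.
    nr_hugepages=4096        # example count, not taken from this bug
    base=65530               # default vm.max_map_count
    echo $(( base + 2 * nr_hugepages ))   # 73722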
Changed in dpdk (Ubuntu):
  importance: Low → Medium
Christian Ehrhardt (paelzer) wrote: #6
I did some tests to be sure:
/sys/kernel/
/sys/kernel/
/sys/devices/
/sys/devices/
/sys/devices/
/sys/devices/
That shows that /sys/kernel/
This avoids some hassle on !numa systems where /sys/devices/
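A sketch of that comparison, assuming the standard sysfs layout for 2M hugepages (the per-node files exist only on NUMA systems):

    # Global 2M-page count ...
    cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    # ... should match the sum over all NUMA nodes.
    awk '{ total += $1 } END { print total }' \
        /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages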
Launchpad Janitor (janitor) wrote: #7
This bug was fixed in the package dpdk - 2.2.0-0ubuntu7
---------------
dpdk (2.2.0-0ubuntu7) xenial; urgency=medium
* Increase max_map_count after setting huge pages (LP: #1507921):
- The default config of 65530 would cause issues as soon as about 64GB or
more are used as 2M huge pages for dpdk.
- Increase this value to base+2*#hugepages to avoid issues on huge systems.
* d/p/ubuntu-
- these will be in the 16.04 dpdk release, delta can then be dropped.
- 5 fixes that do not change api/behaviour but fix serious issues.
- 01 f82f705b lpm: fix allocation of an existing object
- 02 f9bd3342 hash: fix multi-process support
- 03 1aadacb5 hash: fix allocation of an existing object
- 04 5d7bfb73 hash: fix race condition at creation
- 05 fe671356 vfio: fix resource leak
- 06 356445f9 port: fix ring writer buffer overflow
- 07 52f7a5ae port: fix burst size mask type
* d/p/ubuntu-
- this will likely be in dpdk release 16.07 and delta can then be dropped.
- fixes a crash on using fd's >1023 (LP: #1566874)
* d/p/ubuntu-
- the old patches had an error freeing a pointer which had no meta data
- that lead to a crash on any lpm_free call
- folded into the fix that generally covers the lpm allocation and free
weaknesses already (also there this particular mistake was added)
-- Christian Ehrhardt <email address hidden> Tue, 12 Apr 2016 16:13:47 +0200
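A quick arithmetic check of the 64GB figure in the changelog above (2M pages, stock limit of 65530):

    # 64 GB as 2 MB hugepages -> 32768 pages; two mappings each is already
    # more than the default vm.max_map_count of 65530.
    echo $(( 64 * 1024 / 2 ))       # 32768 hugepages
    echo $(( 2 * 64 * 1024 / 2 ))   # 65536 potential mappings > 65530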
Changed in dpdk (Ubuntu):
  status: Triaged → Fix Released
Changed in openvswitch-dpdk (Ubuntu):
  status: Confirmed → Invalid
  status: Invalid → Won't Fix
For openvswitch-dpdk, vm.max_map_count should be adjusted to at least 2*nr_hugepages plus some padding for other apps, e.g.:

    max_map_count="$(awk -v padding=65530 '{total+=$1} END {print total*2+padding}' /sys/devices/system/node/node*/hugepages/hugepages-*/nr_hugepages)"
    max_map_count=${max_map_count:-65530}
    sysctl -q vm.max_map_count="${max_map_count}"