When hugepages is set vm.max_map_count is not automatically adjusted

Bug #1507921 reported by Liam Young on 2015-10-20
This bug affects 3 people
Affects                                 Status         Importance  Assigned to
falkor                                  Fix Released   High        Chris Glass
dpdk (Ubuntu)                                          Medium      Unassigned
nova-compute (Juju Charms Collection)                  High        Liam Young
openvswitch-dpdk (Ubuntu)                              Undecided   Unassigned

Bug Description

When hugepages is set the kernel parameter vm.max_map_count should be a minimum of 2 * vm.nr_hugepages but it is currently not dynamically increased.

This minimum seems to come from https://www.kernel.org/doc/Documentation/sysctl/vm.txt

"While most applications need less than a thousand maps, certain
programs, particularly malloc debuggers, may consume lots of them,
e.g., up to one or two maps per allocation."
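
The rule can be sketched as a small shell helper (the helper name is illustrative, not from the bug; 65530 is the kernel's default vm.max_map_count, kept as padding for the rest of the system):

```shell
# Sketch: suggested vm.max_map_count for a given number of huge pages,
# following the 2 * nr_hugepages + padding rule discussed in this bug.
suggested_max_map_count() {
    nr_hugepages="$1"
    default_max_map_count=65530   # kernel default, kept as padding
    echo $(( 2 * nr_hugepages + default_max_map_count ))
}

# Example: 64G worth of 2M huge pages = 32768 pages
suggested_max_map_count 32768
```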


Liam Young (gnuoy) on 2015-10-20
Changed in nova-compute (Juju Charms Collection):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Liam Young (gnuoy)
Liam Young (gnuoy) on 2015-10-20
description: updated
Liam Young (gnuoy) on 2015-10-26
Changed in nova-compute (Juju Charms Collection):
status: In Progress → Fix Released
milestone: none → 15.10
Changed in falkor:
milestone: none → 0.13
assignee: nobody → Chris Glass (tribaal)
importance: Undecided → High
status: New → Confirmed
Changed in falkor:
status: Confirmed → Fix Committed
Changed in falkor:
status: Fix Committed → Fix Released
Vladimir Eremin (yottatsa) wrote :

For openvswitch-dpdk, vm.max_map_count should be adjusted to at least 2*nr_hugepages plus some padding for other apps, e.g.:

    # Sum nr_hugepages over all NUMA nodes and page sizes, double it, and
    # add the default limit (65530) as padding for the rest of the system.
    max_map_count="$(awk -v padding=65530 '{total+=$1}END{print total*2+padding}' /sys/devices/system/node/node*/hugepages/hugepages-*/nr_hugepages)"
    sysctl -q vm.max_map_count=${max_map_count:-65530}

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dpdk (Ubuntu):
status: New → Confirmed
Changed in openvswitch-dpdk (Ubuntu):
status: New → Confirmed

After a discussion with gnuoy I picked this up for the DPDK init scripts, which can be used to set huge pages properly for DPDK.

I still find the reasoning rather unclear as to why exactly 2*#hp+padding is "correct".
According to our discussion it seems to be derived only from "e.g., up to one or two maps per allocation."

If anybody has more, such as an example that breaks, and could share it, that would be great.
Without that it is hard to say whether "2*#hp+padding" is correct for 1G hugepages as well.

Changed in dpdk (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → Low

The comment dates back to 2004-04-01: http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=56d93842e4840f371cb9acc8e5a628496b615a96

I doubt that anybody was thinking about 1G hugepages back then.
Reading the referenced doc over and over again, I also realized it refers to 2*allocations, not 2*#hugepages.

The only other references I found were:
- some forums and howtos that set it to a very high number for high-memory systems ("high memory" depending on the time of the post, e.g. 64G in one example, which today is normal for servers)
- the hugepage.py charmhelper, which got it from this bug
- a DPDK issue with a lot of huge pages: http://dpdk.org/ml/archives/dev/2014-September/005397.html

The latter is the only source close to what we discuss here.

Around rte_eal_hugepage_init/map_all_hugepages in the DPDK source one finds the chance of two mappings of every hugepage.
In fact those can be limited via -m / --socket-mem or whatever EAL parameter you prefer, but let's assume up to #hugepages.
There it does a mapping of hpi->hugepage_sz.
So it does up to two mappings per hugepage, no matter what the page size is.
And the padding is there to add the normal system limit on top, since the application and DPDK do more than just handle the huge pages.
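
As a quick worked example of why the old default breaks (my arithmetic, matching the 2M-page case mentioned later in the changelog):

```shell
# With 2M huge pages, 64G of memory means 64*1024/2 = 32768 pages.
# If DPDK can create up to two mappings per huge page, that alone is
# 65536 maps, already above the default vm.max_map_count of 65530,
# leaving nothing for the rest of the application.
pages=$(( 64 * 1024 / 2 ))
maps=$(( 2 * pages ))
echo "$pages pages -> up to $maps maps (default limit: 65530)"
```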

OK, summarized like that, it makes sense to me now.
I hope that helps the next person who comes by to understand it as well.

Changed in dpdk (Ubuntu):
importance: Low → Medium

I did some tests to be sure:
/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages : 0
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages : 5
/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages : 0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages : 2
/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages : 0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages : 3

That shows that /sys/kernel/mm/hugepages/* always holds the globally aggregated view.
This avoids some hassle on non-NUMA systems, where /sys/devices/system/node doesn't even exist, e.g. on i386.
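
Based on that, the sum can be taken from the global files instead of the per-node ones. A sketch (the path is the standard sysfs location; the base directory is a parameter purely to keep the sketch self-contained, and missing files are skipped so it also works where huge pages are unavailable):

```shell
# Sum nr_hugepages over all page sizes from the global sysfs view,
# which exists even on non-NUMA systems, unlike /sys/devices/system/node.
total_hugepages() {
    base="${1:-/sys/kernel/mm/hugepages}"
    total=0
    for f in "$base"/hugepages-*/nr_hugepages; do
        [ -r "$f" ] || continue    # skip if the glob did not match
        total=$(( total + $(cat "$f") ))
    done
    echo "$total"
}
```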

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dpdk - 2.2.0-0ubuntu7

---------------
dpdk (2.2.0-0ubuntu7) xenial; urgency=medium

  * Increase max_map_count after setting huge pages (LP: #1507921):
    - The default config of 65530 would cause issues as soon as about 64GB or
      more are used as 2M huge pages for dpdk.
    - Increase this value to base+2*#hugepages to avoid issues on huge systems.
  * d/p/ubuntu-backport-[28-32,34-35] backports for stability (LP: #1568838):
     - these will be in the 16.04 dpdk release, delta can then be dropped.
     - 5 fixes that do not change api/behaviour but fix serious issues.
        - 01 f82f705b lpm: fix allocation of an existing object
        - 02 f9bd3342 hash: fix multi-process support
        - 03 1aadacb5 hash: fix allocation of an existing object
        - 04 5d7bfb73 hash: fix race condition at creation
        - 05 fe671356 vfio: fix resource leak
        - 06 356445f9 port: fix ring writer buffer overflow
        - 07 52f7a5ae port: fix burst size mask type
  * d/p/ubuntu-backport-33-vhost-user-add-error-handling-for-fd-1023.patch
     - this will likely be in dpdk release 16.07 and delta can then be dropped.
     - fixes a crash on using fd's >1023 (LP: #1566874)
  * d/p/ubuntu-fix-lpm-use-after-free-and-leak.patch fix lpm_free (LP: #1569375)
     - the old patches had an error freeing a pointer which had no meta data
     - that lead to a crash on any lpm_free call
     - folded into the fix that generally covers the lpm allocation and free
       weaknesses already (also there this particular mistake was added)

 -- Christian Ehrhardt <email address hidden> Tue, 12 Apr 2016 16:13:47 +0200

Changed in dpdk (Ubuntu):
status: Triaged → Fix Released
Changed in openvswitch-dpdk (Ubuntu):
status: Confirmed → Invalid
status: Invalid → Won't Fix