Unable to initialize DPDK: invalid parameters for --socket-mem with multi-NUMA per socket system

Bug #1938557 reported by Nobuto Murata
This bug affects 1 person
Affects            Status        Importance  Assigned to  Milestone
Charm Helpers      Fix Released  High        Liam Young
charm-ovn-chassis  Fix Released  High        Liam Young

Bug Description

focal-ussuri

We passed dpdk-socket-memory=4096 through the charm option:
https://jaas.ai/ovn-chassis/15#charm-config-dpdk-socket-memory

DPDK then fails to initialize, as shown in the log below. I suppose the charm repeats 4096 once per NUMA node, but according to the upstream documentation the option may expect one value per socket instead.
https://doc.dpdk.org/guides/linux_gsg/linux_eal_parameters.html#id3

This system has 2 sockets with 4 NUMA nodes per socket, so 8 NUMA nodes in total.
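
The mismatch can be illustrated with a minimal sketch. The function name `socket_mem_arg` is hypothetical and is not taken from the charm or from charm-helpers; it only shows that repeating the configured value once per NUMA node yields eight entries on this host, while repeating it once per socket yields two.

```python
def socket_mem_arg(size_mb: int, count: int) -> str:
    """Repeat size_mb once per memory domain, comma separated.

    Hypothetical helper for illustration only.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return ",".join(str(size_mb) for _ in range(count))


# Counts reported by lscpu on the affected host.
numa_nodes = 8
sockets = 2

# One value per NUMA node: the form seen in the failing EAL ARGS line.
per_numa_node = socket_mem_arg(4096, numa_nodes)
# One value per socket: the form suspected to be expected instead.
per_socket = socket_mem_arg(4096, sockets)

print(per_numa_node)
print(per_socket)
```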

2021-07-30T10:49:17.726Z|00013|dpdk|INFO|Using DPDK 19.11.7
2021-07-30T10:49:17.726Z|00014|dpdk|INFO|DPDK Enabled - initializing...
2021-07-30T10:49:17.726Z|00015|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2021-07-30T10:49:17.726Z|00016|dpdk|INFO|IOMMU support for vhost-user-client disabled.
2021-07-30T10:49:17.726Z|00017|dpdk|INFO|POSTCOPY support for vhost-user-client disabled.
2021-07-30T10:49:17.726Z|00018|dpdk|INFO|Per port memory for DPDK devices disabled.
2021-07-30T10:49:17.726Z|00019|dpdk|INFO|EAL ARGS: ovs-vswitchd --pci-whitelist 0000:81:00.0 --pci-whitelist 0000:81:00.1 -c 0x300000000000000 --socketmem 4096,4096,4096,4096,4096,4096,4096,4096 --socket-limit 4096,4096,4096,4096,4096,4096,4096,4096.
2021-07-30T10:49:17.729Z|00020|dpdk|INFO|EAL: Detected 128 lcore(s)
2021-07-30T10:49:17.729Z|00021|dpdk|INFO|EAL: Detected 4 NUMA nodes
2021-07-30T10:49:17.729Z|00022|dpdk|ERR|EAL: invalid parameters for --socket-mem
2021-07-30T10:49:17.729Z|00023|dpdk|ERR|EAL: Invalid 'command line' arguments.
2021-07-30T10:49:17.729Z|00024|dpdk|EMER|Unable to initialize DPDK: Invalid argument
2021-07-30T10:49:17.872Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted), core dumped)
2021-07-30T10:49:17.872Z|00003|daemon_unix|EMER|could not detach from foreground session

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7542 32-Core Processor
Stepping: 0
Frequency boost: enabled
CPU MHz: 1496.985
CPU max MHz: 3406.9331
CPU min MHz: 1500.0000
BogoMIPS: 5789.55
Virtualization: AMD-V
L1d cache: 2 MiB
L1i cache: 2 MiB
L2 cache: 32 MiB
L3 cache: 256 MiB
NUMA node0 CPU(s): 0-7,64-71
NUMA node1 CPU(s): 8-15,72-79
NUMA node2 CPU(s): 16-23,80-87
NUMA node3 CPU(s): 24-31,88-95
NUMA node4 CPU(s): 32-39,96-103
NUMA node5 CPU(s): 40-47,104-111
NUMA node6 CPU(s): 48-55,112-119
NUMA node7 CPU(s): 56-63,120-127
...

Nobuto Murata (nobuto)
description: updated
Nobuto Murata (nobuto) wrote :

The only workaround we have found so far is to manually set other_config:dpdk-socket-mem=4096,4096 outside of the charm.
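
The exact command used is not shown in this report. As a sketch, assuming the standard `ovs-vsctl set Open_vSwitch . other_config:...` form, the workaround could be scripted as follows. The function names are hypothetical.

```python
import subprocess


def socket_mem_command(values):
    """Build the ovs-vsctl argv that sets other_config:dpdk-socket-mem."""
    joined = ",".join(str(v) for v in values)
    return [
        "ovs-vsctl", "set", "Open_vSwitch", ".",
        f"other_config:dpdk-socket-mem={joined}",
    ]


def apply_socket_mem(values):
    """Run the command; requires ovs-vsctl and sufficient privileges."""
    subprocess.check_call(socket_mem_command(values))


# One 4096 MB entry per socket on the 2-socket host from this report.
cmd = socket_mem_command([4096, 4096])
print(" ".join(cmd))
```

Only `socket_mem_command` is exercised here; `apply_socket_mem` would have to be run on the affected unit itself.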

Nobuto Murata (nobuto) wrote :

Subscribing ~field-high.

Nobuto Murata (nobuto) wrote :

dpdk_context.socket_memory() comes from charm-helpers, so I'm adding a charm-helpers task.

Nobuto Murata (nobuto) wrote :

The output of:
$ grep -r . /sys/devices/system/node/node*
is required to update the unit tests.

Liam Young (gnuoy)
Changed in charm-ovn-chassis:
assignee: nobody → Liam Young (gnuoy)
Changed in charm-helpers:
assignee: nobody → Liam Young (gnuoy)
Nobuto Murata (nobuto) wrote :

> The output of:
> $ grep -r . /sys/devices/system/node/node*
> is required to update the unit tests.

Attached. However, I'm not sure we can extract socket-related information from it, as each node directory is already NUMA-scoped.

Nobuto Murata (nobuto) wrote :

`lscpu -p socket` output.
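
One way to count sockets, rather than NUMA nodes, is to parse this kind of output. The sketch below assumes the `lscpu -p=SOCKET` form, which prints comment lines starting with `#` followed by one socket ID per logical CPU. The function names and the sample text are illustrative: the sample is not the attached output, and this is not necessarily how the charm-helpers fix works.

```python
import subprocess


def count_sockets(lscpu_parseable: str) -> int:
    """Count distinct socket IDs in `lscpu -p=SOCKET` style output."""
    sockets = set()
    for line in lscpu_parseable.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith("#"):
            continue
        try:
            sockets.add(int(line))
        except ValueError:
            # Ignore anything that is not a plain socket ID.
            continue
    return len(sockets)


def read_lscpu_sockets() -> str:
    """Return the live output on a host where lscpu is available."""
    return subprocess.check_output(["lscpu", "-p=SOCKET"], text=True)


# Made-up sample: four logical CPUs spread over two sockets.
sample = "# illustrative header\n# Socket\n0\n0\n1\n1\n"
print(count_sockets(sample))
```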

Liam Young (gnuoy) wrote :
Changed in charm-helpers:
importance: Undecided → High
Changed in charm-ovn-chassis:
importance: Undecided → High
Changed in charm-helpers:
status: New → In Progress
Changed in charm-ovn-chassis:
status: New → In Progress
Liam Young (gnuoy)
Changed in charm-helpers:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-chassis (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/x/charm-ovn-chassis/+/803454

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-chassis (master)

Reviewed: https://review.opendev.org/c/x/charm-ovn-chassis/+/803454
Committed: https://opendev.org/x/charm-ovn-chassis/commit/c0de8c036071986fc24e41ff48a8ea49291ff5ca
Submitter: "Zuul (22348)"
Branch: master

commit c0de8c036071986fc24e41ff48a8ea49291ff5ca
Author: Liam Young <email address hidden>
Date: Wed Aug 4 09:46:48 2021 +0000

    Rebuild to pickup charmhelpers change

    Change-Id: I3fa4b93b506a5de46138827a010b870787abe6ae
    Closes-Bug: 1938557

Changed in charm-ovn-chassis:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-chassis (stable/21.04)

Fix proposed to branch: stable/21.04
Review: https://review.opendev.org/c/x/charm-ovn-chassis/+/803950

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-chassis (stable/21.04)

Reviewed: https://review.opendev.org/c/x/charm-ovn-chassis/+/803950
Committed: https://opendev.org/x/charm-ovn-chassis/commit/e81c14122be39601e48fb7ea4bdf158d0e1e2b04
Submitter: "Zuul (22348)"
Branch: stable/21.04

commit e81c14122be39601e48fb7ea4bdf158d0e1e2b04
Author: Liam Young <email address hidden>
Date: Wed Aug 4 09:46:48 2021 +0000

    Rebuild to pickup charmhelpers change

    Change-Id: I3fa4b93b506a5de46138827a010b870787abe6ae
    Closes-Bug: 1938557
    (cherry picked from commit c0de8c036071986fc24e41ff48a8ea49291ff5ca)

Chris MacNaughton (chris.macnaughton) wrote :

This fix was released to the stable charms with the above review, so I'm marking this bug fix-released.

Changed in charm-helpers:
status: Fix Committed → Fix Released
Changed in charm-ovn-chassis:
status: Fix Committed → Fix Released