Configured Memory access latency and bandwidth not taking effect

Bug #1888923 reported by Vishnu Dixit
This bug affects 3 people
Affects: QEMU
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I was trying to configure latencies and bandwidths between nodes in a NUMA emulation using QEMU 5.0.0.

Host: Ubuntu 20.04, 64-bit
Guest: Ubuntu 18.04, 64-bit

The configured machine has 2 NUMA nodes. Each node has 2 CPUs and is allocated 3 GB of memory. The memory access latency and bandwidth for local accesses (i.e. from initiator 0 to target 0 and from initiator 1 to target 1) are set to 40 ns and 10 GB/s respectively. The memory access latency and bandwidth for remote accesses (i.e. from initiator 1 to target 0 and from initiator 0 to target 1) are set to 80 ns and 5 GB/s respectively.

The command line used to launch the guest is as follows.

sudo x86_64-softmmu/qemu-system-x86_64 \
-machine hmat=on \
-boot c \
-enable-kvm \
-m 6G,slots=2,maxmem=7G \
-object memory-backend-ram,size=3G,id=m0 \
-object memory-backend-ram,size=3G,id=m1 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1 \
-smp 4,sockets=4,maxcpus=4 \
-numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=0,socket-id=1 \
-numa cpu,node-id=1,socket-id=2 \
-numa cpu,node-id=1,socket-id=3 \
-numa dist,src=0,dst=1,val=20 \
-net nic \
-net user \
-hda testing.img \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=40 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=80 \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
-numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,latency=80 \
-numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G \
-numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,latency=40 \
-numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G

The latencies and bandwidths between the nodes were then measured with the Intel Memory Latency Checker v3.9 (https://software.intel.com/content/www/us/en/develop/articles/intelr-memory-latency-checker.html), but the measured values did not match the configuration. The results obtained are as follows.

Latency matrix, idle latencies (in ns):

NUMA node        0        1
        0     36.2     36.4
        1     34.9     35.4

Bandwidth matrix, memory bandwidths (in MB/s):

NUMA node          0          1
        0    15167.1    15308.9
        1    15226.0    15234.0
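
For reference, the report does not state the exact MLC command line; assuming the Linux mlc binary from the downloaded package, the two matrices above can be produced with an invocation along these lines:

sudo ./mlc --latency_matrix    # idle latency from each node to each node
sudo ./mlc --bandwidth_matrix  # memory bandwidth from each node to each node

(Running mlc with no options also includes both matrices in its full test suite.)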

Memory read latencies were also measured with the “lat_mem_rd” tool from lmbench; these results likewise did not match the configuration.
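
lat_mem_rd takes the array size (in MB) and the stride (in bytes) as arguments; to compare local and remote accesses it would typically be pinned with numactl, for example (this exact invocation is a sketch, not taken from the report):

numactl --cpunodebind=0 --membind=0 ./lat_mem_rd 256 128   # node 0 reading its local memory
numactl --cpunodebind=0 --membind=1 ./lat_mem_rd 256 128   # node 0 reading node 1's memory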

Any information on why the configured latency and bandwidth values are not taking effect would be appreciated.

Igor (imammedo) wrote: Re: [Bug 1888923] [NEW] Configured Memory access latency and bandwidth not taking effect

There is no information about the host hardware, so I'd only hazard a guess that the host is a non-NUMA machine, meaning all guest RAM and CPUs end up in the same latency domain; that's why you are seeing pretty much the same timings everywhere.

Neither QEMU nor KVM simulates HW latencies at all; all that is configured...
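
A quick way to check this on the host (not part of the original thread; numactl comes from the numactl package on Ubuntu) is to look at its NUMA topology:

numactl --hardware    # "available: 1 nodes (0)" means a single latency domain
lscpu | grep -i numa  # reports the NUMA node count and the CPUs per node

If the host has only one node, every guest vCPU and every byte of guest RAM lands in that one domain, so the measured timings will be roughly uniform no matter what the HMAT entries say.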

Thomas Huth (th-huth)
Changed in qemu:
status: New → Invalid
Vishnu Dixit (rvdixit23) wrote:

It indeed was a non-NUMA machine.
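
Even so, the configured values are not lost: with -machine hmat=on QEMU describes them to the guest in the ACPI HMAT table; they are simply not enforced by the hypervisor. One way to confirm this from inside the guest (not part of the original thread; requires the acpica-tools package) is to dump and disassemble that table:

sudo acpidump > acpi.out   # dump all ACPI tables
acpixtract -a acpi.out     # extract them as binary files, including hmat.dat
iasl -d hmat.dat           # disassemble to hmat.dsl for reading

The resulting hmat.dsl should show the 40/80 ns latencies and 10/5 GB/s bandwidths as given on the command line.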
