Comment 7 for bug 1838575

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I bumped my setup up toward the config you had (but with only one PT device).

- host-phys-bits machine type setting for larger mappings
- more CPUs: 1 -> 32

Adding/removing a PT device in the configs above doesn't change much.

As assumed, none of these increased the time tremendously.
Then I went to bump up the memory size.

T1: use 1.2 TB but no PT device
  #1: 6 sec
  #2: 21 sec
  #3: 16 sec

Only #2 increases slightly here, due to the additional memory that needs to be set up.

T2: use 1.2 TB with one PT device
  #1: 253 sec
  #2: 20 sec
  #3: 18 sec

The time-consuming part is a single process with kernel-side load.
The associated userspace process is qemu, but the load is in the kernel, close to 100%.

Samples: 62K of event 'cycles:ppp', Event count (approx.): 34521154809
Overhead Shared Object Symbol
  73.91% [kernel] [k] clear_page_erms
   9.53% [kernel] [k] clear_huge_page
   1.65% [kernel] [k] follow_trans_huge_pmd