AMD-Vi: Unable to read/write to IOMMU perf counter

Bug #1917203 reported by David Coe
This bug affects 1 person
Affects          Status        Importance   Assigned to   Milestone
linux (Ubuntu)   In Progress   Undecided    Unassigned

Bug Description

This boot warning (concealed by grub not taking over the fbcon console on a solo installation but always present on a multi-boot) has been bothering Linux users of AMD Ryzen machines for some time.

The problem is currently under discussion at kernel level <https://bugzilla.kernel.org/show_bug.cgi?id=201753>.

One solution proposed by Suravee Suthikulpanit <email address hidden> <https://lkml.org/lkml/2021/2/8/486> works but inserts a boot-up delay of (at least) 100 msec. A second option by Alexander Monakov <email address hidden> <https://bugzilla.kernel.org/show_bug.cgi?id=201753> also works but inserts no delay and more-or-less just moves one line of code.

I've tried both solutions with kernel rebuilds for both Groovy kernel 5.8.18 and Hirsute kernel 5.10.11 and both work on my AMD Ryzen 2400G. Could I encourage your kernel experts to evaluate the situation? (I think SuSE folk already are.)

My suggestion (as a humble user :-) ) would be to fold Alex's simple patch into your own list of kernel tweaks and get the correction out with the upcoming Hirsute release.

Best regards and all respect!

Steve Langasek (vorlon) wrote :

The ubuntu-cdimage project is for tracking bugs in the code used to create the install media for Ubuntu. Please use the ubuntu-bug command to file bug reports against the individual packages you are finding bugs in.

affects: ubuntu-cdimage → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1917203

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Alex Hung (alexhung) wrote :

Suravee's patch (commit 6778ff5b21bd8e78) is now in mainline kernel. It is more appropriate to backport it to Ubuntu kernels.

Let's wait for hirsute to rebase to 5.11 and send a SRU.

Changed in linux (Ubuntu):
assignee: nobody → Alex Hung (alexhung)
status: Incomplete → In Progress
Alex Hung (alexhung) wrote :

@david,

I cherry-picked 6778ff5b21bd8e78 and built a test kernel @ https://people.canonical.com/~alexhung/LP1917203/. I tested on an AMD Ryzen 5 2500U

Before:
[ 0.975812] pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.

After:
[ 1.006639] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported

It would be great if you can give it a try and update the results.
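For anyone comparing their own boot logs against the two lines above, here is a minimal sketch that classifies a dmesg dump. The two message strings are copied from the Before/After output in this comment; the function name `iommu_perf_status` is hypothetical and not part of any tool.

```python
# Message texts copied from the Before/After dmesg lines quoted above.
FAIL_MSG = "AMD-Vi: Unable to read/write to IOMMU perf counter"
OK_MSG = "AMD-Vi: IOMMU performance counters supported"


def iommu_perf_status(dmesg_text):
    """Return 'supported', 'failed' or 'unknown' for a dmesg dump.

    Hypothetical helper: it only looks for the two messages seen in this thread.
    """
    for line in dmesg_text.splitlines():
        if OK_MSG in line:
            return "supported"
        if FAIL_MSG in line:
            return "failed"
    return "unknown"


before = "[ 0.975812] pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter."
after = "[ 1.006639] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported"

print(iommu_perf_status(before))
print(iommu_perf_status(after))
```

On a live system the input would typically be the output of `sudo dmesg` saved to a file or piped in.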

David Coe (hooligan-red) wrote :

Hi Alex!

A splendidly prompt piece of cherry-picking. Many thanks.

I've checked your test release on my Ryzen 2400G on current Ubuntu 20.10 and it still gives the diagnostic:
  AMD-Vi: Unable to read/write to IOMMU perf counter

It's to be expected! The little crew engaged with testing Suravee Suthikulpanit's patch [1] are finding that the 2400G (that's me) needs 120 msec for power-gating and the 2200G (that's Paul Menzel) needs >200 msec. Suravee's RFC v3 provides 5 x 20 msec delay which just isn't enough for these lesser Ryzens :-).

If you change Suravee's patch line

   for (retry = 5; retry; retry--) {
to
   for (retry = 25; retry; retry--) {

all will be well (imho). The maximum delay (25 x 20 msec) is indeed outrageous, but the loop exits as soon as the counter responds, so only as much of it as is really necessary to enable IOMMU is actually used. Eventually a proper solution will be found (in firmware or silicon) but (unlike at present) the many users of entry-level Ryzen CPUs will not have their IOMMU crippled during Linux boot-up.
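To make the arithmetic concrete, here is a small user-space model of the retry loop. It is only a sketch: the 20 msec step and the 120 msec figure for the 2400G come from this comment, while the function names and the exact ordering of "wait" and "check" are assumptions and may not match the kernel patch line for line.

```python
import math

STEP_MS = 20  # delay per retry, as described for the patch in this thread


def retries_needed(settle_ms, step_ms=STEP_MS):
    """Smallest number of step_ms waits that covers settle_ms."""
    return math.ceil(settle_ms / step_ms)


def probe(max_retries, settle_ms, step_ms=STEP_MS):
    """Toy model of the loop: wait step_ms per attempt and stop early on success.

    Returns (succeeded, total_wait_ms). settle_ms is the (assumed) time the
    counter needs before it responds.
    """
    waited = 0
    for _ in range(max_retries):
        waited += step_ms
        if waited >= settle_ms:
            return True, waited
    return False, waited


print(retries_needed(120))   # retries a 2400G would need under this model
print(probe(5, 120))         # retry = 5: gives up after the full 100 msec
print(probe(25, 120))        # retry = 25: succeeds, using only 120 msec
print(25 * STEP_MS)          # worst-case delay with retry = 25
```

Under this model a larger limit only costs extra boot time on machines that never respond; machines that respond early leave the loop early.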

Incidentally, if you do rebuild your (very kind) test-release with the above change (please, please), could you also include the deb package for linux-tools for the same kernel. Things like the command "perf stat -a -e amd_iommu/mem_trans_total/ test" (used to test IOMMU performance in user-space) are tied to the running kernel-version and will then work.

Best regards and again many thanks

David

[1] https://bugzilla.kernel.org/show_bug.cgi?id=201753

David Coe (hooligan-red) wrote :

Hi Alex!

There are further results [1] for Suravee's patch on Ryzen 2500U (BubuXP) and 2400G (myself). Both show a slight difference between cold boot (retry 5 x 20 msecs = success) and warm boot (retry 6 x 20 msecs = failure).

It would be worth increasing the number of retries (even slightly) should you backport this commit <6778ff5b21bd8e78> to hirsute or groovy.

Regards

[1] https://bugzilla.kernel.org/show_bug.cgi?id=201753

Alex Hung (alexhung) wrote :

I did some tests with retry, based on latest mainline kernel today

1. change retry to 10
2. install kernel and reboot/poweroff
3. on the first boot after install, regardless of reboot or poweroff, the patch works
4. after second boot (reboot or poweroff), it always fails

Alex Hung (alexhung) wrote :

I also tried to increase the msleep, (retry = 5)
- msleep(20);
+ msleep(40);

The result is the same as in #7. I am not sure whether this applies to my AMD Ryzen 5 2500U.

Update: If the system is powered off for a long time (i.e. 5+ mins), the patch will work.

In any case, it seems there is still room for improvement.

David Coe (hooligan-red) wrote :

I've just tried Suravee's patch on Ubuntu's latest kernel 5.11.0-11 for upcoming Hirsute and it consistently unlatches IOMMU after 6 x 20 msec tries on my Ryzen 2400G. Mainline 5.12-rc2 gave identical results.

Paul Menzel too gets failure even after 10 x 20 msec on a 2200G and regards so long a delay as most unacceptable. Alexander Monakov's very simple patch always works for me but I haven't seen data from other people. Suravee has mentioned problems with it for some part of AMD's product range and, as one of AMD's IOMMU experts down at Austin, Texas, he should know his onions :-).

I would be tempted to run with the patch (11 x 20 msec), log the exit count number and get some user feedback. Hopefully an improved patch will emerge! AMD may have commercial as well as technical reasons not to leave it unaddressed.

Best regards

David Coe (hooligan-red) wrote :

Hi Alex!

Herewith a summary of our accumulated results [1] for Suravee's IOMMU patch using 20 msec wait and logged retries on AMD's Ryzen:

Ryzen   Kernel       Cold   Warm

4700U   5.11.0-11    6      1
3500U   5.11.7       5      6
2500U   5.8.0-45     5
        5.12.0 RC3   5      >5
2400G   5.11.0-11    6      6
        5.8.0-45     5      6
2200G   ?            >10

Two points are clear:

1. there are differences between cold and warm boot, mostly marginal but marked and very consistent with the quite new 4700U.

2. the choice of 5 as the maximum retry number is unfortunate. Mostly, it guarantees all our Ryzens just fail the IOMMU write test!

The 2200G is a bit of an odd-ball and its owner, understandably, wants a more elegant solution than just upping the number of retries. For the rest of us, a maximum of 6 or 7 would sort it.
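The effect of the retry limit on the figures in the table can be tallied with a short script. This is a sketch under two assumptions that are not stated outright in the table: a boot counts as a success when the logged retry number is at most the limit, and the 2200G's ">10" belongs in the Cold column. Open-ended entries such as ">5" are reported as "unknown" when the limit might or might not be enough.

```python
# Retry counts transcribed from the table above; "" means no value was reported.
RESULTS = [
    ("4700U", "5.11.0-11", "6", "1"),
    ("3500U", "5.11.7", "5", "6"),
    ("2500U", "5.8.0-45", "5", ""),
    ("2500U", "5.12.0 RC3", "5", ">5"),
    ("2400G", "5.11.0-11", "6", "6"),
    ("2400G", "5.8.0-45", "5", "6"),
    ("2200G", "?", ">10", ""),  # column placement of ">10" is an assumption
]


def verdict(count, limit):
    """Classify one logged count: 'ok', 'fail', 'unknown', or None if no data."""
    if not count:
        return None
    if count.startswith(">"):
        # Open-ended entry: at least (bound + 1) retries were needed.
        return "fail" if int(count[1:]) + 1 > limit else "unknown"
    return "ok" if int(count) <= limit else "fail"


def tally(limit):
    """Count cold and warm boots per verdict for a given retry limit."""
    counts = {"ok": 0, "fail": 0, "unknown": 0}
    for _cpu, _kernel, cold, warm in RESULTS:
        for value in (cold, warm):
            result = verdict(value, limit)
            if result:
                counts[result] += 1
    return counts


for limit in (5, 7):
    print(limit, tally(limit))
```

With a limit of 5 most of the logged boots fall on the wrong side of the cut; with 7 only the 2200G clearly fails and one open-ended 2500U entry stays undecided.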

Best regards and many thanks

David

[1] https://bugzilla.kernel.org/show_bug.cgi?id=201753

Alex Hung (alexhung) wrote :

@david, I am also watching bugzilla 201753. I am also looking forward to seeing a better solution.

Suravee Suthikulpanit (suravee-suthikulpanit) wrote : Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

On 4/10/2021 5:03 PM, David Coe wrote:
> Results for AMD Ryzen 4700U running Ubuntu 21.04β kernel 5.11.0-13
>
> $ sudo dmesg | grep IOMMU
> [    0.490352] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
> [    0.491985] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
> [    0.493732] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
> [    0.793259] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <email address hidden>
>
> ....
>
> $ sudo perf stat -e 'amd_iommu_0/cmd_processed/, amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/,
> amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10
>
> Performance counter stats for 'system wide':
>
>                12      amd_iommu_0/cmd_processed/             (33.28%)
>                 6       amd_iommu_0/cmd_processed_inv/        (33.32%)
>                 0       amd_iommu_0/ign_rd_wr_mmio_1ff8h/     (33.36%)
>               290       amd_iommu_0/int_dte_hit/              (33.40%)
>                20       amd_iommu_0/int_dte_mis/              (33.46%)
>               391       amd_iommu_0/mem_dte_hit/              (33.49%)
>             3,720       amd_iommu_0/mem_dte_mis/              (33.49%)
>                44       amd_iommu_0/mem_iommu_tlb_pde_hit/    (33.46%)
>               810       amd_iommu_0/mem_iommu_tlb_pde_mis/    (33.45%)
>                35       amd_iommu_0/mem_iommu_tlb_pte_hit/    (33.41%)
>               746       amd_iommu_0/mem_iommu_tlb_pte_mis/    (33.37%)
>                 0       amd_iommu_0/mem_pass_excl/            (33.32%)
>                 0       amd_iommu_0/mem_pass_pretrans/        (33.28%)
>                 0       amd_iommu_0/mem_pass_untrans/         (33.28%)
>                 0       amd_iommu_0/mem_target_abort/         (33.27%)
>               715       amd_iommu_0/mem_trans_total/          (33.27%)
>                 0       amd_iommu_0/page_tbl_read_gst/        (33.28%)
>                36       amd_iommu_0/page_tbl_read_nst/        (33.27%)
>                36       amd_iommu_0/page_tbl_read_tot/        (33.27%)
>                 0       amd_iommu_0/smi_blk/                  (33.28%)
>                 0       amd_iommu_0/smi_recv/                 (33.26%)
>                 0       amd_iommu_0/tlb_inv/                  (33.23%)
>                 0       amd_iommu_0/vapic_int_guest/          (33.24%)
>               366       amd_iommu_0/vapic_int_non_guest/      (33.27%)
>
> The immediately obvious difference is the enormous count seen on mem_dte_mis on the older Ryzen 2400G. Will do some RTFM but anyone with comm...


Suravee Suthikulpanit (suravee-suthikulpanit) wrote :

David,

On 4/14/2021 10:33 PM, David Coe wrote:
> Hi Suravee!
>
> I've re-run your revert+update patch on Ubuntu's latest kernel 5.11.0-14 partly to check my mailer's 'mangling' hadn't also reached the code!
>
> There are 3 sets of results in the attachment, all for the Ryzen 2400G. The as-distributed kernel already incorporates your IOMMU RFCv3 patch.
>
> A. As-distributed kernel (cold boot)
>    >5 retries, so no IOMMU read/write capability, no amd_iommu events.
>
> B. As-distributed kernel (warm boot)
>    <5 retries, amd_iommu running stats show large numbers as before.
>
> C. Revert+Update kernel
>    amd_iommu events listed and also show large hit/miss numbers.
>
> In due course, I'll load the new (revert+update) kernel on the 4700G but won't overload your mail-box unless something unusual turns up.
>
> Best regards,
>

For the Ryzen 2400G, could you please try with:
- 1 event at a time
- Not more than 8 events (your system has 2 banks x 4 counters/bank).
I am trying to see if this issue might be related to counter multiplexing.
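One way to prepare those runs is to generate the perf command lines in batches. The sketch below only builds the command strings and does not run anything; the event names are taken from the perf stat command quoted earlier in this thread, and the helper name `perf_commands` is hypothetical.

```python
# Event names from the perf stat command quoted earlier in this thread.
EVENTS = [
    "cmd_processed", "cmd_processed_inv", "ign_rd_wr_mmio_1ff8h",
    "int_dte_hit", "int_dte_mis", "mem_dte_hit", "mem_dte_mis",
    "mem_iommu_tlb_pde_hit", "mem_iommu_tlb_pde_mis",
    "mem_iommu_tlb_pte_hit", "mem_iommu_tlb_pte_mis",
    "mem_pass_excl", "mem_pass_pretrans", "mem_pass_untrans",
    "mem_target_abort", "mem_trans_total",
    "page_tbl_read_gst", "page_tbl_read_nst", "page_tbl_read_tot",
    "smi_blk", "smi_recv", "tlb_inv",
    "vapic_int_guest", "vapic_int_non_guest",
]


def perf_commands(events, group_size, pmu="amd_iommu_0", workload="sleep 10"):
    """Build one 'perf stat' command line per batch of at most group_size events."""
    commands = []
    for start in range(0, len(events), group_size):
        batch = events[start:start + group_size]
        spec = ",".join(f"{pmu}/{name}/" for name in batch)
        commands.append(f"sudo perf stat -e '{spec}' {workload}")
    return commands


# One event at a time, then batches of 8 (2 banks x 4 counters/bank).
for command in perf_commands(EVENTS, 1)[:2]:
    print(command)
print(len(perf_commands(EVENTS, 8)), "batches of up to 8 events")
```

Keeping each batch at or below the number of hardware counters should avoid time-multiplexing within a run, which is the condition being tested here.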

Thanks,
Suravee

Suravee Suthikulpanit (suravee-suthikulpanit) wrote :

David / Joerg,

On 4/10/2021 5:03 PM, David Coe wrote:
>
> The immediately obvious difference is the enormous count seen on mem_dte_mis on the older Ryzen 2400G. Will do some RTFM but anyone with comments and insight?
>
> 841,689,151,202,939       amd_iommu_0/mem_dte_mis/              (33.44%)
>
> Otherwise, all seems to be running smoothly (especially for a distribution still in β). Bravo and many thanks all!
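Outliers like the one quoted above can be picked out of saved perf stat output mechanically. This is a rough sketch: the regular expression is written against the line layout shown in this thread (optionally prefixed by a mail quote marker), and the threshold of 10**12 is an arbitrary assumption chosen only to separate the two values seen here.

```python
import re

# Matches lines such as
#   ">             3,720       amd_iommu_0/mem_dte_mis/              (33.49%)"
LINE = re.compile(r"^\s*>?\s*([\d,]+)\s+(\S+/)\s+\(([\d.]+)%\)")


def parse_counts(text):
    """Map event name to integer count for every matching line in text."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


def suspicious(counts, threshold=10**12):
    """Return the events whose count exceeds the (assumed) plausibility threshold."""
    return {name: value for name, value in counts.items() if value > threshold}


ryzen_4700u = ">             3,720       amd_iommu_0/mem_dte_mis/              (33.49%)"
ryzen_2400g = "> 841,689,151,202,939       amd_iommu_0/mem_dte_mis/              (33.44%)"

print(suspicious(parse_counts(ryzen_4700u)))
print(suspicious(parse_counts(ryzen_2400g)))
```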

The initial hypothesis is that the issue happens only when users specify more events than
there are available counters, in which case Perf time-multiplexes the events onto the counters.

Looking at the Perf and AMD IOMMU PMU multiplexing logic, it requires:
  1. Stop the counter (i.e. set CSOURCE to zero to stop counting)
  2. Save the counter value of the current event
  3. Reload the counter value of the new event (previously saved)
  4. Start the counter (i.e. set CSOURCE to count new events)

The problem here is that when the driver writes zero to the CSOURCE register in step 1, this enables power-gating,
which prevents access to the counter and results in bad values being written/read in steps 2 and 3.

I have found a system that reproduces this case (with an unusually large count) and debugged the issue further.
As a hack, I have tried skipping step 1, and it seems to eliminate this issue. However, this is logically incorrect,
and might result in inaccurate data depending on the events.
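The four steps and the effect of skipping step 1 can be illustrated with a toy model. Everything in it is an assumption made for illustration: the class and function names are hypothetical, the counter is treated as inaccessible whenever CSOURCE is zero, and `GARBAGE` stands in for whatever a read of a gated counter really returns, which this thread does not specify.

```python
GARBAGE = 0xDEADBEEF  # hypothetical stand-in for the value read from a gated counter


class ToyCounter:
    """Toy model: the counter is only accessible while csource is non-zero."""

    def __init__(self):
        self.csource = 0
        self.value = 0

    def read(self):
        return self.value if self.csource else GARBAGE

    def write(self, new_value):
        if self.csource:  # in this model, writes are lost while power-gated
            self.value = new_value

    def tick(self, events):
        if self.csource:
            self.value += events


def switch_event(counter, saved, old, new, stop_first=True):
    """Multiplex one counter from event 'old' to event 'new'."""
    if stop_first:
        counter.csource = 0               # step 1: stop (model: gates the counter)
    saved[old] = counter.read()           # step 2: save the current event's count
    counter.write(saved.get(new, 0))      # step 3: reload the new event's count
    counter.csource = new                 # step 4: start counting the new event


def run(stop_first):
    counter, saved = ToyCounter(), {}
    counter.csource = 1                   # start by counting event 1
    counter.tick(10)
    switch_event(counter, saved, 1, 2, stop_first)
    counter.tick(5)
    switch_event(counter, saved, 2, 1, stop_first)
    return saved


print(run(stop_first=True))    # saved counts are garbage in this model
print(run(stop_first=False))   # the "hack": saved counts look sane
```

The model does not capture the drawback mentioned above: without step 1 the real counter keeps counting while it is being saved and reloaded, so the data may be inaccurate.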

Here are the options:
1. Continue to look for workaround for this issue.
2. Find a way to disable event time-multiplexing (e.g. only limit the number of counters to 8)
    if power gating is enabled on the platform.
3. Go back to the original logic where we had the pre-init check of the counter values, which is still the safest choice
    at the moment unless

Regards,
Suravee

Alex Hung (alexhung)
Changed in linux (Ubuntu):
assignee: Alex Hung (alexhung) → nobody