[X86] AMD Znver3 Scheduler descriptions and llvm-mca tests
Changes Planned, Public

Authored by lebedev.ri on Jan 11 2021, 2:01 AM.

Details

Summary

The patch adds the following.

  1. AMD Znver3 scheduler descriptions.
  2. The llvm-mca tests that check the znver3 scheduler descriptions.
  3. znver3 coverage in the codegen tests.

Event Timeline

GGanesh created this revision. Jan 11 2021, 2:01 AM
GGanesh requested review of this revision. Jan 11 2021, 2:01 AM
RKSimon added subscribers: lebedev.ri, bkramer.

Adding @bkramer as IIRC he added the initial znver3 support, and @lebedev.ri who has added amd models in the past.

This mainly looks like a copy+tweak of the existing znver2 model (which itself was just a copy+tweak of the znver1 model), which I guess makes sense, but I don't have a 5000 series cpu to compare exegesis numbers.

llvm/lib/Target/X86/X86PfmCounters.td, line 241

Do you have the fpu pipe assignments for znver3 (or znver2 for that matter?)

I'd love to help (or even take over if that helps) with this, but presently i still don't have this cpu.

I would like to know whether this patch can be approved now and have its numbers checked later, once a CPU is accessible.
Since this is an extension of the znver1->znver2 path with tweaks, I would request that, if it is doable.

Meanwhile, we are trying to host a machine as well for the community.

Hi,
I have the CPU (5900x) and I can help you to run some commands if you tell me exactly what to do.

There is a small catch right now that libpfm does not support Zen3 performance counters yet. So as a result llvm-exegesis throws the error "event not found - cannot create event cycles_not_in_halt" and fails. One has to patch/compile/install libpfm first. My simple stupid patch for libpfm below.

diff -uNr libpfm-4.11.0/lib/pfmlib_amd64.c libpfm-4.11.0.p/lib/pfmlib_amd64.c
--- libpfm-4.11.0/lib/pfmlib_amd64.c	2020-09-02 20:48:00.000000000 +0200
+++ libpfm-4.11.0.p/lib/pfmlib_amd64.c	2021-01-27 12:50:10.536351642 +0100
@@ -183,6 +183,8 @@
                 }
 	} else if (cfg->family == 22) { /* family 16h */
 		rev = PFM_PMU_AMD64_FAM16H;
+	} else if (cfg->family == 25) { /* family 19h */
+		rev = PFM_PMU_AMD64_FAM17H_ZEN2;
 	}
 
       cfg->revision = rev;
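
For readers unfamiliar with the family numbering, here is a minimal sketch of the dispatch the patch extends. It models only the two mappings visible in the diff; the dictionary and function names are hypothetical and are not part of libpfm's API. The point it illustrates is that the C code compares the CPUID family in decimal, so 25 is family 19h (Zen 3), and the patch simply reuses the Zen 2 event table for it.

```python
# Sketch of the family dispatch touched by the patch above.
# Assumption: names here are illustrative only, not libpfm identifiers
# (apart from the two PFM_PMU_* constants that appear in the diff).
from typing import Optional

# CPUID family (decimal, as compared in the C code) -> PMU table identifier.
FAMILY_TO_PMU = {
    22: "PFM_PMU_AMD64_FAM16H",       # 22 == 0x16, i.e. family 16h
    25: "PFM_PMU_AMD64_FAM17H_ZEN2",  # 25 == 0x19, i.e. family 19h, reusing the Zen 2 table
}


def select_pmu_revision(family: int) -> Optional[str]:
    """Return the PMU table chosen for a CPUID family, or None if not modeled here."""
    return FAMILY_TO_PMU.get(family)


def family_label(family: int) -> str:
    """Format a decimal family number in the hex style used by AMD documents."""
    return f"{family:x}h"


if __name__ == "__main__":
    for fam in (22, 25):
        print(family_label(fam), "->", select_pmu_revision(fam))
```

Reusing the Zen 2 table is only safe if family 19h really exposes the same events, which is exactly the open question in this thread.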

I think there are a number of other libpfm4 changes that need to be made for you to be able to test this - the fpu pipe mappings seem to be missing from its zen2 events list (the equivalent of amd64_fam17h_zen1_fpu_pipe_assignment on zen1) - we can then add the missing mappings in X86PfmCounters.td and then create latency/uops inconsistency reports in llvm-exegesis. This is all assuming that fam19h has the same pfm pipes as zen2....

Matt added a subscriber: Matt. Feb 25 2021, 2:17 AM
Matt added inline comments. Feb 25 2021, 8:17 AM
llvm/lib/Target/X86/X86ScheduleZnver3.td, line 1170

I'm wondering, would you happen to know whether there's a chance that marking VPERM2F128 as "Microcoded Instruction" (with Latency = 100) is a leftover from a previous Zen microarchitecture scheduler?

It seems this is no longer the case for Zen 3; cf. https://www.agner.org/optimize/instruction_tables.pdf reporting macro-operations=1, latency=3.5, and reciprocal throughput=0.5 for the y,y,y/m,i instruction variant.
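
To put a number on that gap, here is a minimal sketch that compares the modeled latency with the reference figure quoted above. The helper names and the factor-of-two tolerance are assumptions made for illustration; only the two latency values (100 from the patch, 3.5 from Agner's tables) come from this thread.

```python
# Sketch: flag a scheduler-model latency that diverges from a reference table.
# Assumption: the helper names and the 2x tolerance are illustrative choices.

MODELED_LATENCY = 100.0    # VPERM2F128 latency in the proposed znver3 model
REFERENCE_LATENCY = 3.5    # Agner Fog's figure for the y,y,y/m,i variant on Zen 3


def latency_ratio(modeled: float, reference: float) -> float:
    """How many times larger (or smaller) the modeled latency is than the reference."""
    return modeled / reference


def is_suspect(modeled: float, reference: float, tolerance: float = 2.0) -> bool:
    """True if the model is off from the reference by more than `tolerance` times."""
    ratio = latency_ratio(modeled, reference)
    return ratio > tolerance or ratio < 1.0 / tolerance


if __name__ == "__main__":
    ratio = latency_ratio(MODELED_LATENCY, REFERENCE_LATENCY)
    print(f"modeled/reference = {ratio:.1f}x, suspect = "
          f"{is_suspect(MODELED_LATENCY, REFERENCE_LATENCY)}")
```

A check like this only says that the two sources disagree; it does not say which one is right, so it cannot replace measuring on hardware.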

RKSimon added inline comments. Feb 25 2021, 9:02 AM
llvm/lib/Target/X86/X86ScheduleZnver3.td, line 1170

Yes, there are a lot of numbers in this patch that look like a direct copy+paste from the znver2 model (which was mainly a copy+paste of the znver1 model), and all of them diverge from what Agner, instlatx64 and the AMD SoG tables all report. For this patch to get any further we need llvm-exegesis to be run on the model to determine how accurate it really is. Otherwise we're better off just staying with the existing zen models.
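
As a rough illustration of what such a run involves, the sketch below only builds the two llvm-exegesis command lines (a latency benchmark, then an analysis pass) and does not execute anything. The flags are the ones documented for llvm-exegesis; the opcode name VPERM2F128rr, the file names, and the helper functions are assumptions made for the example, and the run still depends on libpfm recognizing the CPU.

```python
# Sketch: assemble llvm-exegesis invocations for one opcode.
# Assumptions: opcode name, file names and helper names are illustrative;
# nothing is executed here, the commands are only built and printed.
import shlex
from typing import List


def latency_benchmark_cmd(opcode: str, cpu: str, benchmarks_file: str) -> List[str]:
    """Command that measures the latency of one opcode and stores the raw results."""
    return [
        "llvm-exegesis",
        "-mode=latency",
        f"-opcode-name={opcode}",
        f"-mcpu={cpu}",
        f"-benchmarks-file={benchmarks_file}",
    ]


def analysis_cmd(cpu: str, benchmarks_file: str, report_file: str) -> List[str]:
    """Command that compares stored measurements against the scheduler model."""
    return [
        "llvm-exegesis",
        "-mode=analysis",
        f"-mcpu={cpu}",
        f"-benchmarks-file={benchmarks_file}",
        f"-analysis-inconsistencies-output-file={report_file}",
    ]


if __name__ == "__main__":
    bench = latency_benchmark_cmd("VPERM2F128rr", "znver3", "vperm2f128.yaml")
    report = analysis_cmd("znver3", "vperm2f128.yaml", "inconsistencies.html")
    print(shlex.join(bench))
    print(shlex.join(report))
```

The inconsistency report is what would show, per instruction, where the model's numbers disagree with the measurements.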

We have started working on the libpfm patch. We are working on getting it posted very soon! Hopefully, once that is in place, we will be able to measure these numbers in znver3 hardware and correct accordingly.

Given that the later znver models are very similar to znver1 you might make faster progress by fixing the znver1 model first and copying those fixes down to znver2/znver3. But getting the zen2/zen3 libpfm support fixed upstream would be very useful as well.

(still very much interested in taking this over once i have this hardware, but that is presently still not the case)

@lebedev.ri Do you have an older znver1 CPU that can be used to analyze/clean up the base znver1 model first?

Nope, otherwise i would have done so already :)

I can work on the znver1 and znver2 models as well. I have a znver2 machine. Let me check if I can get a znver1 machine and run exegesis to correct these latency/throughput numbers.

I may have an update on this within next ~7 days, hang on..

We have exchanged mails with Eranian about libpfm4 support. We will upload the libpfm4 patch shortly (within 3-4 days). The patch will be based on the latest PPR manual (https://www.amd.com/system/files/TechDocs/55898_pub.zip). I believe that will keep things moving for verifying the numbers and the details with exegesis.

tambre added a subscriber: tambre. Thu, Mar 18, 10:05 AM

Alrighty :)
This now deeply affects me personally, providing a necessary, and sufficient, interest to be involved with this.

$ lscpu  | grep -i "Model name"
Model name:                      AMD Ryzen 9 5950X 16-Core Processor

So. @RKSimon @craig.topper @GGanesh how would you feel about me taking this over?
This won't be uncharted territory for me, see BdVer2 sched.

@lebedev.ri

My colleague has submitted the libpfm4 patch to the libpfm4 community. You can take over this patch and the exegesis enablement for znver3. Thank you for the help!

@lebedev.ri No objections as long as @GGanesh is happy. One possible issue is that the znver2/znver3 libpfm4 event lists are still missing the FPU pipe events, despite them being documented in the PPRs - this means that llvm-exegesis won't be able to verify pipe allocation.

I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions - so ensuring the existing znver1/znver2 models are correct must surely make sense - if it ends up being we copy+paste backwards from an accurate znver3 model then that works as well.

lebedev.ri commandeered this revision. Wed, Mar 24, 6:01 AM
lebedev.ri planned changes to this revision.
lebedev.ri edited reviewers, added: GGanesh; removed: lebedev.ri.

@GGanesh i agree with @RKSimon.
Having only cycle and uops counters is pretty limiting, having per-pipe counters would be awesome.

The FPU pipe counters aren't in the PPR either. I have raised a ticket to get the PPR updated. Unfortunately, until the document is updated, these events can't be enabled.

So i've finally started working on this (yay!),
and while i haven't gotten to the actual instruction latencies/uops/pipe distribution yet,
basically everything so far is not really correct.
So i'm going to end up basically scrapping this and rewriting/remeasuring from scratch...

And yes, cursory examination suggests that zen2 is also not very correct.