This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add schedule module for Alderlake-P
Closed · Public

Authored by HaohaiWen on Aug 1 2022, 8:23 PM.

Details

Summary

The X86SchedAlderlakeP.td file is automatically generated by schedtool
(D130897). Most of the instructions' scheduling information is based on
ADL-P data measured by uops.info. Some data comes from the Golden Cove
(GLC) throughput/latency data provided in Intel documentation. The
scheduling information for the remaining instructions is taken from the
Skylake client scheduling model in order to get a relatively complete
model.

Event Timeline

HaohaiWen created this revision. Aug 1 2022, 8:23 PM
Herald added a project: Restricted Project.
HaohaiWen requested review of this revision. Aug 1 2022, 8:23 PM
Herald added a project: Restricted Project. Aug 1 2022, 8:23 PM
HaohaiWen retitled this revision from Add schedule module for Alderlake target to [X86] Add schedule module for Alderlake target. Aug 1 2022, 8:34 PM
HaohaiWen added reviewers: LuoYuanke, wxiao3, pengfei.
RKSimon requested changes to this revision. Aug 2 2022, 3:16 AM

This comes back to https://github.com/llvm/llvm-project/issues/56092 - I don't think we can have a single scheduler model called "alderlake" - the P-core and E-core behaviour is just too different. The models are used for analysis as much as for scheduling.

As a first step it might be OK if we rename this model alderlake-p and use it by default for the -mcpu=alderlake target, but do you intend to add an alderlake-e model as well?

This revision now requires changes to proceed. Aug 2 2022, 3:16 AM

Do you intend to add an alderlake-e model as well?

I'd like to add an ADL-E model. The problem is that we have no per-port instruction information for Gracemont, since it has no events like uops.dispatch.port0. See https://uops.info/table.html

llvm-exegesis can give some reasonable latency/throughput numbers based on uops counters alone, and the latest AoM shows the actual ports of the Gracemont microarchitecture - we had to do something similar for the Atom and SLM models.

I believe we can't get precise port info for Gracemont (it has 17 ports) with llvm-exegesis. llvm-exegesis uses libpfm4 to read event counters. The problem is that Gracemont, like other Atom processors, has no event counter for a specific port (such as uops.dispatch.port1), so we can't infer how many uops have been dispatched to each port.

I get that - but you do have enough public info to write the model manually and then exegesis can confirm it at least matches total uops, throughput and latency counts (although interestingly I don't see alderlake-p or alderlake-e counters in libpfm4 yet) - even if you don't have counters that confirm pipe occupancy.

do you have enough public info to write the model manually and then exegesis can confirm it at least matches total uops, throughput and latency counts.

Total uops and latency can be taken from uops.info and set in the scheduling model automatically.
Throughput is different: LLVM does not let us define it directly the way it does latency, but calculates it (see MCSchedModel::getReciprocalThroughput) from the port description (resources and resource_cycles). So we need to infer plausible ports from the measured throughput.

In fact, for resource_cycles, we have no way to measure each uop's latency (how long it occupies its port) on any Intel x86 processor. I noticed the Skylake model assumes that each uop consumes 1 cycle, which means throughput derived from the Skylake scheduling model may be inaccurate. I guess that's why the Skylake model has dummy ports called SKLDivider and SKLFPDivider, which may be used to get the right throughput. In this alderlake-p model, I also assumed that each uop consumes 1 cycle. I know that's not accurate, but I don't have a better workaround.
I don't know whether resource_cycles can be measured on other architectures, but for x86 I think we can't get them. Because of this limitation, could we manually define a field called "Throughput", similar to "Latency"/"NumMicroOps", so that getReciprocalThroughput returns it if "Throughput" is defined and otherwise calculates it from the port description?

Matt added a subscriber: Matt. Aug 2 2022, 10:16 PM
HaohaiWen updated this revision to Diff 449611. Aug 3 2022, 2:48 AM

Rename alderlake model to alderlake-p model

HaohaiWen retitled this revision from [X86] Add schedule module for Alderlake target to [X86] Add schedule module for Alderlake-P. Aug 3 2022, 2:50 AM
HaohaiWen edited the summary of this revision.
RKSimon accepted this revision. Aug 17 2022, 3:39 AM

LGTM - cheers

This revision is now accepted and ready to land. Aug 17 2022, 3:39 AM
This revision was landed with ongoing or failed builds. Aug 18 2022, 1:40 AM
This revision was automatically updated to reflect the committed changes.
llvm/test/tools/llvm-mca/X86/AlderlakeP/resources-x86_32.s