The X86SchedAlderlakeP.td file is automatically generated by schedtool
(D130897). Most of the instructions' scheduling information is based on
measured ADL-P data from uops.info. Some data comes from the GLC
throughput/latency (tpt/lat) data provided in Intel's documentation. The
remaining instructions' scheduling information is taken from the Skylake
client schedule model in order to get a relatively complete model.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This comes back to https://github.com/llvm/llvm-project/issues/56092 - I don't think we can have a single scheduler model called "alderlake" - the behaviour of the P cores and E cores is just too different. The models are used for analysis as much as for scheduling.
As a first step it might be OK if we rename this model alderlake-p and use it by default for the -mcpu=alderlake target, but do you intend to add an alderlake-e model as well?
> Do you intend to add an alderlake-e model as well?
I'd like to add an adl-e model. The problem is that we have no instruction port information for Gracemont, since it has no events like uops.dispatch.port0. See https://uops.info/table.html
llvm-exegesis can give some reasonable latency / throughput numbers based off uops counters alone, and the latest AoM shows the Gracemont microarch for actual ports - we had to do something similar for the Atom and SLM models
> llvm-exegesis can give some reasonable latency / throughput numbers based off uops counters alone, and the latest AoM shows the Gracemont microarch for actual ports - we had to do something similar for the Atom and SLM models
I believe we can't get precise port info for Gracemont (it has 17 ports) with llvm-exegesis. llvm-exegesis uses libpfm4 to read event counters. The problem is that Gracemont, like other Atom processors, has no event counter for a specific port (such as uops.dispatch.port1), so we can't infer how many uops have been dispatched to each port.
I get that - but you do have enough public info to write the model manually and then exegesis can confirm it at least matches total uops, throughput and latency counts (although interestingly I don't see alderlake-p or alderlake-e counters in libpfm4 yet) - even if you don't have counters that confirm pipe occupancy.
> you do have enough public info to write the model manually and then exegesis can confirm it at least matches total uops, throughput and latency counts.
For total uops and latency, we can get the values from uops.info and set them in the schedule model automatically.
For throughput, LLVM calculates the value (see MCSchedModel::getReciprocalThroughput) from the port description (resource, resource_cycles) instead of having it defined directly like latency. We need to infer the possible ports from the given throughput.
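To make the dependency on the port description concrete, here is a minimal Python paraphrase of how I understand that calculation. It is not the LLVM source: the class names, the resource groups (P01, P23) and all numbers are made up for illustration. Each resource limits the instruction to cycles / NumUnits cycles per instruction, the tightest resource wins, and with no resource information the result falls back to NumMicroOps / IssueWidth.

```python
from dataclasses import dataclass


@dataclass
class ProcResource:
    name: str
    num_units: int  # number of ports/pipes in this resource group


@dataclass
class WriteRes:
    resource: ProcResource
    cycles: int  # resource_cycles consumed on that resource


def reciprocal_throughput(writes, num_micro_ops, issue_width):
    """Cycles per instruction implied by the port description (sketch)."""
    worst = None
    for w in writes:
        if w.cycles == 0:
            continue  # this resource is not actually occupied
        # A group with N units occupied for C cycles sustains one
        # instruction every C / N cycles.
        limit = w.cycles / w.resource.num_units
        worst = limit if worst is None else max(worst, limit)
    if worst is not None:
        return worst
    # No resource information: assume the issue width is the only limit.
    return num_micro_ops / issue_width


# Hypothetical two-uop instruction: one uop on P01, one on P23.
p01 = ProcResource("P01", 2)
p23 = ProcResource("P23", 2)
writes = [WriteRes(p01, 1), WriteRes(p23, 1)]
print(reciprocal_throughput(writes, num_micro_ops=2, issue_width=6))  # 0.5
```

Read in reverse, this is the inference problem: with one cycle per uop, a measured reciprocal throughput of 0.5 for a single uop only tells us the uop can go to a group of two ports, not which two.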
In fact, for resource_cycles, we have no way to measure each individual uop's latency on any Intel x86 processor. I noticed that the Skylake model assumes each uop consumes 1 cycle. That means the throughput inferred from the Skylake schedule model may be inaccurate. I guess that's why the Skylake model has dummy ports called SKLDivider and SKLFPDivider, which may be used to get the right throughput. In this alderlake-p model, I also defined each uop as consuming 1 cycle. I know that's not accurate, but I don't have a better workaround.
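A small sketch of why a dummy divider resource can repair the number (Python, with invented cycle counts; this is my guess at the mechanism, not taken from the Skylake model itself). With one cycle per uop, a single-uop divide on one port looks fully pipelined; holding an extra single-unit resource for several cycles brings the computed value closer to a measured one.

```python
def rthroughput(resource_cycles):
    """resource_cycles: list of (num_units, cycles) pairs for one instruction.

    Returns the cycles per instruction set by the most constrained resource.
    """
    return max(cycles / units for units, cycles in resource_cycles)


# Hypothetical divide: one uop on a single port, 1 cycle per uop.
one_cycle_per_uop = [(1, 1)]

# Same uop, plus a dummy single-unit divider resource held for 4 cycles
# (4 is a made-up number standing in for a measured throughput).
with_divider = [(1, 1), (1, 4)]

print(rthroughput(one_cycle_per_uop))  # 1.0
print(rthroughput(with_divider))       # 4.0
```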
I don't know whether we can measure resource_cycles for other architectures, but for x86 I think we can't. Because of this limitation, could we define an explicit field called "Throughput", alongside "Latency" and "NumMicroOps", so that getReciprocalThroughput returns it when "Throughput" is defined and otherwise calculates the value from the port description?
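Roughly, the behaviour I have in mind looks like the Python sketch below. The optional "Throughput" value and the function name are hypothetical and do not exist in LLVM today; the port-based calculation is passed in as a callback so the sketch stays independent of how it is implemented.

```python
from typing import Callable, Optional


def reciprocal_throughput_with_override(
    explicit_throughput: Optional[float],
    compute_from_ports: Callable[[], float],
) -> float:
    """Prefer an explicitly defined throughput; otherwise use the ports.

    explicit_throughput models the proposed (hypothetical) "Throughput"
    field; None means the schedule class did not define it.
    """
    if explicit_throughput is not None:
        return explicit_throughput
    return compute_from_ports()


# Made-up numbers: the port description implies 0.5 cycles/instruction,
# while a measurement says 0.25.
from_ports = lambda: 0.5
print(reciprocal_throughput_with_override(0.25, from_ports))  # 0.25
print(reciprocal_throughput_with_override(None, from_ports))  # 0.5
```

Schedule classes without the field would behave exactly as they do now, so existing models would not need to change.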