The X86SchedSapphireRapids.td file is automatically generated by
schedtool (D130897). Most of the instructions' scheduling information
comes from the SapphireRapids throughput/latency data published by Intel. Some data
comes from measured ADL-P data on uops.info. The scheduling information for the
remaining instructions is taken from the Skylake server schedule model in order
to get a relatively complete model.
Repository: rG LLVM Github Monorepo
Event Timeline
Instruction scheduling info in this model comes from several sources.
The sources, in descending order of priority, are:
- 4th Generation Intel® Xeon® Scalable Processor Family (based on Sapphire Rapids Architecture) Instruction Throughput and Latency in https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
- Alderlake-P data from uops.info
- Current SkylakeServerModel.
llvm/lib/Target/X86/X86.td | ||
---|---|---|
1668 | Update these models as well? |
@HaohaiWen We don't currently have llvm-mca test coverage for the AMX ISAs; I'll see if I can get that added at some point soon. Have you noticed any other ISAs we're still missing?
I can't find the latency and throughput of the AMX instructions in https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html. It seems they have not been disclosed yet. avx512fp16 is a big ISA introduced in SPR, but we can use the same schedule model as for the float32 instructions. TTI information may be added for avx512fp16.
llvm/test/CodeGen/X86/pmullq-false-deps.ll | ||
---|---|---|
9 ↗ | (On Diff #488887) | please can you check if this is OK - I can't recall the exact nature of the false-deps issue but shouldn't xmm2 be cleared? |
llvm/test/CodeGen/X86/pmullq-false-deps.ll | ||
---|---|---|
9 ↗ | (On Diff #488887) | The reason is that xmm2 is no longer used after the scheduling update. Could we disable instruction scheduling for this test? |
llvm/test/CodeGen/X86/pmullq-false-deps.ll | ||
---|---|---|
9 ↗ | (On Diff #488887) | It seems vxorps is not generated, which is not expected. Can we replace "nop" with an "endbr" instruction to create a scheduling boundary, so that vpmullq cannot be scheduled before the inline assembly? |
This change is expected: ADL and SPR now behave the same for the VNNI instruction combine. Previously SPR used the Skylake server schedule model.
llvm/lib/Target/X86/X86SchedSapphireRapids.td | ||
---|---|---|
69 | Port 10 is ALU+LEA, which is in the same group as ports 0, 1, 5, 6 |
The priority is:
- Alderlake-P data (including avx512) from uops.info
- 4th Generation Intel® Xeon® Scalable Processor Family (based on Sapphire Rapids Architecture) Instruction Throughput and Latency in https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
- Current IcelakeModel.
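As a rough illustration of the priority rule above, the lookup can be sketched as a first-match search over the sources in order. This is not how schedtool is implemented; the table names, opcode names (`OP_A`, `OP_B`, `OP_C`), and numbers below are all hypothetical placeholders, not measured data.

```python
from typing import Dict, List, Optional, Tuple

# (latency in cycles, reciprocal throughput) -- placeholder values only.
SchedInfo = Tuple[int, float]

# Hypothetical per-source tables keyed by opcode name.
ADL_P_UOPS_INFO: Dict[str, SchedInfo] = {"OP_A": (1, 0.25)}
SPR_INTEL_DOC: Dict[str, SchedInfo] = {"OP_A": (2, 0.5), "OP_B": (4, 1.0)}
ICELAKE_MODEL: Dict[str, SchedInfo] = {
    "OP_A": (3, 1.0),
    "OP_B": (5, 2.0),
    "OP_C": (7, 3.0),
}

# Sources listed from highest to lowest priority, as in the list above.
SOURCES: List[Tuple[str, Dict[str, SchedInfo]]] = [
    ("uops.info ADL-P", ADL_P_UOPS_INFO),
    ("Intel SPR doc", SPR_INTEL_DOC),
    ("Icelake model", ICELAKE_MODEL),
]


def pick_sched_info(opcode: str) -> Optional[Tuple[str, SchedInfo]]:
    """Return (source name, info) from the first source that knows the opcode."""
    for name, table in SOURCES:
        if opcode in table:
            return name, table[opcode]
    return None  # No source has data for this opcode.
```

With these placeholder tables, `OP_A` resolves from the uops.info table, `OP_B` falls through to the Intel document, and `OP_C` is only covered by the Icelake fallback.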