This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add schedule module for SapphireRapids
ClosedPublic

Authored by HaohaiWen on Jan 11 2023, 5:36 AM.

Details

Summary

The X86SchedSapphireRapids.td file is automatically generated by
schedtool (D130897). Most of the instructions' scheduling information
comes from the Sapphire Rapids throughput/latency data provided in Intel's
documentation. Some data comes from ADL-P measurements on uops.info. The
remaining instructions' scheduling information is taken from the Skylake
Server schedule model in order to get a relatively complete model.

Event Timeline

HaohaiWen created this revision. Jan 11 2023, 5:36 AM
Herald added a project: Restricted Project.
HaohaiWen requested review of this revision. Jan 11 2023, 5:36 AM
Herald added a project: Restricted Project. Jan 11 2023, 5:36 AM

Instruction scheduling info in this model comes from several sources.
Sources in descending order of priority:

  1. 4th Generation Intel® Xeon® Scalable Processor Family (based on Sapphire Rapids Architecture) Instruction Throughput and Latency in https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
  2. Alderlake-P data from uops.info
  3. Current SkylakeServerModel.
RKSimon added inline comments. Jan 11 2023, 6:07 AM
llvm/lib/Target/X86/X86.td
1668

Update these models as well?

@HaohaiWen We don't currently have llvm-mca test coverage for the amx ISAs, I'll see if I can get that added at some point soon - have you noticed any other ISAs we're still missing please?

I can't find the latency and throughput of the AMX instructions in https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html; it seems they have not been disclosed yet. avx512fp16 is a big ISA introduced in SPR, but it can use the same schedule model as the float32 instructions. TTI information may be added for avx512fp16.
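The idea of reusing float32 scheduling data for avx512fp16 could be sketched as a mnemonic-suffix mapping (PH to PS, SH to SS). The helper names and the suffix heuristic are assumptions for illustration only, not how the model is generated; conversion instructions and other special cases would need explicit handling.

```python
from typing import Mapping, Tuple

# Hypothetical heuristic: packed/scalar FP16 suffixes map to the matching
# FP32 suffixes (e.g. VADDPH -> VADDPS, VMULSH -> VMULSS).
_FP16_TO_FP32_SUFFIX = (("PH", "PS"), ("SH", "SS"))


def fp32_counterpart(mnemonic: str) -> str:
    """Return the float32 mnemonic whose scheduling data an FP16 mnemonic reuses."""
    for fp16_suffix, fp32_suffix in _FP16_TO_FP32_SUFFIX:
        if mnemonic.endswith(fp16_suffix):
            return mnemonic[: -len(fp16_suffix)] + fp32_suffix
    raise ValueError(f"{mnemonic!r} has no FP16 PH/SH suffix")


def sched_for_fp16(
    mnemonic: str, fp32_table: Mapping[str, Tuple[int, float]]
) -> Tuple[int, float]:
    """Look up (latency, throughput) for an FP16 mnemonic via its FP32 counterpart."""
    return fp32_table[fp32_counterpart(mnemonic)]
```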

Use SapphireRapidsModel for graniterapids and emeraldrapids

HaohaiWen marked an inline comment as done. Jan 13 2023, 12:24 AM
RKSimon added inline comments. Jan 13 2023, 7:52 AM
llvm/test/CodeGen/X86/pmullq-false-deps.ll
9 ↗(On Diff #488887)

Please can you check whether this is OK? I can't recall the exact nature of the false-deps issue, but shouldn't xmm2 be cleared?

HaohaiWen added inline comments. Jan 16 2023, 11:20 PM
llvm/test/CodeGen/X86/pmullq-false-deps.ll
9 ↗(On Diff #488887)

Hi @gpei, could you please help to check this? Thanks.

gpei added inline comments. Jan 17 2023, 12:38 AM
llvm/test/CodeGen/X86/pmullq-false-deps.ll
9 ↗(On Diff #488887)

The reason is that xmm2 is no longer used anywhere after the scheduling update. Could we disable instruction scheduling for this test?

LuoYuanke added inline comments. Jan 17 2023, 12:52 AM
llvm/test/CodeGen/X86/pmullq-false-deps.ll
9 ↗(On Diff #488887)

It seems vxorps is not generated, which is not expected. Can we replace the "nop" with an "endbr" instruction to create a scheduling boundary, so that vpmullq cannot be scheduled before the inline assembly?

Matt added a subscriber: Matt. Jan 25 2023, 9:03 AM
HaohaiWen updated this revision to Diff 520335. May 8 2023, 5:38 AM

Rebase and prefer data from uops.info

@LuoYuanke, could you please help verify avxvnni-combine.ll?

This change is expected: ADL and SPR now behave the same for the VNNI instruction combine. Previously, SPR used the Skylake Server schedule model.

LuoYuanke added inline comments. May 8 2023, 6:24 AM
llvm/lib/Target/X86/X86SchedSapphireRapids.td
69

Port10 or port11?

80

Port11 or port10?

HaohaiWen marked 2 inline comments as done. May 8 2023, 6:53 AM
HaohaiWen added inline comments.
llvm/lib/Target/X86/X86SchedSapphireRapids.td
69

Port 10 is ALU+LEA, which is in the same group as ports 0, 1, 5 and 6.
Port 11 is AGU+Load, which is in the same group as ports 2, 3, 7 and 8.
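A minimal sketch of the grouping described above, written as a Python lookup rather than as the TableGen port definitions in X86SchedSapphireRapids.td. The set names and the function `port_group` are hypothetical; only the port numbers come from the comment.

```python
# Illustrative port groups taken from the review comment (not the .td text):
# port 10 joins the ALU/LEA ports, port 11 joins the AGU ports.
ALU_GROUP = frozenset({0, 1, 5, 6, 10})
AGU_GROUP = frozenset({2, 3, 7, 8, 11})


def port_group(port: int) -> str:
    """Return which of the two groups described in the comment a port belongs to."""
    if port in ALU_GROUP:
        return "ALU"
    if port in AGU_GROUP:
        return "AGU"
    raise ValueError(f"port {port} is in neither group described here")
```

A lookup like this makes the "Port10 or port11?" question mechanical: a resource list that should contain only ALU ports must not mention port 11, and vice versa.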

Instruction scheduling info in this model comes from several sources.
Sources in descending order of priority:

  1. 4th Generation Intel® Xeon® Scalable Processor Family (based on Sapphire Rapids Architecture) Instruction Throughput and Latency in https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
  2. Alderlake-P data from uops.info
  3. Current SkylakeServerModel.

Why skylake server as opposed to ICX server?

HaohaiWen updated this revision to Diff 520581. May 8 2023, 8:55 PM
HaohaiWen marked an inline comment as done.

Use icelake as reference model and fix port mapping from skx/icx to spr

The priority is:

  1. Alderlake-P data (including avx512) from uops.info
  2. 4th Generation Intel® Xeon® Scalable Processor Family (based on Sapphire Rapids Architecture) Instruction Throughput and Latency in https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
  3. Current IcelakeModel.
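The fallback order above can be sketched as a first-match lookup over the sources, assuming each source is a table from opcode to a (latency, throughput) pair. This is a hypothetical illustration of the stated priority, not schedtool's actual implementation; the function name and record type are assumptions.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical record: (latency, throughput) for one opcode.
SchedInfo = Tuple[int, float]


def resolve_sched_info(
    opcode: str,
    sources: Sequence[Tuple[str, Mapping[str, SchedInfo]]],
) -> Optional[Tuple[str, SchedInfo]]:
    """Return (source name, info) from the first source that covers `opcode`.

    `sources` must be ordered from highest to lowest priority, e.g. ADL-P
    data from uops.info, then the Intel SPR document, then IcelakeModel.
    Returns None if no source has an entry for `opcode`.
    """
    for name, table in sources:
        info = table.get(opcode)
        if info is not None:
            return name, info
    return None
```

With this ordering, a lower-priority model such as IcelakeModel only supplies entries that the higher-priority sources lack.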
LuoYuanke accepted this revision. May 30 2023, 8:31 PM

LGTM. I think it's a good start to have an SPR schedule model in LLVM.

This revision is now accepted and ready to land. May 30 2023, 8:31 PM
This revision was automatically updated to reflect the committed changes.