User Details
- User Since
- Jul 21 2021, 11:13 PM (80 w, 5 d)
Mon, Jan 16
Fri, Jan 13
Use SapphireRapidsModel for graniterapids and emeraldrapids
Wed, Jan 11
Support sapphirerapids target
The instruction scheduling info in this model comes from several sources.
The sources, in descending order of priority, are:
- 4th Generation Intel® Xeon® Scalable Processor Family (based on Sapphire Rapids Architecture) Instruction Throughput and Latency in https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
- Alderlake-P data from uops.info
- Current SkylakeServerModel.
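The priority order above can be read as a first-match lookup. Below is a minimal sketch, assuming each source is a plain dictionary keyed by opcode name. The `lookup_sched_info` helper, the source labels, the opcode names and all numbers are hypothetical placeholders, not data from the patch.

```python
def lookup_sched_info(opcode, sources):
    """Return (source_name, info) from the first source that lists the opcode.

    `sources` is a list of (name, table) pairs, highest priority first.
    Returns None when no source knows the opcode.
    """
    for name, table in sources:
        if opcode in table:
            return name, table[opcode]
    return None


# Placeholder tables: OP_A, OP_B, OP_C and the latencies are made up.
sources = [
    ("intel-spr-doc", {"OP_A": {"latency": 1}}),
    ("uops.info-adl-p", {"OP_A": {"latency": 2}, "OP_B": {"latency": 3}}),
    ("SkylakeServerModel", {"OP_A": {"latency": 4}, "OP_B": {"latency": 5},
                            "OP_C": {"latency": 6}}),
]

for op in ("OP_A", "OP_B", "OP_C", "OP_D"):
    print(op, lookup_sched_info(op, sources))
```

An opcode present in several tables takes its data from the earliest one, so lower-priority sources only fill the gaps.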
Jan 4 2023
Don't match CodeGenOnly instructions with lock prefix. Add default LoadUOps argument.
Ignore CodeGenOnly instructions with lock prefix
Dec 27 2022
Any comments?
Dec 8 2022
Currently, when llc encounters a BUILD_VECTOR fed by many extract_vector_elt nodes, it replaces them with a vector_shuffle to reduce code size. This is done in createBuildVecShuffle in DAGCombiner.cpp, but the code cannot yet handle the case where the source vector register is more than twice the size of the destination.
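To illustrate the pattern, the toy model below treats vectors as Python lists and shows that a BUILD_VECTOR of extract_vector_elt results taken from one source equals a single shuffle whose mask is the list of extract indices. This is not the logic in createBuildVecShuffle, and the helper names are made up. The real combine also has to reconcile operand and result vector types, which this sketch ignores. The example uses a 16-lane source and a 4-lane result, so the source is more than twice the destination size.

```python
def extract_vector_elt(vec, idx):
    """Toy extract_vector_elt: read one lane."""
    return vec[idx]


def build_vector(elts):
    """Toy BUILD_VECTOR: assemble a vector from scalar lanes."""
    return list(elts)


def vector_shuffle(vec, mask):
    """Toy single-source shuffle: lane i of the result is vec[mask[i]]."""
    return [vec[i] for i in mask]


src = list(range(100, 116))   # 16-lane source vector
indices = [0, 5, 10, 15]      # lanes picked for a 4-lane result

via_extracts = build_vector(extract_vector_elt(src, i) for i in indices)
via_shuffle = vector_shuffle(src, indices)
print(via_extracts, via_shuffle)
```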
Dec 5 2022
Don't match CodeGenOnly opcodes by default
Dec 2 2022
Generate asm string for CodeGenOnly but encodable instructions
Nov 25 2022
Fix typo
Nov 24 2022
I don’t know anything about xed. What does it require?
AFAIK, the input to it is normally an encoding or an asm string. That's why I need to enumerate the asm strings and encode them for each LLVM opcode.
Is it not possible to use the encoding information in TSFlags rather than going through the assembly parser? Your patches for schedtool seem very coupled to the names of operand classes and other things. It looks like it will require updates often.
The asm enumeration code, the asm matcher patch and the xed patch are used to build a map between LLVM opcode <-> Xed info <-> uops.info data / other scheduling info data sources. Do you have any suggestions for building this map?
IsaSet in the Xed info can also be used to identify whether an LLVM opcode is supported by a specific target. LLVM predicates can't determine that precisely.
Can we ignore those with isCodeGenOnly? I think they are just duplicates of the non-CodeGenOnly ones from the perspective of encoding.
Of course we can ignore them in almost all cases, because they will never be emitted by the asm printer.
However, we should describe them correctly in the schedule model. In fact, the current schedtool (D130897) only emits scheduling info for instructions that are not CodeGenOnly. That means the scheduling info for CodeGenOnly instructions such as CVTSD2SI64rm may be incorrect, although it should be the same as for CVTSD2SI64rm_Int.
I'm working on fixing that; it requires correct mode predicates for instructions that are CodeGenOnly but encodable.
How does adding In64BitMode avoid this error? This error occurs when AH/BH/CH/DH are passed to an instruction that uses a REX prefix. Are you using In64BitMode to avoid passing AH/BH/CH/DH registers?
The issue is that I relied on predicates to auto-generate asm. Take CVTSD2SI64rm as an example.
Since it has no predicates, I assumed it is encodable in all modes and tried to generate 16-, 32- and 64-bit asm strings (by enumerating {.code16, .code32, .code64}) and encode them with llvm-mc. I then fed the encodings back into llvm-mc to decode them, in order to find all matching LLVM opcodes.
That's how I found this predicate error.
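The mode enumeration step can be sketched as follows. The code only builds the assembler inputs for the three modes and does not invoke llvm-mc. The function name and the sample asm string are placeholders, and it assumes each candidate is assembled on its own so that a failure in one mode does not hide the others.

```python
MODE_DIRECTIVES = (".code16", ".code32", ".code64")


def enumerate_mode_inputs(asm_string):
    """Return one assembler input per mode directive for the given asm string."""
    return {mode: f"{mode}\n{asm_string}\n" for mode in MODE_DIRECTIVES}


# Illustrative asm string only; any candidate string could be used here.
inputs = enumerate_mode_inputs("cvtsd2si (%rdi), %rax")
for mode, text in inputs.items():
    print(repr(text))
```

An input that fails to assemble in a given mode is a hint that the opcode is not encodable there, which is the kind of mismatch a missing mode predicate would produce.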
Isn't X86::CVTSD2SI64rm the CodeGenOnly instruction?
Are you seeing a functional issue that this fixes?
Nov 23 2022
Have you confirmed that this passes the whole of check-llvm? Sometimes these cost changes have effects in transforms/codegen that you don't expect.
Yes, all passed.
Nov 22 2022
Rebase upon D38485
Add cost-kind and update costs for slm/goldencove/btver2
Please see diffs between those versions to see what changed.
Patch1: Just copy throughput cost files.
Patch2: Add cost-kind options (some cost-kinds are added in patch4)
Patch3: Update cost value. (some costs are added in patch4)
Update cost value
Add cost-kind options
Nov 21 2022
No tests affected, or are tests lacking for them?
Looks like no permute latency test.
Nov 11 2022
Do you have any suggestions for scheduler classes that we could add/change to help reduce the number of overrides?
def ADLPWriteResGroup222 : SchedWriteRes<[ADLPPort00_01, ADLPPort00_01_05, ADLPPort00_06, ADLPPort01_05]> {
  let ResourceCycles = [2, 2, 1, 2];
  let Latency = 6;
  let NumMicroOps = 7;
}
def : InstRW<[ADLPWriteResGroup222], (instregex "^SHA1MSG2rr$")>;
That could just as well be (instrs SHA1MSG2rr)
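One way to spot instregex patterns that are really a single instruction name is to check that only an identifier sits between ^ and $. A rough sketch with a hypothetical helper name follows; it is not the script used for the patch.

```python
import re

# A "simple" pattern is ^NAME$ where NAME has no regex metacharacters.
_SIMPLE = re.compile(r"^\^([A-Za-z0-9_]+)\$$")


def simplify_instregex(pattern):
    """Rewrite ^NAME$ as (instrs NAME); keep anything else as instregex."""
    m = _SIMPLE.match(pattern)
    if m:
        return f"(instrs {m.group(1)})"
    return f'(instregex "{pattern}")'


print(simplify_instregex("^SHA1MSG2rr$"))
print(simplify_instregex("^CVTSD2SI64rm(_Int)?$"))
```

Patterns with alternation, optional groups or missing anchors are left alone, since they can match more than one instruction.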
Delete mistakenly added files
Replace simple instregex with instrs to boost tblgen match speed
All I'm actually requiring is that the instregex that end in "_Int$" are replaced with (_Int)?$
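The difference between the two suffixes can be checked with Python's re module, assuming ordinary regex semantics (tblgen uses its own regex engine, so treat this as an approximation). The opcode names are only examples.

```python
import re

names = ["CVTSD2SI64rm", "CVTSD2SI64rm_Int"]

# "_Int$" requires the suffix; "(_Int)?$" makes it optional.
only_int = [n for n in names if re.search(r"^CVTSD2SI64rm_Int$", n)]
optional_int = [n for n in names if re.search(r"^CVTSD2SI64rm(_Int)?$", n)]

print(only_int)
print(optional_int)
```

With the optional group, one InstRW entry covers both the _Int form and its plain counterpart.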
Replace simple instregex with instrs
@RKSimon, you can compare the diff of patch1 and patch2 to see what changed for instregex.
Nov 8 2022
Nov 7 2022
Nov 3 2022
Remove old alderlake.td template file
Rebase patch
Nov 2 2022
Oct 26 2022
Oct 24 2022
Oct 23 2022
Delete mistakenly added files
Update comments
Oct 21 2022
Fix amdgpu test
Sep 13 2022
Sep 4 2022
Aug 18 2022
Thank you @RKSimon
Aug 3 2022
Rename alderlake to alderlake-p
Rename alderlake model to alderlake-p model
Aug 2 2022
Do you have enough public info to write the model manually, so that exegesis can then confirm it at least matches the total uops, throughput and latency counts?
Total uops and latency are available from uops.info, so we can set them in the schedule model automatically.
Throughput is different: LLVM calculates it (see MCSchedModel::getReciprocalThroughput) from the port description (resource, resource_cycles) instead of taking it as a directly defined value like latency. We need to infer the possible ports from the given throughput.
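As a rough illustration of that calculation, the sketch below follows my reading of MCSchedModel::getReciprocalThroughput: each consumed resource contributes cycles divided by its number of units, the largest value wins, and a class with no resources falls back to micro-ops divided by issue width. It is a simplified Python model, not LLVM code, and the (units, cycles) pairs are hypothetical.

```python
def reciprocal_throughput(resources, num_micro_ops=0, issue_width=1):
    """Simplified model of reciprocal throughput for one scheduling class.

    `resources` is a list of (num_units, cycles) pairs, one per consumed
    resource or resource group. Entries with zero cycles are skipped.
    """
    worst = None
    for num_units, cycles in resources:
        if cycles == 0:
            continue
        rt = cycles / num_units
        worst = rt if worst is None else max(worst, rt)
    if worst is not None:
        return worst
    # No resource usage recorded: assume issue-width-limited execution.
    return num_micro_ops / issue_width


# Hypothetical class: four port groups with 2, 3, 2 and 2 ports.
example = [(2, 2), (3, 2), (2, 1), (2, 2)]
print(reciprocal_throughput(example))
```

Going the other way is the hard part: a measured reciprocal throughput of 0.5 with one cycle of use suggests a two-port group, but it does not say which two ports.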
llvm-exegesis can give some reasonable latency / throughput numbers based on uops counters alone, and the latest AoM shows the actual ports for the Gracemont microarch. We had to do something similar for the Atom and SLM models.
Do you intend to add an alderlake-e model as well?
I'd like to add an adl-e model. The problem is that we have no instruction port information for Gracemont, since it has no events like uops.dispatch.port0. See https://uops.info/table.html