The optimization guide can be found here:
https://developer.arm.com/documentation/PJDOC-466751330-18256/latest/
Hello - I've been looking at scheduling a little lately again. I presume this was created by transcribing the values from the Software Optimization guide? It looks nice and clean from what I can see.
I think we can use this new schedule for all "Arm-v9" cores in AArch64.td (that are not in-order). It will almost certainly be a better fit than the older A57 model, and be good to get some decent SVE information.
Can you take the tests from llvm/test/tools/llvm-mca/AArch64/Cortex/A55-basic-instructions.s and llvm/test/tools/llvm-mca/AArch64/Cortex/A55-neon-instructions.s and replicate them for the new model, and preferably find a way for writing an SVE equivalent? We've found that those files make a great test of the information in the model.
Hi Dave, that's right, it's based on the optimization guide.
> I think we can use this new schedule for all "Arm-v9" cores in AArch64.td (that are not in-order). It will almost certainly be a better fit than the older A57 model, and be good to get some decent SVE information.
Sounds good. My only concern is that we're quite keen to get this into LLVM 15, which I believe branches around the middle of July, and making this the default for all v9 cores presumably raises the bar in terms of validating that it is better than the A57 model for those cores?
> Can you take the tests from llvm/test/tools/llvm-mca/AArch64/Cortex/A55-basic-instructions.s and llvm/test/tools/llvm-mca/AArch64/Cortex/A55-neon-instructions.s and replicate them for the new model, and preferably find a way for writing an SVE equivalent? We've found that those files make a great test of the information in the model.
I'll have a look.
Add tests and missing info for EOR(BT|TB)_ZZZ_[BHSD], LDNT1D_ZZR_D_REAL, and STNT1D_ZZR_D instructions.
> Can you take the tests from llvm/test/tools/llvm-mca/AArch64/Cortex/A55-basic-instructions.s and llvm/test/tools/llvm-mca/AArch64/Cortex/A55-neon-instructions.s and replicate them for the new model, and preferably find a way for writing an SVE equivalent? We've found that those files make a great test of the information in the model.
Done. For SVE I used the objdump output from the MC tests, removed the duplicates and sorted on opcode. There are probably still a few duplicate variants, but the coverage should be good.
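The dedupe-and-sort step described above could be sketched roughly as follows. This is only an illustration, not the script actually used for the patch: it assumes the input lines are plain instruction text (address and encoding columns already stripped), and the helper name `dedupe_and_sort` is hypothetical.

```python
def dedupe_and_sort(lines):
    """Drop duplicate instruction lines and sort the rest by opcode mnemonic."""
    seen = set()
    unique = []
    for line in lines:
        # Normalise whitespace so "add  z0.b" and "add z0.b" compare equal.
        text = " ".join(line.split())
        if not text or text in seen:
            continue
        seen.add(text)
        unique.append(text)
    # The mnemonic is the first token; sort on it, then on the full text
    # so that variants of the same opcode come out in a stable order.
    return sorted(unique, key=lambda t: (t.split()[0], t))


# Hypothetical sample input, standing in for disassembly taken from the MC tests.
sample = [
    "uzp1 z0.b, z1.b, z2.b",
    "add  z0.b, z0.b, z1.b",
    "add z0.b, z0.b, z1.b",
    "abs z0.h, p0/m, z1.h",
]

for insn in dedupe_and_sort(sample):
    print(insn)
```

Because the comparison is on the whole normalised line, two encodings of the same opcode with different operands both survive, which matches the remark that a few duplicate variants may remain.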
Could you add something modern to the basic instruction test, such as MEMTAG and BTI?
llvm/lib/Target/AArch64/AArch64SchedA64FX.td | ||
---|---|---|
21 | Is the change to CompleteModel intentional in this patch? |
BTI is just an alias of HINT, but I've added tests for MTE based on llvm/test/MC/Disassembler/AArch64/armv8.5a-mte.txt.
llvm/lib/Target/AArch64/AArch64SchedA64FX.td | ||
---|---|---|
21 | It is, thanks for pointing it out. My intention was to notify Fujitsu engineers when I put the patch up, but I completely forgot. The A64FX model is missing info for FTSSEL, FMSB and PFIRST. Also, RDFFR info is set on the pseudo and not the real instruction. I suspect there's a bug in the scheduling code somewhere, since the model was marked as complete yet these missing instructions weren't detected. |
> Done. For SVE I used the objdump output from the MC tests, removed the duplicates and sorted on opcode. There are probably still a few duplicate variants, but the coverage should be good.
Thanks, that's great.
> I think we can use this new schedule for all "Arm-v9" cores in AArch64.td (that are not in-order). It will almost certainly be a better fit than the older A57 model, and be good to get some decent SVE information.
> Sounds good. My only concern is that we're quite keen to get this into LLVM 15, which I believe branches around the middle of July, and making this the default for all v9 cores presumably raises the bar in terms of validating that it is better than the A57 model for those cores?
I think for Cortex-A710 we can change it; they are very similar microarchitectures. The performance checks I've run seem just fine for them. I would change the Cortex-X2 and NeoverseV1 as well: although the cores are a little different, it would be a shame to leave them off. Neoverse512TVB I know less about, but as it doesn't relate to any specific core I would say a model with SVE scheduling is better than one without. We would presumably add all new cores with the new schedule as opposed to the A57 model, and I would prefer not to leave the other SVE cores behind in that regard.
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s | ||
---|---|---|
1859 | Is this missing? | |
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s | ||
4195 | Perhaps try and remove some of these. | |
4466 | We can probably get away without different conditions. I don't think they should be important for scheduling. |
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s | ||
---|---|---|
1859 | I can't see any issue with this? |
Sorry for the delayed response, I've been off for a week. Thanks for checking the performance. I've set this model as the default for the cores you listed in D129203.
Thanks for the cleanup and second patch. I think apart from the EXTR question this looks good.
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s | ||
---|---|---|
1859 | I meant, should it be the same as extr x? The software optimization guide mentions "Bitfield extract, one reg" and "Bitfield extract, two regs", but doesn't distinguish between X and W registers. We don't model the difference between EXTR where both operands are the same (if that's what "one reg" means); that should be fine as it sounds minor. But should the match include the W form of EXTR too: def : InstRW<[N2Write_3cyc_1I_1M], (instrs EXTRXrri, EXTRWrri)>; |
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s | ||
---|---|---|
1859 | I was confused by one reg / two reg in the guide as well. I just copied the A57 model, which implements it like this. I don't know if that's correct, I'll check. |
The one reg variant of EXTR (both input registers are the same, can’t be modelled) was incorrectly modelled as the W form. Use the two regs properties for W form.
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s | ||
---|---|---|
1859 | One register means both inputs are the same, as you thought. This should probably be fixed for the A57 as well. |
Thanks LGTM
> The one reg variant of EXTR (both input registers are the same, can’t be modelled)
It may be possible with a SchedPredicate, but I think it's fairly minor. This patch is looking good.
First I've heard of SchedPredicate, good to know. And thanks for reviewing Dave, appreciate it.
llvm/lib/Target/AArch64/AArch64SchedNeoverseN2.td | ||
---|---|---|
191 | @c-rhodes Hi, I've been doing something similar lately, and I do not understand ResourceCycles. I can see that llvm-mca uses it to calculate throughput, but I do not see it used anywhere else. Do you know how ResourceCycles affects instruction scheduling? |
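For context on the throughput part of that question, here is a minimal sketch of how a reciprocal-throughput figure can be derived from per-resource cycle counts. It follows my understanding of the calculation behind llvm-mca's reported RThroughput and is not taken from this patch; the function name, the resource names `UnitA`/`UnitB`, and the numbers are all hypothetical. It does not cover how the instruction scheduler itself uses these values.

```python
def reciprocal_throughput(resource_cycles, num_units, num_micro_ops=1, issue_width=1):
    """Estimate reciprocal throughput (cycles per instruction) for one sched class.

    resource_cycles: resource name -> cycles the instruction occupies that resource
    num_units:       resource name -> number of identical units of that resource
    """
    best = None
    for name, cycles in resource_cycles.items():
        if cycles == 0:
            continue  # a resource held for zero cycles does not limit throughput
        # Instructions per cycle this resource alone could sustain.
        throughput = num_units[name] / cycles
        # The most constrained resource is the bottleneck.
        best = throughput if best is None else min(best, throughput)
    if best is not None:
        return 1.0 / best
    # Assumed fallback when no resource is consumed: limited only by issue width.
    return num_micro_ops / issue_width


# Hypothetical example: one cycle on a two-unit resource, four cycles on a single unit.
print(reciprocal_throughput({"UnitA": 1, "UnitB": 4}, {"UnitA": 2, "UnitB": 1}))
```

In this sketch a larger cycle count on a scarce resource directly raises the reported reciprocal throughput, which is why an incorrect ResourceCycles entry shows up as a wrong throughput column in the llvm-mca tests.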
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s | ||
---|---|---|
7174 | I found that the throughput information for this instruction (st4d) does not match the description in the documentation. Is there a problem with the document? |
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s | ||
---|---|---|
7174 | @Cullen Rhodes |
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s | ||
---|---|---|
7174 | I've seen your comment. I don't have an immediate answer but will check. |
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s | ||
---|---|---|
7174 | As you pointed out, the throughput for the ST4 instructions here is incorrect; it should be 9. I noticed ST2 doesn't look correct either. Thanks for reporting, we'll get it fixed. |
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s | ||
---|---|---|
7174 | Okay, look forward to your fix. |
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s | ||
---|---|---|
7174 | I'm no longer working in this area. Someone from my team will fix it, but it's not urgent enough to drop everything they're currently doing unless there's something I'm not aware of. Is this blocking you somehow? |
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s | ||
---|---|---|
7174 | Haha, I've had a similar problem recently, but I don't know how to solve it, so it's urgent. |