This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Initial sched model for Neoverse N2
ClosedPublic

Authored by c-rhodes on Jun 27 2022, 3:14 AM.

Diff Detail

Event Timeline

c-rhodes created this revision.Jun 27 2022, 3:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2022, 3:14 AM
c-rhodes requested review of this revision.Jun 27 2022, 3:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2022, 3:14 AM

Hello - I've been looking at scheduling a little lately again. I presume this was created by transcribing the values from the Software Optimization guide? It looks nice and clean from what I can see.

I think we can use this new schedule for all "Arm-v9" cores in AArch64.td (that are not in-order). It will almost certainly be a better fit than the older A57 model, and be good to get some decent SVE information.

Can you take the tests from llvm/test/tools/llvm-mca/AArch64/Cortex/A55-basic-instructions.s and llvm/test/tools/llvm-mca/AArch64/Cortex/A55-neon-instructions.s and replicate them for the new model, and preferably find a way for writing a sve equivalent? We've found that those files make a great test of the information in the model.

Hello - I've been looking at scheduling a little lately again. I presume this was created by transcribing the values from the Software Optimization guide? It looks nice and clean from what I can see.

Hi Dave, that's right it's based on the optimization guide.

I think we can use this new schedule for all "Arm-v9" cores in AArch64.td (that are not in-order). It will almost certainly be a better fit than the older A57 model, and be good to get some decent SVE information.

Sounds good, my only concern is we're quite keen to get this into LLVM 15 which branches around the middle of July I believe and making this the default for all v9 cores presumably raises the bar in terms of validating it is better than the A57 for the v9 cores?

Can you take the tests from llvm/test/tools/llvm-mca/AArch64/Cortex/A55-basic-instructions.s and llvm/test/tools/llvm-mca/AArch64/Cortex/A55-neon-instructions.s and replicate them for the new model, and preferably find a way for writing a sve equivalent? We've found that those files make a great test of the information in the model.

I'll have a look.

c-rhodes updated this revision to Diff 440557.Jun 28 2022, 3:39 AM

Add tests and missing info for EOR(BT|TB)_ZZZ_[BHSD], LDNT1D_ZZR_D_REAL, and STNT1D_ZZR_D instructions.

Can you take the tests from llvm/test/tools/llvm-mca/AArch64/Cortex/A55-basic-instructions.s and llvm/test/tools/llvm-mca/AArch64/Cortex/A55-neon-instructions.s and replicate them for the new model, and preferably find a way for writing a sve equivalent? We've found that those files make a great test of the information in the model.

Done. For SVE I used the objdump output from the MC tests, removed the duplicates and sorted on opcode. There's still probably a few duplicate variants but the coverage should be good.

Could you add something modern to the basic instruction test, like, e.g., MEMTAG and BTI?

Matt added a subscriber: Matt.Jun 28 2022, 2:21 PM
Matt added inline comments.Jun 28 2022, 2:31 PM
llvm/lib/Target/AArch64/AArch64SchedA64FX.td
21

Is the change to CompleteModel intentional in this patch?

c-rhodes updated this revision to Diff 440891.Jun 29 2022, 1:34 AM

Add tests for MTE and missing info for STGM/IRGstack instructions.

Could you add something modern to the basic instruction test, like, e.g., MEMTAG and BTI?

BTI is just an alias of HINT, but I've added tests for MTE based on llvm/test/MC/Disassembler/AArch64/armv8.5a-mte.txt.

llvm/lib/Target/AArch64/AArch64SchedA64FX.td
21

Is the change to CompleteModel intentional in this patch?

It is, thanks for pointing it out my intention was to notify Fujitsu engineers when I put the patch up, but I completely forgot.

@kawashima-fj @yutsumi

The A64FX model is missing info for FTSSEL, FMSB, PFIRST. Also, RDFFR info is set on the pseudo and not the real instruction.

I suspect there's a bug in the scheduling code somewhere since the model was marked as complete, yet these missing instructions weren't detected
until I added them for this model. I can't add the missing info for your model so I've just set it to incomplete, but thought I'd let you know.

kawashima-fj added inline comments.Jun 29 2022, 5:08 AM
llvm/lib/Target/AArch64/AArch64SchedA64FX.td
21

The A64FX model is missing info for FTSSEL, FMSB, PFIRST. Also, RDFFR info is set on the pseudo and not the real instruction.

@c-rhodes Thanks for notifying me. We Fujitsu will address them and then revert CompleteModel to 1 shortly.

Done. For SVE I used the objdump output from the MC tests, removed the duplicates and sorted on opcode. There's still probably a few duplicate variants but the coverage should be good.

Thanks, that's great.

I think we can use this new schedule for all "Arm-v9" cores in AArch64.td (that are not in-order). It will almost certainly be a better fit than the older A57 model, and be good to get some decent SVE information.

Sounds good, my only concern is we're quite keen to get this into LLVM 15 which branches around the middle of July I believe and making this the default for all v9 cores presumably raises the bar in terms of validating it is better than the A57 for the v9 cores?

I think for Cortex-A710 we can change it, they are very similar microarchitectures. The performance checks I've ran seem just fine for them. I would change the Cortex-X2 and NeoverseV1 as well, although the core is a little different it would be a shame to leave them off. Neoverse512TVB I know less about, but as it doesn't relate to any specific core I would say a model with SVE scheduling is better than one without. We would presumably add all new cores with the new schedule as opposed to the A57 model, and I would prefer not to leave the other SVE cores behind in that regard.

llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s
1858

Is this missing?

llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
4194

Perhaps try and remove some of these.

4465

We can probably get away without different conditions. I don't think they should be important for scheduling.

c-rhodes updated this revision to Diff 442560.Jul 6 2022, 7:15 AM

Update SVE test

c-rhodes marked 2 inline comments as done.Jul 6 2022, 7:16 AM
c-rhodes added inline comments.
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s
1858

Is this missing?

I can't see any issue with this?

I think we can use this new schedule for all "Arm-v9" cores in AArch64.td (that are not in-order). It will almost certainly be a better fit than the older A57 model, and be good to get some decent SVE information.

Sounds good, my only concern is we're quite keen to get this into LLVM 15 which branches around the middle of July I believe and making this the default for all v9 cores presumably raises the bar in terms of validating it is better than the A57 for the v9 cores?

I think for Cortex-A710 we can change it, they are very similar microarchitectures. The performance checks I've ran seem just fine for them. I would change the Cortex-X2 and NeoverseV1 as well, although the core is a little different it would be a shame to leave them off. Neoverse512TVB I know less about, but as it doesn't relate to any specific core I would say a model with SVE scheduling is better than one without. We would presumably add all new cores with the new schedule as opposed to the A57 model, and I would prefer not to leave the other SVE cores behind in that regard.

Sorry for the delayed response I've been off for a week. Thanks for checking the performance, I've set this model as the default for the cores you listed in D129203.

Thanks for the cleanup and second patch. I think apart from the EXTR question this looks good.

llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s
1858

I meant should it be the same as extr x? The software optimization guide mentions "Bitfield extract, one reg" and "Bitfield extract, two regs", but doesn't make a distinction between X regs and W.

We don't model the difference between EXTR where both operands are the same (if that's what "one reg" means) - that should be fine as it sounds minor. But should the matching be including W EXT too:

def : InstRW<[N2Write_3cyc_1I_1M], (instrs EXTRXrri, EXTRWrri)>;
c-rhodes added inline comments.Jul 7 2022, 1:12 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s
1858

I meant should it be the same as extr x? The software optimization guide mentions "Bitfield extract, one reg" and "Bitfield extract, two regs", but doesn't make a distinction between X regs and W.

We don't model the difference between EXTR where both operands are the same (if that's what "one reg" means) - that should be fine as it sounds minor. But should the matching be including W EXT too:

def : InstRW<[N2Write_3cyc_1I_1M], (instrs EXTRXrri, EXTRWrri)>;

I was confused by one reg / two reg in the guide as well, I just copied the A57 which implements it like this. I don't know if that's correct, I'll check.

c-rhodes updated this revision to Diff 442826.Jul 7 2022, 2:26 AM

The one reg variant of EXTR (both input registers are the same, can’t be modelled) was incorrectly modelled as the W form. Use the two regs properties for W form.

c-rhodes marked an inline comment as done.Jul 7 2022, 2:28 AM
c-rhodes added inline comments.
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s
1858

I meant should it be the same as extr x? The software optimization guide mentions "Bitfield extract, one reg" and "Bitfield extract, two regs", but doesn't make a distinction between X regs and W.

We don't model the difference between EXTR where both operands are the same (if that's what "one reg" means) - that should be fine as it sounds minor. But should the matching be including W EXT too:

def : InstRW<[N2Write_3cyc_1I_1M], (instrs EXTRXrri, EXTRWrri)>;

I was confused by one reg / two reg in the guide as well, I just copied the A57 which implements it like this. I don't know if that's correct, I'll check.

One register means both inputs are the same as you thought, should probably be fixed for the A57 as well.

dmgreen accepted this revision.Jul 7 2022, 3:10 AM

Thanks LGTM

The one reg variant of EXTR (both input registers are the same, can’t be modelled)

It may be possible with a SchedulePredicate, but I think it's fairly minor. This patch is looking good.

This revision is now accepted and ready to land.Jul 7 2022, 3:10 AM

Thanks LGTM

The one reg variant of EXTR (both input registers are the same, can’t be modelled)

It may be possible with a SchedulePredicate, but I think it's fairly minor. This patch is looking good.

First I've heard of SchedPredicate, good to know. And thanks for reviewing Dave, appreciate it.

This revision was landed with ongoing or failed builds.Jul 8 2022, 2:39 AM
This revision was automatically updated to reflect the committed changes.
dewen added a subscriber: dewen.Aug 26 2022, 4:43 AM
dewen added inline comments.
llvm/lib/Target/AArch64/AArch64SchedNeoverseN2.td
191

@c-rhodes Hi, I've been doing something similar lately. I do not understand ResourceCycles. I find that llvm-mca is used to calculate throughput. I do not see that this is used in other places. Do you know how ResourceCycles affect instruction scheduling?

dewen added inline comments.Aug 26 2022, 6:31 PM
llvm/lib/Target/AArch64/AArch64SchedNeoverseN2.td
191

@c-rhodes @dmgreen Excuse me, could you please answer my question?

dewen removed a subscriber: dewen.Nov 10 2022, 11:01 PM
dewen added inline comments.Nov 10 2022, 11:56 PM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
7174

I found that the throughput information for this instruction (st4d) does not match the description in the documentation. Is there a problem with the document?

dewen added a subscriber: dewen.Nov 11 2022, 12:05 AM
dewen added inline comments.Nov 11 2022, 1:40 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
7174

I found that the throughput information for this instruction (st4d) does not match the description in the documentation. Is there a problem with the document?

@Cullen Rhodes

c-rhodes added inline comments.Nov 11 2022, 1:47 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
7174

I found that the throughput information for this instruction (st4d) does not match the description in the documentation. Is there a problem with the document?

@Cullen Rhodes

I've seen your comment, I don't have an immediate answer but will check.

dewen added inline comments.Nov 13 2022, 5:59 PM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
7174

Hey, has this comment come to a conclusion? @c-rhodes

c-rhodes added inline comments.Nov 14 2022, 12:51 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
7174

Hey, has this comment come to a conclusion? @c-rhodes

The throughput for the ST4 instructions here is incorrect as you pointed out, it should be 9. I noticed ST2 doesn't look correct either. Thanks for reporting, we'll get it fixed.

dewen added inline comments.Nov 14 2022, 11:48 PM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
7174

Okay, look forward to your fix.

dewen added inline comments.Nov 16 2022, 6:51 PM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
7174

Hi, when are you planning to fix this? I'm looking forward to it. @c-rhodes

c-rhodes added inline comments.Nov 17 2022, 12:58 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
7174

Hi, when are you planning to fix this? I'm looking forward to it. @c-rhodes

I'm no longer working in this area, someone from my team will fix it but it's not urgent enough to drop everything they're currently doing unless there's something I'm not aware of. Is this blocking you somehow?

dewen added inline comments.Nov 18 2022, 3:05 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-sve-instructions.s
7174

Haha, I've had a similar problem recently, but I don't know how to solve it, so it's urgent.

llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s