This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fix schedmodel pre/post-index loads and stores for Neoverse V2
ClosedPublic

Authored by SjoerdMeijer on Aug 31 2023, 1:06 AM.

Details

Summary

Pre and post-index loads and stores are modelled incorrectly. The address update is modelled with too high latency. This is easily visible in llvm-mca's output and we discussed that earlier here:

https://github.com/llvm/llvm-project/issues/61047#issuecomment-1452120079

Part of the problem has to do with the way operands are defined in the scheduling model. It affects other operands as well (see the fadd in https://godbolt.org/z/d1Gbr48cE), but this seems to be a good start.

This fixes the problem for the Neoverse V2. The problem also affects the other Neoverse cores (different schedmodels influenced each other), but I won't be fixing the others as it would require checking performance. For the V2, performance wise this is also ok.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Aug 31 2023, 1:06 AM
SjoerdMeijer requested review of this revision.Aug 31 2023, 1:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2023, 1:06 AM
david-arm added inline comments.
llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td
1367

This seems to be matching both pre- and post- forms. Same with the change above. Does that matter?

Oh, So it does. There are a lot of new tests, which is good to see. Are there any that test the loaded latency, not the index update?

ldr  x1, [x27], #254
ldr  x2, [x1], #254
rjj added a subscriber: rjj.Aug 31 2023, 4:30 AM

Thanks for taking a look!

Are there any that test the loaded latency, not the index update?

ldr x1, [x27], #254
ldr x2, [x1], #254

I have added this test. Test file V2-writeback.s is a new one. I have precommitted that locally, to only show the differences here.

With your test case added, and this patch applied, there is no difference before/after. In other words, the behaviour is not changed, which I think is what you wanted to check:

CHECK:      [0,0]     DeeeeER   .   ldr       x1, [x27], #254
CHECK-NEXT: [0,1]     D====eeeeER   ldr       x2, [x1], #254
llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td
1367

Another way of looking at this is that this is a correctness fix: the first operand is the update/increment. By fixing that, the correct performance behaviour just falls out from that. And yes, I think doing this for the pre- and post- forms is what we want.

dmgreen accepted this revision.Sep 1 2023, 1:34 AM

Thanks for taking a look!

Are there any that test the loaded latency, not the index update?

ldr x1, [x27], #254
ldr x2, [x1], #254

I have added this test. Test file V2-writeback.s is a new one. I have precommitted that locally, to only show the differences here.

With your test case added, and this patch applied, there is no difference before/after. In other words, the behaviour is not changed, which I think is what you wanted to check:

CHECK:      [0,0]     DeeeeER   .   ldr       x1, [x27], #254
CHECK-NEXT: [0,1]     D====eeeeER   ldr       x2, [x1], #254

It's not there at the moment, right? I'm not just being blind? So long as there is a test for it in the committed version then it sounds good to me. LGTM, thanks

This revision is now accepted and ready to land.Sep 1 2023, 1:34 AM
SjoerdMeijer added a comment.EditedSep 1 2023, 6:37 AM

It's not there at the moment, right? I'm not just being blind? So long as there is a test for it in the committed version then it sounds good to me. LGTM, thanks

I can confirm there's nothing wrong with your eyes!
File test/tools/llvm-mca/AArch64/Neoverse/V2-writeback.s is a new file, that I have "pre-committed locally". I have added your test to it, so is going to be there in the commit.

This revision was landed with ongoing or failed builds.Sep 4 2023, 3:09 AM
This revision was automatically updated to reflect the committed changes.
llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td