This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fix schedmodel zero latency moves for Neoverse V2
ClosedPublic

Authored by rjj on Sep 5 2023, 4:32 AM.

Details

Summary

Model some register-to-register move operations and move immediate
operations as "zero latency moves", as described in the Software
Optimisation Guide (SOG), §4.12:

https://developer.arm.com/documentation/PJDOC-466751330-593177/r0p2/

NB I've assumed there's a mistake in the SOG on p. 63 in the following
instructions:

mov  h1, wzr
mov  h1, xzr
mov  s1, wzr
mov  d1, xzr

(The mov should be an fmov.)

Diff Detail

Event Timeline

rjj created this revision.Sep 5 2023, 4:32 AM
Herald added a project: Restricted Project. · View Herald Transcript
rjj requested review of this revision.Sep 5 2023, 4:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2023, 4:32 AM
rjj edited the summary of this revision. (Show Details)Sep 5 2023, 4:33 AM

I agree with these observations, wrote a couple of micro-benchmarks, and think that on Grace I can confirm:

These instructions do not utilize the scheduling and execution resources of the machine.

(taken from the SWOG).

I am assuming that fixing this helps LLVM-MCA? I don't expect this to make a difference for performance of the codegen, but what I can offer is to check that with a couple of performance runs on hardware as I am assuming you haven't been able to verify that.

I added one question inline, and am also curious what Dave thinks about this.

llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-basic-instructions.s
2535

This MOV seems to be affected, but doesn't seem to right?

rjj marked an inline comment as done.Sep 5 2023, 6:49 AM

I agree with these observations, wrote a couple of micro-benchmarks, and think that on Grace I can confirm:

These instructions do not utilize the scheduling and execution resources of the machine.

(taken from the SWOG).

I am assuming that fixing this helps LLVM-MCA? I don't expect this to make a difference for performance of the codegen, but what I can offer is to check that with a couple of performance runs on hardware as I am assuming you haven't been able to verify that.

I added one question inline, and am also curious what Dave thinks about this.

Yep, the main aim of this is to make MCA a bit more accurate. If you could check performance on hardware that would be great!

Cheers,
Ricardo

llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-basic-instructions.s
2535

That should be the last MOV listed on §4.12, MOV Xd, Xn.

SjoerdMeijer added inline comments.Sep 5 2023, 6:58 AM
llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-basic-instructions.s
2535

Ah sorry, missed that. I thought they were only zeroing moves, overlooked the last two moves in that list.

SjoerdMeijer accepted this revision.Sep 6 2023, 6:42 AM

I have ran SPEC FP and INT and this patch didn't change anything, which is what we were hoping for.
So I am okay with this patch and think it makes sense, but allow a few days for the Arm folks to comment before committing.

This revision is now accepted and ready to land.Sep 6 2023, 6:42 AM
dmgreen accepted this revision.Sep 6 2023, 1:27 PM

The patch sounds good, if the results look OK. Thanks.

rjj marked an inline comment as done.Sep 7 2023, 2:48 AM

I have ran SPEC FP and INT and this patch didn't change anything, which is what we were hoping for.
So I am okay with this patch and think it makes sense, but allow a few days for the Arm folks to comment before committing.

The patch sounds good, if the results look OK. Thanks.

Grand, thanks very much!

This revision was landed with ongoing or failed builds.Sep 7 2023, 2:56 AM
This revision was automatically updated to reflect the committed changes.