This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Order STP Q's by ascending address
ClosedPublic

Authored by avieira on May 11 2022, 5:12 AM.

Details

Summary

Hi,

This patch adds an AArch64 specific PostRA MachineScheduler to try to schedule STP Q's to the same base-address in ascending order of offsets. We have found this to improve performance on Neoverse N1 and should not hurt other AArch64 cores.

Diff Detail

Event Timeline

avieira created this revision.May 11 2022, 5:12 AM
avieira requested review of this revision.May 11 2022, 5:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2022, 5:12 AM

In order sounds sensible to me. We may need a subtarget feature for this, it depends on what the Apple folks think, but we can always add one later if needed.

llvm/lib/Target/AArch64/AArch64MachineScheduler.cpp
19

Can you add a comment explaining what this is doing and why.

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
488

Should this either return DAG, or create DAG inside the if.

avieira added inline comments.May 12 2022, 1:37 AM
llvm/lib/Target/AArch64/AArch64MachineScheduler.cpp
19

That sounds like a good idea yeah ;)

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
488

Ah yeah good catch! Forgot to change that when I moved the DSG creation out of the Fusion check.

fhahn added a comment.May 12 2022, 8:26 AM

In order sounds sensible to me. We may need a subtarget feature for this, it depends on what the Apple folks think, but we can always add one later if needed.

I *think* it should be fine, but I can double-check.

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
484

Might make sense to move the code from AArch64MacroFusion.cpp to the new "AArch64MachineScheduler.h"

avieira updated this revision to Diff 429991.May 17 2022, 3:35 AM
avieira added inline comments.
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
484

I don't have strong feelings about this, but other targets seem to keep mutations and MachineScheduler definitions separately. Could put them all in the same file if you prefer or leave it as is... I do assume you meant merge AArch64MacroFusion.{h,cpp} with respectively AArch64MachineScheduler.{h,cpp}.

So long as there is no issues from @fhahn, this LGTM.

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
479

Can you make sure you clang-format the patch? The formatting is a bit off here.

fhahn accepted this revision.May 19 2022, 1:11 PM

LGTM, I think that should be fine. ordering by ascending address is likely also slightly more readable for our users.

llvm/lib/Target/AArch64/AArch64MachineScheduler.cpp
19

might be good to reduce an early exit here if the candidate isn't valid to reduce the nesting level below.

23

nit: IMO it would be easier to follow the code if there would be a comment upfront describing what the whole block does, rather than interleaving the comments.

llvm/lib/Target/AArch64/AArch64MachineScheduler.h
34

nit: Stray newline

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
484

I do assume you meant merge AArch64MacroFusion.{h,cpp} with respectively AArch64MachineScheduler.{h,cpp}.

Yeah, moving the stuff from AArch64MacroFusion to AArch64MachineScheduler. In any case that's something that should be done one as follow-up.

This revision is now accepted and ready to land.May 19 2022, 1:11 PM
Allen added a subscriber: Allen.May 21 2022, 10:24 PM

hi @avieira , Does Neoverse N1 Out of order execution, why does the order of instruction launch significantly affect the performance?

Hi @Allen ,

I believe that 'scheduling doesn't matter for Out of Order cores' is a long-standing myth that we've seen not to be correct. Yes, scheduling is definitely different for out of order cores, the problem shifts towards thinking about a sliding window of instructions that go into specific pipelines and dispatch queues and the likes. And you now find yourself trying to optimize the utilisation of pipelines, avoiding bubbles, rather than looking for the 'perfect sequence'. Out of order execution has some other limits and in some cases it helps if the compiler can lend the core a hand. In this case the Neoverse N1 prefers ascending STP Q's and an updated Neoverse N1 Software Optimization Guide will be reflecting this.

This revision was landed with ongoing or failed builds.May 23 2022, 1:51 AM
This revision was automatically updated to reflect the committed changes.
Allen added a comment.Jun 1 2022, 6:54 PM

Thanks @avieira for your detailed explanation, and according your idea, I extend more stores in D126700, can you help me review, thanks.