This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Convert sra(X, elt_size(X)-1) to cmlt(X, 0)
ClosedPublic

Authored by labrinea on Dec 9 2021, 11:05 AM.

Details

Summary

CMLT has twice the execution throughput of SSHR on Arm out-of-order cores.

Diff Detail

Event Timeline

labrinea created this revision.Dec 9 2021, 11:05 AM
labrinea requested review of this revision.Dec 9 2021, 11:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2021, 11:05 AM

Can you add a reasoning to the commit message? As far as I understand, the CMLT has a higher throughput on many cpus than the sshr.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11229

This add exemption makes it a bit awkward. Do we need to convert this early, or could we just add tablegen patterns and let it pick the "best" one for the given code, like it is fairly descent at.

Something like this, for each type, might be enough:

def : Pat<(v16i8 (AArch64vashr (v16i8 V128:$Rn), (i32 7))),
          (CMLTv16i8rz V128:$Rn)>;
llvm/test/CodeGen/AArch64/arm64-vshr.ll
49

I would leave the old test name, or maybe change the constant so it's no longer 63 and still tests a sshr is produced.

chill added a subscriber: chill.Dec 10 2021, 2:08 AM
chill added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11229

Why is ADD an exception?

labrinea added inline comments.Dec 13 2021, 11:50 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11229

When the Shift is followed by an Add we should prefer emitting a single instruction (SSRA) instead of two instructions (CMLT, ADD). As Dave suggested above I might be able to work around this with tablegen patterns.

labrinea updated this revision to Diff 393987.Dec 13 2021, 11:59 AM
labrinea edited the summary of this revision. (Show Details)
  • Reimplemented the transformation using tablegen patterns.
  • Updated the description to explain the motivation behind this change.
dmgreen accepted this revision.Dec 14 2021, 1:22 AM

Thanks. LGTM with a couple of suggestions.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
4179 ↗(On Diff #393987)

I would remove all these empty lines, to keep the related patterns together.

4815 ↗(On Diff #393987)

There is a v1i64 CMLT here if you want to add the pattern for that too. v1 types usually matter less but it may be good to have it for consistency.

This revision is now accepted and ready to land.Dec 14 2021, 1:22 AM
labrinea added inline comments.Dec 14 2021, 1:51 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
4815 ↗(On Diff #393987)

There is no 1D variant if that's what you meant. According to ArmARM the encoding (size = 11 , Q = 0) is reserved.

dmgreen added inline comments.Dec 14 2021, 1:56 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
4815 ↗(On Diff #393987)

This is the "Scalar" variant, not the "Vector" variant.

This one, I think: https://developer.arm.com/documentation/dui0802/a/A64-Advanced-SIMD-Scalar-Instructions/CMLT--scalar--zero-

This revision was landed with ongoing or failed builds.Dec 14 2021, 8:09 AM
This revision was automatically updated to reflect the committed changes.