CMLT has twice the execution throughput of SSHR on Arm out-of-order cores.
Details
Diff Detail
Event Timeline
Can you add a reasoning to the commit message? As far as I understand, the CMLT has a higher throughput on many cpus than the sshr.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
11229 | This add exemption makes it a bit awkward. Do we need to convert this early, or could we just add tablegen patterns and let it pick the "best" one for the given code, like it is fairly descent at. Something like this, for each type, might be enough: def : Pat<(v16i8 (AArch64vashr (v16i8 V128:$Rn), (i32 7))), (CMLTv16i8rz V128:$Rn)>; | |
llvm/test/CodeGen/AArch64/arm64-vshr.ll | ||
49 | I would leave the old test name, or maybe change the constant so it's no longer 63 and still tests a sshr is produced. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
11229 | Why is ADD an exception? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
11229 | When the Shift is followed by an Add we should prefer emitting a single instruction (SSRA) instead of two instructions (CMLT, ADD). As Dave suggested above I might be able to work around this with tablegen patterns. |
- Reimplemented the transformation using tablegen patterns.
- Updated the description to explain the motivation behind this change.
Thanks. LGTM with a couple of suggestions.
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
4179 ↗ | (On Diff #393987) | I would remove all these empty lines, to keep the related patterns together. |
4815 ↗ | (On Diff #393987) | There is a v1i64 CMLT here if you want to add the pattern for that too. v1 types usually matter less but it may be good to have it for consistency. |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
4815 ↗ | (On Diff #393987) | There is no 1D variant if that's what you meant. According to ArmARM the encoding (size = 11 , Q = 0) is reserved. |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
4815 ↗ | (On Diff #393987) | This is the "Scalar" variant, not the "Vector" variant. This one, I think: https://developer.arm.com/documentation/dui0802/a/A64-Advanced-SIMD-Scalar-Instructions/CMLT--scalar--zero- |
This add exemption makes it a bit awkward. Do we need to convert this early, or could we just add tablegen patterns and let it pick the "best" one for the given code, like it is fairly descent at.
Something like this, for each type, might be enough: