For certain negative indices passed to the VECTOR_SPLICE operation
we can actually directly use the SVE splice instruction by creating
the appropriate predicate. The predicate needs to be constructed in
such a way that all but the last -idx elements are false. We can do
this efficiently using a combination of 'ptrue' (with the appropriate
fixed pattern, e.g. vl1, vl2, etc.) and 'rev'. The advantage of using
these instructions to generate the predicate is they do not set any
flags, unlike the whilelo instruction. This is critical when the splice
operation is in a loop, since we want MachineLICM to hoist the
predicate generation out of the loop.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This is likely to be an improvement, because we can calculate the predicate (ptrue+rev) before the loop, at which point all we need is the splice inside the loop, whereas currently it requires both a lastb + insr inside the loop.
Just left a few nits, otherwise looks good to me.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7835 | nit: IdxVal < 0 && abs(IdxVal) <= 8 | |
7836 | I don't think this check is necessary? | |
7841 | nit: can you use abs(IdxVal) instead? (I personally find that slightly more readable) |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7836 | Unforunately, it is necessary for correctness because the ptrue behaviour changes completely when you specify a fixed pattern that exceeds the number of elements. For example, if we do something like this: ptrue p0.d, vl4 and your vector length=128 bits, then ptrue actually returns an all-false predicate! So the maximum we can do safely without knowing vscale is: ptrue p0.d, vl2 |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7836 | If that happens it means that the original IR was incorrect, because it should never have generated a splice with offset -4 for a `vscale x 2 x eltty> if vscale can be lower than 2. This can be verified in the IR Verifier with an extra check on vscale_range , but it shouldn't be part of the check here. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7836 | OK, in that case maybe I can just change this to an assert and I will add code to the verifier as part of this patch if it's not already there. |
@sdesmalen it seems we have tests that explicitly rely upon working when the splice index > min number of elements. When I add something to the Verifier.cpp these tests fail:
Failed Tests (2):
LLVM :: CodeGen/AArch64/named-vector-shuffles-neon.ll LLVM :: CodeGen/AArch64/named-vector-shuffles-sve.ll
It looks like we permit out-of-bounds indices for all vectors, and in SelectionDAGBuilder::visitVectorSplice we do no checks for scalable vectors, but do return undef for out-of-bounds indices for fixed-width, i.e.
// VECTOR_SHUFFLE doesn't support a scalable mask so use a dedicated node. if (VT.isScalableVector()) { MVT IdxVT = TLI.getVectorIdxTy(DAG.getDataLayout()); setValue(&I, DAG.getNode(ISD::VECTOR_SPLICE, DL, VT, V1, V2, DAG.getConstant(Imm, DL, IdxVT))); return; } unsigned NumElts = VT.getVectorNumElements(); if ((-Imm > NumElts) || (Imm >= NumElts)) { // Result is undefined if immediate is out-of-bounds. setValue(&I, DAG.getUNDEF(VT)); return; }
I'm happy to change the LangRef and definition of the intrinsic so that we can enforce indices <= known min number of elements. This would allow me to remove the -IdxVal <= getVectorMinNumElements() check in the lowering code.
- Removed check for minimum number of elements by ensuring that the intrinsic will never contain indices that are out-of-bounds (see D115933).
- Added more indices by seeing if the index is a supported pattern for the ptrue instruction.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
7856 | Idx == -1 is already handled above? |
llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h | ||
---|---|---|
455 | Instead of making some arbitrary value 'invalid', maybe just return an Optional<unsigned> from getSVEPredPatternFromNumElements instead. As an additional benefit, the result value of other uses of getSVEPredPatternFromNumElements in AArch64ISelLowering are checked validity too. |
nit: IdxVal < 0 && abs(IdxVal) <= 8