This is an archive of the discontinued LLVM Phabricator instance.

[SVE][CodeGen] Use splice instruction when lowering VECTOR_SPLICE
ClosedPublic

Authored by david-arm on Dec 16 2021, 1:10 AM.

Details

Summary

For certain negative indices passed to the VECTOR_SPLICE operation
we can actually directly use the SVE splice instruction by creating
the appropriate predicate. The predicate needs to be constructed in
such a way that all but the last -idx elements are false. We can do
this efficiently using a combination of 'ptrue' (with the appropriate
fixed pattern, e.g. vl1, vl2, etc.) and 'rev'. The advantage of using
these instructions to generate the predicate is they do not set any
flags, unlike the whilelo instruction. This is critical when the splice
operation is in a loop, since we want MachineLICM to hoist the
predicate generation out of the loop.

Diff Detail

Event Timeline

david-arm created this revision.Dec 16 2021, 1:10 AM
david-arm requested review of this revision.Dec 16 2021, 1:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 16 2021, 1:10 AM

This is likely to be an improvement, because we can calculate the predicate (ptrue+rev) before the loop, at which point all we need is the splice inside the loop, whereas currently it requires both a lastb + insr inside the loop.
Just left a few nits, otherwise looks good to me.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7835

nit: IdxVal < 0 && abs(IdxVal) <= 8

7836

I don't think this check is necessary?

7841

nit: can you use abs(IdxVal) instead? (I personally find that slightly more readable)

david-arm added inline comments.Dec 16 2021, 5:21 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7836

Unforunately, it is necessary for correctness because the ptrue behaviour changes completely when you specify a fixed pattern that exceeds the number of elements. For example, if we do something like this:

ptrue p0.d, vl4

and your vector length=128 bits, then ptrue actually returns an all-false predicate! So the maximum we can do safely without knowing vscale is:

ptrue p0.d, vl2
sdesmalen added inline comments.Dec 16 2021, 5:26 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7836

If that happens it means that the original IR was incorrect, because it should never have generated a splice with offset -4 for a `vscale x 2 x eltty> if vscale can be lower than 2. This can be verified in the IR Verifier with an extra check on vscale_range , but it shouldn't be part of the check here.

david-arm added inline comments.Dec 16 2021, 5:29 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7836

OK, in that case maybe I can just change this to an assert and I will add code to the verifier as part of this patch if it's not already there.

@sdesmalen it seems we have tests that explicitly rely upon working when the splice index > min number of elements. When I add something to the Verifier.cpp these tests fail:

Failed Tests (2):

LLVM :: CodeGen/AArch64/named-vector-shuffles-neon.ll
LLVM :: CodeGen/AArch64/named-vector-shuffles-sve.ll

It looks like we permit out-of-bounds indices for all vectors, and in SelectionDAGBuilder::visitVectorSplice we do no checks for scalable vectors, but do return undef for out-of-bounds indices for fixed-width, i.e.

// VECTOR_SHUFFLE doesn't support a scalable mask so use a dedicated node.
if (VT.isScalableVector()) {
  MVT IdxVT = TLI.getVectorIdxTy(DAG.getDataLayout());
  setValue(&I, DAG.getNode(ISD::VECTOR_SPLICE, DL, VT, V1, V2,
                           DAG.getConstant(Imm, DL, IdxVT)));
  return;
}

unsigned NumElts = VT.getVectorNumElements();

if ((-Imm > NumElts) || (Imm >= NumElts)) {
  // Result is undefined if immediate is out-of-bounds.
  setValue(&I, DAG.getUNDEF(VT));
  return;
}

I'm happy to change the LangRef and definition of the intrinsic so that we can enforce indices <= known min number of elements. This would allow me to remove the -IdxVal <= getVectorMinNumElements() check in the lowering code.

david-arm updated this revision to Diff 395096.Dec 17 2021, 4:50 AM
  • Removed check for minimum number of elements by ensuring that the intrinsic will never contain indices that are out-of-bounds (see D115933).
  • Added more indices by seeing if the index is a supported pattern for the ptrue instruction.
sdesmalen added inline comments.Dec 20 2021, 7:18 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7856

Idx == -1 is already handled above?

david-arm updated this revision to Diff 397253.Jan 4 2022, 3:44 AM
  • Change IdxVal >= -1 check to be IdxVal >= 0 because the -1 is now already covered.
sdesmalen added inline comments.Jan 4 2022, 4:07 AM
llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
455

Instead of making some arbitrary value 'invalid', maybe just return an Optional<unsigned> from getSVEPredPatternFromNumElements instead.
Because any value that's not between 0 and 31 inclusive, is invalid.

As an additional benefit, the result value of other uses of getSVEPredPatternFromNumElements in AArch64ISelLowering are checked validity too.

david-arm updated this revision to Diff 397295.Jan 4 2022, 7:13 AM
  • Changed getSVEPredPatternFromNumElements to return Optional<unsigned>
david-arm marked an inline comment as done.Jan 4 2022, 7:14 AM
sdesmalen accepted this revision.Jan 4 2022, 7:32 AM

Thanks, LGTM!

This revision is now accepted and ready to land.Jan 4 2022, 7:32 AM
Matt added a subscriber: Matt.Jan 4 2022, 9:18 AM
This revision was landed with ongoing or failed builds.Jan 11 2022, 3:58 AM
This revision was automatically updated to reflect the committed changes.