This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Match VLS all-1's masks to PTRUE
AbandonedPublic

Authored by cameron.mcinally on Feb 18 2022, 11:42 AM.

Details

Summary

Here's a patch to match VLS all-1s masks to PTRUE. There were a few places, before and after legalization, that this could be done, but I think SETCC_MERGE_ZERO combines are the best fit.

Diff Detail

Event Timeline

cameron.mcinally requested review of this revision.Feb 18 2022, 11:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2022, 11:42 AM

Why didn't or cannot InstCombine catch this?

Why didn't or cannot InstCombine catch this?

I may be misunderstanding, but this pattern is just a legalized truncate, e.g. ({1,1,1,1} & splat(1)) != splat(0). The VLS->VLA transition is creating a bunch of extra nodes that need to be matched.

Fix formatting for the Lint bots.

Notice the "sign_extend" -> "sext" change to fit in 80 columns. This no longer matches the ISD node naming scheme, so it's a little weird. I didn't see a better fix for it though.

david-arm added a subscriber: craig.topper.EditedFeb 21 2022, 6:49 AM

Hi @cameron.mcinally, I really like what you're trying to do in this patch and the codegen indeed looks a lot better! I just had some suggestions about a possibly simpler, and more comprehensive approach that might give us more overall benefit.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17054

I think you also need to check for && !Negated here. Alternatively, I think you could just do:

APInt SplatVal;
if (isAllActivePredicate(DAG, Pred) && LHS.getOpcode() == ISD::AND &&
    ISD::isConstantSplatVector(LHS.getOperand(1), SplatVal) && SplatVal == 1) {

The only additional value that isPow2Splat adds here is that it also checks for AArch64ISD::DUP nodes, but I imagine at the point we're doing the DAG combines here we haven't generated an AArch64 ISD node yet?

17060

It feels like we should have a more basic DAG combine here, i.e. something in either DAGCombiner::visitINSERT_SUBVECTOR or in performInsertSubvectorCombine, that basically combines

<vscale x M x iXY> insert_subvector <vscale x M x iXY> undef, <N x iXY> <splat of iXY A>

into

<vscale x M x iXY> <splat of iXY A>

when we know that vscale x M == N.

If you implement such a DAG combine it might benefit other parts of the code too? It would then mean here you should only have to check if Trunc is a splat of 1, which may also catch more cases.

17062

Again, here I think you need to check && !TruncNegated

Hi @cameron.mcinally, sorry I mentioned @craig.topper in my previous comment, but I meant you. It's because I've also just reviewed one of Craig's patches so I got mixed up. :)

Does D120328 achieve the effect you're after @cameron.mcinally? I need to pull out and extend the LowerSPLAT_VECTOR related change but figured I'd push up my current work in case it helps.

cameron.mcinally updated this revision to Diff 410604.EditedFeb 22 2022, 11:58 AM

Updated patch based on @david-arm's review.

@paulwalker-arm, D120328 looks good too. I'm happy to go with that one. But I notice that it will define bits that may be undef otherwise. E.g. we're inserting a 1/4 width vector into a full width vector, or Idx != 0.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14458

I'm not sure if getVScaleForTuning is the right way to go here, but it seemed like the cleanest solution.

I also wonder if all insert_subvector(undef, splat(X), 0)->splat(X) should be canonicalized here. I don't have a strong opinion on it though.

I also wondered about defining bits that may be undef otherwise, but am unsure how much is really matters. I'll see if I can limit (when or perhaps just handle the constant case) the combine to reduce any potential downsides and report back. I'll also note that I believe this patch suffers the same problem because getVScaleForTuning() is only a hint. You can use getMinSVEVectorSizeInBits and getMaxSVEVectorSizeInBits to see if the true size is known but the downside if that you'll only be optimising the cases when the fixed length vector is the same size as the scalar equivalent, which means not all vectorised loops will see the benefit.

cameron.mcinally abandoned this revision.Feb 23 2022, 6:45 AM

Good point. Replacing the lowered truncates with ptrue sounds like a win in the general case. Abandoning this Diff.