This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Perform first active true vector combine
ClosedPublic

Authored by Allen on Mar 3 2022, 4:17 AM.

Details

Summary

Materialize : i1 = extract_vector_elt t37, Constant:i64<0>

... into: "ptrue p, all" + PTEST

Test bit of lane 0 can use P register directly, and the instruction “pture all”
is loop invariant, which will beneficial to SVE after hoisting out the loop.

Diff Detail

Event Timeline

Allen created this revision.Mar 3 2022, 4:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 4:17 AM
Allen requested review of this revision.Mar 3 2022, 4:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 4:17 AM
david-arm added inline comments.Mar 4 2022, 3:13 AM
llvm/test/CodeGen/AArch64/sve-extract-element.ll
486 ↗(On Diff #412670)

To be honest, this doesn't look like a win to me. We're introducing control flow where previously we had none. I think we should avoid setting the flags when the original code set none.

I think I'd prefer it if you restricted the DAG combine to only cases where we're extracting from a flag-setting operation, i.e. an extract from a icmp.

llvm/test/CodeGen/AArch64/sve-pture.ll
1 ↗(On Diff #412670)

Maybe this is better in something like sve-cmp-folds.ll?

51 ↗(On Diff #412670)

This looks quite an odd IR sequence and I don't think will ever be generated by the vectoriser. I imagine this has come from hand-written SVE ACLE code.

Allen updated this revision to Diff 413268.EditedMar 5 2022, 7:12 PM

add SetCC.getOpcode() != ISD::SETCC, and now the case in file llvm/test/CodeGen/AArch64/sve-extract-element.ll is not touched

Allen updated this revision to Diff 413273.Mar 5 2022, 10:59 PM
Allen marked 2 inline comments as done.

merge the test into file llvm/test/CodeGen/AArch64/sve-cmp-folds.ll

david-arm accepted this revision.Mar 7 2022, 1:09 AM

LGTM! Thanks for making the changes @Allen.

This revision is now accepted and ready to land.Mar 7 2022, 1:09 AM
paulwalker-arm added inline comments.Mar 7 2022, 3:13 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15450

You could use Idx->isZero() here.

18399

performExtractVectorEltCombine looks like a common placeholder for all related combines so can the call to performFirstTrueTestVectorCombine me moved into it?

llvm/test/CodeGen/AArch64/sve-cmp-folds.ll
56–67

This test looks rather more involved than the DAGCombine it's trying to verify. Why not just:

define i1 @foo(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
  %vcond = fcmp oeq <vscale x 4 x float> %a, %b
  %bit = extractelement <vscale x 4 x i1> %vcond, i64 0
  ret i1 %bit
}
Allen updated this revision to Diff 413463.Mar 7 2022, 7:17 AM
Allen marked an inline comment as done.

Address comments

paulwalker-arm added inline comments.Mar 7 2022, 8:17 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14376

Sorry I missed this previously but I think this is bug and should be
if (!Subtarget->hasSVE() || DCI.isBeforeLegalize()) as otherwise you might create a PTEST with illegal types, which cannot be legalised and will instead cause a compiler crash.

14400

I think it's worth having this same assert inside performFirstTrueTestVectorCombine just in case it gets called via a different route in the future.

14401

Unwanted space?

Allen updated this revision to Diff 413503.Mar 7 2022, 8:46 AM
Allen marked 3 inline comments as done.
paulwalker-arm accepted this revision.Mar 7 2022, 8:55 AM
This revision was landed with ongoing or failed builds.Mar 7 2022, 9:13 AM
This revision was automatically updated to reflect the committed changes.
Allen marked 3 inline comments as done.