This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Fold target specific ext/trunc nodes into loads/stores
ClosedPublic

Authored by bsmith on Jun 17 2022, 8:50 AM.

Details

Summary

Due to the way fixed length SVE lowering works, we sometimes introduce
ext/trunc nodes very late, these nodes then immediately get converted
into target specific nodes (UUNPKLO/UZP1) before they get a chance to be
folded into a load/store.

This patch introduces target specific dag combines for these nodes so that
we can still create extending loads/truncating stores out of them.

Diff Detail

Event Timeline

bsmith created this revision.Jun 17 2022, 8:50 AM
Herald added a project: Restricted Project. · View Herald Transcript
bsmith requested review of this revision.Jun 17 2022, 8:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2022, 8:50 AM
Matt added a subscriber: Matt.Jun 27 2022, 2:15 PM
paulwalker-arm added inline comments.Jul 1 2022, 7:59 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15220–15222

Is this universally true across all predicate patterns? Given my comment below I'm wondering if you'll eventually just create fresh PTRUEs rather than have to answer this question.

17034–17036

This function also accepts AArch64ISD::UUNPKHI so you'll need to protect against that.

17051

I think you've confused extending with explanding? I don't think this is something we support for AArch64, which explains why most code in this file just omits this parameter.

17413–17419

Not quite sure but this feels too easy. What if you actually want to store the result of an UZP1?

I think the current value of the predicate plays a key role here. The case you're trying to handle is when we're storing N elements where all those elements come from one side of an UZIP1.

Not sure if the loads have the same issue but I figure it's safest for the two to have symmetry.

bsmith updated this revision to Diff 441776.Jul 1 2022, 12:59 PM
bsmith edited the summary of this revision. (Show Details)
  • Ensure we don't try and combine a UUNPKHI.
  • Remove incorrect IsExapanding argument to getMaskedLoad.
  • Explicitly check for and create new PTRUE for mask.
  • Properly check that the PTRUE we are creating can be represented at the current VL.
  • Remove extract_subvector ptrue combine as it is no longer needed.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17047

Is this not just VT? It's not related to any input from what I can see, or have a misunderstood the naming?

17051

getNumElementsFromSVEPredPattern can return 0 for cases other than all. Perhaps

unsigned NumElts = getNumElementsFromSVEPredPattern(PgPattern);
if (NumElts && NumElts * InVT.getVectorElementType().getSizeInBits() <= MinSVESize)
17058

Is this correct? Given the new load has a different result type, I'd expect PassThru to also change. Speaking of which, I think the combine is only equivalent when the original PassThru is zero or undef and the new one is forced to zero?

bsmith updated this revision to Diff 445477.Jul 18 2022, 6:40 AM
  • Rename InVT to VT for clarity.
  • Check for zero return from getNumElementsFromSVEPredPattern.
  • Bail out if passthru is not zero or undef.
  • Force passthru to be zero so as to not introduce extra undef bits.
bsmith updated this revision to Diff 445500.Jul 18 2022, 7:47 AM
  • Remove redundant check against all predicate pattern
Allen added a subscriber: Allen.Jul 18 2022, 7:49 AM
bsmith updated this revision to Diff 446875.Jul 22 2022, 9:39 AM
  • Tighten up checks on bitcast in store case to ensure we have the correct types.
  • Remove redundant uzp1 operand check.
  • Bail upon encountering non-integer types.
paulwalker-arm accepted this revision.Jul 24 2022, 4:35 AM

I've added a suggested improvement if you agree, but otherwise looks good.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17038

This is not necessary because AArch64ISD::UUNPKLO only supports integer types.

17415–17422

Whilst this works, you don't need to restrict the combine like this. Here you're checking the stored elements all come from UZP1's first operand. You could do this using:

ValueVT.isInteger() && ValueVT != MVT::nxv2i64) {
EVT HalfVT = ValueVT.getHalfNumVectorElementsVT(*DAG.getContext());
EVT InVT = HalfVT.widenIntegerVectorElementType(*DAG.getContext());

followed by your current NumElts check. Then just before the getMaskedStore() create a fresh bitcast (e.g. Value = bitcast(Value.getOperand(0) to InVT). This means you don't really care what UZP1's first operand is and you might catch more cases.

This revision is now accepted and ready to land.Jul 24 2022, 4:35 AM
bsmith added inline comments.Jul 25 2022, 8:09 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17415–17422

I think for now this change should be left to a separate ticket, if we ever encounter this causing an issue. The logic around this is already quite confusing and I worry a change like that would make this section of code very confusing, whilst possibly not benefiting anything realistic.

This revision was landed with ongoing or failed builds.Jul 25 2022, 8:24 AM
This revision was automatically updated to reflect the committed changes.