This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to zero lanes 1-N
ClosedPublic

Authored by bsmith on Apr 27 2021, 8:28 AM.

Details

Summary

Specifically, this allow us to rely on the lane zero'ing behaviour of
SVE reduce instructions.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Diff Detail

Event Timeline

bsmith created this revision.Apr 27 2021, 8:28 AM
bsmith requested review of this revision.Apr 27 2021, 8:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2021, 8:28 AM
peterwaller-arm accepted this revision.Apr 27 2021, 8:58 AM

LGTM, modulo nit.

Test suggestion: what about testing an insert into a non zeroinitializer vector?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16048

Nit: indent.

Also suggestion: if (Res != SDValue()) return Res;, that way the code can be extended to handle other combines that come in the future with minimal further changes.

This revision is now accepted and ready to land.Apr 27 2021, 8:58 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16048

Erratum: not sure what I was seeing with the whitespace. I think I misread phabricator's diff whitespace hints here.

The documentation for the nodes you are using the implicit zeroing of say "// Only the lower result lane is defined." Should that be changed to explain that all non-zero lanes are zero?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15919

Can this use isConstantSplatVectorAllZeros?

16047–16050

I would create a function for this, to keep PerformDAGCombine straight forward.
It is common to do:

if (SDValue Res = performInsertVectorEltCombine(N, DAG))
  return Res
return performPostLD1Combine(N, DCI, true);
david-arm added inline comments.Apr 28 2021, 1:31 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15894

Sorry to chip in here as I realise I'm not a reviewer. :) However, the ANDV instruction has a scalar SIMD&FP register for it's result, so it doesn't feel right to say all the other lanes >0 in the equivalent SVE register are zero. I'd have expected something more like isOnlyFirstLaneDefined? I understand you named this function because currently it's only called from performInsertVectorEltCombine where the insert vector is a null splat, but other users may call it elsewhere in future and it feels a bit dangerous to give it a misleading name that's all.

paulwalker-arm added inline comments.Apr 28 2021, 2:11 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15894

This is not true @david-arm . The SVE reduction nodes are defined to return a vector result to match the instructions behaviour of modifying all bits of their destination Z register. You can see this in LowerReductionToSVE where after the reduction we extract the scalar required to match the definition of the common VECREDUCE nodes. This patch is one of the reasons we do this so the explicit behaviour is captured and thus can be taken advantage of.

david-arm added inline comments.Apr 28 2021, 2:32 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15894

Ah ok, fair enough. I was just looking at the actual ANDV instruction in the developer manual that's all, whereas from what you're saying we've defined ANDV_PRED to zero all lanes > 0. Sorry for the confusion!

bsmith updated this revision to Diff 341229.Apr 28 2021, 8:59 AM
bsmith marked 4 inline comments as done.
  • Move combine switch logic into a separate function
  • Reuse existing isConstantSplatVectorAllZeros function
Matt added a subscriber: Matt.Apr 29 2021, 1:16 PM
dmgreen accepted this revision.Apr 30 2021, 10:33 AM

Thanks. This LGTM.

This revision was landed with ongoing or failed builds.May 4 2021, 7:05 AM
This revision was automatically updated to reflect the committed changes.