This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add more folds to make use of gather/scatter with 32-bit indices
ClosedPublic

Authored by CarolineConcatto on Jan 27 2022, 3:25 AM.

Details

Summary

In AArch64ISelLowering.cpp this patch implements this fold:

  1. GEP (%ptr, SHL ((stepvector(A) + splat(%offset))) << splat(B)))

into GEP (%ptr + (%offset << B), step_vector (A << B))

The above transform simplifies the index operand so that it can be expressed
as i32 elements.
This allows using only one gather/scatter assembly instruction instead of two.

Patch by Paul Walker (@paulwalker-arm).

Depends on D117900

Diff Detail

Event Timeline

CarolineConcatto requested review of this revision.Jan 27 2022, 3:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2022, 3:25 AM
Matt added a subscriber: Matt.Jan 27 2022, 5:43 AM

Just as a suggestion, perhaps it's worth changing the title of this patch so that it's different to D117900? It can be a bit confusing otherwise trying to review them! You could change it to something like:

[AArch64][SVE] Fold gather/scatter with 32bits when possible (Part 2)

or

[AArch64][SVE] Add more folds to make use of gather/scatter with 32-bit indices

?

CarolineConcatto retitled this revision from [AArch64][SVE] Fold gather/scatter with 32bits when possible to [AArch64][SVE] Add more folds to make use of gather/scatter with 32-bit indices.
CarolineConcatto edited the summary of this revision. (Show Details)
  • change commit message tittle
  • rebase depends on D118459 too
sdesmalen added inline comments.Feb 1 2022, 3:15 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16394

Does the following work:

if (auto *Shift = dyn_cast_or_null<ConstantSDNode>(DAG.getSplatValue(Index.getOperand(1))) {
  ..
}

?

If so, that removes the need to do auto *C = dyn_cast<ConstantSDNode>(Shift) and the return if C == nullptr.

16402

nit: the comment as-is seems redundant.

I think it's worth saying though that Stride is not scaled explicitly by 'Scale', because this is done implicitly by the gather/scatter addressing mode.

16404–16405

nit: Change to:

// BasePtr = BasePtr + ((Offset * Scale) << Shift)
Offset = DAG.getNode(ISD::MUL, DL, MVT::i64, Offset, N->getScale());
CarolineConcatto marked 2 inline comments as done.
  • Address nits
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16394

I can, but I will use Shift later on:
Offset = DAG.getNode(ISD::SHL, DL, MVT::i64, Offset, Shift);
My skills may not be the best. But I believe that I cannot use Shift as a ConstantSDNode when setting Offset.
I believe I need to make Shift be a splat of vscale type again.
I don't think this would be better than now.

  • Use dyn_cast_or_null<ConstantSDNode> to create Shift
  • get Shift value with SDValue(Shift, 0)

as suggested by Sander

Just a few final comments, otherwise looks fine to me.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16393

I'd suggest creating some variables here, so that further on you don't need to write something like:

Index.getOperand(0).getOperand(0).getOperand(0)

Instead, if you create some variables:

SDValue Add = Index.getOperand(0);
SDValue ShiftOp = Index.getOperand(1);
SDValue StepOp = Add.getOperand(0);
SDValue OffsetOp = Add.getOperand(1);

Then the code below becomes a bit more readable:

if (auto *Shift = dyn_cast_or_null<ConstantSDNode>(DAG.getSplatValue(ShiftOp))) {
  if (SDValue Offset = DAG.getSplatValue(OffsetOp)) {
    int64_t Step = cast<ConstantSDNode>(StepOp.getOperand(0))->getSExtValue();
    ...
  }
}
16402

nit: s/scaled/scale/

  • Create SDValues Add, ShiftOp ,StepOp ,OffsetOp as requested
CarolineConcatto marked 2 inline comments as done.Feb 3 2022, 8:06 AM
This revision is now accepted and ready to land.Feb 3 2022, 8:07 AM
  • rebase on top of main
This revision was landed with ongoing or failed builds.Feb 3 2022, 11:23 AM
This revision was automatically updated to reflect the committed changes.