
[AArch64][SVE] Fold gather/scatter with 32bits when possible

Authored by CarolineConcatto on Jan 21 2022, 9:05 AM.



In AArch64ISelLowering.cpp this patch implements this fold:

GEP (%ptr, (splat(%offset) + stepvector(A)))
into GEP ((%ptr + %offset), stepvector(A))

The above transform simplifies the index operand so that it can be expressed
as i32 elements.
This allows using only one gather/scatter assembly instruction instead of two.

Patch by Paul Walker (@paulwalker-arm).

Depends on D118459


Event Timeline

CarolineConcatto requested review of this revision.Jan 21 2022, 9:05 AM
Herald added a project: Restricted Project. Jan 21 2022, 9:05 AM

The approach seems right to me, but the code is still a little messy so I mostly left a bunch of suggestions/nits to improve readability.


nit: The word 'Offset' is used quite a few times in this function, which confused me a bit. I think what this is saying is that the eventual instruction will scale the offset (i.e. the offset is not an offset, but an index). So maybe change this to OffsetIsScaledByInstruction?


nit: this is equivalent to:

// Only consider element types that are pointer sized as smaller types can
// be easily promoted. Also ignore indices that are already type legal.
EVT IndexVT = Index.getValueType();
if (IndexVT.getVectorElementType() != MVT::i64 || IndexVT == MVT::nxv2i64)
  return false;

nit: IndexVT isn't used until line 16173, so don't change it here but rather closest to where it's used. Also, I'd suggest storing it in a new variable, named e.g. NewIndexVT.


Maybe rename this to Stride ?


This makes the assumption that STEP_VECTOR is on the right-hand side. There is currently no code that forces this, so you'll want to add a condition to SelectionDAG::getNode() where it reorders the constants to the right, to do something like this:

-    if ((IsN1C && !IsN2C) || (IsN1CFP && !IsN2CFP))
+    // Favour step_vector on the RHS because it's a kind of
+    // constant.
+    if (((N1.getOpcode() == ISD::STEP_VECTOR || IsN1C) && !IsN2C) ||
+        (IsN1CFP && !IsN2CFP))

If Index.getOperand(1) is ISD::STEP_VECTOR, then its operand *must* be a constant. There's no need to test for it; you can call Index.getOperand(0).getConstantOperandVal(0) directly.


nit: Does N->getScale() always return a value even when ScaledOffset == false?
If so, and it returns a Constant 1 by default, then you can remove the if (ScaledOffset) condition above.


If StepConst is only set in this block (or in general, when we know for sure that we can fold this into a simpler addressing mode), then ChangedIndex is redundant, and you can just check that StepConst != 0.


This isn't very easy to read; I'd recommend creating three variables instead:

if (Index.getOpcode() == ISD::SHL &&
    Index.getOperand(0).getOpcode() == ISD::ADD &&
    Index.getOperand(0).getOperand(1).getOpcode() == ISD::STEP_VECTOR) {
  SDValue ShiftAmount = Index.getOperand(1);              
  SDValue BaseOffset = Index.getOperand(0).getOperand(0); 
  SDValue Step = Index.getOperand(0).getOperand(1);       
  if (auto Shift = DAG.getSplatValue(ShiftAmount))        
    if (auto *ShiftC = dyn_cast<ConstantSDNode>(Shift))   
      if (auto Offset = DAG.getSplatValue(BaseOffset)) {  
         // ... code to calculate Stride, Offset and BasePtr here ...

You can more easily calculate MaxVScale from:

const auto &Subtarget =
    static_cast<const AArch64Subtarget &>(DAG.getSubtarget());
unsigned MaxVScale = Subtarget.getMaxSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock;

Maybe just move all these definitions after the if(!DCI.isBeforeLegalize()) return SDValue(); (that is, if you take my suggestion mentioned below).


nit: just bail out early to avoid a level of indentation, e.g.

if (!DCI.isBeforeLegalize())
  return SDValue();

if (FindMoreOptimalIndexType(...)) {


  return performMaskedGatherScatterCombine(N, DCI, DAG);


Matt added a subscriber: Matt.Jan 25 2022, 3:16 PM
CarolineConcatto marked 12 inline comments as done.
CarolineConcatto edited the summary of this revision. (Show Details)
  • This patch has only this pattern GEP (%ptr, (splat(%offset) + stepvector(A)))

I don't think it is a good solution to add this in SelectionDAG.
I could be wrong, but when I try this suggestion I see regressions in sve-stepvector.ll and sve-intrinsics-index. Instead of only 1 instruction it becomes 5.
For instance,

 define <vscale x 16 x i8> @index_rr_i8(i8 %a, i8 %b) {
 ; CHECK-LABEL: index_rr_i8:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    index z0.b, w0, w1
+; CHECK-NEXT:    index z1.b, #0, #1
+; CHECK-NEXT:    mov z2.b, w1
+; CHECK-NEXT:    mov z0.b, w0
+; CHECK-NEXT:    ptrue p0.b
+; CHECK-NEXT:    mla z0.b, p0/m, z2.b, z1.b

I could add it here, but as you saw, the C++ code here would not be nicer.


As we also discussed, the test

if (ScaledOffset)

could be removed and Offset could always be multiplied by N->getScale() as it will never be zero.


So my worry was that the constant could be zero.
But I checked, and in that case the step vector is removed:
the code is optimized and the step is removed, for the correct reason.
Index = step(const) * splat(0) -> Index = splat(0)


I pushed this fold into a second patch; maybe that will be easier to review.
If not, I will add these extra variables.


This fold is now here:

Thanks for splitting this patch up, that makes it a bit easier to review!


This comment related to the variable ScaledOffset from the previous revision of this patch, but you then found that the variable could be removed, so the comment became redundant.

I think the function should be named findMoreOptimalIndexType.


nit: can you clarify this by adding a variable SDValue StepVector = Index.getOperand(1), and then querying:

auto *C = cast<ConstantSDNode>(StepVector.getOperand(0))

The operand to a stepvector is always a constant, so this will never return false here.


Stride should only be set if Offset is a splat value, otherwise the return value is incorrect.


Stride should only be set in the block below (under if (auto Offset = DAG.getSplatValue(Index.getOperand(0))) because otherwise the function returns true (Stride != 0), but BasePtr will be undefined.


This should also bail out early if Stride == 0

CarolineConcatto marked 4 inline comments as done.
  • Rebase the patch under D118459 and address comments

Very good catch! Thank you Sander

sdesmalen added inline comments.Jan 31 2022, 6:57 AM

nit: functions should start with lower case, i.e. findMoreOptimalIndexType.


nit: redundant newline.




nit: maybe just merge these conditions together:

if (!Stride || Stride < std::numeric_limits<int32_t>::min() ||
    Stride > std::numeric_limits<int32_t>::max())
  return false?

nit: you can remove one more level of indentation, by writing it like this:

if (!FindMoreOptimalIndexType(MGS, BasePtr, Index, IndexType, DAG))
  return SDValue();

if (auto *MGT = ...) {
  return ...
} else {
  return ...

llvm_unreachable("Unhandled case");

nit: can you group the positive test cases together (scatter_i8_index_offset_minimum and scatter_i8_index_offset_maximum), followed by the negative test cases?


This test does not exist in this patch?


both positive tests use i8 as types, can you also use a different type (e.g. i16/i32), so that we can check the scaling happens correctly? (sxtw #1 or sxtw #2)

CarolineConcatto marked 6 inline comments as done.
  • Address Sander's comments

I like having one test only for zero and the other only checking overflow.
They both have comments explaining why the function returns early.

CarolineConcatto edited the summary of this revision. (Show Details)Jan 31 2022, 9:01 AM
  • Rebase the patch under D118459 and address comments

Can you add D118459 as a dependency of this patch (set parent revision)?

Thanks for addressing all my comments, the patch looks good to me with final nits addressed.


nit: s/Earlier returns [...] fold change./Return early because no supported pattern is found./


This comment seems misplaced. I think you can remove it, because the code above where it checks the value of LastElementOffset says as much.


All these tests use scatters, although the name suggests it should work for gathers too. Can you also add at least one positive test that uses a llvm.masked.gather?


nit: This test still does not exist in this file?

CarolineConcatto marked 3 inline comments as done.
CarolineConcatto edited the summary of this revision. (Show Details)
  • Add gather test
CarolineConcatto marked an inline comment as done.Feb 1 2022, 2:37 AM

The commit message was updated; the problem is that I forgot to run arc diff --verbatim, so it does not update the GUI here. I will use it next time.

sdesmalen accepted this revision.Feb 1 2022, 7:25 AM
This revision is now accepted and ready to land.Feb 1 2022, 7:25 AM
paulwalker-arm added inline comments.Feb 1 2022, 7:44 AM

Just a flyby comment: rather than this, I think the LLVM coding style would suggest removing the else?

  • Update test gather_i8_index_offset_8
CarolineConcatto marked an inline comment as done.Feb 3 2022, 9:55 AM

  • Remove commented line from test gather_i8_index_offset_8

This revision was landed with ongoing or failed builds.Feb 3 2022, 11:00 AM
This revision was automatically updated to reflect the committed changes.
paulwalker-arm added inline comments.Feb 10 2022, 8:28 AM

Hi @CarolineConcatto, sorry for the late review comment, but I think this function has a serious problem, albeit one that isn't going to trigger with its current usage.

There are several places where the function returns false after having already changed some of its input reference parameters (i.e. BasePtr). This is unnatural and likely to confuse the unaware. From the original work you'll see that these parameters were only changed when something more optimal is found.

With the current format of this function callers will have to backup the parameters, which is just something they're not going to know to do until they've spent time debugging why their "local" state has been trashed.

Can you please change the function so that the reference inputs are unchanged when returning false?