Page MenuHomePhabricator

[SVE][WIP] Lower fixed length VECREDUCE_ADD to Scalable

Authored by cameron.mcinally on Wed, Sep 16, 2:15 PM.



This isn't really ready for review, but more like an RFC...

There are a number of VECREDUCE_* nodes that can be lowered to SVE instructions. The integer instructions are:


UADDV/SADDV are a little different from the rest in that they always return an i64 result, where the other results are of the vector element type.

That is fine, but the TableGen pattern aren't straight-forward. The patterns for UADDV/SADDV return a scalar i64. The other patterns return a NEON vector register, from which the scalar result is later extracted. I'd like to understand why that 'insert-then-extract' decision was made. Here is the code in question:

class SVE_2_Op_Pat_Reduce_To_Neon<ValueType vtd, SDPatternOperator op, ValueType vt1,
                   ValueType vt2, Instruction inst, SubRegIndex sub>
: Pat<(vtd (op vt1:$Op1, vt2:$Op2)),
      (INSERT_SUBREG (vtd (IMPLICIT_DEF)), (inst $Op1, $Op2), sub)>;

And then the lowering code extracts it back out:

static SDValue getReductionSDNode(unsigned Op, SDLoc DL, SDValue ScalarOp,
                                  SelectionDAG &DAG) {
  SDValue VecOp = ScalarOp.getOperand(0);
  auto Rdx = DAG.getNode(Op, DL, VecOp.getSimpleValueType(), VecOp);
  return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ScalarOp.getValueType(), Rdx,
                     DAG.getConstant(0, DL, MVT::i64));

What was the motivation for that design?

In the current state, we'll probably need an unfortunate amount of code duplication, or special cases in the lowering code, for SADDV/UADDV. I'd like to avoid that if possible...

Diff Detail

Event Timeline

cameron.mcinally requested review of this revision.Wed, Sep 16, 2:15 PM

I think the NEON code is trying to avoid producing an extra instruction if the result ends up getting used by a vector operation. SelectionDAG technically allows instructions that produce an i64 result to return it in a floating-point register, but it doesn't work very well in practice: the operations that consume it will force it back into a GPR. This is one of the issues GlobalISel's RegBankSelect solves.

It should be possible to make the patterns for SVE do the same thing, I think?

Oh, I hadn't realised we are handling reduction in this way upstream (although it does match an old design we had downstream). This is certainly not the expected behaviour so I'll get it fixed. The expectation is for the SVE reduction ISD nodes to reflect the underlying instructions behaviour, which is they set the whole vector register. The reason for this is that we don't want the element extraction to be done during isel because it introduces needless vpr-gpr transitions and there are also use cases that make use of the implicit zeroing of the upper lanes.

cameron.mcinally planned changes to this revision.EditedThu, Sep 17, 9:04 AM

This is certainly not the expected behaviour so I'll get it fixed.

Ok, thanks. Since you have downstream changes dependent on this, I'll wait for your lead...

D87843 fixes the issue. Its dependence on D87842 is purely to maintain current code quality.

Thanks for the patches, @paulwalker-arm. Those helped a lot.

Here's the updated Diff for VECREDUCE_ADD ready for review. Will add the other nodes if this is acceptable...

paulwalker-arm added inline comments.Tue, Sep 22, 3:00 AM

Using Op here is confusing because that is normally what is used to reference the passed in SDValue. Existing code typically uses Opc or Opcode when naming ISD node variables.


VT is normally used to mean the node's result value type. Perhaps InVT or SrcVT?


This is the result VT of the reduction so you could use getPackedSVEVectorVT passing in the element type of SrcVT. Something like:

EVT RdxVT = getPackedSVEVectorVT(Op == AArch64ISD::UADDV_PRED ? MVT::i64 : SrcVT.getVectorElementType());

Or you can move the code lower and use ContainerVT instead of VT. e.g.

EVT RdxVT = (Op == AArch64ISD::UADDV_PRED) ? MVT::nxv2i64 : ContainerVT;

I don't think you need convertFromScalableVector here as you can just extract the scalar result directly from the scalable vector result of the reduction.


It would be better to protect the truncate with something like Rdx.ElementType != Op.getValueType(). I say this because I don't currently see a reason why this function cannot be used for floating-point reductions other than this unprotected truncate.


Is #min() required here? Same goes for the other tests.

cameron.mcinally marked 5 inline comments as done.

A few other cleanliness opportunities popped up during the refactor. How about something like this?

paulwalker-arm accepted this revision.Wed, Sep 23, 3:41 AM
paulwalker-arm added inline comments.


This revision is now accepted and ready to land.Wed, Sep 23, 3:41 AM