This isn't really ready for review; it's more of an RFC...
There are a number of VECREDUCE_* nodes that can be lowered to SVE instructions. The integer instructions are:
UADDV, SADDV, SMAXV, SMINV, UMAXV, and UMINV
UADDV/SADDV are a little different from the rest in that they always return an i64 result, whereas the other instructions return a result of the vector element type.
That is fine, but the TableGen patterns aren't straightforward. The patterns for UADDV/SADDV return a scalar i64. The other patterns return a NEON vector register, from which the scalar result is later extracted. I'd like to understand why that 'insert-then-extract' design was chosen. Here is the code in question:
  class SVE_2_Op_Pat_Reduce_To_Neon<ValueType vtd, SDPatternOperator op, ValueType vt1,
                                    ValueType vt2, Instruction inst, SubRegIndex sub>
    : Pat<(vtd (op vt1:$Op1, vt2:$Op2)),
          (INSERT_SUBREG (vtd (IMPLICIT_DEF)), (inst $Op1, $Op2), sub)>;
And then the lowering code extracts it back out:
  static SDValue getReductionSDNode(unsigned Op, SDLoc DL, SDValue ScalarOp,
                                    SelectionDAG &DAG) {
    SDValue VecOp = ScalarOp.getOperand(0);
    auto Rdx = DAG.getNode(Op, DL, VecOp.getSimpleValueType(), VecOp);
    return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ScalarOp.getValueType(), Rdx,
                       DAG.getConstant(0, DL, MVT::i64));
  }
What was the motivation for that design?
As things stand, we'll probably need an unfortunate amount of code duplication, or special cases in the lowering code, to handle SADDV/UADDV. I'd like to avoid that if possible...