This is an archive of the discontinued LLVM Phabricator instance.

[IR][SVE] Add new llvm.experimental.stepvector intrinsic
ClosedPublic

Authored by david-arm on Feb 23 2021, 7:50 AM.

Details

Summary

This patch adds a new llvm.experimental.stepvector intrinsic,
which takes no arguments and returns a linear integer sequence of
values of the form <0, 1, ...>. It is primarily intended for
scalable vectors, although it will work for fixed width vectors
too. It is intended that later patches will make use of this
new intrinsic when vectorising induction variables, currently only
supported for fixed width. I've added a new CreateStepVector
method to the IRBuilder, which will generate a call to this
intrinsic for scalable vectors and fall back on creating a
ConstantVector for fixed width.

For scalable vectors this intrinsic is lowered to a new ISD node
called STEP_VECTOR, which takes a single constant integer argument
as the step. During lowering this argument is set to a value of 1.
The reason for this additional argument at the codegen level is
because in future patches we will introduce various generic DAG
combines such as

mul step_vector(1), 2 -> step_vector(2)
add step_vector(1), step_vector(1) -> step_vector(2)
shl step_vector(1), 1 -> step_vector(2)
etc.

that encourage a canonical format for all targets. This hopefully
means all other targets supporting scalable vectors can benefit
from this too.

I've added cost model tests for both fixed width and scalable
vectors:

llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll
llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll

as well as codegen lowering tests for fixed width and scalable
vectors:

llvm/test/CodeGen/AArch64/neon-stepvector.ll
llvm/test/CodeGen/AArch64/sve-stepvector.ll

See this thread for discussion of the intrinsic:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html

Diff Detail

Event Timeline

david-arm created this revision.Feb 23 2021, 7:50 AM
david-arm requested review of this revision.Feb 23 2021, 7:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2021, 7:50 AM
kmclaughlin added inline comments.Feb 24 2021, 5:27 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1649

nit: can you add an error message to this assert?

llvm/test/CodeGen/AArch64/neon-stepvector.ll
148

Should there also be some tests here for illegal predicate types such as v32i1, as there are in sve-stepvector.ll?

david-arm added inline comments.Feb 24 2021, 8:34 AM
llvm/test/CodeGen/AArch64/neon-stepvector.ll
148

Hi @kmclaughlin, thanks for pointing that out. So I originally did add some tests for illegal types for neon, but for some reason the i1 element types weren't being promoted to i8 and so we just ended up using GPRs instead, i.e. something like

mov %x0, #some_immediate

I thought it looked really odd and inconsistent with the legal types so I left them out. I'm not sure if that's a bug with the AArch64 backend for Neon or intended behaviour.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4800

Given that we mandate the Step must be an immediate, I wondering if DAG should have a getStepVector function that takes such. I say this because, for example, what are you expecting to happen if DAG.getNode(ISD::ANY_EXTEND did not get folded away.

As an aside, do you really not care about the type of extension? I think you absolutely need zero extension based on the current node restrictions.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1657–1659

I guess there's no action required for this patch as fixed length vectors are currently excluded, however, if this idiom becomes common then I suggest creating a DAG.getElementCountAsNode(EVT) like function. That way code like this will just work for both fixed and scalable vectors.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4323–4325

if (cast<ConstantSDNode>(Step)->isNullValue()) ?

4668

Is it worth also validating that Operand is at least as big as the result element type. That way if the "not-negative" requirement ever get's dropped I think the signedness will not actually matter?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6949

Perhaps worth punting this into visitStepVector. This is what we've done for vector.reverse.

llvm/lib/IR/IRBuilder.cpp
97

Given this is a FixedVectorType you should be able to short cut ElementCount and use getNumElements() directly.

105–107

I cannot see how this assert will ever fire and so return ConstantVector::get(Indices); just looks nicer.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9052–9053

I'm not a fan of the pattern duplication within AArch64SVEInstrInfo.td so I'm thinking we should lower all ISD::STEP_VECTORs to AArch64ISD::INDEX_VECTOR.

Regardless of whether you do that I would expect the predicate handling to be more generic, for example:
return ISD::TRUNC(ISD::STEP_VECTOR(Op.getConstantOperandVal())

david-arm added inline comments.Mar 1 2021, 12:40 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9052–9053

Hi @paulwalker-arm, me too for i1 case, however I tried the generic route first exactly as you suggest and it didn't work. The problem was that if I created the ISD::TRUNC node we never ended up custom lowering as you'd expect. If you look at AArch64TargetLowering::LowerTRUNCATE for i1 element types we return this:

return DAG.getSetCC(dl, VT, And, Zero, ISD::SETNE);

which then never gets into our custom lowering code, despite the action being set to Custom for precisely those types. We then crash in isel trying to match a setcc operation. However I'd love to do it this way if you've got any suggestions for how I can solve this problem? I suspect there are just too many levels of custom lowering involved, i.e. custom lower STEP_VECTOR, followed by custom lower TRUNCATE, followed by custom lower SETCC.

So in general your preference is to custom lower *all* types and go through this function to create AArch64ISD::INDEX_VECTOR?

david-arm added inline comments.Mar 1 2021, 12:50 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9052–9053

One solution might be to rewrite LowerTRUNCATE to avoid returning SETCC for scalable vectors and use AArch64ISD::SETCC_MERGE_ZERO directly? I'd just assumed people would find this a bit ugly.

david-arm updated this revision to Diff 327720.Mar 3 2021, 1:58 AM
david-arm marked 9 inline comments as done.Mar 3 2021, 2:27 AM

If the ISD node is going to get an argument for the step, why not let the new intrinsic have this same argument?

paulwalker-arm added a comment.EditedMar 5 2021, 4:55 AM

If the ISD node is going to get an argument for the step, why not let the new intrinsic have this same argument?

@ctetreau: We've followed the same route as for llvm.vscale() -> ISD::VSCALE so yes the code generation side is more complete/convenient. Considering we're expecting llvm.experimental.stepvector() to be redundant once there's time to push for the preferred ConstantVector solution, we feel it's better to keep the intrinsic as simple as possible.

@ctetreau: We've followed the same route as for llvm.vscale() -> ISD::VSCALE so yes the code generation side is more complete/convenient. Considering we're expecting llvm.experimental.stepvector() to be redundant once there's time to push for the preferred ConstantVector solution, we feel it's better to keep the intrinsic as simple as possible.

OK, fair enough.

I'm still working through the patch but here's what I've got for now.

llvm/docs/LangRef.rst
16572

To be consistent with the other experimental vector intrinsics this should be fixed-width?

llvm/include/llvm/CodeGen/BasicTTIImpl.h
1253–1254

This seems like the wrong place to validate the intrinsic.

llvm/include/llvm/CodeGen/SelectionDAG.h
650–651

I guess I've caused this with my previous "don't use a node request" and thus the answer will be no, but I'll ask the question anyway. Can OpVT be dropped here? as it seems inconvenient.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1657–1663

The Step is an immediate, as is the known part of LoVT's element count, which suggests you shouldn't need to use DAG for the maths because you can do getVScale(N->getConstantOperandAPInt(0) * LoVT.getVectorMinNumElements())

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4325

I think DAG.getConstant(0... is better here so that the target's preferred nodes (i.e. SPLAT_VECTOR or BUILD_VECTOR) are used.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
10938–10939

Is this needed? I ask because DAG.getStepVector will validate the result VT in one place rather than expecting all it's users to do likewise.

10946–10949

What about pushing this into DAG.getStepVector? that way there's a generic way for any code to create this vector without needing to worry about the result type.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
272–273

In a world where InstructionCost is everywhere I think it's better for the InstructionCost side to be the LHS because it reduces the number of required operator overloads. Check with @sdesmalen but I think you'll save some work if you write this as getArithmeticInstrCost() * (LT.first - 1).

paulwalker-arm added inline comments.Mar 10 2021, 3:03 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9059–9062

Is it possible to use getPromotedVTForPredicate here?

9067

I can imagine a route where this promotion is also required for the ElemVT != MVT::i1 case. Related to this I suspect we'll need an implementation of PromoteIntOp_STEP_VECTOR but currently don't because there's nothing that can exercise the Step != 1 case.

10216

Is this test needed? I'm guessing this code was added as part of SVE support (v#i1 being an illegal type for NEON).

david-arm added inline comments.Mar 11 2021, 12:44 AM
llvm/include/llvm/CodeGen/SelectionDAG.h
650–651

Yeah, that's right. This arose because of trying to pass in an immediate value here. I'm not sure of a much less ugly way when removing OpVT. If we remove it then I have to create a type every time based upon the current width of Step and assume the caller has set the bit width correctly. I can't just use the element type of ResVT because this takes me back to square one, i.e. that I then need to worry about promoting the type to i32 or i64. I could pass in a ConstantSDNode instead of the {OpVT, Step} pair, which has both the OpVT type and the APInt combined?

paulwalker-arm added inline comments.Mar 11 2021, 2:02 AM
llvm/include/llvm/CodeGen/SelectionDAG.h
650–651

Thanks for the information Dave and sorry for messing you around here. How about we go back to a stock SDValue but add an assert in getNode that the second operand is the constant we expect? This doesn't prevent people using DAG interfaces to transform what we know to be a constant but at least we'll catch is very early if it goes wrong.

I still think this interface is of value as a generic way to create such vectors regardless of vector type.

david-arm updated this revision to Diff 330233.Mar 12 2021, 7:04 AM
  • Changed interface SelectionDAG::getStepVector to take a SDValue for the Step and to support fixed width vectors.
  • Use getPromotedVTForPredicate in lowering code for STEP_VECTOR operating on i1 vectors.
  • In SplitVecRes_STEP_VECTOR folded the multiply into the getVScale call.
  • Addressed other review comments.
david-arm marked 13 inline comments as done.Mar 12 2021, 7:07 AM
bin.cheng-ali added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1657–1659

My question is if it should become common? I mean if fixed length vector can be handled more efficiently, shall we always lower and handle fixed/scalable length differently, rather than handle them simultaneously in more and more functions, which has no obvious benefit?

paulwalker-arm added inline comments.Mar 14 2021, 4:03 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1657–1659

I agree and hope we evolve to a more unified set of nodes that work for all vector types. That said, this is a larger conversation and so today we're trying to add scalable vector support in a way that is a NFC for fixed length vectors. This patch does add a unified interface in the form of SelectionDAG::getSepVector(...) that should allow for portable code regardless of vector type.

SplitVecRes_STEP_VECTOR is specifically linked to ISD::STEP_VECTOR, which can only be created for scalable vector types. So whilst this function can be easily extended to cover fixed length vector types, it would not be testable in a way that preserved this "NFC for fixed length vectors" goal.

I'm still struggling with how best to represent STEPVECTOR's scale operand. The patch as it currently stands is probably good enough but I do wonder how long we'll get by without needing to implement PromoteIntOp_STEP_VECTOR. I also wonder if we can take a similar approach as done for INSERT_SUBVECTOR's index operand.

The description for how STEPVECTOR handles overflow also seems a bit limiting. Specifically this patch lowers vector of i1 based STEP_VECTOR operations, but with the current description only the first two lanes of any vector type will result in defined behaviour, which makes me wonder the value in allowing such support.

llvm/include/llvm/CodeGen/SelectionDAG.h
836

This is no longer true, perhaps just Returns a vector of type...

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
1754

Given this is fixed length specific I think calling getVectorNumElements() is clearer.

4324

Comment is no longer valid. I'd just remove it.

4670

FYI: APInt has a function for this called isNonNegative().

llvm/lib/IR/IRBuilder.cpp
95

Keep this style if you prefer but I think it will be cleaner done as

if  (isa<ScalableVectorType>(DstType))
  return CreateIntrinsic(Intrinsic::experimental_stepvector, {DstType}, {},....

<fixed vector handling here>
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
264–265

This is safe to assume and already checked within Verifier.cpp.

llvm/test/CodeGen/AArch64/neon-stepvector.ll
10–11

For this test to be correct requires LCPI0_0 to point to something meaningful. I believe update_llc_test_checks.py strips this information so you'll need to manually add it. We did likewise for named-vector-shuffle-reverse-neon.ll.

david-arm updated this revision to Diff 331195.Mar 17 2021, 2:57 AM
  • Addressed review comments.
  • After recent upstream discussion on the SVE sync call it was decided to remove support for vectors of i1 types.
  • Changed code to check for element sizes >= 8 bits instead of 1!
paulwalker-arm accepted this revision.Mar 19 2021, 8:57 AM

A couple of niggles but otherwise looks good to me.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4793

Could use N->getConstantOperandAPInt(0) here.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4667

To match the comment you'll need to add an isInteger test.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9052–9053

This assert should not be necessary as you're protecting this already within getNode and thus can assume the node to be valid.

This revision is now accepted and ready to land.Mar 19 2021, 8:57 AM
fhahn added inline comments.Mar 21 2021, 2:33 PM
llvm/docs/LangRef.rst
16614

Would it make sense to limit the intrinsic to integer vectors only? Otherwise, why is it not recommended to use it with different element types? I think the behaviour with other element types should also be spelled out, if other element types are supported.

16615

Can we make the wording here consistent with the other scalable intrinsics?

david-arm updated this revision to Diff 332253.Mar 22 2021, 5:10 AM
  • Updated LangRef to better describe the intrinsic behaviour.
fhahn added a comment.Mar 22 2021, 6:53 AM
  • Updated LangRef to better describe the intrinsic behaviour.

Thanks. I think it should be possible to enforce the new restrictions in the IR verifier, to prevent users from accidentally using them with wrong types.

  • Updated LangRef to better describe the intrinsic behaviour.

Thanks. I think it should be possible to enforce the new restrictions in the IR verifier, to prevent users from accidentally using them with wrong types.

Yeah that's right. The type (including the element size) are defended in this patch in Verifier.cpp.

fhahn added a comment.EditedMar 22 2021, 6:57 AM
  • Updated LangRef to better describe the intrinsic behaviour.

Thanks. I think it should be possible to enforce the new restrictions in the IR verifier, to prevent users from accidentally using them with wrong types.

Yeah that's right. The type (including the element size) are defended in this patch in Verifier.cpp.

That's great! It still looks like there are negative tests missing for the checks?

david-arm updated this revision to Diff 332285.Mar 22 2021, 7:48 AM
  • Added verifier tests.
fhahn accepted this revision.Mar 22 2021, 12:01 PM

Thanks, LGTM!

I'm still struggling with how best to represent STEPVECTOR's scale operand. The patch as it currently stands is probably good enough but I do wonder how long we'll get by without needing to implement PromoteIntOp_STEP_VECTOR. I also wonder if we can take a similar approach as done for INSERT_SUBVECTOR's index operand.

Hey. Sorry to raise a closed thread but this exact issue is making it difficult to support STEP_VECTOR on RISC-V. On RV32 we don't have legal i64 but do have legal i64 vectors. So we're hitting an assert during visitStepVector where we try and create a nxvXi64 STEP_VECTOR with an i32 operand. Are you aware of anything that stops us from changing the requirements to be an integer pointer type? It might limit optimizations but we can always not perform DAG combines if it's not safe to do so?

paulwalker-arm added a comment.EditedMar 30 2021, 7:59 AM

I'm still struggling with how best to represent STEPVECTOR's scale operand. The patch as it currently stands is probably good enough but I do wonder how long we'll get by without needing to implement PromoteIntOp_STEP_VECTOR. I also wonder if we can take a similar approach as done for INSERT_SUBVECTOR's index operand.

Hey. Sorry to raise a closed thread but this exact issue is making it difficult to support STEP_VECTOR on RISC-V. On RV32 we don't have legal i64 but do have legal i64 vectors. So we're hitting an assert during visitStepVector where we try and create a nxvXi64 STEP_VECTOR with an i32 operand. Are you aware of anything that stops us from changing the requirements to be an integer pointer type? It might limit optimizations but we can always not perform DAG combines if it's not safe to do so?

That's good feedback, thanks. The problem is if we ever decide to allow a negative operand. By forcing the operand to have at least the same bit width as the result's element type we managed to build in some future-proof-ness. That said I'm not locked into this idea plus we can easily switch the operand to be signed if that make more sense and as you say, not dag combine when it would be unsafe to do so. So if this is proving problematic for you then I've nothing against making the operand an integer pointer type.