This is an archive of the discontinued LLVM Phabricator instance.

Add support for (expressing) vscale.
ClosedPublic

Authored by sdesmalen on Sep 30 2019, 2:26 AM.

Details

Summary

In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:

getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1

This can be used to propagate the value in CodeGenPrepare.

In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.

This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).

Diff Detail

Event Timeline

sdesmalen created this revision.Sep 30 2019, 2:26 AM
khchen added a subscriber: khchen.Oct 1 2019, 6:23 PM

What happens if we see IR like:

%1 = mul i16 vscale, -2

Would that be promoted to i32 through PromoteIntRes_VSCALE(...)?

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5141 ↗(On Diff #222364)

This should probably have a test. Same for ISD::MUL.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5141 ↗(On Diff #222364)

Sorry, just SHL. MUL tests are there...

sdesmalen updated this revision to Diff 236558.Jan 7 2020, 5:48 AM
sdesmalen retitled this revision from [SelectionDAG][SVE] Add ISD node for VSCALE. to Add support for (expressing) vscale..
sdesmalen edited the summary of this revision. (Show Details)
sdesmalen added reviewers: efriedma, lattner.
sdesmalen added a subscriber: paulwalker-arm.

Added LLVM IR support to this patch; as intrinsic and constexpr pattern (following suggestions in D71636)

Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2020, 5:48 AM

I'm not a comptent reviewer on the details for this whole patch, but the general intrinsic approach LGTM!

SjoerdMeijer added inline comments.Jan 8 2020, 6:16 AM
llvm/docs/LangRef.rst
17751

nit: perhaps "throughout the program" -> "throughout the program execution"?

17753

nit: perhaps just omit "for convenience"?

17766

nit: unrelated change, don't need this newline?

llvm/include/llvm/CodeGen/ISDOpcodes.h
919

nit: extra whitespace before IMM

920

Just curious, what do you mean by "usually", i.e. if it is not getVectorNumElements(), what else can it be?

llvm/test/CodeGen/AArch64/sve-vscale.ll
21

I don't think I understand this and test case above (haven't looked at the other yet). Essentially I don't understand why we remove the mul with 16.... which we are optimsing away? Is this correct? Where does this constant 16 comes from and what is the meaning of it? It could be I don't understand the RDVL instruction, or the vscale intrinsic definition. Can you explain this?

sdesmalen updated this revision to Diff 236821.Jan 8 2020, 7:12 AM
sdesmalen marked 8 inline comments as done.
  • Changed some mul into shl in the test file, to test fold into vscale isd node.
  • Addressed comments.

I'm not a comptent reviewer on the details for this whole patch, but the general intrinsic approach LGTM!

Awesome, thanks for having a look!

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5141 ↗(On Diff #222364)

Good spot, I've replaced some of the mul in the test with shl.

llvm/include/llvm/CodeGen/ISDOpcodes.h
920

In the context of auto-vectorization, it is often used to increment an iteration counter with the number of elements, e.g. i = i + vscale x #elts.

I'll remove the sentence as I agree it adds unnecessary confusion.

llvm/test/CodeGen/AArch64/sve-vscale.ll
21

Here RDVL stores the current vector register size in bytes into x0 (multiplied by the immediate, in this case 1).
Given that there are vscale x 16 bytes in an SVE vector, this instruction folds that mul 16 away into the instruction, in which it is implicit.

SjoerdMeijer added inline comments.Jan 8 2020, 7:58 AM
llvm/test/CodeGen/AArch64/sve-vscale.ll
21

Ah, yes, thanks for clarifying. I got confused about bits, bytes, and sizes of everything, but I got it now and see that this makes sense. I will continue looking at this patch a bit more.

SjoerdMeijer added inline comments.Jan 9 2020, 2:01 AM
llvm/include/llvm/IR/PatternMatch.h
2021

Nit: how about renaming S to Scaling or something along those lines?

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1108

last question: is there a test case for this, can this be triggered?

Does it make sense to have a special case for vscale * 1 or vscale * -1? Not that it's likely to come up in most cases, but it would look pretty silly to generate a madd that multiplies by 1.

llvm/docs/LangRef.rst
17753

What happens if the multiply overflows?

llvm/lib/CodeGen/CodeGenPrepare.cpp
2019

This should do the right thing, as far as I know.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5139

This multiply can overflow.

5180

This shift can overflow.

sdesmalen updated this revision to Diff 237652.Jan 13 2020, 6:20 AM
sdesmalen marked 5 inline comments as done.
  • Simplified the llvm.vscale intrinsic to no longer take the scaling argument. This means the scaling is left to an explicit mul which has explicit wrapping flags. This in turn simplifies the code in SelectionDAG for merging the multiplier into the VSCALE ISD node (this should only happen when it is known that no wrapping occurs).
  • Also made llvm.vscale overloadable, so that it can return vscale in different types.
  • Added special patterns for mul vscale, 1 and mul vscale, -1.
sdesmalen marked 6 inline comments as done.Jan 13 2020, 6:23 AM
sdesmalen added inline comments.
llvm/docs/LangRef.rst
17753

Fixed by removing the scaling argument and leaving this to an explicit mul (with wrapping flags to express (lack of) overflow).

llvm/include/llvm/IR/PatternMatch.h
2021

Because the scaling argument has been removed from the intrinsic, I've also removed the matching of any scaling from this function.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1108

Yes! rdvl_3 is the test addresses that case.

The new IR intrinsic makes more sense.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5139

This multiply can still overflow. (If it's supposed to wrap, please use an unsigned multiply.)

sdesmalen marked 4 inline comments as done.Jan 14 2020, 3:38 AM
sdesmalen added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5139

Sorry, I'm not really sure I understand how the multiply above can overflow. The code checks the NSW flag on the MUL node, so if VSCALE(Imm) *N2C does not wrap, why would Imm * N2C wrap?

Perhaps int64_t is the wrong choice here though and we should instead be using APInt for the VScale operand (for the general case where VT > int64_t)

sdesmalen updated this revision to Diff 237915.Jan 14 2020, 3:42 AM

Changed scale argument to VSCALE from int64_t to APInt.

LGTM.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5139

APInt is fine.

In the version you had, the multiply "can't overflow" in the sense that the operation produces poison if it overflows; that's not a problem. The problem would be undefined behavior in the compiler itself.

LGTM.

I was about to commit this patch today when I realised the differential wasn't yet in accepted state. Any objections for me to land the patch?

efriedma accepted this revision.Jan 21 2020, 12:44 PM

Just forgot to click the button.

This revision is now accepted and ready to land.Jan 21 2020, 12:44 PM
This revision was automatically updated to reflect the committed changes.