This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Add support for lowering GEPs involving scalable vectors.
ClosedPublic

Authored by efriedma on Jan 28 2020, 7:37 PM.

Details

Summary

This includes both GEPs where the indexed type is a scalable vector, and GEPs where the result type is a scalable vector.

I had to make a couple related fixes to allow all the testcases to compile.

Diff Detail

Event Timeline

efriedma created this revision.Jan 28 2020, 7:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2020, 7:37 PM
efriedma planned changes to this revision.Jan 28 2020, 7:43 PM

Just noticed CodeGen/X86/large-gep-scale.ll is failing; need to tweak the ElementSize handling a bit.

efriedma updated this revision to Diff 241606.Jan 30 2020, 3:39 PM

Fix regression test failure. Add a few comments to try to explain the convoluted logic for the various different bitwidths.

fpetrogalli added a subscriber: eli.friedman.EditedFeb 5 2020, 9:11 AM

Hi @eli.friedman ,

thank you for working on this, I am working on adding the addressing modes for SVE, this is required to be able to test what I am doing in the back-end.

I have the following questions/requests.

  1. Could you please describe in a comment in the tests the math behind operations like the following? %d = getelementptr <vscale x 2 x i64>, <vscale x 2 x i64>* %base, <2 x i64> <i64 1, i64 1>. This could be determined by looking at the codegen, but having it explained in a target independent representation would make things easier to undertsand.
  2. Could you please tests expressions like the following? getelementptr <vscale x 2 x i64>, <vscale x 2 x i64> * %base, i64 4, getelementptr <vscale x 2 x i64>, <vscale x 2 x i64> * %base, [i32|i64] %offset and getelementptr i8, <vscale x 2 x i8*> %bases, i32 1?
  3. You tests the cases for <vscale x 2 x i64>. Could you please test more configurations of <vscale x N x Ty>, at least covering all cases in which sizeof( N x Ty) == 16?

All these requests will probably require you to add more patterns in AArch64SVEInstrInfo.td, other than the one you have already added. If that's the case, I think it will be worth splitting the patch in two patches, one for the CodeGen of the patterns and one for the codegen of the GEP (of course, unless it is not possible to reproduce such patterns without the GEP on scalable types...)

Thanks!

Francesco

Andrzej removed a subscriber: Andrzej.Feb 5 2020, 10:51 AM
andwar added a subscriber: andwar.Feb 5 2020, 10:53 AM

Hello @efriedma ,

I have started adding SVE reg+imm addressing mode on top of your patch (see https://reviews.llvm.org/D74254).

To make it work, I had to add handling of getelementpr when adding a constant int. The hunk I have added could be useful to let you implement the extra cases I have asked you to add in this patch. Feel free to grab it as it is, or modify it if you find a better way to handle such cases.

Meanwhile, I will keep working on the addressing modes.

Thank you!

Francesco

Could you please describe in a comment in the tests the math behind operations like the following?

I'm not sure what you mean by "the math". A scalar GEP produces a scalar result, a vector GEP produces a vector result. A scalar operand to a vector GEP is splatted. I'm not sure what useful explanation I could add; I could add a comment "This is a vector GEP", but I'm not sure that's helpful; the result type is already explicitly written in the "ret" instruction.

Could you please tests expressions like the following?

Sure.

You tests the cases for <vscale x 2 x i64>. Could you please test more configurations of <vscale x N x Ty>, at least covering all cases in which sizeof( N x Ty) == 16?

For vector GEPs, all the math always has to happen in N x i64 (since pointers are 64 bits wide). So if N>2, we have to run type legalization, and that probably explodes in unrelated ways, so I don't really want to do that. I can add more tests for scalar GEPs over vector pointee types; that shouldn't require more patterns, I think.

To make it work, I had to add handling of getelementpr when adding a constant int

What's the difference between the DAG produced by your patch, vs. my patch alone?

efriedma updated this revision to Diff 243551.Feb 10 2020, 7:12 AM

Rebased. Added more tests. Optimized emitted DAG for GEPs over scalable vectors.

sdesmalen added inline comments.Feb 10 2020, 8:26 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3917

I'm not sure I understand this comment; why would ElementSize not fit into IdxTy?

Could you please describe in a comment in the tests the math behind operations like the following?

I'm not sure what you mean by "the math". A scalar GEP produces a scalar result, a vector GEP produces a vector result. A scalar operand to a vector GEP is splatted. I'm not sure what useful explanation I could add; I could add a comment "This is a vector GEP", but I'm not sure that's helpful; the result type is already explicitly written in the "ret" instruction.

Yeah, I was confused by what the GEP was doing, I thought you where somehow adding the values of the fixed length vector to the first lanes of the scalable one. My bad. No need to add any comment.

Could you please tests expressions like the following?

Sure.

Thanks!

You tests the cases for <vscale x 2 x i64>. Could you please test more configurations of <vscale x N x Ty>, at least covering all cases in which sizeof( N x Ty) == 16?

For vector GEPs, all the math always has to happen in N x i64 (since pointers are 64 bits wide). So if N>2, we have to run type legalization, and that probably explodes in unrelated ways, so I don't really want to do that. I can add more tests for scalar GEPs over vector pointee types; that shouldn't require more patterns, I think.

Yes, I was not asking to modify the types of the RHS argument, I was asking to modify the type of the base pointer, so that you tests things like getelementptr <vscale x 2 x i8>, <vscale x 2 x i8>*, i64 %whatever

To make it work, I had to add handling of getelementpr when adding a constant int

What's the difference between the DAG produced by your patch, vs. my patch alone?

I have explained it in the comment of the first example.

llvm/test/CodeGen/AArch64/sve-gep.ll
7–8

These two instructions could be replaced by a single one: addvl x0, x0, #4. Do you think you can change your patch to generate this sequence, or should this be done in the back-end?

Also, the code I wrote in the patch at https://reviews.llvm.org/D74254 generates the following node for the base address out of this example (well, my patches uses size in bits, but I recon it should be bytes):

(ISD::ADD, %base (ISD::VSCALE 16))

It essentially encodes the multiplicative constant in the VSCALE node directly (provided that the argument of the MUL is an immediate constant). That makes it very easy to detect in the back-end because it doesn't require to check for MUL/SUB and SHL. Do you think you can emit such DAG in your patch?

efriedma marked 2 inline comments as done.Feb 10 2020, 4:28 PM
efriedma added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3917

If you're on a 32-bit target, and you have a GEP over a type larger than 4GB. (Not sure this is something that *should* be supported, but we have at least one testcase in the LLVM tree.)

llvm/test/CodeGen/AArch64/sve-gep.ll
7–8

We probably need to add patterns for "add(x, vscale) -> addvl" in TableGen; that's not really related to this patch, though.

I guess I could modify the current patch where it does if (CI && !ElementScalable) { to generate an appropriate vscale node directly. But we probably want a mul(vscale(n1), n2) -> vscale(n1*n2) dagcombine anyway.

fpetrogalli marked an inline comment as done.Feb 11 2020, 8:24 AM
fpetrogalli added inline comments.
llvm/test/CodeGen/AArch64/sve-gep.ll
7–8

We probably need to add patterns for "add(x, vscale) -> addvl" in TableGen; that's not really related to this patch, though.

OK

I guess I could modify the current patch where it does if (CI && !ElementScalable) { to generate an appropriate vscale node directly. But we probably want a mul(vscale(n1), n2) -> vscale(n1*n2) dagcombine anyway.

Ah, DAG Combine. Right. Yep, we can do that there.

42

Maybe try two different constants in the <2 x i64> vector? And also a version with one of the elements of the vector coming from an input variable of the function?

53

Same here. Worth using two different constants and also a version in which the elements of the index vector are not constant but runtime values.

andwar added inline comments.Feb 11 2020, 9:28 AM
llvm/test/CodeGen/AArch64/sve-gep.ll
78

This test case seems identical to scalable_of_fixed_1.

sdesmalen accepted this revision.Feb 11 2020, 9:29 AM

LGTM!

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3917

Ah okay, sounds a bit obscure indeed, but that makes sense.

This revision is now accepted and ready to land.Feb 11 2020, 9:29 AM
This revision was automatically updated to reflect the committed changes.