Page MenuHomePhabricator

[ARM][MVE] Enable masked gathers from base + vector of offsets
ClosedPublic

Authored by anwel on Jan 7 2020, 5:58 AM.

Details

Summary

This patch extends the earlier patch which enabled masked v4i32 gathers from a vector of pointers (to be found here: https://reviews.llvm.org/D71743).

The pass now checks for the presence of a getelementpointer instruction and, if it finds one, builds a masked gather loading from a base and a vector of offsets. This is more efficient then loading from a vector of pointers. Also, the masked gather instruction generated is not restricted to v4i32 loads, but works for v8i16 and v16i8 as well.

Diff Detail

Event Timeline

anwel created this revision.Jan 7 2020, 5:58 AM
samparker added inline comments.Jan 7 2020, 7:40 AM
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
201–202

Doesn't look this needs to be a variable?

240

I think you can just call C->isNullValue().

248

How about moving this logic up to around line 155 so that all the offset stuff is together, it's quite hard to follow scrolling up and down!

anwel marked 6 inline comments as done.Jan 7 2020, 8:35 AM
anwel added inline comments.
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
201–202

No, it doesn't.

240

I didn't realise that function exists, thanks for the hint

248

That makes sense, I'll move it.

anwel updated this revision to Diff 236602.Jan 7 2020, 8:40 AM
anwel marked 3 inline comments as done.

Some small changes to address Sam's comments

dmgreen added inline comments.Jan 7 2020, 9:49 AM
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
232

This can be ">"

246

Do we have and tests where this is false? Is it for something like getelementptr <4 x i32>, <4 x i32> *%base, i64 16? That might make sense to generate a QI gather load from (although I don't know if anything will generate gep's like that at the moment without some help).

For the RQ form, we need to make sure that base is a single pointer.

251

This code comes from the ISel gather lowering? I think that for the moment we can just check that the number of GEP operands is 2 (Base + offset) and remove this loop. I'm not exactly sure when it would be useful. We don't seem to have any tests for it, and can add it back in later if they come up in the future. (Otherwise the types don't sound like they would be correct, and the getOperand(1) should be getOperand(FinalIndex)?)

259

I think this can be something like GEP->getResultElementType()->getPrimitiveSizeInBits() == 32. Same below, so maybe make a variable for that size. (scaled_v8f16_half should be be matched too, I think).

Also Clang format!

268

Worth adding a LLVM_DEBUG message so we can tell what prevented the generation.

anwel marked 6 inline comments as done.Jan 8 2020, 9:12 AM
anwel added inline comments.
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
246

Yes, I will change this to a check of whether we do have a vector of pointers and, if that rare case occurs, leave it to expansion for now. I'll add a test for that behaviour.

259

You are right, that is also getting rid of the context variable. I will refactor the code a bit.

anwel marked 4 inline comments as done.Jan 9 2020, 1:44 AM
anwel added inline comments.
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
251

As as of now I cannot present a test that would show this is actually relevant I agree.

anwel updated this revision to Diff 236982.Jan 9 2020, 1:47 AM
anwel marked an inline comment as done.
anwel updated this revision to Diff 236988.Jan 9 2020, 2:17 AM

Fixing a typo.

anwel updated this revision to Diff 237036.Jan 9 2020, 5:36 AM

Fixed regress in some lines introduced by rebase on llvm master where the parent patch now has been commited.

dmgreen added inline comments.Jan 9 2020, 1:59 PM
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
244–258

These return false's are worrying me a little. There may be cases where we cannot select this as a GEP, but it would be fine to fall back to selecting the qi variant instead. Maybe something like this, for example:

define arm_aapcs_vfpcc <4 x i32> @qi4(<4 x i32*> %p) {                                                                   
entry:                                                                                                                   
  %g = getelementptr inbounds i32, <4 x i32*> %p, i32 4                                                                  
  %gather = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> %g, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)                                                                                                
  ret <4 x i32> %gather                                                                                                  
}

It can also be selected as a VLDRW.u32 Qd, [P, 4] I think, but we should be able to get a VLDRW.u32 Qd, [g] at least. I believe you mentioned something similar to this when talking about scatter lowering.

Also it probably makes sense to check these basic things about the GEP before the offset code above. (Although I think any GEP we see will always has at least 1 offset, it makes conceptual sense to check the number of operands before using them).

anwel updated this revision to Diff 237279.Jan 10 2020, 3:45 AM

Moduralised and re-ordered the code to give it a clearer structure and make the fallback to constructing a "basic" gather easier.

anwel marked 2 inline comments as done.Jan 10 2020, 3:45 AM
anwel added inline comments.
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
244–258

You are right in this case, and maybe there are others we did not think of. I've modularised the code now which makes it easier to fall back to QI if the construction of a more complex gathers fails for some reason. Hopefully this also makes the code more readable.

This I like. I like the structure. I think it's nice to split it out like this.

I'm just not a fan of the Ty, Ptr, Mask and Alignment being part of the class. This class will be lowering multiple gathers/scatters, it may be all too easy to accidentally mis-use them in a non-obvious way, accidentally using the Mask from the wrong gather (for example). Especially as things change in the future.

If the values we need to pass around are {Ty, Ptr, Mask}, then its probably simpler to just pass them to functions that need them. Or maybe Pass I around? I'm not sure. The other option if we expect that there will be many parameters would be to create a struct to pass them around in. Keep them encapsulated together, but not requiring too many parameters. It looks like many of them may just be able to be grabbed from I quite easily though (Ty, Mask, etc).

llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
86

LowerGather -> lowerGather to keep it like the others.

163

This one could be Value *MVEGatherScatterLowering::lookThroughBitcast(Value *Ptr), and return either BitCast->getOperand(0) or the original Ptr.

196–197

These could be Value *MVEGatherScatterLowering::tryCreateMaskedGatherOffset and return nullptr on failure. It could then look something like:

Value *Load = tryCreateMaskedGatherOffset(Builder, Ty, Ptr, Mask);
if (!Load)
  Load = tryCreateMaskedGatherBase(Builder, Ty, Ptr, Mask);
if (!Load)
  return false;
anwel updated this revision to Diff 237333.Jan 10 2020, 7:59 AM
anwel marked an inline comment as done.

@dmgreen You are absolutely right about the danger regarding the class variables, I had missed that aspect.
I managed to reduce the number of variables which need to be passed between functions to at most four, which I think is acceptable.
Please have a look and tell me what you think.

anwel marked 3 inline comments as done.Jan 10 2020, 8:01 AM
anwel added inline comments.
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
196–197

Yes, that makes one less argument to the function

dmgreen accepted this revision.Jan 13 2020, 2:41 AM

LGTM. Nice one.

llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
257

Nit: Add a space between the "Can't" and the "create".

This revision is now accepted and ready to land.Jan 13 2020, 2:41 AM
anwel updated this revision to Diff 237614.Jan 13 2020, 3:24 AM
anwel marked 2 inline comments as done.

Added a space in the debug message.

anwel updated this revision to Diff 237896.Jan 14 2020, 2:33 AM

Rebase and fix a typo.

This revision was automatically updated to reflect the committed changes.