
[ARM][MVE] Enable extending masked loads
ClosedPublic

Authored by samparker on Oct 2 2019, 7:24 AM.

Details

Summary

Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.
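For illustration, the lane semantics of the zero-extending case can be sketched as a scalar model (the function name, types, and mask representation here are hypothetical stand-ins, not MVE or LLVM code):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of a zero-extending masked load: v4i8 memory producing a v4i32
// result, with a per-lane mask and a passthru vector for inactive lanes.
std::array<uint32_t, 4>
zextMaskedLoadV4i8(const uint8_t *Mem, const std::array<bool, 4> &Mask,
                   const std::array<uint32_t, 4> &PassThru) {
  std::array<uint32_t, 4> Result{};
  for (int Lane = 0; Lane < 4; ++Lane)
    // Active lanes read an i8 from memory and zero-extend it to i32;
    // inactive lanes take the passthru value unchanged.
    Result[Lane] = Mask[Lane] ? uint32_t(Mem[Lane]) : PassThru[Lane];
  return Result;
}
```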

My little (endian) brain only really works in that mode, so I'm dubious about the big endian support here.

Diff Detail

Event Timeline

samparker created this revision.Oct 2 2019, 7:24 AM

Nice one! Looks very useful. I _think_ big endian should be fine here, so long as we don't use the wrong type.

There are a lot of other masked load tests in mve-masked-load.ll. I think we should add the same for widening loads and narrowing stores. There should be tests for things like align1 and different passthru values. We might want extra tests for odd types too, if we are making them legal through isLegalMaskedLoad.

lib/Target/ARM/ARMISelLowering.cpp
13760 ↗(On Diff #222830)

How come this isn't in target independent code? I would expect this combine not to be MVE specific, so long as it's legal. I'm not sure if there are ways currently to check if a "widening masked load" is legal or not, in the same way as there are for normal loads.

lib/Target/ARM/ARMInstrMVE.td
5019 ↗(On Diff #222830)

I think these (and perhaps the ones above, tbh) may need "let ScalarMemoryVT = i8;", to ensure they are extending from the correct types?

lib/Target/ARM/ARMTargetTransformInfo.cpp
494 ↗(On Diff #222830)

I think changes here might mean we need to handle stores too, or (temporarily) split the two out from one another.

506 ↗(On Diff #222830)

What does this mean for a v2i32, or other weird types (for us)?

test/CodeGen/Thumb2/mve-masked-ldst.ll
112 ↗(On Diff #222830)

This looks odd to me, with the vpsel. There is legalising code in LowerMLOAD, which might be doing something wrong.

191 ↗(On Diff #222830)

This is wrong at the moment? Same for all the other masked stores?

samparker marked 4 inline comments as done.Oct 3 2019, 2:21 AM

Thanks for those points, I'll add loads more tests.

lib/Target/ARM/ARMISelLowering.cpp
13760 ↗(On Diff #222830)

good point.

lib/Target/ARM/ARMInstrMVE.td
5019 ↗(On Diff #222830)

I'll give it a go.

test/CodeGen/Thumb2/mve-masked-ldst.ll
112 ↗(On Diff #222830)

Is the vpsel not just handling the predicate on the store?

191 ↗(On Diff #222830)

Yes? I hadn't looked at stores but it looks like these should now be vstrb.16.

samparker updated this revision to Diff 223172.Oct 4 2019, 1:50 AM
  • Moved the combine into generic dagcombine.
  • Now checking memory alignment to decide legality.
  • Not allowing v2 vectors.
  • Masked load patterns are now explicitly either aligned or unaligned.
  • Added more tests.
dmgreen added inline comments.Oct 4 2019, 5:28 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9283 ↗(On Diff #223172)

Is it true that whenever you have a legal extending load, you will also have the equivalent legal extending masked load? (For MVE we do, but is that true for all archs?)

Do we need to add an extra set of flags for this? Or is isVectorLoadExtDesirable good enough to handle these cases when there is an asymmetry?

lib/Target/ARM/ARMInstrMVE.td
5203 ↗(On Diff #223172)

t2addrmode_imm7<0> -> t2addrmode_imm7<1>, for a VLDRH. Same below.

lib/Target/ARM/ARMTargetTransformInfo.cpp
511 ↗(On Diff #223172)

If this is coming from codegen, can the alignment here be 0? I think in ISel it is always set (and clang will always set it), but it may not be guaranteed in llvm in general.

test/CodeGen/Thumb2/mve-masked-load.ll
551 ↗(On Diff #223172)

I don't think this vpsel should be here (it's not wrong, just inefficient; the instruction already does this, setting predicated-off lanes to 0).

I'm guessing that the LowerMLOAD is creating a zero vector (that is potentially the wrong type?), so when it is called on the newly created maskedload it doesn't recognise it as 0 and we end up with the vselect being added too.

samparker marked 3 inline comments as done.Oct 4 2019, 6:34 AM
samparker added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9283 ↗(On Diff #223172)

Yes, we can't expect that to be true for everything. I don't understand why these APIs generally like to pass lots of arguments instead of just passing, say, the load that you'd want to inspect... So hopefully both these calls will cover all cases, and I'd like to avoid adding another flag. That, or I could just change isLoadExtLegal to take the LoadSDNode, but I've assumed these calls are designed the way they are for a reason...

lib/Target/ARM/ARMTargetTransformInfo.cpp
511 ↗(On Diff #223172)

I can't see anything in the spec that guarantees this for these intrinsics, but for normal loads it becomes defined by the target ABI. It's always safe for us to use an i8* accessor, so I don't see 0 being a problem here.

test/CodeGen/Thumb2/mve-masked-load.ll
551 ↗(On Diff #223172)

I'll have a look.

samparker updated this revision to Diff 223220.Oct 4 2019, 7:39 AM

Now handling a bitcast passthru value in LowerMLOAD. Corrected the half-load addr values.

samparker updated this revision to Diff 223456.Oct 7 2019, 1:09 AM

I had missed the shift value on the input patterns.

Nice. I think this is looking good, just some details to sort out, like what to do about the target independent parts.

We will presumably want to add the pre and post inc to these in the future too, which will probably bring up the same kinds of questions.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9283 ↗(On Diff #223172)

They refer back to the LoadExtActions, which are set by setLoadExtAction in ISel. We may need more flags on there to specify the difference between the masked loads and the normal loads.
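The table being described can be pictured with a toy model (the class below and its string-typed VTs are simplified stand-ins for the real TargetLowering hooks, used only to show why masked loads might need their own set of flags):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

enum ExtKind { SEXTLOAD, ZEXTLOAD, EXTLOAD };
enum Action { Legal, Expand };

// Toy legality table: each (extension kind, result VT, memory VT) triple is
// recorded via setLoadExtAction and queried via isLoadExtLegal. Anything not
// explicitly marked Legal is treated as not legal.
struct ToyTLI {
  std::map<std::tuple<ExtKind, std::string, std::string>, Action> Actions;

  void setLoadExtAction(ExtKind K, const std::string &ValVT,
                        const std::string &MemVT, Action A) {
    Actions[{K, ValVT, MemVT}] = A;
  }

  bool isLoadExtLegal(ExtKind K, const std::string &ValVT,
                      const std::string &MemVT) const {
    auto It = Actions.find({K, ValVT, MemVT});
    return It != Actions.end() && It->second == Legal;
  }
};
```

A separate table (or extra flag) for masked loads would let a target mark, say, a normal zext v4i8->v4i32 load legal while leaving the masked equivalent illegal.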

lib/Target/ARM/ARMISelLowering.cpp
8887 ↗(On Diff #223220)

This is creating a zero vector of size VT, which is the size of what the masked loads returns. Should it instead be the size of the memory being loaded (because the extend happens to the passthru as well)? What happens if that isn't a legal value type?

lib/Target/ARM/ARMInstrMVE.td
5203 ↗(On Diff #223172)

Edit: You beat me to it. Can you add some tests?

5196 ↗(On Diff #223220)

There likely needs to be an anyext too. Can these (and is it beneficial for them to) be merged into the MVEExtLoad multiclass below?

lib/Target/ARM/ARMTargetTransformInfo.cpp
511 ↗(On Diff #223172)

Yeah. Alignment of 0 means ABI alignment, which means 8, not unaligned.

I think it may be better to just check this alignment is always the case, getting rid of that weird "use i8's to load unaligned masked loads" thing. That was probably a bad idea, more trouble than it's worth.

I think what will happen here at the moment is that the vectorizer will call isLegalMaskedLoad with a scalar type and an alignment (which, let's say, is unaligned). That alignment won't be checked, so the masked loads and stores will be created. Then, when we get to the backend, the legalizer will call this with a vector type and we'll hit this check, expanding the masked load into that very inefficient bunch of code. Which is probably something we want to avoid.

test/CodeGen/Thumb2/mve-masked-load.ll
903 ↗(On Diff #223220)

Nice :)

samparker marked 2 inline comments as done.Oct 7 2019, 7:17 AM
samparker added inline comments.
lib/Target/ARM/ARMISelLowering.cpp
8887 ↗(On Diff #223220)

Well, surely the result VT of the masked load has to match the VT of the passthru input. The passthru is not about what memory is accessed, but what is written to the destination register. VMOVIMM will also generate the same zero value for all full-width vector types, so for vector widths less than 128 bits the higher elements will be zeroed, and that makes sense. For vectors wider than 128 bits, I think something would have gone wrong before we get here. I'll add some tests for both these cases.
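The soundness argument here can be checked lane by lane with a tiny model: extending the whole masked-load result is the same as an extending masked load whose passthru has itself been extended first (the two functions below are illustrative only, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>

// zext applied after the select: models zext(masked_load(..., passthru)).
uint32_t extendThenSelect(bool M, uint8_t Loaded, uint8_t Pass) {
  uint8_t Narrow = M ? Loaded : Pass; // masked load at i8 width
  return uint32_t(Narrow);            // zext of the whole result
}

// Extending masked load: the loaded lane is extended, and the passthru has
// been pre-extended to the result width.
uint32_t selectThenExtend(bool M, uint8_t Loaded, uint8_t Pass) {
  return M ? uint32_t(Loaded) : uint32_t(Pass);
}
```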

lib/Target/ARM/ARMTargetTransformInfo.cpp
511 ↗(On Diff #223172)

Hmmm, okay. I also can't see removing unaligned support having a big negative effect. Sounds like I need to add some vectorization tests too, unless we already have them?

samparker marked an inline comment as done.Oct 7 2019, 8:13 AM
samparker added inline comments.
lib/Target/ARM/ARMInstrMVE.td
5196 ↗(On Diff #223220)

As much as I don't like copy-paste, I do appreciate being able to read the code! I think adding to that multiclass is more hassle than it's worth :)

dmgreen added inline comments.Oct 7 2019, 8:36 AM
lib/Target/ARM/ARMISelLowering.cpp
8887 ↗(On Diff #223220)

Hmmm. Yeah OK. I see. The PassThru is explicitly extended in tryToFoldExtOfMaskedLoad?

That makes sense, and the tests look OK. (There's one that is both sext and zext the same value, but that looks correct for where it is used).

Tests for masked loads/stores longer than 128 bits sound like a good idea. We should ideally be able to deal with longer vectors by splitting them just fine.

lib/Target/ARM/ARMInstrMVE.td
5196 ↗(On Diff #223220)

Ha, Fair. I will agree with you there that sometimes more code is simpler.

lib/Target/ARM/ARMTargetTransformInfo.cpp
511 ↗(On Diff #223172)

There was one added to the vectoriser tests, but not for alignment checks as far as I remember.

samparker marked an inline comment as done.Oct 7 2019, 8:49 AM
samparker added inline comments.
lib/Target/ARM/ARMISelLowering.cpp
8887 ↗(On Diff #223220)

At some point, I was extending the passthru... but it seems that is no longer the case! Our VMOVIMM is probably keeping us correct, and if I extend it in the dag combine, hopefully we won't need the bitcast handling here anymore.

samparker marked an inline comment as done.Oct 7 2019, 8:55 AM
samparker added inline comments.
lib/Target/ARM/ARMISelLowering.cpp
8887 ↗(On Diff #223220)

Ah, no. I was just being blind, passthru is extended.

samparker marked an inline comment as done.Oct 14 2019, 3:18 AM
samparker added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9283 ↗(On Diff #223172)

I had a look, and I don't see how we could add an extra flag here. isLoadExtLegal will return true if the operation has been marked as legal, but for both targets (arm, x86) the MLOAD and MSTORE operations are set as custom. So I think having both calls will be necessary, with isVectorLoadExtDesirable enabling the fine-grained control that we need.

samparker updated this revision to Diff 224835.EditedOct 14 2019, 5:32 AM
  • Rebased so we're now using MaybeAlign.
  • Removed codegen support for unaligned masked loads.
  • Added anyext PatFrags.
  • Added tests for wider than 128-bit vectors.
  • Added loop vectorize tests for unaligned accesses.
Herald added a project: Restricted Project. Oct 14 2019, 5:32 AM
Herald added a subscriber: hiraditya.

@craig.topper This patch currently causes an isel failure for pr35443.ll when a v4i8 masked load is being zero extended into a v4i64. I know nothing about AVX, so could you please advise whether this operation is supported or how to address the issue? Thanks.

craig.topper added inline comments.Oct 14 2019, 9:41 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9308

Is this function missing a one use check?

9311

Would a masked load ever not be a vector type?

9318

What if the masked load is already an extending load?

craig.topper added inline comments.Oct 14 2019, 9:47 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9324

Should we disable this if the load is an expanding load? X86 supports expanding loads, but not extending expanding loads. Or is the expectation that I should block that in X86's implementation of isVectorLoadExtDesirable?

@craig.topper This patch currently causes an isel failure for pr35443.ll when a v4i8 masked load is being zero extended into a v4i64. I know nothing about AVX, so could you please advise whether this operation is supported or how to address the issue? Thanks.

It's supported. We need to add isel patterns for all the extend flavors. Or we need to block masked loads in isVectorLoadExtDesirable on X86 for now.

samparker marked an inline comment as done.Oct 15 2019, 2:54 AM

Thanks @craig.topper. I'll add the necessary changes into the X86 backend.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9324

Good point... I think it makes sense for this part to be as generic as possible and leave it to the backends. We'll have to do the same too.

samparker updated this revision to Diff 225007.Oct 15 2019, 5:31 AM
  • Addressed comments in the dag combiner.
  • Changed x86 backend so that extending masked loads are not desirable.
  • Changed arm backend so that expanding extending masked loads are not desirable.
  • Added more tests.
craig.topper added inline comments.Oct 15 2019, 10:26 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9312 ↗(On Diff #225007)

Check that the extension type is NON_EXT explicitly. Don't rely on it being encoded as 0.

9328 ↗(On Diff #225007)

I don't think this line is needed. Returning NewLoad should take care of it. The line that replaces SDValue(Ld, 1) is needed though.

dmgreen added inline comments.Oct 15 2019, 10:35 AM
lib/Target/ARM/ARMISelLowering.cpp
14707 ↗(On Diff #225007)

MVE doesn't support expanding loads, so it would be surprising if we did see one here. Having the check is good though.

lib/Target/ARM/ARMTargetTransformInfo.cpp
499 ↗(On Diff #225007)

How rare do you think loads with no explicit alignment are? I think they don't come up from clang, but is it worth leaving them till later?

As far as I understand, if the alignment on the load was missing (value is 0), it is treated as the abi/pref alignment from the datalayout. So will have an alignment of 8 or 16, so will be aligned.

samparker marked an inline comment as done.Oct 16 2019, 2:38 AM
samparker added inline comments.
lib/Target/ARM/ARMTargetTransformInfo.cpp
499 ↗(On Diff #225007)

ok.

samparker updated this revision to Diff 225189.Oct 16 2019, 3:22 AM

Addressed comments.

dmgreen accepted this revision.Oct 16 2019, 5:49 AM

LGTM. If Craig is happy with the rest.

lib/Target/ARM/ARMTargetTransformInfo.cpp
510 ↗(On Diff #225189)

We still probably want to stop i64's and other types.

Maybe do it like "(EltWidth == 32 && (!Alignment || Alignment >= 4)) || ..."
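A sketch of the suggested shape, treating an alignment of 0 as "no explicit alignment" (the helper name below is hypothetical, not the committed code):

```cpp
#include <cassert>

// Per-element-width alignment check for masked loads/stores: 32-bit elements
// need 4-byte alignment, 16-bit elements need 2-byte alignment, and byte
// accesses are always fine. Alignment == 0 models a missing MaybeAlign,
// which is taken to mean ABI-aligned and therefore acceptable.
bool isLegalMaskedLoadAlignment(unsigned EltWidth, unsigned Alignment) {
  return (EltWidth == 32 && (!Alignment || Alignment >= 4)) ||
         (EltWidth == 16 && (!Alignment || Alignment >= 2)) ||
         EltWidth == 8;
}
```

This rejects i64 and other element widths outright, as suggested above.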

This revision is now accepted and ready to land.Oct 16 2019, 5:49 AM
This revision was automatically updated to reflect the committed changes.