This is an archive of the discontinued LLVM Phabricator instance.

[ARM] MVE interleaving load and stores.
ClosedPublic

Authored by dmgreen on Oct 24 2019, 9:30 AM.

Details

Summary

Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases.

We may need some other future adjustments, such as VLD4 take up half the available registers so should maybe cost more. This patch should get the basics in though.

Diff Detail

Event Timeline

dmgreen created this revision.Oct 24 2019, 9:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2019, 9:30 AM
samparker added inline comments.Oct 28 2019, 4:26 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
16751

What is Factor here? Is it not the number of elements..?

16807

debug

16979

How about performing this with a loop?

dmgreen marked 3 inline comments as done.Oct 30 2019, 11:37 AM
dmgreen added inline comments.
llvm/lib/Target/ARM/ARMISelLowering.cpp
16751

Factor is the "n" in "vldn". So 2 or 4 for MVE, but also 3 for Neon. It comes from the elements of the shuffle, like the example in the comment at the start of this function.

We really just need it to say that a VLD3 isn't legal on MVE.

16807

Ah!

16979

Yeah. Nice suggestion. It involves a pop because of the last item changing, but looks better.

dmgreen updated this revision to Diff 227148.Oct 30 2019, 11:38 AM

Thanks. I've updated with the suggestions.

Also:

  • Added a load of extra tests.
  • Updated the cost model a little, notably for smaller than legal types that look like interleaves but actually use a single load with a vrev or a vmovn or something like it.
  • Err, formatted =)
samparker added inline comments.Nov 4 2019, 6:50 AM
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
786

if this is handling the smaller vector types, shouldn't this be <= 64 instead? Is the 2 below 'Factor'? If so it would be good to use the variable name or a clear explanation of what it is.

dmgreen marked an inline comment as done.Nov 5 2019, 6:57 AM
dmgreen added inline comments.
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
786

Oh, yeah, it's SrcVecTy, so yes <= 64 will be fine (128 will already be handled in the above if).

The 2 below is VLDR, VREV or VLDR; VMOVN. 2 Instructions though. I'll add a comment.

dmgreen updated this revision to Diff 227887.Nov 5 2019, 8:29 AM
dmgreen marked an inline comment as done.

Updated "vmovn" cost check.

samparker accepted this revision.Nov 6 2019, 2:03 AM

Nice one. LGTM

This revision is now accepted and ready to land.Nov 6 2019, 2:03 AM
This revision was automatically updated to reflect the committed changes.