This is an archive of the discontinued LLVM Phabricator instance.

[ARM] MVE interleaving load and stores.
ClosedPublic

Authored by dmgreen on Oct 24 2019, 9:30 AM.

Details

Summary

Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases.

We may need some other future adjustments, such as VLD4 take up half the available registers so should maybe cost more. This patch should get the basics in though.

Diff Detail

Event Timeline

dmgreen created this revision.Oct 24 2019, 9:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2019, 9:30 AM
samparker added inline comments.Oct 28 2019, 4:26 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
16984

What is Factor here? Is it not the number of elements..?

17040

debug

17212

How about performing this with a loop?

dmgreen marked 3 inline comments as done.Oct 30 2019, 11:37 AM
dmgreen added inline comments.
llvm/lib/Target/ARM/ARMISelLowering.cpp
16984

Factor is the "n" in "vldn". So 2 or 4 for MVE, but also 3 for Neon. It comes from the elements of the shuffle, like the example in the comment at the start of this function.

We really just need it to say that a VLD3 isn't legal on MVE.

17040

Ah!

17212

Yeah. Nice suggestion. It involves a pop because of the last item changing, but looks better.

dmgreen updated this revision to Diff 227148.Oct 30 2019, 11:38 AM

Thanks. I've updated with the suggestions.

Also:

  • Added a load of extra tests.
  • Updated the cost model a little, notably for smaller than legal types that look like interleaves but actually use a single load with a vrev or a vmovn or something like it.
  • Err, formatted =)
samparker added inline comments.Nov 4 2019, 6:50 AM
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
817

if this is handling the smaller vector types, shouldn't this be <= 64 instead? Is the 2 below 'Factor'? If so it would be good to use the variable name or a clear explanation of what it is.

dmgreen marked an inline comment as done.Nov 5 2019, 6:57 AM
dmgreen added inline comments.
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
817

Oh, yeah, it's SrcVecTy, so yes <= 64 will be fine (128 will already be handled in the above if).

The 2 below is VLDR, VREV or VLDR; VMOVN. 2 Instructions though. I'll add a comment.

dmgreen updated this revision to Diff 227887.Nov 5 2019, 8:29 AM
dmgreen marked an inline comment as done.

Updated "vmovn" cost check.

samparker accepted this revision.Nov 6 2019, 2:03 AM

Nice one. LGTM

This revision is now accepted and ready to land.Nov 6 2019, 2:03 AM
This revision was automatically updated to reflect the committed changes.