This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Favour post inc for MVE loops
ClosedPublic

Authored by dmgreen on Nov 27 2019, 9:12 AM.

Details

Summary

We were previously not necessarily favouring postinc for the MVE loads and stores, leading to extra code prior to the loop to set up the preinc. MVE in general can benefit from postinc (as we don't have unrolled loops), and certain instructions like the VLD2's only offset post-inc versions.

Diff Detail

Event Timeline

dmgreen created this revision.Nov 27 2019, 9:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 27 2019, 9:12 AM
dmgreen updated this revision to Diff 231284.Nov 27 2019, 9:28 AM

Update code to what I was intending.

SjoerdMeijer accepted this revision.Nov 29 2019, 1:33 AM

LGTM

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
67

nit: "modify unroll"? Perhaps clarify this a bit.

80

Is there a test for optsize?

This revision is now accepted and ready to land.Nov 29 2019, 1:33 AM
samparker added inline comments.Nov 29 2019, 1:43 AM
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1253

nice one. would you mind adding a test to show we no longer unroll loops with just mve intrinsics?

dmgreen updated this revision to Diff 237627.Jan 13 2020, 5:36 AM
dmgreen marked an inline comment as done.

Sorry for the long delay here. This wasn't making things much better in the tests I was trying (it was a bit up-and-down). I've adjusted it now to include shouldFavorPostInc for MVE subtargets, not just disable shouldFavorBackedgeIndex. That should get the costmodel more correct in LSR, where the AddRec is now free because it can just be the postinc. I also removed the old "containsVectors(L)" check as adding something that is O(n) to the inner parts of LSR, something that is already O(something large), was probably a bad idea.

This means that it's just based on subtarget. I was hoping I could add Type here and use that, but the only type we have in LSR is the type of the SCEV, not the type of the memory being loaded (i.e just a pointer, not a vector). My benchmarking shows this to be an improvement (even if a bit of an unreliable one). More so with D71194. The argument is that most simple loops are vectorized, not unrolled, so favouring post-inc is a slightly better alternative in general when we have MVE.

This revision was automatically updated to reflect the committed changes.