We were previously not always favouring post-inc for the MVE loads and stores, leading to extra code before the loop to set up the pre-inc. MVE in general can benefit from post-inc (as we don't have unrolled loops), and certain instructions like the VLD2s only offer post-inc versions.
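As an illustration (not taken from the patch), this is the kind of interleaved loop where the vectorizer uses VLD2 and the pointer bump can only be folded into the load as a post-increment; the function and names here are just for the example:

```cpp
// Hypothetical example: a de-interleaving loop that vectorizes to MVE VLD2.
// With post-inc addressing, the pointer increment folds into the vld2 itself,
// so no extra pre-loop setup of an offset pointer is needed.
void sum_pairs(const int *In, int *Out, int N) {
  for (int i = 0; i < N; i++)
    Out[i] = In[2 * i] + In[2 * i + 1]; // interleaved loads -> VLD2
}
```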
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:1249
> nice one. would you mind adding a test to show we no longer unroll loops with just mve intrinsics?
Sorry for the long delay here. This wasn't making things much better in the tests I was trying (it was a bit up-and-down). I've adjusted it now to include shouldFavorPostInc for MVE subtargets, not just disable shouldFavorBackedgeIndex. That should make the cost model more correct in LSR, where the AddRec is now free because it can just be the post-inc. I also removed the old "containsVectors(L)" check, as adding something that is O(n) to the inner parts of LSR (which is already O(something large)) was probably a bad idea.
This means that it's just based on the subtarget. I was hoping I could add a Type parameter here and use that, but the only type we have in LSR is the type of the SCEV, not the type of the memory being loaded (i.e. just a pointer, not a vector). My benchmarking shows this to be an improvement (even if a bit of an unreliable one), more so with D71194. The argument is that most simple loops are vectorized, not unrolled, so favouring post-inc is a slightly better alternative in general when we have MVE.
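For reference, a minimal sketch of what the subtarget-based hooks described above could look like in ARMTargetTransformInfo.cpp; the exact conditions are assumptions for illustration, not the committed code:

```cpp
// Sketch only (assumed logic): favour post-increment addressing when MVE is
// present, and stop favouring backedge indexing there, so LSR's cost model
// treats the post-inc AddRec as free.
bool ARMTTIImpl::shouldFavorPostInc() const {
  return ST->hasMVEIntegerOps();
}

bool ARMTTIImpl::shouldFavorBackedgeIndex(const Loop *L) const {
  if (ST->hasMVEIntegerOps())
    return false;
  // Assumed prior behaviour: prefer backedge indexing for single-block
  // Thumb2 loops when not optimising for size.
  return ST->isThumb2() && !L->getHeader()->getParent()->hasOptSize() &&
         L->getNumBlocks() == 1;
}
```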
nit: "modify unroll"? Perhaps clarify this a bit.