Page MenuHomePhabricator

[LSR] Generate formulae to enable more indexed accesses
Needs ReviewPublic

Authored by samparker on Dec 6 2018, 8:08 AM.

Details

Summary

Modify GenerateConstantOffsetsImpl to create offsets that can be used by indexed addressing modes. If formulae can be generated which result in the constant offset being the same size as the recurrence, we can generate an indexed access.

The resulting code, at least for Arm, is that usually pre-indexed loads are used as the last access, but sometimes the first.

@kparzysz Would you be able to provide feedback on how this effects Hexagon? It's a target that I don't build and I haven't looked at the tests, but I'm assuming this would interest you.

Diff Detail

Event Timeline

samparker created this revision.Dec 6 2018, 8:08 AM
gilr added inline comments.Dec 9 2018, 8:49 AM
test/CodeGen/ARM/loop-align-cortex-m.ll
3

You're just fixing this by the way, right?

test/Transforms/LoopStrengthReduce/ARM/complexity.ll
10 ↗(On Diff #176977)

Shouldn't -DEFAULT be removed here too?

samparker marked 2 inline comments as done.Dec 11 2018, 12:51 AM
samparker added inline comments.
test/CodeGen/ARM/loop-align-cortex-m.ll
3

yes

test/Transforms/LoopStrengthReduce/ARM/complexity.ll
10 ↗(On Diff #176977)

thanks!

I've run some tests and the results are not great for us. On some tests we got up to 5.5% improvements, but there are a lot of severe degradations (15+% worse). If this patch goes in, we'd like to be able to opt out.

Okay, thanks. We're also seeing some regressions, so I know I've got some tuning to do. Do you have any idea of the characteristics of your regressions? At the moment I'm thinking:

  • That the costs that I've added here are overly simplistic, for one I think I need to add a setup cost.
  • It's also probably not worth doing when we know that the loop iteration count is low.
  • In the current state, we also see code size regressions whereas your previous work helps us reduce code size. It may mean that I'll need a different flag to enable this change, but it also maybe a symptom of the performance regressions.
gilr added inline comments.Dec 12 2018, 12:13 AM
lib/Transforms/Scalar/LoopStrengthReduce.cpp
3793

Is this relevant for non-Address kind formulae?

3793

Worth a comment here to describe the motivation and how adjusting the offset generates the post-inc opportunity.

3795

How does a non-constant step (e.g. 50 + %x) translate to post-inc? Could you add such a test case?

samparker marked 2 inline comments as done.Dec 12 2018, 1:09 AM
samparker added inline comments.
lib/Transforms/Scalar/LoopStrengthReduce.cpp
3793

No. I had naively assumed that this function was only generating formulae for addresses... so if that's not the case, I'll make the change here.

3795

Ah, good catch. This should only trigger for constant steps.

samparker updated this revision to Diff 177872.Dec 12 2018, 9:53 AM

I've moved the logic under the control of a new TTI flag as it seems that the current shouldFavourPostInc is trying to achieve different things. Hopefully I've also addressed Gil's comments.

gilr added inline comments.Dec 16 2018, 5:56 AM
lib/Transforms/Scalar/LoopStrengthReduce.cpp
1258
  • The existing {LI, +, C} pattern seems to already match this case (i.e. {(-C + %a + ...), +, C}) and the more general case {(Offset - C2) + %a + ...), +, C2}, where Offset =/= 0. So IIUC matching this pattern here is only needed if the new TTI flag is set but the existing one is reset, right?
  • Won't this also match something like {3, +, 5, +, %x}?
1384

IIRC single-statement clauses shouldn't get curly braces.

3802

The new TTI API relates to the same HW feature, right? Why not use a cl::opt in LSR to turn this optimization off?

samparker marked an inline comment as done.Dec 19 2018, 1:27 AM
samparker added inline comments.
lib/Transforms/Scalar/LoopStrengthReduce.cpp
1258

I'm going to need to replace this with something that compares the step with the base offset, aiming to have the post increment happen on the last, not the first, access. I need to make a few other changes to support this though as well.

samparker marked an inline comment as done.Dec 19 2018, 4:49 AM
samparker added inline comments.
lib/Transforms/Scalar/LoopStrengthReduce.cpp
3802

When I've got the next patch ready, I will re-run my tests and see if this is viable. It may not be good for all targets, and if so, I think a target hook would be more useful.

samparker updated this revision to Diff 181251.EditedFri, Jan 11, 5:02 AM
  • Added a command line option to 'EnableBackedgePostIncs'.
  • Added a command line option to enable narrowing the search space by collapsing unrolled code.
  • The TTI hook now accepts the loop so that the target can make a more informed decision on when it 'shouldFavorBackedgePostIncs'.
  • LSRInstance contains a boolean 'FavorBackedgePostInc' which is equal to EnableBackedgePostIncs && shouldFavorBackedgePostIncs.
  • When FavorBackedgePostIncs:
    • Generate the new constant offsets.
    • In RateRegister, the LoopCost is now set to 0 if the step recurrence is equal to the base offset of the parent formula.
    • IsProfitableChain has a higher limit
    • The last expression in the IVChain is not added to IVIncSet so it's a target for optimising.
samparker retitled this revision from [LSR] Generate formulae to enable more post-incs to [LSR] Generate formulae to enable more indexed accesses.Fri, Jan 11, 7:22 AM
samparker edited the summary of this revision. (Show Details)
samparker updated this revision to Diff 181270.Fri, Jan 11, 7:42 AM

some renaming, renamed post inc bits to 'index'.

samparker updated this revision to Diff 181297.Fri, Jan 11, 9:22 AM

Changed the default value for the command line option 'EnableBackedgeIndexing' to false.

samparker updated this revision to Diff 181776.Tue, Jan 15, 6:44 AM

Removed changes to the complexity.ll test.