This is an archive of the discontinued LLVM Phabricator instance.

[LoopStrengthReduce] ComplexityLimit as an option
ClosedPublic

Authored by samparker on Nov 26 2018, 7:39 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

samparker created this revision.Nov 26 2018, 7:39 AM
gilr added inline comments.Nov 27 2018, 5:37 AM
test/Transforms/LoopStrengthReduce/ARM/complexity.ll
28 ↗(On Diff #175256)

Could you get rid of the %[0-9]+ implicit variable names? ('opt -instnamer' does that for you)
Those are painful when manually editing tests.

gilr added inline comments.Nov 27 2018, 8:26 AM
test/Transforms/LoopStrengthReduce/ARM/complexity.ll
101 ↗(On Diff #175256)

Can the code be reduced to just this loop? Are the outer/epilog loops necessary?

samparker updated this revision to Diff 175654.Nov 28 2018, 3:37 AM

Thanks for taking a look and for the value renamer tip! The test case has now been renamed and reduced.

gilr added a subscriber: qcolombet.Nov 28 2018, 2:21 PM

Thanks for taking a look and for the value renamer tip! The test case has now been renamed and reduced.

Sure. Credit for the tip goes to @qcolombet who enlightened me about this :)

test/Transforms/LoopStrengthReduce/lsr-comp-time.ll
2 ↗(On Diff #175654)

Won't this effectively limit LSR to search spaces it can process w/o pruning?

qcolombet accepted this revision.Nov 28 2018, 3:02 PM

LGTM with @gilr's comments addressed.

Out-of-curiosity why do you need to play with that parameter?

This revision is now accepted and ready to land.Nov 28 2018, 3:02 PM
samparker marked an inline comment as done.Nov 28 2018, 11:52 PM

Hi Quentin

I've found that increasing this limit helps the performance of unrolled loops where too many registers can be used for address computation. I'm working towards enabling greater use of post increment addressing modes, and I believe getting LSR to produce pointer phis is the first step.

test/Transforms/LoopStrengthReduce/lsr-comp-time.ll
2 ↗(On Diff #175654)

As I understand, this high limit will prevent pruning from happening in more cases which leads to the solver having to process more formulae. I would hope that this doesn't limit the transform.

This revision was automatically updated to reflect the committed changes.
samparker added a comment.EditedNov 29 2018, 3:52 AM

@qcolombet maybe you could suggest the areas in LSR which will enable to help produce these post inc loads? What we have with the default complexity is:

loop:
%index = phi i32
load %base, index
load %base, index + 1
load %base, index + 2
load %base, index + 3

Increasing the complexity replaces the i32 phi with pointers, and the entry value is base-2. What I would like to see is base-1 on entry, then:

%base = %base.orig - 1
loop:
%base.iv = phi [ %base, %base.iv.next ]
load %base.iv, 1
load %base.iv, 2
load %base.iv, 3
load %base1.iv, 4
base.iv.next = gep base.iv, 4

With this form, the first three loads can be base + immediate offset, while the last can perform the post increment to create %base.iv.next. What do you think?

Hi @samparker,

[...]
With this form, the first three loads can be base + immediate offset, while the last can perform the post increment to create %base.iv.next. What do you think?

Do you see the formula you are interested in the considered solutions when you increase the limit?
If yes, I would check why this one is preferred against the one you want and see how to improve the cost model.
If not, we'll have to teach LSR how to produce this formula in the generateXXX methods.

Cheers,
-Quentin

Ah, ok, I don't think they're being generated. I will have a look at GenerateConstantOffsetsImpl. Thanks!