Convert ComplexityLimit into a command line value.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
test/Transforms/LoopStrengthReduce/ARM/complexity.ll | ||
---|---|---|
28 ↗ | (On Diff #175256) | Could you get rid of the %[0-9]+ implicit variable names? ('opt -instnamer' does that for you) |
test/Transforms/LoopStrengthReduce/ARM/complexity.ll | ||
---|---|---|
101 ↗ | (On Diff #175256) | Can the code be reduced to just this loop? Are the outer/epilog loops necessary? |
Thanks for taking a look and for the value renamer tip! The test case has now been renamed and reduced.
Sure. Credit for the tip goes to @qcolombet who enlightened me about this :)
test/Transforms/LoopStrengthReduce/lsr-comp-time.ll | ||
---|---|---|
2 ↗ | (On Diff #175654) | Won't this effectively limit LSR to search spaces it can process w/o pruning? |
LGTM with @gilr's comments addressed.
Out-of-curiosity why do you need to play with that parameter?
Hi Quentin
I've found that increasing this limit helps the performance of unrolled loops where too many registers can be used for address computation. I'm working towards enabling greater use of post increment addressing modes, and I believe getting LSR to produce pointer phis is the first step.
test/Transforms/LoopStrengthReduce/lsr-comp-time.ll | ||
---|---|---|
2 ↗ | (On Diff #175654) | As I understand, this high limit will prevent pruning from happening in more cases which leads to the solver having to process more formulae. I would hope that this doesn't limit the transform. |
@qcolombet maybe you could suggest the areas in LSR which will enable to help produce these post inc loads? What we have with the default complexity is:
loop: %index = phi i32 load %base, index load %base, index + 1 load %base, index + 2 load %base, index + 3
Increasing the complexity replaces the i32 phi with pointers, and the entry value is base-2. What I would like to see is base-1 on entry, then:
%base = %base.orig - 1 loop: %base.iv = phi [ %base, %base.iv.next ] load %base.iv, 1 load %base.iv, 2 load %base.iv, 3 load %base1.iv, 4 base.iv.next = gep base.iv, 4
With this form, the first three loads can be base + immediate offset, while the last can perform the post increment to create %base.iv.next. What do you think?
Hi @samparker,
[...]
With this form, the first three loads can be base + immediate offset, while the last can perform the post increment to create %base.iv.next. What do you think?
Do you see the formula you are interested in the considered solutions when you increase the limit?
If yes, I would check why this one is preferred against the one you want and see how to improve the cost model.
If not, we'll have to teach LSR how to produce this formula in the generateXXX methods.
Cheers,
-Quentin
Ah, ok, I don't think they're being generated. I will have a look at GenerateConstantOffsetsImpl. Thanks!