Page MenuHomePhabricator

Add new optimization pass of Tree-Height-Reduction
Needs RevisionPublic

Authored by masakazu.ueno on Sep 9 2019, 6:49 PM.

Details

Summary

The tree-height-reduction optimization increases the instruction-level parallelism by changing the order of calculations in a loop to keep the
calculation tree as short as possible.
For example, the following code is improved.
for (int i = 0; i < N; ++i){ a[i] = b[i] + c[i] + d[i] + e[i] + f[i] + g[i] + h[i] + k[i] ; }

  • Conditions for optimization
    • Integer arithmetic: -O1 or higher is effective
    • Floating point arithmetic -O1 or higher is effective, AND -ffast-math is effective, AND -mllvm -enable-fp-thr is effective

Diff Detail

Event Timeline

masakazu.ueno created this revision.Sep 9 2019, 6:49 PM

@masakazu.ueno not directly related to this patch, but i'm just wondering if this is something that could help with the similar pattern/question in https://bugs.llvm.org/show_bug.cgi?id=43146#c8

mgrang added inline comments.Sep 10 2019, 3:10 PM
lib/Transforms/Scalar/TreeHeightReduction.cpp
654

Please use the range-based llvm::sort function here. See https://llvm.org/docs/CodingStandards.html#beware-of-non-deterministic-sorting-order-of-equal-elements.

llvm::sort(Leaves);
hfinkel added inline comments.
lib/Transforms/Scalar/TreeHeightReduction.cpp
379

This is the only call to TTI that I see, and I don't see it don't see it documented anywhere what objective function this transformation is effectively using. Can you please add some explanation of that? Also, I'd expect to see some check on the instruction throughput -- because it might not make sense to introduce more ILP than the target can use -- and I'd expect to see some register-pressure estimate.

lkail added a subscriber: lkail.Sep 10 2019, 6:42 PM
xbolva00 added inline comments.Sep 11 2019, 3:47 AM
test/Transforms/TreeHeightReduction/AArch64/fp16-mult.c
1

This should be fp16-mult.ll, not fp16-mult.c

Please adjust tests to test only this new pass, not the -O1 pipeline.

troyj added a subscriber: troyj.Nov 18 2019, 1:47 PM
fsaintjacques requested changes to this revision.Dec 30 2019, 12:08 PM

I would appreciate if bitwise operators were supported, see my comments and example (could it be added in the unit test?)

lib/Transforms/Scalar/TreeHeightReduction.cpp
506

This should also work with bitwise instructions (AND, OR, XOR). I'm writing a small query language for bitmaps and was wondering if such re-balancing optimization existed. See example for IR which would benefit from this.

This revision now requires changes to proceed.Dec 30 2019, 12:08 PM
fsaintjacques added inline comments.Jan 9 2020, 10:07 AM
lib/Transforms/Scalar/TreeHeightReduction.cpp
73

The not is misleading as this is disabled by default.

379

@hfinkel I'd like to see this optimization in the next release. I don't have the time and knowledge to make the changes you request. Is it an acceptable solution if I disable this optimization in all levels, but make it accessible to front-end languages?

The cost estimation would be a followup task.

test/Transforms/TreeHeightReduction/AArch64/fp16-mult.c
1

I assume it's the same for all other files.