Page MenuHomePhabricator

Add new optimization pass of Tree-Height-Reduction
Needs ReviewPublic

Authored by masakazu.ueno on Mon, Sep 9, 6:49 PM.



The tree-height-reduction optimization increases the instruction-level parallelism by changing the order of calculations in a loop to keep the
calculation tree as short as possible.
For example, the following code is improved.
for (int i = 0; i < N; ++i){ a[i] = b[i] + c[i] + d[i] + e[i] + f[i] + g[i] + h[i] + k[i] ; }

  • Conditions for optimization
    • Integer arithmetic: -O1 or higher is effective
    • Floating point arithmetic -O1 or higher is effective, AND -ffast-math is effective, AND -mllvm -enable-fp-thr is effective

Diff Detail

Event Timeline

masakazu.ueno created this revision.Mon, Sep 9, 6:49 PM

@masakazu.ueno not directly related to this patch, but i'm just wondering if this is something that could help with the similar pattern/question in

mgrang added inline comments.Tue, Sep 10, 3:10 PM

Please use the range-based llvm::sort function here. See

hfinkel added inline comments.

This is the only call to TTI that I see, and I don't see it don't see it documented anywhere what objective function this transformation is effectively using. Can you please add some explanation of that? Also, I'd expect to see some check on the instruction throughput -- because it might not make sense to introduce more ILP than the target can use -- and I'd expect to see some register-pressure estimate.

lkail added a subscriber: lkail.Tue, Sep 10, 6:42 PM
xbolva00 added inline comments.Wed, Sep 11, 3:47 AM

This should be fp16-mult.ll, not fp16-mult.c

Please adjust tests to test only this new pass, not the -O1 pipeline.