The tree-height-reduction optimization increases instruction-level parallelism by reordering the calculations in a loop so that the calculation tree stays as short as possible.

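For example, the following loop benefits from this optimization, because its right-hand side is a chain of seven additions that would otherwise be evaluated one after another.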

for (int i = 0; i < N; ++i) {
  a[i] = b[i] + c[i] + d[i] + e[i] + f[i] + g[i] + h[i] + k[i];
}
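The effect on the loop above can be pictured with a hand-written sketch. This is only an illustration of the regrouping, not the code the compiler actually emits. The function names `sum_chain` and `sum_tree` are hypothetical, and integer elements are assumed so that both groupings give identical results.

```c
#include <stddef.h>

/* Evaluation order as written in the source: every addition waits for the
   previous one, so the dependency chain is 7 additions deep. */
void sum_chain(int *a, const int *b, const int *c, const int *d,
               const int *e, const int *f, const int *g,
               const int *h, const int *k, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = ((((((b[i] + c[i]) + d[i]) + e[i]) + f[i]) + g[i]) + h[i]) + k[i];
    }
}

/* Balanced regrouping: the four innermost additions are independent of each
   other, then two additions combine them, then one final addition.
   The dependency chain is only 3 additions deep. */
void sum_tree(int *a, const int *b, const int *c, const int *d,
              const int *e, const int *f, const int *g,
              const int *h, const int *k, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = ((b[i] + c[i]) + (d[i] + e[i])) + ((f[i] + g[i]) + (h[i] + k[i]));
    }
}
```

Both functions perform seven additions per element. The difference is that the balanced form lets a processor with several arithmetic units execute the independent additions at the same time.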

- Conditions for optimization
  - Integer arithmetic: -O1 or higher is in effect.
  - Floating-point arithmetic: -O1 or higher, -ffast-math, and -mllvm -enable-fp-thr are all in effect.
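The stricter conditions for floating-point arithmetic are consistent with the fact that floating-point addition is not associative: regrouping the operands can change how intermediate results are rounded, and -ffast-math is the option that permits such reassociation. The following minimal sketch shows the effect on three values. The helper names `add_left_to_right` and `add_regrouped` are hypothetical, and the code is assumed to be compiled without -ffast-math so that the written grouping is preserved.

```c
/* Grouping as a left-to-right chain: (x + y) + z. */
double add_left_to_right(double x, double y, double z)
{
    return (x + y) + z;
}

/* Regrouped form of the same sum: x + (y + z). */
double add_regrouped(double x, double y, double z)
{
    return x + (y + z);
}
```

With x = 1e20, y = -1e20, and z = 1.0, the first function returns 1.0 while the second returns 0.0, because 1.0 is lost when it is added to -1e20 first. Enabling the optimization for floating-point code therefore means accepting results that may differ slightly from strict left-to-right evaluation.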