This change adds a pass that tries to split a call site so that it passes
more constrained arguments when an argument is predicated in the control flow.
This exposes better context to later passes (e.g., the inliner, jump
threading, or IPA-CP based function cloning).
As of now we support two cases:
- If a call site is dominated by an OR condition and any of its arguments
is predicated on this OR condition, try to split the call site so that each
copy passes more constrained arguments. For example, in the code below, we try
to split the call site since we can predicate the argument (ptr) based on the
OR condition.
  Split from:
        if (!ptr || c)
          callee(ptr);
  to:
        if (!ptr)
          callee(null ptr)     // set the known constant value
        else if (c)
          callee(nonnull ptr)  // set non-null attribute in the argument
- We can also split a call site based on constant incoming values of a PHI.
For example,
  from:
        BB0:
          %c = icmp eq i32 %i1, %i2
          br i1 %c, label %BB2, label %BB1
        BB1:
          br label %BB2
        BB2:
          %p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
          call void @bar(i32 %p)
  to:
        BB0:
          %c = icmp eq i32 %i1, %i2
          br i1 %c, label %BB2-split0, label %BB1
        BB1:
          br label %BB2-split1
        BB2-split0:
          call void @bar(i32 0)
          br label %BB2
        BB2-split1:
          call void @bar(i32 1)
          br label %BB2
        BB2:
          %p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
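As a rough source-level illustration of the two cases above, here is a minimal C++ sketch. It does not use any LLVM API and is not the pass itself. The names `call_log`, `or_before`, `or_after`, `phi_before`, and `phi_after` are hypothetical, and the extra string parameter on `callee` exists only to record what each call site knows about its argument.

```cpp
#include <string>
#include <vector>

// Hypothetical log: each entry records what a call site knew about its argument.
static std::vector<std::string> call_log;

// Stand-ins for the callees in the two examples (assumed, not from the patch).
static void callee(const int *ptr, const char *known) {
    (void)ptr;
    call_log.push_back(known);
}
static void bar(int p) { call_log.push_back("bar(" + std::to_string(p) + ")"); }

// Case 1, before splitting: one call site, nothing is known about ptr.
static void or_before(const int *ptr, bool c) {
    if (!ptr || c)
        callee(ptr, "unknown");
}

// Case 1, after splitting: each call site carries a stronger fact about ptr.
static void or_after(const int *ptr, bool c) {
    if (!ptr)
        callee(nullptr, "null");     // ptr is the known constant null
    else if (c)
        callee(ptr, "nonnull");      // ptr is known to be non-null here
}

// Case 2, before splitting: p merges two constants, like the PHI in BB2.
static void phi_before(int i1, int i2) {
    int p = (i1 == i2) ? 0 : 1;      // 0 arrives from BB0, 1 arrives from BB1
    bar(p);
}

// Case 2, after splitting: each predecessor calls bar with its own constant.
static void phi_after(int i1, int i2) {
    if (i1 == i2)
        bar(0);                      // corresponds to BB2-split0
    else
        bar(1);                      // corresponds to BB2-split1
}
```

In both cases the split version makes the same calls as the original for every input; the only difference is that each call site now sees a constant or a non-null argument, which is the extra context later passes could exploit.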
This is enabled for O3 and LTO. I didn't see any significant code size increase in my spec2000/2006/2017 tests on aarch64. I observed a 20% performance gain in spec2017/gcc, with no regressions in other benchmarks.
I've added only two simple tests to demonstrate this pass, but I will add more
tests covering what this patch is doing if the overall approach is reasonable.
Was this break supposed to be in the if above? As-is, it unconditionally breaks after the first iteration of the loop.
But even if it was a line further up, we'd only check the first Phi in the parent and check if it's an arg. I don't know if that's enough.
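To make the first concern concrete, here is a hypothetical C++ sketch; the patch code under review is not shown here, so this is only one possible reading of its shape. PHIs and call arguments are modelled as plain integers, and `first_phi_only` and `any_phi_is_arg` are made-up names, not functions from the patch.

```cpp
#include <algorithm>
#include <vector>

// Assumed shape of the problem: the break is not guarded by the if,
// so the loop body runs at most once and only the first PHI is examined.
static bool first_phi_only(const std::vector<int> &phis,
                           const std::vector<int> &args) {
    bool found = false;
    for (int phi : phis) {
        if (std::find(args.begin(), args.end(), phi) != args.end())
            found = true;
        break;  // unconditional: exits after the first iteration
    }
    return found;
}

// One possible alternative: keep scanning until some PHI is a call argument.
static bool any_phi_is_arg(const std::vector<int> &phis,
                           const std::vector<int> &args) {
    for (int phi : phis)
        if (std::find(args.begin(), args.end(), phi) != args.end())
            return true;
    return false;
}
```

With PHIs {1, 2} and call arguments {2}, the first function misses the match because it never reaches the second PHI, while the second function finds it.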