This is an archive of the discontinued LLVM Phabricator instance.

[Inline][WIP] Expose more inlining opportunities by further constraining call site arguments based on splitting an OR condition.
Abandoned (Public)

Authored by junbuml on Oct 6 2017, 12:52 PM.

Details

Summary

If a call site is dominated by an OR condition and if any of its arguments are predicated on this OR condition, see if splitting the condition (and thereby further constraining the arguments) increases our opportunities to inline the call.

For example, in the code below, if callee() is not inlinable, we try to split the call site, since the argument (ptr) can be constrained based on the OR condition. We then inline if any of the new call sites is inlinable.

Split the OR condition from:

char *ptr = foo();
bool cond = bar();
if (!ptr || cond)
  callee(ptr);

to:

if (!ptr)
  callee(NULL);  // pass NULL because ptr is known to be null on this path
else if (cond)
  callee(ptr);   // ptr is known to be non-null here, so set the nonnull attribute on the argument

Perform the split only if the inline cost of at least one of the new call sites (the one passing null, or the one passing the nonnull ptr) is below the inline threshold.

This is a WIP and needs more test cases, but I'm submitting it to get early, high-level feedback on the approach.

I observed a 20% performance improvement on spec2017/gcc, with no regressions in my spec2000/2006/2017 tests on AArch64.


Event Timeline

junbuml created this revision. (Oct 6 2017, 12:52 PM)
mcrosier retitled this revision from "[Inline][WIP] Try to inline if predicated on OR condition" to "[Inline][WIP] Expose more inlining opportunities by further constraining call site arguments based on an splitting an OR condition." (Oct 6 2017, 2:48 PM)
mcrosier edited the summary of this revision.
mcrosier retitled this revision from "[Inline][WIP] Expose more inlining opportunities by further constraining call site arguments based on an splitting an OR condition." to "[Inline][WIP] Expose more inlining opportunities by further constraining call site arguments based on splitting an OR condition." (Oct 6 2017, 2:50 PM)
junbuml updated this revision to Diff 118704. (Oct 11 2017, 2:50 PM)

Simplified the code a little and updated the comments. Please let me know if you have any comments.

This is good stuff, but I don't feel the inliner is the right place for such a transformation.

Narrowly speaking, it is a call-site splitting transformation, but in general it could be extended to handle more general block cloning/splitting to enable more constant/predicate propagation (similar to jump threading). IPA-CP-based function cloning could also benefit from this.

Yes, this change itself is call-site splitting, whose profitability is tightly tied to the inline cost. That's why I placed it in the inliner. Making this more general for blocks might be good, but I'm not sure about the profitability check in general or the range of blocks we would have to cover. When isolating this to call-site splitting alone, I wasn't able to find any good place other than the inliner. I would be happy to hear any suggestions.

If we limit this to call-site splitting only, do you still think the inliner is not a good place for it?

I see a lot of potential to make this more general. As I mentioned, this is similar to constant-propagation-based function cloning: exposing specialization opportunities does not seem limited to the inliner, though inlining could be the biggest customer.

Consider this:

define void @foo(i32) local_unnamed_addr {
  %2 = icmp eq i32 %0, 10
  %3 = select i1 %2, i32 1, i32 2
  tail call void @bar(i32 %3)
  ret void
}

declare void @bar(i32)

Converting the select into control flow and exposing the constant-propagation opportunity should be done in the same pass.

Consider another example:

define void @foo(i32) local_unnamed_addr {
  %2 = icmp eq i32 %0, 10
  br i1 %2, label %3, label %4

; <label>:3:                                      ; preds = %1
  tail call void @bar(i32 1)
  br label %4

; <label>:4:                                      ; preds = %1, %3
  %5 = phi i32 [ 1, %3 ], [ 2, %1 ]
  tail call void @bar(i32 %5)
  ret void
}

declare void @bar(i32)

Hoisting the call to @bar into the incoming blocks of the phi can also expose such opportunities.

Note that the SimplifyCFG pass in LLVM currently sinks common code into the merge point aggressively, which may lead to missed opportunities here. Chandler has a patch that undoes this to reduce the damage done by the sinking, but that pass runs fairly late in the pipeline and won't help for inlining/cloning purposes.

junbuml abandoned this revision. (Oct 20 2017, 1:34 PM)

Submitted https://reviews.llvm.org/D39137 to add a new pass for call-site splitting.