This is an archive of the discontinued LLVM Phabricator instance.

[JumpThreading] Split select that has constant conditions coming from the PHI node
ClosedPublic

Authored by haicheng on Dec 14 2015, 1:02 PM.

Details

Summary

Look for PHI/Select in the same BB of the form
bb:

%p = phi [false, %bb1], [true, %bb2 ], [false, %bb3], [true, %bb4], ...
%s = select p, trueval, falseval

And expand the select into a branch structure and this later enables threading over bb.

The motivation example is as follows

class H
{
public:

int a;              
int b;  
int c;

};

inline int operator < (H& A, H& B)
{

return (A.a < B.a) ? 1 :
       (A.a > B.a) ? 0 :
       (A.b < B.b) ? 1 :
       (A.b > B.b) ? 0 :
        A.c < B.c;

}

int s(H *h, int j)
{

if (h[j] < h[j+1])  
    j+=2;

return j;

}

The generated assembly in AArch64 is like below. X86 assembly has similar problem.

If(h0.a < h1.a) goto bb0
if (h0.a > h1.a) goto bb1
If(h0.b < h1.b) goto bb2
if (h0.b > h1.b) goto bb3
flag = (h0.c < h1.c)
goto if_end
bb0: Flag = 1; Goto if_end
bb1: Flag = 0; Goto if_end
bb2: Flag = 1; Goto if_end
bb3: Flag = 0; Goto if_end
if_end: j = select flag, j+2, j

The trivial basic blocks bb0 – bb3 can cause performance penalty if the comparison (h[j] < h[j+1]) is in a hot loop. The IR code of if_end is like this

%cond = phi i1 [ false, … ], [ true, … ], [ false, … ], [ true, … ], [%flag, …]
%add = add i32 %j, 2
%j.add = select i1 %cond.i, i32 %j, i32 %add

Every bool constant in the phi becomes a trivial basic block in the assembly. select is generated by SpeculativelyExecuteBB() of simplifyCFG before function inlining and prevents jump-threading to further optimize the CFG.

Diff Detail

Repository
rL LLVM

Event Timeline

haicheng updated this revision to Diff 42756.Dec 14 2015, 1:02 PM
haicheng retitled this revision from to [JumpThreading] Split select that has constant conditions coming from the PHI node.
haicheng updated this object.
haicheng added reviewers: gberry, mssimpso, mcrosier.
haicheng set the repository for this revision to rL LLVM.
gberry edited edge metadata.Dec 14 2015, 4:10 PM

I'm still looking over this but I had one question so far: would it make sense to unfold the select when the phi has any constant operands instead of all but one being constant?

haicheng added a comment.EditedDec 15 2015, 7:22 AM

I'm still looking over this but I had one question so far: would it make sense to unfold the select when the phi has any constant operands instead of all but one being constant?

My current logic is all but at most one being constant. It is the same as my motivation case, maybe it is too conservative. But I am not sure which is more efficient on different targets, select or branch structure, when the phi has more than one nonconstant.

I think it would make sense to do this unfolding if there is even one constant phi argument. This seems very similar to the code in SimplifyCFG.cpp:FoldCondBranchOnPHI, and that is the approach taken there.

One other possible approach to fixing this: the code that forms the select to begin with (SimplifyCFG.cpp:SpeculativelyExecuteBB), is trying to be very conservative with the amount of selects it generates. Perhaps we could make it more conservative by not doing the select conversion if the select condition is fed by a (non-intrinsic?) call? That seems to me to be in the spirit of what it is trying to achieve since presumably there won't be any instcombine-like optimizations exposed by doing the select conversion in this case. It would probably also be a simpler change.

haicheng updated this revision to Diff 42904.Dec 15 2015, 1:53 PM
haicheng edited edge metadata.
haicheng removed rL LLVM as the repository for this revision.

Now I split the select if the associated phi has at least one constant.

I am looking at SpeculativelyExecuteBB now, maybe we can do both.

haicheng set the repository for this revision to rL LLVM.Dec 15 2015, 1:55 PM

I am looking at SpeculativelyExecuteBB now, maybe we can do both.

I prevent SpeculativelyExecuteBB generating select if the select condition is call. Later, SimplifyCFG::FoldTwoEntryPHINode() generates the same select. I do same thing to SimplifyCFG::FoldTwoEntryPHINode() to not generate select in this case. But, after function inlining, SpeculativelyExecuteBB still can generate the same select because the cfg of the inlined function is not fully integrated yet.

I think it might be more general and simpler to make the changes in jumpthreading.

Hi Haicheng,

Your results from experimenting with preventing the select conversion in the first place sound reasonable to me. I've added a few comments below.

lib/Transforms/Scalar/JumpThreading.cpp
1894

Maybe add to this comment to indicate that you're just looking for at least one constant.

1953

I'm not sure I understand why this check is here. Are you actually duplicating any code in this transformation?

haicheng updated this revision to Diff 43031.Dec 16 2015, 10:35 AM

I updated the comments.

haicheng marked an inline comment as done.Dec 16 2015, 10:46 AM
haicheng added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
1953

Jumpthreading duplicates codes when it threads edges so that it checks the duplication cost before the transformation. The duplicated codes are usually optimized away afterwards.

I put the check here because I don't want to split select in a basic block that cannot be threaded. My check is actually a little conservative. I just need to check the instructions between phi and select rather than the whole basic block. Do you think it is worthwhile to change getJumpThreadDuplicationCost()?

gberry added inline comments.Dec 16 2015, 11:01 AM
lib/Transforms/Scalar/JumpThreading.cpp
1953

If you undo a select and no threading happens, won't it just get converted back into a select later? I'd be concerned about duplicating all of the logic that the threading code checks here.

haicheng updated this revision to Diff 43051.Dec 16 2015, 12:36 PM

remove the duplication cost check

haicheng marked an inline comment as done.Dec 16 2015, 12:37 PM

Hi Geoff,

I removed the duplication cost check. Should I change anything else?

Haicheng,

I added a few additional comments. I think this is ready for review in the larger community after addressing them.

lib/Transforms/Scalar/JumpThreading.cpp
1902

The first part of this code (the CFG modification) could be replaced with a call to SplitBlockAndInsertIfThenElse (BasicBlockUtils.h), which also takes care of setting some additional state (e.g. debuglocs).

1938

I think you should reject vector selects here as well.

1947

Could you simplify this check to just be isa<ConstantInt>?

junbuml added inline comments.Dec 18 2015, 11:34 AM
lib/Transforms/Scalar/JumpThreading.cpp
1898

I think you need to comment why at least one constant.

1934

You can merger this if with the if right above.

1937

I think you can use PN->user_back(), instead of PN->use_begin()->getUser().

1945

Since InBB is only used in here, if (PN->getIncomingBlock(i) == BB) will be better .

1948

Looks like you can directly return here "return SI;". What if BB has multiple PHIs ? This code looks checking only the first PHI.

haicheng added inline comments.Dec 18 2015, 12:17 PM
lib/Transforms/Scalar/JumpThreading.cpp
1948

I check if any Incoming block of PHI is not BB itself so that I do not return early.

To make things simple, I unfold one select a time. If there is another phi/select pair in the BB, I will unfold the select in the next pass after the current unfolded select is jump-threaded.

junbuml added inline comments.Dec 18 2015, 12:36 PM
lib/Transforms/Scalar/JumpThreading.cpp
1948

I check if any Incoming block of PHI is not BB itself so that I do not return early.

This make sense to me.

1952

Shouldn't it be continue? instead of return? What if BB has two pairs of phi/select, and the first pair is not match with your condition, but the second is matched.

haicheng updated this revision to Diff 43261.Dec 18 2015, 1:41 PM

Address the comments from Geoff and Jun Bum. Thank you, folks.

haicheng marked 9 inline comments as done.Dec 18 2015, 1:44 PM
mssimpso added inline comments.Jan 4 2016, 10:45 AM
lib/Transforms/Scalar/JumpThreading.cpp
1897

Please reword the sentence beginning with "And expand..." as its confusing. I think you're saying that SimplifyCFG will perform the optimization you want if the select is unfolded?

1942

Adding "&& !hasConst" to the terminating condition will break the loop early if you find an incoming constant.

junbuml added inline comments.Jan 4 2016, 10:55 AM
lib/Transforms/Scalar/JumpThreading.cpp
1949–1952

I think you can return SI when hasConst is true :

if (hasConst)
  return SI;
haicheng added inline comments.Jan 4 2016, 11:32 AM
lib/Transforms/Scalar/JumpThreading.cpp
1942

I check if any Incoming block of PHI is not BB itself so that I do not break the loop early.

haicheng updated this revision to Diff 43926.Jan 4 2016, 2:36 PM

Address Matt and Jun Bum's comments

haicheng marked 3 inline comments as done.Jan 4 2016, 2:40 PM
haicheng added a subscriber: llvm-commits.
mcrosier added inline comments.Jan 5 2016, 8:29 AM
lib/Transforms/Scalar/JumpThreading.cpp
1902

can -> will

1921

Honestly, I would get rid of this helper function and inline it into TryToUnfoldSelectInCurrBB(). It's highly unlikely this function will be reused and the current TryToUnfoldSelectInCurrBB implementation is trivially small..

1922

Please add a more verbose comment. Perhaps,

// If threading this would thread across a loop header, don't thread the edge.
// See the comments above FindLoopHeaders for justifications and caveats.
1926

Please maximize 80-column comments (i.e., only wrap the line after 80 chars).

Perhaps,

// Look for a Phi/Select pair in the same basic block.  The Phi feeds the
// condition of the Select and at least one the incoming values is a constant.
1948

Can't you remove the 'HasConst' bool entirely by returning SI here? This will also cause the loop to early exit.

haicheng added inline comments.Jan 5 2016, 9:04 AM
lib/Transforms/Scalar/JumpThreading.cpp
1948

I also check if any Incoming block of PHI is not BB itself so that I do not exit loop early here.

mcrosier added inline comments.Jan 5 2016, 11:11 AM
lib/Transforms/Scalar/JumpThreading.cpp
1948

Never mind, just ignore this comment. (Actually, I thought I deleted the comment before I submitted the review.)

haicheng updated this revision to Diff 44050.Jan 5 2016, 12:46 PM

Address Chad's comments.

haicheng marked 5 inline comments as done.Jan 5 2016, 12:49 PM

Adding Chandler and Phillip as they might have some general comments.

mcrosier accepted this revision.Jan 7 2016, 1:02 PM
mcrosier edited edge metadata.

LGTM. Overall, I'm happy with this patch. We saw a number of non-trivial improvements on external tests with no slowdowns. The differences in the inline assembly across Spec and the test-suite didn't raise any concerns. Thanks for working on this, Haicheng.

This revision is now accepted and ready to land.Jan 7 2016, 1:02 PM
haicheng closed this revision.Jan 8 2016, 12:47 PM

Committed in r257198.