With this patch I tried to reduce the complexity of the code sightly, by
removing some indirection. Please let me know what you think.
Details
Diff Detail
Event Timeline
Thanks Florian for cleaning up this pass. Overall look good, but this doesn't seem a NFC and can miss an opportunity for a constant phi in non-OR structure. Please see my inline comments.
lib/Transforms/Scalar/CallSiteSplitting.cpp | ||
---|---|---|
386 | In case HeaderBB has more than two successors (e.g, if a switch is it's terminator), this change could allow splitting. If you want to keep this change NFC, the number of successors of HeaderBB may need to be checked. | |
393 | We also split a call-site if it use a PHI with all constant incoming values. In this case, Preds are not necessarily an OR structure. For example, in the diamond pattern below, we should split the call site. define i32 @test(i32* %a, i32 %v) { entry: br i1 undef, label %TBB0, label %TBB1 TBB0: br i1 undef, label %Tail, label %End TBB1: br i1 undef, label %Tail, label %End Tail: %p = phi i32[1,%TBB0], [2, %TBB1] %r = call i32 @callee(i32* %a, i32 %v, i32 %p) ret i32 %r End: ret i32 %v } |
Added tryToSplitOnOrPredicatedArgument and tryToSplitOnPHIPredicatedArgument to make clear which checks are required for each. This changed the order of the predecessors in 2 test cases, as now we do not determine the HeaderBB for PHI predicated arguments.
I have also added the test case Jun suggested to make sure this change does not regress things.
lib/Transforms/Scalar/CallSiteSplitting.cpp | ||
---|---|---|
386 | I think for the OR case the checks for ICMP would catch this case later (as switch for i1 only can have 2 successors I think) , but I could add an explicit check. | |
393 | Done! I've separated the code paths splitting on PHI and on Or, as they have different legality checks. |
lib/Transforms/Scalar/CallSiteSplitting.cpp | ||
---|---|---|
386 | Currently, the below test case is not handled, but with this change, we can split the call-site. Although it's not the OR structure, it doesn't seem to break the original intention of this pass. Looks like we can even extend this pass to support switch better. However, if we want to support this in this change, I think we should run tests. define i32 @test(i32* %a, i32 %i, i32 %v) { Header: switch i32 %i, label %End [ i32 0, label %BB0 i32 1, label %BB1 i32 2, label %BB2 i32 3, label %Tail i32 4, label %Tail ] BB0: br label %End BB1: br label %End BB2: %cmp = icmp eq i32 %v, 1 br i1 %cmp, label %Tail, label %End Tail: %r = call i32 @callee(i32* %a, i32 %v) ret i32 %r End: ret i32 %v } | |
test/Transforms/CallSiteSplitting/callsite-split-or-phi.ll | ||
232 | Maybe test_cfg_no_or --> test_cfg_no_or_phi ? |
Just a side comment -- this looks more and more like more general code cloning optimization to enable constant/range propagations -- may want to consider rename the pass and make it more general.
Yep, one of the reasons I tried reduce some indirection is to make it easier to extend the pass :)
lib/Transforms/Scalar/CallSiteSplitting.cpp | ||
---|---|---|
386 | I think it will be best to keep this change as a NFC and go from there, what do you think? I would plan to update the patch to only do the transform if the header has 2 successors in the OR case and post a follow-up patch to enable the additional transform after that. |
lib/Transforms/Scalar/CallSiteSplitting.cpp | ||
---|---|---|
386 | Sounds good to me. |
Add check to ensure we only split call sites in tryToSplitOnOrPredicatedArgument if the header block has exactly 2 successors. I plan to post a patch relaxing that soonish.
LGTM. Please run clang-format before committing.
lib/Transforms/Scalar/CallSiteSplitting.cpp | ||
---|---|---|
117 | Looks like more than 80? |
Looks like more than 80?