This is an archive of the discontinued LLVM Phabricator instance.

[CallSiteSplitting] Remove isOrHeader restriction.
ClosedPublic

Authored by fhahn on Dec 1 2017, 7:36 AM.

Details

Summary

By following the single predecessors of the predecessors of the call
site, we do not need to restrict the control flow.

Diff Detail

Event Timeline

davide accepted this revision.Dec 1 2017, 10:17 AM
davide added a subscriber: davide.

LGTM

This revision is now accepted and ready to land.Dec 1 2017, 10:17 AM
junbuml edited edge metadata.Dec 6 2017, 2:46 PM

I believe we need to check the code size increase with this because it will be applied widely and clone call-sites which could be inlined.

lib/Transforms/Scalar/CallSiteSplitting.cpp
335

We should bail out when Preds[0] == Preds[1]. For below IR , we should not split CS.

define i32 @test(i32* %a, i32 %v, i32 %p) {
TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %Tail
Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r
}
test/Transforms/CallSiteSplitting/callsite-split-or-phi.ll
169

We need more test cases because this change will hit even very simple case like :

define i32 @test(i32* %a, i32 %v, i32 %p) {
Header:
  br i1 undef, label %Tail, label %End

TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r

End:
  ret i32 %v
}
fhahn updated this revision to Diff 127494.Dec 19 2017, 4:28 AM
fhahn marked 2 inline comments as done.

Added testcases as suggested. I've also moved the or structure related tests to a separate file and the test case that should not be split to another file to avoid having test files that are too big. I think it would also make sense to move tests we do not expect to be split to that file.

I've run the LNT test suite, SPEC2006 and SPEC2000. There were no noticeable changes in code size (all changes < 0.5% on AArch64)

fhahn added a comment.Dec 20 2017, 9:24 AM

@junbuml I'd appreciate another look, please let me know if you are happy with the test changes.

I've run the LNT test suite, SPEC2006 and SPEC2000. There were no noticeable changes in code size (all changes < 0.5% on AArch64)

Thanks Florian for the update. Can you please also share if you see any + or - in performance? I will run performance tests in my side as well. It seems that the comment in the top of CallSiteSplitting.cpp should be updated accordingly.

test/Transforms/CallSiteSplitting/callsite-no-or-structure.ll
136 ↗(On Diff #127494)

It will be good to add a test case where Tail form a loop like :

define i32 @test_loop(i32* %a, i32 %v, i32 %p) {
TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  br i1 undef, label %Tail, label %End

End:
  ret i32 %v
}

I didn't see any significant performance changes in my spec2000/2006/2017 runs on AArch64 .

fhahn added a comment.Dec 21 2017, 7:46 AM

I didn't see any significant performance changes in my spec2000/2006/2017 runs on AArch64 .

I don't see any signification changes either. It's more a step towards potentially making the pass more general, as @davidxl suggested a while ago in a comment.

For Cortex-A57, I get -0.23% on the geomean of SPEC2006 exec times, for SPEC2000 + test suite it's -0.38%

junbuml accepted this revision.Dec 21 2017, 7:50 AM

LGTM, assuming the update in the top comment .

This revision was automatically updated to reflect the committed changes.

LGTM, assuming the update in the top comment .

I've committed this after updating the top comment, as well as dropped to Or from tryToSplitOnOrPredicatedArgument and updated the comment there.

I hope the comments are clear.