[CallSiteSplitting] Support splitting of blocks with instrs before call.
ClosedPublic

Authored by fhahn on Jan 9 2018, 5:41 AM.

Details

Summary

For basic blocks with instructions between the beginning of the block
and a call we have to duplicate the instructions before the call in all
split blocks and add PHI nodes for uses of the duplicated instructions
after the call.

Currently, the threshold for the number of instructions before a call
is quite low, to keep the impact on binary size low.

Diff Detail

Repository
rL LLVM
fhahn created this revision.Jan 9 2018, 5:41 AM
fhahn edited the summary of this revision. (Show Details)Jan 9 2018, 5:41 AM
fhahn updated this revision to Diff 129068.Jan 9 2018, 5:53 AM

Added test case test/Transforms/CallSiteSplitting/callsite-instructions-before-call.ll

fhahn updated this revision to Diff 130039.Jan 16 2018, 2:47 PM

Rebased after committing rL322599

Do you have any result for performance and size ?

lib/Transforms/Scalar/CallSiteSplitting.cpp
79 ↗(On Diff #130039)

Why don't we make it a command line option? It will be good to run a correctness test without being limited in the current default value.

307 ↗(On Diff #130039)

CurrentI might point a new PHI created below (line 309) during the iteration as it iterate reversely.

test/Transforms/CallSiteSplitting/callsite-instructions-before-call.ll
104 ↗(On Diff #130039)

We need more test to see if PHIs are created properly for cloned instructions used after callsite split.

fhahn updated this revision to Diff 130951.Jan 22 2018, 12:58 PM

Update code to insert new PHIs at the beginning of TailBB and delete instructions up to the original first instruction. This way, remove unnecessary existing PHI nodes, while not deleting newly created ones.

I hope I can post benchmark numbers with the new version of the patch tomorrow.

fhahn added a comment.Jan 23 2018, 8:23 AM

I've run spec2006, spec2000 and the test-suite on Cortex-A57. Slight improvement in the geomean speedup, no perf regressions, not size regressions.

Overall LGTM, but I want someone else also look at this.

lib/Transforms/Scalar/CallSiteSplitting.cpp
290 ↗(On Diff #130951)

Isn't it possible to hold space for ValueToValueMapTy for each predecessor and pass its reference to DuplicateInstructionsInSplitBetween.

For example, as we only support 2 predecessors for now, we can define Remaps as a fixed size array which has ValueToValueMapTy as its element, and then we can pass the reference of each array element to DuplicateInstructionsInSplitBetween for each predecessor.

313 ↗(On Diff #130951)

Instead of '2', it will be good to use the number of pred of TailBB. Or at least we need assert before using '2' directly here.

test/Transforms/CallSiteSplitting/callsite-instructions-before-call.ll
158 ↗(On Diff #130951)

I think it should be %i, instead of %0. Or did you intend [[V1]] ?

fhahn updated this revision to Diff 132384.Feb 1 2018, 6:38 AM
fhahn marked 2 inline comments as done.

Thanks for having a look! I've updated the patch to use a fixed size array for ValueToValueMapTys, used Preds.size() when creating the PHI node and updated the remaining tests.

fhahn edited the summary of this revision. (Show Details)Feb 1 2018, 6:39 AM
fhahn marked an inline comment as done.
fhahn added inline comments.
lib/Transforms/Scalar/CallSiteSplitting.cpp
290 ↗(On Diff #130951)

I've changed it to use a fixed size array. I am not entirely happy with it, but it's definitely nicer than the duplication :)

junbuml added inline comments.Feb 1 2018, 10:52 AM
lib/Transforms/Scalar/CallSiteSplitting.cpp
192–193 ↗(On Diff #132384)

Here we assume that the duplication costs of instructions are all same. However, I believe this assumption is not always true because each instruction here will end up differently after lowering. For example, if there is another call before the call we are trying to split, the cost for such call should be more expensive than other simple instructions (e.g., add or trunc ). If such call duplicated is inlined, the code size could be significantly impacted. I think we should consider using getUserCost(), instead of number of instructions ?

313 ↗(On Diff #132384)

If CurrnetI is PHI, "Mapping[CurrentI])->getParent()" may not point the new SplitBlock becasue DuplicateInstructionsInSplitBetween map a PHI with its original incoming value. Isn't it okay to skip doing this if CurrnetI is PHI?

Based on your test_remove_unused_phi(), we should add a test case something like this :

define void @test_no_remove_used_phi(i32* %ptrarg, i32* %ptrarg2, i32 %i, i32 %i2) {
Header:
  %l1 = load i32, i32* undef, align 16
  %tobool = icmp ne i32* %ptrarg, null
  br i1 %tobool, label %TBB, label %CallSite

TBB:                                    ; preds = %Header
  %arrayidx = getelementptr inbounds i32, i32* %ptrarg, i64 42
  %0 = load i32, i32* %arrayidx, align 4
  %l2 = load i32, i32* undef, align 16
  %tobool1 = icmp ne i32 %0, 0
  br i1 %tobool1, label %CallSite, label %End

CallSite:                                          ; preds = %TBB, %Header
  %l = phi i32 [ %l1, %Header ], [ %l2, %TBB ]
  call void @bar(i32* %ptrarg, i32 %l)
  call void @bari(i32 %l)                   <------------ %l is used here, so it will hit the case I mentioned.
  br label %End

End:                                           ; preds = %CallSite, %TBB
  ret void
}
fhahn updated this revision to Diff 132802.Feb 5 2018, 3:27 AM

Thanks Jun!
Update the patch to use getInstructionCost(i, TCK_CodeSize) to decide whether it is worth to perform call site splitting. Also avoid creating PHI nodes for existing PHI nodes unnecessarily.

fhahn marked 2 inline comments as done.Feb 5 2018, 3:35 AM
fhahn added inline comments.
lib/Transforms/Scalar/CallSiteSplitting.cpp
192–193 ↗(On Diff #132384)

I've updated the code to use getInstructionCost with TCK_CodeSize. Per default it uses getUserCost, which seems OK, although in some cases, the cost it returns might be too high for our case (e.g. sdiv has a high cost although it might be lowered to a single instruction)

313 ↗(On Diff #132384)

Thanks, there is indeed no need to re-create the PHI node there. I've added your test case. The incoming basic blocks in the PHI node are updated to use the split blocks. I assume this is done when splitting. Not sure if it would be better to use the original basic blocks. But I think that should be follow up patch if we want to change it ;)

junbuml added inline comments.Feb 6 2018, 10:22 AM
lib/Transforms/Scalar/CallSiteSplitting.cpp
85 ↗(On Diff #132802)

Considering that TCC_Basic is 1, don't you think 50 is too aggressive? For me, the conservative enough default should be TCC_Expensive.

I've updated the code to use getInstructionCost with TCK_CodeSize. Per default it uses getUserCost, which seems OK, although in some cases,
the cost it returns might be too high for our case (e.g. sdiv has a high cost although it might be lowered to a single instruction)

If getUserCost is supposed to represent only the code size (I believe so), then we may need to introduce a new target hook to handle sdiv properly in getOperationCost(). @hfinkel might have better idea about it.

fhahn updated this revision to Diff 133192.Feb 7 2018, 3:35 AM
fhahn marked an inline comment as done.

Fix default threshold.

fhahn added a comment.Feb 7 2018, 3:41 AM

If getUserCost is supposed to represent only the code size (I believe so), then we may need to introduce a new target hook to handle sdiv properly in getOperationCost(). @hfinkel might have better idea about it.

I am not entirely sure, but it looks like for most instructions, getUserCost delegates to getOperationCost, which per default marks sdiv & co as expensive. In any case, the patch uses

TTI.getInstructionCost(&InstBeforeCall,
                                   TargetTransformInfo::TCK_CodeSize);

so in case the code size cost becomes more accurate, we should pick it up.

lib/Transforms/Scalar/CallSiteSplitting.cpp
85 ↗(On Diff #132802)

Yeah, that was left over from testing, sorry about that!

junbuml accepted this revision.Feb 9 2018, 8:52 AM

LGTM, but I may want to hold it for sometime to let other reviewers take a chance to look at this as well.

lib/Transforms/Scalar/CallSiteSplitting.cpp
190 ↗(On Diff #133192)

when there the CodeSize -> when the CodeSize

This revision is now accepted and ready to land.Feb 9 2018, 8:52 AM
fhahn added a comment.Feb 9 2018, 10:47 AM

Thanks @junbuml ! I'll wait with committing until the middle of next week or so

This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Feb 13 2018, 7:39 AM

Reverted again in rL325009, as memsan was not happy with the array of the ValueToValueMaps. I'll look into how to resolve this.

fhahn reopened this revision.Feb 13 2018, 7:39 AM
This revision is now accepted and ready to land.Feb 13 2018, 7:39 AM
This revision was automatically updated to reflect the committed changes.