This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Memory intrinsic calls optimization based on profiled size
ClosedPublic

Authored by xur on Jan 20 2017, 1:26 PM.

Details

Summary

This patch optimizes two memory intrinsic operations, memset and memcpy, based on the profiled size of the operation. At a high level, the transformation is:

mem_op(..., size)
==>
switch (size) {
  case s1:
     mem_op(..., s1);
     goto merge_bb;
  case s2:
     mem_op(..., s2);
     goto merge_bb;
  ...
  default:
     mem_op(..., size);
     goto merge_bb;
  }
merge_bb:
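
For illustration only, the same shape can be written at the source level. This is a minimal sketch, not the pass itself: the function name `copyWithHotSizes` and the hot sizes 8 and 64 are hypothetical stand-ins for values that would come from the size profile. The point is that each hot case calls the operation with a constant size, which the compiler is free to expand inline.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical example: 8 and 64 stand in for the most frequent sizes
// recorded by the profile; they are not taken from the patch.
void *copyWithHotSizes(void *dst, const void *src, std::size_t size) {
  switch (size) {
  case 8:
    // Constant size: eligible for inline expansion.
    return std::memcpy(dst, src, 8);
  case 64:
    return std::memcpy(dst, src, 64);
  default:
    // Cold path: keep the original variable-size call.
    return std::memcpy(dst, src, size);
  }
}
```

All three paths copy the same bytes as the original call; only the hot paths give the compiler a known size.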

There are a few internal options that control when the optimization takes place.

This patch is an incremental patch based on
https://reviews.llvm.org/D28965

Event Timeline

xur created this revision. Jan 20 2017, 1:26 PM
xur edited the summary of this revision.
davidxl edited edge metadata. Mar 7 2017, 1:59 PM

The optimization pass should be split into two phases as IC promotion. The annotation part should probably be merged with the instrumentation patch. The transformation patch should be done in the same pass as IndirectCallPromotion.

xur added a comment. Mar 8 2017, 11:11 AM

The optimization pass should be split into two phases as IC promotion. The annotation part should probably be merged with the instrumentation patch. The transformation patch should be done in the same pass as IndirectCallPromotion.

Is there a good reason for doing the transformation late? Here I don't do the annotation and instead, I do the transformation directly in the same pass. The main reason we have annotations in indirect-call-promotion is we need to call it late (in LTO or ThinLTO). Another reason I'm reluctant to use annotation is that we need to maintain/update it (for inline and unroll, for example).

-Rong

xur updated this revision to Diff 92382. Mar 20 2017, 2:05 PM

Integrated David's suggestion to make the optimization a standalone pass (a function pass).
It currently resides in IndirectCallPromotion.cpp. I will send a follow-up patch to rename IndirectCallPromotion.cpp to something more appropriate.

-Rong

davidxl added inline comments. Mar 20 2017, 4:58 PM
lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
789 ↗(On Diff #92382)

Add a comment on the kinds.

NormalGroup--> NonLargeGroup?

794 ↗(On Diff #92382)

Is it correct to guard the parsing here?

Should the parsing be called at the start of the pass with PreciseStart etc as a member variable?

824 ↗(On Diff #92382)

DC --> RemainCount?

843 ↗(On Diff #92382)

Should it use TotalCount here?

xur marked 3 inline comments as done. Mar 29 2017, 2:43 PM
xur added inline comments.
lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
794 ↗(On Diff #92382)

moved to class MemOPSizeOpt

843 ↗(On Diff #92382)

The percent threshold uses the ratio of the current count to the remaining count.
I refactored the code a bit (by using a function). The new code also compares the count with the count threshold.
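
A minimal sketch of the check described above, assuming both thresholds are passed as plain parameters. The function name, parameter names, and the rounding choice are hypothetical and not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when a profiled size is hot enough.
// Precondition (assumed): percentThreshold <= 100.
bool isPromotionCandidate(uint64_t count, uint64_t remainCount,
                          uint64_t countThreshold, unsigned percentThreshold) {
  // Absolute check: the size must have been seen often enough.
  if (count < countThreshold)
    return false;
  // Nothing left to compare against.
  if (remainCount == 0)
    return false;
  // Relative check: count must reach percentThreshold percent of the
  // remaining count (rounded down). Splitting remainCount into quotient
  // and remainder avoids overflowing remainCount * percentThreshold.
  uint64_t required = (remainCount / 100) * percentThreshold +
                      (remainCount % 100) * percentThreshold / 100;
  return count >= required;
}
```

For example, with a count threshold of 100 and a percent threshold of 40, a size seen 500 times out of a remaining 1000 passes, while one seen 300 times does not.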

xur updated this revision to Diff 93408. Mar 29 2017, 2:48 PM
xur marked an inline comment as done.

This new patch integrates David's review suggestion to make this optimization a separate pass.
The pass is called after inlining to avoid potential interference with inlining decisions. It uses both the metadata annotation for memop size profiling and the enclosing BB's edge count to find the actual count.

davidxl added inline comments. Mar 30 2017, 8:32 PM
lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
834 ↗(On Diff #93408)

Use APInt or SaturatingMultiplyAdd to deal with possible overflow.
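
To show what saturating arithmetic buys here, the following is a standalone sketch of a saturating multiply-add on 64-bit counts. It is written from scratch for illustration and is not LLVM's implementation; the lower-case name and the exact signature are assumptions of this example.

```cpp
#include <cstdint>
#include <limits>

// Illustrative re-implementation: computes x * y + a, clamping to the
// maximum uint64_t value instead of wrapping around on overflow.
uint64_t saturatingMultiplyAdd(uint64_t x, uint64_t y, uint64_t a,
                               bool *overflowed = nullptr) {
  const uint64_t maxValue = std::numeric_limits<uint64_t>::max();
  bool didOverflow = false;

  uint64_t product;
  if (x != 0 && y > maxValue / x) {
    // x * y would not fit in 64 bits.
    product = maxValue;
    didOverflow = true;
  } else {
    product = x * y;
  }

  uint64_t sum = product + a;
  if (sum < product) {
    // Unsigned wrap-around detected in the addition.
    sum = maxValue;
    didOverflow = true;
  }

  if (overflowed)
    *overflowed = didOverflow;
  return sum;
}
```

A count that saturates stays at the maximum value, so later comparisons against thresholds still behave sensibly, whereas a wrapped value could look like a small count.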

849 ↗(On Diff #93408)

Should check the actual count.

883 ↗(On Diff #93408)

can shrink the MemOpScaleCount check into getScaledCount

xur updated this revision to Diff 93689. Mar 31 2017, 12:14 PM
xur marked 3 inline comments as done.

Integrated David's review comments.

davidxl accepted this revision. Apr 1 2017, 12:37 PM

lgtm

This revision is now accepted and ready to land. Apr 1 2017, 12:37 PM
This revision was automatically updated to reflect the committed changes.