This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Leverage PGO data to version/expand small/large memcpy calls
AbandonedPublic

Authored by sfertile on May 4 2017, 10:30 AM.

Details

Summary

This is the subsequent patch to D31772 which leverage pgo profiling data to make better decisions about when and under what conditions to actually perform known size memcpy call expansion and/or unknown size memcpy call versioning.

If PGO info exists and call is != cold:

  • expand valid known size memcpy calls
  • version unknown sized memcpy calls

If PGO info does not exists

  • expand valid known size memcpy calls
  • do NOT version unknown sized memcpy calls

Diff Detail

Event Timeline

lei created this revision.May 4 2017, 10:30 AM
gyiu added a subscriber: gyiu.Jul 21 2017, 11:57 AM
bnemanich commandeered this revision.Nov 3 2017, 2:23 PM
bnemanich added a reviewer: lei.
bnemanich updated this revision to Diff 121554.Nov 3 2017, 2:26 PM

Turns on memcpy loop expansion optimization by default for PowerPC.
Turns off memcpy loop expansion for unknown sized memcpy calls but provides a new option to enable expansion for unknown sized memcpy calls.
Also provides a new option to enable expansion for unknown sized memcpy calls for non-pgo runs.
Changes the loop datasize to be 32 bytes.

Micro-benchmarks have shown that this optimization can more than double the performance for known size memcpy calls that are smaller than 512 bytes on Power8 machines.

Can you break out some of this into "NFC statistics updates" and "the rest of the patch"? It'll make it much easier to review.

Thanks!

-eric

Updated version of the patch without keeping track of statistics.

hfinkel added inline comments.Dec 12 2017, 4:20 PM
lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp
162

You shouldn't do all of this for every memcpy call. That seems like it's at least O(N^2).

lei added inline comments.Dec 13 2017, 10:37 AM
lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp
162

The content of this if-stmt can be replaced with:

return CISize->getZExtValue() >= MemcpyLoopFloor &&
            CISize->getZExtValue() <= MemcpyLoopCeil;
sfertile added inline comments.Dec 14 2017, 11:53 AM
lib/Target/PowerPC/PPCTargetTransformInfo.cpp
486

I'm guessing this was meant to be a v2i64 instead of a v8i64

bnemanich updated this revision to Diff 127127.Dec 15 2017, 7:17 AM

Moved location for gathering profiling information to avoid extra computations.
Cleaned up some of the conditions.

bnemanich marked 3 inline comments as done.Dec 15 2017, 7:19 AM
bnemanich added inline comments.
lib/Target/PowerPC/PPCTargetTransformInfo.cpp
486

Actually, this was the intended size. This has the effect of having the main loop do four vector operations per iteration, which gives better performance.

hfinkel added inline comments.Dec 15 2017, 7:33 AM
lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp
93

Line too long.

142

hasPGOInfo -> HasPGOInfo

161

Comments should be complete sentences, starting with a capital letter, and ending with a period (here and below).

bnemanich updated this revision to Diff 127153.Dec 15 2017, 9:49 AM
bnemanich marked an inline comment as done.

Update the comments to be complete sentences.

bnemanich marked 3 inline comments as done.Dec 15 2017, 9:50 AM
lei added inline comments.Jan 10 2018, 6:56 AM
lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp
204

I think we should add this as a member to the class then there is no need to pass it as a param to shouldExpandmemCpy(). Not sure about the trade offs to initializing it here vs in runOnModule, but since it doesn't have a dependency on the Function it'll only be initialized once per module as opposed to for each Function which contains a memcpy.

sfertile commandeered this revision.Mar 27 2018, 1:06 PM
sfertile abandoned this revision.
sfertile edited reviewers, added: bnemanich; removed: sfertile.