This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: BlockPlacement: Increase tail duplication size for O3.
Closed, Public

Authored by iteratee on Apr 20 2017, 4:28 PM.

Details

Reviewers
davidxl
Summary

At O3 we are more willing to increase size if we believe it will improve
performance. The current threshold for tail-duplication of 2 instructions is
conservative, and can be relaxed at O3.

Benchmark results:
llvm test-suite:
6% improvement in aha, due to duplication of loop latch
3% improvement in hexxagon for similar reasons.

2% slowdown in lpbench. Seems related, but couldn't completely diagnose.

Internal google benchmark:
Produces 4% improvement on internal google protocol buffer serialization
benchmarks.

Event Timeline

iteratee created this revision. Apr 20 2017, 4:28 PM
davidxl added inline comments. Apr 20 2017, 4:35 PM
lib/CodeGen/MachineBlockPlacement.cpp
2657

Would it be better to have two parameters, TailDupThreshold and TailDupAggressiveThreshold? The latter can be used for O3.

iteratee updated this revision to Diff 97534. May 2 2017, 6:00 PM

Made the aggressive threshold an option.

iteratee marked an inline comment as done. May 2 2017, 6:01 PM
davidxl added inline comments. May 3 2017, 9:12 AM
lib/CodeGen/MachineBlockPlacement.cpp
2662

I think when the aggressive threshold is also explicitly specified, it should take precedence even at O2. Basically this is the order:

Explicit Aggressive Threshold
Explicit Regular Threshold
Implicit Aggressive Threshold at O3 and Implicit Regular Threshold at O2

iteratee added inline comments. May 4 2017, 3:44 PM
lib/CodeGen/MachineBlockPlacement.cpp
2662

At O3 I think it should be:

Explicit Aggressive Threshold
Explicit Regular Threshold
Implicit Aggressive Threshold

At O2 I think it should be:

Explicit Regular Threshold
Explicit Aggressive Threshold
Implicit Regular Threshold

For instance, someone may want to adjust both flags globally and compile individual modules at O2 or O3.
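The two per-level orderings proposed above can be sketched as a small stand-alone function. This is only an illustration under stated assumptions, not the patch itself: `ThresholdOption`, `pickProposedThreshold`, and the `IsO3` parameter are hypothetical names, and the struct merely stands in for a command-line option that remembers whether the user set it explicitly.

```cpp
// Hypothetical stand-in for a command-line option. Value holds the built-in
// default unless the user passed the flag, in which case Explicit is true.
struct ThresholdOption {
  unsigned Value;
  bool Explicit;
};

// Sketch of the proposed precedence: each optimization level prefers its
// "own" flag, then the other explicit flag, then its own implicit default.
unsigned pickProposedThreshold(ThresholdOption Regular,
                               ThresholdOption Aggressive, bool IsO3) {
  if (IsO3) {
    if (Aggressive.Explicit)
      return Aggressive.Value; // explicit aggressive threshold
    if (Regular.Explicit)
      return Regular.Value;    // explicit regular threshold
    return Aggressive.Value;   // implicit aggressive threshold
  }
  if (Regular.Explicit)
    return Regular.Value;      // explicit regular threshold
  if (Aggressive.Explicit)
    return Aggressive.Value;   // explicit aggressive threshold
  return Regular.Value;        // implicit regular threshold
}
```

With both flags set globally, a module built at O2 picks up the regular value and a module built at O3 picks up the aggressive one, which is the use case described in the comment.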

davidxl added inline comments. May 12 2017, 9:32 AM
lib/CodeGen/MachineBlockPlacement.cpp
2662

Do you have an updated patch with the proposed logic?

No, I wanted to get agreement before I re-wrote it. I can do it if you'd like to see it before deciding.

iteratee updated this revision to Diff 98850. May 12 2017, 2:48 PM
iteratee set the repository for this revision to rL LLVM.

If either threshold is the only one explicitly set, use that threshold.
Otherwise, if both or neither are set, use the aggressive threshold at O3.
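The rule in this update can be condensed into a single check. The sketch below is a stand-alone illustration rather than the committed code: `ThresholdOption`, `pickFinalThreshold`, and `IsO3` are hypothetical names, and falling back to the regular threshold below O3 is an assumption inferred from the discussion, since the comment only states the O3 case.

```cpp
// Hypothetical stand-in for a command-line option. Value holds the built-in
// default unless the user passed the flag, in which case Explicit is true.
struct ThresholdOption {
  unsigned Value;
  bool Explicit;
};

unsigned pickFinalThreshold(ThresholdOption Regular,
                            ThresholdOption Aggressive, bool IsO3) {
  // Exactly one flag was set explicitly: that flag wins at any level.
  if (Regular.Explicit != Aggressive.Explicit)
    return Regular.Explicit ? Regular.Value : Aggressive.Value;
  // Both or neither were set: the optimization level decides.
  // Assumption: below O3 the regular threshold applies.
  return IsO3 ? Aggressive.Value : Regular.Value;
}
```

Under these assumptions the condensed rule selects the same value as the separate O3 and O2 orderings discussed earlier in the thread, just with fewer branches.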

davidxl accepted this revision. May 12 2017, 4:17 PM

lgtm

This revision is now accepted and ready to land. May 12 2017, 4:17 PM
iteratee closed this revision. May 15 2017, 10:53 AM

Committed in rL303084