Page MenuHomePhabricator

[Loop Rotation] Make default max rotation header size threshold dependent on target CPU
Needs ReviewPublic

Authored by aturetsk on Apr 8 2016, 4:13 AM.

Details

Summary

Make default max rotation header size threshold dependent on target CPU through TTI.
Also set the threshold to 2 for Lakemont. Experiments on MCU benchmarks (Dhrystone, Coremark) show that this is the most optimal value for code size and performance.

Diff Detail

Event Timeline

aturetsk updated this revision to Diff 53015.Apr 8 2016, 4:13 AM
aturetsk retitled this revision from to [Loop Rotation] Make default max rotation header size threshold dependent on target CPU.
aturetsk updated this object.
aturetsk added subscribers: llvm-commits, zinovy.nis.
zzheng added a subscriber: zzheng.Apr 27 2016, 11:21 AM

Minor suggestion to use Optional<unsigned>.

lib/Transforms/Scalar/LoopRotation.cpp
618
if (SpecifiedThreshold.hasValue())
  return *SpecifiedThreshold;
else {
...
}
646–648

Can we take this opportunity to change it to

Optional<unsigned> SpecifiedMaxHeaderSize
aturetsk updated this revision to Diff 55578.Apr 29 2016, 5:13 AM

Use Optional<unsigned>.

Hi,
Thanks for the review. I used 'Optional<unsigned>' as suggested.

echristo added inline comments.May 1 2016, 9:57 PM
lib/Target/X86/X86TargetTransformInfo.cpp
1591

Explain the change?

lib/Transforms/Scalar/LoopRotation.cpp
642–643

This all feels awkward. Can you try to rework this a bit?

aturetsk updated this revision to Diff 56270.May 5 2016, 5:34 AM

Added a comment in the X86TTIImpl::getLoopRotationDefaultThreshold.

Hi Eric,
Thanks for the review.

lib/Target/X86/X86TargetTransformInfo.cpp
1591

Added a comment.

lib/Transforms/Scalar/LoopRotation.cpp
642–643

Do I understand right that you don't like having Optional<> type for the variable which is not a function argument?

Unfortunately we can't use TTI object in the constructor, so we need to keep somehow the information whether the SpecifiedMaxHeaderSize was passed to the createLoopRotate or not until the chooseMaxHeaderSize is called. I see three ways to do that:

  1. Use "Optional<unsigned> SpecifiedThreshold" as it is now
  2. Use "int SpecifiedThreshold" and '-1' would mean the argument wasn't passed to the createLoopRotate. That's close to what the patch used to be.
  3. Use 'unsigned SpecifiedThreshold' and 'bool IsSpecified'. The bool variable would indicate whether the argument was passed to the createLoopRotate or not.

Personally, I prefer the first option. BTW, I saw Optional<> used with class fields in other places (e.g. in lib/Transforms/Scalar/LoopUnrollPass.cpp in LoopUnroll class around the line 785).

zzheng added inline comments.May 5 2016, 1:59 PM
lib/Transforms/Scalar/LoopRotation.cpp
642–643

I think either 1 or 3 will be fine. If we choose 3, we can remove the static unsigned chooseMaxHeaderSize().

class LoopRotate : public LoopPass {
unsigned SpecifiedThreshold;
bool IsSpecified;
...
LoopRotate(Optional<unsigned> SpecifiedMaxHeaderSize = None)
    : LoopPass(ID) {
  ...
  if (SpecifiedMaxHeaderSize.hasValue())
    SpecifiedThreshold = *SpecifiedMaxHeaderSize;
  else
    SpecifiedThreshold = RotationThreshold;

   IsSpecified = SpecifiedMaxHeaderSize.hasValue() || RotationThreshold.getNumOccurrences() > 0;
}
...

We can even change the if-else above to

SpecifiedThreshold = SpecifiedMaxHeaderSize.hasValue() ?
                                *SpecifiedMaxHeaderSize : RotationThreshold;

as long as it conforms with the coding standard.

Later:

return iterativelyRotateLoop(
    L, IsSpecified ? SpecifiedThreshold : TTI->getLoopRotationDefaultThreshold(), LI, TTI, AC, DT, SE);
aturetsk updated this revision to Diff 56905.May 11 2016, 7:04 AM
  1. Rebased to the newer trunk.
  2. The loop rotation pass has been ported to the new pass manager. Extend this patch for the new version of the pass as well.
  3. Use approach #3 (with bool variable keeping information about the createLoopRotatePass argument).

The patch looks in good shape to me. You can use clang-format to polish indent. See inlined comments.

lib/Transforms/Scalar/LoopRotation.cpp
634

This indent seems weird, is this the result of clang-format?

677

Same indent oddness...

All formatting weirdnesses you mentioned are the product of clang-format.
Unless I missed something all the changes were clang-formatted.

echristo edited edge metadata.Jul 18 2016, 11:21 AM

This is in my list, I'm trying to fix a bit of the pass before this. It shouldn't be too bad though.

lib/Target/X86/X86TargetTransformInfo.cpp
1591

Can you elaborate? What did you test on, do you know why? Is it just magic or is it based on something about the cpu?

OK.

lib/Target/X86/X86TargetTransformInfo.cpp
1591

The positive effect was measured on Spec2000 (code size only, Spec2000 is too large to run on MCU) and Dhrystone (code size and performance). Also tested code size on ULP and code size with performance on Coremark, but there was no effect.
The initial idea was to just improve code size for -Os by reducing the threshold, but since performance on Dhrystone improved as well (both for -Os and -O2), it seems to be a good idea to use a smaller threshold for Lakemont regardless of -O option.
So this is based on mere tuning, no special idea behind it.

Hi Eric,
I will manage D18898 patch on behalf of Andrey.
Are you still working on the fix you mentioned above?

This is in my list, I'm trying to fix a bit of the pass before this. It shouldn't be too bad though.

Hi Eric! Could you please update a status of your mentioned fix?

hfinkel added inline comments.
include/llvm/Analysis/TargetTransformInfo.h
298

Please explain here what this threshold actually does (I imagine that this is the maximum header size). Unfortunately, LoopRotate barely documents what it does, so we might need to improve that in order for this to make sense.

lib/Target/X86/X86TargetTransformInfo.cpp
1591

If this is something that is useful for a target to tune on a per-model basis, then we should add it to the scheduling model (for example, see LoopMicroOpBufferSize in SchedMachineModel and how that's handled).

We really should understand what is going on, however. Loop rotation uses CodeMetrics to calculate the effective size, but that cost model (TTI.getUserCost) has received very little attention. Also, does lakemont need any kind of scheduling model? It does not seem to have one at all.