Make default max rotation header size threshold dependent on target CPU through TTI.
Also set the threshold to 2 for Lakemont. Experiments on MCU benchmarks (Dhrystone, Coremark) show that this is the most optimal value for code size and performance.
Details
- Reviewers
nadav mzolotukhin echristo bogner bruno
Diff Detail
Event Timeline
Hi Eric,
Thanks for the review.
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
1591 | Added a comment. | |
lib/Transforms/Scalar/LoopRotation.cpp | ||
642–643 | Do I understand right that you don't like having Optional<> type for the variable which is not a function argument? Unfortunately we can't use TTI object in the constructor, so we need to keep somehow the information whether the SpecifiedMaxHeaderSize was passed to the createLoopRotate or not until the chooseMaxHeaderSize is called. I see three ways to do that:
Personally, I prefer the first option. BTW, I saw Optional<> used with class fields in other places (e.g. in lib/Transforms/Scalar/LoopUnrollPass.cpp in LoopUnroll class around the line 785). |
lib/Transforms/Scalar/LoopRotation.cpp | ||
---|---|---|
642–643 | I think either 1 or 3 will be fine. If we choose 3, we can remove the static unsigned chooseMaxHeaderSize(). class LoopRotate : public LoopPass { unsigned SpecifiedThreshold; bool IsSpecified; ... LoopRotate(Optional<unsigned> SpecifiedMaxHeaderSize = None) : LoopPass(ID) { ... if (SpecifiedMaxHeaderSize.hasValue()) SpecifiedThreshold = *SpecifiedMaxHeaderSize; else SpecifiedThreshold = RotationThreshold; IsSpecified = SpecifiedMaxHeaderSize.hasValue() || RotationThreshold.getNumOccurrences() > 0; } ... We can even change the if-else above to SpecifiedThreshold = SpecifiedMaxHeaderSize.hasValue() ? *SpecifiedMaxHeaderSize : RotationThreshold; as long as it conforms with the coding standard. Later: return iterativelyRotateLoop( L, IsSpecified ? SpecifiedThreshold : TTI->getLoopRotationDefaultThreshold(), LI, TTI, AC, DT, SE); |
- Rebased to the newer trunk.
- The loop rotation pass has been ported to the new pass manager. Extend this patch for the new version of the pass as well.
- Use approach #3 (with bool variable keeping information about the createLoopRotatePass argument).
All formatting weirdnesses you mentioned are the product of clang-format.
Unless I missed something all the changes were clang-formatted.
This is in my list, I'm trying to fix a bit of the pass before this. It shouldn't be too bad though.
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
1591 | Can you elaborate? What did you test on, do you know why? Is it just magic or is it based on something about the cpu? |
OK.
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
1591 | The positive effect was measured on Spec2000 (code size only, Spec2000 is too large to run on MCU) and Dhrystone (code size and performance). Also tested code size on ULP and code size with performance on Coremark, but there was no effect. |
Hi Eric,
I will manage D18898 patch on behalf of Andrey.
Are you still working on the fix you mentioned above?
include/llvm/Analysis/TargetTransformInfo.h | ||
---|---|---|
298 | Please explain here what this threshold actually does (I imagine that this is the maximum header size). Unfortunately, LoopRotate barely documents what it does, so we might need to improve that in order for this to make sense. | |
lib/Target/X86/X86TargetTransformInfo.cpp | ||
1591 | If this is something that is useful for a target to tune on a per-model basis, then we should add it to the scheduling model (for example, see LoopMicroOpBufferSize in SchedMachineModel and how that's handled). We really should understand what is going on, however. Loop rotation uses CodeMetrics to calculate the effective size, but that cost model (TTI.getUserCost) has received very little attention. Also, does lakemont need any kind of scheduling model? It does not seem to have one at all. |
Please explain here what this threshold actually does (I imagine that this is the maximum header size). Unfortunately, LoopRotate barely documents what it does, so we might need to improve that in order for this to make sense.