Now, getUserCost() only checks the src and dst types of EXT to decide it is free or not. This change first checks the types, then calls isExtFreeImpl(), and check if EXT can form ExtLoad at last. Currently, only AArch64 has customized implementation of isExtFreeImpl() to check if EXT can be folded into its use. This patch just moves code around and calls existing APIs.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
This looks good to me. It will be good if you can share any performance or code size impact with this change?
test/Transforms/Inline/AArch64/ext.ll | ||
---|---|---|
2 ↗ | (On Diff #103394) | You should remove -mcpu=kryo. |
Here is the performance and code size change of SPEC20xx on AArch64.
Benchmark | Performance (+ is better) | Code Size (+ is larger) |
spec2000/gap | +1.15% | 0% |
spec2006/h264ref | -2.55% | 0% |
spec2017/leela | +5.22% | 0% |
spec2017/perlbench | -1.66% | 0% |
spec2017/x264 | +1.70% | 0% |
spec2017/xz | -1.29% | 0% |
The regression in spec2006/h264ref is caused by a fully unrolled loop. The loop has an indrect call inside it. If transforming the indirect call to direct call, the performance can be improved by +10%, and then unrolling this loop can get additional +3% improvement. We are working on it.
include/llvm/Target/TargetLowering.h | ||
---|---|---|
2014 ↗ | (On Diff #104441) | I found the name not descriptive of what it is checking. |
The regression in spec2006/h264ref is caused by a fully unrolled loop. The loop has an indrect call inside it. If transforming the indirect call to direct call, the performance can be improved by +10%, and then unrolling this loop can get additional +3% improvement. We are working on it.
You're planning a change to the unroller? Or something else?
It looks like this has some impact on other targets; do we have any test coverage?
include/llvm/Analysis/TargetTransformInfoImpl.h | ||
---|---|---|
720 ↗ | (On Diff #105259) | Do you need to pass through Arguments here? It looks like getExtCost depends on the operands. |
We plan to work on transforming the indirect call to direct calls, not on the unroller.
It looks like this has some impact on other targets; do we have any test coverage?
I added two more tests for other targets.
Thank you,
Haicheng
It looks like AArch64TargetLowering::isExtFreeImpl has some other special-cases which might be relevant here; could you add a test for shl(sext(x))?
LGTM with one minor comment.
test/Transforms/Inline/AArch64/ext.ll | ||
---|---|---|
256 ↗ | (On Diff #106366) | "zext"? Also, this isn't really testing what you want to test because isZExtFree is true for 32->64. |
llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h | ||
---|---|---|
735 | This change affects not only on ARM targets. You changed cost calculation on X86 and we measure significant performance degradation after this patch. I assume that the cost of Ext instruction should be provided by target. |
Hi Elena,
Could you please share how significant the regressions are? What benchmarks? Any improvements? Any analysis of the regressions? What are differences in IR?
I have examples when regressions happened not because of the original changes but because other passes had bugs or did not support new cases introduced by the changes.
Thanks,
Evgeny
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
167 | At this point we should call to getOperationCost() and ask the target about the cost of Ext for the given type. |
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
167 | Sorry about the performance regression. I think isExtFree(I) above checks isZExtFree() with its types. The part that can affect x86 is isExtLoad(). Do you want me to add a target hook to choose use refined ext cost? |
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
167 | First, I should understand why if the Src is not ExtLoad you return TargetTransformInfo::TCC_Basic. |
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
167 | I think that getTLI()->isExtFree(I) above asks the actual cost of the Ext which is equivalent to calling getOperationCost() since they both check target specific isZExtFree() with type. |
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
167 | getTLI()->isExtFree(I) is a boolean function that returns true or false. But if the Ext is not free you should provide an actual cost and not TCC_Basic. |
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
167 | Do you want me to change this line to return getOperationCost(Operator::getOpcode(I), I->getType(), I->getOperand(0)->getType()); which returns TCC_Basic, TCC_Free, or TCC_Expensive? |
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
167 | I'll try to make this change and measure performance.. |
llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h | ||
---|---|---|
124 | Do you want to make the same change here? |
I dug into the performance degradation and found that isExtLoad() reduces cost of a loop, as a result the loop is being unrolled. Unrolling this loop causes performance degradation.
But the load + zext cost is correct and it is 1 Op. I'll work more with our architects in order to understand what parameter is incorrect on X86.
Thank you very much for clearing things up. Please let me know if you want anything from me.
Haicheng
Do you want to make the same change here?