This is an archive of the discontinued LLVM Phabricator instance.

[TTI] Refine the cost of EXT in getUserCost()
ClosedPublic

Authored by haicheng on Jun 21 2017, 9:12 AM.

Details

Summary

Now, getUserCost() only checks the src and dst types of EXT to decide it is free or not. This change first checks the types, then calls isExtFreeImpl(), and check if EXT can form ExtLoad at last. Currently, only AArch64 has customized implementation of isExtFreeImpl() to check if EXT can be folded into its use. This patch just moves code around and calls existing APIs.

Diff Detail

Repository
rL LLVM

Event Timeline

haicheng created this revision.Jun 21 2017, 9:12 AM
junbuml edited edge metadata.Jun 27 2017, 8:29 AM

This looks good to me. It will be good if you can share any performance or code size impact with this change?

test/Transforms/Inline/AArch64/ext.ll
2 ↗(On Diff #103394)

You should remove -mcpu=kryo.

This looks good to me. It will be good if you can share any performance or code size impact with this change?

Here is the performance and code size change of SPEC20xx on AArch64.

BenchmarkPerformance (+ is better)Code Size (+ is larger)
spec2000/gap+1.15%0%
spec2006/h264ref-2.55%0%
spec2017/leela+5.22%0%
spec2017/perlbench-1.66%0%
spec2017/x264+1.70%0%
spec2017/xz-1.29%0%

The regression in spec2006/h264ref is caused by a fully unrolled loop. The loop has an indrect call inside it. If transforming the indirect call to direct call, the performance can be improved by +10%, and then unrolling this loop can get additional +3% improvement. We are working on it.

haicheng updated this revision to Diff 104441.Jun 28 2017, 9:39 AM
haicheng marked an inline comment as done.
haicheng added a subscriber: mssimpso.

Update the test case.

qcolombet requested changes to this revision.Jun 30 2017, 2:16 PM
qcolombet added inline comments.
include/llvm/Target/TargetLowering.h
2014 ↗(On Diff #104441)

I found the name not descriptive of what it is checking.
At the very least, add a comment on what this method is doing.

This revision now requires changes to proceed.Jun 30 2017, 2:16 PM
haicheng updated this revision to Diff 104955.Jun 30 2017, 4:51 PM
haicheng edited edge metadata.

Add a comment and rename some argument names. Thank you, Quentin.

haicheng updated this revision to Diff 105259.Jul 5 2017, 6:13 AM

Edit the comment.

Kindly ping.

efriedma edited edge metadata.Jul 12 2017, 11:21 AM

The regression in spec2006/h264ref is caused by a fully unrolled loop. The loop has an indrect call inside it. If transforming the indirect call to direct call, the performance can be improved by +10%, and then unrolling this loop can get additional +3% improvement. We are working on it.

You're planning a change to the unroller? Or something else?


It looks like this has some impact on other targets; do we have any test coverage?

include/llvm/Analysis/TargetTransformInfoImpl.h
720 ↗(On Diff #105259)

Do you need to pass through Arguments here? It looks like getExtCost depends on the operands.

haicheng updated this revision to Diff 106356.Jul 12 2017, 5:23 PM

Add tests for other targets. Pass through operands argument.

haicheng marked an inline comment as done.Jul 12 2017, 5:27 PM

The regression in spec2006/h264ref is caused by a fully unrolled loop. The loop has an indrect call inside it. If transforming the indirect call to direct call, the performance can be improved by +10%, and then unrolling this loop can get additional +3% improvement. We are working on it.

You're planning a change to the unroller? Or something else?

We plan to work on transforming the indirect call to direct calls, not on the unroller.


It looks like this has some impact on other targets; do we have any test coverage?

I added two more tests for other targets.

Thank you,

Haicheng

It looks like AArch64TargetLowering::isExtFreeImpl has some other special-cases which might be relevant here; could you add a test for shl(sext(x))?

haicheng updated this revision to Diff 106366.Jul 12 2017, 6:43 PM

Add two test cases for shl(ext(x)).

efriedma accepted this revision.Jul 14 2017, 2:35 PM

LGTM with one minor comment.

test/Transforms/Inline/AArch64/ext.ll
256 ↗(On Diff #106366)

"zext"? Also, this isn't really testing what you want to test because isZExtFree is true for 32->64.

haicheng updated this revision to Diff 106731.Jul 14 2017, 4:36 PM

Remove an inappropriate test case.

haicheng marked an inline comment as done.Jul 14 2017, 4:37 PM
This revision was automatically updated to reflect the committed changes.
delena added a subscriber: delena.Jul 20 2017, 3:02 AM
delena added inline comments.
llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h
735

This change affects not only on ARM targets. You changed cost calculation on X86 and we measure significant performance degradation after this patch. I assume that the cost of Ext instruction should be provided by target.

I assume this patch should be reverted and reviewed again.

I assume this patch should be reverted and reviewed again.

Hi Elena,

Could you please share how significant the regressions are? What benchmarks? Any improvements? Any analysis of the regressions? What are differences in IR?

I have examples when regressions happened not because of the original changes but because other passes had bugs or did not support new cases introduced by the changes.

Thanks,
Evgeny

delena added inline comments.Jul 20 2017, 5:31 AM
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
167

At this point we should call to getOperationCost() and ask the target about the cost of Ext for the given type.

haicheng added inline comments.Jul 20 2017, 6:08 AM
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
167

Sorry about the performance regression.

I think isExtFree(I) above checks isZExtFree() with its types. The part that can affect x86 is isExtLoad(). Do you want me to add a target hook to choose use refined ext cost?

delena added inline comments.Jul 20 2017, 6:21 AM
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
167

First, I should understand why if the Src is not ExtLoad you return TargetTransformInfo::TCC_Basic.
Why you are not asking the target about the actual cost of the Ext?

haicheng added inline comments.Jul 20 2017, 6:28 AM
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
167

I think that getTLI()->isExtFree(I) above asks the actual cost of the Ext which is equivalent to calling getOperationCost() since they both check target specific isZExtFree() with type.

delena added inline comments.Jul 20 2017, 6:35 AM
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
167

getTLI()->isExtFree(I) is a boolean function that returns true or false. But if the Ext is not free you should provide an actual cost and not TCC_Basic.

haicheng added inline comments.Jul 20 2017, 6:50 AM
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
167

Do you want me to change this line to

return getOperationCost(Operator::getOpcode(I), I->getType(), I->getOperand(0)->getType());

which returns TCC_Basic, TCC_Free, or TCC_Expensive?

delena added inline comments.Jul 20 2017, 10:16 AM
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h
167

I'll try to make this change and measure performance..

haicheng added inline comments.Jul 20 2017, 10:21 AM
llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h
124

Do you want to make the same change here?

I dug into the performance degradation and found that isExtLoad() reduces cost of a loop, as a result the loop is being unrolled. Unrolling this loop causes performance degradation.
But the load + zext cost is correct and it is 1 Op. I'll work more with our architects in order to understand what parameter is incorrect on X86.

I dug into the performance degradation and found that isExtLoad() reduces cost of a loop, as a result the loop is being unrolled. Unrolling this loop causes performance degradation.
But the load + zext cost is correct and it is 1 Op. I'll work more with our architects in order to understand what parameter is incorrect on X86.

Thank you very much for clearing things up. Please let me know if you want anything from me.

Haicheng