This is an archive of the discontinued LLVM Phabricator instance.

Use the basic cost if a GEP is not used as addressing mode
ClosedPublic

Authored by junbuml on Sep 20 2017, 8:58 AM.

Details

Summary

Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if an user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.

Diff Detail

Event Timeline

junbuml created this revision.Sep 20 2017, 8:58 AM

In my tests for spec2000/2006/2017 on aarch64 with -O3, I didn't see any significant performance regressions, and the code size change was minor.

test/Analysis/CostModel/X86/vector_gep.ll
11

With this patch, %B is changed to a non-free because it's used in %arrayidx (non-memory operation). It might be possible to continue checking users of the non-memory operation users, but doing this completely must be expensive to be done in getGEPCost. It might be possible to add some simple exceptions, but in this patch I didn't add such checks.

efriedma added inline comments.Sep 20 2017, 12:03 PM
test/Analysis/CostModel/X86/vector_gep.ll
11

Yes, it should be fine to avoid folding together GEPs in getUserCost(). (Arguably, you might want to, but it could get complicated, so okay to skip that for now.)

That said, there's something going wrong here. "gep %x, 0, 0" is free because it's just a type conversion. By the same reasoning, "gep %s, zeroinitializer, zeroinitializer" should also be free.

hfinkel added inline comments.Sep 20 2017, 12:12 PM
include/llvm/Analysis/TargetTransformInfo.h
260–261

This comment would need to be updated.

include/llvm/Analysis/TargetTransformInfoImpl.h
55

This needs to be updated, or...

At a high level, I'm not sure what we want to completely remove the version of the function that can be used without an existing function. Instead, we should add an overload, and then fall back to the existing code where relevant.

junbuml added inline comments.Sep 20 2017, 2:37 PM
include/llvm/Analysis/TargetTransformInfoImpl.h
55

I'm not perfectly clear about this.
Do you think we should keep the existing getGEPCost as it is :

int getGEPCost(Type *PointeeType, const Value *Ptr, ArrayRef<const Value *> Operands),

and add the new one :

int getGEPCost(const GEPOperator *GEP, ArrayRef<const Value *> Operands)
test/Analysis/CostModel/X86/vector_gep.ll
11

We might be able to handle zeroinlitializer in hasAllZeroIndices() for ArrayType. Then, I think it should be a separate patch.

hfinkel added inline comments.Sep 20 2017, 3:02 PM
include/llvm/Analysis/TargetTransformInfoImpl.h
55

Yes. Then implement this new function so that it calls the instruction-independent version to handle the non-user-based logic.

test/Analysis/CostModel/X86/vector_gep.ll
11

Why a separate patch? This patch is not overly complicated, and if there's a separate patch we'll have a regression in between.

If you want to separate the patches, we should have them both before either is committed. But it sounds like a couple lines of code and a few lines of code for some tests.

junbuml updated this revision to Diff 116209.Sep 21 2017, 9:43 AM
junbuml marked 3 inline comments as done.

Addressed comments from Hal and Eli.

Kindly ping.

hfinkel added inline comments.Sep 27 2017, 11:25 AM
include/llvm/Analysis/TargetTransformInfoImpl.h
738

togetehr -> together

740

I don't understand this comment. If there's a sext/zext, then it would be an operand of the GEP, not the other way around, no?

junbuml updated this revision to Diff 116854.Sep 27 2017, 12:03 PM
junbuml marked an inline comment as done.

Address Hal's comment.

include/llvm/Analysis/TargetTransformInfoImpl.h
740

Yes, sext must be an operand of the GEP, not an user. Sorry for the confusion. Remove "sext" from the comment.

This revision is now accepted and ready to land.Sep 27 2017, 12:23 PM
junbuml updated this revision to Diff 117144.Sep 29 2017, 7:51 AM

Minor change in spacing. Thanks for the review !

This revision was automatically updated to reflect the committed changes.

Reverted as r314560, it crashes sanitizer bots.

vitalybuka reopened this revision.Sep 29 2017, 3:30 PM
This revision is now accepted and ready to land.Sep 29 2017, 3:30 PM
eastig added a subscriber: eastig.Oct 3 2017, 5:34 AM
eastig added a comment.Oct 3 2017, 5:53 AM

Thank you, Jun, for fixing this.
There is PR33642 with a test case when TCC_FREE is returned for non-cost free GEPs.

junbuml updated this revision to Diff 117572.Oct 3 2017, 1:26 PM

The buildbot failure happened when an user of GEPOperator (GEP ConstantExpr) is a constant. Now returns TCC_Basic for ConstantExpr as it require one or two instructions. For example in AArch64, adrp and add instruction is required. The test case added should show the case.

efriedma accepted this revision.Oct 3 2017, 2:01 PM

Oh, I see, cast<Operator>(U) will crash in cases where the user is something else, like a global variable.

A ConstantExpr GEP could still be free in certain cases (e.g. on x86-64 a GEP of a global variable could get lowered to a RIP-relative load instruction), but maybe not worth worrying about.

junbuml closed this revision.Oct 4 2017, 11:37 AM

Recommitted in r314923. Thanks for the review.

This patch got reverted again, and I wanted to provide some insight as to why...

We saw a pretty dramatic and immediate regression in benchmarks due to failed inlining. One of the benchmarks is open source (Gipfeli) but it can be hard or impossible to reproduce the benchmark rigging that shows the clear issue. But looking at just one benchmark doesn't seem very informative...

The issue is that we saw significant increases in inline cost. This very likely isn't just in one place but is true across the board, and I think I can explain some of why....

I think the real problem here is that we simply can't determine "what will fold" into an addressing mode this early in the pipeline. Looking at users I think is the wrong model, because those users are extremely likely to change as a result of inlining and the rest of the optimization pipeline. While it may be somewhat counter-intuitive that we get better results by *assuming* that a GEP will fold at some point into an addressing mode, currently that seems to be a better assumption than that the users seen when querying the cost are actually the correct users to analyze.

I'm okay reverting this change if the regression caused by this change is significant and prevent many people from testing their compiler. As we go back to the original cost model, we optimistically assume that a GEP will fold into addressing mode later in the pipeline. In other words, the Free from getUserCost for a GEP is "expected to be cheap", not really free. So, don't we need to change the name from "Free" to "Cheap"?

The patch has caused a few regressions on Arm too. They are within 5%. We investigated some of them. For example, the unrolling is not applied. However the unrolling could figure out that some non-free GEPs are removed during optimisation. Maybe other passes have the similar problem. We currently need more precise cost model for inlining at Oz. However we started looking at the recently introduced TTI code size cost if we can use it.
At which optimisation level is the inlining affected? Could we fix the issues by updating the heuristics?

We also observed regressions caused by this change. In our case the change increases the perceived cost of a loop which in turn reduced the unrolling factor. Let me know if you need further details about the regressions.