Currently, getGEPCost() returns TCC_Free whenever a GEP is a legal addressing mode for the target.
However, since it doesn't check the GEP's actual users, it returns TCC_Free even in cases
where the GEP cannot be folded away as part of an addressing mode.
For example, if a user of the GEP is a call instruction taking the GEP as an argument,
then the GEP may not be folded during instruction selection.
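As a rough illustration of the situation described above (the function and value names here are invented, not taken from the patch), the two cases look like this in IR:

```llvm
declare void @use(i32*)

define i32 @f(i32* %p) {
  ; This GEP feeds only a load, so it can typically be folded into the
  ; load's addressing mode (e.g. a base+offset form); TCC_Free is fine.
  %a = getelementptr inbounds i32, i32* %p, i64 4
  %v = load i32, i32* %a
  ; This GEP escapes as a call argument, so the address must be
  ; materialized in a register; it cannot be folded away.
  %b = getelementptr inbounds i32, i32* %p, i64 4
  call void @use(i32* %b)
  ret i32 %v
}
```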
In my tests of spec2000/2006/2017 on AArch64 with -O3, I didn't see any significant performance regressions, and the code-size change was minor.
test/Analysis/CostModel/X86/vector_gep.ll, line 11:
With this patch, %B becomes non-free because it is used in %arrayidx (a non-memory operation). It might be possible to keep checking the users of those non-memory users in turn, but doing so exhaustively would be too expensive for getGEPCost. Some simple exceptions could be added, but this patch doesn't include such checks.
test/Analysis/CostModel/X86/vector_gep.ll, line 11:
Yes, it should be fine to avoid folding together GEPs in getUserCost(). (Arguably, you might want to, but it could get complicated, so it's okay to skip that for now.) That said, there's something going wrong here: "gep %x, 0, 0" is free because it's just a type conversion. By the same reasoning, "gep %s, zeroinitializer, zeroinitializer" should also be free.
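To make the zero-index reasoning concrete (a sketch, not taken from the test file; names are invented), both of these GEPs perform no address arithmetic at all:

```llvm
define i32* @scalar([4 x i32]* %x) {
  ; All indices are constant zero: no offset is computed, only the
  ; pointee type changes, so this GEP is free.
  %p = getelementptr [4 x i32], [4 x i32]* %x, i64 0, i64 0
  ret i32* %p
}

define <4 x i32*> @vector([4 x i32]* %s) {
  ; By the same reasoning, zeroinitializer index vectors compute no
  ; offsets either; this vector GEP should also be free.
  %q = getelementptr [4 x i32], [4 x i32]* %s, <4 x i64> zeroinitializer, <4 x i64> zeroinitializer
  ret <4 x i32*> %q
}
```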
include/llvm/Analysis/TargetTransformInfo.h, lines 260–261:
This comment would need to be updated.
include/llvm/Analysis/TargetTransformInfoImpl.h, line 55:
This needs to be updated, or... At a high level, I'm not sure that we want to completely remove the version of the function that can be used without an existing instruction. Instead, we should add an overload, and then fall back to the existing code where relevant.
include/llvm/Analysis/TargetTransformInfoImpl.h, line 55:
Just to make sure I understand: we keep int getGEPCost(Type *PointeeType, const Value *Ptr, ArrayRef&lt;const Value *&gt; Operands), and add the new overload int getGEPCost(const GEPOperator *GEP, ArrayRef&lt;const Value *&gt; Operands)?
test/Analysis/CostModel/X86/vector_gep.ll, line 11:
We might be able to handle zeroinitializer in hasAllZeroIndices() for ArrayType. But I think that should be a separate patch.
include/llvm/Analysis/TargetTransformInfoImpl.h, line 55:
Yes. Then implement this new function so that it calls the instruction-independent version to handle the non-user-based logic.
test/Analysis/CostModel/X86/vector_gep.ll, line 11:
Why a separate patch? This patch is not overly complicated, and with a separate patch we'd have a regression in between. If you want to separate the patches, we should have them both before either is committed. But it sounds like a couple of lines of code and a few lines of tests.
Address Hal's comment.
include/llvm/Analysis/TargetTransformInfoImpl.h, line 736:
Yes, sext must be an operand of the GEP, not a user. Sorry for the confusion. I'll remove "sext" from the comment.
Thank you, Jun, for fixing this.
There is PR33642, with a test case where TCC_Free is returned for GEPs that are not cost-free.
The buildbot failure happened when a user of the GEPOperator (a GEP ConstantExpr) is a constant. It now returns TCC_Basic for a ConstantExpr GEP, as it requires one or two instructions; for example, on AArch64 an adrp and an add instruction are required. The added test case shows this.
Oh, I see, cast<Operator>(U) will crash in cases where the user is something else, like a global variable.
A ConstantExpr GEP could still be free in certain cases (e.g. on x86-64 a GEP of a global variable could get lowered to a RIP-relative load instruction), but maybe not worth worrying about.
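A hedged sketch of the ConstantExpr case (the global and function names are invented):

```llvm
@g = global [16 x i32] zeroinitializer

define i32* @addr() {
  ; The operand of ret is a GEP ConstantExpr, not a GEP instruction.
  ; Materializing the address typically takes an adrp+add pair on
  ; AArch64, hence TCC_Basic; on x86-64 it could be cheaper via
  ; RIP-relative addressing, as noted above.
  ret i32* getelementptr inbounds ([16 x i32], [16 x i32]* @g, i64 0, i64 5)
}
```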
This patch got reverted again, and I wanted to provide some insight as to why...
We saw a pretty dramatic and immediate regression in benchmarks due to failed inlining. One of the benchmarks is open source (Gipfeli), but it can be hard or impossible to reproduce the benchmark harness that shows the clear issue. And looking at just one benchmark doesn't seem very informative anyway...
The issue is that we saw significant increases in inline cost. This very likely isn't just in one place but is true across the board, and I think I can explain some of why....
I think the real problem here is that we simply can't determine "what will fold" into an addressing mode this early in the pipeline. Looking at users I think is the wrong model, because those users are extremely likely to change as a result of inlining and the rest of the optimization pipeline. While it may be somewhat counter-intuitive that we get better results by *assuming* that a GEP will fold at some point into an addressing mode, currently that seems to be a better assumption than that the users seen when querying the cost are actually the correct users to analyze.
I'm okay with reverting this change if the regression it caused is significant and prevents many people from testing their compiler. As we go back to the original cost model, we optimistically assume that a GEP will be folded into an addressing mode later in the pipeline. In other words, the Free returned by getUserCost for a GEP means "expected to be cheap", not truly free. So shouldn't we rename "Free" to "Cheap"?
The patch has caused a few regressions on Arm too, all within 5%. We investigated some of them. For example, in one case unrolling is no longer applied: the unroller could not tell that some of the now non-free GEPs are removed later during optimisation. Other passes may have a similar problem. We currently need a more precise cost model for inlining at -Oz, and we have started looking at whether we can use the recently introduced TTI code-size cost.
At which optimisation level is the inlining affected? Could we fix the issues by updating the heuristics?
We also observed regressions caused by this change. In our case the change increases the perceived cost of a loop which in turn reduced the unrolling factor. Let me know if you need further details about the regressions.