This is an archive of the discontinued LLVM Phabricator instance.

Changing TargetTransformInfo::getGEPCost to take GetElementPtrInst as parameter
Needs ReviewPublic

Authored by eastig on Mar 21 2017, 5:48 AM.

Details

Summary

This is a patch for RFC: Change TargetTransformInfo::getGEPCost to take GetElementPtrInst as a parameter (http://lists.llvm.org/pipermail/llvm-dev/2017-March/111066.html)

The current signature of TargetTransformInfo::getGEPCost is:

/// \brief Estimate the cost of a GEP operation when lowered.
///
/// The contract for this function is the same as \c getOperationCost except
/// that it supports an interface that provides extra information specific to
/// the GEP operation.
int getGEPCost(Type *PointeeType, const Value *Ptr,
               ArrayRef<const Value *> Operands) const;

I’d like to change it to:

int getGEPCost(const GetElementPtrInst *GEP,  ArrayRef<const Value *> Operands) const;

All uses of the current getGEPCost look like: TTI.getGEPCost(GEP.getSourceElementType(), GEP.getPointerOperand(), …):

lib/Analysis/InlineCost.cpp

TTI.getGEPCost(GEP.getSourceElementType(), GEP.getPointerOperand(),

lib/Transforms/Scalar/NaryReassociate.cpp

return TTI->getGEPCost(GEP->getSourceElementType(), GEP->getPointerOperand(),

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp

return TTI->getGEPCost(GEP->getSourceElementType(), GEP->getPointerOperand(),

Rationale:

In the following IR produced from the code of a simple memcopy function GEPs are not free:

while.cond:                                       ; preds = %while.body, %entry
  %dest.addr.0 = phi i8* [ %dest, %entry ], [ %incdec.ptr1, %while.body ]
  %src.addr.0 = phi i8* [ %src, %entry ], [ %incdec.ptr, %while.body ]
  %tobool = icmp eq i32 %size.addr.0, 0
  br i1 %tobool, label %while.end, label %while.body

while.body:                                       ; preds = %while.cond
  %dec = add nsw i32 %size.addr.0, -1
  %incdec.ptr = getelementptr inbounds i8, i8* %src.addr.0, i32 1
  %0 = load i8, i8* %src.addr.0, align 1, !tbaa !12
  %incdec.ptr1 = getelementptr inbounds i8, i8* %dest.addr.0, i32 1
  store i8 %0, i8* %dest.addr.0, align 1, !tbaa !12
  br label %while.cond

while.end:                                        ; preds = %while.cond

For x86 and ARM they are lowered into ADD instructions. So they are not free but the current getGEPCost returns they are free. E.g., this affects the cost of inlining. The calculated cost is lower than it should be and functions are inlined.
We can do the analysis before the call of getGEPCost but this will require to do it at all places where getGEPCost is called. So it’s better to do this in one place, in the getGEPCost function or its implementations for targets.
To detect this case and other Def-Use based cases GEPs need to be accessed in getGEPCost which is not possible with the current signature.

Event Timeline

eastig created this revision.Mar 21 2017, 5:48 AM
efriedma edited edge metadata.Mar 21 2017, 11:03 AM

In some contexts, we might want to compute the cost of a GEP which doesn't actually exist in IR at the moment. I guess there aren't any callers in the tree like that at the moment, but we use this sort of capability in other contexts, so we want to structure the code to allow it.

Actually, the inliner itself could in theory take advantage of that: if you have a function which returns a GEP, the cost depends on how the caller uses the value. That might be overkill, though, given the tiny effect in most cases.

Along those lines, it probably makes sense to structure the code more like this:

  1. The core cost function for GEPs should keep its existing signature, and just assume the GEP has users which are not memory operations (so it's not free unless all the indices are zero).
  2. We should have a utility function which examines the users of the GEP, and checks if the GEP can be folded into them, using isLegalAddressingMode or something like that.
include/llvm/Analysis/TargetTransformInfoImpl.h
571

Useless assertion/

In some contexts, we might want to compute the cost of a GEP which doesn't actually exist in IR at the moment. I guess there aren't any callers in the tree like that at the moment, but we use this sort of capability in other contexts, so we want to structure the code to allow it.

Actually, the inliner itself could in theory take advantage of that: if you have a function which returns a GEP, the cost depends on how the caller uses the value. That might be overkill, though, given the tiny effect in most cases.

Along those lines, it probably makes sense to structure the code more like this:

  1. The core cost function for GEPs should keep its existing signature, and just assume the GEP has users which are not memory operations (so it's not free unless all the indices are zero).
  2. We should have a utility function which examines the users of the GEP, and checks if the GEP can be folded into them, using isLegalAddressingMode or something like that.

+1

I'll also point out that we're further walking down the path of costing instruction patterns, instead of individual instructions. We should clearly document how we expect this to work. Specifically, we want to have a uniform scheme that we can use without overcounting. We can't for example, have some logic look at an instruction along with its operands and another loop at an instruction along with its users. I think that the approach described, where we look at the users, seems potentially the easiest to make consistent, but regardless, we should make a decision.

jonpa added a subscriber: jonpa.Mar 21 2017, 11:39 PM

In some contexts, we might want to compute the cost of a GEP which doesn't actually exist in IR at the moment. I guess there aren't any callers in the tree like that at the moment, but we use this sort of capability in other contexts, so we want to structure the code to allow it.

I have also been working on improving cost functions by passing the actual instruction pointer when available. With the same argument - that there might not actually be an instruction to pass in a speculative context - I did not replace the existing version, but merely adding the instruction as an optional argument. I am hoping this is the right thing to do. See https://reviews.llvm.org/D29631

Eli, Hal and Jonas, thank you for comments.

I see there are use cases I have not taken into account.

Let me summarize:

  • Requirements:

"get*Cost" functions should try to estimate the cost of an operation when lowered.

  • User stories:
  1. An user wants to estimate the cost of an operation to make a decision whether it's worth to create it. The operation does not exist in IR.
  2. An user has an operation in IR and wants to know the cost of the operation to estimate its contribution into execution.

Actually, the inliner itself could in theory take advantage of that: if you have a function which returns a GEP, the cost depends on how the caller uses the value. That might be overkill, though, given the tiny effect in most cases.

Eli, you wrote: "the cost depends on how the caller uses the value." A question is where the dependency should be taken into account: in get*Cost or in user's place?

Hal, you wrote:

I'll also point out that we're further walking down the path of costing instruction patterns

What do you mean "instruction patterns"?

Taking into account all of these I thin "get*Cost" functions should answer the question: What is the cost of an operation if I want to have it or I have it in DFG/CFG?

Maybe it's time to redesign API?

Jonas, an optional parameter duplicates the information passed through other parameters. It can provide all of the needed information. Also single API for all use cases might create some kind of misunderstanding.

include/llvm/Analysis/TargetTransformInfoImpl.h
571

I prefer to have controlled crashes instead of uncontrolled ones. It has saved debugging time quite often when a pointer was passed through a chain of calls and was dereferenced in somewhere in the middle of the chain. The assert triggered at the beginning of the chain so I didn't need to examine the whole call stack.
Here it's also for the purpose of a contract: GEP must not be null.

User stories:

An user wants to estimate the cost of an operation to make a decision whether it's worth to create it. The operation does not exist in IR.
An user has an operation in IR and wants to know the cost of the operation to estimate its contribution into execution.

I think there is also the case in a vectorizer, where there is an existing instruction but the cost query is for the same instruction *vectorized* with VF.

Jonas, an optional parameter duplicates the information passed through other parameters. It can provide all of the needed information. Also single API for all use cases might create some kind of misunderstanding.

No, not in the case in the vectorizer. Here the scalar instruction is passed, with vector types.

User stories:

An user wants to estimate the cost of an operation to make a decision whether it's worth to create it. The operation does not exist in IR.
An user has an operation in IR and wants to know the cost of the operation to estimate its contribution into execution.

I think there is also the case in a vectorizer, where there is an existing instruction but the cost query is for the same instruction *vectorized* with VF.

Jonas, an optional parameter duplicates the information passed through other parameters. It can provide all of the needed information. Also single API for all use cases might create some kind of misunderstanding.

No, not in the case in the vectorizer. Here the scalar instruction is passed, with vector types.

But that is that the vectorizer normally does. It takes the scalar instruction and emits the same instruction with vector types. You need to explain how any other case is different.

...

Hal, you wrote:

I'll also point out that we're further walking down the path of costing instruction patterns

What do you mean "instruction patterns"?

I mean taking into account folding decisions that will be made by the backend. For example, it might be the case that, for some data type, add(zext(x), zext(x)), will be given a cost of 3 because each zext costs 1 and the add costs 1. These costs are accurate in isolation, however, the target can actually lower this into a single instruction, so the overall cost should be 1 for all three operations. I think that the proposed approach, where we pass each instruction so we can look at the users, probably makes the most sense. We need to be careful, however, that we don't overcount.

Maybe it's time to redesign API?

This may be true. Also, I think that the VPlan work being done (D28975) may be highly relevant to how we do this (we might just want to let the backends cost VPlans directly in cases where the fine-grained modeling won't work).

jonpa added a comment.Mar 23 2017, 7:21 AM
User stories:

An user wants to estimate the cost of an operation to make a decision whether it's worth to create it. The operation does not exist in IR.
An user has an operation in IR and wants to know the cost of the operation to estimate its contribution into execution.

I think there is also the case in a vectorizer, where there is an existing instruction but the cost query is for the same instruction *vectorized* with VF.

Jonas, an optional parameter duplicates the information passed through other parameters. It can provide all of the needed information. Also single API for all use cases might create some kind of misunderstanding.

No, not in the case in the vectorizer. Here the scalar instruction is passed, with vector types.

But that is that the vectorizer normally does. It takes the scalar instruction and emits the same instruction with vector types. You need to explain how any other case is different.

I meant that it is not quite true that it would be enough to just pass the Instruction without types, as a response to the statement that it just duplicates information. Currently LoopVectorizer passes types as arguments to TTI. Passing the instruction to TTI does not provide the Types, it only gives clues of the original instruction plus the possibility to inspect users / operands. This is part of D29632 for SystemZ, which is still under review.

User stories:

An user wants to estimate the cost of an operation to make a decision whether it's worth to create it. The operation does not exist in IR.
An user has an operation in IR and wants to know the cost of the operation to estimate its contribution into execution.

I think there is also the case in a vectorizer, where there is an existing instruction but the cost query is for the same instruction *vectorized* with VF.

Jonas, an optional parameter duplicates the information passed through other parameters. It can provide all of the needed information. Also single API for all use cases might create some kind of misunderstanding.

No, not in the case in the vectorizer. Here the scalar instruction is passed, with vector types.

But that is that the vectorizer normally does. It takes the scalar instruction and emits the same instruction with vector types. You need to explain how any other case is different.

I meant that it is not quite true that it would be enough to just pass the Instruction without types, as a response to the statement that it just duplicates information. Currently LoopVectorizer passes types as arguments to TTI. Passing the instruction to TTI does not provide the Types, it only gives clues of the original instruction plus the possibility to inspect users / operands.

I agree. We can't only have the current instruction.

I found that there is TTI::getUserCost which provides needed functionality by design. So it can be used instead of these changes. I created https://reviews.llvm.org/D33685.