This is an archive of the discontinued LLVM Phabricator instance.

[TargetTransformInfo] Add a new public interface getInstructionCost
ClosedPublic

Authored by Carrot on Aug 25 2017, 4:33 PM.

Details

Summary

The current TargetTransformInfo supports a throughput cost model, but some optimizations also need instruction latency and code size cost models. Hal suggested a single public interface for querying the different costs of an instruction, so I propose the following interface:

enum TargetCostKind {
  TCK_Throughput, ///< The traditional cost model, it actually
                  ///< means reciprocal throughputs.
  TCK_Latency,    ///< The latency of instruction.
  TCK_CodeSize    ///< Instruction code size.
};

int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const;

Clients should mainly use this function to query the cost of an instruction; the parameter <kind> specifies the desired cost model.
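
For illustration, a client pass might use it roughly as follows (a minimal sketch; the helper name sumLatency and the surrounding setup are hypothetical, not part of the patch):

static int sumLatency(const BasicBlock &BB, const TargetTransformInfo &TTI) {
  int Total = 0;
  for (const Instruction &I : BB)
    // Query the latency model; TCK_Throughput or TCK_CodeSize could be
    // requested instead, depending on what the optimization cares about.
    Total += TTI.getInstructionCost(&I, TargetTransformInfo::TCK_Latency);
  return Total;
}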

This patch also provides simple default implementations of getInstructionSize and getInstructionLatency.

The default implementation of getInstructionSize simply returns 4, which is sufficient for many RISC processors, but targets with variable-length instruction encodings should provide their own implementations.
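
For concreteness, the default could be as trivial as this sketch (the exact signature in the patch may differ):

int getInstructionSize(const Instruction *I) {
  // Assume a fixed 4-byte encoding, which holds for many RISC ISAs.
  // Variable-length targets (e.g. x86) should override this.
  return 4;
}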

The default getInstructionLatency provides latency numbers for only a small number of instruction classes, and those numbers are only reasonable for modern out-of-order (OOO) processors. It can be extended in the following ways (a rough sketch of the per-class default follows the list):

  • Add more detail to this function.
  • Add a getXXXLatency function and call it from here.
  • Implement a target-specific getInstructionLatency function.
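
As mentioned above, a rough sketch of such a per-class default (the concrete classes and numbers in the patch may differ; these values are merely plausible for a modern OOO core):

int getInstructionLatency(const Instruction *I) {
  if (isa<LoadInst>(I))
    return 4;   // A typical L1 load-to-use latency.
  if (auto *CI = dyn_cast<CallInst>(I))
    if (!CI->getCalledFunction() || !CI->getCalledFunction()->isIntrinsic())
      return 10; // Treat real (non-intrinsic) calls as much slower.
  if (I->getType()->isVectorTy() || I->getType()->isFloatingPointTy())
    return 3;   // FP/vector ALU operations usually take a few cycles.
  return 1;     // Most simple integer operations.
}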

Diff Detail

Repository
rL LLVM

Event Timeline

Carrot created this revision. Aug 25 2017, 4:33 PM
hfinkel added inline comments. Aug 31 2017, 9:34 AM
include/llvm/Analysis/TargetTransformInfo.h
115 ↗(On Diff #112755)

throughputs -> throughput

127 ↗(On Diff #112755)

This is good, but this is the wrong function. getUserCost is what the inliner uses, and could be viewed as returning normalized code size (right now), not throughput.

If we want to have a single interface like this for throughput, which just takes an arbitrary instruction, then we can, but we need to pull the code out of CostModelAnalysis::getInstructionCost (in lib/Analysis/CostModel.cpp). I'd be highly supportive of doing this (currently that code lives in the analysis pass because that's how we test the cost-model interface, but it's not otherwise reusable, and that's not great).

Then we need to decide how to integrate everything. One option is to add this TargetCostKind kind parameter to the existing throughput cost-model interface. Another option is to only have this parameter on this generic function taking an Instruction* and leave everything else alone. Or we leave all of this alone and just add new interfaces for now providing latency and size (which we do by renaming the 'user cost' function).

Carrot updated this revision to Diff 113616. Sep 1 2017, 3:41 PM
Carrot marked 2 inline comments as done.
Carrot added inline comments.
include/llvm/Analysis/TargetTransformInfo.h
127 ↗(On Diff #112755)

Thank you for the good suggestion.

I moved the code from CostModelAnalysis::getInstructionCost to TargetTransformInfo::getInstructionThroughput.

I chose the following interface function:

int getInstructionCost(const Instruction *I, enum TargetCostKind kind)
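
The new public function then reduces to a dispatch over the cost kind, roughly like this sketch (the exact body in the diff may differ, and routing TCK_CodeSize through getUserCost is an assumption based on the discussion above):

int TargetTransformInfo::getInstructionCost(const Instruction *I,
                                            enum TargetCostKind kind) const {
  switch (kind) {
  case TCK_Throughput:
    return getInstructionThroughput(I);
  case TCK_Latency:
    return getInstructionLatency(I);
  case TCK_CodeSize:
    return getUserCost(I); // Treat the "user cost" as the size model for now.
  }
  llvm_unreachable("Unknown instruction cost kind");
}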

hfinkel edited edge metadata. Sep 2 2017, 11:51 AM

This is looking good. A few more comments.

You can add an implementation of getInstructionLatency to include/llvm/CodeGen/BasicTTIImpl.h, something like:

int getInstructionLatency(const Instruction *I) {
  if (isa<LoadInst>(I))
    return getST()->getSchedModel().DefaultLoadLatency;

  return BaseT::getInstructionLatency(I);
}

The default value of DefaultLoadLatency is also 4, but I'd like this to happen before we end up with any actual users of this interface, plus...

The other thing that needs to happen is that we need some way of testing this. Can you please add a command-line option to the CostModel analysis that allows us to switch which cost model is being queried (the default can still be the reciprocal-throughput model)? Then you can add a couple of simple tests for the latency model (and the code-size model) as well. This will give us an independent way of testing the model (and that's important).
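
For reference, such a switch could be a cl::opt in lib/Analysis/CostModel.cpp, roughly like the sketch below (the -cost-kind flag name and value spellings are illustrative assumptions; TCK_RecipThroughput is the renamed enumerator suggested in the inline comments below):

// Requires "llvm/Support/CommandLine.h".
static cl::opt<TargetTransformInfo::TargetCostKind> CostKind(
    "cost-kind", cl::desc("Target cost kind"),
    cl::init(TargetTransformInfo::TCK_RecipThroughput),
    cl::values(clEnumValN(TargetTransformInfo::TCK_RecipThroughput,
                          "throughput", "Reciprocal throughput"),
               clEnumValN(TargetTransformInfo::TCK_Latency,
                          "latency", "Instruction latency"),
               clEnumValN(TargetTransformInfo::TCK_CodeSize,
                          "code-size", "Code size")));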

include/llvm/Analysis/TargetTransformInfo.h
112 ↗(On Diff #113616)

Please replace with something like:

/// There are several different cost models that can be customized by the target. The normalization of each cost model may be target specific.

114 ↗(On Diff #113616)

I'd remove the "The traditional cost model, it actually means", because both this model and the "user cost" model could be considered traditional (arguably, the latter is actually older).

114 ↗(On Diff #113616)

I'd prefer this be named TCK_RecipThroughput, or maybe just TCK_RThroughput, to make it clear what it means.

123 ↗(On Diff #113616)
/// Clients should use this interface to query the cost of an existing instruction. The instruction must have a valid parent (basic block).
136 ↗(On Diff #113616)

llvm_unreachable("Unknown instruction cost kind");

882 ↗(On Diff #113616)

Move the note about expense to the comment on getInstructionCost. This is true for the two existing cost models, and may become true for the latency estimate as well.

Carrot updated this revision to Diff 114096. Sep 6 2017, 4:39 PM
Carrot marked 6 inline comments as done.
Carrot added a subscriber: echristo.

Add a simple test case to make sure the interface works correctly.

hfinkel accepted this revision. Sep 6 2017, 6:10 PM

LGTM

See below, however, because we might want to document that the "code size" model could mean number of uops. On Xeon x86, for example, that's what the unroller wants to measure (at least for small loops). We may at some point want to split this into a uops and an actual code size, but we don't have an in-tree use case for that right now.

I should mention, however, that there will be some issues with the "code size" model, because right now it's not strictly measuring code size. It's mostly set up to return values based on TCC_Free/TCC_Basic/TCC_Expensive. For now this is essentially fine, because TCC_Free is 0 and TCC_Basic is 1. TCC_Expensive, however, is 4, which does not make sense as a code-size value. Some auditing reveals:

  1. The default code in include/llvm/Analysis/TargetTransformInfoImpl.h has:
case Instruction::FDiv:
case Instruction::FRem:
case Instruction::SDiv:
case Instruction::SRem:
case Instruction::UDiv:
case Instruction::URem:
  return TTI::TCC_Expensive;

and that does not make sense for code size. Moreover, as an "expense value", it's too low for division nearly everywhere (the reciprocal throughput and latency for division are generally closer to 10-20, not 4). However, we can't decrease this value until we stop using these values in the loop unroller to decide on unrolling factors (on x86, we're unrolling in many cases to fill the LSD's uop cache, so we don't want to unroll too much there); unless, that is, we say that code size might be measured in number of uops, not just the size of the encoding. The other places we return TCC_Expensive are for function calls (or things we think might turn into function calls). For those, I suppose 4 is an okay code-size estimate, if we had to pick a number.

  2. There is code in lib/Analysis/InlineCost.cpp that does this:
// If the instruction is floating point, and the target says this operation
// is expensive or the function has the "use-soft-float" attribute, this may
// eventually become a library call. Treat the cost as such.
if (I->getType()->isFloatingPointTy()) {
  // If the function has the "use-soft-float" attribute, mark it as
  // expensive.
  if (TTI.getFPOpCost(I->getType()) == TargetTransformInfo::TCC_Expensive ||
      (F.getFnAttribute("use-soft-float").getValueAsString() == "true"))
    Cost += InlineConstants::CallPenalty;
}

The code that does the equality check against TCC_Expensive should probably use >= instead.
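
That is, something like this (a sketch of the suggested change, not part of this patch):

  if (TTI.getFPOpCost(I->getType()) >= TargetTransformInfo::TCC_Expensive ||
      (F.getFnAttribute("use-soft-float").getValueAsString() == "true"))
    Cost += InlineConstants::CallPenalty;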

And there are a number of other minor things (e.g., the speculation thresholds in SimplifyCFG) that we'll run into as we clean this up.

However, just adding this interface won't change the current behavior of anything, so I'm fine with moving ahead with this for now.

include/llvm/Analysis/TargetTransformInfo.h
881 ↗(On Diff #114096)

What does this return if the result is unknown? This should also be documented.

This revision is now accepted and ready to land. Sep 6 2017, 6:10 PM
Carrot updated this revision to Diff 114211. Sep 7 2017, 11:10 AM
Carrot marked an inline comment as done.

Thanks for the detailed explanation of the current "code size" model.
I will commit this patch.

This revision was automatically updated to reflect the committed changes.