This is an archive of the discontinued LLVM Phabricator instance.

[TTI] Add hook for contextual cast estimates
AbandonedPublic

Authored by mssimpso on Dec 29 2015, 1:42 PM.

Details

Summary

This change adds a new hook for estimating the cost of cast instructions within
a given context. The hook differs from the existing context-free hook in that
clients can provide information about the cast's known operand or user.

The change is motivated by sign extensions, which on AArch64 can often be
folded into other instructions and are free. For example, a vector extract
followed by a sign extension can be performed by smov. And a vector subtraction
whose operands are both vector sign extensions can be performed by ssubl. And
so on.

While context is generally not a desirable feature to include in the cost
model, the fact that these extensions are often free may make them special
enough to warrant the added complexity.

The default implementation of the hook uses the context-free estimate. For
AArch64, I've added logic to cover the extract-extend pattern and the
lengthening and widening operations I mentioned above. The intent is to quickly
and simply identify the cases in which the cast is free, and to fall back on
the default implementation otherwise.

For an initial client, I've modified the cast estimates in the SLP vectorizer
to provide the additional context when known and thought to be useful.

Diff Detail

Event Timeline

mssimpso updated this revision to Diff 43759.Dec 29 2015, 1:42 PM
mssimpso retitled this revision from to [TTI] Add hook for contextual cast estimates.
mssimpso updated this object.
mssimpso added reviewers: jmolloy, sbaranga, hfinkel.
mssimpso added subscribers: mcrosier, llvm-commits.
jmolloy requested changes to this revision.Jan 6 2016, 6:37 AM
jmolloy edited edge metadata.
jmolloy added inline comments.
include/llvm/Analysis/TargetTransformInfo.h
475

I feel that this structure should contain EITHER an anonymous opcode (as it currently does) OR a Value*.

I think it is not worthwhile to have the user reencode their use-def graph for the purposes of passing as context. What we need to be able to say is "I'm going to insert $OPCODE and it's going to use these values". Some of the values may be yet to be inserted - so these could be anonymous too. But encoding the entire use-def graph to an arbitrary depth sounds like wasted effort. The code below for AArch64 seems to look quite deep at the graph too.

502

I'd prefer if an optional Ctx argument were added to existing methods.

lib/Target/AArch64/AArch64TargetTransformInfo.cpp
468

If you expect Dst, you must use cast<> above instead of dyn_cast_or_null. Similarly in all the other place you use dyn_cast_or_null, I don't think a nullptr argument is *actually* allowed.

This revision now requires changes to proceed.Jan 6 2016, 6:37 AM
mssimpso updated this revision to Diff 44277.Jan 7 2016, 3:43 PM
mssimpso edited edge metadata.

Addressed James's comments

Thanks again for the review James! I took a shot at updating the patch and also added some responses inline.

mssimpso marked 2 inline comments as done.Jan 7 2016, 3:46 PM
mssimpso added inline comments.
include/llvm/Analysis/TargetTransformInfo.h
475

I agree this seems overly complex. However, I'm having difficulty coming up with a better alternative.

For the scalar cost estimates (where we're getting an estimate for an instruction that already exists), I definitely see the advantage of just looking at the existing Values. We wouldn't have to re-encode the use-def information like you mention. But I think this gets trickier for the vector estimates. For example, considering the extract-extend case (where we sign extend the value we extract from the vector), we would need to create an unlinked instruction for the extract if I understand correctly. But what would it's operand be? We haven't decided to vectorize anything yet. And if we eventually chose not to vectorize, we would have to go back and delete the unlinked instructions we create. I'm definitely open to suggestions here, and any more thoughts you might have would be very helpful!

502

I'm fine doing this. I've added an optional Ctx argument to the existing getCastInstrCost hook like you suggested. The drawback of course is that we have to modify the other targets (and we might cause some minor annoyance for the out-of-tree target maintainers).

mssimpso updated this revision to Diff 44360.Jan 8 2016, 12:16 PM
mssimpso edited edge metadata.
mssimpso marked an inline comment as done.

Rebased.

hfinkel added inline comments.Feb 2 2016, 4:39 PM
include/llvm/Analysis/TargetTransformInfo.h
475

We definitely need this kind of functionality, but I share James's uncomfortableness with forcing the user to reencode some arbitrary portion of the use/def graph. Part of the problem is that the user of the interface has no idea how much of the graph would be useful to encode. Plus, the things you're reencoding into are essentially instructions that aren't instructions. While we generally have a policy against creating temporary instructions, getting around that by reinventing instructions seems unsatisfying.

What if we defined an abstract graph-traversal class that could be used by a TTI implementation to walk a (mutated) view of the existing SSA use/def graph.

struct InstrContextView {
  struct InstrContext {
     Type *Ty;
     unsigned Opcode;
     unsigned NumOperands;
  };

  // Call this to get the initial instruction context (the starting point of the view on the use/def chain)
  virtual InstrContext *getInstrContext() = 0;
  virtual InstrContext *getOperandInstrContext(const InstrContext *, unsigned OpNum) = 0;
  virtual unsigned getNumInstrContextUsers(const InstrContext *) = 0;
  virtual InstrContext *getUserInstrContext(const InstrContext *, unsigned UserNum) = 0;
};

Then the TTI interfaces take some derived class of InstrContextView, which holds the necessary state to provide whatever view is possible given the caller's knowledge of the potential IR. In this way, the creation of the InstrContext objects can be completely lazy.

The downside is that, because the traversal functions need to be virtual, walking large portions of the graph this was is not feasible. Luckily, I think we never want to do that.

Hal,

Thanks very much for the suggestions, and I apologize for the delayed response. I agree the initial approach is unsatisfying. There is definitely a balance to be struck here between providing context to the cost model and having to unnecessarily re-encode information. The question of how much context to provide is also problematic. Although not a high priority for me at the moment, I do plan to evaluate your idea soon. So thanks again for the feedback!

anemet added a subscriber: anemet.Mar 11 2016, 9:51 AM
mssimpso updated this revision to Diff 50469.Mar 11 2016, 1:05 PM

Rebasing for evaluation only. The patch still needs to be updated to address reviewer comments.

mssimpso abandoned this revision.Apr 26 2016, 8:11 AM

I'm abandoning this change in favor of D18523. But I hope we can eventually revisit the idea of contextualizing the cost model estimates if needed.