This is an archive of the discontinued LLVM Phabricator instance.

Moving isComplex decision to TTI
ClosedPublic

Authored by magabari on Dec 7 2016, 6:05 AM.

Details

Summary

Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be "complex" and need some extra cost for address computation handling.

This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by proposing a new interface method in the TTI interface.

Specifically at X86 targets we dont see any significant address computation cost in case if the access is strided.

Diff Detail

Repository
rL LLVM

Event Timeline

magabari updated this revision to Diff 80582.Dec 7 2016, 6:05 AM
magabari retitled this revision from to Moving isComplex decision to TTI.
magabari updated this object.
magabari added a subscriber: llvm-commits.
mkuper edited edge metadata.Dec 7 2016, 1:32 PM

I have some comments on the patch, but on a higher level, I'm not sure this is the right way to do this.
The only place that'll call this new TTI hook will be isLikelyComplexAddressComputation(), and the only thing we do with the result of isLikelyComplexAddressComputation() is feed it into a different TTI hook - TTI.getAddressComputationCost().

Perhaps we can enhance getAddressComputationCost() instead of adding another hook?

include/llvm/Analysis/TargetTransformInfo.h
210 ↗(On Diff #80582)

Extra space.

211 ↗(On Diff #80582)

Maybe "Expensive" makes more sense than "Complex" here?

lib/Target/X86/X86TargetTransformInfo.cpp
1480 ↗(On Diff #80582)

What do you mean by "we don't see any significant address computation cost"? Can you give an example?

lib/Transforms/Vectorize/LoopVectorize.cpp
6618 ↗(On Diff #80582)

NULL -> nullptr
Also, this looks a bit dirty. Maybe use Optional<>?

zvi added a subscriber: zvi.Dec 8 2016, 12:17 PM
Ayal edited edge metadata.Dec 9 2016, 1:30 PM

I have some comments on the patch, but on a higher level, I'm not sure this is the right way to do this.
The only place that'll call this new TTI hook will be isLikelyComplexAddressComputation(), and the only thing we do with the result of isLikelyComplexAddressComputation() is feed it into a different TTI hook - TTI.getAddressComputationCost().

Perhaps we can enhance getAddressComputationCost() instead of adding another hook?

Yes we can, by passing it the (possibly Optional<>) APStrideVal instead of the current boolean "IsComplexComputation", and folding the "how expensive are certain stride values" logic from isLikelyComplexAddressComputation() down to getAddressComputationCost(). In any case this logic should best be target dependent, right?

mkuper added a comment.Dec 9 2016, 1:38 PM
In D27518#618646, @Ayal wrote:

I have some comments on the patch, but on a higher level, I'm not sure this is the right way to do this.
The only place that'll call this new TTI hook will be isLikelyComplexAddressComputation(), and the only thing we do with the result of isLikelyComplexAddressComputation() is feed it into a different TTI hook - TTI.getAddressComputationCost().

Perhaps we can enhance getAddressComputationCost() instead of adding another hook?

Yes we can, by passing it the (possibly Optional<>) APStrideVal instead of the current boolean "IsComplexComputation", and folding the "how expensive are certain stride values" logic from isLikelyComplexAddressComputation() down to getAddressComputationCost().

Yes, I think that would be a better design. (Disagreement is welcome. :-) )

In any case this logic should best be target dependent, right?

I think so - I'm not familiar with addressing modes of other targets, but I think it makes a lot of sense for the logic to be target dependent.

magabari updated this revision to Diff 81212.Dec 13 2016, 4:26 AM
magabari edited edge metadata.

updating patch following to Michael comments.

magabari updated this revision to Diff 81214.EditedDec 13 2016, 5:18 AM

clarified comment of X86TTI::isExpensiveAddressComputation

delena added inline comments.Dec 13 2016, 9:11 AM
include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

Why do you need a struct? "Value *" will contain everything.

Value *Stride;

if (Stride == nullptr) - no stride,
dyn_cast<ConstantInt>(Stride) - answers the question isConstant()

mkuper added inline comments.Dec 13 2016, 11:33 AM
include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

IIUC, we don't necessarily have a Value* in hand, this originates from a SCEV, right?

lib/Target/X86/X86TargetTransformInfo.cpp
1482 ↗(On Diff #81214)

nor -> not

lib/Transforms/Vectorize/LoopVectorize.cpp
6782 ↗(On Diff #81214)

Extra spaces after the line.
(On the line below too.)

6782 ↗(On Diff #81214)

You don't seem to initialize the fields.

7046 ↗(On Diff #81214)

Please elaborate about what you actually get, rather than just "some info"
(this is really "Figure out whether the access is strided and what the stride is..." or something like that).

delena added inline comments.Dec 13 2016, 12:13 PM
include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

yes. We can use const SCEV*.

mkuper added inline comments.Dec 13 2016, 12:39 PM
include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

I'm not sure passing a SCEV* to TTI makes sense, in terms of layering.

hfinkel added inline comments.
include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

I don't see why that should be a problem. The backends already depend on SCEV (and the rest of analysis) because of IR-level passes. TTI should definitely use SCEV where appropriate.

mkuper added inline comments.Dec 13 2016, 12:53 PM
include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

Never mind me, then. SCEV it is.

magabari marked 3 inline comments as done.Dec 15 2016, 1:27 AM

fixed some comments.
In my opinion we should still keep TTI as a lightweight interface which is not depended on SCEV.
So i think that passing struct which can hold all the properties still can be better approach. Even for future enhancements for that function.
please tell me what do you think.

include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

passing only Value* has some cons and doesn't solve the general problem.

  1. it depends on SCEV
  2. currently we differentiate between strided and not strided access, but i think that there is more than those 2 cases so i thought creating some struct will be better approach
lib/Transforms/Vectorize/LoopVectorize.cpp
6782 ↗(On Diff #81214)

add initialize for isConstant field too. last field is undefined in case it's not constant stride

7046 ↗(On Diff #81214)

fixed

magabari updated this revision to Diff 81550.Dec 15 2016, 2:00 AM

fixed some comments.
In my opinion we should still keep TTI as a lightweight interface which is not depended on SCEV.
So i think that passing struct which can hold all the properties still can be better approach. Even for future enhancements for that function.
please tell me what do you think.

If all of the information that the backend might reasonably need can be summarized in a structure, then I agree with you that using a structure is preferred. TTI is already not necessarily a "lightweight" interface (to compute unrolling preferences, for example, we just hand the backend the loop itself because there's no reasonable way that we can summarize what the backend might need).

mkuper added inline comments.Dec 15 2016, 9:39 AM
include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

I agree a Value* is not appropriate.
I can live with either a struct or SCEV.

include/llvm/CodeGen/BasicTTIImpl.h
921 ↗(On Diff #81550)

Assuming this stays a struct - could you change all the "Info" variables and arguments to "AddressInfo" or something else that's meaningful?

lib/Target/X86/X86TargetTransformInfo.cpp
1483 ↗(On Diff #81550)

Maybe it would be better to have the options "0, 1, 10" instead of "0, 10, where the 0 case can hide an add that we ignore"?
I'm saying "maybe" because, as imprecise as this already is, fixing that may just be noise. On the other hand, I'm loathe to leave it as a TODO, because I think that'd require another API change.

lib/Transforms/Vectorize/LoopVectorize.cpp
6782 ↗(On Diff #81214)

Please don't leave undefined fields.

fixed issues pointed by Michael

include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

SCEV can't replace the struct, since you still need more info.
nullptr -> means default behavior
struct->isStrided = false -> means after check it is not strided at all
struct->isStrided = true -> you can look on the stride info or SCEV
but SCEV can be part of the struct if needed.
So in case there is no objections I prefer to go with struct which will contain bool and SCEV.

include/llvm/CodeGen/BasicTTIImpl.h
921 ↗(On Diff #81550)

fixed

lib/Target/X86/X86TargetTransformInfo.cpp
1483 ↗(On Diff #81550)

fixed

lib/Transforms/Vectorize/LoopVectorize.cpp
6782 ↗(On Diff #81214)

fixed

magabari updated this revision to Diff 81921.Dec 19 2016, 12:04 AM
delena added inline comments.Dec 19 2016, 12:20 AM
include/llvm/Analysis/TargetTransformInfo.h
609 ↗(On Diff #81921)

Please add a cosntructor here.

613 ↗(On Diff #81921)

I suggest to shorten these 2 functions:
bool isConstantStridedAccess() const {

   return (!Strided || !dyn_cast<SCEVConstant>(Step));
}

bool isConstantStridedAccessLessThan(int64_t MergeDistance) const {

  if (!isConstantStridedAccess())
    return false;
  APInt StrideVal = cast<SCEVConstant>(C)->getAPInt();
  if (StrideVal.getBitWidth() > 64)
    return false;
  return StrideVal.getSExtValue() < MergeDistance;
}
lib/Transforms/Vectorize/LoopVectorize.cpp
6788 ↗(On Diff #81921)

These 2 lines may be removed once you have a constructor.

magabari updated this revision to Diff 81931.Dec 19 2016, 1:57 AM

fix Elena comments

mkuper added inline comments.Dec 19 2016, 10:05 AM
include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

But aren't you figuring isStrided out based on the SCEV?
That is, if you pass a SCEV* S, it can be:
nullptr -> default
!isa<SCEVAddRecExpr>(S) -> not strided
isa<SCEVAddRecExpr>(S) -> strided?

delena added inline comments.Dec 20 2016, 2:06 AM
include/llvm/Analysis/TargetTransformInfo.h
606 ↗(On Diff #81214)

There are 4 cases that we want to handle

  1. Default, when pointer to AddressInfo is null. This case is equal to the old variant when IsComplex was false;
  2. The access is complex and we don't have info about the stride. In this case we put isStrided = false and do not check SCEV.
  3. The access is strided (isStrided = true, SCEV is not null): (3.1) Stride is constant (SCEV is const int) (3.2) Stride is loop invariant unknown value (SCEV is not const int)

I agree that we can drop isStrided and say the following:

  1. The access is complex and we don't have info about the stride. AddressInfo is not null, but SCEV inside is null
  2. The access is strided - SCEV is not null

Is that what you mean? I just think that the situation when "AddressInfo is not null, but SCEV inside is null" looks strange from design point of view.

magabari updated this revision to Diff 82106.Dec 20 2016, 6:54 AM

re-adding the test. some how it got missed in the last version

What I'm saying is that you have this code in getAddressAccessComplexity():

const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(SE->getSCEV(Ptr));
if (!AddRec)
  return AddressInfo;

AddressInfo.isStrided = true;

// Get Step Recurrence.
AddressInfo.Step = AddRec->getStepRecurrence(*SE);
return AddressInfo;

So, basically, the logic right now is:

  1. If we don't have a SCEV for the pointer, or have a SCEV which isn't an AddRec, return a struct with isStrided == false.
  2. If we have a SCEV, and it's an AddRec, return a struct with isStrided == true, and a Step.

What I'm saying is that you could replace this with "If we don't have a SCEVAddRecExpr for the pointer return null, otherwise return the SCEV", and pass the result of that into getAddressComputationCost(). Then, if getAddressComputationCost() doesn't get a SCEV, it knows the access isn't strided. If it does get a SCEV, it knows it's strided, and can call getStepRecurrence to get the Step.

Am I missing something?

So you are suggesting passing the SCEV of the ptr instead of the Step, right?

I am just trying to think if there is more interesting access patterns rather than strided access to care about,
which is already covered by passing the step and isStrided boolean.
Also in this case I need to pass the ScalarEvolution ptr in order to get the Step.

So you are suggesting passing the SCEV of the ptr instead of the Step, right?

Yes.

I am just trying to think if there is more interesting access patterns rather than strided access to care about,
which is already covered by passing the step and isStrided boolean.

Sure, but you're basically inventing a new struct to pass information that's already encapsulated in an existing one (the SCEV). This may be justified, I'm just trying to understand what's the advantage of not passing the ptr SCEV. It looks like the SCEV would be more flexible, and using it avoids introducing another way to describe the same information. I originally thought it may be problematic from the layering perspective, but Hal made me realize that's nonsense, so... :-)

magabari updated this revision to Diff 82590.Dec 28 2016, 5:52 AM

following Michael comments.
removed AddressAccessInfo and passed ScalarEvolution and Ptr SCEV directly.

mkuper added inline comments.Dec 28 2016, 11:46 AM
include/llvm/Analysis/TargetTransformInfo.h
26 ↗(On Diff #82590)

Why do you need the expander?

include/llvm/Analysis/TargetTransformInfoImpl.h
430 ↗(On Diff #82590)

This dyn_cast<> should be isa<>.

432 ↗(On Diff #82590)

Missing newline between functions.

435 ↗(On Diff #82590)

This dyn_cast<> can be a cast<> since you already know it's a SCEVAddRecExpr.
In general, you should never use dyn_cast without checking if the result is null. If you already know the cast is going to succeed, use cast<>. If not, use dyn_cast<> and check the result.

437 ↗(On Diff #82590)

Again, isa<>.

439 ↗(On Diff #82590)

Another missing newline.

443 ↗(On Diff #82590)

Again, cast.

444 ↗(On Diff #82590)

Why not replace isConstantStridedAccess() with something that returns the step if it succeeds?

lib/Target/AArch64/AArch64TargetTransformInfo.cpp
429 ↗(On Diff #82590)

clang-format?

lib/Target/ARM/ARMTargetTransformInfo.cpp
350 ↗(On Diff #82590)

clang-format?

magabari marked 10 inline comments as done.EditedDec 29 2016, 12:16 AM

fixed all comments

magabari updated this revision to Diff 82660.Dec 29 2016, 4:17 AM

fixed Michael comments.

magabari updated this revision to Diff 82665.Dec 29 2016, 4:45 AM

adding back the test (dropped it by mistake in the prev. patch)

mkuper added inline comments.Dec 29 2016, 10:12 AM
include/llvm/Analysis/TargetTransformInfo.h
26 ↗(On Diff #82590)

Sorry, I missed this on the previous pass - but why do you even need this include here at all?
Can you forward-declare what you need (SCEV, ScalarEvolution?) and actually include this where you need it (I guess TargetTransformInfoImpl.h and the relevant cpps)?

include/llvm/Analysis/TargetTransformInfoImpl.h
446 ↗(On Diff #82665)

You could just return a SCEVConstant* from getConstantStrideStep()
E.g. something like:

const SCEVConstant *getConstantStrideStep(ScalarEvolution *SE, const SCEV *Ptr) {
  if (!isStridedAccess(Ptr))
    return nullptr;

  const SCEVAddRecExpr *AddRec = cast<SCEVAddRecExpr>(Ptr);
  return dyn_cast<SCEVConstant>(AddRec->getStepRecurrence(*SE));
}

Then this code could be something like:

bool isConstantStridedAccessLessThan(ScalarEvolution *SE, const SCEV *Ptr,
                                     int64_t MergeDistance) {
  const SCEVConstant *Step = getConstantStrideStep(SE, Ptr);
  APInt StrideVal = Step->getAPInt();
  if (StrideVal.getBitWidth() > 64)
    return false;
  return StrideVal.getSExtValue() < MergeDistance;
}
magabari updated this revision to Diff 82797.Jan 2 2017, 12:15 AM

fixed Michael comments.

LGTM, except that I'm not sure I understand what's going on with negative strides in isConstantStridedAccessLessThan().

include/llvm/Analysis/TargetTransformInfoImpl.h
450 ↗(On Diff #82797)

Can we get here with negative stride values?

If we can't, then I'm not sure why you get the sext value - and why MergeDistance is signed.
If we can, then you probably don't want to return true for any negative value.

(I realize this was the way in the original, but as long as you're touching the code...)

LGTM, except that I'm not sure I understand what's going on with negative strides in isConstantStridedAccessLessThan().

I think it's a bug in the original version which i tried not to change. I will fix it in a separate patch.

mkuper accepted this revision.Jan 4 2017, 1:00 PM
mkuper edited edge metadata.

Ok, please add a FIXME in this patch for the negative stride thing, otherwise LGTM.

This revision is now accepted and ready to land.Jan 4 2017, 1:00 PM
This revision was automatically updated to reflect the committed changes.