This is an archive of the discontinued LLVM Phabricator instance.

Differential D79162

[Analysis] TTI: Add CastContextHint for getCastInstrCost
Closed, Public

Authored by dmgreen on Apr 30 2020, 2:38 AM.

Details

Reviewers

samparker
SjoerdMeijer
RKSimon
spatel
craig.topper
fhahn
Ayal
kbarton
efriedma
Pierre-vh

Commits

rG60280e9818a6: [Analysis] TTI: Add CastContextHint for getCastInstrCost

Summary

Currently, getCastInstrCost has limited information about the cast it's rating, often just the opcode and types.
Sometimes there is an instruction as well, but it isn't trustworthy: for instance, when the vectorizer is rating a plan, it calls getCastInstrCost with the old instructions when, in fact, it's trying to evaluate the cost of the instruction post-vectorization.
Thus, the current system isn't good enough as the cost of a cast can vary greatly based on the context in which it's used.

For example, if the vectorizer queries getCastInstrCost to evaluate the cost of a sext (load) with tail predication enabled, getCastInstrCost will usually report it as free, but that is not always true. On ARM MVE, if the destination type doesn't fit in a 128-bit register, those kinds of casts can be very expensive (2 or more instructions per vector element: each element is converted individually!). (Note: the fix for this is the child revision.)

To fix that, this patch adds a new parameter to getCastInstrCost to give it a hint about the context of the cast. It adds a CastContextHint enum which contains the type of the load/store being created by the vectorizer - one for each of the types it can produce.
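As a rough illustration of the idea in this summary, the sketch below shows how a context hint could change the cost returned for the same extend. It is a standalone toy, not LLVM code: `HintSketch`, `toyExtendCost`, the 128-bit register width and the cost numbers are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the hint described above. The values mirror
// load/store kinds mentioned in this review, not the exact LLVM enum.
enum class HintSketch : std::uint8_t {
  None,   // The cast is not paired with a load/store.
  Normal, // Plain (unmasked) vector load/store.
  Masked  // Masked load/store, e.g. under tail predication.
};

// Toy cost of a sign-extend of a loaded vector with NumElts lanes whose
// destination elements are DstBits wide. Assumes 128-bit vector registers.
// All numbers are illustrative only.
int toyExtendCost(HintSketch CCH, unsigned NumElts, unsigned DstBits) {
  assert(NumElts > 0 && DstBits > 0 && "expected a non-empty vector type");
  const bool FitsInRegister = NumElts * DstBits <= 128;
  switch (CCH) {
  case HintSketch::Normal:
    // Assumed to fold into an extending load.
    return 0;
  case HintSketch::Masked:
    // Free only while the result still fits in one register; otherwise
    // assume two instructions per element, as the summary describes.
    return FitsInRegister ? 0 : 2 * static_cast<int>(NumElts);
  case HintSketch::None:
    break;
  }
  // No load to fold into: charge one instruction for the extend itself.
  return 1;
}
```

With these made-up numbers, a masked extend to 8 x i32 is charged 16, while the same extend to 4 x i32 stays free. Without the hint, the callee could not tell the two situations apart from the opcode and types alone.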

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Pierre-vh created this revision. Apr 30 2020, 2:38 AM

Pierre-vh created this object with visibility "No One".

Herald added a project: Restricted Project. Apr 30 2020, 2:38 AM

Pierre-vh mentioned this in D79163: [Target][ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores. Apr 30 2020, 2:52 AM

Pierre-vh added a child revision: D79163: [Target][ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores.

Pierre-vh added reviewers: samparker, dmgreen, SjoerdMeijer.

Pierre-vh changed the visibility from "No One" to "Public (No Login Required)".

Herald added subscribers: llvm-commits, JDevlieghere, kbarton and 3 others. Apr 30 2020, 2:53 AM

Pierre-vh edited the summary of this revision. Apr 30 2020, 3:32 AM

Pierre-vh added reviewers: RKSimon, spatel, craig.topper.

Herald added a subscriber: wuzish. Apr 30 2020, 3:32 AM

This feels like a hack to me too, I think we need to move away from passing snippets of information to the instruction cost hooks. There are many other places in the vectorizer where instruction costs are calculated too. What information do we have at this point and what do we need to know? I like the sound of TTI taking something like the LoopVectorizationLegality object, but doing it at a higher level than on a per-instruction basis, allowing TTI to look at the loop.

In D79162#2012372, @samparker wrote:

This feels like a hack to me too, I think we need to move away from passing snippets of information to the instruction cost hooks. There are many other places in the vectorizer where instruction costs are calculated too. What information do we have at this point and what do we need to know? I like the sound of TTI taking something like the LoopVectorizationLegality object, but doing it at a higher level than on a per-instruction basis, allowing TTI to look at the loop.

What information do we have at this point and what do we need to know?

For D79163, we'd need to know whether a LoadInstr or a StoreInstr will become a masked load/store or not (and LoopVectorizationLegality can answer that question).

I like the sound of TTI taking something like the LoopVectorizationLegality object, but doing it at a higher level than on a per-instruction basis, allowing TTI to look at the loop.

What do you mean by "doing it at a higher level"?
LoopVectorizationLegality already contains the Loop and LoopInfo objects. We'd just need to add more (const) getters to the class so TTI can access them (because they're private right now).

What do you mean by "doing it at a higher level"?

I mean that trying to attach extra bits of information to individual instruction costs hooks doesn't scale. Also, if we want to add in-depth information, such as the legalization object, then we'd probably have to create a whole new API for the vectorizer to use too. Instead, we could enable TTI to calculate the cost of the loop, not just each instruction. This would give us the freedom to evaluate all the memory operations, evaluating their extends/truncs together, and enable us to make a good decision.

samparker added a reviewer: fhahn. Apr 30 2020, 5:32 AM

samparker added reviewers: Ayal, kbarton.

In D79162#2012504, @samparker wrote:

What do you mean by "doing it at a higher level"?

I mean that trying to attach extra bits of information to individual instruction costs hooks doesn't scale. Also, if we want to add in-depth information, such as the legalization object, then we'd probably have to create a whole new API for the vectorizer to use too. Instead, we could enable TTI to calculate the cost of the loop, not just each instruction. This would give us the freedom to evaluate all the memory operations, evaluating their extends/truncs together, and enable us to make a good decision.

This sounds like a much better solution than what I proposed, but wouldn't that "getLoopCost" method also need to access the legalization object? How would it know which load/stores will be masked? Would it guess it by looking at the loop as a whole?

Pierre-vh edited the summary of this revision. Apr 30 2020, 6:21 AM

but wouldn't that "getLoopCost" method also need to access the legalization object?

Indeed. But if we have a getLoopVectorizationCost API, then it would seem reasonable enough to pass that object there.

I don't think I agree that this is a hack, exactly. At least if it was cleaned up. It follows the same method as getArithmeticInstrCost where the type of the parameter is passed through, allowing us to get the information we need but not forcing us to pin this to a potentially incorrect or non-existent instruction. This separation seems like a good thing if we can do it well.

We (ARM/MVE) need to do the same thing for the cost of gather and interleaved loads. Whether the sext/zext is free there is equally variable. The way that I would have imagined this is an enum that can be one of the types of loads that the vectorizer produces (Normal, Masked, Interleave, Gather, Expanded?). There probably needs to be an option for None or Unknown too. I understand that you tried this before but ran into trouble? Can you speak to what kinds of problems you ran into doing things that way?

In D79162#2012659, @samparker wrote:

Indeed. But if we have a getLoopVectorizationCost API, then it would seem reasonable enough to pass that object there.

Although this might well be something that we do need in the long run, it will be very tied into how vplan ends up doing cost modelling, should probably not be limited to loop vectorization, and is probably a much bigger task to design and implement than this. Not something that an intern with less than a month left should be asked to do. Being able to cost blocks of code at a time feels like it's quite important to MVE, but is not something we should rush into here.

The other option if all this is unworkable is to put the cost into getMaskedMemOpCost. Or just bypass the cost modelling and force the vectorizer to not consider wide vectors when tail predicating, which I think is something that you've suggested before. If we do try to cost it properly, where we choose to put the high cost is up to us in a way. It's the masked load we are choosing not to split into something where the extend would end up as free, even if it is the extend which is expanded. The actual cost, when you get down to it, comes from choosing to not split the masked load and not being able to sensibly split VCTPs, leaving the predicate shuffle between the vctp and the load as high cost.

We may run into the same kinds of problems in getMaskedMemOpCost, but if we pass an 'I' through we can try and tell if it needs to be extended there.

In D79162#2013748, @dmgreen wrote:

We (ARM/MVE) need to do the same thing for the cost of gather and interleaved loads. Whether the sext/zext is free there is equally variable. The way that I would have imagined this is an enum that can be one of the types of loads that the vectorizer produces (Normal, Masked, Interleave, Gather, Expanded?). There probably needs to be an option for None or Unknown too. I understand that you tried this before but ran into trouble? Can you speak to what kinds of problems you ran into doing things that way?

Yes, I originally tried an enum with one entry per "kind" of load/store, as you said. It was working fine, but I felt that it was a bit confusing, as each entry had a different meaning based on the opcode. For instance, I had the MaskedVector entry, which, for most casts, meant that the operand was a masked load, but for truncs meant that the single user of the cast is a masked store. What if another target needs to deal with truncs of masked loads or something like that? They wouldn't be able to use that API, they'd have to hack it. (I don't know if this will ever come up, I'm just trying to think about all of the use cases for this enum.)
With the current format of the enum, it's clearer IMHO, there is no ambiguity, and any target can add their specific case in there and deal with it in the backend.

Of course, both versions work equally well, I'm fine with both.

In D79162#2014497, @Pierre-vh wrote:

Yes, I originally tried an enum with one entry per "kind" of load/store, as you said. It was working fine, but I felt that it was a bit confusing, as each entry had a different meaning based on the opcode. For instance, I had the MaskedVector entry, which, for most casts, meant that the operand was a masked load, but for truncs meant that the single user of the cast is a masked store. What if another target needs to deal with truncs of masked loads or something like that? They wouldn't be able to use that API, they'd have to hack it. (I don't know if this will ever come up, I'm just trying to think about all of the use cases for this enum.)
With the current format of the enum, it's clearer IMHO, there is no ambiguity, and any target can add their specific case in there and deal with it in the backend.

Of course, both versions work equally well, I'm fine with both.

I feel like a truncating load is not a very common thing. We should try and get the common case working first. Although I see your point about the other types of casts, it might be enough at the moment to only look at sext, zext and trunc. Trunc is a little different because we are looking at the operands, might not be the prettiest, but otherwise I think should be OK.

I think that we in MVE need a way to distinguish all the types of loads/stores that the vectorizer produces, even if the context instruction is incorrect and cannot be trusted. But it may be better not to think of this from an individual backend perspective exactly. We are trying to pass the information that the midend knows through to the costmodel. In that way it makes sense to me to add a parameter that has values {None, Normal, Masked, Interleaved and Gather}, maybe reversed too as the vectorizer can produce it. Get them to be passed through to the correct places, or calculated from the context if nothing else is known. That sounds like it should be able to be done cleanly and simply enough.

I believe that extending loads and truncating stores are at least the most common ones we need to worry about (if not the only ones). Can you give that a go, see how it looks?

In D79162#2017965, @dmgreen wrote:

I feel like a truncating load is not a very common thing. We should try and get the common case working first. Although I see your point about the other types of casts, it might be enough at the moment to only look at sext, zext and trunc. Trunc is a little different because we are looking at the operands, might not be the prettiest, but otherwise I think should be OK.

I think that we in MVE need a way to distinguish all the types of loads/stores that the vectorizer produces, even if the context instruction is incorrect and cannot be trusted. But it may be better not to think of this from an individual backend perspective exactly. We are trying to pass the information that the midend knows through to the costmodel. In that way it makes sense to me to add a parameter that has values {None, Normal, Masked, Interleaved and Gather}, maybe reversed too as the vectorizer can produce it. Get them to be passed through to the correct places, or calculated from the context if nothing else is known. That sounds like it should be able to be done cleanly and simply enough.

I believe that extending loads and truncating stores are at least the most common ones we need to worry about (if not the only ones). Can you give that a go, see how it looks?

If I understand correctly, I should:

Use {None, Normal, Masked, Interleaved and Gather} instead of the current enum values. (No Scatter? e.g. for trunc to scatter store? Do I need the "reversed" one as well?)
Only set CastContextHint for zext, sext and trunc.

Is that correct? If yes, I can give it a go and we'll see how it looks.

In D79162#2020222, @Pierre-vh wrote:

If I understand correctly, I should:

Use {None, Normal, Masked, Interleaved and Gather} instead of the current enum values. (No Scatter? e.g. for trunc to scatter store? Do I need the "reversed" one as well?)

Only set CastContextHint for zext, sext and trunc.

Is that correct? If yes, I can give it a go and we'll see how it looks.

I think the vectorizer calls it GatherScatter. That sounds like a good thing to mirror.

Reverse ends up as extend(reverseshuffle(load)), and it doesn't look like the extend will usually get pushed through the shuffle. You could either make it a None or add an extra Reverse option for completeness. Either way, it should probably not be free by default.

Moved CastContextHint to TargetTransformInfo.
Moved the logic that calculates the CastContextHint from an Instruction* out of getCastInstrCost (in TargetTransformInfo.cpp) and into a static function in TargetTransformInfo named getCastContextHint.
CastContextHint is no longer an optional parameter. Callers have to choose between using None for the context, using a custom context, or calling TTI::getCastContextHint.
Removed my change in BasicTTIImpl.h. (Should I restore it? It seemed to have no effect on tests.)

This should be much better than the previous version, and more in line with other similar enums, such as those used by getArithmeticInstrCost.

Note that this patch doesn't support Interleave and Reversed in getCastContextHint - I personally don't know how to do that, so feedback is welcome.
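To make the update above concrete, here is a minimal sketch of how a hint might be derived from a cast instruction: for sext/zext it looks at the operand (a load), for trunc at the single user (a store). The toy IR types (`InstSketch`, `Opc`), the reduced hint set, and the function name `castContextHintSketch` are hypothetical and are not the code from this patch.

```cpp
#include <cassert>
#include <vector>

// Toy opcodes and hint values; hypothetical, not LLVM's definitions.
enum class Opc { Load, MaskedLoad, Store, MaskedStore, SExt, ZExt, Trunc, Add };
enum class HintSketch { None, Normal, Masked };

// Minimal stand-in for an IR instruction: one operand and a list of users.
struct InstSketch {
  Opc Op;
  const InstSketch *Operand;
  std::vector<const InstSketch *> Users;
  explicit InstSketch(Opc O, const InstSketch *Src = nullptr)
      : Op(O), Operand(Src) {}
};

// Map a memory operation to a hint; anything else yields None.
static HintSketch hintForMemOp(const InstSketch *MemOp, Opc PlainOp,
                               Opc MaskedOp) {
  if (!MemOp)
    return HintSketch::None;
  if (MemOp->Op == PlainOp)
    return HintSketch::Normal;
  if (MemOp->Op == MaskedOp)
    return HintSketch::Masked;
  return HintSketch::None;
}

// Derive a hint from the cast itself when the caller has nothing better.
HintSketch castContextHintSketch(const InstSketch *I) {
  if (!I)
    return HintSketch::None;
  switch (I->Op) {
  case Opc::SExt:
  case Opc::ZExt:
    // Extends: the context is the operand, which must be a load.
    return hintForMemOp(I->Operand, Opc::Load, Opc::MaskedLoad);
  case Opc::Trunc:
    // Truncs: the context is the single user, which must be a store.
    if (I->Users.size() == 1)
      return hintForMemOp(I->Users.front(), Opc::Store, Opc::MaskedStore);
    return HintSketch::None;
  default:
    return HintSketch::None;
  }
}
```

A caller that knows more than the IR shows (for example, a vectorizer that is about to turn a plain load into a masked one) would skip this helper and pass its own hint instead.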

Herald added subscribers: dantrushin, kerbowa, nhaehnle and 2 others. May 11 2020, 3:20 AM

Yeah, I think this is looking better. Thanks for the update. But I would be interested in the opinion of others too. It seems to get this information through to the costmodel and be more accurate than just presuming 'I' will be correct.

In D79162#2029088, @Pierre-vh wrote:

Note that this patch doesn't support Interleave and Reversed in getCastContextHint - I personally don't know how to do that, so feedback is welcome.

I think that's fine for the moment. They are implemented as shufflevectors, so I don't think the extend and the load will usually combine. Essentially treating them as None should be OK, as far as I understand. And we can adjust that in the future if we need to.

llvm/include/llvm/Analysis/TargetTransformInfo.h
1041	Considering how this is used, it looks like None would mean "This is not a load/store", not "No context". I guess if we don't have I and we are not provided with anything better, we don't have any context. But in all other situations we at least try to calculate it from I.
llvm/include/llvm/CodeGen/BasicTTIImpl.h
765	I feel like this should now be CCH == TTI::CastContextHint::Normal. The other types of loads won't apply for the logic below.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6481	If the VF == 1 and the operand is a load/store, this should likely use Normal.
6482	You either don't need unsigned Opcode = I->getOpcode(), or the variable can be used more throughout this function?
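The inline comments above, together with the earlier remarks about GatherScatter and Reverse, suggest a mapping from the vectorizer's decision for a memory operation to a hint. The following is only a sketch of that mapping under assumed names: `WideningSketch` and `hintForWidenedMemOp` are hypothetical and are not the vectorizer's real types or functions.

```cpp
#include <cassert>

// Hypothetical hint values, following the kinds discussed in this thread.
enum class HintSketch { None, Normal, Masked, GatherScatter, Interleave, Reversed };

// Hypothetical summary of how the vectorizer plans to widen a load/store.
enum class WideningSketch { Widen, WidenReverse, Interleave, GatherScatter };

// Choose the hint for a cast whose load/store will be widened by factor VF.
HintSketch hintForWidenedMemOp(unsigned VF, WideningSketch Decision,
                               bool MaskRequired) {
  assert(VF >= 1 && "vectorization factor must be at least 1");
  // Scalar case: the memory operation stays a normal load/store.
  if (VF == 1)
    return HintSketch::Normal;
  switch (Decision) {
  case WideningSketch::Widen:
    return MaskRequired ? HintSketch::Masked : HintSketch::Normal;
  case WideningSketch::WidenReverse:
    // Kept distinct so a target does not treat it as free by default.
    return HintSketch::Reversed;
  case WideningSketch::Interleave:
    return HintSketch::Interleave;
  case WideningSketch::GatherScatter:
    return HintSketch::GatherScatter;
  }
  return HintSketch::None;
}
```

The point of computing the hint on the vectorizer side is that it reflects the instruction that will exist after vectorization, which the original scalar instruction cannot show.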

Pierre-vh updated this revision to Diff 263646. May 13 2020, 1:56 AM

Pierre-vh marked 4 inline comments as done.

Had a first quick look, and here's a drive-by comment from my side: I can't say it was love at first sight for me with this patch. It indeed feels like a very narrow approach, and I am not entirely sure if it is similar to getArithmeticInstrCost. I see the challenge here though, and haven't given it any time to think about an alternative. Not sure, would it be worth writing to the dev list to get some more exposure/views/opinions on this?

SjoerdMeijer added a reviewer: efriedma. May 13 2020, 7:02 AM

The general idea of passing down information about the instruction makes sense.

The one thing I would say here is that we probably shouldn't pass down the "Instruction" from the vectorizer at all: in general, the cost computation doesn't know what sort of transforms the vectorizer is going to do, so any information computed from its operands/uses will be misleading in general.

In D79162#2034911, @efriedma wrote:

The general idea of passing down information about the instruction makes sense.

The one thing I would say here is that we probably shouldn't pass down the "Instruction" from the vectorizer at all: in general, the cost computation doesn't know what sort of transforms the vectorizer is going to do, so any information computed from its operands/uses will be misleading in general.

I agree, I'll remove the instruction from the calls to getCastInstrCost in the vectorizer.
It was mostly useless with those changes anyway. I removed it, and tests are still passing just fine.

In D79162#2033897, @SjoerdMeijer wrote:

Had a first quick look, and here's a drive-by comment from my side: I can't say it was love at first sight for me with this patch. It indeed feels like a very narrow approach, and I am not entirely sure if it is similar to getArithmeticInstrCost. I see the challenge here though, and haven't given it any time to think about an alternative. Not sure, would it be worth writing to the dev list to get some more exposure/views/opinions on this?

I can't think of many alternatives though. We also thought about implementing getMaskedMemoryOpCost in the ARM backend, but it doesn't solve the issue nicely:

The cost would be in the wrong place: it's not the load that's expensive, it's the extend/trunc. If you remove the cast, the problem disappears.
We'd also need to change getMaskedMemoryOpCost. At the very least, it'll need to know the type after/before extend/trunc, which will need to be calculated in the vectorizer to account for the minimal bitwidths (e.g. sometimes, instructions such as zext i32 can become zext i16 after vectorization as the vectorizer knows that the value doesn't need more than 16 bits).

I agree that sending a message on the mailing list (with a link to this patch) would be a good idea so we can get more feedback.

Removing instruction from calls to getCastInstrCost in the LoopVectorizer.

I agree that sending a message on the mailing list (with a link to this patch) would be a good idea so we can get more feedback.

Well, I was happy that @efriedma replied and shared his opinion, as it looks like we are on the right track and not doing something really strange. Don't know, perhaps that's enough confirmation.

On its own, this doesn't seem like the sort of change that would need an llvm-dev thread; it's a localized change to one specific function on TargetTransformInfo. If you're looking for feedback on the general approach of adding "context" hints to get*InstrCost, it might make sense to send a message to llvm-dev outlining the general direction.

If you're looking for feedback on the general approach of adding "context" hints to get*InstrCost, it might make sense to send a message to llvm-dev outlining the general direction.

Yeah, so I was getting worried people might want to see a more generic approach here. We might want to do that anyway at some point, but for now I of course don't want to torpedo or delay this work, so was happy that this would be acceptable as a local change to getCastInstrCost.

In D79162#2035811, @Pierre-vh wrote:

Removing instruction from calls to getCastInstrCost in the LoopVectorizer.

Although I would agree that in theory this would make a lot of sense, there are other places that are currently using the context instruction for things that are not modeled here. And just removing them from the vectorizer will likely lead to regressions in practice. They won't be tested because the costmodel is usually tested through non-vectorizer tests (testing the costmodel directly), and this change will now treat the vectorizer differently to those tests.

From a quick look:

aarch64 looks for "isWideningInstruction".
arm/neon can now do the same as of a recent patch.
systemz seems to use it for loads and.. something to do with compares?

I would recommend that for the moment we keep the context instruction in place from the vectorizer. The alternative would be to try and replace all needed modelling with hints or extra parameters, but that sounds like it will get very messy quite quickly. If only the opcode of the surrounding instructions is used, it will likely be "correct enough" for most cases (I think). In the long run as the vectorizer learns to transform code more, and vplan starts to learn new tricks this is more likely to break down, but I think that will need larger changes to the costmodelling anyway. What we have here is at least well defined (the type of the load/store) and is known to be fixing something that is incorrectly used at the moment.

In the long run I would like to see something that really tries to cost multiple instructions at the same time. If we have trunc(shl(mul(sext, sext))) and we know in the backend that we can convert that to a vmulh, it's going to be next to impossible to sensibly cost-model that without something that looks at the entire tree and gives it a single cost. You don't want to have a sextend looking at its uses' uses' uses to see if the whole thing together makes something that is cheap vs something that is expensive. Maybe that's not a great example but hopefully you can see my point. I imagine it might need a better way to get context instructions for things that don't exist too. From vplan recipes or runtime unrolled loops or the like. It would be good to be able to get a fake instruction analogue without being tied specifically to the original IR. That will all need a lot of careful design though.
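The tree-costing idea in the comment above can be sketched as follows. This is purely illustrative and not an existing LLVM API: the `Node` type, the unit costs, and the single hard-coded trunc(shift(mul(sext, sext))) pattern are assumptions, and the shift amount operand is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy expression tree; hypothetical, not LLVM IR.
enum class Op { Leaf, SExt, Mul, Shift, Trunc };

struct Node {
  Op Kind;
  std::vector<const Node *> Ops;
};

// Per-instruction costing: one unit per non-leaf node, summed over the tree.
int perNodeCost(const Node &N) {
  int Cost = (N.Kind == Op::Leaf) ? 0 : 1;
  for (const Node *Child : N.Ops)
    Cost += perNodeCost(*Child);
  return Cost;
}

// True if N exists, has kind K, and has exactly Count operands.
static bool hasShape(const Node *N, Op K, std::size_t Count) {
  return N && N->Kind == K && N->Ops.size() == Count;
}

// Recognise trunc(shift(mul(sext(a), sext(b)))), the multiply-high shape.
bool isMulHighTree(const Node &N) {
  if (!hasShape(&N, Op::Trunc, 1))
    return false;
  const Node *Sh = N.Ops[0];
  if (!hasShape(Sh, Op::Shift, 1))
    return false;
  const Node *Mul = Sh->Ops[0];
  return hasShape(Mul, Op::Mul, 2) && hasShape(Mul->Ops[0], Op::SExt, 1) &&
         hasShape(Mul->Ops[1], Op::SExt, 1);
}

// Tree-level costing: a recognised pattern gets a single cost (assumed 1),
// anything else falls back to summing per-node costs.
int treeCost(const Node &Root) {
  return isMulHighTree(Root) ? 1 : perNodeCost(Root);
}
```

Costing the root once avoids the situation described above, where each sext would have to walk several levels of users to guess whether the whole tree folds into one instruction.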

Re-adding the instructions in the calls to TTI::getCastInstrCost in the LoopVectorizer.

Thanks. I would be happy with this. But I'd like to get a second opinion on that.

(I guess you argue that Masked shouldn't be a type, and should be applied as some sort of modifier to the other types. I don't know of any architecture where this would matter though, so may not be important in practice. Gathers are always masked and Interleaved are never, from the architectures I know of that make use of them.)

@Pierre-vh Sam has made some potentially awkward changes here. Can you rebase this and see if it's still looking OK?

dmgreen mentioned this in D78937: [CostModel] Use isExtLoad in BasicTTI. May 21 2020, 2:26 AM

Rebasing the patch - there's now one more call site for getCastInstrCost.

dmgreen added inline comments. May 21 2020, 7:28 AM

llvm/include/llvm/CodeGen/BasicTTIImpl.h
765	Actually, looking at the code again, this needn't check I? Because we are not using its value directly, and we know the operand is a load now. That would imply that the `if (!I)` condition above can change to just guard the TLI call too.

Pierre-vh updated this revision to Diff 265671. May 21 2020, 11:53 PM

Pierre-vh marked an inline comment as done.

Rebasing the patch. (Changed code in TargetTransformInfoImpl.h, around line 864).

Pierre is now off to do more important things, so I'll take this over. I'll commandeer the patch and rebase it.

dmgreen updated this revision to Diff 266143. May 26 2020, 2:50 AM

Rebase onto trunk and attempt to get some FP converts working in the same way too.

Herald added a subscriber: bmahjour. Jun 18 2020, 6:14 AM

dmgreen edited the summary of this revision. Jun 18 2020, 6:19 AM

bmahjour removed a subscriber: bmahjour. Jun 18 2020, 7:06 AM

Ping. Anyone happy/unhappy with this?

fhahn added inline comments. Jul 2 2020, 1:20 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
1046	Did you consider also including non load/store cast uses? For example, it might be worth including arithmetic instructions into which casts can be folded, e.g. USUBL & co on AArch64.

dmgreen marked an inline comment as done. Jul 2 2020, 8:11 AM

dmgreen added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
1046	Yes, that might be a possibility. AArch64TTIImpl::getCastInstrCost already has some code to check for isWideningInstruction, which should already catch a lot of these situations I think. That relies on the context instruction being correct, but I believe that will be correct most of the time currently. It's really the type of the load or store that is most often incorrect. I'm honestly not sure how well this would scale to many different operand kinds. I think in the long run we need something that makes costing multiple nodes together easier, if you imagine something like a trunc(shift(mul(sext(x), sext(y)), 16)), all being a single instruction! And with vplan making larger changes during vectorization it would ideally handle "hypothetical" instructions better without relying on "real" context instructions. But this patch is at least a little step that gets the load/store kinds more correct.

fhahn added inline comments. Jul 15 2020, 10:54 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
1046	"AArch64TTIImpl::getCastInstrCost already has some code to check for isWideningInstruction, which should already catch a lot of these situations I think. That relies on the context instruction being correct, but I believe that will be correct most of the time currently. It's really the type of the load or store that is most often incorrect."
Yep, but I think it only works if the original IR does the widening already as part of the context instruction.
"I'm honestly not sure how well this would scale to many different operand kinds. I think in the long run we need something that makes costing multiple nodes together easier, if you imagine something like a trunc(shift(mul(sext(x), sext(y)), 16)), all being a single instruction!"
Yeah, the isolated helpers reach a limit of usefulness. A way to cost arbitrary instruction trees would be very useful in many contexts (with some limits to the size I guess).
"But this patch is at least a little step that gets the load/store kinds more correct."
Sounds good to me. It might be good to just mention the reason for only limiting to memory cases initially somewhere (sorry if I missed that this is already said somewhere).

Update the comment with a fixme about possibly changing to costing multiple instructions.

Ping :)

I think there's consensus this patch is a reasonable fix. So, this LGTM, but please wait a day in case there are more comments.

llvm/lib/Analysis/TargetTransformInfo.cpp
733	Can this and `computeTruncCastContextHint` be merged into 1 function? Essentially only the cases in the switch are different, which can be merged.

This revision is now accepted and ready to land. Jul 27 2020, 9:31 AM

Thanks.

Closed by commit rG60280e9818a6: [Analysis] TTI: Add CastContextHint for getCastInstrCost (authored by dmgreen). Jul 29 2020, 5:33 AM

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG60280e9818a6: [Analysis] TTI: Add CastContextHint for getCastInstrCost.

dmgreen mentioned this in rG9ddb28964c92: [ARM] Tune getCastInstrCost for extending masked loads and truncating masked…. Jul 29 2020, 5:42 AM

Revision Contents

Path                                                        Size
llvm/include/llvm/Analysis/TargetTransformInfo.h            42 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h        4 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h                    26 lines
llvm/lib/Analysis/TargetTransformInfo.cpp                   47 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h        2 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp      16 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.h                2 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp              14 lines
llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h        5 lines
llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp      4 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h            2 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp          3 lines
llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h        2 lines
llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp      9 lines
llvm/lib/Target/X86/X86TargetTransformInfo.h                2 lines
llvm/lib/Target/X86/X86TargetTransformInfo.cpp              6 lines
llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp      4 lines
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp       5 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp             52 lines
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp             19 lines

Diff 281536

llvm/include/llvm/Analysis/TargetTransformInfo.h

		public:
		/// \return The cost of a shuffle instruction of kind Kind and of type Tp.
		/// The index and subtype parameters are used by the subvector insertion and
		/// extraction shuffle kinds to show the insert/extract point and the type of
		/// the subvector being inserted/extracted.
		/// NOTE: For subvector extractions Tp represents the source type.
		int getShuffleCost(ShuffleKind Kind, VectorType *Tp, int Index = 0,
		VectorType *SubTp = nullptr) const;

		/// Represents a hint about the context in which a cast is used.
		///
		/// For zext/sext, the context of the cast is the operand, which must be a
		/// load of some kind. For trunc, the context of the cast is the single
		/// user of the instruction, which must be a store of some kind.
		///
		/// This enum allows the vectorizer to give getCastInstrCost an idea of the
		/// type of cast it's dealing with, as not every cast is equal. For instance,
		/// the zext of a load may be free, but the zext of an interleaving load can
		/// be (very) expensive!
		///
		/// See \c getCastContextHint to compute a CastContextHint from a cast
		/// Instruction*. Callers can use it if they don't need to override the
		/// context and just want it to be calculated from the instruction.
		///
		/// FIXME: This handles the types of load/store that the vectorizer can
		/// produce, which are the cases where the context instruction is most
		/// likely to be incorrect. There are other situations where that can happen
		dmgreen (Author) [Done]: Considering how this is used, it looks like None would mean "This is not a load/store", not "No context". I guess if we don't have I and we are not provided with anything better, we don't have any context. But in all other situations we at least try to calculate it from I.
		/// too, which might be handled here but in the long run a more general
		/// solution of costing multiple instructions at the same time may be better.
		enum class CastContextHint : uint8_t {
		None, ///< The cast is not used with a load/store of any kind.
		Normal, ///< The cast is used with a normal load/store.
		fhahn [Not Done]: Did you consider also including non load/store cast uses? For example, it might be worth including arithmetic instructions into which casts can be folded, e.g. USUBL & co on AArch64.
		dmgreen (Author) [Done]: Yes, that might be a possibility. AArch64TTIImpl::getCastInstrCost already has some code to check for isWideningInstruction, which should already catch a lot of these situations, I think. That relies on the context instruction being correct, but I believe that will be correct most of the time currently. It's really the type of the load or store that is most often incorrect. I'm honestly not sure how well this would scale to many different operand kinds. I think in the long run we need something that makes costing multiple nodes together easier, if you imagine something like a trunc(shift(mul(sext(x), sext(y)), 16)) all being a single instruction! And with VPlan making larger changes during vectorization, it would ideally handle "hypothetical" instructions better without relying on "real" context instructions. But this patch is at least a little step that gets the load/store kinds more correct.
		fhahn [Not Done]:
		> AArch64TTIImpl::getCastInstrCost already has some code to check for isWideningInstruction, which should already catch a lot of these situations I think. That relies on the context instruction being correct, but I believe that will be correct most of the time currently. It's really the type of the load or store that is most often incorrect.
		Yep, but I think it only works if the original IR does the widening already as part of the context instruction.
		> I'm honestly not sure how well this would scale to many different operand kinds. I think in the long run we need something that makes costing multiple nodes together easier, if you imagine something like a trunc(shift(mul(sext(x), sext(y)), 16)) all being a single instruction
		Yeah, the isolated helpers reach a limit of usefulness. A way to cost arbitrary instruction trees would be very useful in many contexts (with some limits to the size, I guess).
		> But this patch is at least a little step that gets the load/store kinds more correct.
		Sounds good to me. It might be good to mention somewhere the reason for limiting this to memory cases initially (sorry if I missed that this is already said somewhere).
		Masked, ///< The cast is used with a masked load/store.
		GatherScatter, ///< The cast is used with a gather/scatter.
		Interleave, ///< The cast is used with an interleaved load/store.
		Reversed, ///< The cast is used with a reversed load/store.
		};

		/// Calculates a CastContextHint from \p I.
		/// This should be used by callers of getCastInstrCost if they wish to
		/// determine the context from some instruction.
		/// \returns the CastContextHint for ZExt/SExt/Trunc, None if \p I is nullptr,
		/// or if it's another type of cast.
		static CastContextHint getCastContextHint(const Instruction *I);

/// \return The expected cost of cast instructions, such as bitcast, trunc,		/// \return The expected cost of cast instructions, such as bitcast, trunc,
/// zext, etc. If there is an existing instruction that holds Opcode, it		/// zext, etc. If there is an existing instruction that holds Opcode, it
/// may be passed in the 'I' parameter.		/// may be passed in the 'I' parameter.
int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency,		TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency,
const Instruction *I = nullptr) const;		const Instruction *I = nullptr) const;

/// \return The expected cost of a sign- or zero-extended vector extract. Use		/// \return The expected cost of a sign- or zero-extended vector extract. Use
/// -1 to indicate that there is no information about the index value.		/// -1 to indicate that there is no information about the index value.
int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,		int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index = -1) const;		unsigned Index = -1) const;

[... 413 lines not shown ...]	virtual unsigned getArithmeticInstrCost(
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
OperandValueKind Opd1Info,		OperandValueKind Opd1Info,
OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo,		OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo,
OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args,		OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args,
const Instruction *CxtI = nullptr) = 0;		const Instruction *CxtI = nullptr) = 0;
virtual int getShuffleCost(ShuffleKind Kind, VectorType *Tp, int Index,		virtual int getShuffleCost(ShuffleKind Kind, VectorType *Tp, int Index,
VectorType *SubTp) = 0;		VectorType *SubTp) = 0;
virtual int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		virtual int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
		CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) = 0;		const Instruction *I) = 0;
virtual int getExtractWithExtendCost(unsigned Opcode, Type *Dst,		virtual int getExtractWithExtendCost(unsigned Opcode, Type *Dst,
VectorType *VecTy, unsigned Index) = 0;		VectorType *VecTy, unsigned Index) = 0;
virtual int getCFInstrCost(unsigned Opcode,		virtual int getCFInstrCost(unsigned Opcode,
TTI::TargetCostKind CostKind) = 0;		TTI::TargetCostKind CostKind) = 0;
virtual int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		virtual int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
[... 412 lines not shown ...]	unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty,
return Impl.getArithmeticInstrCost(Opcode, Ty, CostKind, Opd1Info, Opd2Info,		return Impl.getArithmeticInstrCost(Opcode, Ty, CostKind, Opd1Info, Opd2Info,
Opd1PropInfo, Opd2PropInfo, Args, CxtI);		Opd1PropInfo, Opd2PropInfo, Args, CxtI);
}		}
int getShuffleCost(ShuffleKind Kind, VectorType *Tp, int Index,		int getShuffleCost(ShuffleKind Kind, VectorType *Tp, int Index,
VectorType *SubTp) override {		VectorType *SubTp) override {
return Impl.getShuffleCost(Kind, Tp, Index, SubTp);		return Impl.getShuffleCost(Kind, Tp, Index, SubTp);
}		}
int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::TargetCostKind CostKind,		CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I) override {		const Instruction *I) override {
return Impl.getCastInstrCost(Opcode, Dst, Src, CostKind, I);		return Impl.getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
}		}
int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,		int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index) override {		unsigned Index) override {
return Impl.getExtractWithExtendCost(Opcode, Dst, VecTy, Index);		return Impl.getExtractWithExtendCost(Opcode, Dst, VecTy, Index);
}		}
int getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) override {		int getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) override {
return Impl.getCFInstrCost(Opcode, CostKind);		return Impl.getCFInstrCost(Opcode, CostKind);
}		}
[... 259 lines not shown ...]
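
To illustrate why the hint matters, here is a minimal, self-contained sketch that does not use LLVM. The enumerators mirror the CastContextHint enum added in the header above. The function `toyExtendCost`, its `HasExtendingLoad` parameter, and the cost numbers are hypothetical, chosen only to show how a target could return different costs for the same extend depending on the kind of load feeding it.

```cpp
#include <cstdint>

// Mirrors the enumerators added to TargetTransformInfo.h in this diff.
enum class CastContextHint : std::uint8_t {
  None,          // The cast is not used with a load/store of any kind.
  Normal,        // The cast is used with a normal load/store.
  Masked,        // The cast is used with a masked load/store.
  GatherScatter, // The cast is used with a gather/scatter.
  Interleave,    // The cast is used with an interleaved load/store.
  Reversed,      // The cast is used with a reversed load/store.
};

// Hypothetical cost model (an assumption, not any real target's table):
// an extend folds into a normal or masked load when the target has a
// matching extending load; every other context pays for one extra
// instruction.
constexpr int toyExtendCost(CastContextHint CCH, bool HasExtendingLoad) {
  switch (CCH) {
  case CastContextHint::Normal:
  case CastContextHint::Masked:
    return HasExtendingLoad ? 0 : 1;
  case CastContextHint::GatherScatter:
  case CastContextHint::Interleave:
  case CastContextHint::Reversed:
  case CastContextHint::None:
    return 1;
  }
  return 1; // Not reached for valid enumerators; keeps every path returning.
}
```

With this toy model, the same zext is free after a normal load on a target with extending loads, but costs one instruction when the vectorizer plans to emit an interleaved load instead, which is the distinction the original instruction alone cannot express.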

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[... 417 lines not shown ...]	public:
}		}

unsigned getShuffleCost(TTI::ShuffleKind Kind, VectorType *Ty, int Index,		unsigned getShuffleCost(TTI::ShuffleKind Kind, VectorType *Ty, int Index,
VectorType *SubTp) {		VectorType *SubTp) {
return 1;		return 1;
}		}

unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) {		const Instruction *I) {
switch (Opcode) {		switch (Opcode) {
default:		default:
break;		break;
case Instruction::IntToPtr: {		case Instruction::IntToPtr: {
unsigned SrcSize = Src->getScalarSizeInBits();		unsigned SrcSize = Src->getScalarSizeInBits();
if (DL.isLegalInteger(SrcSize) &&		if (DL.isLegalInteger(SrcSize) &&
[... 476 lines not shown ...]	int getUserCost(const User *U, ArrayRef<const Value *> Operands,
case Instruction::FPToSI:		case Instruction::FPToSI:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast:		case Instruction::BitCast:
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::SExt:		case Instruction::SExt:
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::AddrSpaceCast:		case Instruction::AddrSpaceCast:
return TargetTTI->getCastInstrCost(Opcode, Ty, OpTy, CostKind, I);		return TargetTTI->getCastInstrCost(
		Opcode, Ty, OpTy, TTI::getCastContextHint(I), CostKind, I);
case Instruction::Store: {		case Instruction::Store: {
auto *SI = cast<StoreInst>(U);		auto *SI = cast<StoreInst>(U);
Type *ValTy = U->getOperand(0)->getType();		Type *ValTy = U->getOperand(0)->getType();
return TargetTTI->getMemoryOpCost(Opcode, ValTy, SI->getAlign(),		return TargetTTI->getMemoryOpCost(Opcode, ValTy, SI->getAlign(),
SI->getPointerAddressSpace(),		SI->getPointerAddressSpace(),
CostKind, I);		CostKind, I);
}		}
case Instruction::Load: {		case Instruction::Load: {
[... 146 lines not shown ...]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[... 710 lines not shown ...]	unsigned getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp, int Index,
case TTI::SK_InsertSubvector:		case TTI::SK_InsertSubvector:
return getInsertSubvectorOverhead(cast<FixedVectorType>(Tp), Index,		return getInsertSubvectorOverhead(cast<FixedVectorType>(Tp), Index,
cast<FixedVectorType>(SubTp));		cast<FixedVectorType>(SubTp));
}		}
llvm_unreachable("Unknown TTI::ShuffleKind");		llvm_unreachable("Unknown TTI::ShuffleKind");
}		}

unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr) {		const Instruction *I = nullptr) {
if (BaseT::getCastInstrCost(Opcode, Dst, Src, CostKind, I) == 0)		if (BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I) == 0)
return 0;		return 0;

const TargetLoweringBase *TLI = getTLI();		const TargetLoweringBase *TLI = getTLI();
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");		assert(ISD && "Invalid opcode");
std::pair<unsigned, MVT> SrcLT = TLI->getTypeLegalizationCost(DL, Src);		std::pair<unsigned, MVT> SrcLT = TLI->getTypeLegalizationCost(DL, Src);
std::pair<unsigned, MVT> DstLT = TLI->getTypeLegalizationCost(DL, Dst);		std::pair<unsigned, MVT> DstLT = TLI->getTypeLegalizationCost(DL, Dst);

[... 21 lines not shown ...]	case Instruction::FPExt:
if (I && getTLI()->isExtFree(I))		if (I && getTLI()->isExtFree(I))
return 0;		return 0;
break;		break;
case Instruction::ZExt:		case Instruction::ZExt:
if (TLI->isZExtFree(SrcLT.second, DstLT.second))		if (TLI->isZExtFree(SrcLT.second, DstLT.second))
return 0;		return 0;
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case Instruction::SExt:		case Instruction::SExt:
if (!I)		if (I && getTLI()->isExtFree(I))
break;

if (getTLI()->isExtFree(I))
return 0;		return 0;

// If this is a zext/sext of a load, return 0 if the corresponding		// If this is a zext/sext of a load, return 0 if the corresponding
// extending load exists on target.		// extending load exists on target.
if (I && isa<LoadInst>(I->getOperand(0))) {		if (CCH == TTI::CastContextHint::Normal) {
		dmgreen (Author) [Done]: I feel like this should now be CCH == TTI::CastContextHint::Normal. The other types of loads won't apply for the logic below.
		dmgreen (Author) [Done]: Actually, looking at the code again, this needn't check I? Because we are not using its value directly, and we know the operand is a load now. That would imply that the `if (!I)` condition above can change to just guard the TLI call too.
EVT ExtVT = EVT::getEVT(Dst);		EVT ExtVT = EVT::getEVT(Dst);
EVT LoadVT = EVT::getEVT(Src);		EVT LoadVT = EVT::getEVT(Src);
unsigned LType =		unsigned LType =
((Opcode == Instruction::ZExt) ? ISD::ZEXTLOAD : ISD::SEXTLOAD);		((Opcode == Instruction::ZExt) ? ISD::ZEXTLOAD : ISD::SEXTLOAD);
if (TLI->isLoadExtLegal(LType, ExtVT, LoadVT))		if (TLI->isLoadExtLegal(LType, ExtVT, LoadVT))
return 0;		return 0;
}		}
break;		break;
[... 58 lines not shown ...]	if (DstVTy && SrcVTy) {
cast<FixedVectorType>(DstVTy)->getNumElements() > 1) {		cast<FixedVectorType>(DstVTy)->getNumElements() > 1) {
Type *SplitDstTy = VectorType::getHalfElementsVectorType(DstVTy);		Type *SplitDstTy = VectorType::getHalfElementsVectorType(DstVTy);
Type *SplitSrcTy = VectorType::getHalfElementsVectorType(SrcVTy);		Type *SplitSrcTy = VectorType::getHalfElementsVectorType(SrcVTy);
T *TTI = static_cast<T *>(this);		T *TTI = static_cast<T *>(this);
// If both types need to be split then the split is free.		// If both types need to be split then the split is free.
unsigned SplitCost =		unsigned SplitCost =
(!SplitSrc || !SplitDst) ? TTI->getVectorSplitCost() : 0;		(!SplitSrc || !SplitDst) ? TTI->getVectorSplitCost() : 0;
return SplitCost +		return SplitCost +
(2 * TTI->getCastInstrCost(Opcode, SplitDstTy, SplitSrcTy,		(2 * TTI->getCastInstrCost(Opcode, SplitDstTy, SplitSrcTy, CCH,
CostKind, I));		CostKind, I));
}		}

// In other cases where the source or destination are illegal, assume		// In other cases where the source or destination are illegal, assume
// the operation will get scalarized.		// the operation will get scalarized.
unsigned Num = cast<FixedVectorType>(DstVTy)->getNumElements();		unsigned Num = cast<FixedVectorType>(DstVTy)->getNumElements();
unsigned Cost = thisT()->getCastInstrCost(		unsigned Cost = thisT()->getCastInstrCost(
Opcode, Dst->getScalarType(), Src->getScalarType(), CostKind, I);		Opcode, Dst->getScalarType(), Src->getScalarType(), CCH, CostKind, I);

// Return the cost of multiple scalar invocation plus the cost of		// Return the cost of multiple scalar invocation plus the cost of
// inserting and extracting the values.		// inserting and extracting the values.
return getScalarizationOverhead(DstVTy, true, true) + Num * Cost;		return getScalarizationOverhead(DstVTy, true, true) + Num * Cost;
}		}

// We already handled vector-to-vector and scalar-to-scalar conversions.		// We already handled vector-to-vector and scalar-to-scalar conversions.
// This		// This
// is where we handle bitcast between vectors and scalars. We need to assume		// is where we handle bitcast between vectors and scalars. We need to assume
// that the conversion is scalarized in one way or another.		// that the conversion is scalarized in one way or another.
if (Opcode == Instruction::BitCast) {		if (Opcode == Instruction::BitCast) {
// Illegal bitcasts are done by storing and loading from a stack slot.		// Illegal bitcasts are done by storing and loading from a stack slot.
return (SrcVTy ? getScalarizationOverhead(SrcVTy, false, true) : 0) +		return (SrcVTy ? getScalarizationOverhead(SrcVTy, false, true) : 0) +
(DstVTy ? getScalarizationOverhead(DstVTy, true, false) : 0);		(DstVTy ? getScalarizationOverhead(DstVTy, true, false) : 0);
}		}

llvm_unreachable("Unhandled cast");		llvm_unreachable("Unhandled cast");
}		}

unsigned getExtractWithExtendCost(unsigned Opcode, Type *Dst,		unsigned getExtractWithExtendCost(unsigned Opcode, Type *Dst,
VectorType *VecTy, unsigned Index) {		VectorType *VecTy, unsigned Index) {
return thisT()->getVectorInstrCost(Instruction::ExtractElement, VecTy,		return thisT()->getVectorInstrCost(Instruction::ExtractElement, VecTy,
Index) +		Index) +
thisT()->getCastInstrCost(Opcode, Dst, VecTy->getElementType(),		thisT()->getCastInstrCost(Opcode, Dst, VecTy->getElementType(),
TTI::TCK_RecipThroughput);		TTI::CastContextHint::None, TTI::TCK_RecipThroughput);
}		}

unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {		unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {
return BaseT::getCFInstrCost(Opcode, CostKind);		return BaseT::getCFInstrCost(Opcode, CostKind);
}		}

unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
[... 633 lines not shown ...]	unsigned getTypeBasedIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
}		}
case Intrinsic::smul_fix:		case Intrinsic::smul_fix:
case Intrinsic::umul_fix: {		case Intrinsic::umul_fix: {
unsigned ExtSize = RetTy->getScalarSizeInBits() * 2;		unsigned ExtSize = RetTy->getScalarSizeInBits() * 2;
Type *ExtTy = RetTy->getWithNewBitWidth(ExtSize);		Type *ExtTy = RetTy->getWithNewBitWidth(ExtSize);

unsigned ExtOp =		unsigned ExtOp =
IID == Intrinsic::smul_fix ? Instruction::SExt : Instruction::ZExt;		IID == Intrinsic::smul_fix ? Instruction::SExt : Instruction::ZExt;
		TTI::CastContextHint CCH = TTI::CastContextHint::None;

unsigned Cost = 0;		unsigned Cost = 0;
Cost += 2 * thisT()->getCastInstrCost(ExtOp, ExtTy, RetTy, CostKind);		Cost += 2 * thisT()->getCastInstrCost(ExtOp, ExtTy, RetTy, CCH, CostKind);
Cost +=		Cost +=
thisT()->getArithmeticInstrCost(Instruction::Mul, ExtTy, CostKind);		thisT()->getArithmeticInstrCost(Instruction::Mul, ExtTy, CostKind);
Cost += 2 * thisT()->getCastInstrCost(Instruction::Trunc, RetTy, ExtTy,		Cost += 2 * thisT()->getCastInstrCost(Instruction::Trunc, RetTy, ExtTy,
CostKind);		CCH, CostKind);
Cost += thisT()->getArithmeticInstrCost(Instruction::LShr, RetTy,		Cost += thisT()->getArithmeticInstrCost(Instruction::LShr, RetTy,
CostKind, TTI::OK_AnyValue,		CostKind, TTI::OK_AnyValue,
TTI::OK_UniformConstantValue);		TTI::OK_UniformConstantValue);
Cost += thisT()->getArithmeticInstrCost(Instruction::Shl, RetTy, CostKind,		Cost += thisT()->getArithmeticInstrCost(Instruction::Shl, RetTy, CostKind,
TTI::OK_AnyValue,		TTI::OK_AnyValue,
TTI::OK_UniformConstantValue);		TTI::OK_UniformConstantValue);
Cost += thisT()->getArithmeticInstrCost(Instruction::Or, RetTy, CostKind);		Cost += thisT()->getArithmeticInstrCost(Instruction::Or, RetTy, CostKind);
return Cost;		return Cost;
[... 42 lines not shown ...]	unsigned getTypeBasedIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
case Intrinsic::umul_with_overflow: {		case Intrinsic::umul_with_overflow: {
Type *MulTy = RetTy->getContainedType(0);		Type *MulTy = RetTy->getContainedType(0);
Type *OverflowTy = RetTy->getContainedType(1);		Type *OverflowTy = RetTy->getContainedType(1);
unsigned ExtSize = MulTy->getScalarSizeInBits() * 2;		unsigned ExtSize = MulTy->getScalarSizeInBits() * 2;
Type *ExtTy = MulTy->getWithNewBitWidth(ExtSize);		Type *ExtTy = MulTy->getWithNewBitWidth(ExtSize);

unsigned ExtOp =		unsigned ExtOp =
IID == Intrinsic::smul_fix ? Instruction::SExt : Instruction::ZExt;		IID == Intrinsic::smul_fix ? Instruction::SExt : Instruction::ZExt;
		TTI::CastContextHint CCH = TTI::CastContextHint::None;

unsigned Cost = 0;		unsigned Cost = 0;
Cost += 2 * thisT()->getCastInstrCost(ExtOp, ExtTy, MulTy, CostKind);		Cost += 2 * thisT()->getCastInstrCost(ExtOp, ExtTy, MulTy, CCH, CostKind);
Cost +=		Cost +=
thisT()->getArithmeticInstrCost(Instruction::Mul, ExtTy, CostKind);		thisT()->getArithmeticInstrCost(Instruction::Mul, ExtTy, CostKind);
Cost += 2 * thisT()->getCastInstrCost(Instruction::Trunc, MulTy, ExtTy,		Cost += 2 * thisT()->getCastInstrCost(Instruction::Trunc, MulTy, ExtTy,
CostKind);		CCH, CostKind);
Cost += thisT()->getArithmeticInstrCost(Instruction::LShr, MulTy,		Cost += thisT()->getArithmeticInstrCost(Instruction::LShr, MulTy,
CostKind, TTI::OK_AnyValue,		CostKind, TTI::OK_AnyValue,
TTI::OK_UniformConstantValue);		TTI::OK_UniformConstantValue);

if (IID == Intrinsic::smul_with_overflow)		if (IID == Intrinsic::smul_with_overflow)
Cost += thisT()->getArithmeticInstrCost(Instruction::AShr, MulTy,		Cost += thisT()->getArithmeticInstrCost(Instruction::AShr, MulTy,
CostKind, TTI::OK_AnyValue,		CostKind, TTI::OK_AnyValue,
TTI::OK_UniformConstantValue);		TTI::OK_UniformConstantValue);
[... 302 lines not shown ...]
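
The smul_fix/umul_fix hunk above sums the costs of the expanded sequence, and both cast queries use CastContextHint::None because the expansion has no load or store around the casts. The arithmetic can be sketched in isolation as follows. `ToyOpCosts` and `toyMulFixCost` are hypothetical names introduced for illustration; in LLVM the individual costs come from the getCastInstrCost and getArithmeticInstrCost queries shown in the diff.

```cpp
// Hypothetical per-operation costs (assumption); in the diff each value is
// the result of a TTI query on the relevant types.
struct ToyOpCosts {
  unsigned Ext;   // sext or zext to the double-width type
  unsigned Mul;   // multiply in the double-width type
  unsigned Trunc; // trunc back to the result type
  unsigned LShr;  // logical shift right by a uniform constant
  unsigned Shl;   // shift left by a uniform constant
  unsigned Or;    // combine the two halves
};

// Same shape as the sum in the smul_fix/umul_fix case above:
// two extends, one multiply, two truncs, then lshr, shl and or.
constexpr unsigned toyMulFixCost(const ToyOpCosts &C) {
  return 2 * C.Ext + C.Mul + 2 * C.Trunc + C.LShr + C.Shl + C.Or;
}
```

With every operation costing 1 the total is 8; if a target reports the extend as free, the total drops to 6.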

llvm/lib/Analysis/TargetTransformInfo.cpp

	[... 724 lines not shown ...]

	int TargetTransformInfo::getShuffleCost(ShuffleKind Kind, VectorType *Ty,			int TargetTransformInfo::getShuffleCost(ShuffleKind Kind, VectorType *Ty,
	int Index, VectorType *SubTp) const {			int Index, VectorType *SubTp) const {
	int Cost = TTIImpl->getShuffleCost(Kind, Ty, Index, SubTp);			int Cost = TTIImpl->getShuffleCost(Kind, Ty, Index, SubTp);
	assert(Cost >= 0 && "TTI should not produce negative costs!");			assert(Cost >= 0 && "TTI should not produce negative costs!");
	return Cost;			return Cost;
	}			}

				TTI::CastContextHint
				SjoerdMeijer [Not Done]: Can this and `computeTruncCastContextHint` be merged into 1 function? Essentially only the cases in the switch are different, which can be merged.
				TargetTransformInfo::getCastContextHint(const Instruction *I) {
				if (!I)
				return CastContextHint::None;

				auto getLoadStoreKind = [](const Value *V, unsigned LdStOp, unsigned MaskedOp,
				unsigned GatScatOp) {
				const Instruction *I = dyn_cast<Instruction>(V);
				if (!I)
				return CastContextHint::None;

				if (I->getOpcode() == LdStOp)
				return CastContextHint::Normal;

				if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
				if (II->getIntrinsicID() == MaskedOp)
				return TTI::CastContextHint::Masked;
				if (II->getIntrinsicID() == GatScatOp)
				return TTI::CastContextHint::GatherScatter;
				}

				return TTI::CastContextHint::None;
				};

				switch (I->getOpcode()) {
				case Instruction::ZExt:
				case Instruction::SExt:
				case Instruction::FPExt:
				return getLoadStoreKind(I->getOperand(0), Instruction::Load,
				Intrinsic::masked_load, Intrinsic::masked_gather);
				case Instruction::Trunc:
				case Instruction::FPTrunc:
				if (I->hasOneUse())
				return getLoadStoreKind(*I->user_begin(), Instruction::Store,
				Intrinsic::masked_store,
				Intrinsic::masked_scatter);
				break;
				default:
				return CastContextHint::None;
				}

				return TTI::CastContextHint::None;
				}

	int TargetTransformInfo::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,			int TargetTransformInfo::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
				CastContextHint CCH,
	TTI::TargetCostKind CostKind,			TTI::TargetCostKind CostKind,
	const Instruction *I) const {			const Instruction *I) const {
	assert((I == nullptr || I->getOpcode() == Opcode) &&			assert((I == nullptr || I->getOpcode() == Opcode) &&
	"Opcode should reflect passed instruction.");			"Opcode should reflect passed instruction.");
	int Cost = TTIImpl->getCastInstrCost(Opcode, Dst, Src, CostKind, I);			int Cost = TTIImpl->getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
	assert(Cost >= 0 && "TTI should not produce negative costs!");			assert(Cost >= 0 && "TTI should not produce negative costs!");
	return Cost;			return Cost;
	}			}

	int TargetTransformInfo::getExtractWithExtendCost(unsigned Opcode, Type *Dst,			int TargetTransformInfo::getExtractWithExtendCost(unsigned Opcode, Type *Dst,
	VectorType *VecTy,			VectorType *VecTy,
	unsigned Index) const {			unsigned Index) const {
	int Cost = TTIImpl->getExtractWithExtendCost(Opcode, Dst, VecTy, Index);			int Cost = TTIImpl->getExtractWithExtendCost(Opcode, Dst, VecTy, Index);
	[... 615 lines not shown ...]
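
The classification performed by getCastContextHint above can be modelled without LLVM as a rough sketch. Extends look at the instruction producing their operand, truncs look at their single user, and every other cast gets None. The types `CastOp`, `MemKind`, and `ToyCast` and the function names are hypothetical stand-ins for the IR queries (getOpcode, getIntrinsicID, hasOneUse) used in the real code. As in the diff, Interleave and Reversed are never derived from an instruction, so the sketch omits them.

```cpp
// Only the hints that getCastContextHint can derive from an instruction.
enum class CastContextHint { None, Normal, Masked, GatherScatter };

// Hypothetical stand-ins for the IR the real function inspects.
enum class CastOp { ZExt, SExt, FPExt, Trunc, FPTrunc, BitCast };
enum class MemKind { NotMemory, Plain, Masked, GatherScatter };

// A simplified cast: what kind of load (if any) produces its operand, and
// what kind of store (if any) is its single user.
struct ToyCast {
  CastOp Op;
  MemKind OperandLoad;
  bool HasOneUse;
  MemKind UserStore;
};

// Plays the role of the getLoadStoreKind lambda in the diff.
constexpr CastContextHint fromMemKind(MemKind K) {
  switch (K) {
  case MemKind::Plain:
    return CastContextHint::Normal;
  case MemKind::Masked:
    return CastContextHint::Masked;
  case MemKind::GatherScatter:
    return CastContextHint::GatherScatter;
  case MemKind::NotMemory:
    return CastContextHint::None;
  }
  return CastContextHint::None;
}

constexpr CastContextHint toyCastContextHint(const ToyCast &C) {
  switch (C.Op) {
  case CastOp::ZExt:
  case CastOp::SExt:
  case CastOp::FPExt:
    // Extends: the context is the operand.
    return fromMemKind(C.OperandLoad);
  case CastOp::Trunc:
  case CastOp::FPTrunc:
    // Truncs: the context is the single user, if there is exactly one.
    return C.HasOneUse ? fromMemKind(C.UserStore) : CastContextHint::None;
  case CastOp::BitCast:
    return CastContextHint::None;
  }
  return CastContextHint::None;
}
```

A caller that knows better than the original IR, such as the vectorizer costing a planned masked or interleaved access, would pass its own hint to getCastInstrCost instead of using this derivation.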

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

[... 108 lines not shown ...]	public:

unsigned getMinVectorRegisterBitWidth() {		unsigned getMinVectorRegisterBitWidth() {
return ST->getMinVectorRegisterBitWidth();		return ST->getMinVectorRegisterBitWidth();
}		}

unsigned getMaxInterleaveFactor(unsigned VF);		unsigned getMaxInterleaveFactor(unsigned VF);

int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::TargetCostKind CostKind,		TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,		int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index);		unsigned Index);

unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind);		unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind);

int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);
[... 127 lines not shown ...]

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

[... 264 lines not shown ...]	bool AArch64TTIImpl::isWideningInstruction(Type *DstTy, unsigned Opcode,
unsigned NumSrcEls = SrcTyL.first * SrcTyL.second.getVectorNumElements();		unsigned NumSrcEls = SrcTyL.first * SrcTyL.second.getVectorNumElements();

// Return true if the legalized types have the same number of vector elements		// Return true if the legalized types have the same number of vector elements
// and the destination element type size is twice that of the source type.		// and the destination element type size is twice that of the source type.
return NumDstEls == NumSrcEls && 2 * SrcElTySize == DstElTySize;		return NumDstEls == NumSrcEls && 2 * SrcElTySize == DstElTySize;
}		}

int AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) {		const Instruction *I) {
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");		assert(ISD && "Invalid opcode");

// If the cast is observable, and it is used by a widening instruction (e.g.,		// If the cast is observable, and it is used by a widening instruction (e.g.,
// uaddl, saddw, etc.), it may be free.		// uaddl, saddw, etc.), it may be free.
if (I && I->hasOneUse()) {		if (I && I->hasOneUse()) {
[... 20 lines not shown ...]	if (CostKind != TTI::TCK_RecipThroughput)
return Cost == 0 ? 0 : 1;		return Cost == 0 ? 0 : 1;
return Cost;		return Cost;
};		};

EVT SrcTy = TLI->getValueType(DL, Src);		EVT SrcTy = TLI->getValueType(DL, Src);
EVT DstTy = TLI->getValueType(DL, Dst);		EVT DstTy = TLI->getValueType(DL, Dst);

if (!SrcTy.isSimple() || !DstTy.isSimple())		if (!SrcTy.isSimple() || !DstTy.isSimple())
return AdjustCost(BaseT::getCastInstrCost(Opcode, Dst, Src, CostKind, I));		return AdjustCost(
		BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));

static const TypeConversionCostTblEntry		static const TypeConversionCostTblEntry
ConversionTbl[] = {		ConversionTbl[] = {
{ ISD::TRUNCATE, MVT::v4i16, MVT::v4i32, 1 },		{ ISD::TRUNCATE, MVT::v4i16, MVT::v4i32, 1 },
{ ISD::TRUNCATE, MVT::v4i32, MVT::v4i64, 0 },		{ ISD::TRUNCATE, MVT::v4i32, MVT::v4i64, 0 },
{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 3 },		{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 3 },
{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 6 },		{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 6 },

[... 87 lines not shown ...]	ConversionTbl[] = {
{ ISD::FP_TO_UINT, MVT::v2i8, MVT::v2f64, 2 },		{ ISD::FP_TO_UINT, MVT::v2i8, MVT::v2f64, 2 },
};		};

if (const auto *Entry = ConvertCostTableLookup(ConversionTbl, ISD,		if (const auto *Entry = ConvertCostTableLookup(ConversionTbl, ISD,
DstTy.getSimpleVT(),		DstTy.getSimpleVT(),
SrcTy.getSimpleVT()))		SrcTy.getSimpleVT()))
return AdjustCost(Entry->Cost);		return AdjustCost(Entry->Cost);

return AdjustCost(BaseT::getCastInstrCost(Opcode, Dst, Src, CostKind, I));		return AdjustCost(
		BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));
}		}

int AArch64TTIImpl::getExtractWithExtendCost(unsigned Opcode, Type *Dst,		int AArch64TTIImpl::getExtractWithExtendCost(unsigned Opcode, Type *Dst,
VectorType *VecTy,		VectorType *VecTy,
unsigned Index) {		unsigned Index) {

// Make sure we were given a valid extend opcode.		// Make sure we were given a valid extend opcode.
assert((Opcode == Instruction::SExt || Opcode == Instruction::ZExt) &&		assert((Opcode == Instruction::SExt || Opcode == Instruction::ZExt) &&
[... 15 lines not shown ...]	int AArch64TTIImpl::getExtractWithExtendCost(unsigned Opcode, Type *Dst,
auto DstVT = TLI->getValueType(DL, Dst);		auto DstVT = TLI->getValueType(DL, Dst);
auto SrcVT = TLI->getValueType(DL, Src);		auto SrcVT = TLI->getValueType(DL, Src);
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

// If the resulting type is still a vector and the destination type is legal,		// If the resulting type is still a vector and the destination type is legal,
// we may get the extension for free. If not, get the default cost for the		// we may get the extension for free. If not, get the default cost for the
// extend.		// extend.
if (!VecLT.second.isVector() || !TLI->isTypeLegal(DstVT))		if (!VecLT.second.isVector() || !TLI->isTypeLegal(DstVT))
return Cost + getCastInstrCost(Opcode, Dst, Src, CostKind);		return Cost + getCastInstrCost(Opcode, Dst, Src, TTI::CastContextHint::None,
		CostKind);

// The destination type should be larger than the element type. If not, get		// The destination type should be larger than the element type. If not, get
// the default cost for the extend.		// the default cost for the extend.
if (DstVT.getSizeInBits() < SrcVT.getSizeInBits())		if (DstVT.getSizeInBits() < SrcVT.getSizeInBits())
return Cost + getCastInstrCost(Opcode, Dst, Src, CostKind);		return Cost + getCastInstrCost(Opcode, Dst, Src, TTI::CastContextHint::None,
		CostKind);

switch (Opcode) {		switch (Opcode) {
default:		default:
llvm_unreachable("Opcode should be either SExt or ZExt");		llvm_unreachable("Opcode should be either SExt or ZExt");

// For sign-extends, we only need a smov, which performs the extension		// For sign-extends, we only need a smov, which performs the extension
// automatically.		// automatically.
case Instruction::SExt:		case Instruction::SExt:
return Cost;		return Cost;

// For zero-extends, the extend is performed automatically by a umov unless		// For zero-extends, the extend is performed automatically by a umov unless
// the destination type is i64 and the element type is i8 or i16.		// the destination type is i64 and the element type is i8 or i16.
case Instruction::ZExt:		case Instruction::ZExt:
if (DstVT.getSizeInBits() != 64u \|\| SrcVT.getSizeInBits() == 32u)		if (DstVT.getSizeInBits() != 64u \|\| SrcVT.getSizeInBits() == 32u)
return Cost;		return Cost;
}		}

// If we are unable to perform the extend for free, get the default cost.		// If we are unable to perform the extend for free, get the default cost.
return Cost + getCastInstrCost(Opcode, Dst, Src, CostKind);		return Cost + getCastInstrCost(Opcode, Dst, Src, TTI::CastContextHint::None,
		CostKind);
}		}

unsigned AArch64TTIImpl::getCFInstrCost(unsigned Opcode,		unsigned AArch64TTIImpl::getCFInstrCost(unsigned Opcode,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return Opcode == Instruction::PHI ? 0 : 1;		return Opcode == Instruction::PHI ? 0 : 1;
assert(CostKind == TTI::TCK_RecipThroughput && "unexpected CostKind");		assert(CostKind == TTI::TCK_RecipThroughput && "unexpected CostKind");
// Branches are assumed to be predicted.		// Branches are assumed to be predicted.
▲ Show 20 Lines • Show All 606 Lines • Show Last 20 Lines
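The free-extend rule in getExtractWithExtendCost can be restated as a small predicate. This is a minimal standalone sketch, not LLVM code: `ExtendKind` and `extendFoldsIntoLaneMove` are hypothetical names, and the legality checks on the vector and destination types are assumed to have passed already.

```cpp
#include <cassert>

// Hypothetical stand-in for Instruction::SExt / Instruction::ZExt.
enum class ExtendKind { SExt, ZExt };

// Returns true when the extend is expected to be performed by the lane move
// itself (smov/umov), so that no extra cast cost is added.
bool extendFoldsIntoLaneMove(ExtendKind Kind, unsigned DstBits,
                             unsigned SrcBits) {
  // The destination must not be narrower than the extracted element.
  if (DstBits < SrcBits)
    return false;
  // Sign-extends: smov performs the extension.
  if (Kind == ExtendKind::SExt)
    return true;
  // Zero-extends: umov performs the extension unless the destination is
  // 64 bits wide and the element is not 32 bits wide (the source comment
  // names i8 and i16).
  return DstBits != 64 || SrcBits == 32;
}
```

When the predicate is false, the function in the diff adds the default cast cost, queried with `TTI::CastContextHint::None`.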

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

Show First 20 Lines • Show All 204 Lines • ▼ Show 20 Lines	bool shouldExpandReduction(const IntrinsicInst *II) const {

default:		default:
// Don't expand anything else, let legalization deal with it.		// Don't expand anything else, let legalization deal with it.
return false;		return false;
}		}
}		}

int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::TargetCostKind CostKind,		TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);

▲ Show 20 Lines • Show All 62 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

Show First 20 Lines • Show All 295 Lines • ▼ Show 20 Lines	int ARMTTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx, const APInt &Imm,
// xor a, -1 can always be folded to MVN		// xor a, -1 can always be folded to MVN
if (Opcode == Instruction::Xor && Imm.isAllOnesValue())		if (Opcode == Instruction::Xor && Imm.isAllOnesValue())
return 0;		return 0;

return getIntImmCost(Imm, Ty, CostKind);		return getIntImmCost(Imm, Ty, CostKind);
}		}

int ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) {		const Instruction *I) {
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");		assert(ISD && "Invalid opcode");

// TODO: Allow non-throughput costs that aren't binary.		// TODO: Allow non-throughput costs that aren't binary.
auto AdjustCost = [&CostKind](int Cost) {		auto AdjustCost = [&CostKind](int Cost) {
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return Cost == 0 ? 0 : 1;		return Cost == 0 ? 0 : 1;
return Cost;		return Cost;
};		};

EVT SrcTy = TLI->getValueType(DL, Src);		EVT SrcTy = TLI->getValueType(DL, Src);
EVT DstTy = TLI->getValueType(DL, Dst);		EVT DstTy = TLI->getValueType(DL, Dst);

if (!SrcTy.isSimple() || !DstTy.isSimple())		if (!SrcTy.isSimple() || !DstTy.isSimple())
return AdjustCost(BaseT::getCastInstrCost(Opcode, Dst, Src, CostKind, I));		return AdjustCost(
		BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));

// The extend of a load is free		// The extend of a load is free
if (I && isa<LoadInst>(I->getOperand(0))) {		if (I && isa<LoadInst>(I->getOperand(0))) {
static const TypeConversionCostTblEntry LoadConversionTbl[] = {		static const TypeConversionCostTblEntry LoadConversionTbl[] = {
{ISD::SIGN_EXTEND, MVT::i32, MVT::i16, 0},		{ISD::SIGN_EXTEND, MVT::i32, MVT::i16, 0},
{ISD::ZERO_EXTEND, MVT::i32, MVT::i16, 0},		{ISD::ZERO_EXTEND, MVT::i32, MVT::i16, 0},
{ISD::SIGN_EXTEND, MVT::i32, MVT::i8, 0},		{ISD::SIGN_EXTEND, MVT::i32, MVT::i8, 0},
{ISD::ZERO_EXTEND, MVT::i32, MVT::i8, 0},		{ISD::ZERO_EXTEND, MVT::i32, MVT::i8, 0},
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	static const TypeConversionCostTblEntry MVELoadConversionTbl[] = {
{ISD::TRUNCATE, MVT::v4i32, MVT::v4i8, 0},		{ISD::TRUNCATE, MVT::v4i32, MVT::v4i8, 0},
{ISD::TRUNCATE, MVT::v8i16, MVT::v8i8, 0},		{ISD::TRUNCATE, MVT::v8i16, MVT::v8i8, 0},
{ISD::TRUNCATE, MVT::v8i32, MVT::v8i16, 1},		{ISD::TRUNCATE, MVT::v8i32, MVT::v8i16, 1},
{ISD::TRUNCATE, MVT::v16i32, MVT::v16i8, 3},		{ISD::TRUNCATE, MVT::v16i32, MVT::v16i8, 3},
{ISD::TRUNCATE, MVT::v16i16, MVT::v16i8, 1},		{ISD::TRUNCATE, MVT::v16i16, MVT::v16i8, 1},
};		};
if (SrcTy.isVector() && ST->hasMVEIntegerOps()) {		if (SrcTy.isVector() && ST->hasMVEIntegerOps()) {
if (const auto *Entry =		if (const auto *Entry =
ConvertCostTableLookup(MVELoadConversionTbl, ISD, SrcTy.getSimpleVT(),		ConvertCostTableLookup(MVELoadConversionTbl, ISD,
DstTy.getSimpleVT()))		SrcTy.getSimpleVT(), DstTy.getSimpleVT()))
return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor());		return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor());
}		}

static const TypeConversionCostTblEntry MVEFLoadConversionTbl[] = {		static const TypeConversionCostTblEntry MVEFLoadConversionTbl[] = {
{ISD::FP_ROUND, MVT::v4f32, MVT::v4f16, 1},		{ISD::FP_ROUND, MVT::v4f32, MVT::v4f16, 1},
{ISD::FP_ROUND, MVT::v8f32, MVT::v8f16, 3},		{ISD::FP_ROUND, MVT::v8f32, MVT::v8f16, 3},
};		};
if (SrcTy.isVector() && ST->hasMVEFloatOps()) {		if (SrcTy.isVector() && ST->hasMVEFloatOps()) {
if (const auto *Entry =		if (const auto *Entry =
ConvertCostTableLookup(MVEFLoadConversionTbl, ISD, SrcTy.getSimpleVT(),		ConvertCostTableLookup(MVEFLoadConversionTbl, ISD,
DstTy.getSimpleVT()))		SrcTy.getSimpleVT(), DstTy.getSimpleVT()))
return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor());		return AdjustCost(Entry->Cost * ST->getMVEVectorCostFactor());
}		}
}		}

// NEON vector operations that can extend their inputs.		// NEON vector operations that can extend their inputs.
if ((ISD == ISD::SIGN_EXTEND || ISD == ISD::ZERO_EXTEND) &&		if ((ISD == ISD::SIGN_EXTEND || ISD == ISD::ZERO_EXTEND) &&
I && I->hasOneUse() && ST->hasNEON() && SrcTy.isVector()) {		I && I->hasOneUse() && ST->hasNEON() && SrcTy.isVector()) {
static const TypeConversionCostTblEntry NEONDoubleWidthTbl[] = {		static const TypeConversionCostTblEntry NEONDoubleWidthTbl[] = {
▲ Show 20 Lines • Show All 255 Lines • ▼ Show 20 Lines	if (const auto *Entry = ConvertCostTableLookup(ARMIntegerConversionTbl, ISD,
SrcTy.getSimpleVT()))		SrcTy.getSimpleVT()))
return AdjustCost(Entry->Cost);		return AdjustCost(Entry->Cost);
}		}

int BaseCost = ST->hasMVEIntegerOps() && Src->isVectorTy()		int BaseCost = ST->hasMVEIntegerOps() && Src->isVectorTy()
? ST->getMVEVectorCostFactor()		? ST->getMVEVectorCostFactor()
: 1;		: 1;
return AdjustCost(		return AdjustCost(
BaseCost * BaseT::getCastInstrCost(Opcode, Dst, Src, CostKind, I));		BaseCost * BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));
}		}

int ARMTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,		int ARMTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
unsigned Index) {		unsigned Index) {
// Penalize inserting into an D-subregister. We end up with a three times		// Penalize inserting into an D-subregister. We end up with a three times
// lower estimated throughput on swift.		// lower estimated throughput on swift.
if (ST->hasSlowLoadDSubregister() && Opcode == Instruction::InsertElement &&		if (ST->hasSlowLoadDSubregister() && Opcode == Instruction::InsertElement &&
ValTy->isVectorTy() && ValTy->getScalarSizeInBits() <= 32)		ValTy->isVectorTy() && ValTy->getScalarSizeInBits() <= 32)
▲ Show 20 Lines • Show All 1,080 Lines • Show Last 20 Lines
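The `AdjustCost` lambda that recurs in these target implementations collapses any non-throughput cost to a binary value. A minimal standalone sketch of that behaviour, using a hypothetical `CostKind` enum in place of `TTI::TargetCostKind`:

```cpp
#include <cassert>

// Hypothetical stand-in for TTI::TargetCostKind.
enum class CostKind { RecipThroughput, Latency, CodeSize, SizeAndLatency };

// For reciprocal throughput the table cost is returned unchanged. For every
// other cost kind the cost is reduced to "free" (0) or "not free" (1).
int adjustCost(CostKind Kind, int Cost) {
  if (Kind != CostKind::RecipThroughput)
    return Cost == 0 ? 0 : 1;
  return Cost;
}
```

With this rule a table entry of 6 stays 6 for throughput queries but becomes 1 for code-size queries, while an entry of 0 stays 0 in both cases.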

llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h

Show First 20 Lines • Show All 140 Lines • ▼ Show 20 Lines	unsigned getArithmeticInstrCost(
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>(),		ArrayRef<const Value *> Args = ArrayRef<const Value *>(),
const Instruction *CxtI = nullptr);		const Instruction *CxtI = nullptr);
unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);

unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {		unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {
return 1;		return 1;
}		}

/// @}		/// @}

Show All 9 Lines

llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp

Show First 20 Lines • Show All 264 Lines • ▼ Show 20 Lines	if (Ty->isVectorTy()) {
if (LT.second.isFloatingPoint())		if (LT.second.isFloatingPoint())
return LT.first + FloatFactor * getTypeNumElements(Ty);		return LT.first + FloatFactor * getTypeNumElements(Ty);
}		}
return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Opd1Info, Opd2Info,		return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Opd1Info, Opd2Info,
Opd1PropInfo, Opd2PropInfo, Args, CxtI);		Opd1PropInfo, Opd2PropInfo, Args, CxtI);
}		}

unsigned HexagonTTIImpl::getCastInstrCost(unsigned Opcode, Type *DstTy,		unsigned HexagonTTIImpl::getCastInstrCost(unsigned Opcode, Type *DstTy,
Type *SrcTy, TTI::TargetCostKind CostKind, const Instruction *I) {		Type *SrcTy, TTI::CastContextHint CCH,
		TTI::TargetCostKind CostKind,
		const Instruction *I) {
if (SrcTy->isFPOrFPVectorTy() || DstTy->isFPOrFPVectorTy()) {		if (SrcTy->isFPOrFPVectorTy() || DstTy->isFPOrFPVectorTy()) {
unsigned SrcN = SrcTy->isFPOrFPVectorTy() ? getTypeNumElements(SrcTy) : 0;		unsigned SrcN = SrcTy->isFPOrFPVectorTy() ? getTypeNumElements(SrcTy) : 0;
unsigned DstN = DstTy->isFPOrFPVectorTy() ? getTypeNumElements(DstTy) : 0;		unsigned DstN = DstTy->isFPOrFPVectorTy() ? getTypeNumElements(DstTy) : 0;

std::pair<int, MVT> SrcLT = TLI.getTypeLegalizationCost(DL, SrcTy);		std::pair<int, MVT> SrcLT = TLI.getTypeLegalizationCost(DL, SrcTy);
std::pair<int, MVT> DstLT = TLI.getTypeLegalizationCost(DL, DstTy);		std::pair<int, MVT> DstLT = TLI.getTypeLegalizationCost(DL, DstTy);
unsigned Cost = std::max(SrcLT.first, DstLT.first) + FloatFactor * (SrcN + DstN);		unsigned Cost = std::max(SrcLT.first, DstLT.first) + FloatFactor * (SrcN + DstN);
// TODO: Allow non-throughput costs that aren't binary.		// TODO: Allow non-throughput costs that aren't binary.
▲ Show 20 Lines • Show All 67 Lines • Show Last 20 Lines

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h

Show First 20 Lines • Show All 100 Lines • ▼ Show 20 Lines	int getArithmeticInstrCost(
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>(),		ArrayRef<const Value *> Args = ArrayRef<const Value *>(),
const Instruction *CxtI = nullptr);		const Instruction *CxtI = nullptr);
int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);		int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);
int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::TargetCostKind CostKind,		TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
int getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind);		int getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind);
int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);
int getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,		int getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
Show All 16 Lines

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

	Show First 20 Lines • Show All 873 Lines • ▼ Show 20 Lines
	int PPCTTIImpl::getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {			int PPCTTIImpl::getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {
	if (CostKind != TTI::TCK_RecipThroughput)			if (CostKind != TTI::TCK_RecipThroughput)
	return Opcode == Instruction::PHI ? 0 : 1;			return Opcode == Instruction::PHI ? 0 : 1;
	// Branches are assumed to be predicted.			// Branches are assumed to be predicted.
	return CostKind == TTI::TCK_RecipThroughput ? 0 : 1;			return CostKind == TTI::TCK_RecipThroughput ? 0 : 1;
	}			}

	int PPCTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,			int PPCTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
				TTI::CastContextHint CCH,
	TTI::TargetCostKind CostKind,			TTI::TargetCostKind CostKind,
	const Instruction *I) {			const Instruction *I) {
	assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode");			assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode");

	int Cost = BaseT::getCastInstrCost(Opcode, Dst, Src, CostKind, I);			int Cost = BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
	Cost = vectorCostAdjustment(Cost, Opcode, Dst, Src);			Cost = vectorCostAdjustment(Cost, Opcode, Dst, Src);
	// TODO: Allow non-throughput costs that aren't binary.			// TODO: Allow non-throughput costs that aren't binary.
	if (CostKind != TTI::TCK_RecipThroughput)			if (CostKind != TTI::TCK_RecipThroughput)
	return Cost == 0 ? 0 : 1;			return Cost == 0 ? 0 : 1;
	return Cost;			return Cost;
	}			}

	int PPCTTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,			int PPCTTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
	▲ Show 20 Lines • Show All 225 Lines • Show Last 20 Lines

llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h

Show First 20 Lines • Show All 87 Lines • ▼ Show 20 Lines	int getArithmeticInstrCost(
const Instruction *CxtI = nullptr);		const Instruction *CxtI = nullptr);
int getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp, int Index,		int getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp, int Index,
VectorType *SubTp);		VectorType *SubTp);
unsigned getVectorTruncCost(Type *SrcTy, Type *DstTy);		unsigned getVectorTruncCost(Type *SrcTy, Type *DstTy);
unsigned getVectorBitmaskConversionCost(Type *SrcTy, Type *DstTy);		unsigned getVectorBitmaskConversionCost(Type *SrcTy, Type *DstTy);
unsigned getBoolVecToIntConversionCost(unsigned Opcode, Type *Dst,		unsigned getBoolVecToIntConversionCost(unsigned Opcode, Type *Dst,
const Instruction *I);		const Instruction *I);
int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::TargetCostKind CostKind,		TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);
bool isFoldableLoad(const LoadInst *Ld, const Instruction *&FoldedValue);		bool isFoldableLoad(const LoadInst *Ld, const Instruction *&FoldedValue);
int getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,		int getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
unsigned AddressSpace, TTI::TargetCostKind CostKind,		unsigned AddressSpace, TTI::TargetCostKind CostKind,
Show All 16 Lines

llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp

Show First 20 Lines • Show All 693 Lines • ▼ Show 20 Lines	if (CmpOpTy != nullptr)
Cost = getVectorBitmaskConversionCost(CmpOpTy, Dst);		Cost = getVectorBitmaskConversionCost(CmpOpTy, Dst);
if (Opcode == Instruction::ZExt || Opcode == Instruction::UIToFP)		if (Opcode == Instruction::ZExt || Opcode == Instruction::UIToFP)
// One 'vn' per dst vector with an immediate mask.		// One 'vn' per dst vector with an immediate mask.
Cost += getNumVectorRegs(Dst);		Cost += getNumVectorRegs(Dst);
return Cost;		return Cost;
}		}

int SystemZTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int SystemZTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) {		const Instruction *I) {
// FIXME: Can the logic below also be used for these cost kinds?		// FIXME: Can the logic below also be used for these cost kinds?
if (CostKind == TTI::TCK_CodeSize || CostKind == TTI::TCK_SizeAndLatency) {		if (CostKind == TTI::TCK_CodeSize || CostKind == TTI::TCK_SizeAndLatency) {
int BaseCost = BaseT::getCastInstrCost(Opcode, Dst, Src, CostKind, I);		int BaseCost = BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
return BaseCost == 0 ? BaseCost : 1;		return BaseCost == 0 ? BaseCost : 1;
}		}

unsigned DstScalarBits = Dst->getScalarSizeInBits();		unsigned DstScalarBits = Dst->getScalarSizeInBits();
unsigned SrcScalarBits = Src->getScalarSizeInBits();		unsigned SrcScalarBits = Src->getScalarSizeInBits();

if (!Src->isVectorTy()) {		if (!Src->isVectorTy()) {
assert (!Dst->isVectorTy());		assert (!Dst->isVectorTy());
▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	if (Opcode == Instruction::SIToFP || Opcode == Instruction::UIToFP ||

if (SrcScalarBits == 1)		if (SrcScalarBits == 1)
return getBoolVecToIntConversionCost(Opcode, Dst, I) + NumDstVectors;		return getBoolVecToIntConversionCost(Opcode, Dst, I) + NumDstVectors;
}		}

// Return the cost of multiple scalar invocation plus the cost of		// Return the cost of multiple scalar invocation plus the cost of
// inserting and extracting the values. Base implementation does not		// inserting and extracting the values. Base implementation does not
// realize float->int gets scalarized.		// realize float->int gets scalarized.
unsigned ScalarCost = getCastInstrCost(Opcode, Dst->getScalarType(),		unsigned ScalarCost = getCastInstrCost(
Src->getScalarType(), CostKind);		Opcode, Dst->getScalarType(), Src->getScalarType(), CCH, CostKind);
unsigned TotCost = VF * ScalarCost;		unsigned TotCost = VF * ScalarCost;
bool NeedsInserts = true, NeedsExtracts = true;		bool NeedsInserts = true, NeedsExtracts = true;
// FP128 registers do not get inserted or extracted.		// FP128 registers do not get inserted or extracted.
if (DstScalarBits == 128 &&		if (DstScalarBits == 128 &&
(Opcode == Instruction::SIToFP || Opcode == Instruction::UIToFP))		(Opcode == Instruction::SIToFP || Opcode == Instruction::UIToFP))
NeedsInserts = false;		NeedsInserts = false;
if (SrcScalarBits == 128 &&		if (SrcScalarBits == 128 &&
(Opcode == Instruction::FPToSI || Opcode == Instruction::FPToUI))		(Opcode == Instruction::FPToSI || Opcode == Instruction::FPToUI))
Show All 24 Lines	if (Opcode == Instruction::FPExt) {
// scalarized.		// scalarized.
return VF * 2;		return VF * 2;
}		}
// -> fp128. VF * lxdb/lxeb + extraction of elements.		// -> fp128. VF * lxdb/lxeb + extraction of elements.
return VF + getScalarizationOverhead(SrcVecTy, false, true);		return VF + getScalarizationOverhead(SrcVecTy, false, true);
}		}
}		}

return BaseT::getCastInstrCost(Opcode, Dst, Src, CostKind, I);		return BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
}		}

// Scalar i8 / i16 operations will typically be made after first extending		// Scalar i8 / i16 operations will typically be made after first extending
// the operands to i32.		// the operands to i32.
static unsigned getOperandsExtensionCost(const Instruction *I) {		static unsigned getOperandsExtensionCost(const Instruction *I) {
unsigned ExtCost = 0;		unsigned ExtCost = 0;
for (Value *Op : I->operands())		for (Value *Op : I->operands())
// A load of i8 or i16 sign/zero extends to i32.		// A load of i8 or i16 sign/zero extends to i32.
▲ Show 20 Lines • Show All 340 Lines • Show Last 20 Lines
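The SystemZ fallback in getCastInstrCost prices a scalarized vector cast as VF scalar casts plus the cost of moving elements into and out of vector registers. A rough standalone sketch of that arithmetic follows. The function name and the per-element insert and extract costs are assumptions for illustration; the elided part of the hunk that computes the real overhead is not reproduced here.

```cpp
#include <cassert>

// Hypothetical helper: cost of a vector cast that gets scalarized.
// VF is the number of vector elements, ScalarCost the cost of one scalar cast.
// InsertCost and ExtractCost are assumed per-element costs.
unsigned scalarizedCastCost(unsigned VF, unsigned ScalarCost,
                            bool NeedsInserts, bool NeedsExtracts,
                            unsigned InsertCost = 1,
                            unsigned ExtractCost = 1) {
  unsigned Total = VF * ScalarCost; // one scalar cast per element
  if (NeedsInserts)
    Total += VF * InsertCost;       // rebuild the result vector
  if (NeedsExtracts)
    Total += VF * ExtractCost;      // pull each source element out
  return Total;
}
```

In the diff, `NeedsInserts` is cleared for integer-to-fp128 conversions and the matching flag is adjusted for fp128-to-integer conversions, because FP128 values are not inserted into or extracted from vector registers.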

llvm/lib/Target/X86/X86TargetTransformInfo.h

Show First 20 Lines • Show All 124 Lines • ▼ Show 20 Lines	int getArithmeticInstrCost(
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>(),		ArrayRef<const Value *> Args = ArrayRef<const Value *>(),
const Instruction *CxtI = nullptr);		const Instruction *CxtI = nullptr);
int getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp, int Index,		int getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp, int Index,
VectorType *SubTp);		VectorType *SubTp);
int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::TargetCostKind CostKind,		TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);
unsigned getScalarizationOverhead(VectorType *Ty, const APInt &DemandedElts,		unsigned getScalarizationOverhead(VectorType *Ty, const APInt &DemandedElts,
bool Insert, bool Extract);		bool Insert, bool Extract);
int getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,		int getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
▲ Show 20 Lines • Show All 113 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 1,361 Lines • ▼ Show 20 Lines	int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, VectorType *BaseTp,
if (ST->hasSSE1())		if (ST->hasSSE1())
if (const auto *Entry = CostTableLookup(SSE1ShuffleTbl, Kind, LT.second))		if (const auto *Entry = CostTableLookup(SSE1ShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

return BaseT::getShuffleCost(Kind, BaseTp, Index, SubTp);		return BaseT::getShuffleCost(Kind, BaseTp, Index, SubTp);
}		}

int X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) {		const Instruction *I) {
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");		assert(ISD && "Invalid opcode");

// TODO: Allow non-throughput costs that aren't binary.		// TODO: Allow non-throughput costs that aren't binary.
auto AdjustCost = [&CostKind](int Cost) {		auto AdjustCost = [&CostKind](int Cost) {
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
▲ Show 20 Lines • Show All 605 Lines • ▼ Show 20 Lines	if (const auto *Entry = ConvertCostTableLookup(SSE2ConversionTbl, ISD,
return AdjustCost(LTSrc.first * Entry->Cost);		return AdjustCost(LTSrc.first * Entry->Cost);
}		}

EVT SrcTy = TLI->getValueType(DL, Src);		EVT SrcTy = TLI->getValueType(DL, Src);
EVT DstTy = TLI->getValueType(DL, Dst);		EVT DstTy = TLI->getValueType(DL, Dst);

// The function getSimpleVT only handles simple value types.		// The function getSimpleVT only handles simple value types.
if (!SrcTy.isSimple() || !DstTy.isSimple())		if (!SrcTy.isSimple() || !DstTy.isSimple())
return AdjustCost(BaseT::getCastInstrCost(Opcode, Dst, Src, CostKind));		return AdjustCost(BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind));

MVT SimpleSrcTy = SrcTy.getSimpleVT();		MVT SimpleSrcTy = SrcTy.getSimpleVT();
MVT SimpleDstTy = DstTy.getSimpleVT();		MVT SimpleDstTy = DstTy.getSimpleVT();

if (ST->useAVX512Regs()) {		if (ST->useAVX512Regs()) {
if (ST->hasBWI())		if (ST->hasBWI())
if (const auto *Entry = ConvertCostTableLookup(AVX512BWConversionTbl, ISD,		if (const auto *Entry = ConvertCostTableLookup(AVX512BWConversionTbl, ISD,
SimpleDstTy, SimpleSrcTy))		SimpleDstTy, SimpleSrcTy))
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	int X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
}		}

if (ST->hasSSE2()) {		if (ST->hasSSE2()) {
if (const auto *Entry = ConvertCostTableLookup(SSE2ConversionTbl, ISD,		if (const auto *Entry = ConvertCostTableLookup(SSE2ConversionTbl, ISD,
SimpleDstTy, SimpleSrcTy))		SimpleDstTy, SimpleSrcTy))
return AdjustCost(Entry->Cost);		return AdjustCost(Entry->Cost);
}		}

return AdjustCost(BaseT::getCastInstrCost(Opcode, Dst, Src, CostKind, I));		return AdjustCost(
		BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));
}		}

int X86TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		int X86TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) {		const Instruction *I) {
// TODO: Handle other cost kinds.		// TODO: Handle other cost kinds.
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy, CostKind, I);		return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy, CostKind, I);
▲ Show 20 Lines • Show All 2,475 Lines • Show Last 20 Lines
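The target implementations in this diff share one pattern: look the cast up in a conversion cost table keyed by (opcode, destination type, source type), and fall back to the base implementation when no entry matches. The sketch below imitates that pattern in standalone C++. All names (`Op`, `VT`, `ConvCostEntry`, `lookupConvCost`, `castCost`) are hypothetical stand-ins rather than the LLVM API, and the four rows are copied from the AArch64 table shown earlier in this diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for ISD opcodes and MVT value types.
enum class Op { Truncate };
enum class VT { v4i16, v4i32, v4i64, v8i8, v8i32, v16i8, v16i32 };

struct ConvCostEntry {
  Op Opcode;
  VT Dst;
  VT Src;
  unsigned Cost;
};

// Linear search for an exact (opcode, dst, src) match; nullptr if none.
template <std::size_t N>
const ConvCostEntry *lookupConvCost(const ConvCostEntry (&Tbl)[N], Op Opcode,
                                    VT Dst, VT Src) {
  for (const ConvCostEntry &E : Tbl)
    if (E.Opcode == Opcode && E.Dst == Dst && E.Src == Src)
      return &E;
  return nullptr;
}

static const ConvCostEntry DemoTbl[] = {
    {Op::Truncate, VT::v4i16, VT::v4i32, 1},
    {Op::Truncate, VT::v4i32, VT::v4i64, 0},
    {Op::Truncate, VT::v8i8, VT::v8i32, 3},
    {Op::Truncate, VT::v16i8, VT::v16i32, 6},
};

// Table hit wins; otherwise use the caller-supplied base cost, which stands
// in for BaseT::getCastInstrCost(...).
unsigned castCost(Op Opcode, VT Dst, VT Src, unsigned BaseCost) {
  if (const ConvCostEntry *E = lookupConvCost(DemoTbl, Opcode, Dst, Src))
    return E->Cost;
  return BaseCost;
}
```

The change in this revision leaves the table hits alone and only threads the new `CCH` argument through the fallback call.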

llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp

Show First 20 Lines • Show All 2,019 Lines • ▼ Show 20 Lines	chainToBasePointerCost(SmallVectorImpl<Instruction*> &Chain,

for (Instruction *Instr : Chain) {		for (Instruction *Instr : Chain) {
if (CastInst *CI = dyn_cast<CastInst>(Instr)) {		if (CastInst *CI = dyn_cast<CastInst>(Instr)) {
assert(CI->isNoopCast(CI->getModule()->getDataLayout()) &&		assert(CI->isNoopCast(CI->getModule()->getDataLayout()) &&
"non noop cast is found during rematerialization");		"non noop cast is found during rematerialization");

Type *SrcTy = CI->getOperand(0)->getType();		Type *SrcTy = CI->getOperand(0)->getType();
Cost += TTI.getCastInstrCost(CI->getOpcode(), CI->getType(), SrcTy,		Cost += TTI.getCastInstrCost(CI->getOpcode(), CI->getType(), SrcTy,
TargetTransformInfo::TCK_SizeAndLatency,		TTI::getCastContextHint(CI),
CI);		TargetTransformInfo::TCK_SizeAndLatency, CI);

} else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Instr)) {		} else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Instr)) {
// Cost of the address calculation		// Cost of the address calculation
Type *ValTy = GEP->getSourceElementType();		Type *ValTy = GEP->getSourceElementType();
Cost += TTI.getAddressComputationCost(ValTy);		Cost += TTI.getAddressComputationCost(ValTy);

// And cost of the GEP itself		// And cost of the GEP itself
// TODO: Use TTI->getGEPCost here (it exists, but appears to be not		// TODO: Use TTI->getGEPCost here (it exists, but appears to be not
▲ Show 20 Lines • Show All 869 Lines • Show Last 20 Lines

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp

Show First 20 Lines • Show All 2,144 Lines • ▼ Show 20 Lines	case scZeroExtend:
break;		break;
case scSignExtend:		case scSignExtend:
Opcode = Instruction::SExt;		Opcode = Instruction::SExt;
break;		break;
default:		default:
llvm_unreachable("There are no other cast types.");		llvm_unreachable("There are no other cast types.");
}		}
const SCEV *Op = CastExpr->getOperand();		const SCEV *Op = CastExpr->getOperand();
BudgetRemaining -= TTI.getCastInstrCost(Opcode, /*Dst=*/S->getType(),		BudgetRemaining -= TTI.getCastInstrCost(
/*Src=*/Op->getType(), CostKind);		Opcode, /*Dst=*/S->getType(),
		/*Src=*/Op->getType(), TTI::CastContextHint::None, CostKind);
Worklist.emplace_back(Op);		Worklist.emplace_back(Op);
return false; // Will answer upon next entry into this function.		return false; // Will answer upon next entry into this function.
}		}

if (auto *UDivExpr = dyn_cast<SCEVUDivExpr>(S)) {		if (auto *UDivExpr = dyn_cast<SCEVUDivExpr>(S)) {
// If the divisor is a power of two count this as a logical right-shift.		// If the divisor is a power of two count this as a logical right-shift.
if (auto *SC = dyn_cast<SCEVConstant>(UDivExpr->getRHS())) {		if (auto *SC = dyn_cast<SCEVConstant>(UDivExpr->getRHS())) {
if (SC->getAPInt().isPowerOf2()) {		if (SC->getAPInt().isPowerOf2()) {

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


[earlier lines elided; hunk context: unsigned LoopVectorizationCostModel::getInstructionCost(Instruction *I,]
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
		// Computes the CastContextHint from a Load/Store instruction.
		auto ComputeCCH = [&](Instruction *I) -> TTI::CastContextHint {
		assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
		"Expected a load or a store!");

		if (VF == 1)
		return TTI::CastContextHint::Normal;

		switch (getWideningDecision(I, VF)) {
		case LoopVectorizationCostModel::CM_GatherScatter:
		return TTI::CastContextHint::GatherScatter;
		case LoopVectorizationCostModel::CM_Interleave:
		return TTI::CastContextHint::Interleave;
		case LoopVectorizationCostModel::CM_Scalarize:
		case LoopVectorizationCostModel::CM_Widen:
		return Legal->isMaskRequired(I) ? TTI::CastContextHint::Masked
		: TTI::CastContextHint::Normal;
		case LoopVectorizationCostModel::CM_Widen_Reverse:
		return TTI::CastContextHint::Reversed;
		case LoopVectorizationCostModel::CM_Unknown:
		llvm_unreachable("Instr did not go through cost modelling?");
		[Inline comment by dmgreen (author), marked done] If VF == 1 and the operand is a load/store, this should likely use Normal.
		}
		[Inline comment by dmgreen (author), marked done] You either don't need unsigned Opcode = I->getOpcode(), or the variable can be used more throughout this function?

		llvm_unreachable("Unhandled case!");
		};

		unsigned Opcode = I->getOpcode();
		TTI::CastContextHint CCH = TTI::CastContextHint::None;
		// For Trunc, the context is the only user, which must be a StoreInst.
		if (Opcode == Instruction::Trunc || Opcode == Instruction::FPTrunc) {
		if (I->hasOneUse())
		if (StoreInst *Store = dyn_cast<StoreInst>(*I->user_begin()))
		CCH = ComputeCCH(Store);
		}
		// For Z/Sext, the context is the operand, which must be a LoadInst.
		else if (Opcode == Instruction::ZExt || Opcode == Instruction::SExt ||
		Opcode == Instruction::FPExt) {
		if (LoadInst *Load = dyn_cast<LoadInst>(I->getOperand(0)))
		CCH = ComputeCCH(Load);
		}

// We optimize the truncation of induction variables having constant		// We optimize the truncation of induction variables having constant
// integer steps. The cost of these truncations is the same as the scalar		// integer steps. The cost of these truncations is the same as the scalar
// operation.		// operation.
if (isOptimizableIVTruncate(I, VF)) {		if (isOptimizableIVTruncate(I, VF)) {
auto *Trunc = cast<TruncInst>(I);		auto *Trunc = cast<TruncInst>(I);
return TTI.getCastInstrCost(Instruction::Trunc, Trunc->getDestTy(),		return TTI.getCastInstrCost(Instruction::Trunc, Trunc->getDestTy(),
Trunc->getSrcTy(), CostKind, Trunc);		Trunc->getSrcTy(), CCH, CostKind, Trunc);
}		}

Type *SrcScalarTy = I->getOperand(0)->getType();		Type *SrcScalarTy = I->getOperand(0)->getType();
Type *SrcVecTy =		Type *SrcVecTy =
VectorTy->isVectorTy() ? ToVectorTy(SrcScalarTy, VF) : SrcScalarTy;		VectorTy->isVectorTy() ? ToVectorTy(SrcScalarTy, VF) : SrcScalarTy;
if (canTruncateToMinimalBitwidth(I, VF)) {		if (canTruncateToMinimalBitwidth(I, VF)) {
// This cast is going to be shrunk. This may remove the cast or it might		// This cast is going to be shrunk. This may remove the cast or it might
// turn it into slightly different cast. For example, if MinBW == 16,		// turn it into slightly different cast. For example, if MinBW == 16,
// "zext i8 %1 to i32" becomes "zext i8 %1 to i16".		// "zext i8 %1 to i32" becomes "zext i8 %1 to i16".
//		//
// Calculate the modified src and dest types.		// Calculate the modified src and dest types.
Type *MinVecTy = VectorTy;		Type *MinVecTy = VectorTy;
if (I->getOpcode() == Instruction::Trunc) {		if (Opcode == Instruction::Trunc) {
SrcVecTy = smallestIntegerVectorType(SrcVecTy, MinVecTy);		SrcVecTy = smallestIntegerVectorType(SrcVecTy, MinVecTy);
VectorTy =		VectorTy =
largestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);		largestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);
} else if (I->getOpcode() == Instruction::ZExt ||		} else if (Opcode == Instruction::ZExt || Opcode == Instruction::SExt) {
I->getOpcode() == Instruction::SExt) {
SrcVecTy = largestIntegerVectorType(SrcVecTy, MinVecTy);		SrcVecTy = largestIntegerVectorType(SrcVecTy, MinVecTy);
VectorTy =		VectorTy =
smallestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);		smallestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);
}		}
}		}

unsigned N = isScalarAfterVectorization(I, VF) ? VF : 1;		unsigned N = isScalarAfterVectorization(I, VF) ? VF : 1;
return N * TTI.getCastInstrCost(I->getOpcode(), VectorTy, SrcVecTy,		return N *
CostKind, I);		TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
}		}
case Instruction::Call: {		case Instruction::Call: {
bool NeedToScalarize;		bool NeedToScalarize;
CallInst *CI = cast<CallInst>(I);		CallInst *CI = cast<CallInst>(I);
unsigned CallCost = getVectorCallCost(CI, VF, NeedToScalarize);		unsigned CallCost = getVectorCallCost(CI, VF, NeedToScalarize);
if (getVectorIntrinsicIDForCall(CI, TLI))		if (getVectorIntrinsicIDForCall(CI, TLI))
return std::min(CallCost, getVectorIntrinsicCost(CI, VF));		return std::min(CallCost, getVectorIntrinsicCost(CI, VF));
return CallCost;		return CallCost;
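
Note: the ComputeCCH lambda in this hunk maps the widening decision recorded for the neighbouring load/store to a cast context hint. The mapping can be sketched in isolation as below. This is a minimal illustration, not the LLVM code: the two enums and the function computeCastContextHint are hypothetical stand-ins that only mirror the names appearing in the diff, and MaskRequired stands in for the Legal->isMaskRequired(I) query.

```cpp
#include <cassert>

// Hypothetical stand-ins that mirror the names used in the diff.
// They are NOT the real LLVM declarations.
enum class CastContextHint {
  None, Normal, Masked, GatherScatter, Interleave, Reversed
};
enum class WideningDecision {
  Unknown, Widen, WidenReverse, Interleave, GatherScatter, Scalarize
};

// Maps the widening decision of the load/store next to a cast to a hint.
// MaskRequired models the result of Legal->isMaskRequired(I).
inline CastContextHint computeCastContextHint(unsigned VF,
                                              WideningDecision Decision,
                                              bool MaskRequired) {
  // A scalar loop (VF == 1) keeps ordinary loads and stores.
  if (VF == 1)
    return CastContextHint::Normal;

  switch (Decision) {
  case WideningDecision::GatherScatter:
    return CastContextHint::GatherScatter;
  case WideningDecision::Interleave:
    return CastContextHint::Interleave;
  case WideningDecision::Scalarize:
  case WideningDecision::Widen:
    return MaskRequired ? CastContextHint::Masked : CastContextHint::Normal;
  case WideningDecision::WidenReverse:
    return CastContextHint::Reversed;
  case WideningDecision::Unknown:
    // The memory access was never cost-modelled; treat it as a logic error.
    assert(false && "Instr did not go through cost modelling?");
    break;
  }
  return CastContextHint::None;
}
```

Under these assumptions, a widened access that needs a mask at VF 4 yields Masked, while any access at VF 1 yields Normal regardless of the decision.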

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


[earlier lines elided; hunk context: case Instruction::ExtractElement: {]
all_of(Ext->users(),		all_of(Ext->users(),
[](User *U) { return isa<GetElementPtrInst>(U); })) {		[](User *U) { return isa<GetElementPtrInst>(U); })) {
// Use getExtractWithExtendCost() to calculate the cost of		// Use getExtractWithExtendCost() to calculate the cost of
// extractelement/ext pair.		// extractelement/ext pair.
DeadCost -= TTI->getExtractWithExtendCost(		DeadCost -= TTI->getExtractWithExtendCost(
Ext->getOpcode(), Ext->getType(), VecTy, i);		Ext->getOpcode(), Ext->getType(), VecTy, i);
// Add back the cost of s|zext which is subtracted separately.		// Add back the cost of s|zext which is subtracted separately.
DeadCost += TTI->getCastInstrCost(		DeadCost += TTI->getCastInstrCost(
Ext->getOpcode(), Ext->getType(), E->getType(), CostKind,		Ext->getOpcode(), Ext->getType(), E->getType(),
Ext);		TTI::getCastContextHint(Ext), CostKind, Ext);
continue;		continue;
}		}
}		}
DeadCost -=		DeadCost -=
TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, i);		TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, i);
}		}
}		}
return DeadCost;		return DeadCost;
}		}
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::SExt:		case Instruction::SExt:
case Instruction::FPToUI:		case Instruction::FPToUI:
case Instruction::FPToSI:		case Instruction::FPToSI:
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
Type *SrcTy = VL0->getOperand(0)->getType();		Type *SrcTy = VL0->getOperand(0)->getType();
int ScalarEltCost =		int ScalarEltCost =
TTI->getCastInstrCost(E->getOpcode(), ScalarTy, SrcTy, CostKind,		TTI->getCastInstrCost(E->getOpcode(), ScalarTy, SrcTy,
VL0);		TTI::getCastContextHint(VL0), CostKind, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;		ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) * ScalarEltCost;
}		}

// Calculate the cost of this instruction.		// Calculate the cost of this instruction.
int ScalarCost = VL.size() * ScalarEltCost;		int ScalarCost = VL.size() * ScalarEltCost;

auto *SrcVecTy = FixedVectorType::get(SrcTy, VL.size());		auto *SrcVecTy = FixedVectorType::get(SrcTy, VL.size());
int VecCost = 0;		int VecCost = 0;
// Check if the values are candidates to demote.		// Check if the values are candidates to demote.
if (!MinBWs.count(VL0) || VecTy != SrcVecTy) {		if (!MinBWs.count(VL0) || VecTy != SrcVecTy) {
VecCost = ReuseShuffleCost +		VecCost =
		ReuseShuffleCost +
TTI->getCastInstrCost(E->getOpcode(), VecTy, SrcVecTy,		TTI->getCastInstrCost(E->getOpcode(), VecTy, SrcVecTy,
CostKind, VL0);		TTI::getCastContextHint(VL0), CostKind, VL0);
}		}
return VecCost - ScalarCost;		return VecCost - ScalarCost;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::Select: {		case Instruction::Select: {
// Calculate the cost of this instruction.		// Calculate the cost of this instruction.
int ScalarEltCost = TTI->getCmpSelInstrCost(E->getOpcode(), ScalarTy,		int ScalarEltCost = TTI->getCmpSelInstrCost(E->getOpcode(), ScalarTy,
[lines elided; hunk context: case Instruction::ShuffleVector: {]
VecCost += TTI->getArithmeticInstrCost(E->getAltOpcode(), VecTy,		VecCost += TTI->getArithmeticInstrCost(E->getAltOpcode(), VecTy,
CostKind);		CostKind);
} else {		} else {
Type *Src0SclTy = E->getMainOp()->getOperand(0)->getType();		Type *Src0SclTy = E->getMainOp()->getOperand(0)->getType();
Type *Src1SclTy = E->getAltOp()->getOperand(0)->getType();		Type *Src1SclTy = E->getAltOp()->getOperand(0)->getType();
auto *Src0Ty = FixedVectorType::get(Src0SclTy, VL.size());		auto *Src0Ty = FixedVectorType::get(Src0SclTy, VL.size());
auto *Src1Ty = FixedVectorType::get(Src1SclTy, VL.size());		auto *Src1Ty = FixedVectorType::get(Src1SclTy, VL.size());
VecCost = TTI->getCastInstrCost(E->getOpcode(), VecTy, Src0Ty,		VecCost = TTI->getCastInstrCost(E->getOpcode(), VecTy, Src0Ty,
CostKind);		TTI::CastContextHint::None, CostKind);
VecCost += TTI->getCastInstrCost(E->getAltOpcode(), VecTy, Src1Ty,		VecCost += TTI->getCastInstrCost(E->getAltOpcode(), VecTy, Src1Ty,
CostKind);		TTI::CastContextHint::None, CostKind);
}		}
VecCost += TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, 0);		VecCost += TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, 0);
return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
default:		default:
llvm_unreachable("Unknown instruction");		llvm_unreachable("Unknown instruction");
}		}
}		}
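
Note: the SLP call sites above derive the hint from the existing scalar cast via TTI::getCastContextHint, following the rule stated in the LoopVectorize comments: for an extend the context is its operand (a load), and for a truncate it is its single user (a store). The sketch below illustrates that rule on a toy instruction model. Everything in it (Opcode, Hint, Inst, castContextHintFor) is hypothetical and invented for illustration; it does not reproduce the helper added by this revision, and it only distinguishes None from Normal.

```cpp
#include <vector>

// Toy model, invented for illustration only; not LLVM's IR classes.
enum class Opcode { Load, Store, ZExt, SExt, FPExt, Trunc, FPTrunc, Add };
enum class Hint { None, Normal };

struct Inst {
  Opcode Op;
  const Inst *Operand = nullptr;    // first operand, if it is an instruction
  std::vector<const Inst *> Users;  // instructions that use this value
};

// Classifies a cast by the memory access it is attached to, if any.
inline Hint castContextHintFor(const Inst *I) {
  if (!I)
    return Hint::None;

  switch (I->Op) {
  case Opcode::ZExt:
  case Opcode::SExt:
  case Opcode::FPExt:
    // An extend is contextualised by the load that feeds it.
    if (I->Operand && I->Operand->Op == Opcode::Load)
      return Hint::Normal;
    return Hint::None;
  case Opcode::Trunc:
  case Opcode::FPTrunc:
    // A truncate is contextualised by its only user, when that is a store.
    if (I->Users.size() == 1 && I->Users.front()->Op == Opcode::Store)
      return Hint::Normal;
    return Hint::None;
  default:
    // Other casts (bitcast, inttoptr, ...) carry no load/store context.
    return Hint::None;
  }
}
```

In this model a sext fed by a load and a trunc whose only user is a store both classify as Normal, while an extend of an arithmetic result gets None, which matches the explicit TTI::CastContextHint::None passed at call sites that have no instruction to inspect.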


Diff 281536
