This is an archive of the discontinued LLVM Phabricator instance.

Add generic IR vector reductions
ClosedPublic

Authored by aemerson on Feb 17 2017, 4:00 AM.

Details

Summary

This patch adds IR intrinsics for horizontal vector reductions and allows targets to opt in to using them instead of the log2 shufflevector algorithm.

The reductions added so far are: int add, mul, and, or, xor, [s|u]min, [s|u]max, fp add, fmax, fmin.

The common shuffle-reduction code in the SLP and Loop vectorizers has been factored out into LoopUtils, so both now use a unified interface for generating reductions regardless of the target's preference. LoopUtils uses TTI to determine what kind of reductions the target wants to handle.

For CodeGen, basic legalization support is added. I have a follow-up patch to begin using these new representations for AArch64 NEON once this implementation is finalised.
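As an illustrative sketch (not part of the diff itself), an integer add reduction in the intrinsic form could look like the following, using the llvm.experimental.* naming the intrinsics end up with later in this review; the type-mangling suffix is an assumption for illustration only.

```llvm
; Sketch only: intrinsic name/mangling as assumed above, not quoted from the patch.
declare i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32>)

define i32 @reduce_add(<4 x i32> %v) {
  ; Horizontal reduction of all four lanes into a single scalar.
  %r = call i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32> %v)
  ret i32 %r
}
```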

Diff Detail

Repository
rL LLVM

Event Timeline

aemerson created this revision.Feb 17 2017, 4:00 AM
mehdi_amini added inline comments.Feb 17 2017, 9:52 AM
docs/LangRef.rst
11620 ↗(On Diff #88870)

Why is this specification of an implementation detail here?

11655 ↗(On Diff #88870)

I think it'd be important to clarify the ordering: do we require a "fast" FMF to change it? If not, why?

11767 ↗(On Diff #88870)

Why the explicit <scalar-type> in the signature?

11810 ↗(On Diff #88870)

There is no second operand in the prototype (and we have FMF for this purpose, so remove the sentence).

mkuper edited edge metadata.Feb 17 2017, 5:07 PM

I'm not sure this is the right strategy for deciding when and how to produce an intrinsic vs. shuffle sequence.

There are two related issues:

  1. The normal behavior for non-experimental target-independent intrinsics is that targets *must* support them. It's ok if the "fallback" way to support them is to lower them back to the "brute-force" shuffle pyramid, and that fallback gets used by all targets that don't have a more direct way to support it. But it still has to be supported somehow. Basically, the way I understand the general philosophy is - it's up to the optimizer to ask the target which constructs are cheap and which are expensive. But if the target is handed an expensive construct, it still has to lower it, it's just that the lowering is allowed to be really bad.
  2. Will this now be the canonical form for reductions? E.g. should instcombine match the shuffle pyramid sequence into this? If the answer is yes, then the generic lowering for the intrinsic should be as good as lowering for the shuffle pyramid. That doesn't seem too hard, since that's the natural lowering, but that would put a bound on how bad the generic lowering is allowed to be. That would also mean we won't have createTargetReduction() - all of that logic would move into lowering.

I can see some advantages to forming the intrinsic conditionally, but I'm not sure they're worth the cost of not having a canonical representation.
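For reference, the "brute-force" shuffle pyramid mentioned in point 1 is the usual log2 sequence the vectorizers emit today; a sketch for a <4 x i32> add reduction (value names are hypothetical):

```llvm
define i32 @reduce_add_shuffles(<4 x i32> %v) {
  ; Step 1: add the upper half of the vector onto the lower half.
  %s1 = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  %a1 = add <4 x i32> %v, %s1
  ; Step 2: add the remaining pair.
  %s2 = shufflevector <4 x i32> %a1, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %a2 = add <4 x i32> %a1, %s2
  ; The reduced value ends up in lane 0.
  %r = extractelement <4 x i32> %a2, i32 0
  ret i32 %r
}
```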

docs/LangRef.rst
11630 ↗(On Diff #88870)

A bit of bikeshedding: we usually use a slightly different format - e.g. something like LangRef.html#add-instruction
(In particular, it's nice to more explicitly state that the return type is an integer type, etc.)

11687 ↗(On Diff #88870)

Integer types only, presumably?

11708 ↗(On Diff #88870)

I believe EOR is XOR in non-ARM land. :-)

11811 ↗(On Diff #88870)

What are the semantics for NaNs, when they are present? Also, what are the semantics for signed zeroes?

delena edited edge metadata.Feb 18 2017, 11:15 PM

According to the code that I see, all reduction intrinsics have the same form and the same handling.
IMO, llvm.vector.reduce(<metadata>, <vector>) would cover all aspects of FP modes, and integer signed-unsigned variants.
The <metadata> class can include opcode and "properties".
I know that it was discussed earlier, but another option is providing the full set for FP and the full set for integers - all *SInt* and *UInt* variants.

I don't think that providing FP properties as additional boolean parameters is a good solution.

docs/LangRef.rst
11655 ↗(On Diff #88870)

I think that in the case of FP operations we need two intrinsics - "ordered" and "fast".

11832 ↗(On Diff #88870)

Usually, we give code examples in the documentation.

include/llvm/Analysis/TargetTransformInfo.h
715 ↗(On Diff #88870)

What does IsMaxOp mean? I think it would be better to split this interface into several functions and not mix FP and integer types together.

mehdi_amini edited edge metadata.Feb 19 2017, 10:25 AM

According to the code that I see, all reduction intrinsics have the same form and the same handling.
IMO, llvm.vector.reduce(<metadata>, <vector>) would cover all aspects of FP modes, and integer signed-unsigned variants.
The <metadata> class can include opcode and "properties".

What is the advantage of this?

I mean, we don't express our scalar instructions as binop <metadata>, op1, op2, do we?

mehdi_amini added inline comments.Feb 19 2017, 10:27 AM
docs/LangRef.rst
11655 ↗(On Diff #88870)

Why not reuse the existing FMF? The intrinsic without FMF would be equivalent to "ordered", and the "fast" FMF flag would allow any ordering.

mkuper added inline comments.Feb 19 2017, 1:06 PM
docs/LangRef.rst
11655 ↗(On Diff #88870)

I don't think we can currently attach FMF to CallInsts.

aemerson marked 3 inline comments as done.Feb 20 2017, 3:09 AM

I'm not sure this is the right strategy for deciding when and how to produce an intrinsic vs. shuffle sequence.

There are two related issues:

  1. The normal behavior for non-experimental target-independent intrinsics is that targets *must* support them. It's ok if the "fallback" way to support them is to lower them back to the "brute-force" shuffle pyramid, and that fallback gets used by all targets that don't have a more direct way to support it. But it still has to be supported somehow. Basically, the way I understand the general philosophy is - it's up to the optimizer to ask the target which constructs are cheap and which are expensive. But if the target is handed an expensive construct, it still has to lower it, it's just that the lowering is allowed to be really bad.
  2. Will this now be the canonical form for reductions? E.g. should instcombine match the shuffle pyramid sequence into this? If the answer is yes, then the generic lowering for the intrinsic should be as good as lowering for the shuffle pyramid. That doesn't seem too hard, since that's the natural lowering, but that would put a bound on how bad the generic lowering is allowed to be. That would also mean we won't have createTargetReduction() - all of that logic would move into lowering.

I can see some advantages to forming the intrinsic conditionally, but I'm not sure they're worth the cost of not having a canonical representation.

This patch was intended to be a transitional stage; my expectation was that over time targets would decide to move to this representation by opting in and handling them. The reason for the transitional step was that I didn't want to force this form on all targets (even if it lowers to the same code via a lowering step), as it changes the IR during the mid-end stage. At the moment, the only target which requires this intrinsic form without any alternative is SVE. If you think it's ok to change the reduction form for all targets, with a late lowering, then I'm happy to implement that.

docs/LangRef.rst
11620 ↗(On Diff #88870)

I was trying to give some explanation of why there may be two different representations of reductions depending on the target. I can remove this.

11655 ↗(On Diff #88870)

Thanks, I hadn't realised CallInsts supported FMFs. It makes some sense to use them for the ordered/unordered distinction, but what are your thoughts on the extra scalar accumulator operand for ordered reductions? An undef scalar value could be given for fast reductions, but in my view it clutters the intrinsic for the vast majority of reductions, though it's not a deal breaker.

11687 ↗(On Diff #88870)

Yes.

11767 ↗(On Diff #88870)

This is the syntax of the declaration in the IR.

include/llvm/Analysis/TargetTransformInfo.h
715 ↗(On Diff #88870)

It signals to the function whether the reduction is a min or max operation, if the opcode is a CMP. I'll look at splitting this up.

mehdi_amini added inline comments.Feb 20 2017, 10:25 AM
docs/LangRef.rst
11655 ↗(On Diff #88870)

I'm not sure about the pros/cons of the undef scalar thing.

mkuper added a comment.

I can see some advantages to forming the intrinsic conditionally, but I'm not sure they're worth the cost of not having a canonical representation.

This patch was intended to be a transitional stage; my expectation was that over time targets would decide to move to this representation by opting in and handling them. The reason for the transitional step was that I didn't want to force this form on all targets (even if it lowers to the same code via a lowering step), as it changes the IR during the mid-end stage. At the moment, the only target which requires this intrinsic form without any alternative is SVE. If you think it's ok to change the reduction form for all targets, with a late lowering, then I'm happy to implement that.

What I'd really like to avoid is having a nominally "target-independent" intrinsic, which can, in practice, only be handled by one or two targets.

There are three ways I can see this going:
a) If we know this is not the right generic representation, this should not go in as a target-independent intrinsic. I haven't seen anyone raise any "deal-breaker" issues so far, though.
b) If we think this is the right generic representation, but we're not sure, we can introduce these intrinsics as llvm.experimental. For an experimental intrinsic, I think it makes more sense to have targets opt-in like this patch does (but we probably still want default lowering for targets that don't opt-in).
c) If we are sure about this, then we should just do it for all targets, and have late lowering by default. I'm not really the right person to determine that, though. I guess that's a question for the backend owners.

rengolin edited edge metadata.Feb 21 2017, 4:42 AM

What I'd really like to avoid is having a nominally "target-independent" intrinsic, which can, in practice, only be handled by one or two targets.

Agreed.

a) If we know this is not the right generic representation, this should not go in as a target-independent intrinsic. I haven't seen anyone raise any "deal-breaker" issues so far, though.

This seems to tick all the boxes, and has been reviewed for both ARM back-ends (including the future SVE) and x86 (including the future AVX1024).

We should really have people from other targets confirm this makes sense, but at this point, I think the changes will be minimal, since any architecture-specific change can be done at the lowering stage in their own back-ends.

b) If we think this is the right generic representation, but we're not sure, we can introduce these intrinsics as llvm.experimental. For an experimental intrinsic, I think it makes more sense to have targets opt-in like this patch does (but we probably still want default lowering for targets that don't opt-in).

I think, for now, this is the best thing to do.

The log2 shuffle pattern is canonical and it already works well with the "lower how you can" premise. It may be cumbersome for AVX512 (a long pyramid), but it's not unbearable. It will become unbearable for AVX-1024 and impossible for SVE to use the same pattern, so those targets can begin using the intrinsics on the side, and then we can extend them to the remaining targets at their own pace.

The alternative is to have an SVE/AVX-specific intrinsic, which means the same thing as the pyramid and doesn't need to be handled by other targets. Clang (via intrinsics) would generate them, as would the vectorisers, and they would be completely opaque to any other pass (as external function calls). But this would make it harder to extend support to the other, smaller vector sizes (renames, tests, etc.) if we want to move it to the canonical form.

I personally think that this (concept) is a good representation of a horizontal reduction, so could easily be a canonical form one day. The particular shape may need to change, but that's orthogonal.

c) If we are sure about this, then we should just do it for all targets, and have late lowering by default. I'm not really the right person to determine that, though. I guess that's a question for the backend owners.

If we make it experimental, we don't need to push any target to support it ASAP. Furthermore, both SVE and AVX-1024 are themselves "experimental", and this change is mostly for their benefit, so I don't see why this shouldn't be experimental at this stage. It'll also give us time to make changes, and other targets time to adapt, if needed.

aemerson marked an inline comment as done.Feb 22 2017, 2:48 AM

I can see some advantages to forming the intrinsic conditionally, but I'm not sure they're worth the cost of not having a canonical representation.

This patch was intended to be a transitional stage; my expectation was that over time targets would decide to move to this representation by opting in and handling them. The reason for the transitional step was that I didn't want to force this form on all targets (even if it lowers to the same code via a lowering step), as it changes the IR during the mid-end stage. At the moment, the only target which requires this intrinsic form without any alternative is SVE. If you think it's ok to change the reduction form for all targets, with a late lowering, then I'm happy to implement that.

What I'd really like to avoid is having a nominally "target-independent" intrinsic, which can, in practice, only be handled by one or two targets.

There are three ways I can see this going:
a) If we know this is not the right generic representation, this should not go in as a target-independent intrinsic. I haven't seen anyone raise any "deal-breaker" issues so far, though.
b) If we think this is the right generic representation, but we're not sure, we can introduce these intrinsics as llvm.experimental. For an experimental intrinsic, I think it makes more sense to have targets opt-in like this patch does (but we probably still want default lowering for targets that don't opt-in).
c) If we are sure about this, then we should just do it for all targets, and have late lowering by default. I'm not really the right person to determine that, though. I guess that's a question for the backend owners.

! In D30086#682110, @rengolin wrote:

! In D30086#681823, @mkuper wrote:

What I'd really like to avoid is having a nominally "target-independent" intrinsic, which can, in practice, only be handled by one or two targets.

Agreed.

a) If we know this is not the right generic representation, this should not go in as a target-independent intrinsic. I haven't seen anyone raise any "deal-breaker" issues so far, though.

This seems to tick all the boxes, and has been reviewed for both ARM back-ends (including the future SVE) and x86 (including the future AVX1024).

We should really have people from other targets confirm this makes sense, but at this point, I think the changes will be minimal, since any architecture-specific change can be done at the lowering stage in their own back-ends.

b) If we think this is the right generic representation, but we're not sure, we can introduce these intrinsics as llvm.experimental. For an experimental intrinsic, I think it makes more sense to have targets opt-in like this patch does (but we probably still want default lowering for targets that don't opt-in).

I think, for now, this is the best thing to do.

The log2 shuffle pattern is canonical and it already works well with the "lower how you can" premise. It may be cumbersome for AVX512 (a long pyramid), but it's not unbearable. It will become unbearable for AVX-1024 and impossible for SVE to use the same pattern, so those targets can begin using the intrinsics on the side, and then we can extend them to the remaining targets at their own pace.

The alternative is to have an SVE/AVX-specific intrinsic, which means the same thing as the pyramid and doesn't need to be handled by other targets. Clang (via intrinsics) would generate them, as would the vectorisers, and they would be completely opaque to any other pass (as external function calls). But this would make it harder to extend support to the other, smaller vector sizes (renames, tests, etc.) if we want to move it to the canonical form.

I personally think that this (concept) is a good representation of a horizontal reduction, so could easily be a canonical form one day. The particular shape may need to change, but that's orthogonal.

c) If we are sure about this, then we should just do it for all targets, and have late lowering by default. I'm not really the right person to determine that, though. I guess that's a question for the backend owners.

If we make it experimental, we don't need to push any target to support it ASAP. Furthermore, both SVE and AVX-1024 are themselves "experimental", and this change is mostly for their benefit, so I don't see why this shouldn't be experimental at this stage. It'll also give us time to make changes, and other targets time to adapt, if needed.

Ok, so for now I'll make the changes to address the review comments, but I won't be moving to a late lowering solution (which I think is preferable, but if these are experimental then we can't do that).

On the ordered intrinsics issue: what I propose is that we keep the intrinsics separate for the sake of cleaner IR for the usual fast reductions, so the scalar argument isn't needed there. As these are now experimental intrinsics, we can change the implementation later if necessary.

Ok, so for now I'll make the changes to address the review comments, but I won't be moving to a late lowering solution (which I think is preferable, but if these are experimental then we can't do that).

I think we need late lowering support, in addition to the opt-in, even if the intrinsics are experimental. Without it, it's kind of hard to "experiment" with them. :-)
E.g. we'd like to enable them on x86, and see what breaks.

fhahn added a subscriber: fhahn.Feb 23 2017, 2:01 AM
aemerson updated this revision to Diff 95795.Apr 19 2017, 11:58 AM

I've finally got around to re-working this. Change summary:

  • Addressed langref comments.
  • Intrinsics are now experimental.
  • The TTI interface has been split into two to give the min/max reductions a separate interface. I haven't split out the FP from int types because it doesn't really seem to give any benefit as the handling is very similar.
  • A new SDNode type has been added to support fast-math flags on unary op nodes. Previously this was only supported for binary ops.
  • The intrinsics for FP add/mul now take an additional scalar operand, which should be undef if the reduction is a conventional "fast" reduction; otherwise it is the accumulator value for strict, i.e. ordered, reductions. CallInst fast-math flags are used to determine which semantics are wanted (see the sketch after this list).
  • The fmin/fmax intrinsics now use the fast-math flags to propagate NoNaN.
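As a sketch of what the last two points mean in IR (type-mangling suffixes and value names are illustrative assumptions, not quoted from the patch):

```llvm
declare float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float, <4 x float>)
declare float @llvm.experimental.vector.reduce.fmin.f32.v4f32(<4 x float>)

define float @fp_reductions(<4 x float> %v, float %acc) {
  ; Strict (ordered) reduction: no fast-math flags, so the scalar operand is a real accumulator.
  %ord = call float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float %acc, <4 x float> %v)
  ; Relaxed reduction: 'fast' permits reassociation, so the accumulator operand is undef.
  %rlx = call fast float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float undef, <4 x float> %v)
  ; fmin reduction with the nnan flag, which is what gets propagated as NoNaN.
  %min = call nnan float @llvm.experimental.vector.reduce.fmin.f32.v4f32(<4 x float> %v)
  ; Combine the results so all three calls are used.
  %t = fadd float %ord, %rlx
  %r = fadd float %t, %min
  ret float %r
}
```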

The separate lowering pass for these will come in a subsequent patch. For AArch64, however, my intention is to begin using these directly, without expansion.

delena added inline comments.Apr 23 2017, 11:38 PM
lib/Transforms/Utils/LoopUtils.cpp
1179 ↗(On Diff #95795)

the "else" is redundant.

1191 ↗(On Diff #95795)

You implemented 2 functions with the same name. It's not clear what the difference between them is. Please add some comments before each of them.

1281 ↗(On Diff #95795)

I'd rather combine useMinMaxReductionIntrinsic and useReductionIntrinsic into one and provide 3 parameters: Type, Op and Flags.

aemerson marked 3 inline comments as done.Apr 24 2017, 7:23 AM
aemerson added inline comments.
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

I've added some comments to the API in the header, but I'll flesh those out more and add some short ones here too.

rengolin added inline comments.Apr 24 2017, 8:25 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1595 ↗(On Diff #95795)

Wouldn't it be better for this assert to check the type of the operand?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
1 ↗(On Diff #95795)

unnecessary whitespace change

lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
352 ↗(On Diff #95795)

format?

lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

I think they're different enough that they deserve different names. Normally, you should use the same name if the arguments are different but the functionality is the same. Here, the implementation and the logic are so wildly different that naming them the same thing could be misleading.

I haven't looked deeply enough to know if there is SLP logic here that could have remained in the SLP vectorizer's code, or if you could separate the implementation so that one is just a wrapper around the other. That would make it clearer, and probably easier to maintain.

aemerson marked 5 inline comments as done.Apr 24 2017, 8:43 AM
aemerson added inline comments.
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

I'll rename the functions, but I'd argue that the two functions are intended to have the same general functionality: creating vector reductions. The difference is that one of them requires a recurrence descriptor, which some clients may not have (SLP vs LV), and the other is a simpler API at the cost of not being able to generate min/max reductions.

The SLP's pairwise reduction code is different to the normal reduction shuffle IR sequence, which is why I left that there; there aren't any other users of that code to justify refactoring it yet.
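For contrast with the log2 "splitting" sequence sketched earlier in this review, a pairwise add reduction of <4 x i32> looks roughly like this (a sketch, not code from the patch):

```llvm
define i32 @reduce_add_pairwise(<4 x i32> %v) {
  ; Step 1: split even/odd lanes and add them pairwise.
  %l1 = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
  %r1 = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  %a1 = add <4 x i32> %l1, %r1
  ; Step 2: repeat on the two partial sums.
  %l2 = shufflevector <4 x i32> %a1, <4 x i32> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  %r2 = shufflevector <4 x i32> %a1, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %a2 = add <4 x i32> %l2, %r2
  %res = extractelement <4 x i32> %a2, i32 0
  ret i32 %res
}
```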

rengolin added inline comments.Apr 24 2017, 8:52 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

That's why I said this smells like SLP/LV are leaking code here... :)

aemerson marked an inline comment as done.Apr 24 2017, 8:57 AM
aemerson added inline comments.
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

It's not leaking; part of the purpose of this patch is to factor out common code from clients emitting identical reduction code.

rengolin added inline comments.Apr 24 2017, 8:59 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

Well, it seems that the LV calls one and the SLP calls another, and they're anything but common.

True, their return value is the same and they have the same purpose, but they're hardly the same thing.

This means any change/refactoring in one will require a similar change in the other, and this will not always be clear.

aemerson added inline comments.Apr 24 2017, 9:46 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

By common code I meant the actual mechanism of generating the shuffle pattern when the target doesn't request the intrinsic forms. That code is essentially duplicated between the SLP and LV. Anyway, I'll continue with the requested changes.

aemerson updated this revision to Diff 96574.Apr 25 2017, 8:37 AM

Addressed review comments. Renamed the simpler create function to createSimpleTargetReduction() and added comments to both header and definition.

RKSimon edited edge metadata.Apr 25 2017, 9:16 AM

Don't you need unrolling support for the strict float opcodes? As I understand it, the shuffle reductions can't be used?

include/llvm/CodeGen/ISDOpcodes.h
773 ↗(On Diff #96574)

Please can you improve the description here, especially detailing the difference between the non-strict/strict versions of fadd/fmul.

include/llvm/CodeGen/SelectionDAGNodes.h
1077 ↗(On Diff #96574)

If possible I'd like to see this generalization of the SDNodeFlags support added separately first, reducing this patch and allowing us to get on with adding triple node support for FMA opcodes.

aemerson marked 2 inline comments as done.Apr 25 2017, 9:30 AM

At the moment nothing emits strict float reductions, as no target supports them. We have them implemented for SVE, but the IR type and vectorizer changes aren't upstream yet. The reason I've had to include it in this patch is that we want to agree on the intrinsics spec up front, rather than change it later when SVE support lands.

include/llvm/CodeGen/SelectionDAGNodes.h
1077 ↗(On Diff #96574)

I can do that. There's a dependency on the new opcodes for the isUnaryOpWithFlags function, but I'll leave that as a stub returning false in the generalisation patch and add the opcodes in this patch.

aemerson updated this revision to Diff 97424.May 2 2017, 3:56 AM
aemerson marked an inline comment as done.
  • Split out SDNode changes into D32527 which is now committed.
  • Added comments to the ISDNodes definitions.

Ok to go?

rengolin added inline comments.May 2 2017, 4:06 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

If the Add/Fadd cases are already covered by this function, and they end up calling the same methods to create the reduction nodes, and half of the arguments are the same, and the purpose is the same and the return value is the same, I still don't understand why the function above can't be a wrapper around this one.

Creating a recurrence descriptor doesn't seem that hard, and you don't need to duplicate the Add/Fadd logic up there.

aemerson added inline comments.May 2 2017, 4:31 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

My reasoning is that any improvement in this piece of code due to your suggestion will be superficial at best, and IMO is outweighed by a conceptual cost.

A recurrence descriptor holds information about a particular recurrence operation, and there's fairly complex code dedicated to using SCEV and LoopInfo to do all the analysis and create one in RecurrenceDescriptor::isReductionPHI(), which also calls RecurrenceDescriptor::AddReductionVar(). At a conceptual level the resulting object models a recurrence, which by definition has to be in a loop. Creating a recurrence descriptor in the createSimpleTargetReduction() function would mean creating a "fake" recurrence descriptor, one which may not actually model a recurrence.

Ultimately, createSimpleTargetReduction() is not a wrapper because the two functions cover different functionality at a higher level.

rengolin added inline comments.May 2 2017, 4:41 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

Ok, but you don't use any of that logic, so you could also try the opposite: work directly with opcodes and transform descriptors into opcodes?

aemerson updated this revision to Diff 97443.May 2 2017, 7:13 AM

Ok, so I've restructured the two functions a bit so that the simple (non-min/max) reductions are generated by createSimpleTargetReduction(), and the recurrence-descriptor version uses that for the simple cases, passing in the opcode.

Ok, so I've restructured the two functions a bit so that the simple (non-min/max) reductions are generated by createSimpleTargetReduction(), and the recurrence-descriptor version uses that for the simple cases, passing in the opcode.

But they're still two functions doing the same thing, with different arguments...

Maybe I'm not expressing myself right.

What I mean is: keep createSimpleTargetReduction as it is, then make createTargetReduction just massage the parameters (get opcode, flags) and pass them to createSimpleTargetReduction, returning whatever value it does. I.e. createTargetReduction is just a wrapper around createSimpleTargetReduction.

You may need to add some functionality to createSimpleTargetReduction so that it does what createTargetReduction needs, in addition to what it itself needs, but that's ok. Right now their implementations are still nearly identical.

cheers,
--renato

aemerson updated this revision to Diff 97890.May 4 2017, 3:59 PM

Renato and I discussed this offline because we got our wires crossed earlier. We agreed to simplify this code further by extending createSimpleTargetReduction() to handle min/max by passing it the ReductionFlags. This essentially moves code out of createTargetReduction(), which now just unwraps information from a RecurrenceDescriptor. Some other API changes were made as a result.

Ping. Ok to go?

Hi Amara,

Thanks for the update, this is what I had in mind, yes.

I just have one additional small comment inline, and one small (probably irrelevant) nitpick.

cheers,
--renato

lib/Transforms/Utils/LoopUtils.cpp
1273 ↗(On Diff #97890)

I may be wrong, but I think the LLVM style could be to put this curly bracket on a new line.

clang-format would know better. :)

1293 ↗(On Diff #97890)

Isn't this Flags a local variable? Where is this being used after the call to createSimpleTargetReduction?

A couple of minor typos

docs/LangRef.rst
11880 ↗(On Diff #97890)

llvm.experimental.vector.reduce.xor.*

12011 ↗(On Diff #97890)

llvm.experimental.vector.reduce.fmin.*

aemerson marked an inline comment as done.May 8 2017, 5:37 AM
aemerson added inline comments.
docs/LangRef.rst
12011 ↗(On Diff #97890)

Thanks, will fix in final commit.

lib/Transforms/Utils/LoopUtils.cpp
1273 ↗(On Diff #97890)

This is the LLVM style according to clang-format.

1293 ↗(On Diff #97890)

It's captured by reference by the getSimpleRdx lambda.

Right, this is looking much better. Now, what about tests?

We'd probably need a bunch of tests to make sure that the intrinsics are accepted in the syntax in which they're documented, and rejected otherwise.

I'm not sure what the best way forward is, but probably just having IR in the right/wrong format and passing -validate or something, expecting it to pass/fail, would be a start.

cheers,
--renato
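A sketch of the kind of round-trip test being described (hypothetical file and RUN line, with the same illustrative mangling as the earlier sketches):

```llvm
; RUN: llvm-as < %s | llvm-dis | FileCheck %s

declare float @llvm.experimental.vector.reduce.fmax.f32.v4f32(<4 x float>)

; CHECK-LABEL: @reduce_fmax(
; CHECK: call float @llvm.experimental.vector.reduce.fmax.f32.v4f32(
define float @reduce_fmax(<4 x float> %v) {
  %r = call float @llvm.experimental.vector.reduce.fmax.f32.v4f32(<4 x float> %v)
  ret float %r
}
```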

lib/Transforms/Utils/LoopUtils.cpp
1273 ↗(On Diff #97890)

ok

1293 ↗(On Diff #97890)

ah, yes.

Right, this is looking much better. Now, what about tests?

We'd probably need a bunch of tests to make sure that the intrinsics are accepted in the syntax in which they're documented, and rejected otherwise.

I'm not sure what the best way forward is, but probably just having IR in the right/wrong format and passing -validate or something, expecting it to pass/fail, would be a start.

cheers,
--renato

The type checking is already done as part of the intrinsics handling framework. When people add new intrinsics they don't really add tests to check the IR since you get assertion failures during the FunctionType creation if something goes wrong. It's why you specify the types in the intrinsics signatures. If, on the other hand, there were other checks, such as verifying that the actual values given to the intrinsics are well formed, then verifier tests would make sense. Here, however, the reductions don't care about the actual values.

Since this is laying the foundations, and future patches will a) enable this for AArch64 (which comes with reduction tests), and b) add an expansion pass to convert back into shuffle reductions, I don't see the need to test the actual bare intrinsics here.

When people add new intrinsics they don't really add tests to check the IR since you get assertion failures during the FunctionType creation if something goes wrong.

But without IR functions that actually use them, there's no way to get the assertion, right?

Since this is laying the foundations, and future patches will a) enable this for AArch64 (which comes with reduction tests), and b) add an expansion pass to convert back into shuffle reductions, I don't see the need to test the actual bare intrinsics here.

If everyone is happy for this patch to go in without tests (relying on further tests coming in following patches), I'm ok with it too.

For this reason, I'll leave this review for someone else (@mkuper? @RKSimon?) to approve.

cheers,
--renato

When people add new intrinsics they don't really add tests to check the IR since you get assertion failures during the FunctionType creation if something goes wrong.

But without IR functions that actually use them, there's no way to get the assertion, right?

If you have a look at prior art for adding intrinsics you'll see that actual verifier tests are only done for illegal combinations of constant value parameters. There are no illegal constant parameter combinations with this patch. E.g. Dan Berlin's r294341 doesn't come with a test, likewise with others.

A minor comment; the code looks good to me.

lib/Transforms/Utils/LoopUtils.cpp
1159 ↗(On Diff #97890)

MinMaxKind should not be invalid in this case. I suggest adding an assert() here.

aemerson accepted this revision.May 9 2017, 1:55 AM
aemerson marked 2 inline comments as done.

Thanks. I'll make the last few changes requested and commit.

This revision is now accepted and ready to land.May 9 2017, 1:55 AM
This revision was automatically updated to reflect the committed changes.