This is an archive of the discontinued LLVM Phabricator instance.

Add generic IR vector reductions
ClosedPublic

Authored by aemerson on Feb 17 2017, 4:00 AM.

Details

Summary

This patch adds IR intrinsics for horizontal vector reductions and allows targets to opt in to using them instead of the log2 shufflevector algorithm.

The reductions added so far are: int add, mul, and, or, xor, [s|u]min, [s|u]max, fp add, fmax, fmin.

The common shuffle-reduction code in the SLP and Loop vectorizers has been factored out into LoopUtils, so both now use a unified interface for generating reductions regardless of the target's preference. LoopUtils uses TTI to determine what kind of reductions the target wants to handle.

For CodeGen, basic legalization support is added. I have a follow-up patch to begin using these new representations for AArch64 NEON once this implementation is finalised.
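As an illustrative sketch (not part of the diff itself), an integer add reduction in the intrinsic form could look like the following, using the llvm.experimental.* naming the intrinsics end up with later in this review; the type-mangling suffix is an assumption for illustration only.

```llvm
; Sketch only: intrinsic name/mangling as assumed above, not quoted from the patch.
declare i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32>)

define i32 @reduce_add(<4 x i32> %v) {
  ; Horizontal reduction of all four lanes into a single scalar.
  %r = call i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32> %v)
  ret i32 %r
}
```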

Diff Detail

Repository
rL LLVM

Event Timeline

aemerson created this revision.Feb 17 2017, 4:00 AM
mehdi_amini added inline comments.Feb 17 2017, 9:52 AM
docs/LangRef.rst
11620 ↗(On Diff #88870)

Why is this specification of an implementation detail here?

11655 ↗(On Diff #88870)

I think it'd be important to clarify the ordering: do we require a "fast" FMF to change it? If not, why?

11767 ↗(On Diff #88870)

Why the explicit <scalar-type> in the signature?

11810 ↗(On Diff #88870)

There is no second operand in the prototype (and we have FMF for this purpose, so remove the sentence).

mkuper edited edge metadata.Feb 17 2017, 5:07 PM

I'm not sure this is the right strategy for deciding when and how to produce an intrinsic vs. shuffle sequence.

There are two related issues:

  1. The normal behavior for non-experimental target-independent intrinsics is that targets *must* support them. It's ok if the "fallback" way to support them is to lower them back to the "brute-force" shuffle pyramid, and that fallback gets used by all targets that don't have a more direct way to support it. But it still has to be supported somehow. Basically, the way I understand the general philosophy is - it's up to the optimizer to ask the target which constructs are cheap and which are expensive. But if the target is handed an expensive construct, it still has to lower it, it's just that the lowering is allowed to be really bad.
  2. Will this now be the canonical form for reductions? E.g. should instcombine match the shuffle pyramid sequence into this? If the answer is yes, then the generic lowering for the intrinsic should be as good as lowering for the shuffle pyramid. That doesn't seem too hard, since that's the natural lowering, but that would put a bound on how bad the generic lowering is allowed to be. That would also mean we won't have createTargetReduction() - all of that logic would move into lowering.

I can see some advantages to forming the intrinsic conditionally, but I'm not sure they're worth the cost of not having a canonical representation.
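For reference, the "brute-force" shuffle pyramid mentioned in point 1 is the usual log2 sequence the vectorizers emit today; a sketch for a <4 x i32> add reduction (value names are hypothetical):

```llvm
define i32 @reduce_add_shuffles(<4 x i32> %v) {
  ; Step 1: add the upper half of the vector onto the lower half.
  %s1 = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  %a1 = add <4 x i32> %v, %s1
  ; Step 2: add the remaining pair.
  %s2 = shufflevector <4 x i32> %a1, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %a2 = add <4 x i32> %a1, %s2
  ; The reduced value ends up in lane 0.
  %r = extractelement <4 x i32> %a2, i32 0
  ret i32 %r
}
```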

docs/LangRef.rst
11630 ↗(On Diff #88870)

A bit of bikeshedding: we usually use a slightly different format - e.g. something like LangRef.html#add-instruction
(In particular, it's nice to more explicitly state that the return type is an integer type, etc.)

11687 ↗(On Diff #88870)

Integer types only, presumably?

11708 ↗(On Diff #88870)

I believe EOR is XOR in non-ARM land. :-)

11811 ↗(On Diff #88870)

What are the semantics for NaNs, when they are present? Also, what are the semantics for signed zeroes?

delena edited edge metadata.Feb 18 2017, 11:15 PM

According to the code that I see, all reduction intrinsics have the same form and the same handling.
IMO, llvm.vector.reduce(<metadata>, <vector>) would cover all aspects of FP modes, and integer signed-unsigned variants.
The <metadata> class can include opcode and "properties".
I know that it was discussed earlier, but another option is providing the full set for FP and the full set for integers - all *SInt* and *UInt* variants.

I don't think that providing FP properties as additional boolean parameters is a good solution.

docs/LangRef.rst
11655 ↗(On Diff #88870)

I think that in the case of FP operations we need two intrinsics - "ordered" and "fast".

11832 ↗(On Diff #88870)

Usually, we give code examples in the documentation.

include/llvm/Analysis/TargetTransformInfo.h
715 ↗(On Diff #88870)

What does IsMaxOp mean? I think it would be better to split this interface into several functions and not mix FP and integer types together.

mehdi_amini edited edge metadata.Feb 19 2017, 10:25 AM

According to the code that I see, all reduction intrinsics have the same form and the same handling.
IMO, llvm.vector.reduce(<metadata>, <vector>) would cover all aspects of FP modes, and integer signed-unsigned variants.
The <metadata> class can include opcode and "properties".

What is the advantage of this?

I mean, we don't express our scalar instructions as binop <metadata>, op1, op2, do we?

mehdi_amini added inline comments.Feb 19 2017, 10:27 AM
docs/LangRef.rst
11655 ↗(On Diff #88870)

Why not reuse the existing FMF? The intrinsic without FMF would be equivalent to "ordered", and the "fast" FMF flag would allow any ordering.

mkuper added inline comments.Feb 19 2017, 1:06 PM
docs/LangRef.rst
11655 ↗(On Diff #88870)

I don't think we can currently attach FMF to CallInsts.

aemerson marked 3 inline comments as done.Feb 20 2017, 3:09 AM

I'm not sure this is the right strategy for deciding when and how to produce an intrinsic vs. shuffle sequence.

There are two related issues:

  1. The normal behavior for non-experimental target-independent intrinsics is that targets *must* support them. It's ok if the "fallback" way to support them is to lower them back to the "brute-force" shuffle pyramid, and that fallback gets used by all targets that don't have a more direct way to support it. But it still has to be supported somehow. Basically, the way I understand the general philosophy is - it's up to the optimizer to ask the target which constructs are cheap and which are expensive. But if the target is handed an expensive construct, it still has to lower it, it's just that the lowering is allowed to be really bad.
  2. Will this now be the canonical form for reductions? E.g. should instcombine match the shuffle pyramid sequence into this? If the answer is yes, then the generic lowering for the intrinsic should be as good as lowering for the shuffle pyramid. That doesn't seem too hard, since that's the natural lowering, but that would put a bound on how bad the generic lowering is allowed to be. That would also mean we won't have createTargetReduction() - all of that logic would move into lowering.

I can see some advantages to forming the intrinsic conditionally, but I'm not sure they're worth the cost of not having a canonical representation.

This patch was intended to be a transitional stage; my expectation was that over time targets would decide to move to this representation by opting in and handling them. The reason for the transitional step was that I didn't want to force this form on all targets (even if it lowers to the same code via a lowering step), as it changes the IR during the mid-end stage. At the moment, the only target which requires this intrinsic form without any alternative is SVE. If you think it's ok to change the reduction form for all targets, with a late lowering, then I'm happy to implement that.

docs/LangRef.rst
11620 ↗(On Diff #88870)

I was trying to give some explanation of why there may be two different representations of reductions depending on the target. I can remove this.

11655 ↗(On Diff #88870)

Thanks, I hadn't realised CallInsts supported FMFs. It makes some sense to use them for the ordered/unordered distinction, but what are your thoughts on the extra scalar accumulator operand for ordered reductions? An undef scalar value could be given for fast reductions, but in my view it clutters the intrinsic for the vast majority of reductions, though it's not a deal breaker.

11687 ↗(On Diff #88870)

Yes.

11767 ↗(On Diff #88870)

This is the syntax of the declaration in the IR.

include/llvm/Analysis/TargetTransformInfo.h
715 ↗(On Diff #88870)

It signals to the function whether the reduction is a min or max operation, if the opcode is a CMP. I'll look at splitting this up.

mehdi_amini added inline comments.Feb 20 2017, 10:25 AM
docs/LangRef.rst
11655 ↗(On Diff #88870)

I'm not sure about the pros/cons of the undef scalar thing.

mkuper added a comment.

I can see some advantages to forming the intrinsic conditionally, but I'm not sure they're worth the cost of not having a canonical representation.

This patch was intended to be a transitional stage; my expectation was that over time targets would decide to move to this representation by opting in and handling them. The reason for the transitional step was that I didn't want to force this form on all targets (even if it lowers to the same code via a lowering step), as it changes the IR during the mid-end stage. At the moment, the only target which requires this intrinsic form without any alternative is SVE. If you think it's ok to change the reduction form for all targets, with a late lowering, then I'm happy to implement that.

What I'd really like to avoid is having a nominally "target-independent" intrinsic, which can, in practice, only be handled by one or two targets.

There are three ways I can see this going:
a) If we know this is not the right generic representation, this should not go in as a target-independent intrinsic. I haven't seen anyone raise any "deal-breaker" issues so far, though.
b) If we think this is the right generic representation, but we're not sure, we can introduce these intrinsics as llvm.experimental. For an experimental intrinsic, I think it makes more sense to have targets opt-in like this patch does (but we probably still want default lowering for targets that don't opt-in).
c) If we are sure about this, then we should just do it for all targets, and have late lowering by default. I'm not really the right person to determine that, though. I guess that's a question for the backend owners.

rengolin edited edge metadata.Feb 21 2017, 4:42 AM

What I'd really like to avoid is having a nominally "target-independent" intrinsic, which can, in practice, only be handled by one or two targets.

Agreed.

a) If we know this is not the right generic representation, this should not go in as a target-independent intrinsic. I haven't seen anyone raise any "deal-breaker" issues so far, though.

This seems to tick all the boxes, and has been reviewed for both ARM back-ends (including the future SVE) and x86 (including the future AVX1024).

We should really have people from other targets confirm this makes sense, but at this point, I think the changes will be minimal, since any architecture-specific change can be done at the lowering stage in their own back-ends.

b) If we think this is the right generic representation, but we're not sure, we can introduce these intrinsics as llvm.experimental. For an experimental intrinsic, I think it makes more sense to have targets opt-in like this patch does (but we probably still want default lowering for targets that don't opt-in).

I think, for now, this is the best thing to do.

The log2 shuffle pattern is canonical and it already works well with the "lower how you can" premise. It may be cumbersome for AVX512 (a long pyramid), but it's not unbearable. It will become unbearable for AVX-1024 and impossible for SVE to use the same pattern, so those targets can begin using the intrinsics on the side, and then we can extend them to the remaining targets at their own pace.

The alternative is to have an SVE/AVX-specific intrinsic, which means the same thing as the pyramid and doesn't need to be handled by other targets. Clang (via intrinsics) would generate them, as would the vectorisers, and they would be completely opaque to any other pass (as external function calls). But this would make it harder to extend support to the other, smaller vector sizes (renames, tests, etc.) if we want to move it to the canonical form.

I personally think that this (concept) is a good representation of a horizontal reduction, so could easily be a canonical form one day. The particular shape may need to change, but that's orthogonal.

c) If we are sure about this, then we should just do it for all targets, and have late lowering by default. I'm not really the right person to determine that, though. I guess that's a question for the backend owners.

If we make it experimental, we don't need to push any target to support it ASAP. Furthermore, both SVE and AVX-1024 are themselves "experimental", and this change is mostly for their benefit, so I don't see why this shouldn't be experimental at this stage. It'll also give us time to make changes, and other targets time to adapt, if needed.

aemerson marked an inline comment as done.Feb 22 2017, 2:48 AM

I can see some advantages to forming the intrinsic conditionally, but I'm not sure they're worth the cost of not having a canonical representation.

This patch was intended to be a transitional stage; my expectation was that over time targets would decide to move to this representation by opting in and handling them. The reason for the transitional step was that I didn't want to force this form on all targets (even if it lowers to the same code via a lowering step), as it changes the IR during the mid-end stage. At the moment, the only target which requires this intrinsic form without any alternative is SVE. If you think it's ok to change the reduction form for all targets, with a late lowering, then I'm happy to implement that.

What I'd really like to avoid is having a nominally "target-independent" intrinsic, which can, in practice, only be handled by one or two targets.

There are three ways I can see this going:
a) If we know this is not the right generic representation, this should not go in as a target-independent intrinsic. I haven't seen anyone raise any "deal-breaker" issues so far, though.
b) If we think this is the right generic representation, but we're not sure, we can introduce these intrinsics as llvm.experimental. For an experimental intrinsic, I think it makes more sense to have targets opt-in like this patch does (but we probably still want default lowering for targets that don't opt-in).
c) If we are sure about this, then we should just do it for all targets, and have late lowering by default. I'm not really the right person to determine that, though. I guess that's a question for the backend owners.

! In D30086#682110, @rengolin wrote:

! In D30086#681823, @mkuper wrote:

What I'd really like to avoid is having a nominally "target-independent" intrinsic, which can, in practice, only be handled by one or two targets.

Agreed.

a) If we know this is not the right generic representation, this should not go in as a target-independent intrinsic. I haven't seen anyone raise any "deal-breaker" issues so far, though.

This seems to tick all the boxes, and has been reviewed for both ARM back-ends (including the future SVE) and x86 (including the future AVX1024).

We should really have people from other targets confirm this makes sense, but at this point, I think the changes will be minimal, since any architecture-specific change can be done at the lowering stage in their own back-ends.

b) If we think this is the right generic representation, but we're not sure, we can introduce these intrinsics as llvm.experimental. For an experimental intrinsic, I think it makes more sense to have targets opt-in like this patch does (but we probably still want default lowering for targets that don't opt-in).

I think, for now, this is the best thing to do.

The log2 shuffle pattern is canonical and it already works well with the "lower how you can" premise. It may be cumbersome for AVX512 (a long pyramid), but it's not unbearable. It will become unbearable for AVX-1024 and impossible for SVE to use the same pattern, so those targets can begin using the intrinsics on the side, and then we can extend them to the remaining targets at their own pace.

The alternative is to have an SVE/AVX-specific intrinsic, which means the same thing as the pyramid and doesn't need to be handled by other targets. Clang (via intrinsics) would generate them, as would the vectorisers, and they would be completely opaque to any other pass (as external function calls). But this would make it harder to extend support to the other, smaller vector sizes (renames, tests, etc.) if we want to move it to the canonical form.

I personally think that this (concept) is a good representation of a horizontal reduction, so could easily be a canonical form one day. The particular shape may need to change, but that's orthogonal.

c) If we are sure about this, then we should just do it for all targets, and have late lowering by default. I'm not really the right person to determine that, though. I guess that's a question for the backend owners.

If we make it experimental, we don't need to push any target to support it ASAP. Furthermore, both SVE and AVX-1024 are themselves "experimental", and this change is mostly for their benefit, so I don't see why this shouldn't be experimental at this stage. It'll also give us time to make changes, and other targets time to adapt, if needed.

Ok, so for now I'll make the changes to address the review comments, but I won't be moving to a late lowering solution (which I think is preferable, but if these are experimental then we can't do that).

On the ordered intrinsics issue: what I propose is that we keep the intrinsics separate for the sake of cleaner IR for the usual fast reductions, so the scalar argument isn't needed there. As these are now experimental intrinsics, we can change the implementation later if necessary.

Ok, so for now I'll make the changes to address the review comments, but I won't be moving to a late lowering solution (which I think is preferable, but if these are experimental then we can't do that).

I think we need late lowering support, in addition to the opt-in, even if the intrinsics are experimental. Without it, it's kind of hard to "experiment" with them. :-)
E.g. we'd like to enable them on x86, and see what breaks.

fhahn added a subscriber: fhahn.Feb 23 2017, 2:01 AM
aemerson updated this revision to Diff 95795.Apr 19 2017, 11:58 AM

I've finally got around to re-working this. Change summary:

  • Addressed langref comments.
  • Intrinsics are now experimental.
  • The TTI interface has been split into two to give the min/max reductions a separate interface. I haven't split out the FP from int types because it doesn't really seem to give any benefit as the handling is very similar.
  • A new SDNode type has been added to support fast-math flags on unary op nodes. Previously this was only supported for binary ops.
  • The intrinsics for FP add/mul now take an additional scalar operand, which should be undef if the reduction is a conventional "fast" reduction; otherwise it is the accumulator value for strict, i.e. ordered, reductions. CallInst fast-math flags are used to determine which semantics are wanted (see the sketch after this list).
  • The fmin/fmax intrinsics now use the fast-math flags to propagate NoNaN.
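As a sketch of what the last two points mean in IR (type-mangling suffixes and value names are illustrative assumptions, not quoted from the patch):

```llvm
declare float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float, <4 x float>)
declare float @llvm.experimental.vector.reduce.fmin.f32.v4f32(<4 x float>)

define float @fp_reductions(<4 x float> %v, float %acc) {
  ; Strict (ordered) reduction: no fast-math flags, so the scalar operand is a real accumulator.
  %ord = call float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float %acc, <4 x float> %v)
  ; Relaxed reduction: 'fast' permits reassociation, so the accumulator operand is undef.
  %rlx = call fast float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float undef, <4 x float> %v)
  ; fmin reduction with the nnan flag, which is what gets propagated as NoNaN.
  %min = call nnan float @llvm.experimental.vector.reduce.fmin.f32.v4f32(<4 x float> %v)
  ; Combine the results so all three calls are used.
  %t = fadd float %ord, %rlx
  %r = fadd float %t, %min
  ret float %r
}
```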

The separate lowering pass for these will come in a subsequent patch. For AArch64, however, my intention is to begin using these directly, without expansion.

delena added inline comments.Apr 23 2017, 11:38 PM
lib/Transforms/Utils/LoopUtils.cpp
1179 ↗(On Diff #95795)

the "else" is redundant.

1191 ↗(On Diff #95795)

You implemented 2 functions with the same name. It's not clear what the difference between them is. Please add some comments before each of them.

1281 ↗(On Diff #95795)

I'd rather combine useMinMaxReductionIntrinsic and useReductionIntrinsic into one and provide 3 parameters: Type, Op and Flags.

aemerson marked 3 inline comments as done.Apr 24 2017, 7:23 AM
aemerson added inline comments.
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

I've added some comments to the API in the header, but I'll flesh those out more and add some short ones here too.

rengolin added inline comments.Apr 24 2017, 8:25 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1595 ↗(On Diff #95795)

Wouldn't it be better for this assert to check the type of the operand?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
1 ↗(On Diff #95795)

unnecessary whitespace change

lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
352 ↗(On Diff #95795)

format?

lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

I think they're different enough that they deserve different names. Normally, you should use the same name if the arguments are different but the functionality is the same. Here, the implementation and the logic are so wildly different that naming them the same thing could be misleading.

I haven't looked deeply enough to know if there is SLP logic here that could have remained in the SLP vectorizer's code, or if you could separate the implementation so that one is just a wrapper around the other. That would make it clearer, and probably easier to maintain.

aemerson marked 5 inline comments as done.Apr 24 2017, 8:43 AM
aemerson added inline comments.
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

I'll rename the functions, but I'd argue that the two functions are intended to have the same general functionality: creating vector reductions. The difference is that one of them requires a recurrence descriptor, which some clients may not have (SLP vs LV), and the other is a simpler API at the cost of not being able to generate min/max reductions.

The SLP's pairwise reduction code is different to the normal reduction shuffle IR sequence, which is why I left that there; there aren't any other users of that code to justify refactoring it yet.
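For contrast with the log2 "splitting" sequence sketched earlier in this review, a pairwise add reduction of <4 x i32> looks roughly like this (a sketch, not code from the patch):

```llvm
define i32 @reduce_add_pairwise(<4 x i32> %v) {
  ; Step 1: split even/odd lanes and add them pairwise.
  %l1 = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
  %r1 = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  %a1 = add <4 x i32> %l1, %r1
  ; Step 2: repeat on the two partial sums.
  %l2 = shufflevector <4 x i32> %a1, <4 x i32> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  %r2 = shufflevector <4 x i32> %a1, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %a2 = add <4 x i32> %l2, %r2
  %res = extractelement <4 x i32> %a2, i32 0
  ret i32 %res
}
```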

rengolin added inline comments.Apr 24 2017, 8:52 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

That's why I said this smells like SLP/LV are leaking code here... :)

aemerson marked an inline comment as done.Apr 24 2017, 8:57 AM
aemerson added inline comments.
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

It's not leaking; part of the purpose of this patch is to factor out common code from clients emitting identical reduction code.

rengolin added inline comments.Apr 24 2017, 8:59 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

Well, it seems that the LV calls one and the SLP calls another, and they're anything but common.

True, their return value is the same and they have the same purpose, but they're hardly the same thing.

This means any change/refactoring in one will require a similar change in the other, and this will not always be clear.

aemerson added inline comments.Apr 24 2017, 9:46 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

By common code I meant the actual mechanism of generating the shuffle pattern when the target doesn't request the intrinsic forms. That code is essentially duplicated between the SLP and LV. Anyway, I'll continue with the requested changes.

aemerson updated this revision to Diff 96574.Apr 25 2017, 8:37 AM

Addressed review comments. Renamed the simpler create function to createSimpleTargetReduction() and added comments to both header and definition.

RKSimon edited edge metadata.Apr 25 2017, 9:16 AM

Don't you need unrolling support for the strict float opcodes? As I understand it, the shuffle reductions can't be used?

include/llvm/CodeGen/ISDOpcodes.h
773 ↗(On Diff #96574)

Please can you improve the description here, especially detailing the difference between the non-strict/strict versions of fadd/fmul.

include/llvm/CodeGen/SelectionDAGNodes.h
1077 ↗(On Diff #96574)

If possible I'd like to see this generalization of the SDNodeFlags support added separately first, reducing this patch and allowing us to get on with adding triple node support for FMA opcodes.

aemerson marked 2 inline comments as done.Apr 25 2017, 9:30 AM

At the moment nothing emits strict float reductions, as no target supports them. We have them implemented for SVE, but the IR type and vectorizer changes aren't upstream yet. The reason I've had to include it in this patch is that we want to agree on the intrinsics spec up front, rather than change it later when SVE support lands.

include/llvm/CodeGen/SelectionDAGNodes.h
1077 ↗(On Diff #96574)

I can do that. There's a dependency on the new opcodes for the isUnaryOpWithFlags function, but I'll leave that as a stub returning false in the generalisation patch and add the opcodes in this patch.

aemerson updated this revision to Diff 97424.May 2 2017, 3:56 AM
aemerson marked an inline comment as done.
  • Split out SDNode changes into D32527 which is now committed.
  • Added comments to the ISDNodes definitions.

Ok to go?

rengolin added inline comments.May 2 2017, 4:06 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

If the Add/Fadd cases are already covered by this function, and they end up calling the same methods to create the reduction nodes, and half of the arguments are the same, and the purpose is the same and the return value is the same, I still don't understand why the function above can't be a wrapper around this one.

Creating a recurrence descriptor doesn't seem that hard, and you don't need to duplicate the Add/Fadd logic up there.

aemerson added inline comments.May 2 2017, 4:31 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

My reasoning is that any improvement in this piece of code due to your suggestion will be superficial at best, and IMO is outweighed by a conceptual cost.

A recurrence descriptor holds information about a particular recurrence operation, and there's fairly complex code dedicated to using SCEV and LoopInfo to do all the analysis and create one in RecurrenceDescriptor::isReductionPHI(), which also calls RecurrenceDescriptor::AddReductionVar(). At a conceptual level the resulting object models a recurrence, which by definition has to be in a loop. Creating a recurrence descriptor in the createSimpleTargetReduction() function would mean creating a "fake" recurrence descriptor, one which may not actually model a recurrence.

Ultimately, createSimpleTargetReduction() is not a wrapper because the two functions cover different functionality at a higher level.

rengolin added inline comments.May 2 2017, 4:41 AM
lib/Transforms/Utils/LoopUtils.cpp
1191 ↗(On Diff #95795)

Ok, but you don't use any of that logic, so you could also try the opposite: work directly with opcodes and transform descriptors into opcodes?

aemerson updated this revision to Diff 97443.May 2 2017, 7:13 AM

Ok, so I've restructured the two functions a bit so that the simple (non-min/max) reductions are generated by createSimpleTargetReduction(), and the recurrence-descriptor version uses that for the simple cases, passing in the opcode.

Ok, so I've restructured the two functions a bit so that the simple (non-min/max) reductions are generated by createSimpleTargetReduction(), and the recurrence-descriptor version uses that for the simple cases, passing in the opcode.

But they're still two functions doing the same thing, with different arguments...

Maybe I'm not expressing myself right.

What I mean is: keep createSimpleTargetReduction as it is, then make createTargetReduction just massage the parameters (get opcode, flags) and pass them to createSimpleTargetReduction, returning whatever value it does. I.e. createTargetReduction is just a wrapper around createSimpleTargetReduction.

You may need to add some functionality to createSimpleTargetReduction so that it does what createTargetReduction needs, in addition to what it itself needs, but that's ok. Right now their implementations are still nearly identical.

cheers,
--renato

aemerson updated this revision to Diff 97890.May 4 2017, 3:59 PM

Renato and I discussed this offline because we got our wires crossed earlier. We agreed to simplify this code further by extending createSimpleTargetReduction() to handle min/max by passing it the ReductionFlags. This essentially moves code out of createTargetReduction(), which now just unwraps information from a RecurrenceDescriptor. Some other API changes were made as a result.

Ping. Ok to go?

Hi Amara,

Thanks for the update, this is what I had in mind, yes.

I just have one additional small comment inline, and one small (probably irrelevant) nitpick.

cheers,
--renato

lib/Transforms/Utils/LoopUtils.cpp
1273 ↗(On Diff #97890)

I may be wrong, but I think the LLVM style could be to put this curly bracket on a new line.

clang-format would know better. :)

1293 ↗(On Diff #97890)

Isn't this Flags a local variable? Where is this being used after the call to createSimpleTargetReduction?

A couple of minor typos

docs/LangRef.rst
11880 ↗(On Diff #97890)

llvm.experimental.vector.reduce.xor.*

12011 ↗(On Diff #97890)

llvm.experimental.vector.reduce.fmin.*

aemerson marked an inline comment as done.May 8 2017, 5:37 AM
aemerson added inline comments.
docs/LangRef.rst
12011 ↗(On Diff #97890)

Thanks, will fix in final commit.

lib/Transforms/Utils/LoopUtils.cpp
1273 ↗(On Diff #97890)

This is the LLVM style according to clang-format.

1293 ↗(On Diff #97890)

It's captured by reference by the getSimpleRdx lambda.

Right, this is looking much better. Now, what about tests?

We'd probably need a bunch of tests to make sure that the intrinsics are accepted in the syntax in which they're documented, and rejected otherwise.

I'm not sure what the best way forward is, but probably just having IR in the right/wrong format and passing -validate or something, expecting it to pass/fail, would be a start.

cheers,
--renato
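A sketch of the kind of round-trip test being described (hypothetical file and RUN line, with the same illustrative mangling as the earlier sketches):

```llvm
; RUN: llvm-as < %s | llvm-dis | FileCheck %s

declare float @llvm.experimental.vector.reduce.fmax.f32.v4f32(<4 x float>)

; CHECK-LABEL: @reduce_fmax(
; CHECK: call float @llvm.experimental.vector.reduce.fmax.f32.v4f32(
define float @reduce_fmax(<4 x float> %v) {
  %r = call float @llvm.experimental.vector.reduce.fmax.f32.v4f32(<4 x float> %v)
  ret float %r
}
```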

lib/Transforms/Utils/LoopUtils.cpp
1273 ↗(On Diff #97890)

ok

1293 ↗(On Diff #97890)

ah, yes.

Right, this is looking much better. Now, what about tests?

We'd probably need a bunch of tests to make sure that the intrinsics are accepted in the syntax in which they're documented, and rejected otherwise.

I'm not sure what the best way forward is, but probably just having IR in the right/wrong format and passing -validate or something, expecting it to pass/fail, would be a start.

cheers,
--renato

The type checking is already done as part of the intrinsics handling framework. When people add new intrinsics they don't really add tests to check the IR since you get assertion failures during the FunctionType creation if something goes wrong. It's why you specify the types in the intrinsics signatures. If, on the other hand, there were other checks, such as verifying that the actual values given to the intrinsics are well formed, then verifier tests would make sense. Here, however, the reductions don't care about the actual values.

Since this is laying the foundations, and future patches will a) enable this for AArch64 (which comes with reduction tests), and b) add an expansion pass to convert back into shuffle reductions, I don't see the need to test the actual bare intrinsics here.

When people add new intrinsics they don't really add tests to check the IR since you get assertion failures during the FunctionType creation if something goes wrong.

But without IR functions that actually use them, there's no way to get the assertion, right?

Since this is laying the foundations, and future patches will a) enable this for AArch64 (which comes with reduction tests), and b) add an expansion pass to convert back into shuffle reductions, I don't see the need to test the actual bare intrinsics here.

If everyone is happy for this patch to go in without tests (relying on further tests coming in following patches), I'm ok with it too.

For this reason, I'll leave this review for someone else (@mkuper? @RKSimon?) to approve.

cheers,
--renato

When people add new intrinsics they don't really add tests to check the IR since you get assertion failures during the FunctionType creation if something goes wrong.

But without IR functions that actually use them, there's no way to get the assertion, right?

If you have a look at prior art for adding intrinsics you'll see that actual verifier tests are only done for illegal combinations of constant value parameters. There are no illegal constant parameter combinations with this patch. E.g. Dan Berlin's r294341 doesn't come with a test, likewise with others.

A minor comment; the code looks good to me.

lib/Transforms/Utils/LoopUtils.cpp
1159 ↗(On Diff #97890)

MinMaxKind should not be invalid in this case. I suggest adding an assert() here.

aemerson accepted this revision.May 9 2017, 1:55 AM
aemerson marked 2 inline comments as done.

Thanks. I'll make the last few changes requested and commit.

This revision is now accepted and ready to land.May 9 2017, 1:55 AM
This revision was automatically updated to reflect the committed changes.