This is an archive of the discontinued LLVM Phabricator instance.

[LV] Add support for minimum/maximum intrinsics
ClosedPublic

Authored by anna on May 25 2023, 11:38 AM.

Details

Summary

{mini|maxi}mum intrinsics are different from {min|max}num intrinsics
in the propagation of NaN and signed zero. Also, the minnum/maxnum intrinsics require the presence of nsz flags to be valid reductions in vectorizer. In this regard, we introduce a new recurrence kind and also add support for identifying reduction patterns using these intrinsics.

There are tests added which show how this interacts across chains of
min/max patterns.

Diff Detail

Event Timeline

anna created this revision.May 25 2023, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2023, 11:38 AM
anna requested review of this revision.May 25 2023, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2023, 11:38 AM
anna added a comment.May 25 2023, 12:21 PM

The motivation here is we added support for lowering of minimum/maximum in X86:https://reviews.llvm.org/D149844
This is the last bit needed to support generating these vectorized fminimum/maximum through the optimizer, which should benefit other targets (apart from X86) that lower these intrinsics.

dmgreen added inline comments.May 30 2023, 9:15 AM
llvm/lib/Analysis/IVDescriptors.cpp
814

As far as I understand from the language ref, llvm.vector.reduce.fmin will be unspecified for 0.0/-0.0, and will return a value unless all the inputs are Nan. https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. This matches the behaviour of llvm.minnum. llvm.minimum has stronger semantics for signed zero and propagates nans, so would only be valid to convert with fast math flags.

I'm not 100% sure that llvm.minnum would need fast math flags, but llvm.minimum would seem to. Does it sound like I have that the right way around?

anna added inline comments.May 30 2023, 1:54 PM
llvm/lib/Analysis/IVDescriptors.cpp
814

As far as I understand from the language ref, llvm.vector.reduce.fmin will be unspecified for 0.0/-0.0, and will return a value unless all the inputs are Nan.

Ah, looks like vector.reduce.fmin was designed with only minnum in mind according to the langref: "This instruction has the same comparison semantics as the ‘llvm.minnum.*’ intrinsic. ". Adding the flags nnan and nsz for llvm.minimum so that we can feed it into vector.reduce.fmin unfortunately defeats the purpose of these minimum intrinsic.
What we need is something like:

%v = call <2 x float> @llvm.minimum.v2f32(...)
...
%F = llvm.vector.reduce.fminimum(%v)

where llvm.vector.reduce.fminimum follows the semantics of llvm.minimum.
I'm not sure if we can piggyback on the existing llvm.vector.reduce.fmin intrinsic or would need to create a new one.

anna planned changes to this revision.May 31 2023, 12:11 PM

Looking at X86 lowering for the vector reduce intrinsics, we do not handle signed zeroes. So, we will need to introduce new intrinsics as llvm.vector.reduce.fminimum and llvm.vector.reduce.fmaximum, which is similar to existing vector reduce intrinsics but handles signed zeroes and NaN.

anna added a comment.May 31 2023, 1:26 PM

The original support for fmax/fmin reduction was added here: https://reviews.llvm.org/D87391 and the idea was to rely on semantics of fminnum/fmaxnum since most targets would crash on fminimum/fmaximum. Now, that we do have the lowerings for these in more targets, we can introduce a separate intrinsic for propagating NaNs and handling signed zeroes (i.e. support fminimum and fmaximum).

anna added a subscriber: skatkov.May 31 2023, 1:35 PM

Looking at X86 lowering for the vector reduce intrinsics, we do not handle signed zeroes. So, we will need to introduce new intrinsics as llvm.vector.reduce.fminimum and llvm.vector.reduce.fmaximum, which is similar to existing vector reduce intrinsics but handles signed zeroes and NaN.

My understanding is as follows:
You need to inroduce new reduce intrinsic for fminium and fmaximum as you wrote.
In ExpandReductionsPass replace this intrinsic with reduction where you delegates each comparison to existing fminimum/maximum intrinsics. You just need to propagate nnan and nsz flags to that inrinisics.

With that you will have a correct reduction supporting both NaNs and signed zeros.

I think that adding a new reduction intrinsic sounds like a good addition. We should even have native instructions we could use for it under AArch64. For Arm we have seemed to have never added support for llvm.minimum/maximum. (There isn't a good instruction for it, and there isn't any generic expansion, but adding a new type of reduction shouldn't make that any worse).

anna added a comment.Jun 1 2023, 10:10 AM

Looking at X86 lowering for the vector reduce intrinsics, we do not handle signed zeroes. So, we will need to introduce new intrinsics as llvm.vector.reduce.fminimum and llvm.vector.reduce.fmaximum, which is similar to existing vector reduce intrinsics but handles signed zeroes and NaN.

My understanding is as follows:
You need to inroduce new reduce intrinsic for fminium and fmaximum as you wrote.
In ExpandReductionsPass replace this intrinsic with reduction where you delegates each comparison to existing fminimum/maximum intrinsics. You just need to propagate nnan and nsz flags to that inrinisics.

With that you will have a correct reduction supporting both NaNs and signed zeros.

This is a great idea to use the shuffle reductions as the generic lowering. Also, IIUC, propagating the nnan and nsz is an optimization, not necessary for correctness since fminimum and fmaximum supports NaNs and signed zeroes.

There is one comment in the shuffle reductions stating that we need nnan to perform a shuffle reduction (under the vector_reduce_fmax and vector_reduce_fmin categories), but I don't see anything in LangRef stating that the shuffles themselves do not support NaNs. So, we do not need this extra check of nnan flag when adding the fminimum and fmaximum intrinsic support here. In fact, I think if anyone's interested, one can even remove this restriction from vector_reduce_fmax/vector_reduce_fmin themselves once we add support for fminnum/fmaxnum in createMinMaxOp.

skatkov added a comment.EditedJun 1 2023, 7:56 PM

I think that adding a new reduction intrinsic sounds like a good addition. We should even have native instructions we could use for it under AArch64. For Arm we have seemed to have never added support for llvm.minimum/maximum. (There isn't a good instruction for it, and there isn't any generic expansion, but adding a new type of reduction shouldn't make that any worse).

As I know AArch64 support minimum/maximum intrinsics and they are lowered to native fmin/fmax instructions.

Or you are talking about vector max/min on AArch64?

https://godbolt.org/z/Mb9K88scT

Ok, you told about Arm not AArch64 :)
I've nicely talked to myself :)

anna added a comment.Jun 7 2023, 6:55 AM

The reduction intrinsic is introduced here: https://reviews.llvm.org/D152370
Once that and dependent patch lands, this patch will be rebased to use the new reduction intrinsic.

anna updated this revision to Diff 531021.Jun 13 2023, 11:55 AM

rebased over landed fminimum/fmaximum.reduce intrinsics. The new recurrence (and supported operations) are introduced as well.

anna added a reviewer: nikic.Jun 13 2023, 11:56 AM
anna marked an inline comment as done.
anna added inline comments.
llvm/lib/Analysis/IVDescriptors.cpp
814

this is handled now - we have the new llvm.vector.reduce.fminimum/fmaximum intrinsics introduced.

anna edited the summary of this revision. (Show Details)Jun 14 2023, 6:51 AM

Hello. I've been looking at the AArch64 lowering lately, and some of the other little bits and pieces like cost modelling. From what I can tell this mostly LGTM, but can you add a test with interleaving? It appears to go via createMinMaxOp->getMinMaxReductionPredicate.

llvm/lib/Analysis/IVDescriptors.cpp
808

I'm not sure that minnum would require nnan and nsz to convert to a vector.reduce.fmin intrinsic. That might come from it matching select/fcmp, and sounds like a separate issue though.

anna added a comment.Jun 19 2023, 8:27 AM

Thanks @dmgreen. Will add the test and update the missing intrinsics in the getMinMaxReductionPredicate API.

llvm/lib/Analysis/IVDescriptors.cpp
808

You might be right here. NaN is propagated with a different logic (compared to minimum) and signed zero is ignored. However, note that the recurKind used by select/fcmp and the fmin intrinsic is the same.

However, I've left the code as it was before for minnum/maxnum, so this can be addressed separately (along with tests added to make sure reduce.fmin lowers correctly without nnan flags?)

anna updated this revision to Diff 532717.Jun 19 2023, 11:22 AM

added interleaving of 2. Better code gen for interleaving by using intrinsic itself instead of fcmp + select pattern.

anna added a comment.Jun 19 2023, 11:25 AM

Thanks @dmgreen. Will add the test and update the missing intrinsics in the getMinMaxReductionPredicate API.

Confirmed interleaving does infact go through this API and crashes since the intrinsic support was missing. Updated the API, but also made the createMinMaxOp use the intrinsic itself rather than the Fcmp + Select pattern. So the getMinMaxReductionPredicate API is unused at this point for fminimum/fmaximum, but support is left in for completeness.

Thanks. I think this LGTM, with some Nits

llvm/include/llvm/Analysis/IVDescriptors.h
51

Nit: Remove extra whitespace

llvm/lib/Transforms/Utils/LoopUtils.cpp
936

If this is unused, could it be removed? I think it goes via getMinMaxReductionIntrinsicOp?

I'm not sure if the expansion to cmp+select would be valid without nnan and nsz fmf flags, creating the intrinsic like you have it sounds much better.

llvm/test/Transforms/LoopVectorize/minmax_reduction.ll
1

Nit: Whitespace

anna added inline comments.Jun 20 2023, 4:52 AM
llvm/lib/Transforms/Utils/LoopUtils.cpp
936

There are two options:

  1. Remove it and add a comment that usage directly goes through getMinMaxReductionIntrinsicOp.
  2. Change the predicate to the correct one: FCMP_ULT. Instead of ordered, we need unordered variants as seen by the langref: https://llvm.org/docs/LangRef.html#id308 (it allows NaNs: QNaN or SNaN makes no difference I think).

I prefer the second option for completeness (if for any reason we choose getMinMaxReductionPredicate API for some other purpose in future, not necessarily within createMinMaxOp).

What do you think?

dmgreen added inline comments.Jun 20 2023, 5:36 AM
llvm/lib/Transforms/Utils/LoopUtils.cpp
936

Yeah I was wondering about the same thing.
minimum also handles -0.0 as less than 0.0, which means it isn't very obvious how to expand it out into fcmp+select. Something like this is needed: https://alive2.llvm.org/ce/z/iB2Dt7. (But that doesn't handle nan as Alive2 seems to think that SNaN turns into +inf, in a way I don't understand. So I wasn't able to test ULT vs OLT).

I think olt and ult will just pick differently between the first and second operand of the select if they are Nan though, not choose _which_ is a Nan. Ordered will chose the second operand if either operand is a nan. Unordered will pick the first. In neither case is that guaranteed to be the Nan.

So it would likely be valid to expand into fcmp+sel provoding it had nsz and nnan, but not without. It may be better to just remove it and rely on createMinMaxOp. We can hopefully remove getMinMaxReductionPredicate in the future if we get to the point where MinNum and MaxNum can be handled, with the necessary FMF.

anna added inline comments.Jun 20 2023, 6:35 AM
llvm/lib/Transforms/Utils/LoopUtils.cpp
936

agreed, thanks for checking it. I'll update.

anna updated this revision to Diff 532936.Jun 20 2023, 7:55 AM
anna marked 4 inline comments as done.

addressed review comments.

dmgreen accepted this revision.Jun 20 2023, 8:03 AM

Thanks. LGTM

This revision is now accepted and ready to land.Jun 20 2023, 8:03 AM
This revision was landed with ongoing or failed builds.Jun 20 2023, 10:17 AM
This revision was automatically updated to reflect the committed changes.