Page MenuHomePhabricator

[X86] Convert fmin/fmax _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
ClosedPublic

Authored by pengfei on Dec 13 2020, 8:15 AM.

Details

Summary

This is a follow up of D92940.

We have successfully converted fadd/fmul _mm_reduce_* intrinsics to
llvm.reduction + reassoc flag. We can do the same approach for fmin/fmax
too, i.e. llvm.reduction + nnan flag.

Diff Detail

Event Timeline

RKSimon requested review of this revision.Dec 13 2020, 8:15 AM
RKSimon created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 13 2020, 8:15 AM

Whatever the final decision is, maybe add a doxygen comment explaining the semantics?

Yes, it is using maxpd/minpd here. Sorry for missing the nan cases.
In fact, I doubt about the behavior when using fast math with target intrinsics. In my opinion, target intrinsics are always associated with given instructions (reduce* are exceptions). So the behavior of intrinsics, e.g. respect nans, signaling, rounding, exception etc. are concordance with their associated instructions. That said the fast math flags won't change the behavior of intrinsics.
Assume above, I'm happy to set these expansions to fast math. Either keeping the existing implementaion or expansion LGTM.

If we're going by existing behavior/compatibility, gcc/icc use packed ops too:
https://godbolt.org/z/9jEhaW
...so there's an implicit 'nnan nsz' in these intrinsics (and that should be documented in the header file (and file a bug for Intel's page at https://software.intel.com/sites/landingpage/IntrinsicsGuide/ ?).

If we're going by existing behavior/compatibility, gcc/icc use packed ops too:
https://godbolt.org/z/9jEhaW
...so there's an implicit 'nnan nsz' in these intrinsics (and that should be documented in the header file (and file a bug for Intel's page at https://software.intel.com/sites/landingpage/IntrinsicsGuide/ ?).

I agreed. I have filed a bug internally for intrinsic guide.
The link for reporting bugs on the intrinsics guide is broken. Please let me know if you find any bugs of intrinsic guide. Thanks.

Hi Simon, I found we have the same problem for fadd/fmul. See https://godbolt.org/z/3YKaGx
X86 Intrinsics imply reassoc flag, but llvm.vector.reduce.* doesn't.

Hi Simon, I found we have the same problem for fadd/fmul. See https://godbolt.org/z/3YKaGx
X86 Intrinsics imply reassoc flag, but llvm.vector.reduce.* doesn't.

I'm not surprised - my current plan (after the holidays) is to add doxygen descriptions for all the reduction intrinsics and then update them making it clear what fast-math flags are assumed.

Hi Simon, I found we have the same problem for fadd/fmul. See https://godbolt.org/z/3YKaGx
X86 Intrinsics imply reassoc flag, but llvm.vector.reduce.* doesn't.

I'm not surprised - my current plan (after the holidays) is to add doxygen descriptions for all the reduction intrinsics and then update them making it clear what fast-math flags are assumed.

That's great. We are also updating intrinsic guide for such information. Anyway, have a good holiday.

RKSimon planned changes to this revision.Jan 7 2021, 7:33 AM

Hi @RKSimon, what's the status of updating these reduce intrinsics? Is there any difficulty for always assigning them fast math flag? I received bug report for the previous change D92940. Can we revert it if the problem is not easy to fix?

Hi @RKSimon, what's the status of updating these reduce intrinsics? Is there any difficulty for always assigning them fast math flag? I received bug report for the previous change D92940. Can we revert it if the problem is not easy to fix?

Any thoughts @RKSimon?

@pengfei I'm sorry I haven't gotten back to looking at this yet - it makes sense to create a patch to revert the fadd/fmul reduction changes for trunk/12.x.

@pengfei I'm sorry I haven't gotten back to looking at this yet - it makes sense to create a patch to revert the fadd/fmul reduction changes for trunk/12.x.

Thanks @RKSimon, I had a try and found maybe we don't need to revert it. See D96231.

@pengfei Would you be happy to commandeer this patch to make it match D96231?

pengfei commandeered this revision.Tue, Feb 9, 4:51 PM
pengfei edited reviewers, added: RKSimon; removed: pengfei.

@RKSimon Sure, with pleasure😊

pengfei updated this revision to Diff 322611.Wed, Feb 10, 1:05 AM

Add nnan and nsz flags for fmin/fmax llvm.reduction intrinsics.

pengfei updated this revision to Diff 322613.Wed, Feb 10, 1:11 AM

Minor fixes in tests.

pengfei updated this revision to Diff 322615.Wed, Feb 10, 1:17 AM

Minor fixes in tests.

pengfei updated this revision to Diff 322616.Wed, Feb 10, 1:22 AM

Minor fixes in tests.

Harbormaster completed remote builds in B88591: Diff 322615.
spatel added inline comments.Wed, Feb 10, 10:40 AM
clang/lib/Headers/avx512fintrin.h
9305

If I understand correctly, there's nothing preventing -0.0 or NaN values in these ops. But if either of those are present, then the result is potentially indeterminate.

For the LLVM intrinsic, we have this text in LangRef:
"The result will always be a number unless all elements of the vector are NaN. For a vector with minimum element magnitude 0.0 and containing both +0.0 and -0.0 elements, the sign of the result is unspecified."

pengfei updated this revision to Diff 323601.Sun, Feb 14, 3:07 AM

Address Sanjay's comment.

clang/lib/Headers/avx512fintrin.h
9305

Thanks @spatel for the information. I checked that the LLVM intrinsic does ignore the sign of zeros. https://godbolt.org/z/a9Yj8a. So we can remove the no signed zero assumption.
But X86 fmin/fmax instructions have difference with the LLVM intrinsic, i.e. the result might be NaN even there's only one NaN in elements.
Besides, the LangRef also says "If the intrinsic call has the nnan fast-math flag, then the operation can assume that NaNs are not present in the input vector."
So we still need the no nan assumption here.

pengfei updated this revision to Diff 323602.Sun, Feb 14, 3:18 AM

Update test accordingly.

pengfei edited the summary of this revision. (Show Details)Sun, Feb 14, 3:20 AM
pengfei updated this revision to Diff 323603.Sun, Feb 14, 3:24 AM

Minor fix.

spatel added inline comments.Sun, Feb 14, 5:20 AM
clang/lib/Headers/avx512fintrin.h
9305

I understand the behavior that we want to specify for this API, but we are not stating it clearly for the user. I think we should do that. How about:

* For floating-point intrinsics:
* 1. When using fadd/fmul intrinsics, the order of operations within the vector is unspecified (associative math).
* 2. When using fmin/fmax intrinsics, NaN or -0.0 elements within the vector produce unspecified results.
pengfei added inline comments.Sun, Feb 14, 6:53 AM
clang/lib/Headers/avx512fintrin.h
9305

Great! It's much clear now. Thanks @spatel !

pengfei updated this revision to Diff 323616.Sun, Feb 14, 6:53 AM

Address Sanjay's comment.

spatel accepted this revision.Sun, Feb 14, 7:32 AM

Thanks! LGTM.

This revision is now accepted and ready to land.Sun, Feb 14, 7:32 AM
This revision was landed with ongoing or failed builds.Sun, Feb 14, 5:16 PM
This revision was automatically updated to reflect the committed changes.