This is an archive of the discontinued LLVM Phabricator instance.

[Clang][X86] Change X86 cast intrinsics to use __builtin_nondeterministic_value
ClosedPublic

Authored by ManuelJBrito on Feb 3 2023, 10:25 AM.

Details

Summary

The following intrinsics are currently implemented using a shufflevector with an undefined mask, this is however incorrect according to intel's semantics for undefined value which expect an unknown but consistent value.

With __builtin_nondeterministic_value we can now match intel's undefined value.

Related patch for more context : https://reviews.llvm.org/D103874

Diff Detail

Event Timeline

ManuelJBrito created this revision.Feb 3 2023, 10:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2023, 10:25 AM
ManuelJBrito requested review of this revision.Feb 3 2023, 10:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2023, 10:25 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

We have a couple bugs that show (freeze (poison)) doesn't work past SelectionDAG. Is that a concern here? The most recent https://github.com/llvm/llvm-project/issues/60429

What do we gain from using __builtin_nondeterministic_value instead of just setzero? https://godbolt.org/z/zrb6858Mr

We have a couple bugs that show (freeze (poison)) doesn't work past SelectionDAG. Is that a concern here? The most recent https://github.com/llvm/llvm-project/issues/60429

I don't think it's a concern . Here https://godbolt.org/z/1ecM8roYh the lowering seems ok.

What do we gain from using __builtin_nondeterministic_value instead of just setzero? https://godbolt.org/z/zrb6858Mr

__builtin_nondeterministic_value is lowered to freeze(poison)

These intrinsics are meant to be no-op and we can't to that with zeroinitializer, although as of right now instcombine seems to transform the freeze(poison) into zeroinitializer(as shown by your example), so i'll make another patch to fix that.

We have a couple bugs that show (freeze (poison)) doesn't work past SelectionDAG. Is that a concern here? The most recent https://github.com/llvm/llvm-project/issues/60429

I don't think it's a concern . Here https://godbolt.org/z/1ecM8roYh the lowering seems ok.

The bugs start occurring if you use the same nondeterministic value multiple times and expect the value to be the same for all uses.

We have a couple bugs that show (freeze (poison)) doesn't work past SelectionDAG. Is that a concern here? The most recent https://github.com/llvm/llvm-project/issues/60429

I don't think it's a concern . Here https://godbolt.org/z/1ecM8roYh the lowering seems ok.

The bugs start occurring if you use the same nondeterministic value multiple times and expect the value to be the same for all uses.

I understand now ... so if the bug is present it works as if it were an undef. For these intrinsics we would just regress to the current behavior.
So is it okay if we land this patch? Because it will still be an improvement.

Add end to end tests

Currrently these expect a mov that will be removed when the instcombine bug is fixed

RKSimon added inline comments.Feb 8 2023, 7:10 AM
clang/test/CodeGen/X86/avx-builtins.c
146–147

Would it be useful to add a check for a "freeze <2 x double> poison" (or similar) to each cast test?

ManuelJBrito retitled this revision from [Clang][x86] Change x86 cast intrinsics to use __builtin_nondeterministic_value to [Clang][X86] Change X86 cast intrinsics to use __builtin_nondeterministic_value.

Match freeze(poison) operand in tests

ManuelJBrito updated this revision to Diff 500149.EditedFeb 24 2023, 5:11 AM

Update tests after D143593.

There are some performance regressions with casts from 128 to 512. The backend inserts vinsertf instructions. So that has to be fixed.
In D130339 @RKSimon mentioned something about custom vector widening patterns that need to be adjusted to handle freeze(undef), any pointers for how a patch for that would look like?

There are some performance regressions with casts from 128 to 512. The backend inserts vinsertf instructions. So that has to be fixed.
In D130339 @RKSimon mentioned something about custom vector widening patterns that need to be adjusted to handle freeze(undef), any pointers for how a patch for that would look like?

IIRC there are various places in X86ISelLowering where we try to match vector widening/concat patterns - such as collectConcatOps/getTargetShuffleAndZeroables/resolveTargetShuffleInputsAndMask (and many more) - it should be fine to address these on a case by case basis though, as many should just disappear via SimplifyDemandedVectorElts etc.

aqjune added a subscriber: aqjune.Mar 8 2023, 12:17 PM

H, is D104790 superseded by this patch? I wonder what is the status of this patch as well.

H, is D104790 superseded by this patch?

I don't think so we still need to fix the undefined intrinsics, right? Maybe I'm not understanding the question.

I wonder what is the status of this patch as well.

We need to land D144903 first to have the correct assembly, but it's currently held up by a crash in one of the tests due to an unrelated issue.

aqjune added a comment.Mar 8 2023, 2:21 PM

H, is D104790 superseded by this patch?

I don't think so we still need to fix the undefined intrinsics, right? Maybe I'm not understanding the question.

Oh right, D104790 and this deal with different intrinsics, thanks.
I was wondering whether I could clean up some of my open patches.

I wonder what is the status of this patch as well.

We need to land D144903 first to have the correct assembly, but it's currently held up by a crash in one of the tests due to an unrelated issue.

I see, hope that the crash is fixed soon..!

Rebase

avx-cast-builtins.c was moved to D144903

ManuelJBrito planned changes to this revision.Mar 17 2023, 7:55 AM

Implementing the 128 to 512 casts by filling the rest of the vector with the same definition of a nondeterministic_value is not correct because :

a = freeze poison
v = <a, a>

is not the same as

v = freeze poison

The only solution I'm seeing ,using the shufflevector, is doing the conversion in two steps:

  • build a 256 vector with the upper half being undefined( freeze poison)
  • build a 512 vector where the lower half is the previous 256 vector and the upper half being undefined

I think this would require two shuffles which is unfortunate.

This would ensure no miscompilations due to multiple uses of the same freeze undef/poison but would probably require some backend work to ensure the pattern is recognized to emit efficient assembly.

Would this work ? @RKSimon @craig.topper

Implementing the 128 to 512 casts by filling the rest of the vector with the same definition of a nondeterministic_value is not correct because :

a = freeze poison
v = <a, a>

is not the same as

v = freeze poison

The only solution I'm seeing ,using the shufflevector, is doing the conversion in two steps:

  • build a 256 vector with the upper half being undefined( freeze poison)
  • build a 512 vector where the lower half is the previous 256 vector and the upper half being undefined

I think this would require two shuffles which is unfortunate.

This would ensure no miscompilations due to multiple uses of the same freeze undef/poison but would probably require some backend work to ensure the pattern is recognized to emit efficient assembly.

Semantically that is correct.
But the backend may require some tweaks to recognize this pattern.

Update to remove multiple uses of freeze poison.
I am unsure about the code style used in the 128 to 512 casts. Any comments are appreciated.

nlopes accepted this revision.Apr 7 2023, 4:53 AM

LGTM, but please wait for another reviewer.

This revision is now accepted and ready to land.Apr 7 2023, 4:53 AM

Any further comments @RKSimon ?

RKSimon accepted this revision.Apr 15 2023, 2:16 AM

LGTM - just simplify the shuffle masks (even if it break 80-col).

Please keep an eye out for any regressions, I'm not certain we've shaken out every possible issue.

clang/lib/Headers/avx512fintrin.h
401

We don't always keep to clang formatting in the headers if it confuses things - better to keep the entire shuffle mask on a single line if possible - same for the others

return __builtin_shufflevector(__a, __builtin_nondeterministic_value(__a),
                               0, 1, 2, 3, 4, 5, 6, 7);
408

Maybe:

return __builtin_shufflevector(__a, __builtin_nondeterministic_value(__a), 0,
                               0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
clang/test/CodeGen/X86/avx-cast-builtins.c
113

fixme

Thanks for the review! I'll simplify the masks.

This revision was landed with ongoing or failed builds.Apr 17 2023, 4:59 AM
This revision was automatically updated to reflect the committed changes.