Page MenuHomePhabricator

[IR] Rename the shufflevector's undef mask to poison
Needs ReviewPublic

Authored by aqjune on Jun 8 2021, 12:36 AM.

Details

Summary

This is a patch that renames shufflevector's undef mask to poison.

By D93818, shufflevector's undef mask isn't undef anymore; it returns poison instead.

%v = shufflevector <2 x i8> %x, <2 x i8> %y, <2 x i8> <i8 0, i8 poison>
; %v[0] = %x[0]
; %v[1] = poison

Since poison is more undefined than undef, this validates many existing transformations that we wanted to support.
Also, this allows more aggressive optimizations because poison is more propagative (e.g. poison & 0 = poison whereas undef & 0 != undef).

This patch updates shufflevector mask's printed string to be poison to match its new semantics.

This has changes in clang tests as well.
They are mainly about vector intrinsics being lowered into shufflevector.
The unused elements were filled with undef previously, but with this patch they are filled with poison.
Since they are unused elements anyway, I believe this isn't a functional change in fact.
But, I'm happy with this being double-checked by someone who works on these intrinsics as well.

Diff Detail

Event Timeline

aqjune created this revision.Jun 8 2021, 12:36 AM
aqjune requested review of this revision.Jun 8 2021, 12:36 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 8 2021, 12:36 AM

I noted the cases where it looks like the undef->poison change might actually impact code using compiler intrinsic functions that have external specifications. The relevant specifications say the elements in question are "undefined", without really specifying what that means.

Currently, for the Intel intrinsics, we treat "undefined" as something more conservative than LLVM undef; see https://github.com/llvm/llvm-project/blob/d2012d965d60c3258b3a69d024491698f8aec386/clang/lib/CodeGen/CGBuiltin.cpp#L12483 . Maybe we should make the cast intrinsics more conservative to match. And maybe we should do the same for OpenCL. Would need to do some backend work to make sure we don't regress the generated code, though.

For __builtin_shufflevector, I think I'm fine with this changing the "-1" to mean poison; we don't have any external spec to conform to, and anyone explicitly passing -1 should know what they're doing. But maybe worth noting in the clang documentation.

clang/test/CodeGen/X86/avx-builtins.c
182

This change might be visible to user code.

clang/test/CodeGen/builtinshufflevector2.c
41

This might be visible to user code.

clang/test/CodeGenOpenCL/preserve_vec3.cl
21

This change might be visible to user code.

aqjune added a comment.EditedJun 8 2021, 7:50 PM

I noted the cases where it looks like the undef->poison change might actually impact code using compiler intrinsic functions that have external specifications. The relevant specifications say the elements in question are "undefined", without really specifying what that means.

Currently, for the Intel intrinsics, we treat "undefined" as something more conservative than LLVM undef; see https://github.com/llvm/llvm-project/blob/d2012d965d60c3258b3a69d024491698f8aec386/clang/lib/CodeGen/CGBuiltin.cpp#L12483 . Maybe we should make the cast intrinsics more conservative to match. And maybe we should do the same for OpenCL. Would need to do some backend work to make sure we don't regress the generated code, though.

Makes sense, the PR (https://llvm.org/PR32176) that is left at the comment says it should be something like freeze poison as well. (BTW, this means the current shufflevector lowering is already incorrect as well..)

Then, _mm256_castsi128_si256 should be lowered into something like this:

%fr = freeze <2 x i64> poison
shufflevector <2 x i64> %x, <2 x i64> %fr, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

BTW, the Intel intrinsic guide for _mm256_castsi128_si256 ( https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_castsi128_si256&expand=628 ) says:

This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

We should teach the backend to understand the shufflevector+freeze form and lower it into an efficient assembly.

I *guess* teaching this form to the backend will be enough in terms of performance.
In practice, the frozen element won't be used in most of the cases; the middle-end's demanded elements analysis will trigger instcombine to almost always remove the freeze.

What do you think? If people agree with the shufflevector+freeze lowering, I'll create a separate patch that lowers this to the new freeze+shufflevector format (since it is already incorrect).

In practice, the frozen element won't be used in most of the cases; the middle-end's demanded elements analysis will trigger instcombine to almost always remove the freeze.

Well, in the cases it gets removed, it doesn't really matter what we use. It's likely if someone is reaching for these more obscure constructs, they're seeing that the compiler isn't doing what they want with more normal code, though.

What do you think? If people agree with the shufflevector+freeze lowering, I'll create a separate patch that lowers this to the new freeze+shufflevector format (since it is already incorrect).

Using "freeze poison" seems reasonable.