Page MenuHomePhabricator

[SLP]Try to emit canonical shuffle with undef operand.
ClosedPublic

Authored by ABataev on Sep 21 2022, 10:54 AM.

Details

Summary

In the canonical form of the shuffle the poison/undef operand is the
second operand, the patch tries to emit canonical form for partial
vectorization of the buildvector sequence.
Also, this patch starts emitting freeze instruction for shuffles with undef indices if the second shuffle operan is undef, not poison. It is an initial step to D93818, where undef mask element are treated as returning poison value.

Diff Detail

Unit TestsFailed

TimeTest
90 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::lljit-initialize-deinitialize.ll
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/lli -jit-kind=orc -jit-linker=jitlink -orc-runtime=/var/lib/buildkite-agent/builds/llvm-project/build/./lib/clang/16.0.0/lib/x86_64-unknown-linux-gnu/liborc_rt.a /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/lljit-initialize-deinitialize.ll | FileCheck /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/lljit-initialize-deinitialize.ll
120 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::priority-static-initializer.S
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/priority-static-initializer.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/priority-static-initializer.S
90 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-cxa-atexit.S
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-cxa-atexit.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-cxa-atexit.S

Event Timeline

ABataev created this revision.Sep 21 2022, 10:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2022, 10:54 AM
ABataev requested review of this revision.Sep 21 2022, 10:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2022, 10:54 AM
vdmitrie added inline comments.Sep 28 2022, 9:36 AM
llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll
15

I'm not sure I understand well why freeze is generated here. I likely missed something but the LangRef says:
"If the shuffle mask is undefined, the result vector is undefined. If the shuffle mask selects an undefined element from one of the input vectors, the resulting element is undefined. An undefined element in the mask vector specifies that the resulting element is undefined. An undefined element in the mask vector prevents a poisoned vector element from propagating."
So here the first two lanes of VECINS_I_5_I1 are taken from TMP6, all the rest is undef. The for TMP6 the first two lanes taken from TMP5, all the rest of TM5 is undef, TMP5 is a bitcast of TMP4, both lines of which are normal values. It means that all lanes of VECINS_I_5_I1 are either normal values or undef. So why freeze is here?

It would be nice if you extend patch description with some information about what is deemed to be a canonical form of a shuffle.

ABataev added inline comments.Sep 28 2022, 9:38 AM
llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll
15

It is going to be changed soon to be poisoned instead of undefined.

vdmitrie added inline comments.Sep 28 2022, 9:48 AM
llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll
15

Ah, okay. Hence there must be a discussion about that somewhere. Can you please point out to the discussion?

Since this is yet anticipated change the patch description should definitely be more detailed.

ABataev added inline comments.Sep 28 2022, 9:51 AM
llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll
15
ABataev edited the summary of this revision. (Show Details)Sep 28 2022, 10:20 AM
ABataev edited the summary of this revision. (Show Details)Oct 3 2022, 10:41 AM
ABataev added inline comments.
llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll
15

Done

vdmitrie accepted this revision.Oct 3 2022, 2:44 PM

Thank you, Alexey. Looks good.

This revision is now accepted and ready to land.Oct 3 2022, 2:44 PM
This revision was landed with ongoing or failed builds.Oct 4 2022, 8:17 AM
This revision was automatically updated to reflect the committed changes.
dyung added a subscriber: dyung.Oct 5 2022, 6:07 PM

Hi @ABataev, we have an internal test where this seemed to cause the codegen for a horizontal add test to get worse (I think) after your change. I have filed the issue as #58177, can you take a look?

Hi @ABataev, we have an internal test where this seemed to cause the codegen for a horizontal add test to get worse (I think) after your change. I have filed the issue as #58177, can you take a look?

Looks like need to add the analysis of freeze instructions to the shuffles emission/cost estimation function, will do it later today.