Page MenuHomePhabricator

[DAGCombiner] Scalarize splats with just one demanded lane
Changes PlannedPublic

Authored by tlively on Jul 10 2020, 6:16 PM.

Details

Summary

This patch implements a combine to scalarize subtrees of the selection
DAG that produce splat values for which only a single lane is
demanded. The scalarization only happens when the target supports
scalar versions of each operation in the subtree to avoid introducing
any new transitions between vector and scalar registers and to avoid
potentially-expensive expansions of scalarized operations.

Diff Detail

Unit TestsFailed

TimeTest
530 mslinux > SanitizerCommon-asan-x86_64-Linux.Linux::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=address -m64 -ldl -std=c++11 -O0 -g /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/asan-x86_64-Linux/Linux/Output/protoent.cpp.tmp && /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/asan-x86_64-Linux/Linux/Output/protoent.cpp.tmp 2>&1 | FileCheck /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp
330 mslinux > SanitizerCommon-lsan-x86_64-Linux.Linux::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=leak -m64 -ldl -std=c++11 -O0 -g /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/protoent.cpp.tmp && /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/protoent.cpp.tmp 2>&1 | FileCheck /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp
430 mslinux > SanitizerCommon-msan-x86_64-Linux.Linux::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=memory -m64 -ldl -std=c++11 -O0 -g /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/protoent.cpp.tmp && /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/protoent.cpp.tmp 2>&1 | FileCheck /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp
530 mslinux > SanitizerCommon-tsan-x86_64-Linux.Linux::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=thread -m64 -ldl -std=c++11 -O0 -g /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/tsan-x86_64-Linux/Linux/Output/protoent.cpp.tmp && /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/tsan-x86_64-Linux/Linux/Output/protoent.cpp.tmp 2>&1 | FileCheck /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp
390 mslinux > SanitizerCommon-ubsan-x86_64-Linux.Linux::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=undefined -m64 -ldl -std=c++11 -O0 -g /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/ubsan-x86_64-Linux/Linux/Output/protoent.cpp.tmp && /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/ubsan-x86_64-Linux/Linux/Output/protoent.cpp.tmp 2>&1 | FileCheck /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp
View Full Test Results (6 Failed)

Event Timeline

tlively created this revision.Jul 10 2020, 6:16 PM

Is this supposed to fix some lowering-produced code?
If not, shouldn't this be best done in the middle-end?

Is this supposed to fix some lowering-produced code?
If not, shouldn't this be best done in the middle-end?

Yes, this fixes lowering-produced code. In particular, WebAssembly's vector shift instructions take a scalar shift amount, but in LLVM IR vector shifts take vector shift amounts. WebAssembly's lowering then needs to scalarize the shift entirely except when the shift amount is a splat value, in which case it can just take one lane as the scalar shift amount. This sequence of patches improves codegen in that case.

I think this is a nice idea! But I'd like people working on other targets to check this too.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17658

I can imagine ADD is legal for MVT::i32, but when Opc is BUILD_VECTOR or SPLAT_VECTOR, are they legal operations for scalar types too, such as as MVT::i32? If so, why? Aren't they for vector types?

17676

Can't we use this code for BUILD_VECTOR here for all splat-type vectors, including SPLAT_VECTOR, BUILD_VECTOR, and SHUFFLE_VECTOR? getSplatSourceVector seems to handle all these.

Would it make sense to generalize or add a TLI hook similar to the one used in scalarizeExtractedBinop()?

/// Try to convert an extract element of a vector binary operation into an
/// extract element followed by a scalar operation.
virtual bool shouldScalarizeBinop(SDValue VecOp) const;
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17677–17679

Why only these 3 binops?
Could this be TLI.isBinOp(Opc) instead?

srj added a subscriber: srj.Jul 14 2020, 2:50 PM
aheejin added inline comments.Jul 14 2020, 9:18 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17677–17679

I was wondering about the same thing. I suspect the author might have tried to match opcodes listed here. But those opcodes in SelectionDAG::isSplatValue look like they were selected somewhat arbitrarily in the first place (it says they are "common patterns").

We've been making progress performing similar ops in the vector-combiner pass based on cost metrics - have you looked at performing it there?

Yes, this fixes lowering-produced code. In particular, WebAssembly's vector shift instructions take a scalar shift amount, but in LLVM IR vector shifts take vector shift amounts. WebAssembly's lowering then needs to scalarize the shift entirely except when the shift amount is a splat value, in which case it can just take one lane as the scalar shift amount. This sequence of patches improves codegen in that case.

X86/SSE uses a similar 'vector shift by scalar' approach - and SimlifyDemandedVectorElts etc. manages to remove similar issues - is it possible that WebAssembly is just missing a combine from its shift ops to try and simplify the operands?

Thank you all for the comments! I agree that the opcodes in SelectionDAG::isSplatValue are rather arbitrary, so a more principled approach using a TLI hook might be better. I will take a look at what X86 is doing to see if a simpler solution would work for WebAssembly, too.

tlively planned changes to this revision.Jul 21 2020, 2:10 PM

I'm taking this patch sequence out of the review queue for now, pending an investigation of the alternatives suggested above.