This patch implements a combine to scalarize subtrees of the selection
DAG that produce splat values for which only a single lane is
demanded. The scalarization only happens when the target supports
scalar versions of each operation in the subtree to avoid introducing
any new transitions between vector and scalar registers and to avoid
potentially-expensive expansions of scalarized operations.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Is this supposed to fix some lowering-produced code?
If not, shouldn't this be best done in the middle-end?
Yes, this fixes lowering-produced code. In particular, WebAssembly's vector shift instructions take a scalar shift amount, but in LLVM IR vector shifts take vector shift amounts. WebAssembly's lowering then needs to scalarize the shift entirely except when the shift amount is a splat value, in which case it can just take one lane as the scalar shift amount. This sequence of patches improves codegen in that case.
I think this is a nice idea! But I'd like people working on other targets to check this too.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
17658 | I can imagine ADD is legal for MVT::i32, but when Opc is BUILD_VECTOR or SPLAT_VECTOR, are they legal operations for scalar types too, such as as MVT::i32? If so, why? Aren't they for vector types? | |
17676 | Can't we use this code for BUILD_VECTOR here for all splat-type vectors, including SPLAT_VECTOR, BUILD_VECTOR, and SHUFFLE_VECTOR? getSplatSourceVector seems to handle all these. |
Would it make sense to generalize or add a TLI hook similar to the one used in scalarizeExtractedBinop()?
/// Try to convert an extract element of a vector binary operation into an /// extract element followed by a scalar operation. virtual bool shouldScalarizeBinop(SDValue VecOp) const;
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
17677–17679 | Why only these 3 binops? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
17677–17679 | I was wondering about the same thing. I suspect the author might have tried to match opcodes listed here. But those opcodes in SelectionDAG::isSplatValue look like they were selected somewhat arbitrarily in the first place (it says they are "common patterns"). |
We've been making progress performing similar ops in the vector-combiner pass based on cost metrics - have you looked at performing it there?
X86/SSE uses a similar 'vector shift by scalar' approach - and SimlifyDemandedVectorElts etc. manages to remove similar issues - is it possible that WebAssembly is just missing a combine from its shift ops to try and simplify the operands?
Thank you all for the comments! I agree that the opcodes in SelectionDAG::isSplatValue are rather arbitrary, so a more principled approach using a TLI hook might be better. I will take a look at what X86 is doing to see if a simpler solution would work for WebAssembly, too.
I'm taking this patch sequence out of the review queue for now, pending an investigation of the alternatives suggested above.
@tlively Is this still necessary? My guess is that SimplifyDemandedVectorElts target node handling should do everything you need?
I'm not sure, although I recently started working in this area again. I'll close this revision and open a new one if any changes do turn out to be necessary.
I can imagine ADD is legal for MVT::i32, but when Opc is BUILD_VECTOR or SPLAT_VECTOR, are they legal operations for scalar types too, such as as MVT::i32? If so, why? Aren't they for vector types?