Extend existing SDIV combine for pow2 constant divider to handle
non-splat vectors of pow2 constants.
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 16623 Build 16623: arc lint + arc unit
Event Timeline
How bad does the codegen get if we don't limit this to targets with vector shifts? Again, thinking AVX1 (Jaguar) here., but combine_vec_sdiv_by_pow2b_v4i64 looks like a missed opportunity
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2912 | Did you look at using matchBinaryPredicate here? | |
2932 | You can probably get the getNode/FoldConstantArithmetic methods to do most of this for you? | |
lib/Target/X86/X86ISelLowering.cpp | ||
22738 ↗ | (On Diff #131263) | If we're going to expose this then ideally we'd allow isSupportedVectorVarShift to flag XOP vector shifts as supported (and fast), but the original SupportedVectorVarShift usage below prevented us as we have to call custom lowering to X86ISD::VPSHA/VPSHL. Also, there is a lot of crossover with isVectorShiftByScalarCheap. |
I think you are right. Probably all cases will profit except for v2i64. Will try to drop the TLI hook.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2912 | Thanks for suggesting. Will use it. | |
2932 | Will try to make the most of it. Unfortunately, there are two separate helpers: FoldConstantVectorArithmetic and FoldConstantArithmetic for vectors and scalars respectively, so there will still be a divergence. I noticed there is a TODO to join the two functions, so whenever we get to work on it, the divergence can be eliminated. |
Following Simon's suggestions, dropping the TLI hook seems to improve all cases except for v2i64 on SSE/AVX1.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2885 | Why is it important for splat vectors to use a single entry in KnownNegatives? And won't it cause issues when matchBinaryPredicate enumerates across the entire vector? | |
2912 | Oops, sorry that should have been matchUnaryPredicate... | |
2934 | Would it be easier to just do something like this which should work for scalars and vectors? SDValue C1 = DAG.getNode(ISD::CTTZ, DL, VT, N1) C1 = DAG.getZextOrTrunc(C1, DL, ShiftAmtTy); SDValue INEXACT = DAG.getNode(ISD::SUB, DL, ShiftAmtTy, BITS, C1); if (!isConstantOrConstantVector(INEXACT)) return SDValue() |
- matchBinaryPredicate -> matchUnryPredicate
- Use Simon's uniform scalar/vector code suggestion for computing INEXACT
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2885 | Still worried about this - matchUnaryPredicate doesn't check for a splat vector (I can't remember why not) - it will check every element. with IsPowerOfTwo so EltIndex will go to number of vector elements. | |
2905 | This comment needs updating. | |
2930 | Why can't LG2 and C1 be merged? |
I still think it'd be better if you treated splat vectors as a vector instead of a scalar - your change to matchUnaryPredicate means that we're accepting UNDEF elements where we weren't before, which for DIV/REM opcodes is supposed to be a big no-no.
Also, please can you clean up the variable names to follow coding style (CamelCase capitalization etc.)?
@zvi Do you have time to work on this patch, otherwise would you mind if I comandeered it?
I apologize about the delay. Got some other priority work. Will make an effort to continue this week. If not, please go ahead and complete the work, Simon.
The previous revision was a rebase on a dated revision. This is a rebase on the latest.
I think that at the point this combine runs there should not be any undef elements because there is an earlier combine that handles division by undef (or vector with any undef elements)., see SelectionDAG::isUndef.
Having said that, there is room for improvement by visiting only the source splatted value
instead of every BUILD_VECTOR operand. Will upload an improved patch.
- Remove deprecated comment
- Fix variable names
- N1C can be passed to matchUnaryPredicate also when the divisor is a constant-splat. Makes check more efficient as we avoid re-checking the splatted element for every vector element. (why doesn't matchUnaryPredicate check for splat and do the same?)
Because that would mean that we couldn't use the predicate to control when we accept undefs or not.
Why is it important for splat vectors to use a single entry in KnownNegatives? And won't it cause issues when matchBinaryPredicate enumerates across the entire vector?