Page MenuHomePhabricator

DAGCombiner: Combine SDIV with non-splat vector pow2 divisor
ClosedPublic

Authored by zvi on Jan 24 2018, 7:08 AM.

Details

Summary

Extend existing SDIV combine for pow2 constant divider to handle
non-splat vectors of pow2 constants.

Diff Detail

Repository
rL LLVM

Event Timeline

zvi created this revision.Jan 24 2018, 7:08 AM

How bad does the codegen get if we don't limit this to targets with vector shifts? Again, thinking AVX1 (Jaguar) here., but combine_vec_sdiv_by_pow2b_v4i64 looks like a missed opportunity

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2932 ↗(On Diff #131263)

Did you look at using matchBinaryPredicate here?

2970 ↗(On Diff #131263)

You can probably get the getNode/FoldConstantArithmetic methods to do most of this for you?

lib/Target/X86/X86ISelLowering.cpp
22738 ↗(On Diff #131263)

If we're going to expose this then ideally we'd allow isSupportedVectorVarShift to flag XOP vector shifts as supported (and fast), but the original SupportedVectorVarShift usage below prevented us as we have to call custom lowering to X86ISD::VPSHA/VPSHL.

Also, there is a lot of crossover with isVectorShiftByScalarCheap.

zvi added a comment.Feb 4 2018, 10:17 AM

How bad does the codegen get if we don't limit this to targets with vector shifts? Again, thinking AVX1 (Jaguar) here., but combine_vec_sdiv_by_pow2b_v4i64 looks like a missed opportunity

I think you are right. Probably all cases will profit except for v2i64. Will try to drop the TLI hook.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2932 ↗(On Diff #131263)

Thanks for suggesting. Will use it.

2970 ↗(On Diff #131263)

Will try to make the most of it. Unfortunately, there are two separate helpers: FoldConstantVectorArithmetic and FoldConstantArithmetic for vectors and scalars respectively, so there will still be a divergence. I noticed there is a TODO to join the two functions, so whenever we get to work on it, the divergence can be eliminated.

zvi updated this revision to Diff 132771.Feb 4 2018, 10:34 AM

Following Simon's suggestions, dropping the TLI hook seems to improve all cases except for v2i64 on SSE/AVX1.

RKSimon added inline comments.Feb 7 2018, 6:18 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2932 ↗(On Diff #131263)

Oops, sorry that should have been matchUnaryPredicate...

2905 ↗(On Diff #132771)

Why is it important for splat vectors to use a single entry in KnownNegatives? And won't it cause issues when matchBinaryPredicate enumerates across the entire vector?

2955 ↗(On Diff #132771)

Would it be easier to just do something like this which should work for scalars and vectors?

SDValue C1 = DAG.getNode(ISD::CTTZ, DL, VT, N1)
C1 = DAG.getZextOrTrunc(C1, DL, ShiftAmtTy);
SDValue INEXACT = DAG.getNode(ISD::SUB, DL, ShiftAmtTy, BITS, C1);
if (!isConstantOrConstantVector(INEXACT))
  return SDValue()
zvi updated this revision to Diff 133806.Feb 11 2018, 2:27 PM
  1. matchBinaryPredicate -> matchUnryPredicate
  2. Use Simon's uniform scalar/vector code suggestion for computing INEXACT
zvi retitled this revision from DAGCombiner: Combine SDIV with non-splat vector pow2 divider to DAGCombiner: Combine SDIV with non-splat vector pow2 divisor.Feb 11 2018, 2:32 PM
zvi edited the summary of this revision. (Show Details)
RKSimon added inline comments.Feb 13 2018, 6:39 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2905 ↗(On Diff #132771)

Still worried about this - matchUnaryPredicate doesn't check for a splat vector (I can't remember why not) - it will check every element. with IsPowerOfTwo so EltIndex will go to number of vector elements.

2925 ↗(On Diff #133806)

This comment needs updating.

2950 ↗(On Diff #133806)

Why can't LG2 and C1 be merged?

zvi updated this revision to Diff 134462.Feb 15 2018, 11:05 AM

Addressing Simon's comments

I still think it'd be better if you treated splat vectors as a vector instead of a scalar - your change to matchUnaryPredicate means that we're accepting UNDEF elements where we weren't before, which for DIV/REM opcodes is supposed to be a big no-no.

Also, please can you clean up the variable names to follow coding style (CamelCase capitalization etc.)?

@zvi Do you have time to work on this patch, otherwise would you mind if I comandeered it?

zvi added a comment.Mar 19 2018, 6:26 AM

I apologize about the delay. Got some other priority work. Will make an effort to continue this week. If not, please go ahead and complete the work, Simon.

zvi updated this revision to Diff 140590.Apr 1 2018, 12:10 PM

Rebase

zvi updated this revision to Diff 140606.Apr 1 2018, 8:21 PM

The previous revision was a rebase on a dated revision. This is a rebase on the latest.

zvi added a comment.EditedApr 5 2018, 2:34 PM

I still think it'd be better if you treated splat vectors as a vector instead of a scalar - your change to matchUnaryPredicate means that we're accepting UNDEF elements where we weren't before, which for DIV/REM opcodes is supposed to be a big no-no.

I think that at the point this combine runs there should not be any undef elements because there is an earlier combine that handles division by undef (or vector with any undef elements)., see SelectionDAG::isUndef.
Having said that, there is room for improvement by visiting only the source splatted value
instead of every BUILD_VECTOR operand. Will upload an improved patch.

zvi updated this revision to Diff 141228.Apr 5 2018, 3:20 PM
  • Remove deprecated comment
  • Fix variable names
  • N1C can be passed to matchUnaryPredicate also when the divisor is a constant-splat. Makes check more efficient as we avoid re-checking the splatted element for every vector element. (why doesn't matchUnaryPredicate check for splat and do the same?)
In D42479#1059005, @zvi wrote:
  • N1C can be passed to matchUnaryPredicate also when the divisor is a constant-splat. Makes check more efficient as we avoid re-checking the splatted element for every vector element. (why doesn't matchUnaryPredicate check for splat and do the same?)

Because that would mean that we couldn't use the predicate to control when we accept undefs or not.

RKSimon accepted this revision.Apr 7 2018, 9:31 AM

LGTM - cheers.

This revision is now accepted and ready to land.Apr 7 2018, 9:31 AM
This revision was automatically updated to reflect the committed changes.