zvi (Zvi Rackover)
User

Projects

User does not belong to any projects.

User Details

User Since
Jun 8 2016, 12:50 PM (89 w, 15 h)

Recent Activity

Thu, Feb 15

zvi accepted D37418: [X86] Use btc/btr/bts to implement xor/and/or that affects a single bit in the upper 32-bits of a 64-bit operation.

LGTM

Thu, Feb 15, 11:14 AM
zvi updated the diff for D42479: DAGCombiner: Combine SDIV with non-splat vector pow2 divisor.

Addressing Simon's comments

Thu, Feb 15, 11:05 AM

Wed, Feb 14

zvi accepted D42896: [SelectionDAG] Add initial implementation of TargetLowering::SimplifyDemandedVectorElts.

LGTM after fixing the signed/unsigned mismatches.

Wed, Feb 14, 1:10 PM
zvi added inline comments to D42896: [SelectionDAG] Add initial implementation of TargetLowering::SimplifyDemandedVectorElts.
Wed, Feb 14, 7:21 AM
zvi added inline comments to D42896: [SelectionDAG] Add initial implementation of TargetLowering::SimplifyDemandedVectorElts.
Wed, Feb 14, 6:46 AM

Sun, Feb 11

zvi retitled D42479: DAGCombiner: Combine SDIV with non-splat vector pow2 divisor from DAGCombiner: Combine SDIV with non-splat vector pow2 divider to DAGCombiner: Combine SDIV with non-splat vector pow2 divisor.
Sun, Feb 11, 2:34 PM
zvi updated the diff for D42479: DAGCombiner: Combine SDIV with non-splat vector pow2 divisor.
  1. matchBinaryPredicate -> matchUnaryPredicate
  2. Use Simon's uniform scalar/vector code suggestion for computing INEXACT
Sun, Feb 11, 2:27 PM

Tue, Feb 6

zvi added inline comments to D42770: [X86] Don't emit KTEST instructions unless only the Z flag is being used.
Tue, Feb 6, 11:03 AM

Sun, Feb 4

zvi updated the diff for D42044: X86: Utilize ZeroableElements for canWidenShuffleElements.

Rebase + ping

Sun, Feb 4, 11:42 AM
zvi committed rL324200: X86 Tests: Add shuffle that can be improved by widening elements. NFC.
Sun, Feb 4, 11:33 AM
zvi updated the diff for D42479: DAGCombiner: Combine SDIV with non-splat vector pow2 divisor.

Following Simon's suggestions, dropping the TLI hook seems to improve all cases except for v2i64 on SSE/AVX1.

Sun, Feb 4, 10:34 AM
zvi added a comment to D42479: DAGCombiner: Combine SDIV with non-splat vector pow2 divisor.

How bad does the codegen get if we don't limit this to targets with vector shifts? Again, thinking AVX1 (Jaguar) here, but combine_vec_sdiv_by_pow2b_v4i64 looks like a missed opportunity.

I think you are right. Probably all cases will profit except for v2i64. Will try to drop the TLI hook.

Sun, Feb 4, 10:17 AM
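
For readers unfamiliar with the transform under discussion in D42479, the following is a minimal scalar sketch of signed division by a power of two using shifts, applied lane by lane with a different shift amount per lane (the "non-splat" case). It is an illustration under stated assumptions, not the code from the patch: the function names, the 32-bit lane width, and the use of Python are all choices made for this example.

```python
def sdiv_pow2_lane(x: int, k: int, bits: int = 32) -> int:
    """Divide signed x by 2**k, rounding toward zero, using shifts and an add.

    Hypothetical helper for illustration; x is assumed to fit in `bits` bits.
    """
    assert 0 <= k < bits - 1
    if k == 0:
        return x
    # Arithmetic shift: -1 for negative x, 0 otherwise.
    sign = x >> (bits - 1)
    # Logical shift of the sign mask: 2**k - 1 for negative x, 0 otherwise.
    bias = (sign & ((1 << bits) - 1)) >> (bits - k)
    # Adding the bias makes the final arithmetic shift round toward zero.
    return (x + bias) >> k


def sdiv_pow2_vector(values, shift_amounts):
    """Apply the scalar transform per lane; each lane has its own shift amount."""
    return [sdiv_pow2_lane(x, k) for x, k in zip(values, shift_amounts)]


def trunc_div(x: int, d: int) -> int:
    """Reference signed division that truncates toward zero (d > 0)."""
    return -((-x) // d) if x < 0 else x // d


if __name__ == "__main__":
    lanes = [-7, 7, -8, -1]
    shifts = [1, 1, 2, 3]  # divisors 2, 2, 4, 8
    print(sdiv_pow2_vector(lanes, shifts))
```

The bias step is what distinguishes this from a plain arithmetic shift, which would round negative values toward negative infinity instead of toward zero.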

Tue, Jan 30

zvi accepted D42526: [X86][XOP] Update isVectorShiftByScalarCheap with cases covered by XOP.

LGTM

Tue, Jan 30, 5:25 AM

Thu, Jan 25

zvi committed rL323418: X86 Tests: Add AVX+XOP config to SDIV combine tests.
Thu, Jan 25, 6:09 AM

Wed, Jan 24

zvi accepted D42258: [X86][SSE] Aggressively use PMADDWD for v4i32 multiplies with 17 or more leading zeros.

LGTM with a minor request:

Wed, Jan 24, 10:05 AM
zvi committed rL323343: InstSimplify: If divisor element is undef simplify to undef.
Wed, Jan 24, 9:24 AM
zvi closed D42485: InstSimplify: If divisor element is undef simplify to undef.
Wed, Jan 24, 9:23 AM
zvi added a comment to D42485: InstSimplify: If divisor element is undef simplify to undef.

LGTM. Just curious - do we have vector intrinsics or any passes that create vector integer division?

Wed, Jan 24, 9:20 AM
zvi created D42485: InstSimplify: If divisor element is undef simplify to undef.
Wed, Jan 24, 8:25 AM
zvi created D42479: DAGCombiner: Combine SDIV with non-splat vector pow2 divisor.
Wed, Jan 24, 7:09 AM
zvi committed rL323329: X86 Tests: Add more sdiv combine cases. NFC.
Wed, Jan 24, 7:06 AM

Tue, Jan 23

zvi closed D42437: X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW.
Tue, Jan 23, 5:38 PM
zvi committed rL323292: X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW.
Tue, Jan 23, 5:38 PM
zvi updated the diff for D42437: X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW.

Rebase

Tue, Jan 23, 5:36 PM
zvi created D42437: X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW.
Tue, Jan 23, 12:11 PM
zvi committed rL323242: X86 Tests: Add AVX512BW config to CodeGenPrepare test. NFC.
Tue, Jan 23, 11:22 AM
zvi accepted D42431: [X86][AVX] LowerBUILD_VECTORAsVariablePermute - add support for VPERMILPV to v2i64/v2f64.

LGTM

Tue, Jan 23, 11:21 AM

Jan 23 2018

zvi added a comment to D42380: [X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering.

Thanks for the fix.

Jan 23 2018, 2:00 AM

Jan 19 2018

zvi added inline comments to D42044: X86: Utilize ZeroableElements for canWidenShuffleElements.
Jan 19 2018, 9:10 AM

Jan 17 2018

zvi added a comment to D42205: [X86] Add intrinsic support for the RDPID instruction.

Add some basic encoding tests?

Jan 17 2018, 10:55 PM
zvi added a comment to D42171: X86CallFrameOptimization: Bail on win64cc calls.
In D42171#978790, @rnk wrote:

Isn't this MI buggy? We're adjusting SP down by 40 bytes and storing to SP+48, which could overwrite data. I think the assert is valid.

Jan 17 2018, 1:49 PM
zvi added a comment to D42171: X86CallFrameOptimization: Bail on win64cc calls.

I would appreciate suggestions for alternative solutions.

Jan 17 2018, 6:11 AM
zvi created D42171: X86CallFrameOptimization: Bail on win64cc calls.
Jan 17 2018, 6:10 AM

Jan 16 2018

zvi added a comment to D41944: [LLVM][IR][LIT] support of 'no-overflow' flag for sdiv\udiv instructions.
  • The way you are modeling the new flag means that existing bitcode/.ll files will change semantics when read with newer compilers. I'm not sure that is a good idea for this; in any case, at the very least you have to provide AutoUpgrade logic for that.

This seems like a real issue. With no version info in the module, how can AutoUpgrade tell if a divide with no 'nof' attribute is of the old form or new form? This is really a performance issue, because AutoUpgrade can always pessimistically not add 'nof' if the version of the incoming module is unknown. Possible solutions:

Jan 16 2018, 1:50 AM

Jan 14 2018

zvi created D42044: X86: Utilize ZeroableElements for canWidenShuffleElements.
Jan 14 2018, 3:50 PM

Jan 13 2018

zvi abandoned D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute.
Jan 13 2018, 9:54 AM
zvi committed rL322446: X86: Add pattern matching for PMADDWD.
Jan 13 2018, 9:43 AM
zvi closed D41811: X86: Add pattern matching for PMADDWD.
Jan 13 2018, 9:43 AM
zvi updated the diff for D41811: X86: Add pattern matching for PMADDWD.

Generalize to account for commutativity of add and mul

Jan 13 2018, 12:24 AM
zvi committed rL322434: X86 Tests: add more pamddwd cases. NFC.
Jan 13 2018, 12:22 AM

Jan 12 2018

zvi updated the diff for D41811: X86: Add pattern matching for PMADDWD.

Check both BUILD_VECTOR nodes together if one is composed of odd-indexed extracts and the other is composed of even-indexed extracts.

Jan 12 2018, 1:44 AM
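
As background for the odd/even extract check mentioned in this update, here is a small sketch of what PMADDWD computes and why a pair of even-indexed and odd-indexed extract vectors corresponds to it. This is a hedged illustration only: the function names are hypothetical, the lists stand in for vectors of signed 16-bit elements, and none of this is taken from the D41811 patch itself.

```python
def wrap_i32(v: int) -> int:
    """Wrap an integer to the signed 32-bit range."""
    return (v + 2**31) % 2**32 - 2**31


def pmaddwd(a, b):
    """Model of PMADDWD: multiply signed 16-bit pairs, add adjacent products.

    Result lane i is a[2i]*b[2i] + a[2i+1]*b[2i+1], as a 32-bit value.
    """
    assert len(a) == len(b) and len(a) % 2 == 0
    return [wrap_i32(a[i] * b[i] + a[i + 1] * b[i + 1])
            for i in range(0, len(a), 2)]


def madd_via_even_odd(a, b):
    """Same result expressed as add(mul(even, even), mul(odd, odd)).

    The even/odd slices play the role of the two extract-built vectors.
    """
    even = [x * y for x, y in zip(a[0::2], b[0::2])]
    odd = [x * y for x, y in zip(a[1::2], b[1::2])]
    return [wrap_i32(e + o) for e, o in zip(even, odd)]


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [5, 6, 7, 8]
    print(pmaddwd(a, b), madd_via_even_odd(a, b))
```

Because both the add and the multiplies are commutative, the even and odd operands may appear in either order in the expression being matched and still describe the same computation.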

Jan 11 2018

zvi updated the diff for D41811: X86: Add pattern matching for PMADDWD.

Rebase

Jan 11 2018, 10:09 AM
zvi committed rL322300: DAGCombine: Let truncates negate extension through extract-subvector.
Jan 11 2018, 10:04 AM
zvi closed D41927: DAGCombine: Let truncates negate extension through extract-subvector.
Jan 11 2018, 10:04 AM
zvi updated the diff for D41927: DAGCombine: Let truncates negate extension through extract-subvector.

Rebase after adding the missing zext cases

Jan 11 2018, 9:56 AM
zvi committed rL322297: X86 Tests: Add zext cases in (trunc (subvector)) test. NFC.
Jan 11 2018, 9:51 AM
zvi committed rL322296: X86: Refactor type-splitting to target-legal size vector to a helper function.
Jan 11 2018, 9:31 AM
zvi closed D41925: X86: Refactor type-splitting to target-legal size vector to a helper function.
Jan 11 2018, 9:31 AM
zvi updated the diff for D41925: X86: Refactor type-splitting to target-legal size vector to a helper function.

Add assertions for type sizes and fix typo in comment

Jan 11 2018, 9:25 AM
zvi committed rL322272: X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices.
Jan 11 2018, 4:28 AM
zvi closed D41865: X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices.
Jan 11 2018, 4:28 AM

Jan 10 2018

zvi added a dependent revision for D41925: X86: Refactor type-splitting to target-legal size vector to a helper function: D41811: X86: Add pattern matching for PMADDWD.
Jan 10 2018, 4:53 PM
zvi added a dependency for D41811: X86: Add pattern matching for PMADDWD: D41925: X86: Refactor type-splitting to target-legal size vector to a helper function.
Jan 10 2018, 4:53 PM
zvi added a dependent revision for D41811: X86: Add pattern matching for PMADDWD: D41927: DAGCombine: Let truncates negate extension through extract-subvector.
Jan 10 2018, 4:53 PM
zvi added a dependency for D41927: DAGCombine: Let truncates negate extension through extract-subvector: D41811: X86: Add pattern matching for PMADDWD.
Jan 10 2018, 4:53 PM
zvi created D41927: DAGCombine: Let truncates negate extension through extract-subvector.
Jan 10 2018, 4:51 PM
zvi updated the diff for D41811: X86: Add pattern matching for PMADDWD.

Rebase on top of D41925

Jan 10 2018, 4:48 PM
zvi created D41925: X86: Refactor type-splitting to target-legal size vector to a helper function.
Jan 10 2018, 4:45 PM
zvi added a comment to D40055: [SelectionDAG][X86] Explicitly store the scale in the gather/scatter ISD nodes.

There are some occurrences of calls to getMaskedGather in DAGCombine.cpp which I do not see being addressed by this patch. I guess they are not being covered by tests?

Jan 10 2018, 9:50 AM
zvi updated the diff for D41865: X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices.

Fix issue identified by Simon: use original vector type for the insert_vector

Jan 10 2018, 9:25 AM
zvi committed rL322192: X86 Tests: Add isel tests for truncate-extract_vector-extend. NFC.
Jan 10 2018, 6:57 AM
zvi added inline comments to D41811: X86: Add pattern matching for PMADDWD.
Jan 10 2018, 5:54 AM
zvi added inline comments to D40055: [SelectionDAG][X86] Explicitly store the scale in the gather/scatter ISD nodes.
Jan 10 2018, 4:32 AM
zvi added a comment to D41865: X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices.
In D41865#971307, @zvi wrote:

Sure, but looking at your example the return type should have the same number of elements as the indices vector, right?

Yup, sorry for the typo. Are you intending to support cases like this?

Jan 10 2018, 4:20 AM
zvi updated the diff for D41811: X86: Add pattern matching for PMADDWD.

Average lowering now fully uses the refactored type-splitting code.

Jan 10 2018, 1:15 AM
zvi updated the diff for D41811: X86: Add pattern matching for PMADDWD.
  1. Following Simon's suggestion, refactored out the code that splits the vector to legal types into 'LowerBinTo' (the function name probably needs revision) and applied it to PMADDWD.
  2. Added a missing DAGCombine to let a truncate negate a sext through an EXTRACT_SUBVECTOR.
Jan 10 2018, 12:47 AM

Jan 9 2018

zvi updated the diff for D41811: X86: Add pattern matching for PMADDWD.

Fixes for Craig's comments

Jan 9 2018, 12:31 PM
zvi added inline comments to D41811: X86: Add pattern matching for PMADDWD.
Jan 9 2018, 12:09 PM
zvi updated the diff for D41865: X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices.

Added test with source vector larger than indices vector

Jan 9 2018, 10:30 AM
zvi added a comment to D41865: X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices.

Sure, but looking at your example the return type should have the same number of elements as the indices vector, right?

Jan 9 2018, 10:04 AM
zvi accepted D41850: [X86] Add a DAG combine to combine (sext (setcc)) with VLX.

LGTM

Jan 9 2018, 8:47 AM
zvi committed rL322090: X86 Tests: Update more isel tests with FastVariableShuffle feature.
Jan 9 2018, 8:27 AM
This revision was not accepted when it landed; it landed in state Needs Review.
Jan 9 2018, 8:27 AM
zvi updated the diff for D41851: X86 Tests: Update more isel tests with FastVariableShuffle feature.

Rebase + apply fixes for Simon's comments. Will commit this change right away to avoid conflicts.

Jan 9 2018, 8:20 AM
zvi committed rL322089: X86 Tests: Add common check prefix to test-case. NFC.
Jan 9 2018, 8:15 AM
zvi added inline comments to D41851: X86 Tests: Update more isel tests with FastVariableShuffle feature.
Jan 9 2018, 8:01 AM
zvi created D41865: X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices.
Jan 9 2018, 7:53 AM
zvi added a comment to D41851: X86 Tests: Update more isel tests with FastVariableShuffle feature.
Assuming D41436 is accepted, is the plan to remove the +fast-variable-shuffle arg from the avx512 cases? In which case might it make sense to commit the avx2 and avx512 changes separately?
Jan 9 2018, 5:04 AM

Jan 8 2018

zvi created D41851: X86 Tests: Update more isel tests with FastVariableShuffle feature.
Jan 8 2018, 11:20 PM

Jan 7 2018

zvi added a comment to D41062: [X86] Legalize v2i32 via widening rather than promoting.

There are some regressions that need to be addressed (or we decide to accept), but overall your approach seems right to me.

Jan 7 2018, 11:02 PM
zvi added inline comments to D41811: X86: Add pattern matching for PMADDWD.
Jan 7 2018, 2:42 PM
zvi added a comment to D41436: [X86][AVX512] Enable variable shuffle combining by default on AVX512 targets.
In D41436#969481, @zvi wrote:

Still trying to get hold of a KNL expert that will answer whether KNL should be included. Can we for now conservatively assume no and exclude KNL from this patch just so this patch can make progress? I want to follow-up on updating the AVX2 tests with FastVariableShuffle configurations.

Isn't that what we have already? Skylake etc all have FeatureFastVariableShuffle enabled, the issue with this patch was whether we should enable it for the avx512 attribute and not just on a per-cpu basis.

Jan 7 2018, 1:22 PM
zvi created D41811: X86: Add pattern matching for PMADDWD.
Jan 7 2018, 12:31 PM
zvi committed rL321970: X86 Tests: Add Tests for PMADDWD selection. NFC.
Jan 7 2018, 12:22 PM
zvi added a comment to D41436: [X86][AVX512] Enable variable shuffle combining by default on AVX512 targets.

Still trying to get hold of a KNL expert that will answer whether KNL should be included. Can we for now conservatively assume no and exclude KNL from this patch just so this patch can make progress? I want to follow-up on updating the AVX2 tests with FastVariableShuffle configurations.

Jan 7 2018, 11:37 AM

Dec 24 2017

zvi added inline comments to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Dec 24 2017, 6:08 AM

Dec 23 2017

zvi added a comment to D41480: Unsigned saturation subtraction canonicalization [Instcombine part].

LGTM, but I would like to see a more seasoned InstCombine contributor take a look before giving a final OK.

Dec 23 2017, 11:45 PM
zvi added a reviewer for D41480: Unsigned saturation subtraction canonicalization [Instcombine part]: craig.topper.
Dec 23 2017, 11:43 PM

Dec 21 2017

zvi added inline comments to D41480: Unsigned saturation subtraction canonicalization [Instcombine part].
Dec 21 2017, 4:46 AM
zvi added a comment to D41436: [X86][AVX512] Enable variable shuffle combining by default on AVX512 targets.

@gadi.haber This change means that KNL will be more aggressive with shuffle combining as well - is that OK?

Dec 21 2017, 3:46 AM

Dec 20 2017

zvi added a comment to D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.

I've created D41436 - the main issue is whether KNL prefers variable shuffles the same as SkylakeServer

Dec 20 2017, 2:51 PM
zvi added a comment to D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.

Since all known processors with AVX512 will prefer this new feature turned on, can we make AVX512 imply Fast-var-shuffles?

Dec 20 2017, 2:12 AM

Dec 18 2017

zvi accepted D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.
In D41323#958558, @zvi wrote:

Here's the full list of tests that are affected by setting AllowVariableMask for Depth=2. I think that we should have the full list covered with the new configuration.
I would be happy to assist with the work involved.

Sure - as long as we're testing with the fast and slow cases for them all. Are you happy with this patch with its tests changes as it is and you just update the remaining tests as followups?

Dec 18 2017, 2:25 PM
zvi added a comment to D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.

Here's the full list of tests that are affected by setting AllowVariableMask for Depth=2. I think that we should have the full list covered with the new configuration.
I would be happy to assist with the work involved.

Dec 18 2017, 9:58 AM
zvi requested changes to D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.

On second thought, I think we need to update the tests with a -mattr=+fast-variable-shuffle configuration, right?

Dec 18 2017, 6:33 AM
zvi added a comment to D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute.

@zvi Are you happy with my proposal in D41323?

Dec 18 2017, 6:29 AM
zvi added a comment to D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.

LGTM. Thanks

Dec 18 2017, 6:26 AM

Dec 12 2017

zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Ping

Dec 12 2017, 10:06 PM

Dec 11 2017

zvi added a comment to D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute.
In D40865#948072, @zvi wrote:

@RKSimon, I'm not too familiar with the MachineCombiner. Are there already any shuffle cases that are handled or was that wishful thinking? :)

Yes, it keeps being proposed but it's a big job. Part of the idea behind D40602 was to show how it'd work in principle for a much simpler case (double shifts) than shuffles. The idea would be to perform more aggressive combining to variable shuffles (PSHUFB/VPERMPS etc.) in the MC, so we'd still keep to the '3 shuffles limit' for variable mask folding in DAG lowering, as that works better for AMD Jaguar/Bulldozer/Zen and older Intel cores, and then the MC, driven by the scheduler models, tries again later on. But there are still concerns that there will be plenty of regressions due to register pressure, load latency etc., and about whether the code really is port5 bound...

A second (temporary?) option mentioned in D38318 was to add a feature flag for more recent Intel cores that reduced the 'AllowVariableMask depth limit' in combineX86ShuffleChain to 2.

Dec 11 2017, 11:25 PM

Dec 7 2017

zvi added a comment to D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute.

@spatel, this patch is for lowerV8I32VectorShuffle() which won't be called for AVX1-only targets. Would be nice if we could somehow get AVX covered as well, if profitable.
I did not observe any speedups with this patch, but FWIW IACA reports that (for Intel processors, of course) the throughput can be higher even if the load is not hoisted.
What triggered this patch was a case I discovered while working on deprecation of llvm.x86.avx2.permd and llvm.x86.avx2.permps. After trashing these intrinsics that case ends up with:

Dec 7 2017, 6:00 AM