Jan 9 2018

zvi closed D41851: X86 Tests: Update more isel tests with FastVariableShuffle feature.
Jan 9 2018, 8:27 AM
zvi updated the diff for D41851: X86 Tests: Update more isel tests with FastVariableShuffle feature.

Rebase + apply fixes for Simon's comments. Will commit this change right away to avoid conflicts.

Jan 9 2018, 8:20 AM
zvi committed rL322089: X86 Tests: Add common check prefix to test-case. NFC..
X86 Tests: Add common check prefix to test-case. NFC.
Jan 9 2018, 8:15 AM
zvi added inline comments to D41851: X86 Tests: Update more isel tests with FastVariableShuffle feature.
Jan 9 2018, 8:01 AM
zvi created D41865: X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices.
Jan 9 2018, 7:53 AM
zvi added a comment to D41851: X86 Tests: Update more isel tests with FastVariableShuffle feature.
Assuming D41436 is accepted, is the plan to remove the +fast-variable-shuffle arg from the avx512 cases? In that case, might it make sense to commit the avx2 and avx512 changes separately?
Jan 9 2018, 5:04 AM

Jan 8 2018

zvi created D41851: X86 Tests: Update more isel tests with FastVariableShuffle feature.
Jan 8 2018, 11:20 PM

Jan 7 2018

zvi added a comment to D41062: [X86] Legalize v2i32 via widening rather than promoting.

There are some regressions that need to be addressed (or we decide to accept), but overall your approach seems right to me.

Jan 7 2018, 11:02 PM
zvi added inline comments to D41811: X86: Add pattern matching for PMADDWD.
Jan 7 2018, 2:42 PM
zvi added a comment to D41436: [X86][AVX512] Enable variable shuffle combining by default on AVX512 targets.
In D41436#969481, @zvi wrote:

Still trying to get hold of a KNL expert that will answer whether KNL should be included. Can we for now conservatively assume no and exclude KNL from this patch just so this patch can make progress? I want to follow-up on updating the AVX2 tests with FastVariableShuffle configurations.

Isn't that what we have already? Skylake etc all have FeatureFastVariableShuffle enabled, the issue with this patch was whether we should enable it for the avx512 attribute and not just on a per-cpu basis.

Jan 7 2018, 1:22 PM
zvi created D41811: X86: Add pattern matching for PMADDWD.
Jan 7 2018, 12:31 PM
zvi committed rL321970: X86 Tests: Add Tests for PMADDWD selection. NFC..
X86 Tests: Add Tests for PMADDWD selection. NFC.
Jan 7 2018, 12:22 PM
zvi added a comment to D41436: [X86][AVX512] Enable variable shuffle combining by default on AVX512 targets.

Still trying to get hold of a KNL expert that will answer whether KNL should be included. Can we for now conservatively assume no and exclude KNL from this patch just so this patch can make progress? I want to follow-up on updating the AVX2 tests with FastVariableShuffle configurations.

Jan 7 2018, 11:37 AM

Dec 24 2017

zvi added inline comments to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Dec 24 2017, 6:08 AM

Dec 23 2017

zvi added a comment to D41480: Unsigned saturation subtraction canonicalization [Instcombine part].

LGTM, but I would like to see a more seasoned InstCombine contributor take a look before giving a final OK.

Dec 23 2017, 11:45 PM
zvi added a reviewer for D41480: Unsigned saturation subtraction canonicalization [Instcombine part]: craig.topper.
Dec 23 2017, 11:43 PM

Dec 21 2017

zvi added inline comments to D41480: Unsigned saturation subtraction canonicalization [Instcombine part].
Dec 21 2017, 4:46 AM
zvi added a comment to D41436: [X86][AVX512] Enable variable shuffle combining by default on AVX512 targets.

@gadi.haber This change means that KNL will be more aggressive with shuffle combining as well - is that OK?

Dec 21 2017, 3:46 AM

Dec 20 2017

zvi added a comment to D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.

I've created D41436 - the main issue is whether KNL prefers variable shuffles the same as SkylakeServer

Dec 20 2017, 2:51 PM
zvi added a comment to D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.

Since all known processors with AVX512 will prefer this new feature turned on, can we make AVX512 imply Fast-var-shuffles?

Dec 20 2017, 2:12 AM

Dec 18 2017

zvi accepted D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.
In D41323#958558, @zvi wrote:

Here's the full list of tests that are affected by setting AllowVariableMask for Depth=2. I think that we should have the full list covered with the new configuration.
I would be happy to assist with the work involved.

Sure - as long as we're testing with the fast and slow cases for them all. Are you happy with this patch and its test changes as they are, with you updating the remaining tests as follow-ups?

Dec 18 2017, 2:25 PM
zvi added a comment to D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.

Here's the full list of tests that are affected by setting AllowVariableMask for Depth=2. I think that we should have the full list covered with the new configuration.
I would be happy to assist with the work involved.

Dec 18 2017, 9:58 AM
zvi requested changes to D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.

On second thought, I think we need to update the tests with a -mattr=+fast-variable-shuffle configuration, right?

Dec 18 2017, 6:33 AM
zvi added a comment to D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute.

@zvi Are you happy with my proposal in D41323?

Dec 18 2017, 6:29 AM
zvi added a comment to D41323: [X86][SSE] Add cpu feature for aggressive combining to variable shuffles.

LGTM. Thanks

Dec 18 2017, 6:26 AM

Dec 12 2017

zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Ping

Dec 12 2017, 10:06 PM

Dec 11 2017

zvi added a comment to D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute.
In D40865#948072, @zvi wrote:

@RKSimon, I'm not too familiar with the MachineCombiner. Are there already any shuffle cases that are handled or was that wishful thinking? :)

Yes, it keeps being proposed, but it's a big job. Part of the idea behind D40602 was to show how it would work in principle for a much simpler case (double shifts) than shuffles. The idea would be to perform more aggressive combining to variable shuffles (PSHUFB/VPERMPS etc.) in the MC. We would still keep the '3 shuffles limit' for variable mask folding in DAG lowering, as that works better for AMD Jaguar/Bulldozer/Zen and older Intel cores, and the MC, driven by the scheduler models, would then try again later on. But there are still concerns that there will be plenty of regressions due to register pressure, load latency etc., and about whether the code really is port5 bound.

A second (temporary?) option mentioned in D38318 was to add a feature flag for more recent Intel cores that reduced the 'AllowVariableMask depth limit' in combineX86ShuffleChain to 2.
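
The depth-limit idea in this comment can be sketched as a toy model. This is only an illustration, not LLVM's code: `allow_variable_mask` and its parameters are hypothetical names, and it assumes the limit is the minimum number of shuffles a chain must contain before folding to a variable-mask shuffle is allowed (3 by default, 2 with the proposed feature flag).

```python
def allow_variable_mask(depth: int, fast_variable_shuffle: bool) -> bool:
    """Toy model: may a chain of `depth` shuffles be folded into a single
    variable-mask shuffle (e.g. PSHUFB/VPERMPS)?

    Assumption: the default limit is 3 shuffles and the feature flag
    lowers it to 2, as described in the comment above.
    """
    limit = 2 if fast_variable_shuffle else 3
    return depth >= limit


# A two-shuffle chain is folded only when the (hypothetical) flag is set.
print(allow_variable_mask(2, fast_variable_shuffle=False))
print(allow_variable_mask(2, fast_variable_shuffle=True))
```

Under this model, targets without the flag keep the conservative behaviour, which is the trade-off the comment describes for AMD and older Intel cores.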

Dec 11 2017, 11:25 PM

Dec 7 2017

zvi added a comment to D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute.

@spatel, this patch is for lowerV8I32VectorShuffle() which won't be called for AVX1-only targets. Would be nice if we could somehow get AVX covered as well, if profitable.
I did not observe any speedups with this patch, but FWIW IACA reports that (for Intel processors, of course) the throughput can be higher even if the load is not hoisted.
What triggered this patch was a case I discovered while working on deprecation of llvm.x86.avx2.permd and llvm.x86.avx2.permps. After trashing these intrinsics that case ends up with:

Dec 7 2017, 6:00 AM

Dec 6 2017

zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Ping

Dec 6 2017, 11:13 AM
zvi committed rL319910: InstructionSimplify: 'extractelement' with an undef index is undef.
InstructionSimplify: 'extractelement' with an undef index is undef
Dec 6 2017, 9:52 AM
zvi closed D40231: InstructionSimplify: 'extractelement' with an undef index is undef by committing rL319910: InstructionSimplify: 'extractelement' with an undef index is undef.
Dec 6 2017, 9:52 AM
zvi updated the diff for D40231: InstructionSimplify: 'extractelement' with an undef index is undef.

Final rebase

Dec 6 2017, 9:50 AM
zvi committed rL319907: AMDGPU Tests: Change a case to be run with -O0.
AMDGPU Tests: Change a case to be run with -O0
Dec 6 2017, 9:40 AM

Dec 5 2017

zvi created D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute.
Dec 5 2017, 3:21 PM
zvi updated the diff for D40231: InstructionSimplify: 'extractelement' with an undef index is undef.

Preserve the migrated test case's CHECK lines, which, as @arnsenm pointed out, can be used as-is for -O0.

Dec 5 2017, 1:45 PM

Dec 4 2017

zvi added inline comments to D40231: InstructionSimplify: 'extractelement' with an undef index is undef.
Dec 4 2017, 11:40 AM

Dec 3 2017

zvi added inline comments to D40231: InstructionSimplify: 'extractelement' with an undef index is undef.
Dec 3 2017, 10:41 PM
zvi updated the diff for D40231: InstructionSimplify: 'extractelement' with an undef index is undef.

Following Sanjay's suggestion, moving test-case from indirect-addressing-si.ll to indirect-addressing-si-noopt.ll.
If this looks OK, I will commit the changes to these tests in a separate commit.

Dec 3 2017, 3:30 PM

Nov 30 2017

zvi accepted D40290: [X86] Fix a bug in handling GRXX subclasses in Domain Reassignment pass.

LGTM

Nov 30 2017, 3:23 AM

Nov 29 2017

zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
In D38313#937269, @zvi wrote:

I'm really worried that the compile time hit of this for LTO will be non-negligible. Do you have numbers?

Will follow-up on this.

Nov 29 2017, 11:10 PM

Nov 28 2017

zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

I'm really worried that the compile time hit of this for LTO will be non-negligible. Do you have numbers?

Nov 28 2017, 12:03 AM

Nov 27 2017

zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Two comments on the trunc thing:

  1. Thank you!!! As a GPU target maintainer, one of my main frustrations is how much LLVM *loves* to generate code that is needlessly too wide when smaller would do. We mostly have avoided this problem due to being float-heavy, but as integer code becomes more important, I absolutely love any chance I can get to reduce 32-bit to 16-bit and save register space accordingly.

Sometimes it's LLVM, and sometimes it's the frontend that is required to extend small typed values before performing operations.
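
The narrowing discussed here rests on a simple arithmetic fact: extending two 16-bit values to 32 bits, adding, and truncating back gives the same result as a wrapping 16-bit add. A minimal sketch of that equivalence follows; the function names are hypothetical and the code models fixed-width integers with masks, so it illustrates the arithmetic only, not the pass itself.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def add_wide_then_trunc(a: int, b: int) -> int:
    """Zero-extend two 16-bit values to 32 bits, add, truncate to 16 bits."""
    wide = ((a & MASK16) + (b & MASK16)) & MASK32
    return wide & MASK16


def add_narrow(a: int, b: int) -> int:
    """Add directly in 16 bits with wraparound."""
    return ((a & MASK16) + (b & MASK16)) & MASK16


# Spot-check the equivalence on a few values, including an overflow case.
for a, b in [(1, 2), (0xFFFF, 1), (0x8000, 0x8000), (0x1234, 0xFEDC)]:
    assert add_wide_then_trunc(a, b) == add_narrow(a, b)
```

Because the low 16 bits of the sum do not depend on the upper bits of the operands, the wide add can be replaced by the narrow one whenever only the truncated result is used.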

Nov 27 2017, 11:59 PM
zvi updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Add missing AggressiveInstCombine.h and fix missing 'opt' dependency. Thanks, @lsaba, for noticing.

Nov 27 2017, 11:34 PM

Nov 26 2017

zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

ping

Nov 26 2017, 1:10 AM

Nov 22 2017

zvi accepted D39729: [X86][SSE] Use (V)PHMINPOSUW for vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions (PR32841).

LGTM

Nov 22 2017, 10:44 PM
zvi added a comment to D40330: Separate ExecutionDepsFix into 4 parts - enable breaking false dependencies for all reg classes..

LGTM with a minor comment. Thanks!

Nov 22 2017, 9:26 AM
zvi added a comment to D40334: [X86] Break false dependencies for POPCNT, LZCNT, TZCNT.

I think we will need a subtarget feature indicating whether the false-dependency on these instructions happens. For starters, does this happen in AMD processors?

Nov 22 2017, 9:24 AM
zvi added inline comments to D40290: [X86] Fix a bug in handling GRXX subclasses in Domain Reassignment pass.
Nov 22 2017, 8:02 AM

Nov 20 2017

zvi added a comment to D40231: InstructionSimplify: 'extractelement' with an undef index is undef.

One option is to run llc with -O0, which will disable the passes that use InstructionSimplify, but this will require changing the CHECK lines to expect different generated code. If the purpose of the test is to not crash in the Verifier, are the CHECK lines needed?

Nov 20 2017, 11:00 PM
zvi added inline comments to D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction.
Nov 20 2017, 11:00 AM

Nov 19 2017

zvi added a comment to D40231: InstructionSimplify: 'extractelement' with an undef index is undef.

With this patch I am seeing failures in test/CodeGen/AMDGPU/indirect-addressing-si.ll. This test includes extractelement instructions with undef indices, which is what this patch targets. I would appreciate your advice on how we can modify these tests so that they can be used safely. Please bear in mind that I have no knowledge of the AMDGPU backend.
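
The rule this patch adds, as stated in its title, can be modelled in a few lines. The sketch below is a toy stand-in, not the InstructionSimplify implementation: `UNDEF` and `simplify_extractelement` are hypothetical names, and only the undef-index rule is modelled.

```python
UNDEF = object()  # hypothetical stand-in for an LLVM undef value


def simplify_extractelement(vector, index):
    """Toy model of the rule: extractelement with an undef index is undef.

    Returns UNDEF when the rule applies, or None when this sketch has no
    simplification to offer.
    """
    if index is UNDEF:
        return UNDEF
    return None


# An undef index simplifies the whole instruction to undef, which is why
# tests that relied on such instructions surviving start to fail.
assert simplify_extractelement([1, 2, 3, 4], UNDEF) is UNDEF
```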

Nov 19 2017, 3:07 PM
zvi created D40231: InstructionSimplify: 'extractelement' with an undef index is undef.
Nov 19 2017, 3:01 PM
zvi added inline comments to D39952: [X86][X87]: Adding full coverage of MC encoding for all X87 ISA Sets.<NFC>.
Nov 19 2017, 12:41 PM
zvi updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Rebase on ToT. NFC in this revision.

Nov 19 2017, 12:24 PM
zvi updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Address the last of Craig's comments:

  • Thanks, @lsaba, for porting the pass to the new PassManager.
  • Removed shrinkage of vector types until we sort out if it is generally allowed to shrink element types of vector operations.
  • Some minor fixes to comments.
Nov 19 2017, 12:15 PM
zvi added inline comments to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Nov 19 2017, 12:51 AM

Nov 16 2017

zvi updated subscribers of D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Nov 16 2017, 6:31 AM
zvi updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Address some of Craig's recent comments.

Nov 16 2017, 4:47 AM
zvi added inline comments to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Nov 16 2017, 4:45 AM

Nov 12 2017

zvi added inline comments to D39840: [MC][X86] Code padding for performance stability - Branch instructions and targets alignment.
Nov 12 2017, 11:20 PM

Nov 9 2017

zvi commandeered D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Commandeering this patch while Amjad is away on a few weeks of vacation.
Also sending a friendly ping to the reviewers. AFAIK, all comments were addressed as of the latest revision of this patch. Please let me know if I missed anything. Thanks.

Nov 9 2017, 12:44 AM

Nov 7 2017

zvi added a comment to D38417: [test-suite] Adding HACCKernels app.

Since the final commit of this patch, rL317483, the AVX2 buildbot is broken: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/1402

Nov 7 2017, 10:17 PM

Nov 6 2017

zvi committed rL317463: X86 ISel: Basic support for variable-index vector permutations.
X86 ISel: Basic support for variable-index vector permutations
Nov 6 2017, 12:26 AM
zvi closed D39126: X86 ISel: Basic support for variable-index vector permutations by committing rL317463: X86 ISel: Basic support for variable-index vector permutations.
Nov 6 2017, 12:26 AM
zvi updated the diff for D39126: X86 ISel: Basic support for variable-index vector permutations.

Last minute changes in the spirit of Simon's suggestion

Nov 6 2017, 12:25 AM

Nov 2 2017

zvi updated the diff for D39126: X86 ISel: Basic support for variable-index vector permutations.

Follow Simon's suggestion to pull the 0th index scan into the loop.
Fix typo + style defects.

Nov 2 2017, 3:37 PM
zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Thanks, Amjad! This patch LGTM, but I think it would be best to wait for an LGTM from one of the assigned reviewers.

Nov 2 2017, 8:32 AM
zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Some more minor comments

Nov 2 2017, 7:07 AM

Nov 1 2017

zvi added a comment to D39476: [X86][SSE] Truncate with PACKSS any input with sufficient sign-bits.

LGTM

Nov 1 2017, 2:31 AM

Oct 30 2017

zvi updated the diff for D39126: X86 ISel: Basic support for variable-index vector permutations.

Add support for floating-point types.

Oct 30 2017, 2:20 PM
zvi committed rL316946: X86 Tests: Update the variable-index permute tests with FP types. NFC..
X86 Tests: Update the variable-index permute tests with FP types. NFC.
Oct 30 2017, 12:29 PM
zvi added a comment to D39126: X86 ISel: Basic support for variable-index vector permutations.

ping

Oct 30 2017, 4:37 AM
zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

I know you decided to move this to a new pass, but here are a couple more comments that will be relevant.

Oct 30 2017, 4:32 AM
zvi accepted D38684: [X86][AVX512] lowering broadcastm intrinsic - llvm part.

LGTM

Oct 30 2017, 4:08 AM
zvi accepted D39411: [X86] Make sure we don't create locked inc/dec instructions when the carry flag is being used..

LGTM

Oct 30 2017, 4:05 AM
zvi added a comment to D39402: [X86] Prevent fast isel from folding loads into the instructions listed in hasPartialRegUpdate..

LGTM, with a minor concern that with this change FastISel is a bit slower because the bail-out condition was moved to a later point?

Oct 30 2017, 3:14 AM

Oct 29 2017

zvi added a comment to D37775: Add a verifier test to check the access on both sides of COPY are the same.

Hi All,

Oct 29 2017, 2:09 AM

Oct 26 2017

zvi added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Amjad, some questions about where we want this work to evolve.

Oct 26 2017, 6:45 AM

Oct 24 2017

zvi added inline comments to D38730: X86: Fix X86CallFrameOptimization to search for the COPY StackPointer.
Oct 24 2017, 3:25 PM
zvi committed rL316434: X86CallFrameOptimization: Update comments and variable names. NFCI..
X86CallFrameOptimization: Update comments and variable names. NFCI.
Oct 24 2017, 6:25 AM
zvi committed rL316431: X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms.
X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms
Oct 24 2017, 5:13 AM
zvi closed D38738: X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms by committing rL316431: X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms.
Oct 24 2017, 5:13 AM
zvi added inline comments to D38738: X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms.
Oct 24 2017, 4:56 AM
zvi committed rL316416: X86: Fix X86CallFrameOptimization to search for the COPY StackPointer.
X86: Fix X86CallFrameOptimization to search for the COPY StackPointer
Oct 24 2017, 12:39 AM
zvi added inline comments to D38730: X86: Fix X86CallFrameOptimization to search for the COPY StackPointer.
Oct 24 2017, 12:39 AM
zvi closed D38730: X86: Fix X86CallFrameOptimization to search for the COPY StackPointer by committing rL316416: X86: Fix X86CallFrameOptimization to search for the COPY StackPointer.
Oct 24 2017, 12:38 AM

Oct 23 2017

zvi committed rL316412: X86: Register the X86CallFrameOptimization pass.
X86: Register the X86CallFrameOptimization pass
Oct 23 2017, 10:48 PM
zvi closed D38729: X86: Register the X86CallFrameOptimization pass by committing rL316412: X86: Register the X86CallFrameOptimization pass.
Oct 23 2017, 10:48 PM

Oct 20 2017

zvi created D39126: X86 ISel: Basic support for variable-index vector permutations.
Oct 20 2017, 8:39 AM
zvi committed rL316216: X86 Tests: Add tests for vector permutes with variable indices. NFC..
X86 Tests: Add tests for vector permutes with variable indices. NFC.
Oct 20 2017, 8:32 AM

Oct 19 2017

zvi added a comment to D39077: [X86] Teach the assembly parser to warn on duplicate registers in gather instructions..

LGTM

Oct 19 2017, 4:35 AM
zvi added a reviewer for D39077: [X86] Teach the assembly parser to warn on duplicate registers in gather instructions.: coby.
Oct 19 2017, 4:28 AM

Oct 18 2017

zvi added a comment to D38738: X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms.

ping

Oct 18 2017, 6:30 AM
zvi added a comment to D38730: X86: Fix X86CallFrameOptimization to search for the COPY StackPointer.

ping

Oct 18 2017, 6:30 AM
zvi added a comment to D38729: X86: Register the X86CallFrameOptimization pass.

ping

Oct 18 2017, 6:29 AM

Oct 17 2017

zvi added a comment to D36597: DAG: Fix creating select with wrong condition type.

The changes in WidenVSELECTAndMask and the X86 test LGTM.

Oct 17 2017, 8:39 AM

Oct 16 2017

zvi accepted D37251: [X86] Add a pass to convert instruction chains between domains.

LGTM

Oct 16 2017, 1:16 PM

Oct 10 2017

zvi added a parent revision for D38738: X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms: D38730: X86: Fix X86CallFrameOptimization to search for the COPY StackPointer.
Oct 10 2017, 8:18 AM
zvi added a child revision for D38730: X86: Fix X86CallFrameOptimization to search for the COPY StackPointer: D38738: X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms.
Oct 10 2017, 8:18 AM
zvi updated the summary of D38738: X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms.
Oct 10 2017, 8:18 AM