dtemirbulatov (Dinar Temirbulatov)
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 17 2015, 10:06 AM (196 w, 6 d)

Recent Activity

Thu, Jun 13

dtemirbulatov accepted D60897: [SLP] Look-ahead operand reordering heuristic..

LGTM.

Thu, Jun 13, 5:39 AM · Restricted Project

Wed, Jun 12

dtemirbulatov added a comment to D60897: [SLP] Look-ahead operand reordering heuristic..

Hmm, I have another failure with this change on my setup; this time it is PR39774.ll. It is probably a difference in the sort algorithm or something similar, since it is just a swap of two inserts. I don't see this failure if PR39774.ll is left intact.

Wed, Jun 12, 6:07 PM · Restricted Project
dtemirbulatov committed rGb2f45ba1e8ac: [SLP] Update propagate_ir_flags.ll test to check that we do retain the common… (authored by dtemirbulatov).
Wed, Jun 12, 5:19 PM
dtemirbulatov added a comment to D62938: [SLP] Forbid to vectorize bundles with same opcode but different IR flags.

Please can you add a better explanation of the problem to the description of the patch?
I'm not sure what the problem is, you are allowed to drop nuw/nsw flags: https://rise4fun.com/Alive/plNm
So the new vectorized binop should simply take the smallest common subset of flags, which likely most often means no flags.

@dtemirbulatov It might be worth adding tests that check that we do retain the common subset?
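
The "smallest common subset of flags" suggested above can be sketched as a bitwise intersection over the bundle. This is only a minimal illustration: the flag bits and the `commonFlags` helper are hypothetical names invented for this example, not LLVM's encodings or API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flag bits for illustration only; these are not LLVM's encodings.
enum : std::uint8_t { NUW = 1u << 0, NSW = 1u << 1, Exact = 1u << 2 };

// Returns the flags carried by every scalar in the bundle.
// An empty bundle yields no flags.
std::uint8_t commonFlags(const std::vector<std::uint8_t> &Bundle) {
  if (Bundle.empty())
    return 0;
  std::uint8_t Common = Bundle.front();
  for (std::uint8_t F : Bundle)
    Common &= F; // a flag survives only if every scalar has it
  return Common;
}
```

With one scalar carrying nuw and nsw and another carrying only nsw, the vectorized operation would keep nsw; with disjoint flags it keeps none, which matches the remark that the result is often "no flags".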

Wed, Jun 12, 5:18 PM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Wed, Jun 12, 11:26 AM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Wed, Jun 12, 11:15 AM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Wed, Jun 12, 8:23 AM

Tue, Jun 11

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Tue, Jun 11, 12:35 PM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Tue, Jun 11, 12:27 PM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Tue, Jun 11, 12:21 PM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Tue, Jun 11, 11:19 AM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Tue, Jun 11, 11:15 AM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

fixed a typo.

Tue, Jun 11, 11:08 AM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, fixed all remarks.

Tue, Jun 11, 10:32 AM

Thu, Jun 6

dtemirbulatov added a comment to D62938: [SLP] Forbid to vectorize bundles with same opcode but different IR flags.

Thu, Jun 6, 4:42 PM
dtemirbulatov abandoned D62938: [SLP] Forbid to vectorize bundles with same opcode but different IR flags.

Thu, Jun 6, 12:06 PM

Wed, Jun 5

dtemirbulatov created D62938: [SLP] Forbid to vectorize bundles with same opcode but different IR flags.
Wed, Jun 5, 6:57 PM
dtemirbulatov committed rG15c657d13d6f: [SLP] Fix regression in broadcasts caused by operand reordering patch D59973. (authored by dtemirbulatov).
Wed, Jun 5, 8:25 AM

Tue, Jun 4

dtemirbulatov accepted D62427: [SLP] Fix regression in broadcasts caused by operand reordering patch D59973..

LGTM.

Tue, Jun 4, 7:06 AM · Restricted Project
dtemirbulatov added a comment to D62427: [SLP] Fix regression in broadcasts caused by operand reordering patch D59973..

looks good to me.

Tue, Jun 4, 7:03 AM · Restricted Project

Mon, Jun 3

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mon, Jun 3, 6:01 AM

Sat, Jun 1

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Addressing last remarks, rebased.

Sat, Jun 1, 9:49 PM

Tue, May 28

dtemirbulatov added inline comments to D62432: [SLPVectorizer] Make the scheduler aware of the TreeEntry operands..
Tue, May 28, 2:49 PM · Restricted Project
dtemirbulatov added inline comments to D62432: [SLPVectorizer] Make the scheduler aware of the TreeEntry operands..
Tue, May 28, 12:54 PM · Restricted Project
dtemirbulatov added inline comments to D62432: [SLPVectorizer] Make the scheduler aware of the TreeEntry operands..
Tue, May 28, 12:22 PM · Restricted Project

May 23 2019

dtemirbulatov added a comment to D60897: [SLP] Look-ahead operand reordering heuristic..

Oh, I noticed some regressions with this change in AArch64/matmul.ll and AArch64/transpose.ll. Maybe there is a way to isolate them with heuristics?

May 23 2019, 12:00 PM · Restricted Project

May 22 2019

dtemirbulatov added inline comments to D60897: [SLP] Look-ahead operand reordering heuristic..
May 22 2019, 12:28 PM · Restricted Project

May 21 2019

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

I think the first throttling patch should implement a very simple and fast algorithm for finding the cut:

  1. Add new fields to TreeEntry for Cost, ExtractCost and PredecessorsCost.
  2. During getTreeCost() set the TE.Cost and TE.ExtractCost (as you did in an earlier version of the patch if I am not mistaken)
  3. Do a single top-down traversal of the tree in reverse postorder and set TE.PredecessorsCost to the sum of the costs of all predecessors up to TE. While doing so, you can evaluate cutting just below TE by comparing the gather cost of TE against Cost + PredecessorsCost. This is very fast because each TreeEntry node is visited only once, so the complexity is linear in the size of the tree.

    For example, in slp-throttle.ll the bundle that needs to be scalarized [%add19, %sub22] has costs of Cost=1, ExtractCost = 0, PredecessorsCost=1 (because of bundle [%mul18, undef]). Cutting below the bundle has a cost of +1, while keeping it vectorized has a cost of +2 (Cost=1 + PredecessorsCost=1).

    This should be good-enough for most simple cases. We can improve it later, if needed, with follow-up patches. What do you think?
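
The proposal above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from D57779: `Entry`, `GatherCost`, `Cut` and `markCuts` are hypothetical names, the structure is assumed to be a tree, and entries are assumed to be stored so that every operand has a larger index than its user. Under that layout the sketch walks the array backwards instead of doing the reverse-postorder traversal named above, which still visits each entry once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for an SLP tree node; not LLVM's TreeEntry.
struct Entry {
  int Cost = 0;             // cost of vectorizing this bundle
  int ExtractCost = 0;      // extract cost for external users (kept for
                            // completeness, not used by the cut test below)
  int GatherCost = 0;       // cost of gathering this bundle instead
  int PredecessorsCost = 0; // filled in: summed cost of all operand bundles
  std::vector<std::size_t> Operands; // indices of operand entries
  bool Cut = false;         // filled in: true if gathering here is cheaper
};

// Single linear pass. Operands are assumed to have larger indices than their
// users, so walking backwards sees every operand before its user.
void markCuts(std::vector<Entry> &Tree) {
  for (std::size_t I = Tree.size(); I-- > 0;) {
    Entry &TE = Tree[I];
    TE.PredecessorsCost = 0;
    for (std::size_t Op : TE.Operands) {
      assert(Op > I && Op < Tree.size() && "operands must follow their user");
      TE.PredecessorsCost += Tree[Op].Cost + Tree[Op].PredecessorsCost;
    }
    // Cut here if gathering is cheaper than keeping TE and everything
    // feeding it vectorized.
    TE.Cut = TE.GatherCost < TE.Cost + TE.PredecessorsCost;
  }
}
```

For the slp-throttle.ll example, an entry with Cost=1 and a gather cost of 1 whose only operand entry has Cost=1 gets PredecessorsCost=1, so keeping it vectorized costs 2 against 1 for cutting, and the entry is marked as a cut. A refinement could propagate the cheaper of the two costs upward once an operand has been cut; the sketch deliberately leaves that out.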

Yes, looks good.

May 21 2019, 6:33 AM

May 20 2019

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
May 20 2019, 5:11 PM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
May 20 2019, 5:08 PM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased.

May 20 2019, 5:59 AM

May 19 2019

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

ping

May 19 2019, 3:43 PM

May 18 2019

dtemirbulatov committed rG2ff72f665417: [SLP] Refactoring of EdgeInfo and UserTreeIdx in buildTree_rec(). (authored by dtemirbulatov).
May 18 2019, 6:29 PM
dtemirbulatov accepted D61795: [SLP] Refactoring of EdgeInfo and UserTreeIdx in buildTree_rec()..

LGTM.

May 18 2019, 4:41 PM · Restricted Project
dtemirbulatov added a comment to D61795: [SLP] Refactoring of EdgeInfo and UserTreeIdx in buildTree_rec()..

Looks good to me too.

May 18 2019, 7:01 AM · Restricted Project

May 16 2019

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Added comments to the cutBranch() and extendSubTree() functions. Removed the duplicate variable "I" in the extendSubTree() function.

May 16 2019, 6:06 PM
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Again, slp-throttling is on just to show the impact on SLP tests.

May 16 2019, 12:34 PM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Looks like I fixed all remarks; rebased.

May 16 2019, 12:17 PM

May 10 2019

dtemirbulatov accepted D61706: [SLP] Refactor VectorizableTree to use unique_ptr..

LGTM.

May 10 2019, 6:07 AM · Restricted Project

May 9 2019

dtemirbulatov added a comment to D61706: [SLP] Refactor VectorizableTree to use unique_ptr..

Looks like the "class nodes_iterator" definition at line 1856 is not formatted properly.

May 9 2019, 7:45 PM · Restricted Project
dtemirbulatov requested changes to D61706: [SLP] Refactor VectorizableTree to use unique_ptr..
May 9 2019, 7:43 PM · Restricted Project

May 6 2019

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Found a typo in the previous upload.

May 6 2019, 5:29 PM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Simplified the solution by removing the leaves structure and any notion of branches. Left the "slp-throttling" flag true by default just to show the impact on SLP tests.

May 6 2019, 5:21 PM

Apr 30 2019

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Implemented tree/graph throttling solution.

Apr 30 2019, 7:49 AM

Mar 29 2019

dtemirbulatov accepted D59992: [SLP] Add support for commutative icmp/fcmp predicates.

LGTM.

Mar 29 2019, 8:10 AM · Restricted Project

Mar 25 2019

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mar 25 2019, 6:43 AM
dtemirbulatov added a comment to D59738: [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization.

Looks good to me. Abataev, any objections?

Mar 25 2019, 4:11 AM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Did you check the 462.libquantum regression?

Mar 25 2019, 4:05 AM

Mar 22 2019

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Changed the getTreeCost() loop condition to "--I;".

Mar 22 2019, 12:29 PM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Fixed remarks, removed Cost state from TreeEntry, turned cl::opt throttling flag false by default.

Mar 22 2019, 12:12 PM
dtemirbulatov committed rGf95351b918c9: [SLPVectorizer] Add test related to SLP Throttling support, NFCI. (authored by dtemirbulatov).
Mar 22 2019, 7:52 AM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mar 22 2019, 6:49 AM

Mar 21 2019

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Fixed cost estimation for canceled elements that must be inserted to interact with vectorized ones.
Replaced the main throttling loop with a reverse-post-order traversal, as suggested in the previous remarks.
The final throttling decision is now made inside getTreeCost(), so we don't need to hold cost estimations in TreeEntry.
Avoid vectorizing any flow instructions during partial vectorization.
Added a cl::opt flag for throttling. It is now on by default to show the impact on tests, but at commit time I could turn this option off by default to estimate the whole impact before enabling it everywhere.
Here are the SPEC CPU 2006 numbers before and after:

Mar 21 2019, 12:36 PM

Mar 19 2019

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mar 19 2019, 9:24 AM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mar 19 2019, 7:29 AM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mar 19 2019, 7:00 AM

Mar 14 2019

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Delete unnecessary variables from getTreeCost().

Mar 14 2019, 8:00 AM

Mar 13 2019

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mar 13 2019, 9:43 AM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased the change, fixed Vasileios' remarks.

Mar 13 2019, 9:36 AM

Mar 8 2019

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Added cost calculation for operations in the vectorizable tree that must be extracted for use by the canceled tree elements. Fixed a performance regression on Phoronix's c-ray-1.2.0 test by avoiding vectorization of just one seed instruction. SPEC 2006 numbers are unchanged: there are several instances of new vectorization, but they are not hot enough to change the numbers. Also, I noticed a good vectorization example in Phoronix's encode-mp3-1.7.3: two FFT kernels, with fft_short ~8% faster and fft_long ~3% faster on an Intel(R) Core(TM) i7-6700HQ.

Mar 8 2019, 5:07 PM

Mar 1 2019

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mar 1 2019, 3:36 AM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Changed the decision so that the tree is reduced only if the cost of vectorizing is too high, and the tree is reduced to its highest element that is minimally profitable.

Mar 1 2019, 3:31 AM

Feb 26 2019

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Since cost computing was changed, can you rerun spec benchmarks?

Sorry, I found my previous results not quite reliable. I have changed my methodology and will rerun the SPEC benchmarks.

Feb 26 2019, 8:19 AM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Fixed cost calculation, fixed the run-time error by adding external users for canceled elements in reduceTree(), and added a testcase.

Feb 26 2019, 5:37 AM

Feb 22 2019

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Looks like I addressed the issue with leaf gather nodes.

Feb 22 2019, 2:23 AM

Feb 19 2019

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 19 2019, 10:18 AM
dtemirbulatov requested review of D57779: [SLP] Add support for throttling..

Oh, I rechecked the algorithm at line 2660 and it looks correct. Sorry.

Feb 19 2019, 4:16 AM
dtemirbulatov planned changes to D57779: [SLP] Add support for throttling..

I found an incorrect way of summing costs at line 2660; it should be a backward loop here.

Feb 19 2019, 3:19 AM

Feb 18 2019

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 18 2019, 11:21 PM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Fixed several issues with cost estimation, added support for tree height reduction even for profitable trees.

Feb 18 2019, 11:18 PM

Feb 10 2019

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 10 2019, 5:49 PM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 10 2019, 4:17 PM
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Cleaned up the current SLP throttle implementation and applied some of Vasileios' remarks. Here are the SPEC 2k6 results before and after the change on an i7-6700HQ.

Feb 10 2019, 4:10 PM

Feb 5 2019

dtemirbulatov created D57779: [SLP] Add support for throttling..
Feb 5 2019, 12:48 PM

Feb 4 2019

dtemirbulatov abandoned D57669: [SLP] Fix incorrect cost tree calculation..
Feb 4 2019, 6:11 PM

Feb 3 2019

dtemirbulatov created D57669: [SLP] Fix incorrect cost tree calculation..
Feb 3 2019, 4:10 PM

Feb 1 2019

dtemirbulatov abandoned D57409: [SLP] Allow to duplicate instruction in multiple bundles by introducing pseudo operations..

It is quite rare for it to decide to duplicate once the number of duplicated elements is limited to less than half of the vector size and the cost of this transformation is calculated. Abandoning.

Feb 1 2019, 2:41 PM

Jan 29 2019

dtemirbulatov updated the diff for D57409: [SLP] Allow to duplicate instruction in multiple bundles by introducing pseudo operations..

update duplicate.ll testcase.

Jan 29 2019, 1:36 PM
dtemirbulatov updated subscribers of D57409: [SLP] Allow to duplicate instruction in multiple bundles by introducing pseudo operations..
Jan 29 2019, 12:12 PM
dtemirbulatov created D57409: [SLP] Allow to duplicate instruction in multiple bundles by introducing pseudo operations..
Jan 29 2019, 12:10 PM

Jan 18 2019

dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Jan 18 2019, 12:19 PM
dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

removed bundle reordering by replacing pseudo instructions with real ones.

Jan 18 2019, 12:06 PM

Dec 28 2018

dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Dec 28 2018, 10:22 AM
dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Dec 28 2018, 7:22 AM
dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Dec 28 2018, 7:20 AM
dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Dec 28 2018, 6:29 AM

Dec 7 2018

dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Dec 7 2018, 7:44 AM
dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Dec 7 2018, 7:23 AM
dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Dec 7 2018, 7:06 AM
dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Introduced the InstructionOrPseudo structure and removed "Instruction::Invoke" from the tryToRepresentAsInstArg() function, with an error example in the invoke.ll testcase.

Dec 7 2018, 6:47 AM

Dec 4 2018

dtemirbulatov added a comment to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

@dtemirbulatov Any movement on this? It'd be great to get this in for the 8.0 release!

Dec 4 2018, 6:21 AM

Nov 27 2018

dtemirbulatov accepted D54955: [SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized..

LGTM, with @spatel's remark about the comment.

Nov 27 2018, 10:03 PM

Nov 14 2018

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Oops, I found a few typos. Formatted tryToRepresentAsInstArg() and removed the second SRem from isRemainder().

Nov 14 2018, 10:30 AM
dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Nov 14 2018, 10:22 AM
dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Looks like I fixed all previous remarks. Also, during testing I found two more issues with the change and fixed both.

Nov 14 2018, 10:16 AM

Nov 7 2018

dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Nov 7 2018, 8:11 AM

Oct 31 2018

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Implemented Map<Instruction*, std::pair<Value *Parent, unsigned Opcode>> indexing for ScalarToTreeEntry and PseudoInstScheduleDataMap.
Added a reorderBundles() function to reorder bundles that have common instructions according to the instruction layout after SLP scheduling.

Oct 31 2018, 4:29 AM

Oct 22 2018

dtemirbulatov accepted D53473: [SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions.

LGTM

Oct 22 2018, 8:19 AM

Oct 13 2018

dtemirbulatov updated the diff for D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Update after I found another couple of errors during additional testing of the change. Here are the changes:
Removed the OpValue field from PseudoScheduleData.
Forbid any bundles with non-alternative operations and a remainder operation, see rem-bundle.ll.
Fixed an error in the setInsertPointAfterBundle() function by using getScheduleData() instead of getInstrScheduleData; if a bundle member is present in multiple bundles at the same time, walk through the bundle to find its last scheduled member, see insert-after-multiple-bundle.ll.
Restored MemoryDependencies to SmallVector; we don't have to count a member's presence in calculateDependencies().

Oct 13 2018, 11:53 AM

Oct 10 2018

dtemirbulatov added inline comments to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..
Oct 10 2018, 3:46 PM