
dtemirbulatov (Dinar Temirbulatov)
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 17 2015, 10:06 AM (301 w, 17 h)

Recent Activity

May 19 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Formatting.

May 19 2021, 4:15 AM · Restricted Project

May 18 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Removed the SLP parameter MinVecNodes. Added estimations of a good tree reduction: 1) if the tree contained real operations (binary, arithmetic, calls) that were proposed for vectorization, we don't want to reduce the tree to just load and store operations in vectorized form; 2) if the tree doesn't have any real operations, we have to make sure that at least the root node and the node next to the root are going to be vectorized.
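The two rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from D57779: `SketchNode`, `NodeKind`, `WasProposed`, `StillVectorized`, and `isGoodReduction` are hypothetical names, the tree is flattened into a vector with the root at index 0 and the node next to the root at index 1, and "the tree contained real operations" is read as "at least one real operation was proposed for vectorization".

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for an SLP tree node (not LLVM's TreeEntry).
enum class NodeKind { Load, Store, RealOp }; // RealOp: binary, arithmetic, call

struct SketchNode {
  NodeKind Kind;
  bool WasProposed;     // proposed for vectorization before the reduction
  bool StillVectorized; // still vectorized after the reduction
};

// Returns true if the reduced tree looks like a good reduction.
// Tree[0] is assumed to be the root, Tree[1] the node next to the root.
bool isGoodReduction(const std::vector<SketchNode> &Tree) {
  assert(!Tree.empty() && "expects a non-empty tree");
  bool HadRealOp = false;
  bool KeptRealOp = false;
  for (const SketchNode &N : Tree) {
    if (N.Kind != NodeKind::RealOp || !N.WasProposed)
      continue;
    HadRealOp = true;
    KeptRealOp = KeptRealOp || N.StillVectorized;
  }
  // Rule 1: a tree that had real operations must keep at least one of them
  // vectorized, so it is not reduced to vectorized loads and stores only.
  if (HadRealOp)
    return KeptRealOp;
  // Rule 2: without real operations, the root and the next node must both
  // stay vectorized.
  return Tree.size() >= 2 && Tree[0].StillVectorized && Tree[1].StillVectorized;
}
```

For instance, a store/add/load tree in which only the add was canceled would be rejected by this sketch, while a plain load/store pair that stays fully vectorized would be accepted.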

May 18 2021, 8:15 AM · Restricted Project

May 7 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Why do we need MinVecNodes? MinTreeSize and all associated analysis must be enough

it is Transforms/SLPVectorizer/X86/tiny-tree.ll transform that scared me.
From:
define void @tiny_tree_not_fully_vectorizable(double* noalias nocapture %dst, double* noalias nocapture readonly %src, i64 %count) #0 {
entry:

%cmp12 = icmp eq i64 %count, 0
br i1 %cmp12, label %for.end, label %for.body
May 7 2021, 1:27 AM · Restricted Project

May 6 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..
  1. Fixed an issue in getInsertCost(): I incorrectly added gather costs to nodes that had no relation to any node proposed for vectorization. I had thought of this and previously used "ScalarToTreeEntry.count(Op) > 0", but I discovered that I was not updating ScalarToTreeEntry while reducing the tree.
  2. Now I am checking with isTreeTinyAndNotFullyVectorizable() before deciding to vectorize.
  3. I introduced the "MinVecNodes" parameter, which sets the minimum number of vectorizable nodes we would like to have while throttling; currently it is equal to 2 by default. For example, if we have 3 total nodes in the tree and it satisfies MinTreeSize, we would like at least two nodes to remain vectorizable while reducing the tree in order to reach a positive decision.
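A minimal sketch of the MinVecNodes check from this update, assuming the tree is represented only by a per-node "still vectorizable" flag. The function name `acceptThrottledTree` and its signature are hypothetical and are not taken from the patch. The default of 2 for MinVecNodes comes from the comment, and MinTreeSize is passed in explicitly rather than assumed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: decides whether a throttled (reduced) tree is still
// worth vectorizing. NodeIsVectorizable[i] is true if node i remains
// vectorizable after the reduction.
bool acceptThrottledTree(const std::vector<bool> &NodeIsVectorizable,
                         std::size_t MinTreeSize,
                         std::size_t MinVecNodes = 2) {
  // The tree as a whole must first satisfy MinTreeSize.
  if (NodeIsVectorizable.size() < MinTreeSize)
    return false;
  // Count the nodes that survive the reduction as vectorizable.
  std::size_t VecNodes = 0;
  for (bool IsVec : NodeIsVectorizable)
    if (IsVec)
      ++VecNodes;
  return VecNodes >= MinVecNodes;
}
```

With three nodes and a MinTreeSize of 3, this sketch gives a positive decision only when at least two of the three nodes remain vectorizable, which matches the example in the comment.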
May 6 2021, 6:50 AM · Restricted Project

May 5 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Added check for current tree size to MinTreeSize before making the decision to vectorize.

May 5 2021, 12:22 AM · Restricted Project

May 4 2021

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
May 4 2021, 7:00 PM · Restricted Project

May 2 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Fix formatting.

May 2 2021, 6:02 PM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Forbade "detection of shuffled/perfect matching of tree entries" for TreeEntries canceled during throttling, and replaced TEVectorizableSet with PriorityQueue.

May 2 2021, 3:11 PM · Restricted Project

Apr 28 2021

dtemirbulatov accepted D101397: [SLP]Try to vectorize tiny trees with shuffled gathers..

LGTM.

Apr 28 2021, 2:01 AM · Restricted Project

Apr 26 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Updated llvm/test/Transforms/SLPVectorizer/X86/uitofp.ll checks on request from @RKSimon

@RKSimon , I have to split AVX256NODQ X86/sitofp.ll and maybe others.

Apr 26 2021, 8:12 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Updated llvm/test/Transforms/SLPVectorizer/X86/uitofp.ll checks on request from @RKSimon

Apr 26 2021, 8:06 AM · Restricted Project

Apr 25 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Fixed two format errors.

Apr 25 2021, 9:13 PM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased and formatted. Noticed 3 test cases involved after @ABataev landed D100495 "Add detection of shuffled/perfect matching of tree entries.". Restored the "-slp-throttle" flag so that AArch64/gather-cost.ll stays functional. Manually adjusted "TMP" in PR31243_sext in minimum-sizes.ll, probably because of a bug in update_test_checks.py.

Apr 25 2021, 5:40 PM · Restricted Project

Apr 8 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Slightly improved the scheduler area shrinking algorithm by allowing the removal of unnecessary unmaps in chains of instructions.

Apr 8 2021, 4:49 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Apr 8 2021, 8:35 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Apr 8 2021, 8:33 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, fixed incorrect comment at 2358, fixed the wrong implementation of shrink scheduling region, changed the code in tryToVectorizeList() as suggested by @ABataev.

Apr 8 2021, 8:24 AM · Restricted Project

Apr 4 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Ping, ready to land?

Will review it on Monday.

Apr 4 2021, 4:43 PM · Restricted Project

Mar 29 2021

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mar 29 2021, 5:39 PM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, addressed remarks, added reduceSchedulingRegion() function with the ability to set only ScheduleStart at this time, renamed RemovedOperations property to ProposedToGather.

Mar 29 2021, 5:37 PM · Restricted Project

Mar 24 2021

dtemirbulatov accepted D99266: [SLP]Improve and simplify extendSchedulingRegion..

LGTM.

Mar 24 2021, 4:51 PM · Restricted Project

Mar 21 2021

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mar 21 2021, 5:04 PM · Restricted Project

Mar 17 2021

dtemirbulatov accepted D98531: [SLP]Fix crash on extending scheduling region..

LGTM

Mar 17 2021, 5:31 PM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Addressed @ABataev's remarks and investigated the regression with PHI nodes in PR39774.ll. I have not spotted any other case involving PHI nodes, but I have several other cases, and it happens quite rarely. I am not sure how to generalize them, and I think VPlan might be helpful. Overall, I think it is ready.

Mar 17 2021, 9:22 AM · Restricted Project

Feb 16 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..
  1. Again, even your example showed that this solution is worse in some cases. Why do we need to waste the time and invest in a solution, which is not better than the existing one, requires more time to understand, consumes more memory?
  2. SLP implements a bottom-up approach, i.e. it always tries to vectorize the longest chain (except for PHI nodes, which should be improved). If we have a partial graph, it should not affect other vectorization graphs in the same basic block, generally speaking, just some subnodes may become the subnodes of the other graphs but this is not a problem.
  3. Looks like you're trying to implement something similar to VPlan. We have it already and better to invest the time to implement support for SLP vectorization there.
  4. Redesign is completely different work, it requires correct estimation (not the assumptions, but real investigation), discussion, RFC, approval, and separate implementation.

Ok, Agree.

Feb 16 2021, 7:55 AM · Restricted Project

Feb 15 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Even this example shows that the current solution does not always produce the best result.

SLP has a greedy approach, so let's assume that full vectorization is always better than partial. We don't have the resources to save all trees and then choose the best one from those saved. I think I can now add choosing the best one among the already partially vectorized trees.

Feb 15 2021, 3:01 PM · Restricted Project

Feb 12 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

I think the next step is to compare vectorized tree heights (the number of vectorized nodes) among the possible vectorizable trees.

Feb 12 2021, 8:55 AM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Even this example shows that the current solution does not always produce the best result.

at least, we could avoid regressions.

Feb 12 2021, 8:51 AM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

I see that immediate vectorization is better as it vectorizes more, no? Also, there is a problem, looks like it is caused by the multinode analysis. I'm trying to improve this in my non-power-2 patch, will prepare a separate patch for it.

eh, I think it is not a clear example, I have seen better examples, I will show something better.

Feb 12 2021, 8:31 AM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Here is another example:
source_filename = "psspread.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

Feb 12 2021, 5:01 AM · Restricted Project

Feb 11 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

To me, it just looks like we need to postpone the vectorization of phi nodes in the function rather than trying to fix all the issues in the world in a single patch.

I think I could give one simpler example without PHI nodes.

Feb 11 2021, 12:02 PM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Here we could see the regression, it misses vectorizing the whole tree as partial vectorization kicks in too early and "add" instructions stay scalar:

--- a/llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll
@@ -7,49 +7,65 @@ define void @test(i32) {
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
-; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP15:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
-; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
-; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP3]])
-; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]
-; CHECK-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA13:%.*]] = and i32 [[OP_EXTRA12]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA14:%.*]] = and i32 [[OP_EXTRA13]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA15:%.*]] = and i32 [[OP_EXTRA14]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA16:%.*]] = and i32 [[OP_EXTRA15]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA17:%.*]] = and i32 [[OP_EXTRA16]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA18:%.*]] = and i32 [[OP_EXTRA17]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
-; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[OP_EXTRA26]], i32 0
-; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 14910, i32 1
-; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i32 0
-; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1
-; CHECK-NEXT: [[TMP9:%.*]] = and <2 x i32> [[TMP6]], [[TMP8]]
-; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP6]], [[TMP8]]
-; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> <i32 0, i32 3>
-; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP11]], i32 0
-; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> poison, i32 [[TMP12]], i32 0
-; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP11]], i32 1
-; CHECK-NEXT: [[TMP15]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP14]], i32 1
+; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP19:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
+; CHECK-NEXT: [[VAL_0:%.*]] = add i32 [[TMP2]], 0
+; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
+; CHECK-NEXT: [[VAL_1:%.*]] = and i32 [[TMP3]], [[VAL_0]]
+; CHECK-NEXT: [[VAL_2:%.*]] = and i32 [[VAL_1]], [[TMP0:%.*]]
+; CHECK-NEXT: [[VAL_3:%.*]] = and i32 [[VAL_2]], [[TMP0]]
+; CHECK-NEXT: [[VAL_4:%.*]] = and i32 [[VAL_3]], [[TMP0]]
+; CHECK-NEXT: [[VAL_5:%.*]] = and i32 [[VAL_4]], [[TMP0]]
+; CHECK-NEXT: [[VAL_6:%.*]] = add i32 [[TMP3]], 55
+; CHECK-NEXT: [[VAL_7:%.*]] = and i32 [[VAL_5]], [[VAL_6]]
+; CHECK-NEXT: [[VAL_8:%.*]] = and i32 [[VAL_7]], [[TMP0]]
+; CHECK-NEXT: [[VAL_9:%.*]] = and i32 [[VAL_8]], [[TMP0]]
+; CHECK-NEXT: [[VAL_10:%.*]] = and i32 [[VAL_9]], [[TMP0]]
+; CHECK-NEXT: [[VAL_11:%.*]] = add i32 [[TMP3]], 285
+; CHECK-NEXT: [[VAL_12:%.*]] = and i32 [[VAL_10]], [[VAL_11]]
+; CHECK-NEXT: [[VAL_13:%.*]] = and i32 [[VAL_12]], [[TMP0]]
+; CHECK-NEXT: [[VAL_14:%.*]] = and i32 [[VAL_13]], [[TMP0]]
+; CHECK-NEXT: [[VAL_15:%.*]] = and i32 [[VAL_14]], [[TMP0]]
+; CHECK-NEXT: [[VAL_16:%.*]] = and i32 [[VAL_15]], [[TMP0]]
+; CHECK-NEXT: [[VAL_17:%.*]] = and i32 [[VAL_16]], [[TMP0]]
+; CHECK-NEXT: [[VAL_18:%.*]] = add i32 [[TMP3]], 1240
+; CHECK-NEXT: [[VAL_19:%.*]] = and i32 [[VAL_17]], [[VAL_18]]
+; CHECK-NEXT: [[VAL_20:%.*]] = add i32 [[TMP3]], 1496
+; CHECK-NEXT: [[VAL_21:%.*]] = and i32 [[VAL_19]], [[VAL_20]]
+; CHECK-NEXT: [[VAL_22:%.*]] = and i32 [[VAL_21]], [[TMP0]]
+; CHECK-NEXT: [[VAL_23:%.*]] = and i32 [[VAL_22]], [[TMP0]]
+; CHECK-NEXT: [[VAL_24:%.*]] = and i32 [[VAL_23]], [[TMP0]]
+; CHECK-NEXT: [[VAL_25:%.*]] = and i32 [[VAL_24]], [[TMP0]]
+; CHECK-NEXT: [[VAL_26:%.*]] = and i32 [[VAL_25]], [[TMP0]]
+; CHECK-NEXT: [[VAL_27:%.*]] = and i32 [[VAL_26]], [[TMP0]]
+; CHECK-NEXT: [[VAL_28:%.*]] = and i32 [[VAL_27]], [[TMP0]]
+; CHECK-NEXT: [[VAL_29:%.*]] = and i32 [[VAL_28]], [[TMP0]]
+; CHECK-NEXT: [[VAL_30:%.*]] = and i32 [[VAL_29]], [[TMP0]]
+; CHECK-NEXT: [[VAL_31:%.*]] = and i32 [[VAL_30]], [[TMP0]]
+; CHECK-NEXT: [[VAL_32:%.*]] = and i32 [[VAL_31]], [[TMP0]]
+; CHECK-NEXT: [[VAL_33:%.*]] = and i32 [[VAL_32]], [[TMP0]]
+; CHECK-NEXT: [[VAL_34:%.*]] = add i32 [[TMP3]], 8555
+; CHECK-NEXT: [[VAL_35:%.*]] = and i32 [[VAL_33]], [[VAL_34]]
+; CHECK-NEXT: [[VAL_36:%.*]] = and i32 [[VAL_35]], [[TMP0]]
+; CHECK-NEXT: [[VAL_37:%.*]] = and i32 [[VAL_36]], [[TMP0]]
+; CHECK-NEXT: [[VAL_38:%.*]] = and i32 [[VAL_37]], [[TMP0]]
+; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i32 0
+; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[TMP3]], i32 1
+; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i32> [[TMP5]], <i32 12529, i32 13685>
+; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i32> [[TMP6]], i32 0
+; CHECK-NEXT: [[VAL_40:%.*]] = and i32 [[VAL_38]], [[TMP7]]
+; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP6]], i32 1
+; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_40]], i32 0
+; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 14910, i32 1
+; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> poison, i32 [[TMP8]], i32 0
+; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x i32> [[TMP11]], i32 [[TMP3]], i32 1
+; CHECK-NEXT: [[TMP13:%.*]] = and <2 x i32> [[TMP10]], [[TMP12]]
+; CHECK-NEXT: [[TMP14:%.*]] = add <2 x i32> [[TMP10]], [[TMP12]]
+; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <2 x i32> [[TMP13]], <2 x i32> [[TMP14]], <2 x i32> <i32 0, i32 3>
+; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x i32> [[TMP15]], i32 0
+; CHECK-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP16]], i32 0
+; CHECK-NEXT: [[TMP18:%.*]] = extractelement <2 x i32> [[TMP15]], i32 1
+; CHECK-NEXT: [[TMP19]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP18]], i32 1
; CHECK-NEXT: br label [[LOOP]]
;
; FORCE_REDUCTION-LABEL: @test(

Feb 11 2021, 6:33 AM · Restricted Project

Feb 10 2021

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 10 2021, 3:41 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 10 2021, 3:37 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 10 2021, 1:22 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 10 2021, 1:21 PM · Restricted Project

Feb 9 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, Ping.

Feb 9 2021, 3:08 PM · Restricted Project

Feb 2 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Ping.

Feb 2 2021, 8:39 AM · Restricted Project

Jan 28 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

At Dinar's request, I've measured compile time regression: http://llvm-compile-time-tracker.com/compare.php?from=f3449ed6073cac58efd9b62d0eb285affa650238&to=39362e11add238c45a7a7d55c1e002005f396fb7&stat=instructions. The regression is visible, but it is acceptable for such change imho. The largest regression comes from CMakeFiles/clamscan.dir/libclamav_uuencode.c.o (+11.28%), so one can investigate this particular file.

Jan 28 2021, 7:14 AM · Restricted Project

Jan 27 2021

dtemirbulatov accepted D94992: [SLP]Merge reorder and reuse shuffles..

Looks good to me.

Jan 27 2021, 9:11 AM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Rebased, Measured compile time impact on cpu2006 integer and I have not noticed any significant regressions in SLP compile-time compared to SLP throttle with the limiter.

I mean only SLP time regression, by using "-ftime-trace" flag.

Jan 27 2021, 7:24 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, Measured compile time impact on cpu2006 integer and I have not noticed any significant regressions in SLP compile-time compared to SLP throttle with the limiter.

Jan 27 2021, 6:00 AM · Restricted Project

Jan 26 2021

dtemirbulatov added a comment to D94992: [SLP]Merge reorder and reuse shuffles..

Looks good to me after the Index/Pos fix for "shrink shuffles". Any objections now?

Jan 26 2021, 10:29 AM · Restricted Project

Jan 13 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Removed "slp-throttling-budget" limiter for trees without calls
Moved the main tree reduction loop to getTreeCost() function
deleted ProposedToGather node attribute out of EntryState

Jan 13 2021, 3:51 AM · Restricted Project

Dec 9 2020

dtemirbulatov abandoned D87295: Prefer vpxor over vpxorps for AVX2 PR36127.

Abandoning over D92993

Dec 9 2020, 10:15 PM · Restricted Project
dtemirbulatov added a comment to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..

Do you mean compile time increasing? With this patch?

no, just compile-time error.

Crash or incorrect code?

Dec 9 2020, 12:42 PM · Restricted Project
dtemirbulatov added a comment to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..

Do you mean compile time increasing? With this patch?

no, just compile-time error.

Dec 9 2020, 12:32 PM · Restricted Project
dtemirbulatov added a comment to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..

While reviewing the latest update, I think I spotted an SLP compile-time failure in SingleSource/Benchmarks/Misc/oourafft.c. Here is the reduced test case to reproduce it:
source_filename = "/home/dtemirbulatov/llvm/test-suite/SingleSource/Benchmarks/Misc/oourafft.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

Dec 9 2020, 11:47 AM · Restricted Project

Dec 8 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Discussed with @ABataev further improvements offline and he suggested removing the throttle limiter ("slp-throttling-budget"), at least for basic blocks without calls. I am looking for new functionality.

Dec 8 2020, 2:13 PM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

And I counted the total number of nodes vectorized with throttling, instead of just the number of successful tree reductions. The total number is ~25% higher for INT and FP CPU2006 (AVX2 and AVX512F) with Cost sort compared to Distance.

Dec 8 2020, 6:06 AM · Restricted Project

Dec 7 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Here is the BFS version of the change. Rebased.

Dec 7 2020, 9:39 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Dec 7 2020, 9:19 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Dec 7 2020, 9:04 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Dec 7 2020, 7:32 AM · Restricted Project

Dec 3 2020

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Dec 3 2020, 5:55 PM · Restricted Project

Dec 1 2020

dtemirbulatov added a comment to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..

Btw, I've observed significant compile-time regression with this patch: http://llvm-compile-time-tracker.com/compare.php?from=99d82412f822190a6caa3e3a5b9f87b71f56de47&to=81b636bae72c967f526bcd18de45a6f4a76daa41&stat=instructions (thanks to @nikic for awesome service). This could be justified in case of comparable performance improvements but have you done any benchmarking?

I did that a while back with SPECINT 2006 and, as I remember, the results were good, but I am not sure that I could find them now. Yes, for me, having this new functionality with the presented compile-time regression looks OK.

Dec 1 2020, 3:37 PM · Restricted Project

Nov 25 2020

dtemirbulatov added a comment to D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic.

Good point, thank you! As you said, that is not a problem specific to this patch exclusively. One can fix it by hacky cost comparing at the tree-building stage, but I do believe the more general solution is preferable. Does this patch https://reviews.llvm.org/D57779 (vectorization throttling) fix this? After the greedy strategy of building the maximum tree, we choose the cheapest part of it for vectorization.

No, I think https://reviews.llvm.org/D57779 is about a different thing. Here, we have new functionality which allows us to build the tree with gather-loads; otherwise we just ignore it and thus have a different tree. I am not sure how to handle the case where it accumulates those expensive operations. Maybe guard this new functionality with a flag for now?
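The throttling idea mentioned in the quoted question (build the maximum tree greedily, then keep only the cheapest part of it) can be sketched roughly as below. This is a toy model under loud assumptions and not the algorithm in D57779: the tree is flattened into a list with no operand dependencies, `ThrottleNode`, `treeCost`, and `throttle` are hypothetical names, and "profitable" is taken to mean a total cost below zero.

```cpp
#include <vector>

// Hypothetical per-node cost model (not LLVM's TreeEntry).
struct ThrottleNode {
  int VecCost;     // cost delta if the node is vectorized (negative = profit)
  int GatherCost;  // cost paid if the node stays scalar and must be gathered
  bool Vectorized; // current decision for this node
};

// Total cost of the tree under the current per-node decisions.
int treeCost(const std::vector<ThrottleNode> &Tree) {
  int Cost = 0;
  for (const ThrottleNode &N : Tree)
    Cost += N.Vectorized ? N.VecCost : N.GatherCost;
  return Cost;
}

// Greedily cancels the node whose cancellation saves the most until the tree
// becomes profitable. Returns false if no cancellation can help any further.
bool throttle(std::vector<ThrottleNode> &Tree) {
  while (treeCost(Tree) >= 0) {
    ThrottleNode *Worst = nullptr;
    int BestSaving = 0;
    for (ThrottleNode &N : Tree) {
      if (!N.Vectorized)
        continue;
      int Saving = N.VecCost - N.GatherCost; // gain from canceling this node
      if (Saving > BestSaving) {
        BestSaving = Saving;
        Worst = &N;
      }
    }
    if (!Worst)
      return false; // every remaining cancellation would make things worse
    Worst->Vectorized = false;
  }
  return true;
}
```

A real implementation would also have to respect the tree structure (canceling a node affects its operands and users) and update external-use bookkeeping, which this flat model deliberately ignores.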

Nov 25 2020, 5:53 AM · Restricted Project

Nov 22 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Ping^2

Nov 22 2020, 4:39 PM · Restricted Project

Nov 14 2020

dtemirbulatov accepted D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic.

Looks good, any other objections?

Nov 14 2020, 6:57 AM · Restricted Project

Nov 12 2020

dtemirbulatov added a comment to D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic.

Also, please add ScatterVectorize to TreeEntry.dump()

Nov 12 2020, 7:14 AM · Restricted Project

Nov 8 2020

dtemirbulatov added inline comments to D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic.
Nov 8 2020, 8:31 PM · Restricted Project

Nov 5 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Ping.

Nov 5 2020, 3:33 PM · Restricted Project

Oct 13 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. PING

Oct 13 2020, 9:02 AM · Restricted Project

Oct 6 2020

dtemirbulatov added a comment to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Reping

Oct 6 2020, 7:42 AM · Restricted Project

Sep 29 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Ping

Sep 29 2020, 3:47 PM · Restricted Project

Sep 22 2020

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Sep 22 2020, 8:12 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Sep 22 2020, 8:09 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Moved the InternalTreeUses population out of the (UseScalar != U || !InTreeUserNeedToExtract(Scalar, UserInst, TLI)) limitation at line 2661 in BoUpSLP::buildTree(), since we have to consider every internal user for partial vectorization while calculating the cost.

Sep 22 2020, 8:09 AM · Restricted Project

Sep 10 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Good enough for initial implementation?

Yes, to me it looks ready.

Sep 10 2020, 1:16 PM · Restricted Project

Sep 8 2020

dtemirbulatov requested review of D87295: Prefer vpxor over vpxorps for AVX2 PR36127.
Sep 8 2020, 10:04 AM · Restricted Project

Sep 2 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Ping.

Sep 2 2020, 5:31 AM · Restricted Project

Aug 23 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Removed unnecessary check for "UserTE" at 3305.

Aug 23 2020, 7:57 AM · Restricted Project

Aug 21 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Fixed remarks, rebased.

Aug 21 2020, 3:50 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Aug 21 2020, 3:23 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Aug 21 2020, 7:38 AM · Restricted Project

Aug 17 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Corrected paper citation, added -slp-throttle=false to llvm/test/Transforms/SLPVectorizer/X86/slp-throttle.ll, rebased.

Aug 17 2020, 4:20 AM · Restricted Project

Aug 14 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased after rGb1600d8b8971

Aug 14 2020, 2:46 AM · Restricted Project

Aug 11 2020

dtemirbulatov committed rGb1600d8b8971: [NFC] Guard the cost report block of debug outputs with NDEBUG and (authored by dtemirbulatov).
Aug 11 2020, 7:35 AM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Aug 11 2020, 3:12 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Aug 11 2020, 3:09 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Oh, I failed to fully remove it from the diff at 7269. Fixed.

Aug 11 2020, 2:23 AM · Restricted Project

Aug 10 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, Fixed.

Aug 10 2020, 10:54 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Fixed.

Aug 10 2020, 6:46 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, Ping

Aug 10 2020, 4:38 AM · Restricted Project

Jul 31 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Oh, sorry, I misspelled:
For example, in the first loops, we could change Entry1 from TreeEntry::ProposedToGather to TreeEntry::NeedToGather status, but later we could encounter another use of this Entry1 from another entry (let's say Entry2) with TreeEntry::Vectorize status, and we could NOT tell the difference between a just-canceled item and an Entry that was never considered for vectorization. Thus ExternalUses would not be properly populated.

Jul 31 2020, 11:55 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 31 2020, 11:53 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 31 2020, 11:43 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, addressed comments

Jul 31 2020, 11:35 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 31 2020, 11:32 AM · Restricted Project

Jul 25 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

ping

Jul 25 2020, 4:12 AM · Restricted Project

Jul 21 2020

dtemirbulatov added inline comments to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..
Jul 21 2020, 3:27 PM · Restricted Project
dtemirbulatov added inline comments to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..
Jul 21 2020, 3:02 PM · Restricted Project
dtemirbulatov added inline comments to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..
Jul 21 2020, 2:52 PM · Restricted Project

Jul 19 2020

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 19 2020, 4:15 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Addressed remarks, rebased.

Jul 19 2020, 4:14 AM · Restricted Project

Jul 13 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Addressed comments, rebased.

Jul 13 2020, 7:09 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 13 2020, 7:08 PM · Restricted Project

Jul 10 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Addressed remarks, rebased.

Jul 10 2020, 2:19 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 10 2020, 2:17 AM · Restricted Project

Jul 7 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

ping

Jul 7 2020, 4:23 PM · Restricted Project