
dtemirbulatov (Dinar Temirbulatov)
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 17 2015, 10:06 AM (290 w, 2 d)

Recent Activity

Thu, Apr 8

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Slightly improved the scheduling-region shrinking algorithm by allowing unnecessary unmaps in instruction chains to be removed.

Thu, Apr 8, 4:49 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Thu, Apr 8, 8:35 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Thu, Apr 8, 8:33 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased; fixed an incorrect comment at line 2358; fixed the wrong implementation of scheduling-region shrinking; changed the code in tryToVectorizeList() as suggested by @ABataev.

Thu, Apr 8, 8:24 AM · Restricted Project

Sun, Apr 4

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Ping, ready to land?

Will review it on Monday.

Sun, Apr 4, 4:43 PM · Restricted Project

Mon, Mar 29

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Mon, Mar 29, 5:39 PM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased and addressed remarks. Added a reduceSchedulingRegion() function, which for now can only adjust ScheduleStart, and renamed the RemovedOperations property to ProposedToGather.
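A minimal Python model of what this entry describes, assuming the scheduling region can be treated as a half-open index range over the block's instructions and that throttling leaves a set of instructions still in the tree. The name reduce_region_start and its parameters are hypothetical and are not the patch's reduceSchedulingRegion() code; like the entry, the sketch only moves the start and leaves the end untouched.

```python
def reduce_region_start(block, start, end, kept):
    """Hypothetical helper: return a new ScheduleStart index.

    block      -- list of instruction names in program order
    start, end -- current scheduling region as a half-open range [start, end)
    kept       -- names of instructions still in the (throttled) tree

    Moves the start forward to the first kept instruction; the end is left
    alone. Returns end (an empty region) if nothing in the range is kept.
    """
    for i in range(start, end):
        if block[i] in kept:
            return i
    return end


block = ["a", "b", "c", "d", "e"]
print(reduce_region_start(block, 0, 5, {"c", "e"}))  # 2
```

Shrinking only the start is the simpler half: instructions before the first kept one can no longer be scheduled as part of the tree, so the scheduler does not need to visit them.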

Mon, Mar 29, 5:37 PM · Restricted Project

Wed, Mar 24

dtemirbulatov accepted D99266: [SLP]Improve and simplify extendSchedulingRegion..

LGTM.

Wed, Mar 24, 4:51 PM · Restricted Project

Sun, Mar 21

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Sun, Mar 21, 5:04 PM · Restricted Project

Wed, Mar 17

dtemirbulatov accepted D98531: [SLP]Fix crash on extending scheduling region..

LGTM

Wed, Mar 17, 5:31 PM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Addressed @ABataev's remarks. I investigated the regression with PHI nodes in PR39774.ll and have not spotted any other case involving PHI nodes; I do have several other regressing cases, but they happen quite rarely. I am not sure how to generalize them, and I think VPlan might be helpful. Overall, I think it is ready.

Wed, Mar 17, 9:22 AM · Restricted Project

Feb 16 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..
  1. Again, even your example showed that this solution is worse in some cases. Why do we need to waste the time and invest in a solution which is not better than the existing one, requires more time to understand, and consumes more memory?
  2. SLP implements a bottom-up approach, i.e. it always tries to vectorize the longest chain (except for PHI nodes, which should be improved). If we have a partial graph, it should not affect other vectorization graphs in the same basic block; generally speaking, some subnodes may just become subnodes of other graphs, but this is not a problem.
  3. Looks like you're trying to implement something similar to VPlan. We have it already and better to invest the time to implement support for SLP vectorization there.
  4. Redesign is completely different work, it requires correct estimation (not the assumptions, but real investigation), discussion, RFC, approval, and separate implementation.

OK, agreed.

Feb 16 2021, 7:55 AM · Restricted Project

Feb 15 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Even this example shows that the current solution does not always produce the best result.

SLP takes a greedy approach, so let's assume that full vectorization is always better than partial vectorization. We don't have the resources to save all trees and then choose the best one among them. I think I can now add choosing the best option among the already partially vectorized trees.
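One way to read the greedy policy in this comment, as a hedged Python sketch: try the full tree first and fall back to a partial tree only when the full one is not profitable. It assumes per-node cost deltas ordered from the root outward, a single hypothetical cut_penalty paid for gathering at the cut, and a threshold of 0; none of these names come from the patch.

```python
def choose_vectorization(deltas, cut_penalty, threshold=0):
    """Greedy policy sketch (hypothetical).

    deltas[i]   -- cost change from vectorizing node i (negative = profitable),
                   ordered from the root outward
    cut_penalty -- assumed extra cost of gathering where the tree is cut

    If the full tree beats the threshold, take it. Otherwise keep the
    cheapest proper root prefix that still beats the threshold.
    Returns (nodes_kept, cost); (0, 0) means leave the code scalar.
    """
    full = sum(deltas)
    if full < threshold:
        return len(deltas), full
    best_k, best_cost = 0, 0
    running = 0
    for k in range(1, len(deltas)):  # proper prefixes only
        running += deltas[k - 1]
        cost = running + cut_penalty
        if cost < threshold and cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


print(choose_vectorization([-4, -3, 5, 6], cut_penalty=1))  # (2, -6)
```

With deltas [-5, -5, 4] and no cut penalty, the function keeps the full tree at cost -6 even though the two-node prefix would cost -10; that is exactly the trade-off the greedy assumption accepts.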

Feb 15 2021, 3:01 PM · Restricted Project

Feb 12 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

I think the next step is to compare vectorized tree heights (the number of vectorized nodes) among the possible vectorizable trees.
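A sketch of the comparison proposed here, assuming each candidate tree has already been costed and found profitable. Ranking by the number of vectorized nodes and breaking ties by cost is an assumption for illustration; the tuple layout and the pick_tree name are hypothetical.

```python
def pick_tree(candidates):
    """candidates: (name, vectorized_nodes, cost) tuples for trees already
    known to be profitable. Prefer the tree with more vectorized nodes;
    on a tie, prefer the cheaper (more negative) one."""
    return min(candidates, key=lambda c: (-c[1], c[2]))[0]


print(pick_tree([("A", 3, -5), ("B", 5, -2), ("C", 5, -4)]))  # C
```

Tree A has the best cost but vectorizes fewer nodes, so under this ranking it loses to C.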

Feb 12 2021, 8:55 AM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Even this example shows that the current solution does not always produce the best result.

At least we could avoid regressions.

Feb 12 2021, 8:51 AM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

I see that immediate vectorization is better as it vectorizes more, no? Also, there is a problem, looks like it is caused by the multinode analysis. I'm trying to improve this in my non-power-2 patch, will prepare a separate patch for it.

Eh, I think it is not a clear example. I have seen better ones; I will show something better.

Feb 12 2021, 8:31 AM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Here is another example:
source_filename = "psspread.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

Feb 12 2021, 5:01 AM · Restricted Project

Feb 11 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

To me, it just looks like we need to postpone the vectorization of phi nodes in the function rather than trying to fix all the issues in the world in a single patch.

I think I could give one simpler example without PHI nodes.

Feb 11 2021, 12:02 PM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Here we can see the regression: it misses vectorizing the whole tree because partial vectorization kicks in too early, and the "add" instructions stay scalar:

--- a/llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll
@@ -7,49 +7,65 @@ define void @test(i32) {
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
-; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP15:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
-; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
-; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP3]])
-; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]
-; CHECK-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA13:%.*]] = and i32 [[OP_EXTRA12]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA14:%.*]] = and i32 [[OP_EXTRA13]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA15:%.*]] = and i32 [[OP_EXTRA14]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA16:%.*]] = and i32 [[OP_EXTRA15]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA17:%.*]] = and i32 [[OP_EXTRA16]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA18:%.*]] = and i32 [[OP_EXTRA17]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
-; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
-; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[OP_EXTRA26]], i32 0
-; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 14910, i32 1
-; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i32 0
-; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1
-; CHECK-NEXT: [[TMP9:%.*]] = and <2 x i32> [[TMP6]], [[TMP8]]
-; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP6]], [[TMP8]]
-; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> <i32 0, i32 3>
-; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP11]], i32 0
-; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> poison, i32 [[TMP12]], i32 0
-; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP11]], i32 1
-; CHECK-NEXT: [[TMP15]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP14]], i32 1
+; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP19:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
+; CHECK-NEXT: [[VAL_0:%.*]] = add i32 [[TMP2]], 0
+; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
+; CHECK-NEXT: [[VAL_1:%.*]] = and i32 [[TMP3]], [[VAL_0]]
+; CHECK-NEXT: [[VAL_2:%.*]] = and i32 [[VAL_1]], [[TMP0:%.*]]
+; CHECK-NEXT: [[VAL_3:%.*]] = and i32 [[VAL_2]], [[TMP0]]
+; CHECK-NEXT: [[VAL_4:%.*]] = and i32 [[VAL_3]], [[TMP0]]
+; CHECK-NEXT: [[VAL_5:%.*]] = and i32 [[VAL_4]], [[TMP0]]
+; CHECK-NEXT: [[VAL_6:%.*]] = add i32 [[TMP3]], 55
+; CHECK-NEXT: [[VAL_7:%.*]] = and i32 [[VAL_5]], [[VAL_6]]
+; CHECK-NEXT: [[VAL_8:%.*]] = and i32 [[VAL_7]], [[TMP0]]
+; CHECK-NEXT: [[VAL_9:%.*]] = and i32 [[VAL_8]], [[TMP0]]
+; CHECK-NEXT: [[VAL_10:%.*]] = and i32 [[VAL_9]], [[TMP0]]
+; CHECK-NEXT: [[VAL_11:%.*]] = add i32 [[TMP3]], 285
+; CHECK-NEXT: [[VAL_12:%.*]] = and i32 [[VAL_10]], [[VAL_11]]
+; CHECK-NEXT: [[VAL_13:%.*]] = and i32 [[VAL_12]], [[TMP0]]
+; CHECK-NEXT: [[VAL_14:%.*]] = and i32 [[VAL_13]], [[TMP0]]
+; CHECK-NEXT: [[VAL_15:%.*]] = and i32 [[VAL_14]], [[TMP0]]
+; CHECK-NEXT: [[VAL_16:%.*]] = and i32 [[VAL_15]], [[TMP0]]
+; CHECK-NEXT: [[VAL_17:%.*]] = and i32 [[VAL_16]], [[TMP0]]
+; CHECK-NEXT: [[VAL_18:%.*]] = add i32 [[TMP3]], 1240
+; CHECK-NEXT: [[VAL_19:%.*]] = and i32 [[VAL_17]], [[VAL_18]]
+; CHECK-NEXT: [[VAL_20:%.*]] = add i32 [[TMP3]], 1496
+; CHECK-NEXT: [[VAL_21:%.*]] = and i32 [[VAL_19]], [[VAL_20]]
+; CHECK-NEXT: [[VAL_22:%.*]] = and i32 [[VAL_21]], [[TMP0]]
+; CHECK-NEXT: [[VAL_23:%.*]] = and i32 [[VAL_22]], [[TMP0]]
+; CHECK-NEXT: [[VAL_24:%.*]] = and i32 [[VAL_23]], [[TMP0]]
+; CHECK-NEXT: [[VAL_25:%.*]] = and i32 [[VAL_24]], [[TMP0]]
+; CHECK-NEXT: [[VAL_26:%.*]] = and i32 [[VAL_25]], [[TMP0]]
+; CHECK-NEXT: [[VAL_27:%.*]] = and i32 [[VAL_26]], [[TMP0]]
+; CHECK-NEXT: [[VAL_28:%.*]] = and i32 [[VAL_27]], [[TMP0]]
+; CHECK-NEXT: [[VAL_29:%.*]] = and i32 [[VAL_28]], [[TMP0]]
+; CHECK-NEXT: [[VAL_30:%.*]] = and i32 [[VAL_29]], [[TMP0]]
+; CHECK-NEXT: [[VAL_31:%.*]] = and i32 [[VAL_30]], [[TMP0]]
+; CHECK-NEXT: [[VAL_32:%.*]] = and i32 [[VAL_31]], [[TMP0]]
+; CHECK-NEXT: [[VAL_33:%.*]] = and i32 [[VAL_32]], [[TMP0]]
+; CHECK-NEXT: [[VAL_34:%.*]] = add i32 [[TMP3]], 8555
+; CHECK-NEXT: [[VAL_35:%.*]] = and i32 [[VAL_33]], [[VAL_34]]
+; CHECK-NEXT: [[VAL_36:%.*]] = and i32 [[VAL_35]], [[TMP0]]
+; CHECK-NEXT: [[VAL_37:%.*]] = and i32 [[VAL_36]], [[TMP0]]
+; CHECK-NEXT: [[VAL_38:%.*]] = and i32 [[VAL_37]], [[TMP0]]
+; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i32 0
+; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[TMP3]], i32 1
+; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i32> [[TMP5]], <i32 12529, i32 13685>
+; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i32> [[TMP6]], i32 0
+; CHECK-NEXT: [[VAL_40:%.*]] = and i32 [[VAL_38]], [[TMP7]]
+; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP6]], i32 1
+; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_40]], i32 0
+; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 14910, i32 1
+; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> poison, i32 [[TMP8]], i32 0
+; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x i32> [[TMP11]], i32 [[TMP3]], i32 1
+; CHECK-NEXT: [[TMP13:%.*]] = and <2 x i32> [[TMP10]], [[TMP12]]
+; CHECK-NEXT: [[TMP14:%.*]] = add <2 x i32> [[TMP10]], [[TMP12]]
+; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <2 x i32> [[TMP13]], <2 x i32> [[TMP14]], <2 x i32> <i32 0, i32 3>
+; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x i32> [[TMP15]], i32 0
+; CHECK-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP16]], i32 0
+; CHECK-NEXT: [[TMP18:%.*]] = extractelement <2 x i32> [[TMP15]], i32 1
+; CHECK-NEXT: [[TMP19]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP18]], i32 1
; CHECK-NEXT: br label [[LOOP]]
;
; FORCE_REDUCTION-LABEL: @test(

Feb 11 2021, 6:33 AM · Restricted Project

Feb 10 2021

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 10 2021, 3:41 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 10 2021, 3:37 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 10 2021, 1:22 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Feb 10 2021, 1:21 PM · Restricted Project

Feb 9 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, Ping.

Feb 9 2021, 3:08 PM · Restricted Project

Feb 2 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Ping.

Feb 2 2021, 8:39 AM · Restricted Project

Jan 28 2021

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

At Dinar's request, I've measured the compile-time regression: http://llvm-compile-time-tracker.com/compare.php?from=f3449ed6073cac58efd9b62d0eb285affa650238&to=39362e11add238c45a7a7d55c1e002005f396fb7&stat=instructions. The regression is visible, but it is acceptable for such a change, IMHO. The largest regression comes from CMakeFiles/clamscan.dir/libclamav_uuencode.c.o (+11.28%), so one can investigate this particular file.

Jan 28 2021, 7:14 AM · Restricted Project

Jan 27 2021

dtemirbulatov accepted D94992: [SLP]Merge reorder and reuse shuffles..

Looks good to me.

Jan 27 2021, 9:11 AM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Rebased. Measured the compile-time impact on CPU2006 integer; I have not noticed any significant regressions in SLP compile time compared to SLP throttling with the limiter.

I mean only the SLP time regression, measured with the "-ftime-trace" flag.

Jan 27 2021, 7:24 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Measured the compile-time impact on CPU2006 integer; I have not noticed any significant regressions in SLP compile time compared to SLP throttling with the limiter.

Jan 27 2021, 6:00 AM · Restricted Project

Jan 26 2021

dtemirbulatov added a comment to D94992: [SLP]Merge reorder and reuse shuffles..

Looks good to me after the Index/Pos fix for "shrink shuffles". Any objections now?

Jan 26 2021, 10:29 AM · Restricted Project

Jan 13 2021

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Removed the "slp-throttling-budget" limiter for trees without calls.
Moved the main tree-reduction loop into the getTreeCost() function.
Deleted the ProposedToGather node attribute from EntryState.
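A toy Python model of a throttling loop like the one mentioned above, under stated assumptions: gathering a node means it is built from scalars (paying a hypothetical gather_cost), and its operand nodes then stay scalar; candidates are proposed to gather one at a time in a given order, and the cheapest configuration seen is kept. This illustrates the idea only and is not the code in getTreeCost(); all names are hypothetical.

```python
def subtree(children, node):
    """All nodes reachable from node through operand edges, node first."""
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(children.get(n, []))
    return out


def tree_cost(vec_cost, gather_cost, children, gathered):
    """Total cost when the nodes in `gathered` are built from scalars
    instead of vector instructions. Operands below a gathered node stay
    scalar, so they add nothing to the vector cost."""
    scalar = set()
    for g in gathered:
        scalar.update(subtree(children, g)[1:])
    cost = 0
    for n in vec_cost:
        if n in scalar:
            continue
        cost += gather_cost[n] if n in gathered else vec_cost[n]
    return cost


def throttle(vec_cost, gather_cost, children, order):
    """Propose nodes to gather one at a time, in `order`, and return the
    cheapest (cost, gathered nodes) seen, starting from the full tree."""
    gathered = set()
    best = (tree_cost(vec_cost, gather_cost, children, gathered), frozenset())
    for n in order:
        gathered.add(n)
        cost = tree_cost(vec_cost, gather_cost, children, gathered)
        if cost < best[0]:
            best = (cost, frozenset(gathered))
    return best


children = {0: [1, 2], 2: [3]}          # node 0 is the root
vec_cost = {0: -3, 1: -2, 2: 1, 3: 4}   # negative = profitable
gather_cost = {1: 1, 2: 1, 3: 1}
print(throttle(vec_cost, gather_cost, children, [3, 2, 1]))
```

In this example the full tree costs 0, while gathering node 2 (which also leaves node 3 scalar) brings the total down to -4, so the loop keeps that partial tree.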

Jan 13 2021, 3:51 AM · Restricted Project

Dec 9 2020

dtemirbulatov abandoned D87295: Prefer vpxor over vpxorps for AVX2 PR36127.

Abandoning in favor of D92993.

Dec 9 2020, 10:15 PM · Restricted Project
dtemirbulatov added a comment to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..

Do you mean compile time increasing? With this patch?

No, just a compile-time error.

Crash or incorrect code?

Dec 9 2020, 12:42 PM · Restricted Project
dtemirbulatov added a comment to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..

Do you mean compile time increasing? With this patch?

No, just a compile-time error.

Dec 9 2020, 12:32 PM · Restricted Project
dtemirbulatov added a comment to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..

While reviewing the latest update, I think I spotted an SLP compile-time failure in SingleSource/Benchmarks/Misc/oourafft.c. Here is the reduced test case to reproduce it:
source_filename = "/home/dtemirbulatov/llvm/test-suite/SingleSource/Benchmarks/Misc/oourafft.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

Dec 9 2020, 11:47 AM · Restricted Project

Dec 8 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Discussed further improvements with @ABataev offline, and he suggested removing the throttling limiter ("slp-throttling-budget"), at least for basic blocks without calls. I am looking into this new functionality.

Dec 8 2020, 2:13 PM · Restricted Project
dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

I also counted the total number of nodes vectorized with throttling, instead of just the number of successful tree reductions. The total is ~25% higher for INT and FP CPU2006 (AVX2 and AVX512F) when sorting by Cost compared to sorting by Distance.
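The comment compares two orders for choosing which nodes to cut. A minimal sketch, assuming "Cost" means cutting the least profitable nodes first and "Distance" means cutting the nodes farthest from the root first; both readings and the function names are assumptions, not taken from the patch. Either list could feed a loop that proposes nodes to gather in that order.

```python
def order_by_cost(vec_cost, candidates):
    """Least profitable nodes (highest vector cost) first."""
    return sorted(candidates, key=lambda n: vec_cost[n], reverse=True)


def order_by_distance(distance, candidates):
    """Nodes farthest from the root first; ties keep their input order."""
    return sorted(candidates, key=lambda n: distance[n], reverse=True)


vec_cost = {1: -2, 2: 1, 3: 4}
distance = {1: 1, 2: 1, 3: 2}
print(order_by_cost(vec_cost, [1, 2, 3]))      # [3, 2, 1]
print(order_by_distance(distance, [1, 2, 3]))  # [3, 1, 2]
```

The two orders differ on nodes 1 and 2, which sit at the same depth: the cost order tries the unprofitable node 2 first, while the distance order cannot tell them apart.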

Dec 8 2020, 6:06 AM · Restricted Project

Dec 7 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Here is the BFS version of the change. Rebased.
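A breadth-first traversal over tree entries visits the root first and then each level of operands. A hedged sketch, assuming the tree is given as a hypothetical mapping from each node to its operand nodes; the distances it computes are one possible input for a distance-based cut order.

```python
from collections import deque


def bfs_order(children, root):
    """Visit tree entries level by level from the root; return the visit
    order and each node's distance (number of edges) from the root."""
    dist = {root: 0}
    order = []
    queue = deque([root])
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in children.get(n, []):
            if c not in dist:  # shared operand nodes are visited once
                dist[c] = dist[n] + 1
                queue.append(c)
    return order, dist


print(bfs_order({0: [1, 2], 2: [3]}, 0))  # ([0, 1, 2, 3], {0: 0, 1: 1, 2: 1, 3: 2})
```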

Dec 7 2020, 9:39 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Dec 7 2020, 9:19 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Dec 7 2020, 9:04 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Dec 7 2020, 7:32 AM · Restricted Project

Dec 3 2020

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Dec 3 2020, 5:55 PM · Restricted Project

Dec 1 2020

dtemirbulatov added a comment to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..

Btw, I've observed significant compile-time regression with this patch: http://llvm-compile-time-tracker.com/compare.php?from=99d82412f822190a6caa3e3a5b9f87b71f56de47&to=81b636bae72c967f526bcd18de45a6f4a76daa41&stat=instructions (thanks to @nikic for awesome service). This could be justified in case of comparable performance improvements but have you done any benchmarking?

I did a while back with SPECINT 2006, and as I remember the results were good, but I am not sure I could find them now. Yes, to me, having this new functionality with the presented compile-time regression looks OK.

Dec 1 2020, 3:37 PM · Restricted Project

Nov 25 2020

dtemirbulatov added a comment to D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic.

Good point, thank you! As you said, that is not a problem specific to this patch exclusively. One can fix it by hacky cost comparison at the tree-building stage, but I do believe a more general solution is preferable. Does this patch https://reviews.llvm.org/D57779 (vectorization throttling) fix this? After the greedy strategy of building the maximum tree, we choose the cheapest part of it for vectorization.

No, I think https://reviews.llvm.org/D57779 is about a different thing. Here, we have new functionality which allows us to build the tree with gather-loads; otherwise, we just ignore them and end up with a different tree. I am not sure how to handle the case where it accumulates those expensive operations. Maybe guard this new functionality with a flag for now?

Nov 25 2020, 5:53 AM · Restricted Project

Nov 22 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Ping^2

Nov 22 2020, 4:39 PM · Restricted Project

Nov 14 2020

dtemirbulatov accepted D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic.

Looks good, any other objections?

Nov 14 2020, 6:57 AM · Restricted Project

Nov 12 2020

dtemirbulatov added a comment to D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic.

Also, please add ScatterVectorize to TreeEntry.dump()

Nov 12 2020, 7:14 AM · Restricted Project

Nov 8 2020

dtemirbulatov added inline comments to D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic.
Nov 8 2020, 8:31 PM · Restricted Project

Nov 5 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Ping.

Nov 5 2020, 3:33 PM · Restricted Project

Oct 13 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. PING

Oct 13 2020, 9:02 AM · Restricted Project

Oct 6 2020

dtemirbulatov added a comment to D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops..

Reping

Oct 6 2020, 7:42 AM · Restricted Project

Sep 29 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Ping

Sep 29 2020, 3:47 PM · Restricted Project

Sep 22 2020

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Sep 22 2020, 8:12 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Sep 22 2020, 8:09 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Moved the InternalTreeUses population out from under the (UseScalar != U || !InTreeUserNeedToExtract(Scalar, UserInst, TLI)) condition at line 2661 in BoUpSLP::buildTree(), since we have to consider every internal user for partial vectorization when calculating the cost.

Sep 22 2020, 8:09 AM · Restricted Project

Sep 10 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Good enough for initial implementation?

Yes, to me it looks ready.

Sep 10 2020, 1:16 PM · Restricted Project

Sep 8 2020

dtemirbulatov requested review of D87295: Prefer vpxor over vpxorps for AVX2 PR36127.
Sep 8 2020, 10:04 AM · Restricted Project

Sep 2 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased. Ping.

Sep 2 2020, 5:31 AM · Restricted Project

Aug 23 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Removed an unnecessary check for "UserTE" at line 3305.

Aug 23 2020, 7:57 AM · Restricted Project

Aug 21 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Fixed remarks, rebased.

Aug 21 2020, 3:50 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Aug 21 2020, 3:23 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Aug 21 2020, 7:38 AM · Restricted Project

Aug 17 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Corrected paper citation, added -slp-throttle=false to llvm/test/Transforms/SLPVectorizer/X86/slp-throttle.ll, rebased.

Aug 17 2020, 4:20 AM · Restricted Project

Aug 14 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased after rGb1600d8b8971

Aug 14 2020, 2:46 AM · Restricted Project

Aug 11 2020

dtemirbulatov committed rGb1600d8b8971: [NFC] Guard the cost report block of debug outputs with NDEBUG and (authored by dtemirbulatov).
Aug 11 2020, 7:35 AM
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Aug 11 2020, 3:12 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Aug 11 2020, 3:09 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Oh, I missed fully removing it from the diff at line 7269. Fixed.

Aug 11 2020, 2:23 AM · Restricted Project

Aug 10 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, Fixed.

Aug 10 2020, 10:54 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Fixed.

Aug 10 2020, 6:46 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, Ping

Aug 10 2020, 4:38 AM · Restricted Project

Jul 31 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

Oh, sorry, I misspelled:
For example, in the first loops we could change Entry1's status from TreeEntry::ProposedToGather to TreeEntry::NeedToGather, but later we could encounter another use of Entry1 from another entry (let's say Entry2) with TreeEntry::Vectorize status, and we could NOT tell the difference between a just-cancelled entry and an entry that was never considered for vectorization. Thus ExternalUses would not be properly populated.
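A minimal Python model of why a third state helps, as described in this comment. Following the comment, it assumes that a scalar user whose entry was cancelled by throttling must be recorded as an external use of the vectorized value; treating never-considered NeedToGather users differently is a simplification for illustration. The enum mirrors the TreeEntry status names in spirit only; State and external_uses are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    VECTORIZE = auto()           # entry will be emitted as vector code
    NEED_TO_GATHER = auto()      # entry was never a vectorization candidate
    PROPOSED_TO_GATHER = auto()  # entry was vectorizable but cut by throttling


def external_uses(entry_state, scalar_uses):
    """entry_state: entry name -> State.
    scalar_uses: (source_entry, user_entry) pairs for in-tree scalar uses.
    Keep the uses whose source stays vectorized but whose user was cut by
    throttling: that user stays scalar and needs an extracted value."""
    return [
        (src, user)
        for src, user in scalar_uses
        if entry_state[src] is State.VECTORIZE
        and entry_state[user] is State.PROPOSED_TO_GATHER
    ]


states = {
    "E1": State.PROPOSED_TO_GATHER,
    "E2": State.VECTORIZE,
    "E3": State.NEED_TO_GATHER,
}
print(external_uses(states, [("E2", "E1"), ("E2", "E3")]))  # [('E2', 'E1')]
```

If E1 had been switched straight to NEED_TO_GATHER, it would be indistinguishable from E3, and the ("E2", "E1") use would be dropped.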

Jul 31 2020, 11:55 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 31 2020, 11:53 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 31 2020, 11:43 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, addressed comments

Jul 31 2020, 11:35 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 31 2020, 11:32 AM · Restricted Project

Jul 25 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

ping

Jul 25 2020, 4:12 AM · Restricted Project

Jul 21 2020

dtemirbulatov added inline comments to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..
Jul 21 2020, 3:27 PM · Restricted Project
dtemirbulatov added inline comments to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..
Jul 21 2020, 3:02 PM · Restricted Project
dtemirbulatov added inline comments to D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..
Jul 21 2020, 2:52 PM · Restricted Project

Jul 19 2020

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 19 2020, 4:15 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Addressed remarks, rebased.

Jul 19 2020, 4:14 AM · Restricted Project

Jul 13 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Addressed comments, rebased.

Jul 13 2020, 7:09 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 13 2020, 7:08 PM · Restricted Project

Jul 10 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Addressed remarks, rebased.

Jul 10 2020, 2:19 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jul 10 2020, 2:17 AM · Restricted Project

Jul 7 2020

dtemirbulatov added a comment to D57779: [SLP] Add support for throttling..

ping

Jul 7 2020, 4:23 PM · Restricted Project

Jun 29 2020

dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

I found typos and unformatted changes.

Jun 29 2020, 4:49 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jun 29 2020, 2:39 AM · Restricted Project

Jun 28 2020

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jun 28 2020, 7:04 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, addressed the comments.

Jun 28 2020, 7:04 AM · Restricted Project

Jun 23 2020

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jun 23 2020, 2:36 AM · Restricted Project

Jun 19 2020

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jun 19 2020, 7:01 AM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Resolved comments

Jun 19 2020, 6:27 AM · Restricted Project

Jun 18 2020

dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jun 18 2020, 2:15 PM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jun 18 2020, 2:15 PM · Restricted Project
dtemirbulatov updated the diff for D57779: [SLP] Add support for throttling..

Rebased, addressed comments.

Jun 18 2020, 12:34 PM · Restricted Project

Jun 16 2020

dtemirbulatov added a reviewer for D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors.: anton-afanasyev.
Jun 16 2020, 8:49 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jun 16 2020, 8:15 AM · Restricted Project
dtemirbulatov added inline comments to D57779: [SLP] Add support for throttling..
Jun 16 2020, 8:15 AM · Restricted Project