- User Since: Jul 25 2016, 9:02 AM (142 w, 4 d)
Mon, Mar 25
@hsaito, I don't have commit access, could you commit this change for me?
Mar 14 2019
LGTM. Please wait for a few days to give others a chance to go over your updated patch.
Feb 6 2019
I have a little mental barrier in accepting this change as is. I think this feeling of mine is mainly due to the name and the "stated" functionality of computeMaxVF, and the "indirect inference" toward using it for suppressing interleaving. Maybe I'm just being too picky here. If so, my apologies ahead of time.
Feb 1 2019
While we are at it, let's talk about the downstream dependencies, if any, of allowing more than one candidate VF along the native path. At the least, we can write down a list of TODOs so that we'll be aware of the things we need to improve in the future.
Dec 19 2018
Unfortunately, this could not be done in computeInstDiscount() as a simple extension of (1), since memory accesses are dealt with before anything else.
Nov 28 2018
As for the computation of the cost of a scalarized instruction:
Nov 27 2018
Sorry, I must have missed this review.
Nov 16 2018
Also, in order for us to be able to adopt this mechanism, we need to work on LV's code gen. Currently, parts of it rely on having one consistent VF across the entire loop (i.e., loop-level VF or "serialize"). That's the reason why https://reviews.llvm.org/D53035 generates an illegal call first and then legalizes it. We need to add this work item as a dependency (for at least some veclibs, e.g., SVML).
Nov 12 2018
Thanks, everyone! Important first step towards the great converged Loop+SLP vectorizer.
Nov 2 2018
I prefer not to add this kind of "heuristic" to functions that should just return "the fact". Even if the loaded value is truncated immediately, the load itself still needs to be vectorized.
Oct 29 2018
I do not think we need to unnecessarily tie this proposal to "dynamic" vector length. These just have an "explicit vector length" that is not implied by the operand vector length. May I suggest "evl" instead of "dvl"? It stands for "explicit vector length".
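To illustrate the "explicit vector length" idea in scalar terms, here is a minimal C sketch (my own illustration, not code from the patch under review; `vadd_evl` and `VL_MAX` are hypothetical names). Each strip-mined iteration carries an evl that is not implied by the operand types, so the final iteration simply runs with a shorter evl instead of needing a scalar remainder loop:

```c
#include <stddef.h>

#define VL_MAX 4  /* hypothetical hardware maximum vector length */

/* Strip-mined vector add with an explicit vector length (evl) per
   iteration; the inner loop stands in for a single vector operation
   executed with that evl. */
void vadd_evl(const int *a, const int *b, int *c, size_t n) {
    for (size_t i = 0; i < n; ) {
        /* explicit vector length for this iteration */
        size_t evl = (n - i < VL_MAX) ? (n - i) : VL_MAX;
        for (size_t lane = 0; lane < evl; ++lane)
            c[i + lane] = a[i + lane] + b[i + lane];
        i += evl;
    }
}
```

With n = 5 and VL_MAX = 4, the loop runs one full-width step (evl = 4) and one short step (evl = 1), covering all elements without a separate epilogue.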
Oct 23 2018
LGTM. "Final cleanup" rather than "merge".
Oct 8 2018
LGTM. Please wait for a few days to give others time to respond if they'd like to.
Code looks good. Just a minor suggestion on the comment. Looking at the LIT test.
Thanks a lot, Renato. Will take a quick look.
Oct 5 2018
I agree with Sanjoy. Making this available under a flag would be good.
Sep 21 2018
LGTM. We can continue discussing not bailing out for a subset of cases, but we don't have to let the compiler crash while we do that.
Sorry, I totally forgot about my old patch https://reviews.llvm.org/D47216.
- I like your two-different-messages version better than changing the Line 788 condition to if (!WidestIndTy). That's good.
- Please add a non-NULL assertion after "Type *IdxTy = Legal->getWidestInductionType();" in InnerLoopVectorizer::getOrCreateTripCount().
- Please include the LIT test from D47216.
Sep 13 2018
Accepting the last revision prior to the commit. Hope arc will automatically attribute the patch to @sguggill.
Aug 27 2018
This affects architectures that have masked gather/scatter support.
Jumping from D50480:
Aug 24 2018
Yes, that is the improved codegen stated as a TODO in the cost model.
For other architectures that do not have these masked intrinsics, we just generate the predicated store by doing an extract and branch on each lane (correct but inefficient, and avoided unless -force-vector-width=X is given).
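In scalar C terms, the per-lane extract-and-branch fallback looks roughly like the sketch below (my own illustration of the scheme described above, not LLVM's actual generated code; `masked_store_emul` is a hypothetical name):

```c
/* Emulate a predicated (masked) vector store on a target without
   masked store support: test and store each lane individually.
   Correct but inefficient -- one extract and one branch per lane. */
void masked_store_emul(int *dst, const int *vec,
                       const unsigned char *mask, int vf) {
    for (int lane = 0; lane < vf; ++lane) {
        if (mask[lane])          /* branch on the extracted mask bit */
            dst[lane] = vec[lane];
    }
}
```

Only active lanes are written, so inactive lanes of `dst` keep their previous contents, matching the semantics of a true masked store.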
In general, a self output dependence is fine to vectorize (whether the store address is uniform or random), as long as the (masked) scatter (or scatter emulation) happens from lower elements to higher elements.
I don't think the above comment matters for uniform addresses because a uniform address is invariant.
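The low-to-high ordering requirement can be seen in a small C sketch (my own illustration, not vectorizer output; `scatter_ordered` is a hypothetical name). When two lanes target the same address, processing lanes in ascending order makes the highest active lane win, which is exactly what the original scalar loop would have stored last:

```c
/* Scatter emulation that writes lanes from lower to higher.
   With duplicate (or uniform) store addresses, later lanes
   overwrite earlier ones, preserving the scalar loop's
   self output dependence. */
void scatter_ordered(int *base, const int *idx, const int *val,
                     const unsigned char *mask, int vf) {
    for (int lane = 0; lane < vf; ++lane)   /* low to high */
        if (mask[lane])
            base[idx[lane]] = val[lane];
}
```

For example, with indices {0, 2, 0, 2} and values {1, 2, 3, 4}, lane 2 overwrites lane 0 and lane 3 overwrites lane 1, leaving base[0] == 3 and base[2] == 4, the same final state the scalar loop produces.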
Aug 23 2018
Under the assumption that the acceptance of this patch is not a conscious choice between a new CmpULE VPInstruction opcode and a VPCmpInst derivation (whose discussion should continue in D50823 or its follow-on), I think this patch is ready to land. LGTM.
Aug 21 2018
Sorry for the late response to the already-closed review.
Aug 20 2018
For me, the only major issue left is the detached IR instruction. @dcaballe, please try adding the reviewers/subscribers of D50480 to D50823, in the hope of getting a quicker resolution there, so as not to block D50480 because of that. I will not oppose D50480 for introducing a new ULE opcode of VPInstruction (a design/implementation choice within the VPlan concept), but I will strongly oppose the use of a detached IR instruction (it goes against the VPlan concept).