- User Since
- Jul 25 2016, 9:02 AM (134 w, 2 d)
Wed, Feb 6
I have a little mental barrier in accepting this change as is. I think this feeling of mine is mainly due to the name and the "stated" functionality of computeMaxVF and "indirect inference" towards using it for suppressing interleaving. Maybe I'm just being too picky here. If so, my apologies ahead of time.
Tue, Feb 5
Fri, Feb 1
While we are at this, let's talk about downstream dependency, if any, for allowing more than one candidate VF along the native path. At least we can write down a list of TODOs so that we'll be aware of the things we need to improve at a time in the future.
Thu, Jan 31
Wed, Jan 23
Dec 19 2018
Unfortunately, this could not be done with computeInstDiscount() as a simple extension of (1), since memory accesses are dealt with before anything else.
Dec 12 2018
Dec 10 2018
Nov 29 2018
Nov 28 2018
As for the computation of the cost for scalarized instruction:
Nov 27 2018
Sorry, I must have missed this review.
Nov 26 2018
Nov 19 2018
Nov 16 2018
Also, in order for us to be able to adopt this mechanism, we need to work on LV's code gen. Currently, parts of it rely on having one consistent VF across the entire loop (i.e., loop-level VF or "serialize"). That's the reason why https://reviews.llvm.org/D53035 is generating illegal call first and then legalize. We need to add this work item as a dependency (to at least some veclib, e.g., SVML).
Nov 12 2018
Thanks, everyone! Important first step towards the great converged Loop+SLP vectorizer.
Nov 6 2018
Nov 2 2018
I prefer not to add this kind of "heuristics" on the functions that should just return "the fact". Even if the loaded value is truncated immediately, the load itself needs to be vectorized.
Oct 29 2018
I do not think we need to unnecessarily tie this proposal to "dynamic" vector length. These just have "explicit vector length" that is not implied by the operand vector length. May I suggest "evl" instead of "dvl"? Stands for Explicit Vector Length".
Oct 25 2018
Oct 23 2018
LGTM. "Final cleanup" rather than "merge".
Oct 11 2018
Oct 10 2018
Oct 8 2018
LGTM. Please wait for a few days to give others time to respond if they'd like to.
Code looks good. Just a minor suggestion on the comment. Looking at the LIT test.
Thanks a lot, Renato. Will take a look quick.
Oct 5 2018
I agree with Sanjoy. Making this available under a flag would be good.
Oct 2 2018
Sep 21 2018
LGTM. We can continue discussing about not bailing out for the subset of cases, but we don't have to let the compiler crash while we do that.
Sorry, I totally forgot about my old patch https://reviews.llvm.org/D47216.
- I like your two different message version better than changing Line 788 condition to if (!WidestIndTy). That's good.
- Please add non-NULL assertion after "Type *IdxTy = Legal->getWidestInductionType();" in InnerLoopVectorizer::getOrCreateTripCount().
- Please include LIT test from D47216.
Sep 14 2018
Sep 13 2018
Accepting the last revision prior to the commit. Hope arc will automatically attribute the patch to @sguggill.
Sep 12 2018
Sep 6 2018
Sep 5 2018
Sep 4 2018
Aug 27 2018
This affects architectures that have masked gather/scatter support.
Jumping from D50480:
Aug 24 2018
Yes, that is the improved codegen stated as TODO in the costmodel.
For other architectures that do not have these masked intrinsics, we just generate the predicated store by doing an extract and branch on each lane (correct but inefficient and will be avoided unless -force-vector-width=X).
In general, self output dependence is fine to vectorize (whether the store address is uniform or random), as long as (masked) scatter (or scatter emulation) happens from lower elements to higher elements.
I don't think the above comment matters for uniform addresses because a uniform address is invariant.
Aug 23 2018
Under the assumption that the acceptance of this patch is not a conscious choice between new CmpULE VPInstruction opcode versus VPCmpInst derivation (whose discussion should continue in D50823 or its follow on), I think this patch is ready to land. LGTM.
Aug 22 2018
Aug 21 2018
Sorry, for the late response to the already closed review.
Aug 20 2018
For me, the only major issue left is the detached IR instruction. @dcaballe, please try adding the reviewers/subscribers of D50480 to D50823, in the hopes of getting a quicker resolution there, so as not to block D50480 because of that. I will not oppose to D50480 for introducing new ULE opcode of VPInstruction (design/implementation choice within VPlan concept), but I will strongly oppose for the use of detahced IR instruction (goes against VPlan concept).
Aug 16 2018
Aug 15 2018
Aug 14 2018
Aug 10 2018
Aug 8 2018
Aug 7 2018
Aug 6 2018
Aug 3 2018
Jul 25 2018
Let's start from this and see if practical usability issues pop up. This is where main vector loop just ended and right before the remainder loop starts. As long as the location info points to the top or the bottom of the loop, that won't surprise the user. Beyond that is just how much more precise we'd like to be. LGTM.
Jul 24 2018
I'm not familiar with the Recurrence Descriptor code, but I suppose the following is considered as RK_FloatAdd. If that's the case, we should be beefing up RK_FloatAdd rather than adding a new Kind. We can't keep adding new kind every time we encounter a different pattern of reduction sum/product. Downside is possibly exposing a downstream bug, but that should only help generalizing reduction handling code. From reduction analysis perspective, select (IF-converted) and phi (IF) should be the same thing. So, trying to handle this within FloatAdd/FloatMult should also help generalize recurrence analysis code. That's how I look at the issue. Hope this helps.
This is essentially a zero trip check for the remainder loop and from that perspective, the most correct DL we should use would be the one related to trip count computation. InnerLoopVectorizer::getOrCreateTripCount() is where LV computes the trip count of the incoming scalar loop. It uses getBackedgeTakeCount(). As such, strictly speaking, taking the DL from the predicate (i.e., loop bottom test) fed to the loop backedge would make most sense. That would be OrigLoop->getLoopLatch()->getTerminator() to get to the backedge. In a relaxed thinking, however, this is the code executed when we know vector code executes. So, taking DL from VectorPH code isn't too bad. OrigLoop->getStartLoc() could be another viable enough alternative.
Just came back from vacation. Thanks. I take @sbaranga just forgot to set accept the revision and thus taking a liberty of doing so myself.
Jul 18 2018
I like the templatization approach taken here. Thank you!. Changes look fine to me, but will wait for a few others to take a look.
Jul 17 2018
If I delete IBM target, the LIT test won't crash before the fix. If anyone can suggest an alternative so that I can move the lit test one level up, that would be great.