- User Since
- Jul 25 2016, 9:02 AM (124 w, 2 d)
Mon, Dec 10
Thu, Nov 29
Wed, Nov 28
As for the computation of the cost for scalarized instruction:
Tue, Nov 27
Sorry, I must have missed this review.
Mon, Nov 26
Mon, Nov 19
Fri, Nov 16
Also, in order for us to be able to adopt this mechanism, we need to work on LV's code gen. Currently, parts of it rely on having one consistent VF across the entire loop (i.e., loop-level VF or "serialize"). That's the reason why https://reviews.llvm.org/D53035 is generating illegal call first and then legalize. We need to add this work item as a dependency (to at least some veclib, e.g., SVML).
Nov 12 2018
Thanks, everyone! Important first step towards the great converged Loop+SLP vectorizer.
Nov 6 2018
Nov 2 2018
I prefer not to add this kind of "heuristics" on the functions that should just return "the fact". Even if the loaded value is truncated immediately, the load itself needs to be vectorized.
Oct 29 2018
I do not think we need to unnecessarily tie this proposal to "dynamic" vector length. These just have "explicit vector length" that is not implied by the operand vector length. May I suggest "evl" instead of "dvl"? Stands for Explicit Vector Length".
Oct 25 2018
Oct 23 2018
LGTM. "Final cleanup" rather than "merge".
Oct 11 2018
Oct 10 2018
Oct 8 2018
LGTM. Please wait for a few days to give others time to respond if they'd like to.
Code looks good. Just a minor suggestion on the comment. Looking at the LIT test.
Thanks a lot, Renato. Will take a look quick.
Oct 5 2018
I agree with Sanjoy. Making this available under a flag would be good.
Oct 2 2018
Sep 21 2018
LGTM. We can continue discussing about not bailing out for the subset of cases, but we don't have to let the compiler crash while we do that.
Sorry, I totally forgot about my old patch https://reviews.llvm.org/D47216.
- I like your two different message version better than changing Line 788 condition to if (!WidestIndTy). That's good.
- Please add non-NULL assertion after "Type *IdxTy = Legal->getWidestInductionType();" in InnerLoopVectorizer::getOrCreateTripCount().
- Please include LIT test from D47216.
Sep 14 2018
Sep 13 2018
Accepting the last revision prior to the commit. Hope arc will automatically attribute the patch to @sguggill.
Sep 12 2018
Sep 6 2018
Sep 5 2018
Sep 4 2018
Aug 27 2018
This affects architectures that have masked gather/scatter support.
Jumping from D50480:
Aug 24 2018
Yes, that is the improved codegen stated as TODO in the costmodel.
For other architectures that do not have these masked intrinsics, we just generate the predicated store by doing an extract and branch on each lane (correct but inefficient and will be avoided unless -force-vector-width=X).
In general, self output dependence is fine to vectorize (whether the store address is uniform or random), as long as (masked) scatter (or scatter emulation) happens from lower elements to higher elements.
I don't think the above comment matters for uniform addresses because a uniform address is invariant.
Aug 23 2018
Under the assumption that the acceptance of this patch is not a conscious choice between new CmpULE VPInstruction opcode versus VPCmpInst derivation (whose discussion should continue in D50823 or its follow on), I think this patch is ready to land. LGTM.
Aug 22 2018
Aug 21 2018
Sorry, for the late response to the already closed review.
Aug 20 2018
For me, the only major issue left is the detached IR instruction. @dcaballe, please try adding the reviewers/subscribers of D50480 to D50823, in the hopes of getting a quicker resolution there, so as not to block D50480 because of that. I will not oppose to D50480 for introducing new ULE opcode of VPInstruction (design/implementation choice within VPlan concept), but I will strongly oppose for the use of detahced IR instruction (goes against VPlan concept).
Aug 16 2018
Aug 15 2018
Aug 14 2018
Aug 10 2018
Aug 8 2018
Aug 7 2018
Aug 6 2018
Aug 3 2018
Jul 25 2018
Let's start from this and see if practical usability issues pop up. This is where main vector loop just ended and right before the remainder loop starts. As long as the location info points to the top or the bottom of the loop, that won't surprise the user. Beyond that is just how much more precise we'd like to be. LGTM.
Jul 24 2018
I'm not familiar with the Recurrence Descriptor code, but I suppose the following is considered as RK_FloatAdd. If that's the case, we should be beefing up RK_FloatAdd rather than adding a new Kind. We can't keep adding new kind every time we encounter a different pattern of reduction sum/product. Downside is possibly exposing a downstream bug, but that should only help generalizing reduction handling code. From reduction analysis perspective, select (IF-converted) and phi (IF) should be the same thing. So, trying to handle this within FloatAdd/FloatMult should also help generalize recurrence analysis code. That's how I look at the issue. Hope this helps.
This is essentially a zero trip check for the remainder loop and from that perspective, the most correct DL we should use would be the one related to trip count computation. InnerLoopVectorizer::getOrCreateTripCount() is where LV computes the trip count of the incoming scalar loop. It uses getBackedgeTakeCount(). As such, strictly speaking, taking the DL from the predicate (i.e., loop bottom test) fed to the loop backedge would make most sense. That would be OrigLoop->getLoopLatch()->getTerminator() to get to the backedge. In a relaxed thinking, however, this is the code executed when we know vector code executes. So, taking DL from VectorPH code isn't too bad. OrigLoop->getStartLoc() could be another viable enough alternative.
Just came back from vacation. Thanks. I take @sbaranga just forgot to set accept the revision and thus taking a liberty of doing so myself.
Jul 18 2018
I like the templatization approach taken here. Thank you!. Changes look fine to me, but will wait for a few others to take a look.
Jul 17 2018
If I delete IBM target, the LIT test won't crash before the fix. If anyone can suggest an alternative so that I can move the lit test one level up, that would be great.
Jul 2 2018
Jun 14 2018
I see this as a register allocator problem. It's not like we are running out of registers so that we cannot use ymm0 as a "scratch register" for SVML call. We should show the ASM code to CG experts and get the problem fixed there.
Jun 11 2018
Jun 7 2018
Given that there can be more than one users, the fix makes sense to me. LGTM.
Jun 5 2018
Jun 4 2018
May 29 2018
May 28 2018
David, outer loop vectorization we've been working on and unroll and jam have a lot of similarities and thus there should be code sharing opportunities around legality and cost modeling. Let's collaborate to improve both.
May 22 2018
LIT test added. ORE messaging doesn't seem to be working here (I see just "loop not vectorized" without the rest of the string), but that should be fixed separately.
Will add a LIT test.