hsaito (Hideki Saito)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 25 2016, 9:02 AM (117 w, 1 d)

Recent Activity

Today

hsaito accepted D53559: [LV] Don't have fold-tail under optsize invalidate interleave-groups when masked-interleaving is enabled.

LGTM. "Final cleanup" rather than "merge".

Tue, Oct 23, 12:23 PM

Thu, Oct 11

hsaito added a comment to D53142: [VPlan] Script to extract VPlan digraphs from log.

I'm wondering... should we choose between dot and png? Or should we always print the dot file and, upon --png flag, also the png file?

Thu, Oct 11, 12:12 PM

Wed, Oct 10

hsaito accepted D49168: [LV] Add a new reduction pattern match.

LGTM.

Wed, Oct 10, 10:50 AM

Mon, Oct 8

hsaito added inline comments to D49168: [LV] Add a new reduction pattern match.
Mon, Oct 8, 3:12 PM
hsaito accepted D49168: [LV] Add a new reduction pattern match.

LGTM. Please wait for a few days to give others time to respond if they'd like to.

Mon, Oct 8, 2:28 PM
hsaito added a comment to D49168: [LV] Add a new reduction pattern match.

Code looks good. Just a minor suggestion on the comment. Looking at the LIT test.

Mon, Oct 8, 1:28 PM
hsaito added a comment to D49168: [LV] Add a new reduction pattern match.

Thanks a lot, Renato. Will take a look quick.

Mon, Oct 8, 10:31 AM

Fri, Oct 5

hsaito added a comment to D52881: [LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160.

Thanks LGTM. Please wait with committing a bit, in case Hideki or anyone else has any additional comments.

Fri, Oct 5, 11:56 AM
hsaito added a comment to D52930: [SCEV][NFC] Verify IR in isLoop[Entry,Backedge]GuardedByCond.

I agree with Sanjoy. Making this available under a flag would be good.

Fri, Oct 5, 11:35 AM

Tue, Oct 2

hsaito accepted D52767: [LoopVectorize] Loop vectorization for minimum and maximum.

LGTM

Tue, Oct 2, 3:26 PM

Sep 21 2018

hsaito accepted D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

LGTM. We can continue discussing about not bailing out for the subset of cases, but we don't have to let the compiler crash while we do that.

Sep 21 2018, 3:25 PM
hsaito added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

can we compute the trip count for these loops?

SCEV can compute the trip count for this testcase, yes. See ScalarEvolution::computeExitCountExhaustively.

Sure. That will do it.

Sep 21 2018, 3:22 PM
hsaito added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

Why do we need an integer induction variable? If one doesn't exist, it should be straightforward to create one.

Sep 21 2018, 12:49 PM
hsaito added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

Sorry, I totally forgot about my old patch https://reviews.llvm.org/D47216.

  1. I like your two different message version better than changing Line 788 condition to if (!WidestIndTy). That's good.
  2. Please add non-NULL assertion after "Type *IdxTy = Legal->getWidestInductionType();" in InnerLoopVectorizer::getOrCreateTripCount().
  3. Please include LIT test from D47216.
Sep 21 2018, 11:55 AM

Sep 14 2018

hsaito added a comment to D50820: [VPlan] Implement initial vector code generation support for simple outer loops..

Fix for the buildbot failures, provided by @sguggill . There was a memory leak in the D50820 patch.

Sep 14 2018, 8:07 AM

Sep 13 2018

hsaito committed rL342201.
Sep 13 2018, 7:04 PM
hsaito committed rL342197: [VPlan] Implement initial vector code generation support for simple outer loops..
[VPlan] Implement initial vector code generation support for simple outer loops.
Sep 13 2018, 5:38 PM
hsaito closed D50820: [VPlan] Implement initial vector code generation support for simple outer loops..
Sep 13 2018, 5:38 PM
hsaito accepted D50820: [VPlan] Implement initial vector code generation support for simple outer loops..

Accepting the last revision prior to the commit. Hope arc will automatically attribute the patch to @sguggill.

Sep 13 2018, 5:34 PM

Sep 12 2018

hsaito added a comment to D50820: [VPlan] Implement initial vector code generation support for simple outer loops..

I have requested Hideki Saito to submit this change on my behalf.

Sep 12 2018, 3:26 PM

Sep 6 2018

hsaito accepted D51313: [LV] Fix code gen for conditionally executed uniform loads.

LGTM

Sep 6 2018, 12:23 PM

Sep 5 2018

hsaito added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Sep 5 2018, 3:35 PM

Sep 4 2018

hsaito added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Sep 4 2018, 6:48 PM

Aug 27 2018

hsaito added a comment to D51313: [LV] Fix code gen for conditionally executed uniform loads.

This affects architectures that have masked gather/scatter support.

I moved that check from legal to cost model (D43208). As a result, the same issue should happen for arch w/o masked gather/scatter support, probably less often. If you can make the LIT test generic (i.e., not x86 specific), that would be better (but not a must-fix).

Aug 27 2018, 11:27 PM
hsaito accepted D51313: [LV] Fix code gen for conditionally executed uniform loads.

LGTM

Aug 27 2018, 2:23 PM
hsaito added a comment to D51313: [LV] Fix code gen for conditionally executed uniform loads.

This affects architectures that have masked gather/scatter support.

Aug 27 2018, 2:23 PM
hsaito added a reviewer for D51313: [LV] Fix code gen for conditionally executed uniform loads: hsaito.
Aug 27 2018, 2:20 PM
hsaito updated subscribers of D50823: [VPlan] Introduce VPCmpInst sub-class in the instruction-level representation.

Jumping from D50480:

Aug 27 2018, 12:12 PM
hsaito added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

This patch aims to model a rather special early-exit condition that restricts the execution of the entire loop body to certain iterations, rather than model general compare instructions. If preferred, an "EarlyExit" extended opcode can be introduced instead of the controversial ICmpULE. This should be easy to revisit in the future if needed.

Aug 27 2018, 11:29 AM

Aug 24 2018

hsaito added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

Yes, that is the improved codegen stated as TODO in the costmodel.

Aug 24 2018, 2:16 PM
hsaito added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

For other architectures that do not have these masked intrinsics, we just generate the predicated store by doing an extract and branch on each lane (correct but inefficient and will be avoided unless -force-vector-width=X).

In general, self output dependence is fine to vectorize (whether the store address is uniform or random), as long as (masked) scatter (or scatter emulation) happens from lower elements to higher elements.

I don't think the above comment matters for uniform addresses because a uniform address is invariant.

Aug 24 2018, 1:11 PM
hsaito added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

One more interesting thing I noticed while adding predicated invariant stores to X86 (for -mcpu=skylake-avx512), it supports masked scatter for non-unniform stores.
But we need to add support for uniform stores along with this patch. Today, it just generates incorrect code (no predication whatsover).
For other architectures that do not have these masked intrinsics, we just generate the predicated store by doing an extract and branch on each lane (correct but inefficient and will be avoided unless -force-vector-width=X).

Aug 24 2018, 10:31 AM

Aug 23 2018

hsaito accepted D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

Under the assumption that the acceptance of this patch is not a conscious choice between new CmpULE VPInstruction opcode versus VPCmpInst derivation (whose discussion should continue in D50823 or its follow on), I think this patch is ready to land. LGTM.

Aug 23 2018, 2:51 PM

Aug 22 2018

hsaito added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

Let's give @dcaballe one more day to try getting some traction on D50823. Fair enough to both of you (and others who might be interested)?

Aug 22 2018, 12:26 PM

Aug 21 2018

hsaito added a comment to D49829: [X86] Add pattern matching for PMADDUBSW.

Sorry, for the late response to the already closed review.

Aug 21 2018, 2:58 PM
hsaito added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

...

Yes, the stores are scalarized. Identical replicas left as-is. Either passes such as load elimination can remove it, or we can clean it up in LV itself.

  • - by revisiting LoopVectorizationCostModel::collectLoopUniforms()? ;-)

Right now, I just run instcombine after loop vectorization to clean up those unnecessary stores (and test cases make sure there's only one store left). Looks like there are other places in LV which relies on InstCombine as the clean up pass, so it may not be that bad after all? Thoughts?

Aug 21 2018, 1:35 PM

Aug 20 2018

hsaito added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

For me, the only major issue left is the detached IR instruction. @dcaballe, please try adding the reviewers/subscribers of D50480 to D50823, in the hopes of getting a quicker resolution there, so as not to block D50480 because of that. I will not oppose to D50480 for introducing new ULE opcode of VPInstruction (design/implementation choice within VPlan concept), but I will strongly oppose for the use of detahced IR instruction (goes against VPlan concept).

Aug 20 2018, 4:34 PM
hsaito added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 20 2018, 3:53 PM

Aug 16 2018

hsaito added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 16 2018, 12:37 PM
hsaito added a comment to D50823: [VPlan] Introduce VPCmpInst sub-class in the instruction-level representation.

That said, this is currently the only subclass of its kind and it would be weird if it remained that way. In other words, I would expect that VPInstruction would gain more such subclasses (e.g., VPBinaryOperator, VPSelectInst, etc.) as the vectorizer starts to create and manipulate these instructions. Does this make sense to y'all?

Aug 16 2018, 11:52 AM

Aug 15 2018

hsaito added inline comments to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
Aug 15 2018, 5:34 PM
hsaito added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 15 2018, 2:26 PM
hsaito added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

We obviously don't get this guarantee and thus there's a legality question here the vectorizer would have to solve. There are two obvious approaches: speculation safety and predication. Unless I'm misreading this patch, it has the same problem and uses predication right?

Aug 15 2018, 1:52 PM
hsaito added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 15 2018, 11:20 AM

Aug 14 2018

hsaito added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 14 2018, 5:00 PM
hsaito added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

I have a general question about direction, not specific to this patch.

It seems like we're adding a specific form of predication to the vectorizer in this patch and I know we already have support for various predicated load and store idioms. What are our plans in terms of supporting more general predication? For instance, I don't believe we handle loops like the following at the moment:
for (int i = 0; i < N; i++) {

if (unlikely(i > M)) 
   break;
sum += a[i];

}

Can the infrastructure in this patch be generalized to handle such cases? And if so, are their any specific plans to do so?

Aug 14 2018, 4:25 PM
hsaito added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 14 2018, 3:24 PM

Aug 10 2018

hsaito added a comment to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..

I'd like to see us explicitly saying that any subsequent explicit transformation metadata will be ignored for the given loop ---- if that's what we'll agree on, or be explicit about something else we'll agree on in the terms that can be clearly explainable to the programmers. "Compiler will skip all remaining transformations after the first failed transform" is pretty straightforward to the programmers. If anyone is proposing other behaviors, I'd like to also see how to explain those behaviors to the programmers.

I added a paragraph to TransformMetadata.rst. (I was assuming it was obvious from the definition: A transformation in a followup-attribute only becomes assigned to a loop by the loop transformation pass. Before that, it is not associated with any loop)

Aug 10 2018, 4:13 PM

Aug 8 2018

hsaito added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 8 2018, 5:15 PM

Aug 7 2018

hsaito added a comment to D49168: [LV] Add a new reduction pattern match.

I modified my patch to use RK_FloatAdd instead of adding a new recurrence descriptor.
And, input IRs of this target loop is already converted into a select instruction, so I don't extend If-convert functionality.

Aug 7 2018, 10:31 AM

Aug 6 2018

hsaito added a comment to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?

Yes. A warning will be emitted by the -transform-warning pass (Please see Passes.rst).

Aug 6 2018, 12:34 PM
hsaito added a comment to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.

Hideki, I think that LLVM does the right thing here: We provide a separation between the hint and the mandate (i.e., using assume_safety or not).

Aug 6 2018, 10:53 AM

Aug 3 2018

hsaito added a comment to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..

I'm not much of an expert on the vectoriser changes here.

Aug 3 2018, 2:32 PM

Jul 25 2018

hsaito accepted D49746: [WIP][LV][DebugInfo] Set DL to the middle block Icmp instruction.

Let's start from this and see if practical usability issues pop up. This is where main vector loop just ended and right before the remainder loop starts. As long as the location info points to the top or the bottom of the loop, that won't surprise the user. Beyond that is just how much more precise we'd like to be. LGTM.

Jul 25 2018, 12:24 PM

Jul 24 2018

hsaito added a comment to D49168: [LV] Add a new reduction pattern match.

I'm not familiar with the Recurrence Descriptor code, but I suppose the following is considered as RK_FloatAdd. If that's the case, we should be beefing up RK_FloatAdd rather than adding a new Kind. We can't keep adding new kind every time we encounter a different pattern of reduction sum/product. Downside is possibly exposing a downstream bug, but that should only help generalizing reduction handling code. From reduction analysis perspective, select (IF-converted) and phi (IF) should be the same thing. So, trying to handle this within FloatAdd/FloatMult should also help generalize recurrence analysis code. That's how I look at the issue. Hope this helps.

Jul 24 2018, 6:12 PM
hsaito added a comment to D49746: [WIP][LV][DebugInfo] Set DL to the middle block Icmp instruction.

This is essentially a zero trip check for the remainder loop and from that perspective, the most correct DL we should use would be the one related to trip count computation. InnerLoopVectorizer::getOrCreateTripCount() is where LV computes the trip count of the incoming scalar loop. It uses getBackedgeTakeCount(). As such, strictly speaking, taking the DL from the predicate (i.e., loop bottom test) fed to the loop backedge would make most sense. That would be OrigLoop->getLoopLatch()->getTerminator() to get to the backedge. In a relaxed thinking, however, this is the code executed when we know vector code executes. So, taking DL from VectorPH code isn't too bad. OrigLoop->getStartLoc() could be another viable enough alternative.

Jul 24 2018, 4:33 PM
hsaito committed rL337861: [LV] Fix for PR38110, LV encountered llvm_unreachable() .
[LV] Fix for PR38110, LV encountered llvm_unreachable()
Jul 24 2018, 3:31 PM
hsaito closed D49461: [LV] Fix for PR38110, LV encountered llvm_unreachable() .
Jul 24 2018, 3:30 PM
hsaito accepted D49461: [LV] Fix for PR38110, LV encountered llvm_unreachable() .

Just came back from vacation. Thanks. I take @sbaranga just forgot to set accept the revision and thus taking a liberty of doing so myself.

Jul 24 2018, 3:29 PM

Jul 18 2018

hsaito added a comment to D49489: [VPlan] VPlan version of InterleavedAccessInfo..

I like the templatization approach taken here. Thank you!. Changes look fine to me, but will wait for a few others to take a look.

Jul 18 2018, 1:47 PM
hsaito added a comment to D49491: [RFC][VPlan, SLP] Add simple SLP analysis on top of VPlan..

That being said, some of the VPlan infrastructure is still emerging: initial VPInstruction based codegeneration and cost modelling is currently worked on for example. However I think considering SLP style vectorization as a VPlan2VPlan transformation (and others) early on would help to make sure the design of the VPlan infrastructure is general enough to cover a wide range of use cases.

I can see the appeal in a proof-of-concept, and I don't oppose having it. But I'm not strongly in favour either.

If more people think that option #2 above is the way to go, then this could turn out fine.

But if more people prefer the option #1, then we would want to see what gets hoisted and how will the local implementations look like before we add VPlanSLP.

Jul 18 2018, 12:33 PM
hsaito added a comment to D49461: [LV] Fix for PR38110, LV encountered llvm_unreachable() .

Seems reasonable to me.

Ideally we should be handling the phi node case (which is why the unreachable was there), but crashing doesn't seem like the right thing to do.

Jul 18 2018, 9:38 AM

Jul 17 2018

hsaito added a comment to D49461: [LV] Fix for PR38110, LV encountered llvm_unreachable() .

If I delete IBM target, the LIT test won't crash before the fix. If anyone can suggest an alternative so that I can move the lit test one level up, that would be great.

Jul 17 2018, 5:40 PM
hsaito created D49461: [LV] Fix for PR38110, LV encountered llvm_unreachable() .
Jul 17 2018, 5:37 PM

Jul 2 2018

hsaito added inline comments to D47188: Intel SVML calling conventions.
Jul 2 2018, 5:35 PM

Jun 14 2018

hsaito added a comment to D48193: [LoopVectorizer] Use an interleave count of 1 when using a vector library call.

I see this as a register allocator problem. It's not like we are running out of registers so that we cannot use ymm0 as a "scratch register" for SVML call. We should show the ASM code to CG experts and get the problem fixed there.

Assuming that register allocator can fix simple enough issues..... I don't think it's correct to model this as a VECLIB call problem. It can happen to any function call that use too many registers that can't be shared among interleaved calls. Instead of looking at whether it's a VECLIB call or not, we should be checking how many registers are used for call/return, and how many of them cannot be shared among interleaved calls. We can then formulate this into a general register pressure issue.

The call uses one register but causes all other live values at the call location to be spilled (the other live values are caused by the interleaving). To model this we would need to know the calling-convention of the target (which registers are preserved), and also which registers the live values will end up in. This isn't known until after register allocation.

Jun 14 2018, 5:24 PM
hsaito added a comment to D48193: [LoopVectorizer] Use an interleave count of 1 when using a vector library call.

I see this as a register allocator problem. It's not like we are running out of registers so that we cannot use ymm0 as a "scratch register" for SVML call. We should show the ASM code to CG experts and get the problem fixed there.

Jun 14 2018, 3:58 PM

Jun 11 2018

hsaito accepted D48048: [LV] Prevent LV to run cost model twice for VF=2.

LGTM.

Jun 11 2018, 2:40 PM

Jun 7 2018

hsaito accepted D47788: [LV] Fix PR36983. For a given recurrence, fix all phis in exit block.

Given that there can be more than one users, the fix makes sense to me. LGTM.

Jun 7 2018, 5:39 PM

Jun 5 2018

hsaito accepted D26743: Expandload and Compressing store - documentation update.

I wrote this documentation after implementation. I don't work on X86 about a year, but I doubt that somebody touched this code.

Jun 5 2018, 8:18 AM

Jun 4 2018

hsaito added a comment to D26743: Expandload and Compressing store - documentation update.

Ping. The intrinsics have been implemented with partial support for a year and a half now. I'd like to use them to replace the X86 specific expand load intrinsics. Can we commit this documentation?

Jun 4 2018, 10:18 AM

May 29 2018

hsaito added inline comments to D47477: [VPlan] Move recipe based VPlan generation to separate function..
May 29 2018, 3:28 PM
hsaito updated subscribers of D41953: [LoopUnroll] Unroll and Jam.

David, outer loop vectorization we've been working on and unroll and jam have a lot of similarities and thus there should be code sharing opportunities around legality and cost modeling. Let's collaborate to improve both.

Yeah, sounds like a good idea. Vectorisation is not something I know a huge amount about, it not being useful for the particular cores I'm looking at at the moment. But the work on VPlan sounds interesting and I do wish we had something similar for loop transforms in llvm. I was hoping to get loop versioning working, maybe through LoopAccessAnalysis, but as far as I understand it doesn't handle outer loops yet (and seemed very vectoriser shaped when I looked). From what I remember hearing, this isn't part of the outer loop vectorisation work (instead relying on user directives?). I imagine it's not an easy task to extend this out.

The current legality checks here are based on dependence analysis, and there are a couple of patches needed to make it work correctly. The cost modelling is very adhoc. :) All these things I hope we can improve as we go along.

May 29 2018, 1:38 PM
hsaito added a comment to D47477: [VPlan] Move recipe based VPlan generation to separate function..
May 29 2018, 1:13 PM

May 28 2018

hsaito added a comment to D41953: [LoopUnroll] Unroll and Jam.

David, outer loop vectorization we've been working on and unroll and jam have a lot of similarities and thus there should be code sharing opportunities around legality and cost modeling. Let's collaborate to improve both.

May 28 2018, 11:09 PM

May 22 2018

hsaito updated the diff for D47216: [LV] Fix to pr37515, FP primary induction crashes LV.

LIT test added. ORE messaging doesn't seem to be working here (I see just "loop not vectorized" without the rest of the string), but that should be fixed separately.

May 22 2018, 1:34 PM
hsaito added a comment to D47216: [LV] Fix to pr37515, FP primary induction crashes LV.

Will add a LIT test.

May 22 2018, 12:42 PM
hsaito added inline comments to D47216: [LV] Fix to pr37515, FP primary induction crashes LV.
May 22 2018, 12:36 PM
hsaito created D47216: [LV] Fix to pr37515, FP primary induction crashes LV.
May 22 2018, 12:32 PM

May 21 2018

hsaito added a comment to D46711: [private] Add min_vector_width function attribute. Use it to annotate all of the x86 intrinsic header files. Emit a attribute in IR.

@atdt and @echristo, the hypothetical -mprefer-low-current-avx[=<uarch>] would be certainly more precise, but that comes with a much higher design/implementation cost in the areas of TTI/vectorizer cost model, vector type legalizer, and CodeGen, at least.

May 21 2018, 2:20 PM

May 15 2018

hsaito added a comment to D46827: [VPlan] Add VPInstruction to VPRecipe transformation..

The only problem is that the recipe generation depends on the cost model and we probably have to duplicate some code from the inner loop vectorizer in the VPlan native path, until we gradually replace them by VPlan implementations. I think that would give us a stable starting point (the VPlan native path should pass all existing inner loop vectorizer tests) and would help us to work towards VPlan native inner loop vectorization at a steady pace. Does this make sense?

What is the main dependence with cost model? Maybe @hsaito could have it into account since he is looking at how to refactor the current cost model.
In any case, I don't think we have to replicate too much code if we start with something simple, even if we generate inefficient code at the beginning. For example, I don't think we initially need the DeadInstructions code, anything related to masking or uniformity (since only uniform branches are allowed in the native path for now) or interleave accesses. That should simplify a lot the VPInstruction2VPRecipe step.

May 15 2018, 4:18 PM

May 11 2018

hsaito added a comment to D44338: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops..

Vectorized loop by nature executes multiple iterations of the sequential loop at the same time. So, the same variable X (say, i32 type) is widened to widenedX (4 x i32 type) to represent 4 different values of X, say, for iterations i, i+1, i+2, and i+3 executing together. In terms of debugging vectorized code, when the programmer points to X during vector execution, debugger needs to show 4 different values of X in this scenario. It's different from placing one value in a larger sized register.

I see. This is not directly representable in debug info. You could either introduce a new artificial $X_vectorized variable that shows the entire vector, or you could represent only the first element in the vector and hide the other unrolled iterations. DWARF cannot currently represent more than one value per variable per pc address.

May 11 2018, 2:09 PM
hsaito added a comment to D44338: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops..

Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?

I keep pushing the implementers (Diego, Satish, etc.) very hard to maintain good correspondence between input IR and output IR. Assuming that dbg.value can handle widened values from scalar values, there shouldn't be anything special.

Assuming widening means putting a smaller value into a larger register at offset 0, then it is safe to just point the dbg.value to the new larger register. If the offset is nonzero, you'll need to generate a DW_OP_shr DIExpression to shift the value into place in the debugger.

May 11 2018, 1:08 PM

May 10 2018

hsaito added a comment to D44338: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops..

Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?

May 10 2018, 10:50 AM

May 8 2018

hsaito committed rL331799: [LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body is….
[LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body is…
May 8 2018, 12:01 PM
hsaito closed D46302: [LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body is single basic block.
May 8 2018, 12:01 PM

May 6 2018

hsaito updated the diff for D46302: [LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body is single basic block.

SafeToHoist logic made straightforward. FileCheck added to lit test. The test passes after the fix.

May 6 2018, 10:38 PM

May 4 2018

hsaito added a comment to D46302: [LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body is single basic block.

Renato/Florian, thanks for the review.

May 4 2018, 11:04 AM

May 1 2018

hsaito created D46302: [LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body is single basic block.
May 1 2018, 12:10 AM

Apr 30 2018

hsaito accepted D46191: [LV] Preserve inbounds on created GEPs.
Apr 30 2018, 5:10 PM
hsaito added a comment to D46191: [LV] Preserve inbounds on created GEPs.

Let me try again after adding myself as a reviewer. LGTM.

Apr 30 2018, 5:09 PM
hsaito added a reviewer for D46191: [LV] Preserve inbounds on created GEPs: hsaito.
Apr 30 2018, 5:08 PM
hsaito added a comment to D46191: [LV] Preserve inbounds on created GEPs.

LGTM. Thanks.

Apr 30 2018, 5:07 PM
hsaito added a comment to D45552: [NFC][LV][LoopUtil] Move LoopVectorizationLegality to its own file.

There seems like a disconnect.

There was. :)

Pushing them to include/llvm/Transform/Vectorize is already a big improvement and I'm quite happy about that.

I'm happy with that, too.

Sorry for the confusion.

Just to be sure, again, LGTM. :)

Apr 30 2018, 1:29 PM
hsaito added a comment to D46254: [LV] Use BB::instructionsWithoutDebug to skip DbgInfo (NFC)..

Thanks, Florian. This is loosely related to D46199. See comments there. We need to consciously try using (or even creating) generically reusable code like this, instead of each file having it's own "skip this, skip that".

Apr 30 2018, 11:47 AM · debug-info
hsaito added inline comments to D46199: [SLPVectorizer] Debug intrinsics shouldn't affect spill cost.
Apr 30 2018, 10:50 AM
hsaito added inline comments to D46191: [LV] Preserve inbounds on created GEPs.
Apr 30 2018, 10:17 AM
hsaito added a comment to D45552: [NFC][LV][LoopUtil] Move LoopVectorizationLegality to its own file.

For example, in loop fusion scenarios, doing this is not a rocket science.

Right, this is still loop transformation, so staying inside Transform still makes sense.

I'm not disagreeing with you on the topics:

  1. It does look like an analysis pass
  2. Other passes could profit

    However, so far, the other passes are all in the transform directory.

    I agree we need to split "what" can be done from "how we'll do it" and I agree we need to make LoopUtils more generic. I was just trying to be clear on why we shouldn't move to include/Analysis yet.
Apr 30 2018, 10:13 AM
hsaito added inline comments to D46191: [LV] Preserve inbounds on created GEPs.
Apr 30 2018, 10:10 AM
hsaito added a comment to D45552: [NFC][LV][LoopUtil] Move LoopVectorizationLegality to its own file.

In this case, the most immediate client for us is actually our internal fast-track VPLan vectorizer (it's outside of LV so that we don't mess up LV accidentally),

Still inside Transform, so should be fine.

Apr 30 2018, 9:37 AM