Page MenuHomePhabricator

hsaito (Hideki Saito)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 25 2016, 9:02 AM (152 w, 1 d)

Recent Activity

Today

hsaito added a comment to D62997: [LV] Share the LV illegality reporting with LoopVectorize. NFC..

Looking at the expected output and the explanations on -Rpass* flags, it could be that those tests should be using -pass-remarks-analysis=loop-vectorize, instead of -pass-remarks-missed=loop-vectorize. Would you try?

Tue, Jun 25, 10:38 AM · Restricted Project

Fri, Jun 21

hsaito added a comment to D62997: [LV] Share the LV illegality reporting with LoopVectorize. NFC..

The diff has been updated: private methods were removed, the function 'reportVectorizationFailure' was moved to LoopVectorize.cpp and declared in LoopVectorize.h (namespace llvm). I've removed the passName parameter and pass LV_NAME as pass name to ORE but this change breaks 3 regression tests:

I'm assuming that the pass name isn't just LV_NAME, but whichever pass is using that function, which now that it's higher level, can be anything.

@rengolin @hsaito I need you suggestions how to deal with the Hints.vectorizeAnalysisPassName() method, it is very long to always call it whenever the reportVectorizationFailure is invoked.

I don't have a better idea on the top of my head, but this sounds like a change that can be done later as a quick refactory once someone comes up with a clever replacement.

I'd be ok having that for now... @hsaito may have a better idea, though.

--renato

Fri, Jun 21, 12:26 PM · Restricted Project

Fri, Jun 14

hsaito added a comment to D62997: [LV] Share the LV illegality reporting with LoopVectorize. NFC..

Sorry, I missed the original review request in the e-mail pile up. Yes, this is the direction I was suggesting. Thank you.

Fri, Jun 14, 12:16 PM · Restricted Project

Thu, May 30

Vladimir Lazarev <vladimir.lazarev@intel.com> committed rG45b17d6c63f7: [SYCL] add optimized vec class constructors (authored by hsaito).
[SYCL] add optimized vec class constructors
Thu, May 30, 8:05 AM
vladimirlaz <vladimir.lazarev@intel.com> committed rG60befd246ea6: [SYCL] Change Indexer to use C++11 feature, from C++14. (authored by hsaito).
[SYCL] Change Indexer to use C++11 feature, from C++14.
Thu, May 30, 8:02 AM
vladimirlaz <vladimir.lazarev@intel.com> committed rG8b1894b4f3a6: [SYCL] Improve SYCL vector implementation (authored by hsaito).
[SYCL] Improve SYCL vector implementation
Thu, May 30, 8:01 AM

Tue, May 28

hsaito added a comment to D62478: [LV] Wrap LV illegality reporting in a function. NFC..

@rengolin It is because the function uses OptimizationRemarkEmitter *ORE, a member of the LoopVectorizationLegality class.

I see. Given the function has a lot of arguments already and it really isn't used elsewhere, I'd rather just add ORE to the args list.

Unless @hsaito or @fhahn think this could be used elsewhere in the vectorizer, then it shouldn't be in that class anyway.

Tue, May 28, 10:37 AM · Restricted Project

May 24 2019

hsaito added a comment to D62311: [LV] Inform about exactly reason of loop illegality.

@hsaito I agree, to have a function to report about an error is a good idea.

I'm new in LLVM community, so what does NFC mean? Should I close this review and open a new one or you mean just to upload a new diff for comments?

May 24 2019, 11:08 AM · Restricted Project
hsaito added a comment to D62311: [LV] Inform about exactly reason of loop illegality.

While we are looking at this, I'd like to discuss how to make these things easier. I think there a merit in using a utility function that takes three strings, something along the lines of
the following pseudo code:

May 24 2019, 10:38 AM · Restricted Project

May 9 2019

hsaito added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

What's the status of this? It seems like discussion has died down a bit. I think Graham's idea to change from <scalable 2 x float> to <vscale x 2 x float> will make the IR more readable/understandable but it's not a show-stopper for me.

Are there any other outstanding issues to address before this lands?

May 9 2019, 10:45 AM · Restricted Project

Apr 30 2019

hsaito added inline comments to D61030: [PassManagerBuilder] Add option for interleaved loops, for loop vectorize..
Apr 30 2019, 9:27 AM · Restricted Project

Apr 29 2019

hsaito added a comment to D61030: [PassManagerBuilder] Add option for interleaved loops, for loop vectorize..

I verified that the two unroll flags propagate to the option set in the PassManagerBuilder. This change + the clang change in D61142 should not make any visible change for clang users.

Apr 29 2019, 6:07 PM · Restricted Project
hsaito added reviewers for D61030: [PassManagerBuilder] Add option for interleaved loops, for loop vectorize.: hfinkel, rengolin, mkuper, fhahn, hsaito.
Apr 29 2019, 6:06 PM · Restricted Project

Apr 25 2019

hsaito added a comment to D61030: [PassManagerBuilder] Add option for interleaved loops, for loop vectorize..

You mean, clang setting EnableLoopInterleaving.

Actually I meant DisableUnrollLoops in the PassManagerBuilder. It's currently always set to false, not based on a flag, and I found a single instance where it's value is changed in clang (see clang patch).

I'm happy to make whatever change needed to clang first, but I don't see what that change is. If you could point me to what I've missed, that would be great!

Apr 25 2019, 5:41 PM · Restricted Project
hsaito added a comment to D61030: [PassManagerBuilder] Add option for interleaved loops, for loop vectorize..

Thanks a lot for the suggestion! Sent: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131968.html

My intention was to not make any change visible to clang.
If clang currently sets the DisableUnrollLoops, and llvm will not use that for LoopVectorization, then have clang set LoopsInterleaved to the same value as the one used for unroll loops.

Apr 25 2019, 12:42 PM · Restricted Project
hsaito added a comment to D61030: [PassManagerBuilder] Add option for interleaved loops, for loop vectorize..

Update test.

Apr 25 2019, 11:30 AM · Restricted Project

Apr 24 2019

hsaito added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

Exactly. Non-constant values can become constant. Constant values can be guarded by vscale-dependent runtime guards (both hand-written and compiler generated). My preference is to leave this not restricted to vscale == 1 values, but rather allow all values that can be supported at runtime, and have it be UB if, at runtime, the relevant index is not available.

+1

Apr 24 2019, 10:23 AM · Restricted Project

Apr 23 2019

hsaito added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

I am not sure how it could be anything but n. If you don't know how long the vector is, you can't correctly generate an index beyond n.

But you know at runtime... there has to be a way to determine, at runtime, vscale. And the index doesn't need to be a constant. I'm not sure that we need to restrict non-constant n to only values valid for vscale == 1.

Good point. 100% agree. I was only considering the constant case.

Ok, so do we have agreement that constant literal indices should be limited to 0..n-1 for now, but non-constant indices can potentially exceed n so that expressions featuring vscale can be used?

Apr 23 2019, 10:39 AM · Restricted Project

Apr 17 2019

hsaito added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

Do we really need both vp.fadd() and vp.constrained.fadd()? Can't we just use the latter with rmInvalid/ebInvalid? That should prevent vp.constrained.fadd from losing optimizations w/o good reasons.

According to the LLVM langref, "fpexcept.ignore" seems to be the right option for exceptions whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid seems hacky.

Apr 17 2019, 9:54 AM · Restricted Project

Apr 16 2019

hsaito added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
    1. Updates
  • added constrained fp intrinsics (IR level only).
  • initial support for mapping llvm.experimental.constrained.* intrinsics to llvm.vp.constrained.*.
Apr 16 2019, 2:27 PM · Restricted Project

Apr 15 2019

hsaito added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

I think this is a coherent set of changes. Given the current top of trunk, this expands support from just assembly/disassembly of machine instructions to include LLVM IR, right? Such being the case, I think this patch should go in. I have some ideas on how to structure passes so SV IR supporting optimizations can be added incrementally. If anyone thinks such a proposal would help, let me know.

I think there is one more thing we still have to do. Does scalable vector type apply to all Instructions where non-scalable vector is allowed? If the answer is no, we need to identify which ones are not allowed to take scalable vector type operand/result. Some of the Instructions are not plain element-wise operation. Do we have agreed upon semantics for all those that are allowed?

The main difference is for 'shufflevector'. For a splat it's simple, since you just use a zeroinitializer mask. For anything else, though, you currently need a constant vector with immediate values; this obviously won't work if you don't know how many elements there are.

Apr 15 2019, 1:04 PM · Restricted Project
hsaito added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

So if we wanted to keep them as intrinsics for now, I think we have one of three options:

  1. Leave discussion on more complicated shuffles until later, and only use scalable autovectorization on loops which don't need anything more than splats.

Given the current state, this is the easiest path.

I agree, although this is an important part of the model, so we should start having this discussion in parallel (sooner rather than later). I had been under the impression that a set of intrinsics were being proposed for this, but extending shufflevector is also an option worth considering. If these are first-class types, then having first-class instruction support is probably the right path. This deserves it's own RFC.

  1. Introduce additional intrinsics for the other shuffle variants as needed
  2. Allow shufflevector to accept arbitrary masks so that intrinsics can be used (though possibly only if the output vector is scalable).

This warrants a larger discussion, which would hinder the current progress.

I agree. We should have a separate RFC on this.

Apr 15 2019, 10:11 AM · Restricted Project

Apr 12 2019

hsaito added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

I think this is a coherent set of changes. Given the current top of trunk, this expands support from just assembly/disassembly of machine instructions to include LLVM IR, right? Such being the case, I think this patch should go in. I have some ideas on how to structure passes so SV IR supporting optimizations can be added incrementally. If anyone thinks such a proposal would help, let me know.

Apr 12 2019, 11:27 AM · Restricted Project

Apr 5 2019

hsaito added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Apr 5 2019, 4:02 PM · Restricted Project

Mar 29 2019

hsaito added inline comments to D59723: [NewPassManager] Adding pass tuning options: loop vectorize..
Mar 29 2019, 12:34 PM · Restricted Project

Mar 28 2019

hsaito accepted D59952: [VPLAN] Minor improvement to testing and debug messages..

LGTM

Mar 28 2019, 2:57 PM · Restricted Project
hsaito added inline comments to D59952: [VPLAN] Minor improvement to testing and debug messages..
Mar 28 2019, 12:25 PM · Restricted Project
hsaito added inline comments to D59952: [VPLAN] Minor improvement to testing and debug messages..
Mar 28 2019, 12:19 PM · Restricted Project
hsaito added a comment to D57598: [VPLAN] Determine Vector Width programmatically..

Thanks Francesco!

Mar 28 2019, 11:03 AM · Restricted Project

Mar 27 2019

hsaito added a comment to D57598: [VPLAN] Determine Vector Width programmatically..

Thanks Francesco! I'll commit the change tomorrow, unless @hsaito does it today :)

Mar 27 2019, 2:36 PM · Restricted Project

Mar 25 2019

hsaito added a comment to D57598: [VPLAN] Determine Vector Width programmatically..

@hsaito, I don't have commit access, could you commit this change for me?

Mar 25 2019, 7:58 PM · Restricted Project

Mar 22 2019

hsaito added a comment to D57598: [VPLAN] Determine Vector Width programmatically..

@npanchen , @fhahn , gentle ping :)

Francesco

Mar 22 2019, 3:37 PM · Restricted Project
hsaito added inline comments to D59149: [LV] move useEmulatedMaskMemRefHack() functionality to TTI..
Mar 22 2019, 2:26 PM · Restricted Project

Mar 19 2019

hsaito removed a reviewer for D57978: [CodeGen] Generate follow-up metadata for loops with more than one transformation.: hsaito.
Mar 19 2019, 2:54 PM · Restricted Project, Restricted Project
hsaito added a comment to D57978: [CodeGen] Generate follow-up metadata for loops with more than one transformation..

ping

Mar 19 2019, 2:51 PM · Restricted Project, Restricted Project
hsaito added inline comments to D59149: [LV] move useEmulatedMaskMemRefHack() functionality to TTI..
Mar 19 2019, 10:06 AM · Restricted Project

Mar 14 2019

hsaito accepted D57598: [VPLAN] Determine Vector Width programmatically..

LGTM. Please wait for a few days to give others a chance to go over your updated patch.

Mar 14 2019, 12:21 PM · Restricted Project
hsaito added inline comments to D57598: [VPLAN] Determine Vector Width programmatically..
Mar 14 2019, 10:26 AM · Restricted Project
hsaito added inline comments to D57598: [VPLAN] Determine Vector Width programmatically..
Mar 14 2019, 9:54 AM · Restricted Project

Mar 13 2019

hsaito added a comment to D57598: [VPLAN] Determine Vector Width programmatically..

I mostly changes to code to use the infrastructure that LLVM already provides to determine the vectorization factor.

Mar 13 2019, 1:48 PM · Restricted Project
hsaito added a comment to D59149: [LV] move useEmulatedMaskMemRefHack() functionality to TTI..

Any improvement is an improvement so I am happy with that but it is still mentioned that this solution is a hack and I guess the

Cost model for emulated masked load/store is completely broken.

comment is still valid. What would it take to address this properly?

Mar 13 2019, 12:43 PM · Restricted Project

Mar 8 2019

hsaito added inline comments to D59149: [LV] move useEmulatedMaskMemRefHack() functionality to TTI..
Mar 8 2019, 12:16 PM · Restricted Project
hsaito created D59149: [LV] move useEmulatedMaskMemRefHack() functionality to TTI..
Mar 8 2019, 12:14 PM · Restricted Project

Feb 6 2019

hsaito added a comment to D57837: [LV] Prevent interleaving if computeMaxVF returned None..

This LGTM with the minor documentation comments and retention of UserIC.
Seems like we've reached a consensus, correct me if not.
Thanks!

Feb 6 2019, 4:18 PM · Restricted Project
hsaito added a comment to D57837: [LV] Prevent interleaving if computeMaxVF returned None..

Agreed, having it done implicitly in a function called computeMaxVF does not seem ideal. From the current behavior of computeMaxVF, I think what we actually want to decide whether to disable interleaving up front is just if we optimize for size or not. What do you think?

Feb 6 2019, 3:14 PM · Restricted Project
hsaito added a comment to D57837: [LV] Prevent interleaving if computeMaxVF returned None..

I have a little mental barrier in accepting this change as is. I think this feeling of mine is mainly due to the name and the "stated" functionality of computeMaxVF and "indirect inference" towards using it for suppressing interleaving. Maybe I'm just being too picky here. If so, my apologies ahead of time.

Feb 6 2019, 2:02 PM · Restricted Project

Feb 5 2019

hsaito added inline comments to D57382: [LV] Move interleave count computation to LVP::plan()..
Feb 5 2019, 4:00 PM · Restricted Project
hsaito added a comment to D57382: [LV] Move interleave count computation to LVP::plan()..

As it is at the moment, we need to know the selected vectorization factor to compute the interleave count. And we need to know the interleave count to know if we need to generate VPlans. Maybe with this refactor it would also make sense to only generate a VPlan for the selected vectorization factor for now? There is no need to build VPlans for multiple vectorization factors in the legacy planning and we have the VPlan native path which builds the VPlans up front for planning.

Feb 5 2019, 3:48 PM · Restricted Project

Feb 1 2019

hsaito added a comment to D57598: [VPLAN] Determine Vector Width programmatically..

While we are at this, let's talk about downstream dependency, if any, for allowing more than one candidate VF along the native path. At least we can write down a list of TODOs so that we'll be aware of the things we need to improve at a time in the future.

Feb 1 2019, 12:39 PM · Restricted Project
hsaito added a reviewer for D57598: [VPLAN] Determine Vector Width programmatically.: npanchen.
Feb 1 2019, 10:00 AM · Restricted Project
hsaito added a comment to D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency.

Rather than over aggressive or too conservative, 1) seems to match the current behavior which forbids store groups with gaps; extending in the other direction will also break the vector WAW dependence, right?

Feb 1 2019, 9:48 AM · Restricted Project
hsaito added a comment to D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency.

(I am saying that after recently investigating a few cases of hour-long compile times on user code that were caused by unnecessary scanning)

Feb 1 2019, 9:45 AM · Restricted Project

Jan 31 2019

hsaito added a comment to D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency.

I plan on having a look later this week. I am a little worried that the checks in-line here are already quite complex and I would like to have a think if that could be improved in some way.

I agree; The algorithm makes sure that we visit everything between B and A, including C, before we visit A; so we have a chance to identify the (potentially) interfering store C before we reach A; This is what allows the algorithm to only compare the pairs (A,B) without having each time to also scan everything in between.

So I think the bug is that when we visited C, and found that it could be inserted into B's group dependence-wise, but wasn't inserted due to other reasons, we should have either:

  1. Invalidated the group (which is over aggressive but better than wrong code)
  2. Recorded in B's Group the index where C could be inserted, to "burn" that index from allowing some other instruction A to become a group member at that index; so when we reach A we see its spot is taken. (I think this will have the same effect as the proposed patch but without the extra scan.)
  3. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).
Jan 31 2019, 11:42 PM · Restricted Project

Jan 23 2019

hsaito committed rL351990.
Jan 23 2019, 2:43 PM
hsaito closed D53349: [VPlan] Changes to implement VPlan based predication for VPlan-native path..
Jan 23 2019, 2:43 PM

Dec 19 2018

hsaito added a comment to D53865: [LoopVectorizer] Improve computation of scalarization overhead..

Unfortunately, this could not be done with computeInstDiscount() as a simple extension of (1), since memory accesses are dealt with before anything else.

Dec 19 2018, 3:23 PM

Dec 12 2018

hsaito added a comment to D53865: [LoopVectorizer] Improve computation of scalarization overhead..

I think you are either 1) arm-twisting the vectorizer to emit vector code which you know will be scalar or 2) arm-twisting vectorizer's cost model to believe what you are emitting as "vector" to be really scalar. I certainly do not see the reason why "you have to" do that, because letting vectorizer emit scalar IR instructions in those cases should be "equivalent". So, why "do you WANT to" do that? IR going out of vectorizer may be more compact, but what that'll accomplish is cheating all downstream optimizers and their cost models.

I am just trying to keep it simple by not changing how LV generates code, but merely improve the cost computations. Changing the output of a vectorized loop seems like a much bigger project, which I did not attempt.

Dec 12 2018, 1:27 PM

Dec 10 2018

hsaito added a comment to D53865: [LoopVectorizer] Improve computation of scalarization overhead..

Making better decisions what to vectorize and what to keep scalar is clearly useful enough to include in the loop vectorizer. However, this should best be done in a target independent way; e.g., how computePredInstDiscount() and sinkScalarOperands() work to expand the scope of scalarized instructions according to the cumulative cost discount of potentially scalarized instruction chains. Unless there's a good reason for it to be target specific(?)

The only target-specific part I am thinking about is which instructions will later be expanded during *isel*.

My question back to you is why Scalars is not good enough for your purpose. You get different "scalarlization" answer in collectLoopScalars() and collectTargetScalarized()?

My understanding is that currently the LoopVectorizer notion of a scalarized instruction refers to an *LLVM I/R* scalarized instruction. In other words, which instructions it will itself produce scalarized. These are the instructions contained in Scalars[VF].

Dec 10 2018, 10:57 AM

Nov 29 2018

hsaito added inline comments to D54412: [RFC] Re-implementing -fveclib with OpenMP.
Nov 29 2018, 1:16 PM

Nov 28 2018

hsaito added inline comments to D54412: [RFC] Re-implementing -fveclib with OpenMP.
Nov 28 2018, 2:01 PM
hsaito added a comment to D53865: [LoopVectorizer] Improve computation of scalarization overhead..

As for the computation of the cost for scalarized instruction:

Nov 28 2018, 1:38 PM
hsaito added a comment to D53865: [LoopVectorizer] Improve computation of scalarization overhead..

Sorry, I must have missed this review.

VPlan based cost modeling (plus VPlan based code motion) should naturally capture this kind of situation ----- but only to the extent that producer/consumer can reside in the same BB. It's taking a lot longer than I wanted to stabilize (compute exactly the same value as existing cost model in LV).

Thanks for taking a look! IIUC, my patch is not useful since VPlan will soon improve this area without it.

Nov 28 2018, 1:10 PM

Nov 27 2018

hsaito added a comment to D53865: [LoopVectorizer] Improve computation of scalarization overhead..

Sorry, I must have missed this review.

Nov 27 2018, 10:32 AM

Nov 26 2018

hsaito added inline comments to D54412: [RFC] Re-implementing -fveclib with OpenMP.
Nov 26 2018, 5:17 PM
hsaito added inline comments to D54412: [RFC] Re-implementing -fveclib with OpenMP.
Nov 26 2018, 4:50 PM
hsaito added inline comments to D54412: [RFC] Re-implementing -fveclib with OpenMP.
Nov 26 2018, 4:40 PM

Nov 19 2018

hsaito added a comment to D54412: [RFC] Re-implementing -fveclib with OpenMP.

If not, we can go with the #pragma clang declare simd directive, with clang being ifdef guarded to become omp when _OPENMP is detected.

Under the circumstances, this seems like a reasonable strategy to me.

Nov 19 2018, 6:00 PM
hsaito added a comment to D54412: [RFC] Re-implementing -fveclib with OpenMP.

An alternative might be to enable OpenMP simd parsing by default?

Yes, I thisnk this is doable, to have #pragma omp declare simd and #pragma omp declare variant (only the simd case) enabled by default.

Are you happy for me to modify the RFC with this?

I'm fine with that.

I'm not totally aligned here, but I'm fine proceeding with the RFC revision along this line. I'll wait for more voices to be raised (or not) on the issue. Please keep a note saying that there is a concern raised and those who care should voice their concerns.

So that I understand, what's your preference?

Nov 19 2018, 12:38 PM
hsaito added a comment to D54412: [RFC] Re-implementing -fveclib with OpenMP.

An alternative might be to enable OpenMP simd parsing by default?

Yes, I thisnk this is doable, to have #pragma omp declare simd and #pragma omp declare variant (only the simd case) enabled by default.

Are you happy for me to modify the RFC with this?

I'm fine with that.

Nov 19 2018, 10:27 AM

Nov 16 2018

hsaito added inline comments to D54412: [RFC] Re-implementing -fveclib with OpenMP.
Nov 16 2018, 2:47 PM
hsaito added inline comments to D54412: [RFC] Re-implementing -fveclib with OpenMP.
Nov 16 2018, 12:17 PM
hsaito added a comment to D54412: [RFC] Re-implementing -fveclib with OpenMP.

Also, in order for us to be able to adopt this mechanism, we need to work on LV's code gen. Currently, parts of it rely on having one consistent VF across the entire loop (i.e., loop-level VF or "serialize"). That's the reason why https://reviews.llvm.org/D53035 is generating illegal call first and then legalize. We need to add this work item as a dependency (to at least some veclib, e.g., SVML).

Nov 16 2018, 12:03 PM

Nov 12 2018

hsaito added a comment to D49491: [RFC][VPlan, SLP] Add simple SLP analysis on top of VPlan..

Thanks, everyone! Important first step towards the great converged Loop+SLP vectorizer.

Nov 12 2018, 1:02 PM

Nov 6 2018

hsaito added a comment to D52685: [LoopVectorizer] Adjust heuristics for a truncated load.

Thanks for feedback and explanations! I will abandon this, and instead aim at enabling TT.shouldMaximizeVectorBandwidth().

Nov 6 2018, 10:38 AM

Nov 2 2018

hsaito added a comment to D52685: [LoopVectorizer] Adjust heuristics for a truncated load.

I prefer not to add this kind of "heuristics" on the functions that should just return "the fact". Even if the loaded value is truncated immediately, the load itself needs to be vectorized.

Nov 2 2018, 4:39 PM

Oct 29 2018

hsaito added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

I do not think we need to unnecessarily tie this proposal to "dynamic" vector length. These just have "explicit vector length" that is not implied by the operand vector length. May I suggest "evl" instead of "dvl"? Stands for Explicit Vector Length".

Oct 29 2018, 9:52 AM · Restricted Project

Oct 25 2018

hsaito added a comment to D53612: [LV] Avoid vectorizing loops under opt for size that involve SCEV checks.

The patch will avoid the assert seen in PR39417 so that is great.
(hard to tell if it still is possible to hit the assert, PR39417 was triggered during fuzzy testing using csmith to generate the input)

One thing I noticed is that if I use the test case from PR39417 and add -vectorizer-min-trip-count=3, to avoid the detection of a "very small trip count", the loop will be vectorized with VF=16. That is also what happened when we triggered the assert (without this patch). Shouldn't the VF be clamped to the trip count?
It seems like the vectorizer detects that the trip count is tiny (trip count is 3), but it vectorize using VF=16 but then the vectorized loop is skipped since we emit br i1 true, label %scalar.ph, label %vector.scevcheck. So all the hard work with vectorizing the loop is just a waste of time, or could it be beneficial to have VF > tripcount in some cases?

If the actual problem is that VF should be clamped to the trip count, then maybe this patch just hides that problem in certain cases (when having OptForSize).

Oct 25 2018, 11:04 AM

Oct 23 2018

hsaito accepted D53559: [LV] Don't have fold-tail under optsize invalidate interleave-groups when masked-interleaving is enabled.

LGTM. "Final cleanup" rather than "merge".

Oct 23 2018, 12:23 PM

Oct 11 2018

hsaito added a comment to D53142: [VPlan] Script to extract VPlan digraphs from log.

I'm wondering... should we choose between dot and png? Or should we always print the dot file and, upon --png flag, also the png file?

Oct 11 2018, 12:12 PM

Oct 10 2018

hsaito accepted D49168: [LV] Add a new reduction pattern match.

LGTM.

Oct 10 2018, 10:50 AM

Oct 8 2018

hsaito added inline comments to D49168: [LV] Add a new reduction pattern match.
Oct 8 2018, 3:12 PM
hsaito accepted D49168: [LV] Add a new reduction pattern match.

LGTM. Please wait for a few days to give others time to respond if they'd like to.

Oct 8 2018, 2:28 PM
hsaito added a comment to D49168: [LV] Add a new reduction pattern match.

Code looks good. Just a minor suggestion on the comment. Looking at the LIT test.

Oct 8 2018, 1:28 PM
hsaito added a comment to D49168: [LV] Add a new reduction pattern match.

Thanks a lot, Renato. Will take a look quick.

Oct 8 2018, 10:31 AM

Oct 5 2018

hsaito added a comment to D52881: [LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160.

Thanks LGTM. Please wait with committing a bit, in case Hideki or anyone else has any additional comments.

Oct 5 2018, 11:56 AM
hsaito added a comment to D52930: [SCEV][NFC] Verify IR in isLoop[Entry,Backedge]GuardedByCond.

I agree with Sanjoy. Making this available under a flag would be good.

Oct 5 2018, 11:35 AM

Oct 2 2018

hsaito accepted D52767: [LoopVectorize] Loop vectorization for minimum and maximum.

LGTM

Oct 2 2018, 3:26 PM

Sep 21 2018

hsaito accepted D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

LGTM. We can continue discussing about not bailing out for the subset of cases, but we don't have to let the compiler crash while we do that.

Sep 21 2018, 3:25 PM
hsaito added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

can we compute the trip count for these loops?

SCEV can compute the trip count for this testcase, yes. See ScalarEvolution::computeExitCountExhaustively.

Sure. That will do it.

Sep 21 2018, 3:22 PM
hsaito added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

Why do we need an integer induction variable? If one doesn't exist, it should be straightforward to create one.

Sep 21 2018, 12:49 PM
hsaito added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

Sorry, I totally forgot about my old patch https://reviews.llvm.org/D47216.

  1. I like your two different message version better than changing Line 788 condition to if (!WidestIndTy). That's good.
  2. Please add non-NULL assertion after "Type *IdxTy = Legal->getWidestInductionType();" in InnerLoopVectorizer::getOrCreateTripCount().
  3. Please include LIT test from D47216.
Sep 21 2018, 11:55 AM

Sep 14 2018

hsaito added a comment to D50820: [VPlan] Implement initial vector code generation support for simple outer loops..

Fix for the buildbot failures, provided by @sguggill . There was a memory leak in the D50820 patch.

Sep 14 2018, 8:07 AM

Sep 13 2018

hsaito committed rL342201.
Sep 13 2018, 7:04 PM
hsaito committed rL342197: [VPlan] Implement initial vector code generation support for simple outer loops..
[VPlan] Implement initial vector code generation support for simple outer loops.
Sep 13 2018, 5:38 PM
hsaito closed D50820: [VPlan] Implement initial vector code generation support for simple outer loops..
Sep 13 2018, 5:38 PM
hsaito accepted D50820: [VPlan] Implement initial vector code generation support for simple outer loops..

Accepting the last revision prior to the commit. Hope arc will automatically attribute the patch to @sguggill.

Sep 13 2018, 5:34 PM

Sep 12 2018

hsaito added a comment to D50820: [VPlan] Implement initial vector code generation support for simple outer loops..

I have requested Hideki Saito to submit this change on my behalf.

Sep 12 2018, 3:26 PM

Sep 6 2018

hsaito accepted D51313: [LV] Fix code gen for conditionally executed uniform loads.

LGTM

Sep 6 2018, 12:23 PM

Sep 5 2018

hsaito added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Sep 5 2018, 3:35 PM

Sep 4 2018

hsaito added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Sep 4 2018, 6:48 PM