anna (Anna Thomas)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 30 2016, 11:13 AM (181 w, 1 d)

Recent Activity

Jan 29 2019

anna added a comment to D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency.

ping

Jan 29 2019, 12:50 PM · Restricted Project

Jan 24 2019

anna created D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency.
Jan 24 2019, 12:29 PM · Restricted Project
anna accepted D57161: [RS4GC] Be slightly less conservative for gep vector_base, scalar_idx.

lgtm.

Jan 24 2019, 8:32 AM
anna added a comment to D57161: [RS4GC] Be slightly less conservative for gep vector_base, scalar_idx.

The scalar indices don't appear to be a problem on a scalar gep, we even had a test for that.

just to note it is not a scalar gep - this is a vector gep which has a scalar index.

Jan 24 2019, 8:30 AM
anna accepted D57138: [RS4GC] Avoid crashing on gep scalar, vector_idx.

LGTM w/ comments.

Jan 24 2019, 7:44 AM

Jan 22 2019

anna planned changes to D56449: [UnrollRuntime] Support multi-exiting blocks to LatchExit.

need to update comments and address review comments. moving out of queue for now.

Jan 22 2019, 5:40 AM

Jan 18 2019

anna accepted D55678: [CVP] Use LVI to constant fold deopt operands.

LGTM!

Jan 18 2019, 4:00 PM

Jan 8 2019

anna created D56449: [UnrollRuntime] Support multi-exiting blocks to LatchExit.
Jan 8 2019, 12:43 PM
anna added a comment to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

thanks for reviewing the change Brian! Will land it soon.

Jan 8 2019, 9:02 AM
anna added a comment to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

This has passed all internal fuzzer runs - the DT verification failure tests have been added as test cases here after reduction.

Jan 8 2019, 8:45 AM
anna updated the diff for D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

we need to update domInfo for exits and those reachable from the exits.

Jan 8 2019, 8:44 AM

Jan 7 2019

anna added a comment to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

I've got a naive question about the DTU: Does the automatic updater handle the ImmediateDominator "automatically" for the remaining nodes in the dom tree once it identifies that one of the nodes in the DT has a change in IDom?

Exactly. Given a set of CFG updates, it figures out how to change the DomTree and PostDomTree such that they match the CFG.

This will drastically help with #2 because what I have as a local fix using the old DT is something like this:

+  if (DT) {
+    // Check the dom children of each block in loop and if it is outside the
+    // current loop, update it to the preheader.
+    for (auto *BB : L->blocks()) {
+      auto *DomNodeBB = DT->getNode(BB);
+      for (auto *DomChild : DomNodeBB->getChildren()) {
+        if (!L->contains(LI->getLoopFor(DomChild->getBlock()))) {
+          DT->changeImmediateDominator(DomChild, DT->getNode(PreHeader));
+        }
+      }
+    }
+  }

This is a more general fix using the old DT to handle the non-immediate successors of an exit block in the loop.

Doesn't this loop have to be run until you reach a fixpoint?
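The fixpoint question can be made concrete with a toy example. The sketch below is not LLVM's DominatorTree or DomTreeUpdater; it is a self-contained illustration, with hypothetical names (`PredList`, `intersect`, `computeIDoms`), of recomputing immediate dominators from a CFG by sweeping until nothing changes, in the style of the Cooper-Harvey-Kennedy algorithm. It assumes nodes are numbered in reverse postorder with node 0 as the entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: node 0 is the entry; Preds[n] lists the predecessors of node n.
// Nodes are assumed to be numbered in reverse postorder (entry first).
// Illustrative sketch only, not LLVM's DominatorTree implementation.
using PredList = std::vector<std::vector<int>>;

// Walk both nodes up the current IDom chains until they meet.
static int intersect(const std::vector<int> &IDom, int A, int B) {
  while (A != B) {
    while (A > B) A = IDom[A];
    while (B > A) B = IDom[B];
  }
  return A;
}

// Recompute every immediate dominator, sweeping until a fixpoint is reached.
std::vector<int> computeIDoms(const PredList &Preds) {
  assert(!Preds.empty() && "CFG needs at least an entry node");
  const int Undef = -1;
  std::vector<int> IDom(Preds.size(), Undef);
  IDom[0] = 0; // the entry is its own immediate dominator by convention
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t N = 1; N < Preds.size(); ++N) {
      int NewIDom = Undef;
      for (int P : Preds[N]) {
        if (IDom[P] == Undef) continue; // predecessor not processed yet
        NewIDom = (NewIDom == Undef) ? P : intersect(IDom, P, NewIDom);
      }
      if (NewIDom != IDom[N]) {
        IDom[N] = NewIDom;
        Changed = true;
      }
    }
  }
  return IDom;
}
```

With a loop header 1, latch 2, exit 3 and exit successor 4, adding an edge from the preheader 0 straight to the exit 3 moves the exit's IDom from the header to the preheader, while the IDom of the exit's successor stays unchanged. A full recomputation (or an automatic updater) picks this up without hand-written corner cases.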

Jan 7 2019, 2:04 PM
anna added a comment to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

the same bug can happen with:

  1. latch exit
  2. non-immediate successors of latchexit/otherexit.

    I have simplified test cases for each of these and we need a more general fix. Working on this.

... these things are extremely bug-prone; I really do suggest using the automatic updater instead of trying to deal with all tricky corner-cases here :)

Jan 7 2019, 1:18 PM
anna planned changes to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

the same bug can happen with:

  1. latch exit
  2. non-immediate successors of latchexit/otherexit.
Jan 7 2019, 12:13 PM

Jan 4 2019

anna updated the diff for D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

The IDom can be any block within the loop (as mentioned in the comment).
So, we need to check for IDom *contained* in the loop.
Added test case shows where the IDom is within the inner loop of the original loop.
Both test cases pass now with the fix.

Jan 4 2019, 10:18 AM

Jan 3 2019

anna added a comment to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

@anna Can this be rewritten to use the DomTreeUpdater utility?

@kuhar, I have not looked at the DTU utility yet, but I'd think it can be rewritten (unless there are some limitations for the DomTreeUpdater utility). However, that would be a large enough change and I think it's better to do it as a separate change at a later point rather than as part of fixing this bug. Also, there are enough DT updates throughout this code that it will take a while to get through the whole process of porting over for runtime unrolling.

Jan 3 2019, 3:14 PM
anna added a comment to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

Context and history for runtime unrolling: https://reviews.llvm.org/D35304

Jan 3 2019, 11:53 AM
anna created D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.
Jan 3 2019, 11:53 AM

Dec 21 2018

anna added inline comments to D55678: [CVP] Use LVI to constant fold deopt operands.
Dec 21 2018, 10:34 AM

Dec 5 2018

anna accepted D54023: [LoopSimplifyCFG] Delete dead in-loop blocks.

LGTM. Thanks for working through the revisions.

Dec 5 2018, 12:20 PM

Dec 4 2018

anna requested changes to D54023: [LoopSimplifyCFG] Delete dead in-loop blocks.

Pls add -verify-memoryssa to the RUN command as well, now that MSSA is getting updated.
Comment inline.

Dec 4 2018, 7:15 AM

Nov 26 2018

anna accepted D54849: [LoopSimplifyCFG] Fix corner case with duplicating successors.

LGTM.

Nov 26 2018, 1:45 PM

Nov 22 2018

anna added a comment to D54023: [LoopSimplifyCFG] Delete dead in-loop blocks.

There's an LCSSA not being preserved bug in the original landed change (@dmgreen added the test case). As part of fixing it, pls add the following options to your RUN command in the test "-verify-loop-info -verify-dom-info -verify-loop-lcssa". This will make sure all forms of verification are checked.

Nov 22 2018, 10:26 AM

Nov 20 2018

anna added inline comments to D54023: [LoopSimplifyCFG] Delete dead in-loop blocks.
Nov 20 2018, 8:34 AM

Nov 19 2018

anna added inline comments to D54538: [LV] Avoid vectorizing unsafe dependencies in uniform address.
Nov 19 2018, 7:17 AM

Nov 16 2018

anna accepted D54021: [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches.

LGTM. Thanks for working through the comments.

Nov 16 2018, 9:27 AM

Nov 15 2018

anna added inline comments to D54538: [LV] Avoid vectorizing unsafe dependencies in uniform address.
Nov 15 2018, 8:01 AM
anna updated the diff for D54538: [LV] Avoid vectorizing unsafe dependencies in uniform address.

addressed review comments

Nov 15 2018, 7:59 AM
anna planned changes to D54538: [LV] Avoid vectorizing unsafe dependencies in uniform address.

based on above comments.

Nov 15 2018, 6:50 AM
anna added inline comments to D54538: [LV] Avoid vectorizing unsafe dependencies in uniform address.
Nov 15 2018, 6:32 AM

Nov 14 2018

anna created D54538: [LV] Avoid vectorizing unsafe dependencies in uniform address.
Nov 14 2018, 12:43 PM
anna requested changes to D54021: [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches.

Forgot to mark request changes. See previous inline comments.

Nov 14 2018, 6:49 AM

Nov 13 2018

anna added inline comments to D54021: [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches.
Nov 13 2018, 1:29 PM

Nov 7 2018

anna added inline comments to D53889: [CodeGen] Prefer static frame index for STATEPOINT liveness args.
Nov 7 2018, 11:56 AM
anna accepted D53602: [IRVerifier] Allow StructRet in statepoint.

This looks good to me.

Nov 7 2018, 11:44 AM
anna added a comment to D54021: [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches.

comments inline. Algo looks fine.

Nov 7 2018, 10:21 AM

Nov 5 2018

anna requested changes to D54021: [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches.

based on above comments.

Nov 5 2018, 10:19 AM
anna added a comment to D54021: [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches.

Couple of comments inline. Pls add test cases.

Nov 5 2018, 10:14 AM
anna added inline comments to D54021: [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches.
Nov 5 2018, 9:17 AM

Oct 16 2018

anna added inline comments to D52656: [LV] Teach vectorizer about variant value store into uniform address.
Oct 16 2018, 8:57 AM

Oct 5 2018

anna updated the diff for D52656: [LV] Teach vectorizer about variant value store into uniform address.

addressed review comments. Added one test checking that multiple uniform stores are not vectorized.

Oct 5 2018, 12:03 PM
anna added inline comments to D52656: [LV] Teach vectorizer about variant value store into uniform address.
Oct 5 2018, 11:59 AM

Oct 4 2018

anna added a comment to D52656: [LV] Teach vectorizer about variant value store into uniform address.

ping

Oct 4 2018, 11:28 AM

Sep 28 2018

anna created D52656: [LV] Teach vectorizer about variant value store into uniform address.
Sep 28 2018, 8:39 AM

Sep 27 2018

anna added a comment to D52362: [CloneFunction] Simplify previously unsimplifiable instructions.

Results on CTMark do show a negative compile-time impact on a couple of benchmarks - run on a haswell machine over 4 runs with and without patch:

Sep 27 2018, 9:11 AM

Sep 25 2018

anna added inline comments to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.
Sep 25 2018, 8:30 AM

Sep 24 2018

anna added a comment to D51231: [X86] Make Feature64Bit useful.

@anna, hopefully I fixed your issue in r342914

Sep 24 2018, 5:31 PM
anna added a comment to D51231: [X86] Make Feature64Bit useful.

@anna, the "lm" in the flags from the cpuinfo should be the 64bit flag. Is this failing with clang, or llc, or some other program that uses llvm libraries?

Sep 24 2018, 11:28 AM
anna added a comment to D51231: [X86] Make Feature64Bit useful.

With this change, we now break the code gen in KVMs that do not have the "+64bit" feature tagged (although it is capable of generating 64 bit code). Is there a way to identify the feature for KVMs?

Sep 24 2018, 7:20 AM
anna added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

ping

Sep 24 2018, 7:20 AM

Sep 21 2018

anna added a comment to D52362: [CloneFunction] Simplify previously unsimplifiable instructions.
Sep 21 2018, 1:54 PM
anna added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

Why do we need an integer induction variable? If one doesn't exist, it should be straightforward to create one.

Is there a practical need (i.e., beyond academic interest) to vectorize such code? Examples?

Sep 21 2018, 1:42 PM
anna created D52362: [CloneFunction] Simplify previously unsimplifiable instructions.
Sep 21 2018, 8:24 AM

Sep 18 2018

anna updated the diff for D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

addressed review comments.

Sep 18 2018, 1:40 PM
anna added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

Hi Ayal, thanks for your detailed review!

Best allow only a single store to an invariant address for now; until we're sure the last one to store is always identified correctly.

I've updated the patch to restrict to this case for now (diff coming up soon). Generally, if we have multiple stores to an invariant address, it might be canonicalized by InstCombine. So, this may not be as inhibiting as it sounds. Keeping this restriction and allowing "variant stores to invariant addresses" seems like a logical next step once this lands.

Sep 18 2018, 12:17 PM

Sep 13 2018

anna added a comment to D51964: [InstCombine] Fold (xor (min/max X, Y), -1) -> (max/min ~X, ~Y) when X and Y are freely invertible..

This whole patch does fix the infinite loop from PR38915, but it requires the whole patch and not just the change in InstCombineSelect.cpp

Sep 13 2018, 8:33 AM

Sep 10 2018

anna added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

ping

Sep 10 2018, 7:18 AM
anna updated the diff for D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

rebased over D51313.

Sep 10 2018, 7:18 AM
anna added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Sep 10 2018, 7:07 AM

Sep 6 2018

anna updated the diff for D51313: [LV] Fix code gen for conditionally executed uniform loads.

addressed review comment.

Sep 6 2018, 12:06 PM
anna added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Sep 6 2018, 10:49 AM

Sep 5 2018

anna added inline comments to D51486: Add check to Latch's terminator in UnrollRuntimeLoopRemainder.
Sep 5 2018, 12:39 PM
anna updated the diff for D51313: [LV] Fix code gen for conditionally executed uniform loads.

addressed review comments.

Sep 5 2018, 12:26 PM
anna added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Sep 5 2018, 12:23 PM

Sep 4 2018

anna added a comment to D51639: [LV] Fix PR38786 - consider first order recurrence phis non-uniform.

I think this is ready. Would someone like to commit it?

Sep 4 2018, 2:03 PM
anna added inline comments to D51639: [LV] Fix PR38786 - consider first order recurrence phis non-uniform.
Sep 4 2018, 9:58 AM
anna accepted D51639: [LV] Fix PR38786 - consider first order recurrence phis non-uniform.

LGTM. thanks for the fix!

Sep 4 2018, 9:03 AM
anna accepted D50730: [AST] Generalize argument specific aliasing.

LGTM.

Sep 4 2018, 8:15 AM
anna added a comment to D51313: [LV] Fix code gen for conditionally executed uniform loads.

ping

Sep 4 2018, 8:01 AM

Aug 31 2018

anna added a comment to D51486: Add check to Latch's terminator in UnrollRuntimeLoopRemainder.

Please upload the patch with complete context. Could you also add a test case showcasing your problem (which you've described earlier): just a loop with an unconditional latch terminator, run through runtime-unroll.

Aug 31 2018, 5:53 AM

Aug 30 2018

anna requested review of D51313: [LV] Fix code gen for conditionally executed uniform loads.
Aug 30 2018, 7:58 AM
anna added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

This patch now only vectorizes invariant values stored into invariant addresses. It also correctly handles conditionally executed stores (fixed bug for scatter code generation in AVX512).

Aug 30 2018, 7:52 AM
anna updated the diff for D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

added test for conditional uniform store for AVX512. Rebased over fix in D51313.

Aug 30 2018, 7:51 AM

Aug 29 2018

anna added a comment to D51313: [LV] Fix code gen for conditionally executed uniform loads.

To state what's fixed in latest patch:

  1. Uniform conditional loads on architectures with gather support will have correct code generated. In particular, the cost model (setCostBasedWideningDecision) is fixed. The codeGen for replication and widening recipes do the respective operations ONLY for widening decision != CM_GatherScatter.
    1.1 For the recipes which are handled after the widening decision is set, we use the isScalarWithPredication(I, VF) form which is added in the patch.
Aug 29 2018, 11:50 AM
anna updated the diff for D51313: [LV] Fix code gen for conditionally executed uniform loads.

addressed review comments - we make sure that the vectorization also uses the cost decision of gather/scatter
instead of scalarizing.
Also, handles the original bug of generating incorrect code for conditional uniform loads.

Aug 29 2018, 11:23 AM
anna added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Aug 29 2018, 8:23 AM

Aug 28 2018

anna added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Aug 28 2018, 7:02 AM

Aug 27 2018

anna created D51313: [LV] Fix code gen for conditionally executed uniform loads.
Aug 27 2018, 9:19 AM

Aug 24 2018

anna added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

We should also consider doing this, depending on the cost of branch versus masked scatter. For the targets w/o masked scatter, this should be better than masked scatter emulation.

%5 = bitcast <16 x i1> %4 to i16
%6 = icmp eq i16 %5, 0
br i1 %6, label %skip, label %fall
fall:
store i32 %ntrunc, i32* %a
br label %skip
skip:

Yes, that is the improved codegen stated as TODO in the costmodel. Today both the costmodel and the code gen will identify it as a normal predicated store: series of branches and stores. Also, we need to differentiate these 2 cases:

if (b[i] == k)
  a = ntrunc;

versus

if (b[i] == k)
  a = ntrunc;
else
  a = m;

The second example should be converted into a vector-select based on b[i] == k and the last element will be extracted out of the vector select and stored into a.
However, if for some reason it is not converted into a select and is just left as 2 predicated stores, it is incorrect to use the same code transformation as we'll do for the first example. For the first example, we check whether all values in the conditional are false and, if so, skip the store. In the second case, we always need to store a value, but that value is decided by the last element of the conditional. These are just 2 different forms of predicated stores.
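To make the difference between the two forms concrete, here is a minimal scalar model of one vector iteration. This is a sketch under assumptions, not the vectorizer's code generation: the function names and the use of `std::vector<int>` for the lanes of `b` are hypothetical, and `a`, `ntrunc`, `m` and `k` are taken from the pseudo code above.

```cpp
#include <cassert>
#include <vector>

// Form 1: `if (b[i] == k) a = ntrunc;`
// A whole vector iteration may skip the store when no lane's condition holds.
void storeIfAnyLane(const std::vector<int> &BLanes, int K, int NTrunc, int &A) {
  assert(!BLanes.empty() && "expected at least one lane");
  bool AnyLaneActive = false;
  for (int Elt : BLanes)
    AnyLaneActive |= (Elt == K);
  if (AnyLaneActive)
    A = NTrunc;
}

// Form 2: `if (b[i] == k) a = ntrunc; else a = m;`
// A store always happens; the last lane alone decides which value survives.
void storeSelectLastLane(const std::vector<int> &BLanes, int K, int NTrunc,
                         int M, int &A) {
  assert(!BLanes.empty() && "expected at least one lane");
  std::vector<int> Select;
  for (int Elt : BLanes)
    Select.push_back(Elt == K ? NTrunc : M); // models the vector select
  A = Select.back(); // extract the last element and store it
}
```

Applying the form 1 logic to form 2 would leave `a` untouched whenever no lane matches, whereas the scalar loop would have stored `m`.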

Aug 24 2018, 1:45 PM
anna added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

One more interesting thing I noticed while adding predicated invariant stores to X86 (for -mcpu=skylake-avx512), it supports masked scatter for non-uniform stores.
But we need to add support for uniform stores along with this patch. Today, it just generates incorrect code (no predication whatsoever).
For other architectures that do not have these masked intrinsics, we just generate the predicated store by doing an extract and branch on each lane (correct but inefficient and will be avoided unless -force-vector-width=X).

In general, self output dependence is fine to vectorize (whether the store address is uniform or random), as long as (masked) scatter (or scatter emulation) happens from lower elements to higher elements.

I don't think the above comment matters for uniform addresses because a uniform address is invariant. This is what the langref states for scatter intrinsic (https://llvm.org/docs/LangRef.html#id1792):

The data stored in memory is a vector of any integer, floating-point or pointer data type. Each vector element is stored in an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.

The scatter address is not overlapping for the uniform address. It is the exact same address. This is the code that gets generated for uniform stores on skylake with AVX-512 support once I fixed the bug in this patch (the scatter location is the same address and the stored value is also the same, and the mask is the vector of booleans):
pseudo code:

if (b[i] == k)
  a = ntrunc; <-- uniform store based on condition above.

IR generated:

vector.ph:
  %broadcast.splatinsert5 = insertelement <16 x i32> undef, i32 %k, i32 0
  %broadcast.splat6 = shufflevector <16 x i32> %broadcast.splatinsert5, <16 x i32> undef, <16 x i32> zeroinitializer ; vector splat of k
  %broadcast.splatinsert9 = insertelement <16 x i32*> undef, i32* %a, i32 0
  %broadcast.splat10 = shufflevector <16 x i32*> %broadcast.splatinsert9, <16 x i32*> undef, <16 x i32> zeroinitializer ; vector splat of i32* a.
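
The ordering guarantee quoted from the LangRef can be modelled with a short scalar emulation of a masked scatter. This is only a sketch: `maskedScatter` is a hypothetical helper, not the intrinsic or any LLVM API. It writes active lanes from the lowest to the highest index, so with a splat of one address the highest active lane determines the final value, and with an all-false mask nothing is stored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar emulation of a masked scatter: lanes are written from the lowest to
// the highest index, so on overlapping addresses the highest active lane wins.
void maskedScatter(const std::vector<int> &Values,
                   const std::vector<int *> &Ptrs,
                   const std::vector<bool> &Mask) {
  assert(Values.size() == Ptrs.size() && Values.size() == Mask.size());
  for (std::size_t Lane = 0; Lane < Values.size(); ++Lane)
    if (Mask[Lane])
      *Ptrs[Lane] = Values[Lane];
}
```

For the uniform store discussed here, every lane holds the same pointer and the same invariant value, so the result depends only on whether any mask bit is set.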
Aug 24 2018, 12:47 PM
anna planned changes to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

see comment above for masked scatter support.

Aug 24 2018, 10:26 AM
anna added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

One more interesting thing I noticed while adding predicated invariant stores to X86 (for -mcpu=skylake-avx512), it supports masked scatter for non-uniform stores.
But we need to add support for uniform stores along with this patch. Today, it just generates incorrect code (no predication whatsoever).
For other architectures that do not have these masked intrinsics, we just generate the predicated store by doing an extract and branch on each lane (correct but inefficient and will be avoided unless -force-vector-width=X).

Aug 24 2018, 10:23 AM
anna added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

okay, to keep this patch true to the original intent and commit message: I'm going to change it to handle just the store of invariant values to invariant addresses (i.e. no support for OR-operands-are-invariant). It will be admittedly a more conservative patch. The ORE message will also reflect correctly the "variant stores to invariant addresses".

Aug 24 2018, 7:00 AM

Aug 23 2018

anna accepted D51181: [LICM] Hoist an invariant_start out of loops if there are no stores executed before it.

LGTM

Aug 23 2018, 2:02 PM
anna added inline comments to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.
Aug 23 2018, 1:28 PM
anna added a comment to D50925: [LICM] Hoist stores of invariant values to invariant addresses out of loops.

There are 3 kinds of tests worth adding:

  1. predicated invariant stores, i.e. the block containing the store itself is predicated and not guaranteed to execute (cannot be handled by LICM)

Covered by existing early exit test.

  2. invariant store value is a phi containing invariant incoming values and the phi result depends on an invariant condition (can be handled by LICM. This patch handles?)

Unclear what you mean here.

Added example:

define void @inv_val_store_to_inv_address_conditional_inv(i32* %a, i64 %n, i32* %b, i32 %k) {
entry:
  %ntrunc = trunc i64 %n to i32
  %cmp = icmp eq i32 %ntrunc, %k
  br label %for.body
Aug 23 2018, 12:45 PM
anna added inline comments to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.
Aug 23 2018, 12:32 PM
anna updated the diff for D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

Added TODOs for better code gen of predicated uniform store and removing redundant loads and stores left behind during
scalarization of these uniform loads and stores.

Aug 23 2018, 9:59 AM
anna updated the diff for D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

address review comments (NFC wrt previous diff). Added one test for varying value stored into invariant address.

Aug 23 2018, 9:39 AM
anna added inline comments to D50925: [LICM] Hoist stores of invariant values to invariant addresses out of loops.
Aug 23 2018, 9:26 AM
anna added a comment to D50925: [LICM] Hoist stores of invariant values to invariant addresses out of loops.

I'll take a closer look at the patch, but at first glance it looks like some cases of invariant stores that prevent vectorization (because LICM wasn't hoisting/sinking the store) may be handled by running LICM before vectorization. The test cases in D50665 may be worth trying here; note that those test cases run LICM before vectorization.

Aug 23 2018, 8:54 AM
anna added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

...

Yes, the stores are scalarized. Identical replicas left as-is. Either passes such as load elimination can remove it, or we can clean it up in LV itself.

  • by revisiting LoopVectorizationCostModel::collectLoopUniforms()? ;-)

Right now, I just run instcombine after loop vectorization to clean up those unnecessary stores (and test cases make sure there's only one store left). Looks like there are other places in LV which rely on InstCombine as the cleanup pass, so it may not be that bad after all? Thoughts?

Yeah, this is a bit embarrassing, but currently invariant loads also get replicated (and cleaned up later), despite trying to avoid doing so by recording IsUniform in VPReplicateRecipe. In general, if it's simpler and more consistent to generate code in a common template and potentially cleanup later, should be ok provided the cost model accounts for it accurately and cleanup is guaranteed, as checked by tests. BTW, LV already has an internal cse(). But in this case, VPlan should reflect the final outcome better, i.e., with a correct IsUniform. This should be taken care of, possibly by a separate patch.

Aug 23 2018, 8:37 AM
anna added inline comments to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.
Aug 23 2018, 8:36 AM

Aug 21 2018

anna added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

...

Yes, the stores are scalarized. Identical replicas left as-is. Either passes such as load elimination can remove it, or we can clean it up in LV itself.

  • by revisiting LoopVectorizationCostModel::collectLoopUniforms()? ;-)
Aug 21 2018, 11:57 AM
anna added inline comments to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.
Aug 21 2018, 11:53 AM
anna updated the diff for D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

Addressed review comments, updated ORE message and tests, fixed an assertion failure in cost model calculation for uniform store (bug uncovered when running test
under X86 skylake)

Aug 21 2018, 11:50 AM
anna added inline comments to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.
Aug 21 2018, 10:03 AM
anna added a comment to D50778: [LV] Vectorize loops where non-phi instructions used outside loop.

thanks for reviewing these changes Ayal!

Aug 21 2018, 6:17 AM

Aug 20 2018

anna updated the diff for D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

Teach LAA about non-predicated uniform store. Added test case for these cases
to make sure they are not treated as predicated stores.

Aug 20 2018, 11:57 AM
anna added inline comments to D50778: [LV] Vectorize loops where non-phi instructions used outside loop.
Aug 20 2018, 9:11 AM