Page MenuHomePhabricator

ebrevnov (Evgeniy)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 23 2018, 4:17 PM (77 w, 5 d)

Recent Activity

Fri, Jul 19

ebrevnov updated the diff for D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.
  1. Implemented check for "best" possible result suggested by Johannes.
  2. Merged test cases into one file and applied update_test_checks.py
Fri, Jul 19, 5:34 AM · Restricted Project
ebrevnov added a comment to D43256: [MBP] Move a latch block with conditional exit and multi predecessors to top of loop.

Is the test case slow down with PGO or not?

No, it's not PGO.

Also do you have branch misprediction perf data? (large slowdowns like this is usually triggered by side effect like this).

I do. There is no any branch/data/instruction miss predictions. LSD works 100% in both cases as well. Even after regression CPI is 0.3. That means we are execution bounded. If I run the benchmark under perf scores change a little and there is 19% difference instead of original 35%. About half of slowdown (8.3%) comes from increased path length due to extra jump instruction. Rest comes from CPI increase by 10%. I'm checking different hypotheses what could cause that but no success so far.

Fri, Jul 19, 3:58 AM · Restricted Project

Thu, Jul 18

ebrevnov added a comment to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

Out of curiosity, would that still be broken with your example?

It should work. Main challenge I see here is how to perform "isBestPossible" check effectively so it doesn't introduce compile time slow down. Because when this check fails we will have to perform full analysis in addition.

Isn't MemDepResult type Other with OtherType::NonFuncLocal the best case possible? Checking for that one should be cheap, or am I missing something? Would you be willing to give it a try and see what the actual performance is?

Thu, Jul 18, 5:30 AM · Restricted Project
ebrevnov added inline comments to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.
Thu, Jul 18, 5:24 AM · Restricted Project
ebrevnov added a comment to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

I'd like to suggest a change of approach here. There's already precedent in the code for dealing with inconsistent results from metadata. See the getNonLocalPointerDepFromBB handling for AATags and Size changes. I don't see a reason why invariant_load can't be handled in the same manner in the same place. Doing that consolidates all the cache invalidation in one place, and has the benefit of caching all accesses with as much precision as possible. (i.e. an invariant_load would be cached as invariant until the first normal load from that location was found, and then cached as normal loads)

Thanks for the suggestion. Will take a look.

Thu, Jul 18, 5:16 AM · Restricted Project
ebrevnov added a comment to D43256: [MBP] Move a latch block with conditional exit and multi predecessors to top of loop.

This change causes 35% regression on very simple loop which is hot part of our internal micro benchmark.
This loop takes 99% of total execution time and has reasonably large number of iterations.
All measurements were performed on Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz.

Thu, Jul 18, 3:54 AM · Restricted Project

Wed, Jul 17

ebrevnov added a comment to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

I'd like to suggest a change of approach here. There's already precedent in the code for dealing with inconsistent results from metadata. See the getNonLocalPointerDepFromBB handling for AATags and Size changes. I don't see a reason why invariant_load can't be handled in the same manner in the same place. Doing that consolidates all the cache invalidation in one place, and has the benefit of caching all accesses with as much precision as possible. (i.e. an invariant_load would be cached as invariant until the first normal load from that location was found, and then cached as normal loads)

Wed, Jul 17, 10:05 PM · Restricted Project
ebrevnov added a comment to D64898: Standardize how we check for legality of frame realignment.

I was independently looking to this part and came up with essentially same change. The only difference is that I chose MachineFrameInfo as source of "truth". Not really sure which one is better though...

Wed, Jul 17, 9:58 PM · Restricted Project

Tue, Jul 16

ebrevnov added a comment to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

Just found small issue with the patch. Will update once testing is good.

Tue, Jul 16, 1:38 AM · Restricted Project

Mon, Jul 15

ebrevnov updated the diff for D63981: [LV] Avoid building interleaved group in presence of WAW dependency.

Updating patch to use implementation proposed by Ayal.

Mon, Jul 15, 4:43 AM · Restricted Project
ebrevnov added a comment to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

I don't have commit rights. I need someone's assistance with putting the fix in. Thanks!

Mon, Jul 15, 4:27 AM · Restricted Project
ebrevnov added a comment to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

Why do we need both changes, reading the cache and writing it, assuming I understand the code correctly?
I would have assumed not caching "inv-load" results would fix the problem. And I can see we do not want to reuse
non-inv-load results for inv-loads. However, I would argue one could always check for a cached result and use
it if the result is "good". Do I miss something here?

Yes, you are reading the change correctly. The thing is that both results (one for inv-load case another for non inv-load) are "good". They are just different. Each result should be used for corresponding case only and not for the other. In other words using result from non-inv case for inv-case is not correct as well. Does it make sense?

No, but I'm tired. I would have assumed that I can drop the inv-load metadata without semantic change. If that is the case, I should be able to use the non-inv result if it is cached and already good enough. That would mean we can keep the lookup but do only return the cached result if it is "the best". Maybe my problem is that I do not know the lattice of potential values.

Either way, is it possible and useful to test for the number of (un)cached queries?

I believe you assumption regarding 'invariant.load' metadata not changing the semantics is not accurate. According to spec, If such instruction is executed there are number of assumptions the optimizer may do regarding the program. For example it's safe to assume that there are no stores before and no stores changing value after such instruction. Otherwise behavior is undefined. In general case there is no way to know at compile time if invariant load will be exectuted or not. That means you can't blindly propagate results of dependence analysis for invariant load to "normal" loads. And visa-verse results of dependence analysis for "normal" load doesn't take semantic imposed by "invariant.load" into account and would be sub-optimal if used for invariant load.

First, I already agreed, you cannot propagate results from inv. loads to non-inv. loads.

I still think you can drop inv. metadata, you did only say it has more restrictions but that is not a problem. If you think dropping the inv. metdata is a problem, please give me an example.

Now the "sub-optimal" part of propagating non-inv. load results to inv. loads. I also agree that this is in general not what you want. However, I was trying to make the point that hat there is one exception: the best possible result. If we have the best possible result already cached for a non-inv. load, we could reuse it for inv. loads as well because they only have less, not more dependences. If you disagree, an example would be good.

Looks we are totally aligned here. Honestly I don't understand why we can't track this thread down to the end. I agree potentially we may reuse result of non-inv load for inv load if we know it's the best. But supporting this case would be about the same complexity as extending current caching for non-inv as well as inv loads. If I simply drop the first guard around the code which does cache lookup we will end up with inv load not removed on the following example.

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"
target triple = "x86_64-unknown-linux-gnu"

declare void @llvm.memset.p0i8.i8(i8*, i8, i32, i1)
declare void @foo(i8*)

define i8 @test(i1 %cmp, i8 *%p) {
entry:

%res1 = load i8, i8* %p
call void @foo(i8 *%p)
br i1 %cmp, label %b2, label %b1

b1:

%res2 = load i8, i8* %p
%res3 = add i8 %res1, %res2
br label %alive

b2:

%v = load i8, i8* %p, !invariant.load !1
%res.dead = add i8 %v, %res1
br label %alive

alive:

%res.phi = phi i8 [%res3, %b1], [%res.dead, %b2]
ret i8 %res.phi

}

!1 = !{}

That means we should either follow current approach and completely exclude non inv loads from caching or implement full support for caching inv.load.

I don't want to block this further, so LGTM.


For the record, I did not want to simply remove the first guard but I thought we could do something like this:

Before:

// If we have a cached entry, and it is non-dirty, use it as the value for
// this dependency.
if (ExistingResult && !ExistingResult->getResult().isDirty()) {
  ++NumCacheNonLocalPtr;
  return ExistingResult->getResult();
}

After:

// If we have a cached entry, and it is non-dirty, use it as the value for
// this dependency. If this is an invariant load, use the existing result
// only if it is the best possible, otherwise we forget we checked for a
// cached value.
if (ExistingResult && !ExistingResult->getResult().isDirty()) {
  if (!isInvariantLoad || isBestPossible(ExistingResult)) {
    ++NumCacheNonLocalPtr;
    return ExistingResult->getResult();
  }
  ExistingResult = nullptr;
}

Out of curiosity, would that still be broken with your example?

Mon, Jul 15, 4:27 AM · Restricted Project
ebrevnov updated the diff for D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

One more test case added.

Mon, Jul 15, 4:18 AM · Restricted Project
ebrevnov updated the diff for D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

Updating comment as requested.

Mon, Jul 15, 3:03 AM · Restricted Project

Fri, Jul 12

ebrevnov updated subscribers of D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

Why do we need both changes, reading the cache and writing it, assuming I understand the code correctly?
I would have assumed not caching "inv-load" results would fix the problem. And I can see we do not want to reuse
non-inv-load results for inv-loads. However, I would argue one could always check for a cached result and use
it if the result is "good". Do I miss something here?

Yes, you are reading the change correctly. The thing is that both results (one for inv-load case another for non inv-load) are "good". They are just different. Each result should be used for corresponding case only and not for the other. In other words using result from non-inv case for inv-case is not correct as well. Does it make sense?

No, but I'm tired. I would have assumed that I can drop the inv-load metadata without semantic change. If that is the case, I should be able to use the non-inv result if it is cached and already good enough. That would mean we can keep the lookup but do only return the cached result if it is "the best". Maybe my problem is that I do not know the lattice of potential values.

Either way, is it possible and useful to test for the number of (un)cached queries?

I believe you assumption regarding 'invariant.load' metadata not changing the semantics is not accurate. According to spec, If such instruction is executed there are number of assumptions the optimizer may do regarding the program. For example it's safe to assume that there are no stores before and no stores changing value after such instruction. Otherwise behavior is undefined. In general case there is no way to know at compile time if invariant load will be exectuted or not. That means you can't blindly propagate results of dependence analysis for invariant load to "normal" loads. And visa-verse results of dependence analysis for "normal" load doesn't take semantic imposed by "invariant.load" into account and would be sub-optimal if used for invariant load.

First, I already agreed, you cannot propagate results from inv. loads to non-inv. loads.

I still think you can drop inv. metadata, you did only say it has more restrictions but that is not a problem. If you think dropping the inv. metdata is a problem, please give me an example.

Now the "sub-optimal" part of propagating non-inv. load results to inv. loads. I also agree that this is in general not what you want. However, I was trying to make the point that hat there is one exception: the best possible result. If we have the best possible result already cached for a non-inv. load, we could reuse it for inv. loads as well because they only have less, not more dependences. If you disagree, an example would be good.

Looks we are totally aligned here. Honestly I don't understand why we can't track this thread down to the end. I agree potentially we may reuse result of non-inv load for inv load if we know it's the best. But supporting this case would be about the same complexity as extending current caching for non-inv as well as inv loads. If I simply drop the first guard around the code which does cache lookup we will end up with inv load not removed on the following example.

Fri, Jul 12, 1:41 AM · Restricted Project

Thu, Jul 11

ebrevnov added a comment to D63981: [LV] Avoid building interleaved group in presence of WAW dependency.

From the SUMMARY above:

I don't think we actually need to do any additional book keeping (as suggested in D57180). Current algorithm already examines all pairs and in case of dependence will invalidate interleaving group. I think the real issue is WAW dependence (on the same iteration) is not collected by LoopAccessInfo in the first place and as a result canReorderMemAccessesForInterleavedGroups returns wrong answer. The fix is to explicitly check for WAW dependence in canReorderMemAccessesForInterleavedGroups.

Nice catch! Agree about the real issue being a deficiency of LoopAccessInfo, and that it's better to treat all interleave-group-preventing dependencies the same, w/o dedicating special treatment for same-iteration WAW dependences. Suggest to fix this real issue at its core.

LoopAccessInfo does collect the 6 same-iteration WAR Forward dependences between each of the 3 loads and their 2 subsequent same-address stores, in the testcase below, but these do not interfere with interleave-grouping the stores. So it's perfectly capable of collecting same-iteration WAW Forward dependences as well. (LoopAccessInfo should also already collect any same-iteration RAW dependences, although such cases would hopefully be optimized away.)

The reason why LoopAccessInfo does not collect same-iteration WAW dependences is due to the way areDepsSafe() traverses pairs of DependentAccesses, held in EquivalenceClasses<MemAccessInfo> which clusters stores-to-same-address together (and load-to-same-address together, but separately from the stores). One way of fixing this may be:

diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 36bd9a8b7ea..a3f40826861 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1641,13 +1641,21 @@ bool MemoryDepChecker::areDepsSafe(DepCandidates &AccessSets,
     // Check every access pair.
     while (AI != AE) {
       Visited.insert(*AI);
-      EquivalenceClasses<MemAccessInfo>::member_iterator OI = std::next(AI);
+      bool AIIsWrite = AI->getInt();
+      // Check loads only against next equivalent class, but stores also against
+      // other stores in the same equivalence class - to the same address.
+      EquivalenceClasses<MemAccessInfo>::member_iterator OI =
+          (AIIsWrite ? AI : std::next(AI));
       while (OI != AE) {
         // Check every accessing instruction pair in program order.
         for (std::vector<unsigned>::iterator I1 = Accesses[*AI].begin(),
              I1E = Accesses[*AI].end(); I1 != I1E; ++I1)
-          for (std::vector<unsigned>::iterator I2 = Accesses[*OI].begin(),
-               I2E = Accesses[*OI].end(); I2 != I2E; ++I2) {
+          // Scan all accesses of another equivalence class, but only the next
+          // accesses of the same equivalent class.
+          for (std::vector<unsigned>::iterator
+                   I2 = (OI == AI ? std::next(I1) : Accesses[*OI].begin()),
+                   I2E = (OI == AI ? I1E : Accesses[*OI].end());
+               I2 != I2E; ++I2) {
             auto A = std::make_pair(&*AI, *I1);
             auto B = std::make_pair(&*OI, *I2);

(could possibly save dependences by recording only those between pairs of adjacent stores to same address, relying on transitivity)

Thu, Jul 11, 4:45 AM · Restricted Project
ebrevnov added inline comments to D63981: [LV] Avoid building interleaved group in presence of WAW dependency.
Thu, Jul 11, 4:28 AM · Restricted Project

Wed, Jul 10

ebrevnov added a comment to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

Why do we need both changes, reading the cache and writing it, assuming I understand the code correctly?
I would have assumed not caching "inv-load" results would fix the problem. And I can see we do not want to reuse
non-inv-load results for inv-loads. However, I would argue one could always check for a cached result and use
it if the result is "good". Do I miss something here?

Yes, you are reading the change correctly. The thing is that both results (one for inv-load case another for non inv-load) are "good". They are just different. Each result should be used for corresponding case only and not for the other. In other words using result from non-inv case for inv-case is not correct as well. Does it make sense?

No, but I'm tired. I would have assumed that I can drop the inv-load metadata without semantic change. If that is the case, I should be able to use the non-inv result if it is cached and already good enough. That would mean we can keep the lookup but do only return the cached result if it is "the best". Maybe my problem is that I do not know the lattice of potential values.

Either way, is it possible and useful to test for the number of (un)cached queries?

Wed, Jul 10, 3:13 AM · Restricted Project

Tue, Jul 9

ebrevnov added inline comments to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.
Tue, Jul 9, 9:10 PM · Restricted Project
ebrevnov added a comment to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

Why do we need both changes, reading the cache and writing it, assuming I understand the code correctly?
I would have assumed not caching "inv-load" results would fix the problem. And I can see we do not want to reuse
non-inv-load results for inv-loads. However, I would argue one could always check for a cached result and use
it if the result is "good". Do I miss something here?

Tue, Jul 9, 9:10 PM · Restricted Project
ebrevnov updated the diff for D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

Adding unit test.

Tue, Jul 9, 5:36 AM · Restricted Project
ebrevnov added a comment to D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

There should be a test.

Tue, Jul 9, 5:22 AM · Restricted Project
ebrevnov retitled D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151 from [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. (PR42151) to [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.
Tue, Jul 9, 5:14 AM · Restricted Project
ebrevnov updated the diff for D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.

Remove accidental changes.

Tue, Jul 9, 5:08 AM · Restricted Project
ebrevnov created D64405: [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151.
Tue, Jul 9, 4:58 AM · Restricted Project

Mon, Jul 8

ebrevnov accepted D63439: Wordsmith the semantics of invariant.load.

LGTM

Mon, Jul 8, 9:05 PM · Restricted Project

Sun, Jul 7

ebrevnov added a comment to D62563: Fix incorrect expand of non-linear addrecs.

There is a https://bugs.llvm.org/show_bug.cgi?id=42384 failing due same problem.

Sun, Jul 7, 9:31 PM · Restricted Project

Wed, Jul 3

ebrevnov abandoned D64126: # Enter a commit message. # # Changes: # # lib/Analysis/ScalarEvolution.cpp [Scalar Evolution][Test] Disable distribution of truncation..
Wed, Jul 3, 2:57 AM · Restricted Project
ebrevnov created D64126: # Enter a commit message. # # Changes: # # lib/Analysis/ScalarEvolution.cpp [Scalar Evolution][Test] Disable distribution of truncation..
Wed, Jul 3, 2:52 AM · Restricted Project

Sun, Jun 30

ebrevnov removed a parent revision for D63981: [LV] Avoid building interleaved group in presence of WAW dependency: D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency.
Sun, Jun 30, 11:57 PM · Restricted Project
ebrevnov added a parent revision for D63981: [LV] Avoid building interleaved group in presence of WAW dependency: D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency.
Sun, Jun 30, 11:57 PM · Restricted Project
ebrevnov removed a child revision for D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency: D63981: [LV] Avoid building interleaved group in presence of WAW dependency.
Sun, Jun 30, 11:57 PM · Restricted Project
ebrevnov added a child revision for D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency: D63981: [LV] Avoid building interleaved group in presence of WAW dependency.
Sun, Jun 30, 11:57 PM · Restricted Project
ebrevnov removed a reviewer for D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency: ebrevnov.
Sun, Jun 30, 11:52 PM · Restricted Project
ebrevnov updated the summary of D63981: [LV] Avoid building interleaved group in presence of WAW dependency.
Sun, Jun 30, 11:51 PM · Restricted Project
ebrevnov created D63981: [LV] Avoid building interleaved group in presence of WAW dependency.
Sun, Jun 30, 11:36 PM · Restricted Project

Fri, Jun 28

ebrevnov added a reviewer for D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency: ebrevnov.
Fri, Jun 28, 5:06 AM · Restricted Project