This is an archive of the discontinued LLVM Phabricator instance.

[LV] Avoid adding into interleaved group in presence of WAW dependency
AbandonedPublic

Authored by anna on Jan 24 2019, 12:29 PM.

Details

Summary

Fix for miscompile when we ignore the store ordering when adding stores
into the interleaved store group.
Fix for PR40291.
Test case shows we now add stores into the interleaved store group only
if we prove there is no WAW dependency.
This is a cleaned up fix suggested by @hsaito.

Diff Detail

Event Timeline

anna created this revision.Jan 24 2019, 12:29 PM
fhahn added a comment.Jan 29 2019, 1:18 PM

I plan on having a look later this week. I am a little worried that the checks in-line here are already quite complex and I would like to have a think if that could be improved in some way.

dorit added a subscriber: dorit.Jan 31 2019, 11:19 PM

I plan on having a look later this week. I am a little worried that the checks in-line here are already quite complex and I would like to have a think if that could be improved in some way.

I agree; The algorithm makes sure that we visit everything between B and A, including C, before we visit A; so we have a chance to identify the (potentially) interfering store C before we reach A; This is what allows the algorithm to only compare the pairs (A,B) without having each time to also scan everything in between.

So I think the bug is that when we visited C, and found that it could be inserted into B's group dependence-wise, but wasn't inserted due to other reasons, we should have either:

  1. Invalidated the group (which is over aggressive but better than wrong code)
  2. Recorded in B's Group the index where C could be inserted, to "burn" that index from allowing some other instruction A to become a group member at that index; so when we reach A we see its spot is taken. (I think this will have the same effect as the proposed patch but without the extra scan.)
  3. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2019, 11:19 PM

I plan on having a look later this week. I am a little worried that the checks in-line here are already quite complex and I would like to have a think if that could be improved in some way.

I agree; The algorithm makes sure that we visit everything between B and A, including C, before we visit A; so we have a chance to identify the (potentially) interfering store C before we reach A; This is what allows the algorithm to only compare the pairs (A,B) without having each time to also scan everything in between.

So I think the bug is that when we visited C, and found that it could be inserted into B's group dependence-wise, but wasn't inserted due to other reasons, we should have either:

  1. Invalidated the group (which is over aggressive but better than wrong code)
  2. Recorded in B's Group the index where C could be inserted, to "burn" that index from allowing some other instruction A to become a group member at that index; so when we reach A we see its spot is taken. (I think this will have the same effect as the proposed patch but without the extra scan.)
  3. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).

If you don't like the current approach, I agree 2) achieves the same thing, with extra bookkeeping (or extra state in the existing bookkeeping). I think 1) is too conservative. Even if C is next to B, B's group can still be extended to the other direction. 3) should be done separately from the bug fix. Anyway, do we ever deal with so many loads/stores for this efficiency to avoid extra scanning to actually matter? I'm just curious.

fhahn added a comment.Feb 1 2019, 5:55 AM

! In D57180#1380108, @hsaito wrote:
Anyway, do we ever deal with so many loads/stores for this efficiency to avoid extra scanning to actually matter? I'm just curious.

I'd say usually the number would be quite small and there would be no significant compile time problem. But it is not impossible to hit a worst case scenario, given the wide range of frontends and source code that gets thrown at LLVM. Also, compile time is a key metric for some users and unnecessarily wasting a bit here and there leads to a death by a thousand paper cuts :)

fhahn added a comment.Feb 1 2019, 5:57 AM

(I am saying that after recently investigating a few cases of hour-long compile times on user code that were caused by unnecessary scanning)

Ayal added a comment.Feb 1 2019, 8:04 AM

I plan on having a look later this week. I am a little worried that the checks in-line here are already quite complex and I would like to have a think if that could be improved in some way.

I agree; The algorithm makes sure that we visit everything between B and A, including C, before we visit A; so we have a chance to identify the (potentially) interfering store C before we reach A; This is what allows the algorithm to only compare the pairs (A,B) without having each time to also scan everything in between.

So I think the bug is that when we visited C, and found that it could be inserted into B's group dependence-wise, but wasn't inserted due to other reasons, we should have either:

  1. Invalidated the group (which is over aggressive but better than wrong code)
  2. Recorded in B's Group the index where C could be inserted, to "burn" that index from allowing some other instruction A to become a group member at that index; so when we reach A we see its spot is taken. (I think this will have the same effect as the proposed patch but without the extra scan.)
  3. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).

If you don't like the current approach, I agree 2) achieves the same thing, with extra bookkeeping (or extra state in the existing bookkeeping). I think 1) is too conservative. Even if C is next to B, B's group can still be extended to the other direction. 3) should be done separately from the bug fix. Anyway, do we ever deal with so many loads/stores for this efficiency to avoid extra scanning to actually matter? I'm just curious.

Rather than over aggressive or too conservative, 1) seems to match the current behavior which forbids store groups with gaps; extending in the other direction will also break the vector WAW dependence, right? 2) could potentially "burn" the index with minimal extra bookkeeping or state by inserting a nullptr in its place; in any case, it's worth doing only when/after introducing support for store groups with gaps.

hsaito added a comment.Feb 1 2019, 9:45 AM

(I am saying that after recently investigating a few cases of hour-long compile times on user code that were caused by unnecessary scanning)

Fair enough. If the size based threshold is not in place in the interleaved access optimization, that would be a good defensive improvement. Can be separate from this bug fix, though.

hsaito added a comment.Feb 1 2019, 9:48 AM

Rather than over aggressive or too conservative, 1) seems to match the current behavior which forbids store groups with gaps; extending in the other direction will also break the vector WAW dependence, right?

OK. Hitting the gap case, yes. Else, the other direction should also hit WAW dep.

in any case, it's worth doing only when/after introducing support for store groups with gaps.

Agree.

Ayal added a comment.Feb 1 2019, 1:14 PM

Also, regarding

  1. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).

consider eliminating such WAW dependencies by sinking the stores and folding them into one, producing a single interleave group (by a future, separate patch).

(The issue here is somewhat reminiscent of the WAW dependence caused by multiple invariant stores, as in https://reviews.llvm.org/D54538, but there the dependence is both inside and across loop iterations.)

Ayal added a comment.Jun 28 2019, 6:08 AM

Here's is an old draft along the lines discussed above, probably deserves some updating or clean ups:

Index: include/llvm/Analysis/VectorUtils.h
===================================================================
--- include/llvm/Analysis/VectorUtils.h	(revision 352559)
+++ include/llvm/Analysis/VectorUtils.h	(working copy)
@@ -303,6 +303,35 @@
     return true;
   }
 
+  /// Check if a new member \p Instr can be inserted with index \p Index and
+  /// alignment \p NewAlign. The index is related to the leader and it could be
+  /// negative if it is the new leader.
+  ///
+  /// \returns false if the instruction doesn't belong to the group.
+  bool insertableMember(InstTy *Instr, int Index, unsigned NewAlign) const {
+    assert(NewAlign && "The new member's alignment should be non-zero");
+
+    int Key = Index + SmallestKey;
+
+    // Skip if there is already a member with the same index.
+    if (Members.find(Key) != Members.end())
+      return false;
+
+    if (Key > LargestKey) {
+      // The largest index is always less than the interleave factor.
+      if (Index >= static_cast<int>(Factor))
+        return false;
+
+    } else if (Key < SmallestKey) {
+      // The largest index is always less than the interleave factor.
+      if (LargestKey - Key >= static_cast<int>(Factor))
+        return false;
+
+    }
+
+    return true;
+  }
+
   /// Get the member with the given index \p Index
   ///
   /// \returns nullptr if contains no such member.
Index: lib/Analysis/VectorUtils.cpp
===================================================================
--- lib/Analysis/VectorUtils.cpp	(revision 352559)
+++ lib/Analysis/VectorUtils.cpp	(working copy)
@@ -936,8 +936,23 @@
       BasicBlock *BlockA = A->getParent();  
       BasicBlock *BlockB = B->getParent();  
       if ((isPredicated(BlockA) || isPredicated(BlockB)) &&
-          (!EnablePredicatedInterleavedMemAccesses || BlockA != BlockB))
+          (!EnablePredicatedInterleavedMemAccesses || BlockA != BlockB)) {
+        // If A could be inserted into B's group but is not, prevent a potential
+        // output dependent unpredicated store from taking its place.
+        // TODO: mark IndexA "taken", instead of breaking the group, when store
+        // groups with gaps are supported.
+        if (Group) {
+          int IndexA =
+              Group->getIndex(B) + DistanceToB / static_cast<int64_t>(DesB.Size);
+          if (Group->insertableMember(A, IndexA, DesA.Align)) {
+            LLVM_DEBUG(dbgs() << "LV: Detected candidate:" << *A << '\n'
+                              << "    preventing another for interleave group"
+                              << " with" << *B << '\n');
+            break;
+          }
+        }
         continue;
+      }
 
       // The index of A is the index of B plus A's distance to B in multiples
       // of the size.

Original miscompile is expected to be fixed by:

commit 71b98e0841b13cc9848327e69f531efd1e294592
Author: Hideki Saito <hideki.saito@intel.com>
Date: Fri Aug 2 06:31:50 2019 +0000

[LV] Avoid building interleaved group in presence of WAW dependency

@anna Abandon this now that D63981 has landed at rL367654 ?

fhahn requested changes to this revision.Nov 28 2019, 8:51 AM

I think this patch was superseded by D63981, which landed a while ago. Marking as requiring changes, to remove it from the review queue. @anna, it would be great if you could take a look and abandon the revision, unless it is still relevant

This revision now requires changes to proceed.Nov 28 2019, 8:51 AM
anna added a comment.Dec 4 2019, 11:11 AM

hi, thanks guys. Sorry for the abnormally late response (was on mat leave). Yes, looks like this can be abandoned.

anna abandoned this revision.Dec 4 2019, 11:11 AM