This is an archive of the discontinued LLVM Phabricator instance.

[LoopInterchange] Simplify DepMatrix to a dependency vector.
AbandonedPublic

Authored by Meinersbur on Oct 19 2022, 11:08 AM.

Details

Reviewers
bmahjour
fhahn
congzhe
Group Reviewers
Restricted Project
Summary

It is not necessary to store all pairwise dependencies when it is sufficient to know which kind of dependencies occured.

This might have some behavioral effects such as when two dependencies can be interchange on their own but combined into a single loop nest they can not. Example:
Dependence 1: [<,=,>] (carried by outer loop)
Dependence 2: [=,<,>] (carried by inner loop)
when combined:
[<=,<=,>]
which would not be considered interchangeable.

In any case, all of our current tests pass after this change and is significantly more efficient (still quadratic in the number of memory instructions, because we have to query DA for each pair)

Diff Detail

Event Timeline

Meinersbur created this revision.Oct 19 2022, 11:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2022, 11:08 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
Meinersbur requested review of this revision.Oct 19 2022, 11:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2022, 11:08 AM
bmahjour requested changes to this revision.Nov 1 2022, 11:21 AM

This would collapse two dimensions of the dependency matrix into one without providing a way to recover the individual dependency vectors, and as you pointed out in the description it could make some interchangeable cases look illegal. Another example is if we have [S, >, = ] and [S, <, =], we can safely interchange the two inner loops (by factoring out the S from the beginning of both vectors), but after merging them we get [S, <>, = ] and now the inner two loops can no longer be interchanged.

Is the size of dependency matrix proving to be problematic for certain workloads? If so, maybe looking at some concrete examples can help solve the problem in a more effective way (eg. considering sparse data structures or some sort of compression/decompression techniques).

On a different note, I think we can make the code more elegant and robust by encapsulating all the dependency matrix analysis inside a class. For example:

class DepMatrix {
public:
  /// Enumerates the possible kind of dependency in a dependence vector.
  enum class DepKind : char {
    NEG, 
    EQ,  
    POS,
    S,
    STAR
  };

  /// Represents a row of the dependency matrix.
  using DepVector = std::vector<DepKind>;
  ...

  /// Update the dependence matrix by exchanging two columns.
  void interchangeColumns(const Loop &A, const Loop &B);
  
  bool isPositive(int row);
  ...
private:
  /// The dependency matrix is represented by a vector of dependency vectors.
  std::vector<DepVector> DM;
};
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
84

why does this need to be reduced so much?

This revision now requires changes to proceed.Nov 1 2022, 11:21 AM
congzhe added inline comments.Nov 1 2022, 12:59 PM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
154

typo?

179

Please correct me if I'm wrong - IIUC Scalar is considered a type of dependence that prevents interchange?

208

If Scalar is considered a type of dependence that prevents interchange, would isAnydirectional() be the following?

return ((F & LT) && (F & GT)) || (F & Scalar);
Meinersbur edited the summary of this revision. (Show Details)Nov 1 2022, 2:42 PM

Just one more question on the summary: when combining the two dependencies [<,=] and [=,<] that generated [<=,<=], I think we could still interchange the combined dependency? [<=,<=] is non-negative and swapping the two elements <= and <= gives a non-negative dependency. Thus we are not converting a positive dependency to a negative dependency, hence not violating the dependency rules for a valid interchange. Perhaps with the current validDepInterchange() we cannot interchange [<=,<=], but like I mentioned in the past meetings I'm rewriting validDepInterchange() to make the legality decisions be purely based on the signs of the dependence vectors before/after interchange. With the rewritten validDepInterchange() I think we can interchange [<=,<=].

My apologies for not being able to post the patch that rewrites validDepInterchange() sooner. We are actively working on it recently and now it seems there is a real need for this patch. We'll post the patch as soon as we can.

Is the size of dependency matrix proving to be problematic for certain workloads?

I am not aware of concrete problems, I mostly felt that it would not be necessary to always go through the entire list of pairwise dependencies and instead could be summarized since which instructions induce the dependencies is irrelevant.

However, due to experimenting with it I now understand that there indeed is a difference than what is done now, although I am not sure whether that situation is sufficiently common.

On a different note, I think we can make the code more elegant and robust by encapsulating all the dependency matrix analysis inside a class. For example:

class DepMatrix {
....

Could we call it something else than a matrix? In a matrix, I'd expect the indices have some relevance, here the outer dimension is an unordered collection (DenseSet<SmallString<4>>? That would also de-duplicate identical dependence vectors and thus less work later. Or just a flat array SmallVector<char> managed by DepMatrix). Also, std::vector of std::vectors is not the most elegant, esp when resizing.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
84

Without this patch, MaxMemInstrCount does not compare to the number of instructions, but the number of instruction pairs (that have a dependency). That is, the constant name and the comment above it are inconsistent.

Since in worst case, there can be a quadratic number of pairs, I just took the square of it.

179

The original code explicitly ignores scalar types (DepMatrix[Row][i] != 'S')

congzhe added inline comments.Nov 1 2022, 6:05 PM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
179

Right, but I remember earlier this year when discussing a few related bugs we reached to a conclusion that it is not correct to ignore scalar types (in other words the current code is wrong)? Anyways I'm rewriting the whole validDepInterchange() so it may not be too big of a concern for now.

Meinersbur edited the summary of this revision. (Show Details)Nov 2 2022, 8:11 AM
Meinersbur abandoned this revision.EditedNov 2 2022, 8:30 AM

Abandon as discussed in the LoopWG call. It coarsens the analysis when compile time wasn't even a problem.