This is an archive of the discontinued LLVM Phabricator instance.

[Polly] Canonicalize arrays according to base-ptr equivalence class
ClosedPublic

Authored by grosser on Jan 10 2017, 8:02 AM.

Details

Summary
In case two arrays share base pointers in the same invariant load equivalence
class, we canonicalize all memory accesses to the first of these arrays
(according to their order in the equivalence class).

This enables us to optimize kernels such as boost::ublas by ensuring that
different references to the C array are interpreted as accesses to the same
array. Before this change the runtime alias check for ublas would fail, as it
would assume models of the C array with differing (but identically valued) base
pointers would reference distinct regions of memory whereas the referenced
memory regions were indeed identical.

As part of this change we remove most of the MemoryAccess::get*BaseAddr
interface. We removed already all references to get*BaseAddr in previous
commits to ensure that no code relies on matching base pointers between
memory accesses and scop arrays -- except for three remaining uses where we
need the original base pointer. We document for these situations that
MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct
to the base pointer of the scop array referenced by this memory access.

Event Timeline

grosser updated this revision to Diff 83810.Jan 10 2017, 8:02 AM
grosser retitled this revision from to Unify arrays according to base-ptr equivalence class.
grosser updated this object.
grosser updated this revision to Diff 83922.Jan 10 2017, 11:41 PM

Add test case

Meinersbur edited edge metadata.Jan 11 2017, 1:14 AM

My suggestion is to modify getOrCreateScopArrayInfo which before creating a new ScopArrayInfo, looks up the canonical load from InvEquivClassVMap.

include/polly/ScopInfo.h
1840

incorrect may be not the correct word as there is no guarantee that two base pointers are different anyway. In that sense a program with two base ptrs are ony correct if AA says they are disjoint. This is why we have alias checks.

1842

I'd interpret the future tense (\will\) as something that is going to happen later in the pipeline. This patch, however, removes this possibility. Can you make it clearer that it is something that would happen without this step?

lib/Analysis/ScopInfo.cpp
3692

unify sounds like after this there will be only one indirect array.

I am also not sure whether indirect array is the correct term as there is only one array whose pointer happens to be loaded twice. (The assertion you commented out thinks it is as soon as there is a dynamic load)

Alternative name suggestion: canonicalizeDynamicBasePtrs

3693–3694

We might separate the analysis part and the transformation part of invariant load hoisting, such that unifyIndirectArrays can be activated independently of invariant load hoisting.

3702

Three possible alternatives:

for (auto Pair : enumerate(EqClass.InvariantAccesses))
  if (Pair.Index == 0)
bool IsFirst = true;
for (...) 
  if (IsFirst) {
    IsFirst = False;
  }
if (InvAccess == EqClass.InvariantAccesses.front())
3705–3709

Isn't there CanonicalAccess->getScopArrayInfo(). If it hasn't been computed yet, could we either call unifyIndirectArrays() after it has been computed or compute it earlier?

3716–3724

Duplication. Refactor-out?

3732

This marks the access relation as been replaced as if done by eg. JSONImporter, the original still being accessible. The code generator has a different code path if a new access relation has been set. Is this intended?

grosser marked 7 inline comments as done.Jan 17 2017, 7:35 AM

Hi Michael,

I updated this patch according to your comments and added some good test coverage. With r292213 this patch works as intended and it seems reasonable clean.

There are however two open points:

  1. I did not yet check if the very same thing could be implemented better in getOrCreateScopArrayInfo()
  1. I still want to check if we can set the original access function, instead of setting a new one. However, this will require a closer review of the basepointer handling in Polly.

@gareevroman : This patch should be good enough for your experiments, but it still needs some more thoughts before being able to be committed.

include/polly/ScopInfo.h
1840

I reformulated this and expanded the comment.

1842

Reformulated. Please see if this now reads better.

lib/Analysis/ScopInfo.cpp
3692

I like the new name -> changed.

3693–3694

Actually, this check is not needed. If invariant load hoisting is disabled, the InvariantEquivClasses array will be empty and consequently this function won't do anything. I dropped the check.

3702

Very nice. Thank you for these ideas. I especially like the first one. I changed my code to the first one, but then moved to even another solution: copying the list of accesses and dropping the first one explicitly. This has the minor cost of copying the full access list. However, this cost is likely very small and the resulting code looks very readable.

3705–3709

Not sure what you have in mind here. CanonicalSAI is not the ScopArrayInfo of the CanonicalAccess, but its the SAI which has as basepointer the load in CanonoicalSAI.

I introduced now a function getScopArrayInfoOrNull() similarly to the already existing getScopArrayInfo and just use this instead of the loop above.

3716–3724

I also moved this to getScopArrayInfoOrNull()

3732

This was intended.

If I just reset the original access relation, the following check starts
to fail:

getOriginalScopArrayInfo()->getBasePtr() == BaseAddr'

We should probably review this invariant, but there are several parts of Polly that rely on this behavior. I still need to look at all of them to understand if they behavior is correct:

lib/Analysis/ScopInfo.cpp:    auto *SAI = S.getOrCreateScopArrayInfo(Access->getBaseAddr(), ElementType,
lib/Analysis/ScopInfo.cpp:    PHINode *PHI = cast<PHINode>(Access->getBaseAddr());
lib/Analysis/ScopInfo.cpp:  auto *BaseAddr = SE->getSCEV(MA->getBaseAddr());
lib/Analysis/ScopInfo.cpp:  auto *BaseAddr = SE->getSCEV(MA->getBaseAddr());
lib/Analysis/ScopInfo.cpp:        HasWriteAccess.insert(MA->getBaseAddr());
lib/Analysis/ScopInfo.cpp:    Value *BaseAddr = Access->getBaseAddr();
lib/Analysis/ScopInfo.cpp:      auto &SAI = ScopArrayInfoMap[std::make_pair(Access->getBaseAddr(),
lib/CodeGen/IRBuilder.cpp:        BasePtrs.insert(MA->getBaseAddr());
lib/CodeGen/IslAst.cpp:          ", " + MA->getBaseAddr()->getName().str();
lib/CodeGen/BlockGenerators.cpp:    return getOrCreateScalarAlloca(Access.getBaseAddr());
lib/CodeGen/BlockGenerators.cpp:    return getOrCreatePHIAlloca(Access.getBaseAddr());
lib/CodeGen/BlockGenerators.cpp:    return getOrCreatePHIAlloca(Access.getBaseAddr());
lib/CodeGen/BlockGenerators.cpp:    return getOrCreateScalarAlloca(Access.getBaseAddr());
lib/CodeGen/BlockGenerators.cpp:    BBMap[MA->getBaseAddr()] =
lib/CodeGen/BlockGenerators.cpp:    VectorBlockMap[MA->getBaseAddr()] = VectorVal;
lib/CodeGen/IslNodeBuilder.cpp:      if (BasePtr == MA->getBaseAddr()) {
grosser updated this revision to Diff 84671.Jan 17 2017, 7:35 AM
grosser marked 7 inline comments as done.

Addressed Michael's review comments.

grosser retitled this revision from Unify arrays according to base-ptr equivalence class to [Polly] Unify arrays according to base-ptr equivalence class.Feb 2 2017, 2:18 AM
grosser added a project: Restricted Project.
grosser updated this revision to Diff 87976.Feb 10 2017, 3:36 AM

Update to r294734.

We now also modify the original access relation rather than setting a new
access relation. To make this possible we committed a variety of changes that
removed Polly's dependence on MemoryAccess::getBaseAddr() in previous commits.

grosser edited the summary of this revision. (Show Details)Feb 10 2017, 3:38 AM

Can you rebase to trunk? There is a use of BasePtr left in IslNodeBuilder.cpp that I don't know to resolve (And look like an adhoc-modification of the (Original)BasePtr, something you tried to avoid in this patch by using setAccessRelation)

Is it possible to remove getOriginalBaseAddr() as well? All remaining uses seem to be related to invariant load hoisting, i.e. should be internal to that algorithm, not public API.

lib/Analysis/ScopInfo.cpp
3698–3705

The loop seems to be mainly about finding a canonical ScopArrayInfo, but also modifies the Accesses array of elements that will be skipped afterwards anyway. I think there is no reason to do that:

  • Execution time is worse because you need to copy the array first.
  • More importantly, it makes the code less readable. Now we have two location that effectively do the same thing: Lines 3705 and 3719. It takes some time to understand that.
  • Maintainability suffers as well as the two location need to stay in sync.
  • The probability of bugs increases as well. There are two locations that need to be implemented correctly.
test/Isl/CodeGen/invariant_load_unify_arrays.ll
1 ↗(On Diff #87976)

The file's name still contains unify

5 ↗(On Diff #87976)

How does this check whether %baseA and %baseB have been canonicalized?

I would think we'd check whether both stores will access the canonical base pointer, %baseB.

test/ScopInfo/invariant_load_unify_arrays.ll
1 ↗(On Diff #87976)

The IR looks identical to the one in CodeGen. I understand that one is testing -polly-scops -analyze , the other -polly-codegen -S. However, since they are both testing the same feature, and for the sake of maintainability, wouldn't it be better to put both into the same file?

grosser retitled this revision from [Polly] Unify arrays according to base-ptr equivalence class to [Polly] Canonicalize arrays according to base-ptr equivalence class.May 8 2017, 12:54 AM
grosser updated this revision to Diff 98133.May 8 2017, 12:56 AM
grosser marked 4 inline comments as done and 2 inline comments as done.
grosser edited the summary of this revision. (Show Details)

Rebased and addressed some final comments.

Hi Michael, hi Roman,

this patch works cleanly on trunk. I believe I addressed most, but not all open comments. Michael had some last ideas regarding how some code could be improved stylewise. I thought for a while about it, but could not come up with good ideas immediately. Hence, I just rebased the patch to make sure it runs on trunk. This hopefully makes it easy to read over it again and play with ideas of how to improve readability. While I think the patch is already well readable, I am glad to incorporate good ideas to make it even more readable.

Thanks for your feedback,
Tobias

lib/Analysis/ScopInfo.cpp
3698–3705

I am not fully sure how to improve the code, given that we may not
always choose the very first load as canonical array. The cost of copying the arrays is likely minor, but readability and understandability are strong arguments. I wonder if -- after the refactoring -- you happen to have a good idea of how to simplify this code?

3732

This has been changed. We now directly change the access relation.

test/Isl/CodeGen/invariant_load_unify_arrays.ll
1 ↗(On Diff #87976)

Renamed to invariant_load_canonicalize_array_baseptrs.ll

5 ↗(On Diff #87976)

I added CHECK-NEXT lines.

test/ScopInfo/invariant_load_unify_arrays.ll
1 ↗(On Diff #87976)

Right. The problem with just dropping this test case is that we then have -polly-scops failures reported in test/CodeGen. I think it is a lot nicer to see directly from failing test cases if something in scop modeling or in code generation broke. Obviously this comes at a price of redundant test inputs. It seems to be a question of preference / tradeoffs. I generally tried to follow the rule to run only passes with the name of the test directory in the directory. However, neither was I consistent and I think Johannes also followed a different approach.

Not sure what to do.

Meinersbur accepted this revision.May 9 2017, 6:25 AM

Thanks for the hand-crafted test cases.

include/polly/ScopInfo.h
801–817

Is this change related?

lib/Analysis/ScopInfo.cpp
3698–3705

Suggestion:

static const ScopArrayInfo *findCanonicalArray(Scop *S,
                                               MemoryAccessList &Accesses) {
  for (MemoryAccess *Access : Accesses) {
    const ScopArrayInfo *CanonicalArray = S->getScopArrayInfoOrNull(
        Access->getAccessInstruction(), MemoryKind::Array);
    if (CanonicalArray)
      return CanonicalArray;
  }
  return nullptr;
}

static bool isUsedForIndirectHoistedLoad(Scop *S, const ScopArrayInfo *Array) {
  for (InvariantEquivClassTy &EqClass2 : S->getInvariantAccesses())
    for (MemoryAccess *Access2 : EqClass2.InvariantAccesses)
      if (Access2->getScopArrayInfo() == Array)
        return true;
  return false;
}

static void replaceBasePtrArrays(Scop *S, const ScopArrayInfo *Old,
                                 const ScopArrayInfo *New) {
  for (ScopStmt &Stmt : *S)
    for (MemoryAccess *Access : Stmt) {
      if (Access->getLatestScopArrayInfo() != Old)
        continue;

      isl_id *Id = New->getBasePtrId();
      isl_map *Map = Access->getAccessRelation();
      Map = isl_map_set_tuple_id(Map, isl_dim_out, Id);
      Access->setAccessRelation(Map);
    }
}

void Scop::canonicalizeDynamicBasePtrs() {
  for (InvariantEquivClassTy &EqClass : InvariantEquivClasses) {
    MemoryAccessList &BasePtrAccesses = EqClass.InvariantAccesses;

    const ScopArrayInfo *CanonicalBasePtrSAI =
        findCanonicalArray(this, BasePtrAccesses);
    if (!CanonicalBasePtrSAI)
      continue;

    for (MemoryAccess *BasePtrAccess : BasePtrAccesses) {
      const ScopArrayInfo *BasePtrSAI = getScopArrayInfoOrNull(
          BasePtrAccess->getAccessInstruction(), MemoryKind::Array);
      if (!BasePtrSAI || BasePtrSAI == CanonicalBasePtrSAI ||
          !BasePtrSAI->isCompatibleWith(CanonicalBasePtrSAI))
        continue;

      if (isUsedForIndirectHoistedLoad(this, BasePtrSAI))
        continue;

      replaceBasePtrArrays(this, BasePtrSAI, CanonicalBasePtrSAI);
    }
  }
}
3719–3728

A comment about what this is supposed to do would be nice. I know it is related to test/ScopInfo/invariant_load_canonicalize_array_baseptrs_5.ll, but I think an explanation here would be even more important than in the test case.

3731

There are two local variables with the name Access.

3781

Prefer lookup. operator[] creates an entry every time this return Null.

test/Isl/CodeGen/invariant_load_unify_arrays.ll
5 ↗(On Diff #87976)

Where?

test/ScopInfo/invariant_load_canonicalize_array_baseptrs_4.ll
5 ↗(On Diff #98133)

"coalesced" -> "canonicalized"?

test/ScopInfo/invariant_load_canonicalize_array_baseptrs_5.ll
6 ↗(On Diff #98133)

unify -> canonicalize

7 ↗(On Diff #98133)

Why only "probably"?

12 ↗(On Diff #98133)

Could you mention that %ptr (a hoisted load, itself requiring that %baseA2 is hoisted) is the culprit here?

58–61 ↗(On Diff #98133)

To ensure a use of %v0, %v1, would

store float* undef, float** %ptr
store float* undef, float** %baseA2

be enough? In that case, one might peel one * off the types as use

store float 0.0, float* %ptr
store float 0.0, float* %baseA2
65 ↗(On Diff #98133)

Why is %baseB2 based on %A? Would eg. %baseA3 (or %baseA1, which seems to be missing) be a better name?

This revision is now accepted and ready to land.May 9 2017, 6:25 AM
This revision was automatically updated to reflect the committed changes.
grosser marked 8 inline comments as done.