In case two arrays share base pointers in the same invariant load equivalence class, we canonicalize all memory accesses to the first of these arrays (according to their order in the equivalence class). This enables us to optimize kernels such as boost::ublas by ensuring that different references to the C array are interpreted as accesses to the same array. Before this change the runtime alias check for ublas would fail, as it would assume models of the C array with differing (but identically valued) base pointers would reference distinct regions of memory whereas the referenced memory regions were indeed identical. As part of this change we remove most of the MemoryAccess::get*BaseAddr interface. We removed already all references to get*BaseAddr in previous commits to ensure that no code relies on matching base pointers between memory accesses and scop arrays -- except for three remaining uses where we need the original base pointer. We document for these situations that MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct to the base pointer of the scop array referenced by this memory access.
Details
- Reviewers
sebpop Meinersbur • zinob gareevroman pollydev huihuiz efriedma jdoerfert - Commits
- rGf3adab4c20c6: [Polly] Canonicalize arrays according to base-ptr equivalence class
rPLO302636: [Polly] Canonicalize arrays according to base-ptr equivalence class
rL302636: [Polly] Canonicalize arrays according to base-ptr equivalence class
Diff Detail
- Build Status
Buildable 2777 Build 2777: arc lint + arc unit
Event Timeline
My suggestion is to modify getOrCreateScopArrayInfo which before creating a new ScopArrayInfo, looks up the canonical load from InvEquivClassVMap.
include/polly/ScopInfo.h | ||
---|---|---|
1840 | incorrect may be not the correct word as there is no guarantee that two base pointers are different anyway. In that sense a program with two base ptrs are ony correct if AA says they are disjoint. This is why we have alias checks. | |
1842 | I'd interpret the future tense (\will\) as something that is going to happen later in the pipeline. This patch, however, removes this possibility. Can you make it clearer that it is something that would happen without this step? | |
lib/Analysis/ScopInfo.cpp | ||
3692 | unify sounds like after this there will be only one indirect array. I am also not sure whether indirect array is the correct term as there is only one array whose pointer happens to be loaded twice. (The assertion you commented out thinks it is as soon as there is a dynamic load) Alternative name suggestion: canonicalizeDynamicBasePtrs | |
3693–3694 | We might separate the analysis part and the transformation part of invariant load hoisting, such that unifyIndirectArrays can be activated independently of invariant load hoisting. | |
3702 | Three possible alternatives: for (auto Pair : enumerate(EqClass.InvariantAccesses)) if (Pair.Index == 0) bool IsFirst = true; for (...) if (IsFirst) { IsFirst = False; } if (InvAccess == EqClass.InvariantAccesses.front()) | |
3705–3709 | Isn't there CanonicalAccess->getScopArrayInfo(). If it hasn't been computed yet, could we either call unifyIndirectArrays() after it has been computed or compute it earlier? | |
3716–3724 | Duplication. Refactor-out? | |
3732 | This marks the access relation as been replaced as if done by eg. JSONImporter, the original still being accessible. The code generator has a different code path if a new access relation has been set. Is this intended? |
Hi Michael,
I updated this patch according to your comments and added some good test coverage. With r292213 this patch works as intended and it seems reasonable clean.
There are however two open points:
- I did not yet check if the very same thing could be implemented better in getOrCreateScopArrayInfo()
- I still want to check if we can set the original access function, instead of setting a new one. However, this will require a closer review of the basepointer handling in Polly.
@gareevroman : This patch should be good enough for your experiments, but it still needs some more thoughts before being able to be committed.
include/polly/ScopInfo.h | ||
---|---|---|
1840 | I reformulated this and expanded the comment. | |
1842 | Reformulated. Please see if this now reads better. | |
lib/Analysis/ScopInfo.cpp | ||
3692 | I like the new name -> changed. | |
3693–3694 | Actually, this check is not needed. If invariant load hoisting is disabled, the InvariantEquivClasses array will be empty and consequently this function won't do anything. I dropped the check. | |
3702 | Very nice. Thank you for these ideas. I especially like the first one. I changed my code to the first one, but then moved to even another solution: copying the list of accesses and dropping the first one explicitly. This has the minor cost of copying the full access list. However, this cost is likely very small and the resulting code looks very readable. | |
3705–3709 | Not sure what you have in mind here. CanonicalSAI is not the ScopArrayInfo of the CanonicalAccess, but its the SAI which has as basepointer the load in CanonoicalSAI. I introduced now a function getScopArrayInfoOrNull() similarly to the already existing getScopArrayInfo and just use this instead of the loop above. | |
3716–3724 | I also moved this to getScopArrayInfoOrNull() | |
3732 | This was intended. If I just reset the original access relation, the following check starts getOriginalScopArrayInfo()->getBasePtr() == BaseAddr' We should probably review this invariant, but there are several parts of Polly that rely on this behavior. I still need to look at all of them to understand if they behavior is correct: lib/Analysis/ScopInfo.cpp: auto *SAI = S.getOrCreateScopArrayInfo(Access->getBaseAddr(), ElementType, lib/Analysis/ScopInfo.cpp: PHINode *PHI = cast<PHINode>(Access->getBaseAddr()); lib/Analysis/ScopInfo.cpp: auto *BaseAddr = SE->getSCEV(MA->getBaseAddr()); lib/Analysis/ScopInfo.cpp: auto *BaseAddr = SE->getSCEV(MA->getBaseAddr()); lib/Analysis/ScopInfo.cpp: HasWriteAccess.insert(MA->getBaseAddr()); lib/Analysis/ScopInfo.cpp: Value *BaseAddr = Access->getBaseAddr(); lib/Analysis/ScopInfo.cpp: auto &SAI = ScopArrayInfoMap[std::make_pair(Access->getBaseAddr(), lib/CodeGen/IRBuilder.cpp: BasePtrs.insert(MA->getBaseAddr()); lib/CodeGen/IslAst.cpp: ", " + MA->getBaseAddr()->getName().str(); lib/CodeGen/BlockGenerators.cpp: return getOrCreateScalarAlloca(Access.getBaseAddr()); lib/CodeGen/BlockGenerators.cpp: return getOrCreatePHIAlloca(Access.getBaseAddr()); lib/CodeGen/BlockGenerators.cpp: return getOrCreatePHIAlloca(Access.getBaseAddr()); lib/CodeGen/BlockGenerators.cpp: return getOrCreateScalarAlloca(Access.getBaseAddr()); lib/CodeGen/BlockGenerators.cpp: BBMap[MA->getBaseAddr()] = lib/CodeGen/BlockGenerators.cpp: VectorBlockMap[MA->getBaseAddr()] = VectorVal; lib/CodeGen/IslNodeBuilder.cpp: if (BasePtr == MA->getBaseAddr()) { |
Update to r294734.
We now also modify the original access relation rather than setting a new
access relation. To make this possible we committed a variety of changes that
removed Polly's dependence on MemoryAccess::getBaseAddr() in previous commits.
Can you rebase to trunk? There is a use of BasePtr left in IslNodeBuilder.cpp that I don't know to resolve (And look like an adhoc-modification of the (Original)BasePtr, something you tried to avoid in this patch by using setAccessRelation)
Is it possible to remove getOriginalBaseAddr() as well? All remaining uses seem to be related to invariant load hoisting, i.e. should be internal to that algorithm, not public API.
lib/Analysis/ScopInfo.cpp | ||
---|---|---|
3698–3705 | The loop seems to be mainly about finding a canonical ScopArrayInfo, but also modifies the Accesses array of elements that will be skipped afterwards anyway. I think there is no reason to do that:
| |
test/Isl/CodeGen/invariant_load_unify_arrays.ll | ||
1 ↗ | (On Diff #87976) | The file's name still contains unify |
5 ↗ | (On Diff #87976) | How does this check whether %baseA and %baseB have been canonicalized? I would think we'd check whether both stores will access the canonical base pointer, %baseB. |
test/ScopInfo/invariant_load_unify_arrays.ll | ||
1 ↗ | (On Diff #87976) | The IR looks identical to the one in CodeGen. I understand that one is testing -polly-scops -analyze , the other -polly-codegen -S. However, since they are both testing the same feature, and for the sake of maintainability, wouldn't it be better to put both into the same file? |
Hi Michael, hi Roman,
this patch works cleanly on trunk. I believe I addressed most, but not all open comments. Michael had some last ideas regarding how some code could be improved stylewise. I thought for a while about it, but could not come up with good ideas immediately. Hence, I just rebased the patch to make sure it runs on trunk. This hopefully makes it easy to read over it again and play with ideas of how to improve readability. While I think the patch is already well readable, I am glad to incorporate good ideas to make it even more readable.
Thanks for your feedback,
Tobias
lib/Analysis/ScopInfo.cpp | ||
---|---|---|
3698–3705 | I am not fully sure how to improve the code, given that we may not | |
3732 | This has been changed. We now directly change the access relation. | |
test/Isl/CodeGen/invariant_load_unify_arrays.ll | ||
1 ↗ | (On Diff #87976) | Renamed to invariant_load_canonicalize_array_baseptrs.ll |
5 ↗ | (On Diff #87976) | I added CHECK-NEXT lines. |
test/ScopInfo/invariant_load_unify_arrays.ll | ||
1 ↗ | (On Diff #87976) | Right. The problem with just dropping this test case is that we then have -polly-scops failures reported in test/CodeGen. I think it is a lot nicer to see directly from failing test cases if something in scop modeling or in code generation broke. Obviously this comes at a price of redundant test inputs. It seems to be a question of preference / tradeoffs. I generally tried to follow the rule to run only passes with the name of the test directory in the directory. However, neither was I consistent and I think Johannes also followed a different approach. Not sure what to do. |
Thanks for the hand-crafted test cases.
include/polly/ScopInfo.h | ||
---|---|---|
801–817 | Is this change related? | |
lib/Analysis/ScopInfo.cpp | ||
3698–3705 | Suggestion: static const ScopArrayInfo *findCanonicalArray(Scop *S, MemoryAccessList &Accesses) { for (MemoryAccess *Access : Accesses) { const ScopArrayInfo *CanonicalArray = S->getScopArrayInfoOrNull( Access->getAccessInstruction(), MemoryKind::Array); if (CanonicalArray) return CanonicalArray; } return nullptr; } static bool isUsedForIndirectHoistedLoad(Scop *S, const ScopArrayInfo *Array) { for (InvariantEquivClassTy &EqClass2 : S->getInvariantAccesses()) for (MemoryAccess *Access2 : EqClass2.InvariantAccesses) if (Access2->getScopArrayInfo() == Array) return true; return false; } static void replaceBasePtrArrays(Scop *S, const ScopArrayInfo *Old, const ScopArrayInfo *New) { for (ScopStmt &Stmt : *S) for (MemoryAccess *Access : Stmt) { if (Access->getLatestScopArrayInfo() != Old) continue; isl_id *Id = New->getBasePtrId(); isl_map *Map = Access->getAccessRelation(); Map = isl_map_set_tuple_id(Map, isl_dim_out, Id); Access->setAccessRelation(Map); } } void Scop::canonicalizeDynamicBasePtrs() { for (InvariantEquivClassTy &EqClass : InvariantEquivClasses) { MemoryAccessList &BasePtrAccesses = EqClass.InvariantAccesses; const ScopArrayInfo *CanonicalBasePtrSAI = findCanonicalArray(this, BasePtrAccesses); if (!CanonicalBasePtrSAI) continue; for (MemoryAccess *BasePtrAccess : BasePtrAccesses) { const ScopArrayInfo *BasePtrSAI = getScopArrayInfoOrNull( BasePtrAccess->getAccessInstruction(), MemoryKind::Array); if (!BasePtrSAI || BasePtrSAI == CanonicalBasePtrSAI || !BasePtrSAI->isCompatibleWith(CanonicalBasePtrSAI)) continue; if (isUsedForIndirectHoistedLoad(this, BasePtrSAI)) continue; replaceBasePtrArrays(this, BasePtrSAI, CanonicalBasePtrSAI); } } } | |
3719–3728 | A comment about what this is supposed to do would be nice. I know it is related to test/ScopInfo/invariant_load_canonicalize_array_baseptrs_5.ll, but I think an explanation here would be even more important than in the test case. | |
3731 | There are two local variables with the name Access. | |
3781 | Prefer lookup. operator[] creates an entry every time this return Null. | |
test/Isl/CodeGen/invariant_load_unify_arrays.ll | ||
5 ↗ | (On Diff #87976) | Where? |
test/ScopInfo/invariant_load_canonicalize_array_baseptrs_4.ll | ||
5 ↗ | (On Diff #98133) | "coalesced" -> "canonicalized"? |
test/ScopInfo/invariant_load_canonicalize_array_baseptrs_5.ll | ||
6 ↗ | (On Diff #98133) | unify -> canonicalize |
7 ↗ | (On Diff #98133) | Why only "probably"? |
12 ↗ | (On Diff #98133) | Could you mention that %ptr (a hoisted load, itself requiring that %baseA2 is hoisted) is the culprit here? |
58–61 ↗ | (On Diff #98133) | To ensure a use of %v0, %v1, would store float* undef, float** %ptr store float* undef, float** %baseA2 be enough? In that case, one might peel one * off the types as use store float 0.0, float* %ptr store float 0.0, float* %baseA2 |
65 ↗ | (On Diff #98133) | Why is %baseB2 based on %A? Would eg. %baseA3 (or %baseA1, which seems to be missing) be a better name? |
Is this change related?