This patch aims to provide support for detecting load patterns which are collectively invariant but right now isHoistableLoad() is checking each load instruction individually which cannot detect the load pattern as a whole.
Details
Diff Detail
Event Timeline
[serious] Could you add a test-case?
If I understand this correctly, this matches a specific pattern that mark a load as invariant if its address is also invariant.
[serious] As implemented, it is not stable since it depends on the order of in which LoadInsts are added to RequiredILS/isHoistableLoad being called. I.e. if isHoistableLoad is called on an address which has not yet been added to RequiredILS, it will be rejected.
[serious] It is also very specific to the address being computed using a single GEP instruction. Consider using ScalarEvolution to see that the address does not depend on an iteration counter or instruction inside the loop.
include/polly/Support/ScopHelper.h | ||
---|---|---|
394 | [serious] The doxygen comment should also cover the added parameter. | |
lib/Support/ScopHelper.cpp | ||
450 | [style] Please name variable according to the LLVM coding standard. Consider adding a comment about what this code is doing. | |
451 | [style] if (auto *prev_inst = dyn_cast<GetElementPtrInst>(Ptr)) | |
455 | [style] Use a C++ foreach or a standard library function. |
As I mentioned in person: Instead of detecting this specific pattern, would it be possible to treat this as a fixpoint iteration? I.e., maintain a set of undecided address sources and keep shrinking it until only the non-invariant ones remein.
Updates made in this patch:
- Implemented @philip.pfaffe 's fixed-point iteration idea to decide the invariant loads, which also solves the instability issue of the previous approach.
- Added Doxygen comment for the newly introduced parameter.
- Renamed the temporary variables according to LLVM standards.
- Used foreach for iterating over the invariant load set.
lib/Analysis/ScopDetection.cpp | ||
---|---|---|
1173 | Why do you break on the first invariant load? Why not mark all hoistables in one go? |
lib/Analysis/ScopDetection.cpp | ||
---|---|---|
1173 | As soon as it is decided that a load is invariant, it is put in a set and we restart this process. This would ensure invariant loads which are collective(i.e a load depending on a previous invariant load instruction), if they appear out-of-order during iteration, we still detect it as invariant. |
lib/Support/ScopHelper.cpp | ||
---|---|---|
455 | Why are you looping at all? This is a RequiredILS is a set, just call O(1) count(). |
Updates made to this diff:
- Addressed Phillip's concerns.
- Some more variable name changes.
- Minor updates to the fixed-point iteration approach.
This is starting to look good! Only cosmetic nits left.
The testcase is still missing though.
lib/Analysis/ScopDetection.cpp | ||
---|---|---|
1175 | This should be removed. | |
lib/Support/ScopHelper.cpp | ||
446 | RequiredILS should be const&. More bikeshed: Why 'Required'? I'd consider 'KnownInvariantLoads' to be more apt, what do you think? |
lib/Analysis/ScopDetection.cpp | ||
---|---|---|
1160 | consider const unsigned int? Also, I prefer a name like CurrentVariantSize and CurrentInvariantSize, since this indicates that the sizes will be changing soon after. |
Updates in this patch:
- Added a test case
- Removed // break
- Added const to variables which do not change.
test/ScopDetect/collective_invariant_loads.ll | ||
---|---|---|
28 ↗ | (On Diff #152465) | We can get rid of many of these type definitions, please remove the unnecesasary ones. |
47 ↗ | (On Diff #152465) | We can probably rename _array_DefaultRectangularArr_2_int64_t_F__real64_int64_t to something shorter like array_ty |
51 ↗ | (On Diff #152465) | Can we give the variables sane names? If it's too much work, don't bother |
81 ↗ | (On Diff #152465) | Please remove un-necessary metadata |
lib/Support/ScopHelper.cpp | ||
---|---|---|
446 | I agree. It sounds apter here. |
Also, please run the file through instnamer. That way, we will have "named" instructions and not "numbered" instructions. This is useful to modify the testcase in the future (it is less brittle): "numbered" instructions must be sequential, while named instructions need to be.
Hence, we try to keep all of our testcases with named instructions.
test/ScopDetect/collective_invariant_loads.ll | ||
---|---|---|
1 ↗ | (On Diff #152478) | Please remove unnecessary options. Simply opt %s -polly-scops -polly-invariant-load-hositing -analyze | FileCheck %s should be enough? That way, we can actually pattern match for the invariant loads. Please make the CHECK line check for the invariant loads, like this Pinging @grosser, @Meinersbur and @philip.pfaffe if there is a nicer way to test for this? |
37 ↗ | (On Diff #152478) | I think tbaa metadata can also be removed. |
68 ↗ | (On Diff #152478) | Do we actually need all of this metadata? I believe we can safely remove all of this |
test/ScopDetect/collective_invariant_loads.ll | ||
---|---|---|
68 ↗ | (On Diff #152478) | Does that apply for !0 and !1 as well? |
There is still some clutter left, such as irrelevant metadata, a function declaration, and some computations, such as the value stored to memory.
I tried to make a test case for what I tried to explain during the phone call:
The different address levels are different alias sets due to the -scoped-noalias metadata. These have to be MayAlias sets otherwise there is no reason to check for aliasing (this are what the dummy loads are for). Next, the last load (%tmp9) has to be processed first, before the load its address depends on (%tmp5) is known to be invariant. I put the loads in reverse order in the source to be iterated over in the order reverse of how it is executed.
Unfortunately this does not work, the block iteration comes from RegionInfo, which iterates in depth-first order. This should guarantee that dominators (where the invariant address is loaded) are visited before the block that loads from that address. This is, unless we also support invariant PHINodes, there is probably no counterexample. Still, depending in the check order is not robust, but no pressing issue.
I found a correctness issue which I marked with [serious]. Because of this, I mark the patch as "request changes" to overwrite the "Accepted" status.
lib/Analysis/ScopDetection.cpp | ||
---|---|---|
1159 | Could you add comment(s) for this code as well? | |
lib/Support/ScopHelper.cpp | ||
449–450 | polly-check-format will complain about two consecutive empty lines. | |
451–453 | Could you reformulate this? It sounds like the previous load being invariant is a necessary condition. Also, could you add a TODO about avoiding the pattern matching. Suggestion:
[serious] While writing this, I noticed that the pattern checked is not sufficient. The index argument could be an induction variable, meaning that the address loaded from changes in every iteration, i.e. the load is not invariant. |
lib/Support/ScopHelper.cpp | ||
---|---|---|
462 | Don't use almost-always-auto please. | |
465 | No this doesn't work. Consider this: struct A { int a[M][N]; } struct A a[L]; for (int i = 0; i < N; ++i) for (int j = 0; j < M; ++i) for (int k = 0; k < L; ++k) a[k].a[j][i] = 8; You can't make assumptions about the order of GEP indices wrt. surrounding loops. |
lib/Support/ScopHelper.cpp | ||
---|---|---|
465 | This can be greatly simplified by calling Loop::isLoopInvariant for Val on the top-level-loop of the region, which you get from Region::outermostLoopInRegion. That makes this a one-liner. |
Simplified the GEP index checks by just checking invariant property at the outermost loop.
OK, this looks good! You haven't addressed Micheal's testcase yet for the invariant cascade crossing alias set boundaries. For the sake of progress IMO it's okay to leave a TODO and address it in a followup change. @Meinersbur what do you think?
lib/Support/ScopHelper.cpp | ||
---|---|---|
452 | Unmatched parenthesis |
Agreed.
lib/Support/ScopHelper.cpp | ||
---|---|---|
460 | [style] Do not evaluate something invariant in the loop exit condition [serious] Off-by-one-error: getNumIndices() returns getNumOperands() - 1, so you are skipping the last element here. [style] Could use a foreach-loop: for (const Use &Val : llvm::drop_begin(this->operands(), 1)), but this could require D48598. | |
466–467 | [style] Instead of using a flag, this pattern of "find element that matches a condition" can be simplified by outlining the loop into a hasVariantIndex function that return true; if such an element is found and return false; after the loop. Alterntively, std::any_of with a lambda could be useful. (I don't require this change for me accept this patch) |
lib/Support/ScopHelper.cpp | ||
---|---|---|
460 | If the foreach loop requires D48598, I would wait till it gets committed to LLVM repository. This change could be taken up in the tentative follow-up patch. |
Outlined the loop as per @Meinersbur 's suggestion and corrected the loop header responsible for checking invariant nature of the GEP indices.
All tests passed for me. I also tried your test case with -polly-use-runtime-alias-checks=false -polly-ignore-aliasing and it still works with codegen (I had doubts that if your additional code is not executing, the chain of address loads would not end up in RequiredILs).
Before the patch is good to be landed, please address the [serious] remark.
lib/Analysis/ScopDetection.cpp | ||
---|---|---|
1146 | [nit] Unrelated change | |
lib/Support/ScopHelper.cpp | ||
444–445 | [remark] hasVariantIndex is only used in this file, so a declaration static bool hasVariantIndex(... (or a lambda) without anything added to the header file would have been enough. | |
451–453 | [nit] We usually do not enclose a single statement following an if in braces. | |
test/ScopDetect/collective_invariant_loads.ll | ||
1 ↗ | (On Diff #153012) | [serious] %loadPolly missing [style] [[ https://www.llvm.org/docs/TestingGuide.html#fragile-tests | Use < %s ]] [suggestion] 2>&1 is unnecessary |
22 ↗ | (On Diff #153012) | [style] The target triple. ModuleID and source_filename lines can be removed; The target triple in particular can cause the test to fail if LLVM_TARGETS_TO_BUILD does not include X86. |
Used for-each loop with llvm::drop_begin() in hasVariantIndex().
Removed hasVariantIndex()definition from header file and made it static.
Added %loadPolly in the test case and modified the run statement.
Thanks for all the changes.
@philip.pfaffe Since Sahil does not have commit rights (yet), do you want to land it?
Michael
[serious] The doxygen comment should also cover the added parameter.