Page MenuHomePhabricator

[ScopHelper] Provide support for recognising collective invariant loads

Authored by cs15btech11044 on Jun 11 2018, 7:35 AM.



This patch aims to provide support for detecting load patterns which are collectively invariant but right now isHoistableLoad() is checking each load instruction individually which cannot detect the load pattern as a whole.

Diff Detail


Event Timeline

cs15btech11044 created this revision.Jun 11 2018, 7:35 AM

[serious] Could you add a test-case?

If I understand this correctly, this matches a specific pattern that mark a load as invariant if its address is also invariant.

[serious] As implemented, it is not stable since it depends on the order of in which LoadInsts are added to RequiredILS/isHoistableLoad being called. I.e. if isHoistableLoad is called on an address which has not yet been added to RequiredILS, it will be rejected.

[serious] It is also very specific to the address being computed using a single GEP instruction. Consider using ScalarEvolution to see that the address does not depend on an iteration counter or instruction inside the loop.

394 ↗(On Diff #150750)

[serious] The doxygen comment should also cover the added parameter.

450 ↗(On Diff #150750)

[style] Please name variable according to the LLVM coding standard.

Consider adding a comment about what this code is doing.

451 ↗(On Diff #150750)

[style] if (auto *prev_inst = dyn_cast<GetElementPtrInst>(Ptr))

455 ↗(On Diff #150750)

[style] Use a C++ foreach or a standard library function.

As I mentioned in person: Instead of detecting this specific pattern, would it be possible to treat this as a fixpoint iteration? I.e., maintain a set of undecided address sources and keep shrinking it until only the non-invariant ones remein.

cs15btech11044 marked an inline comment as done.Jun 21 2018, 5:25 AM
cs15btech11044 edited the summary of this revision. (Show Details)

Updates made in this patch:

  1. Implemented @philip.pfaffe 's fixed-point iteration idea to decide the invariant loads, which also solves the instability issue of the previous approach.
  2. Added Doxygen comment for the newly introduced parameter.
  3. Renamed the temporary variables according to LLVM standards.
  4. Used foreach for iterating over the invariant load set.
cs15btech11044 marked 2 inline comments as done.Jun 21 2018, 6:19 AM
philip.pfaffe added inline comments.Jun 21 2018, 6:43 AM
1168 ↗(On Diff #152269)

Why do you break on the first invariant load? Why not mark all hoistables in one go?

cs15btech11044 added inline comments.Jun 21 2018, 7:04 AM
1168 ↗(On Diff #152269)

As soon as it is decided that a load is invariant, it is put in a set and we restart this process. This would ensure invariant loads which are collective(i.e a load depending on a previous invariant load instruction), if they appear out-of-order during iteration, we still detect it as invariant.

philip.pfaffe added inline comments.Jun 22 2018, 2:52 AM
1168 ↗(On Diff #152269)

The early break causes the loop to restart every time an invariant load is detected, even if loads are _not_ collective.

450 ↗(On Diff #150750)

Variable names should be ProudCamelCase.

philip.pfaffe added inline comments.Jun 22 2018, 3:14 AM
455 ↗(On Diff #150750)

Why are you looping at all? This is a RequiredILS is a set, just call O(1) count().

Updates made to this diff:

  1. Addressed Phillip's concerns.
  2. Some more variable name changes.
  3. Minor updates to the fixed-point iteration approach.
cs15btech11044 marked 5 inline comments as done.Jun 22 2018, 4:12 AM

This is starting to look good! Only cosmetic nits left.

The testcase is still missing though.

1170 ↗(On Diff #152452)

This should be removed.

446 ↗(On Diff #152452)

RequiredILS should be const&.

More bikeshed: Why 'Required'? I'd consider 'KnownInvariantLoads' to be more apt, what do you think?

bollu added inline comments.Jun 22 2018, 4:59 AM
1155 ↗(On Diff #152452)

consider const unsigned int? Also, I prefer a name like CurrentVariantSize and CurrentInvariantSize, since this indicates that the sizes will be changing soon after.

Updates in this patch:

  1. Added a test case
  2. Removed // break
  3. Added const to variables which do not change.
bollu added inline comments.Jun 22 2018, 6:55 AM
28 ↗(On Diff #152465)

We can get rid of many of these type definitions, please remove the unnecesasary ones.

47 ↗(On Diff #152465)

We can probably rename _array_DefaultRectangularArr_2_int64_t_F__real64_int64_t to something shorter like array_ty

51 ↗(On Diff #152465)

Can we give the variables sane names? If it's too much work, don't bother

81 ↗(On Diff #152465)

Please remove un-necessary metadata

cs15btech11044 marked 4 inline comments as done.Jun 22 2018, 7:48 AM
cs15btech11044 added inline comments.
446 ↗(On Diff #152452)

I agree. It sounds apter here.

cs15btech11044 marked an inline comment as done.

Refined the test case according to the comments.

bollu added a subscriber: grosser.Jun 22 2018, 9:04 AM

Also, please run the file through instnamer. That way, we will have "named" instructions and not "numbered" instructions. This is useful to modify the testcase in the future (it is less brittle): "numbered" instructions must be sequential, while named instructions need to be.

Hence, we try to keep all of our testcases with named instructions.

1 ↗(On Diff #152478)

Please remove unnecessary options. Simply opt %s -polly-scops -polly-invariant-load-hositing -analyze | FileCheck %s should be enough?

That way, we can actually pattern match for the invariant loads.

Please make the CHECK line check for the invariant loads, like this

Pinging @grosser, @Meinersbur and @philip.pfaffe if there is a nicer way to test for this?

37 ↗(On Diff #152478)

I think tbaa metadata can also be removed.

68 ↗(On Diff #152478)

Do we actually need all of this metadata? I believe we can safely remove all of this

cs15btech11044 added inline comments.Jun 22 2018, 9:08 AM
68 ↗(On Diff #152478)

Does that apply for !0 and !1 as well?

Addressed all the comments related to the test case.

There is still some clutter left, such as irrelevant metadata, a function declaration, and some computations, such as the value stored to memory.

Addressed Philip's concerns and removed unnecessary declarations and instructions.

Removed unnecessary BB's in the test case.

philip.pfaffe accepted this revision.Jun 25 2018, 4:49 AM

Thanks, LGTM!

This revision is now accepted and ready to land.Jun 25 2018, 4:49 AM

Added comments in ScopHelper.cpp explaining the purpose of the new code.

Meinersbur requested changes to this revision.Jun 25 2018, 12:17 PM

I tried to make a test case for what I tried to explain during the phone call:

The different address levels are different alias sets due to the -scoped-noalias metadata. These have to be MayAlias sets otherwise there is no reason to check for aliasing (this are what the dummy loads are for). Next, the last load (%tmp9) has to be processed first, before the load its address depends on (%tmp5) is known to be invariant. I put the loads in reverse order in the source to be iterated over in the order reverse of how it is executed.

Unfortunately this does not work, the block iteration comes from RegionInfo, which iterates in depth-first order. This should guarantee that dominators (where the invariant address is loaded) are visited before the block that loads from that address. This is, unless we also support invariant PHINodes, there is probably no counterexample. Still, depending in the check order is not robust, but no pressing issue.

I found a correctness issue which I marked with [serious]. Because of this, I mark the patch as "request changes" to overwrite the "Accepted" status.

1154 ↗(On Diff #152726)

Could you add comment(s) for this code as well?

449–450 ↗(On Diff #152726)

polly-check-format will complain about two consecutive empty lines.

451–453 ↗(On Diff #152726)

Could you reformulate this? It sounds like the previous load being invariant is a necessary condition. Also, could you add a TODO about avoiding the pattern matching.


An LoadInst is hoistable if the address it is loading from is also invariant; in this case: another invariant load (whether that address is also not written to has to be checked separately).
TODO: The only checks for a LoadInst->GetElementPtrInst->LoadInst pattern generated by the Chapel frontend, but generally this applies for any chain of instruction that does not also depend on any induction variable.

[serious] While writing this, I noticed that the pattern checked is not sufficient. The index argument could be an induction variable, meaning that the address loaded from changes in every iteration, i.e. the load is not invariant.

This revision now requires changes to proceed.Jun 25 2018, 12:17 PM
cs15btech11044 set the repository for this revision to rPLO Polly.

Strengthened the condition for pattern matching and added the required comments.

philip.pfaffe added inline comments.Jun 26 2018, 3:05 AM
462 ↗(On Diff #152856)

Don't use almost-always-auto please.

465 ↗(On Diff #152856)

No this doesn't work.

Consider this:

struct A {
int a[M][N];
struct A a[L];

for (int i = 0; i < N; ++i)
  for (int j = 0; j < M; ++i)
      for (int k = 0; k < L; ++k)
          a[k].a[j][i] = 8;

You can't make assumptions about the order of GEP indices wrt. surrounding loops.

Improved the patch, which now analyzes every GEP index for all the nested loops.

philip.pfaffe added inline comments.Jun 26 2018, 4:18 AM
465 ↗(On Diff #152866)

This can be greatly simplified by calling Loop::isLoopInvariant for Val on the top-level-loop of the region, which you get from Region::outermostLoopInRegion. That makes this a one-liner.

Simplified the GEP index checks by just checking invariant property at the outermost loop.

OK, this looks good! You haven't addressed Micheal's testcase yet for the invariant cascade crossing alias set boundaries. For the sake of progress IMO it's okay to leave a TODO and address it in a followup change. @Meinersbur what do you think?

453 ↗(On Diff #152871)

Unmatched parenthesis

For the sake of progress IMO it's okay to leave a TODO and address it in a followup change. @Meinersbur what do you think?


461 ↗(On Diff #152871)

[style] Do not evaluate something invariant in the loop exit condition

[serious] Off-by-one-error: getNumIndices() returns getNumOperands() - 1, so you are skipping the last element here.

[style] Could use a foreach-loop: for (const Use &Val : llvm::drop_begin(this->operands(), 1)), but this could require D48598.

467–468 ↗(On Diff #152871)

[style] Instead of using a flag, this pattern of "find element that matches a condition" can be simplified by outlining the loop into a hasVariantIndex function that return true; if such an element is found and return false; after the loop. Alterntively, std::any_of with a lambda could be useful.

(I don't require this change for me accept this patch)

461 ↗(On Diff #152871)

If the foreach loop requires D48598, I would wait till it gets committed to LLVM repository. This change could be taken up in the tentative follow-up patch.

Outlined the loop as per @Meinersbur 's suggestion and corrected the loop header responsible for checking invariant nature of the GEP indices.

All tests passed for me. I also tried your test case with -polly-use-runtime-alias-checks=false -polly-ignore-aliasing and it still works with codegen (I had doubts that if your additional code is not executing, the chain of address loads would not end up in RequiredILs).

Before the patch is good to be landed, please address the [serious] remark.

1146 ↗(On Diff #153012)

[nit] Unrelated change

445–446 ↗(On Diff #153012)

[remark] hasVariantIndex is only used in this file, so a declaration static bool hasVariantIndex(... (or a lambda) without anything added to the header file would have been enough.

452–454 ↗(On Diff #153012)

[nit] We usually do not enclose a single statement following an if in braces.

1 ↗(On Diff #153012)

[serious] %loadPolly missing

[style] Use < %s

[suggestion] 2>&1 is unnecessary

22 ↗(On Diff #153012)

[style] The target triple. ModuleID and source_filename lines can be removed; The target triple in particular can cause the test to fail if LLVM_TARGETS_TO_BUILD does not include X86.

Used for-each loop with llvm::drop_begin() in hasVariantIndex().
Removed hasVariantIndex()definition from header file and made it static.
Added %loadPolly in the test case and modified the run statement.

Meinersbur accepted this revision.Jun 28 2018, 10:08 AM

Thanks for all the changes.

@philip.pfaffe Since Sahil does not have commit rights (yet), do you want to land it?


This revision is now accepted and ready to land.Jun 28 2018, 10:08 AM
This revision was automatically updated to reflect the committed changes.