Page MenuHomePhabricator

[OpenMPIRBuilder] Implement tileLoops.

Authored by Meinersbur on Dec 9 2020, 2:46 PM.



The tileLoops method implements the code generation part of the tile directive introduced in OpenMP 5.1. It takes a list of loops forming a loop nest, tiles it, and returns the CanonicalLoopInfo representing the generated loops.

The implementation takes n CanonicalLoopInfos, n tile size Values and returns 2*n new CanonicalLoopInfos. The input CanonicalLoopInfos are invalidated and BBs not reused in the new loop nest removed from the function.

In a modified version of D76342, I was able to correctly compile and execute a tiled loop nest.

Diff Detail

Event Timeline

Meinersbur created this revision.Dec 9 2020, 2:46 PM
Meinersbur requested review of this revision.Dec 9 2020, 2:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2020, 2:46 PM
Herald added a subscriber: sstefan1. · View Herald Transcript
Meinersbur added inline comments.Dec 9 2020, 3:07 PM

This essentially replaces the CanonicalLoopInfo::eraseFromParent introduced in D90830. Erasing the BBs in the scope of only a single loop is unfortunately not enough to discover the BBs that have become orphaned after some of them are reused. It is possible, but error-prone for tileLoops to itself mark BBs that it reuses while constructing the new loop nest and delete the others. However, doing a fixipoint computation here is easier and I expect more methods consuming loop nests (such as collapseLoops) to face the same problem.


I discovered quite late that handling of such in-between code is necessary (the TileNestedLoopsWithBounds test inserts them to compute the loop counter value from the canonical induction variable), previously assuming that perfectly nested loops do not have such thing.

The approach of creating completely new floor and tile loops has the advantage that I can give it nice floor<i> and tile<i> names and can generate them essentially with the same outer-to-inner mechanism.

In contrast, reusing the original CanonicalLoopNest as new tile loops would given us the handling of in-between code for free, in the manner mentioned in the TODO (The in-between code is already in-between the tile loops).


This change is necessary for createCanonicalLoop to be able to insert a loop at the end of a BasicBlock that does not yet have a terminator.

Thanks for the patch! Some NIT's related to refactoring/moving inline. I've tried to highlight some parts, if you like the idea in general, there might be more opportunities to reduce the size of this patch(thus capturing the most crucial part tileLoop).


NIT: Would it make sense to do this as a separate maybe (NFCI) patch ? FWIW it will reduce the size of this patch.


colleck... Typo ?


NIT: these helper functions can also be moved, as mentioned in above comment,

SouraVX added inline comments.Dec 9 2020, 11:36 PM

Please help me understand this: Why DebugLoc ? Why not LocationDescription ? Rest of the API's consistently uses LocationDescription.

Meinersbur marked an inline comment as done.Dec 10 2020, 10:12 PM
Meinersbur added inline comments.

I can extract out the createLoopSkeleton, but I also think this patch is manageable to review. Note that there asl API changes such as introducing the ComputeIP and Name parameters that are not NFCI, unless you consider public API changes as non-functional in which case this entire patch is NFC.


I don't think it makes sense to introduce the helpers to another patch that do not also add a use for them. Please also consider that each additional patch adds churn.


tileLoops does not need an insert location, it transforms an existing loop where it currently is. DebugLoc is for the additional instructions inserted into the existing loop due to the tiling (such as those that compute the number of iterations).

Meinersbur edited the summary of this revision. (Show Details)
  • Extract changes not directly related to tileLoop into D93088
  • Address pre-merge checks
Meinersbur updated this revision to Diff 311117.EditedDec 10 2020, 11:13 PM
  • Avoid pre-merge check warning about using auto for an iterator.

I went through almost all of it and left some comments.


Nit: \param, Loop*s* to tile.


Any reason to use a std vector here? Would it make sense to extend Loops instead (after changing the type)? The original loop vector is still "contained" if we append the new ones.


Nit: assertion message. We might want to pass a debug loc as well and set it, or use the builder.


Nit: auto *Pred or BasicBlock*


Even if we make blocks dead, do we really want to clean up here?


later you compare it against int, I think, maybe pick int consistently?


So far, I think, you could just replace FloorEpilogues with FloorCount. Do we expect that they are different later?


In case we want to change the signature: At this point, I think, you only use Loops to delete some code. Maybe we shouldn't, or even if we do we could still reuse Loops to return the new ones?.


Should we create a helper for this loop and the loop above?


No plain auto please. I usually prefer structs with named members over pairs but that is just a suggestion.

ftynse added inline comments.Dec 18 2020, 9:08 AM

Nit: I don't see CL and Sizes in the argument list


What happens when tile sizes don't divide loop sizes perfectly?


What happens if they have side effects?

Meinersbur marked 13 inline comments as done.
  • Address reviews

The method returns new CanonicalLoopInfo* instances, the one passed by \p Loops are invalidated.

I don't like mixing in- and out- parameters that have different meaning just because it might fit the signature. Think of void labs(long&) instead of long labs(long).

std::vector seems like a suitable type for a public API and return values. In contrast to SmallVector, one does not need to encode an arbitrary small-size and needs to be moved for the caller anyway.

However, if you have some other preference, I would like to hear it.


The Verifier complains about (1.) orphaned PHI nodes, (2.) def-use dominance and (3.) degenerate blocks, even in unreachable BBs . Removing all unused blocks seemed to be the cleanest solution.

Quickly running unittest removeUnusedBlocksFromParent currently does not fail. I guess 1. was resolved by Succ->removePredecessor` in redirectTo (which I needed to add because there were other situations that required will-formed PHI nodes), 2. already has an exception in DominatorTree::dominates and 3. currently no BBs without terminators are left behind.

If you don't mind the occasional orphaned BB, we might remove the entire removeUnusedBlocksFromParent.


The interface (and OpenMP 5.1) specifies that loop nest being tiled are perfectly nested, i.e. we can assume there are no instructions that have side-effects.


They were different at some point, missed that during clean-up.


I left a comment about this here:

I don't really want to bake into the interface which CanonicalLoopInfo* objects are reused, that's an implementation detail.

jdoerfert accepted this revision.Dec 20 2020, 11:52 AM

I'm fine with the std::vector. I'd prefer to avoid complex cleanup but I'm not insisting. I left one comment to address below, other than that LGTM.

@ftynse Would be great if you approve as soon as you're OK with this.


It's TileSizes now.


I think that case is handled by the code and the comment below states it too.


I would not mind generating some unused blocks. The algorithm we use here to delete them seems a bit much given that we will basically remove them for free later.


Leftover commented code and please add a short description here and below.

This revision is now accepted and ready to land.Dec 20 2020, 11:52 AM
Meinersbur marked 3 inline comments as done.
  • Rebase
  • Address remaining comments

Going to commit after no additional comment from @ftynse.

This revision was landed with ongoing or failed builds.Jan 23 2021, 5:39 PM
This revision was automatically updated to reflect the committed changes.
RKSimon added inline comments.

@Meinersbur There's a signed/unsigned warning breaking a Werror bot - please can you fix ?