This is an archive of the discontinued LLVM Phabricator instance.

RFC: LoopEditor, a high-level loop transform toolkit
Needs ReviewPublic

Authored by jmolloy on Jul 27 2015, 10:03 AM.

Details

Summary

This is a prototype API designed to perform or to enable high-level loop transformations. It is my hope that it could be used throughout LLVM's vectorizers and loop transforms as the go-to API for editing loops.

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 30702.Jul 27 2015, 10:03 AM
jmolloy retitled this revision from to RFC: LoopEditor, a high-level loop transform toolkit.
jmolloy updated this object.
jmolloy set the repository for this revision to rL LLVM.
jmolloy updated this revision to Diff 31326.Aug 4 2015, 7:15 AM

Hi Adam, Hal, Michael, Nadav, Arnold,

Thanks for your comments on my RFC thread.

This latest diff is actually for you to review; I have three things here:

  1. The latest incarnation of the LoopEditor header/API.
  2. A stubbed out .cpp so it all compiles and links.
  3. A proof-of-concept showing the LoopEditor applied to the LoopVectorizer and replacing createEmptyBlock().

Only (1) and (2) am I looking to commit; (3) is here as an example of how the LoopEditor might fit. I should mention that (3) looks horrible, and rightly so. There would need to be a bunch of refactoring, and for this proof of concept I've tried to keep the LoC count down. But I would, in the end:

  • Move almost all the InnerLoopVectorizer code into the delegate, or more cleanly make InnerLoopVectorizer a subclass of Delegate.
  • Start by just using LoopEditor to replace createEmptyBlock and let LoopVectorizer do most of the cloning itself
  • Then start to de-duplicate a lot of the cloning functionality - removing the unroll factor completely from the LoopVectorizer class as a start (LoopEditor can do all that itself), then moving from LoopVectorizer managing its own PHIs to LoopEditor doing that for it.
  • By the end, hopefully having something substantially cleaner with a very distinct and testable API.

Specific improvements in the API since my RFC:

  • Add the minimal hook functions that the LoopVectorizer needs to the delegate.
  • Add the concept of an "AnalysisLevel" - different mutations or analyses on a loop may depend on being able to analyze more or less information about the loop. For example, loop cloning (as used by loop distribution) doesn't need to identify all recurrences, but loop vectorization does.
  • Revamp Reductions and Inductions and how they're represented, with a proper class hierarchy.
  • Streamline the API a tad.

Please let me know your thoughts!

James

Hi all,

Does anyone have any thoughts on the API design? I'm keen to get cracking on this.

Cheers,

James

anemet edited edge metadata.Aug 9 2015, 11:07 AM

Hi James,

I've started looking at this and should get back to you today or tomorrow.

Adam

hfinkel edited edge metadata.Aug 9 2015, 8:51 PM

I think this looks quite useful.

One thing that is not obvious to me is what the LoopEditor will do for loops that are not inner loops. Are any of these operations inner-loop only?

Also, how will if-conversion (during vectorization, etc.) work?

James,

I haven't looked at the patch, just getting my bearings...

Is this just for the vectorizer, or are you planning to extend this to *any* loop transformation?

We have some shared methods in lib/Transforms/Utils/LoopUtils.cpp, which you don't seem to touch. Are you planning on joining all loop transformations into one tool-kit, then use for all vectorizer, unroller, simplify, rotation, etc.?

cheers,
--renato

I'm failing to see (partially because it's my first time looking at it) how your new hooks will make sure all the pre-analysis is done before the actual vectorization. But it seems Adam and others are more familiar with it, so I'll just let them comment about that.

Having said that, I welcome any change that makes it easier to share code, and it now seems to me that other loop passes will be able to use the same hooks and reduce their complexity as well as you did in the vectorizer.

cheers,
--renato

lib/Target/ARM/Thumb2ITBlockPass.cpp
186 ↗(On Diff #31326)

oops

Hi James,

Now, I read the whole patch and I guess I am still more-or-less left with my original questions from the RFC thread.

Can we further decompose the functionality beyond LoopEditor? One straw-man I was thinking it to focus on the widening functionality. Both interleaving and loop-vectorization could be considered widening differing only through the means they achieve widening (by vectorizing or duplicating instructions). Then obviously the hook could provide the instruction-specific widening operation and how reduction values should be recovered at the exit of the loop.

I would also encourage you to use LoopVersioning. The idea was certainly to convert the vectorizer to be another client of LVer. If you need to add hooks or what not feel free. The plan for LoopVersioning has always been to go beyond just alias-checks (we have immediate plans to add dynamic loop-trip count checks) so having to reimplement this at various places would be a mistake now that I finally refactored it.

I would also think that it would be a design mistake to re-implement LoopVersioning on top of something more complex, i.e. LoopEditor. I rather have as simple classes as possible for things like LoopVersioning, LoopWidening, LoopPeeling, etc and then compose transformations using these classes. This would make things more explicit and easier to reason about correctness: input, ouput state, required analyses, modified analyses. (Having AnalysisLevel in the class is an indication to me the class is not properly decomposed.)

It also unclear to me why want to pin down the API by committing it with the implementation stubbed out. I think it's good to discuss the end goal but why can't we invent and refine the API gradually as you start refactoring and transitioning over the existing functionality?

Thanks,
Adam

I rather have as simple classes as possible for things like LoopVersioning, LoopWidening, LoopPeeling, etc and then compose transformations using these classes. This would make things more explicit and easier to reason about correctness: input, ouput state, required analyses, modified analyses.

I agree with this statement. Having multiple, independent, focused tools, used by multiple, independent passes is a better design.

It may, however, need some redundant information about the state of things on each tool / user, but I think we can manage it.

cheers,
--renato

Hi Adam, Renato,

Thanks for your review and sorry for the time it took to get back to you.

I think I agree with all your comments. The LoopEditor structure is a bit of a monolith - I hadn't really noticed as I was designing it. I like your idea of composable operations more (and it shows that I should re-read a design patterns book sometime soon!). I confess that I did see LoopVersioning as a trivial user of LoopEditor, but I do understand your reservations here.

I'm also not pushing for the entire API to pushed in in one go - it was just, as you said, describing the end goal.

So it looks like this becomes:

  • Improvements to LoopVersioning to make LAA optional (choosing one loop or another shouldn't be tied to LAA - any predicate at all should do fine).
  • [My own requirement] Make sure loopversioning works with non-leaf loops.
  • Create a new class LoopWidening.
    • I feel like this should be an abstract base class with concrete subclasses "LoopVectorizing" and "LoopInterleaving".
  • [Later] create a similar LoopPeeling class, that hopefully should sit on top of the other two.

The names are up for bikeshedding.

This seems to make review much easier. What do you think of this as a plan?

Cheers,

James

Hi James,

Thanks very much for considering my comments. I agree with this direction overall. I have a few specific comments below:

So it looks like this becomes:

  • Improvements to LoopVersioning to make LAA optional (choosing one loop or another shouldn't be tied to LAA - any predicate at all should do fine).

Agreed. Silviu's Assumption-based SCEV is already proposing to use LoopVer to host overflow checks. I also have plans to add dynamic trip-count checks to allow a higher number of checks when we know the higher trip count will justify the the additional overhead.

  • [My own requirement] Make sure loopversioning works with non-leaf loops.

Makes sense.

  • Create a new class LoopWidening.
    • I feel like this should be an abstract base class with concrete subclasses "LoopVectorizing" and "LoopInterleaving".

There are certainly commonalities between interleaving and vectorization that we may need to be able leverage by delegating the differences to hooks. I guess the way this shapes up will depend on the specifics of how the code will be split out from the Loop Vectorizer. It will initially be probably pretty close to structure of code in the vectorizer.

  • [Later] create a similar LoopPeeling class, that hopefully should sit on top of the other two.

I am not sure there is much code to be shared between LoopPeeling and Widening. Why do you think so?

The names are up for bikeshedding.

Just for the record, I added the 'ing' ending to LoopVersioning to stress that I mean "version", the verb rather than the noun.

Thanks again,
Adam

Hi Adam,

I'm glad we're converging on a way forward!

I am not sure there is much code to be shared between LoopPeeling and

Widening. Why do you think so?

Commonalities would be cloning all the blocks in a loop, detecting and
hooking up reductions/inductions, and modifying the loop trip count. But
you're right, "on top of" was not the right thing to say.

Cheers,

James

Hi Adam,

I've been working on this for the past few days and wanted to give an update. The most interesting/annoying thing about this all is exactly how to split out/share code with the loopvectorizer.

My ideal would be to take the LoopVectorizer code and refactor it piece by piece until it's in a modular enough shape to encapsulate in a utility class. This would mean we'd maintain 100% test coverage throughout the process.

Unfortunately the code in question is a bit of a monolith and has resisted all of my attempts at chiselling into better shape. The main problem is that any sane composition model would perform a sequence of operations on a loop, mutating it in place. This is the model I'd like to get to. But the current model is to do one pass over the source loop, injecting any cloned instructions into a single IRBuilder.

This has meant that everything we do in the loop vectorizer happens in one place. If-conversion is done on-the-fly, as are induction PHI updates and remaps. Because of this it's very difficult to isolate one piece of functionality and refactor it - it's all or nothing.

So what I'd like to do is to incrementally create the helper classes LoopWidening/LoopInterleaving/LoopVectorizing separately. In order to get test coverage and move towards an end goal, I'd hook the new helper classes into LoopVectorize and make LoopVectorize use the new classes when possible. Something like this:

 if (!VectorizeLoop) {
  assert(IC > 1 && "interleave count should not be 1 or 0");
  // If we decided that it is not legal to vectorize the loop then
  // interleave it.
  if (canBeHandledByLoopInterleaving()) {
    LoopVersioning LVer(L);
    LVer.versionLoop();
    LoopInterleaving LInt(LVer.getVersionedLoop());
    LInt.widenLoop();
  } else {
    InnerLoopUnroller Unroller(L, SE, LI, DT, TLI, TTI, IC);
    Unroller.vectorize(&LVL);
  }
  emitOptimizationRemark(F->getContext(), DEBUG_TYPE, *F, L->getStartLoc(),
                         Twine("interleaved loop (interleaved count: ") +
                             Twine(IC) + ")");
} else {
  ...

Initially the new codepath would be taken only when the loop is very simple; it has only a single block, for example (or a debug option forced it).

I'm still not 100% happy that I haven't found a way to refactor the code already there :( What do you think? I don't want to spend a massive amount of time going in the wrong direction.

Cheers,

James

mzolotukhin edited edge metadata.Aug 19 2015, 5:43 PM

Hi James,

Thanks for pushing it forward, I think we really need cleaning this up!

I don't think we'd like to have two independent vectorizers at the same time: 1) the old one, 2) based on the LoopEditor framework and invoked on simple loops (for beginning). The problem with that approach is that you're actually rewriting vectorizer from scratch, which should be really-really last resort. While I do admit that the code might be not ideal in places, I don't think we need to completely rewrite it from scratch. We can't be sure if they will converge in some observable future, and in attempts to converge them we don't know how the LoopEditor implementation will evolve when it covers all corner cases that the existing vectorizer supports.

So, I think the best way is to actually refactor existing code in small incremental steps (and by small I don't mean number of lines changed - I mean number of key points changed). For instance, you introduced a new class Recurrence - why not to do that in the original code? This way the existing code would become closer to what you want in the end, thus your patch will become smaller. I understand that it's not as easy as it sounds, but I think that's the proper way of pulling such changes. Another benefit of such approach is that it would be much easier to review, since NFC changes would be separated from new features, and thus we can both better check NFC changes for possible bugs and discuss the features on a higher level.

That said, in the end I do like to see API similar to what you proposed. I just think that growing it in parallel might back-fire.

Thanks,
Michael

Unfortunately the code in question is a bit of a monolith and has resisted all of my attempts at chiselling into better shape. The main problem is that any sane composition model would perform a sequence of operations on a loop, mutating it in place. This is the model I'd like to get to. But the current model is to do one pass over the source loop, injecting any cloned instructions into a single IRBuilder.

Can you please elaborate on the difficulties a bit more? Perhaps working through an example would help here.

Thanks,
Adam

Hi Adam, Michael,

On Friday I tried again, intending on reporting to you exactly why I failed again... and succeeded :)

I've now got a bunch of patches, culminating in changing LoopVectorize to use LoopVersioning, and I intend to refactor a bunch more too. The patches I've got up for review currently are:

http://reviews.llvm.org/D12284
http://reviews.llvm.org/D12285
http://reviews.llvm.org/D12286
http://reviews.llvm.org/D12289

Those are the ones that aren't quite NFC. I've got a bunch more that just does code cleanup (enabled by those patches) that are trivial.

I still need to throw more testing at these patches, but they all pass regression tests at least.

Cheers,

James

reames resigned from this revision.Oct 8 2015, 10:29 AM
reames removed a reviewer: reames.