This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnroll] Unroll and Jam
ClosedPublic

Authored by dmgreen on Jan 11 2018, 10:18 AM.

Details

Summary

This is a implementation of unroll and jam, which is something that comes up as useful for our smaller embedded processors (and hopefully for other systems in general)

The basic idea is that we take an outer loop of the for:

for i..
  ForeBlocks(i)
  for j..
    SubLoopBlocks(i, j)
  AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

for i... i+=2
  ForeBlocks(i)
  ForeBlocks(i+1)
  for j..
    SubLoopBlocks(i, j)
    SubLoopBlocks(i+1, j)
  AftBlocks(i)
  AftBlocks(i+1)
Remainder

To do this we need to ensure that the ForeBlocks(i+1) can be moved before the SubLoopBlocks(i) and AftBlocks(i), which means potentially moving the phi node operands from AftBlocks into Fore. There are also memory dependency checks and other safety checks that are needed. The transform is then a fairly simple job of using the excellent existing unroll code for cloning blocks and gluing them all back together correctly.

  • The dependency analysis is built upon DependencyAnalysis that loop interchange uses. This might have been a mistake. I have made some changes to DA to ensure that the AA gave correct results (and enable TBAA). I have split these changes out into D42381. There may be more to do here to get values outside the loop to be ignored.
  • I have made this into a separate pass that runs just before the existing loop unroll.
  • The performance heuristics might not be sorted correctly yet. Several parts might be over-conservative when disabling for safety. The remainder loop may not be the best, I'm not sure how it will play with vectorisers, etc. It is hopefully a good enough start.
  • I have tested this from the C level (i.e. csmith) but not a lot from the IR level with some form of fuzzer.
  • Pragmas are mostly copied from unroll.

Diff Detail

Event Timeline

dmgreen created this revision.Jan 11 2018, 10:18 AM
dmgreen updated this revision to Diff 131015.Jan 23 2018, 2:37 AM
dmgreen edited the summary of this revision. (Show Details)

I have split (most of) the Dependency Analysis parts into https://reviews.llvm.org/D42381 and https://reviews.llvm.org/D42382.

Ping. Anyone interested in unroll and jam? (I wasn't sure who, if anyone, to add as reviewers)

lebedev.ri retitled this revision from Unroll and Jam to [LoopUnroll] Unroll and Jam.Feb 5 2018, 11:51 AM
lebedev.ri edited the summary of this revision. (Show Details)
lebedev.ri set the repository for this revision to rL LLVM.
hfinkel added a subscriber: hfinkel.Feb 7 2018, 8:44 PM

Ping. Anyone interested in unroll and jam? (I wasn't sure who, if anyone, to add as reviewers)

I'm interested in this, but I might not get to it for a couple of weeks.

Thanks Hal. No rush.

lib/Transforms/Scalar/LICM.cpp
506

I'm moving the move of this code into a separate commit.

Hi Dave,

I haven't looked at all the nitty gritty details yet, but had some general questions after a first glance:

  • this looks like a rather complete implementation (as opposed to some WIP prototype). In your description you mentioned something about testing. I was wondering if you can elaborate a little bit on this, if you had some more experiences with this pass since posting it. E.g., you've obviously added regression tests, but does it trigger also on the test-suite, etc.?
  • probably the other thing people first want to know if this is (always) beneficial for their back-end. Do you have some data/experience with this?
SjoerdMeijer added inline comments.Mar 21 2018, 7:40 AM
lib/Transforms/Utils/LoopUnroll.cpp
267

Looks like this is used by both LoopUnroll and UnrollAndJam. To simplify things, is it worth to separate this out in a different patch if this already used/beneficial by/for LoopUnroll?

this looks like a rather complete implementation (as opposed to some WIP prototype)

Hopefully. The things I'm not super happy with are where this fits into the unroll pass, and the tests in test/Other. Whether this is a new pass or a part of unroll is a little awkward because of the way we need to do the outer unroll-and-jam before we do the inner unroll.

The DA Analysis pass either needs to be added to the set of preserved loop passes, or (because it's trivial to create and holds no state) we could just create the DependenceInfo as needed. This would just need an AA in loop unroll.

In your description you mentioned something about testing. I was wondering if you can elaborate a little bit on this, if you had some more experiences with this pass since posting it.

So I've ran the normal set of regression tests / supertest / perennial / test-suite / eembc / geekbench / spec's / etc. All without issue. It has also run probably a few hundred thousand csmith seeds (which is pretty good at creating the kind of nested loops with odd memory accesses that unroll and jam looks for). Those are all with -always-unroll-and-jam to bypass the profitability check. (There can, of course, always be bugs)

probably the other thing people first want to know if this is (always) beneficial for their back-end. Do you have some data/experience with this?

With the way DA and the profitability checks are right now, it may be a little conservative in when it is enabled. I have to look into this with the latest DA changes. I would expect it to be a win in most cases though. The performance is something I hope to work on more, so long as the basics are OK here. When it does fire, in the ideal case, it can help quite a bit with removing extra loads.

lib/Transforms/Utils/LoopUnroll.cpp
267

Sure, sounds good. I have split out the DA and other unrelated parts. This one is less useful on it's own, if it's only used in one place, but I can split it out as a cleanup.

Hi Dave,

With respect to the awkward ordering, I don't understand why this transformation (as a standalone pass) couldn't perform the inner unrolling itself? I don't see why there needs to be a dependency. Also, will have loop rotate as a utility (https://reviews.llvm.org/D44595) help simplify this transform?

cheers,
sam

I think I agree with Sam's reply, also because I think that would avoid a regression that would be possible with this patch: neither loopunroll, nor loopunroll and jam runs. Because when it is determined that loopunroll is profitable, it will never run loopunroll, but loopunroll and jam may actually not happen; it is returning "LoopUnrollResult::Unmodified" in a few (exceptional) cases.

Sorry Sam, I missed your comment somehow in a number of other emails. Sorry about that.

By a bit awkward - I wasn't referring to unroll and jam doing the inner unroll. That sounds like a fine idea, like we have here it just tries to call tryToUnrollLoop again.

I just meant that two passes in the same LoopPassManager will process inner loops first for all passes, then move on to the outer loops. So we would either need get the inner loop unroll to not unroll (preventing the outer unroll and jam) or we'd need to ensure the lpm is split in two (which may be fine, and may help with the other issues around analysis dependencies and not needing the explicit inner unroll). Providing this wont change the pass ordering for other code, which I don't think it will if I'm reading this pass pipeline correctly, this sounds like the best way to go to me.

I'll give it a try like that and see how it looks.

zzheng added a subscriber: zzheng.Apr 3 2018, 12:52 PM

Hi Dave,

Sorry, I don't follow you mean by splitting the pass manager. My idea of the problem and phase ordering would be: unroll and jam, as a standalone pass, runs before normal unrolling and, if successful, uses the metadata to prevent further unrolling later on. Can this not be the case? What am I missing here?

cheers,
sam

I was being a bit vague, what I meant was that if we have a pass structure like this, as we have now but with added unroll and jam:

Loop Pass Manager
  Induction Variable Simplification
  Recognize loop idioms
  Delete dead loops
  Unroll and Jam
  Unroll loops

With a nest with inner and outer loops, the passes will be done in this order:

IndVar simplify inner loop
Recognise inner loop
Delete Dead inner loop
UnJ inner loop
Unroll inner loop
IndVar simplify outer loop
Recognise outer loop
Delete Dead outer loop
UnJ outer loop
Unroll outer loop

So when Unj gets to the outer loop, the inner loop may have been unrolled. So the unroll for the inner loop would have to know whether the loop nest was profitable to unroll and jam and -not- unroll in that case. To get the UnJ of the outer loop to happen before the unroll of the inner loop, two loop pass managers are needed (in this case one just for the Unroll Loops). This can affect code generation for existing cases, which was something I was hoping to avoid. I saw some cases of infinite loops being generated differently when I had it like that for testing.

What I am hoping is that we won't have to add UnJ here in the pass pipeline, we can add it later at the other place that we do unrolling. This should allow us to more easily have two loop pass managers, simplifying things a lot :)

hintonda removed a subscriber: hintonda.Apr 4 2018, 4:36 PM
dmgreen updated this revision to Diff 142645.Apr 16 2018, 9:59 AM
dmgreen edited the summary of this revision. (Show Details)

This makes things into a new pass, currently disabled by default. I have attempted to copy what the unrolled does, without duplicating too much code. Pragmas and options should work, although I will need to add a few more tests for the different behaviours. I have tried to make it so that any unroll pragma will prevent unroll-and-jam, and added equivalent options/pragmas for unroll and jam. It still uses the unroll code as much as possible.

I have the clang side of the pragmas too, which I'll put up (although they may only even be useful for testing).

Hi Dave, thanks for making this into a pass, I think this looks a lot better now. I skimmed through the whole diff for first time, and just wrote down some things I noticed, mostly nitpicks, see the comments inlined. I will now study the different bits and pieces in more detail.

include/llvm/Analysis/TargetTransformInfo.h
427

Can you be a bit more explicit here what the threshold exactly is?

427

typo: trashold :)

lib/Analysis/DependenceAnalysis.cpp
111

Will other people get unhappy when we flip this switch?

lib/Target/ARM/ARMTargetTransformInfo.cpp
622

Some comments perhaps why it is set to 60?

lib/Transforms/IPO/PassManagerBuilder.cpp
659 ↗(On Diff #142645)

Elaborate a bit what you mean here (the 2nd sentence)?

lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
148 ↗(On Diff #142645)

nit: extra space

203 ↗(On Diff #142645)

nit: know -> known

305 ↗(On Diff #142645)

nit: don't need the "!= 0"

lib/Transforms/Utils/LoopUnrollAndJam.cpp
122

Some more nitpicking here, about names.

Perhaps "Fore" can be confusing: is it another loop, a block? So something like this:

for (i)
  block[i, 0]
  for (j)
    block[j, 0]
  end
  block[i, 1]
end

becomes:

for (i, i+=2)
  block[i, 0]
  block[i+1, 0]
  for (j, j+=4)
    block[j+0, 0]
    block[j+1, 0]
    block[j+2, 0]
    block[j+3, 0]
  end
  block[i, 1]
  block[i+1, 1]
end

Mentioning that the outer loop has been unrolled with a factor of 2, and the inner loop with an unroll factor of 4.

Not sure if the classic loopunroll and jam literature uses standard terminology here, but perhaps inner/outer loop rather than "Fore, Subloop and Aft".

Also, perhaps you want to mention some restrictions somewhere, if there are any. E.g., can this doubly nested loop occur in more deeply nested loop structure (and loopunroll and jam still trigger)?

140

Blocks = outer loops?

So perhaps something along the lines of:

"So the outer loops are unrolled and the inner loops fused ("jammed")."

181

Nit: wont

197

nit: don't need to capitalize this message?

211

same here?

543

typo: shouldnt

dmgreen marked 5 inline comments as done.Apr 20 2018, 4:11 AM

Thanks for the comments! I'll try to split a few things out and update this accordingly

include/llvm/Analysis/TargetTransformInfo.h
427

trashold?

lib/Analysis/DependenceAnalysis.cpp
111

Oh yep, this should probably be a separate change.

I don't think people will be unhappy, but it shouldn't be lumped in here.

lib/Target/ARM/ARMTargetTransformInfo.cpp
622

The limit here is set because the inner loop requires a smaller threshold than the outer, as unroll and jam can increase register pressure in the inner loop significantly. This was chosen semi-randomly to be a something around 20% of the normal threshold, to something that seemed to work in a few cases I tried. More performance testing may be needed to get something better.

lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
305 ↗(On Diff #142645)

This is copied from loop unroll. I think it's clearer this way what it's testing. But I can change it if you are highly opinionated :)

lib/Transforms/Utils/LoopUnrollAndJam.cpp
122

I'm not sure about your loop here. It looks like it has inner loop unrolling, which UnJ doesn't do (that is left to the unroller after the outer loop has been unroll and jammed.

By "Fore" I was referring to blocks in the outer loop, but before the inner loop. "Aft" means blocks in the outer loop but after the inner loop. I made up the names, so am open to suggestions on better ones ;)

Also, perhaps you want to mention some restrictions somewhere, if there are any. E.g., can this doubly nested loop occur in more deeply nested loop structure (and loopunroll and jam still trigger)?

That should work I believe. If it doesn't it's a bug :) But I doubt it will ever be a profitable thing to do. It used to be disabled in the profitability check, but we may have lost that when I was moving things over the the new pass. I'll have a look.

There are restrictions mentioned in the safety checks, in isSafeToUnrollAndJam.

140

I have tried to rewrite this a little.

197

This is copied from loop unroll

211

Same

dmgreen updated this revision to Diff 143328.Apr 20 2018, 9:55 AM

Address feedback and added some pragma tests, with some related fixes. Split out some parts into other commits.

I'm still trying to work on making the analysis more accurate so this works on more loops. It would be good if LAA could be made to work here (to get versioning too) but it looks very vectoriser shaped and doesn't handle outer loop quite yet. DA as used here might be the only way, but may need to learn a few more tricks (or may need some fixes).

rja added a subscriber: rja.May 8 2018, 11:39 PM

Address feedback and added some pragma tests, with some related fixes. Split out some parts into other commits.

I'm still trying to work on making the analysis more accurate so this works on more loops. It would be good if LAA could be made to work here (to get versioning too) but it looks very vectoriser shaped and doesn't handle outer loop quite yet. DA as used here might be the only way, but may need to learn a few more tricks (or may need some fixes).

Ping, how work is going on?

SjoerdMeijer added inline comments.May 9 2018, 2:18 AM
test/Transforms/LoopUnrollAndJam/unprofitable.ll
226 ↗(On Diff #143328)

Nit: do you need all this?

test/Transforms/LoopUnrollAndJam/unroll-and-jam.ll
9 ↗(On Diff #143328)

Do we need all these checks? And with all the variable names still there, it looks like a fragile test to me. Same for the other functions below.

Ping, how work is going on?

Hello. I've been away for a few days. Is this something you are interested in?

I'm still wading through improvements / bugs in dependency analysis, in an attempt to make the dependency checking here more precise.

dmgreen updated this revision to Diff 146623.May 14 2018, 9:08 AM
dmgreen added subscribers: efriedma, sebpop, anna and 4 others.

I've updated the dependency checks to be a bit better. This has shown up some bugs/inaccuracies in DA, which will be needed to make this correct.

I'm also adding some people here who may have opinions about any of this.

dmgreen added inline comments.May 14 2018, 9:09 AM
test/Transforms/LoopUnrollAndJam/unroll-and-jam.ll
9 ↗(On Diff #143328)

I think as we are only running a single pass (not -O3 or an entire backend), it should be relatively stable.

I've updated this one to be auto generated and show the entire code output. The others are usually testing some small portion of the resulting function.

SjoerdMeijer accepted this revision.May 16 2018, 8:56 AM

Hi Dave,

I think we have looked long enough at it now, and it's time to get some experience with it. :)
It's off by default, so I don't think there's any harm done here. Once it is in, it's easier to throw
more code at it, get more experience, and see if we further need to tweak or tune it; perhaps
we get feedback from others as well who are interested in using it.

Let's wait a few more days with committing to give people one more chance to object :-)

Inlined just a few more comments, mainly nits.

Cheers.

lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
84 ↗(On Diff #146623)

This is *almost* identical to GetUnrollMetadata() in LoopUnroll.cpp. Is this something that could be shared somehow? Or would that be too inconvenient?

229 ↗(On Diff #146623)

nit: numInvariant -> NumInvariant

lib/Transforms/Utils/LoopUnrollAndJam.cpp
73

nit: TODOD

399

nit: don't need the brackets here

623

Nit: TODOD

test/Transforms/LoopUnrollAndJam/unroll-and-jam.ll
9 ↗(On Diff #143328)

Ok, fair enough, agreed.

This revision is now accepted and ready to land.May 16 2018, 8:56 AM
dmgreen marked 4 inline comments as done.May 21 2018, 10:57 AM
dmgreen added inline comments.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
84 ↗(On Diff #146623)

It's similar, but this version is only testing prefixes, not the whole string.

dmgreen updated this revision to Diff 147820.May 21 2018, 10:57 AM

OK thanks. I will leave this for at least a couple more days whilst I run extra tests. Any other comments from anyone are welcome/appreciated.

xbolva00 added inline comments.
lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
136 ↗(On Diff #147820)

static_cast?

lib/Transforms/Utils/LoopUnrollAndJam.cpp
175

bool CompletelyUnroll = (Count == TripCount);
maybe better?

436

Above you use a different style:
for (unsigned It = 1; It != Count; ++It) {

Can you make it same for all occurrences?

475

This line could be placed above the condition.

dmgreen updated this revision to Diff 147983.May 22 2018, 4:23 AM
dmgreen marked 4 inline comments as done.

Review Updates.

xbolva00 accepted this revision.May 22 2018, 4:34 AM
This revision was automatically updated to reflect the committed changes.
echristo added a subscriber: echristo.

It sounded like Hal wanted to review this and I don't know that any of the other people on the review line have any experience with loop passes and so probably shouldn't have been approving this. I might be wrong here, and if so I apologize, but it seems like this went in a bit early.

From a quick glance it also looks like the testcases could be cleaned up a bit. In particular the naming if nothing else.

Thanks.

Reverted in 333359 as it's failing on some of the builders. I believe the tests are dependant on the arm backend being built.

I'll fix that and try to clean up the tests as you suggest at the same time. Some of them are longer than I'd like because we are dealing with nested loops.

I was taking Sjoerd review as OK to move things forward and improve things over time once it is in-tree. And I had attempted to leave this open for a while with plenty of people as subscribers, so that anyone could object. I had not added them as reviewers though, sorry if I misstepped.

It sounded like Hal wanted to review this

Thanks Eric. I don't need to review this if others who are qualified look at it. I've skimmed the code and it seems reasonable. There are a number of obvious enhancements that might be worthwhile, but I think it's best to work on those after this lands.

and I don't know that any of the other people on the review line have any experience with loop passes and so probably shouldn't have been approving this. I might be wrong here, and if so I apologize, but it seems like this went in a bit early.

From a quick glance it also looks like the testcases could be cleaned up a bit. In particular the naming if nothing else.

Thanks.

llvm/trunk/lib/Transforms/Utils/LoopUnrollAndJam.cpp
726 ↗(On Diff #148747)

Can this overflow if the exit count is INT_MAX (or similar)? It's generally a good idea to comment on how that's handled.

This looked very reasonable to me as an initial commit. And I thought so too, like Hal and Eric, that different enhancements can still be made. But as I wrote, we can iterate further on this once we've got something in. So I don't think this was a misstep, and once the buildbot failures are resolved, it looks like this can be recommitted. However, as I also mentioned, another reason for an initial commit is more exposure, potentially more people looking at it. So this discussion is great, the more feedback, the better; especially if there's something that needs to be addressed before a recommit.

hsaito added a subscriber: hsaito.May 28 2018, 11:09 PM

David, outer loop vectorization we've been working on and unroll and jam have a lot of similarities and thus there should be code sharing opportunities around legality and cost modeling. Let's collaborate to improve both.

dmgreen reopened this revision.May 29 2018, 11:47 AM

There are a number of obvious enhancements that might be worthwhile, but I think it's best to work on those after this lands.

Any suggestions on what would be at the top of the list? :)

David, outer loop vectorization we've been working on and unroll and jam have a lot of similarities and thus there should be code sharing opportunities around legality and cost modeling. Let's collaborate to improve both.

Yeah, sounds like a good idea. Vectorisation is not something I know a huge amount about, it not being useful for the particular cores I'm looking at at the moment. But the work on VPlan sounds interesting and I do wish we had something similar for loop transforms in llvm. I was hoping to get loop versioning working, maybe through LoopAccessAnalysis, but as far as I understand it doesn't handle outer loops yet (and seemed very vectoriser shaped when I looked). From what I remember hearing, this isn't part of the outer loop vectorisation work (instead relying on user directives?). I imagine it's not an easy task to extend this out.

The current legality checks here are based on dependence analysis, and there are a couple of patches needed to make it work correctly. The cost modelling is very adhoc. :) All these things I hope we can improve as we go along.

From a quick glance it also looks like the testcases could be cleaned up a bit. In particular the naming if nothing else.

Do you mean names of the test files, or names of variables in the tests? I had look but didn't feel I was making things much better (apart from the names of some lcssa nodes). Any other suggestions?

llvm/trunk/lib/Transforms/Utils/LoopUnrollAndJam.cpp
726 ↗(On Diff #148747)

Yep, you are correct. This code comes from unroll-runtime (minus the missing overflow check), and dates back to before unroll-and-jam was a separate pass. I was trying to prevent this function from returning true if the runtime unrolling was going to fail. It can now be safely removed and leave that check to the call to UnrollRuntimeLoopRemainder.

This revision is now accepted and ready to land.May 29 2018, 11:47 AM
dmgreen planned changes to this revision.May 29 2018, 11:47 AM

David, outer loop vectorization we've been working on and unroll and jam have a lot of similarities and thus there should be code sharing opportunities around legality and cost modeling. Let's collaborate to improve both.

Yeah, sounds like a good idea. Vectorisation is not something I know a huge amount about, it not being useful for the particular cores I'm looking at at the moment. But the work on VPlan sounds interesting and I do wish we had something similar for loop transforms in llvm. I was hoping to get loop versioning working, maybe through LoopAccessAnalysis, but as far as I understand it doesn't handle outer loops yet (and seemed very vectoriser shaped when I looked). From what I remember hearing, this isn't part of the outer loop vectorisation work (instead relying on user directives?). I imagine it's not an easy task to extend this out.

The current legality checks here are based on dependence analysis, and there are a couple of patches needed to make it work correctly. The cost modelling is very adhoc. :) All these things I hope we can improve as we go along.

We are currently focusing on directive based implementation, but the same framework can be used for automatic vectorization at the outer loop level. It's not like vectorizer side has a lot of immediately reusable code in place, but I think co-designing the solution makes a lot of sense, to avoid duplicate work (or to consolidate into shared reusable code). For example, uniform control flow check @dcaballe has written is probably shareable. Ditto for Divergence Analysis RFC by Simon Moll. Dependence Analysis for outer loop vectorization isn't something we are currently working on, and thus being able to reuse is a very nice thing.

For example, uniform control flow check @dcaballe has written is probably shareable. Ditto for Divergence Analysis RFC by Simon Moll.

You could have a look at isExplicitVecOuterLoop and isUniformLoop in D42447. Currently, they are implementing specific checks for supported outer loops in VPlan but some of this code could be generalized and refactored.
Simon's RFC is here, in case you missed it: http://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html

Dependence Analysis for outer loop vectorization isn't something we are currently working on, and thus being able to reuse is a very nice thing.

Agreed. Outer loop auto-vectorization and, in particular, extensions in legality analysis to support outer loops, including LoopAccessAnalysis, is something that we are not currently working on. I briefly mentioned that and other open TODOs in my EuroLLVM talk, in case there is any other overlap/reusability opportunity with unroll-and-jam: https://www.youtube.com/watch?v=z6NeHLRNVok&t=27m50s

Thanks,
Diego

Any suggestions on what would be at the top of the list? :)

For me the first thing would probably be a rather straightforward thing: more OptimizationRemarks. I think it currently reports only something after a successful transformation. But perhaps more interesting to know is why it isn't triggering (if that's what you expect): so optimisation remarks about the legality, profitability, or any other reason, would be crucial to understand what's going on. Then, based on this optimisation report, and if the transformation really needs to be triggered, the user can resort to pragmas (for which you have another patch) and/or tweaking options for treshhold values.

This is an implementation detail, but I am still not sure about some of the terminology that is used, e.g. "Fore", "Subloops", etc., but I've not come up yet with convincing alternatives...

Also, we probably want to revise some of the threshold values, and see if we need to tweak them further.

But assuming that the dependence analysis is now conservative/correct, these are the sort of things I was thinking of that we can improve after this lands and we get some more experience with the pass.

dmgreen updated this revision to Diff 149147.May 30 2018, 9:57 AM
dmgreen added a reviewer: hfinkel.

Attempted to clean up tests and make them not rely on arm. Removed the outer loop IV check, some other small code edits and some comment improvements/formatting.

This revision is now accepted and ready to land.May 30 2018, 9:57 AM
SjoerdMeijer accepted this revision.Jun 1 2018, 3:29 AM

Cleanup/simplification of the tests looks good to me. Just a nit inlined.

lib/Transforms/Utils/LoopUnrollAndJam.cpp
544

If you move this to after the asserts, the other NDEBUG block , can you then move this statement:

Loop *OuterL = L->getParentLoop();

to inside the other #ifndef NDEBUG block? Thus we avoid having just one statement hash defined here.

dmgreen updated this revision to Diff 151330.Jun 14 2018, 5:57 AM

OK. One DA patch is in, the other is still waiting for review.

Everyone happy with this to go into tree (its off by default)? Hal are you happy enough for us to continue work in tree? Eric you happy enough with the tests? No-one else have any objections?

I think you can commit this. And should there be more comments, then they can be addressed post commit.

OK. The important DA patches are in. Unless anyone objects, I will commit this soon (hopefully over the weekend).

This revision was automatically updated to reflect the committed changes.