This is an archive of the discontinued LLVM Phabricator instance.

Adding ability to unroll loops using epilogue remainder.
ClosedPublic

Authored by evstupac on Mar 14 2016, 2:17 PM.

Details

Summary

The patch adds an ability to unroll loops using epilogue remainder and makes it default (instead of current prologue),
The epilogue remainder is better as it do not produce additional non-constant phi nodes.
for(i = 0; i < n; i++)

...

For now loop unrolling looks like:
PROLOG:

ip = phi [0, ip.next]
...
ip.next = ip + 1
cmp ip, n&7

jne PROLOG
i0 = phi [0, ip.next]
LOOP (unrolling on 8):

i = phi [**i0**, i.next]
i.next = i + 1
cmp i, n

jne LOOP

In the LOOP (this is supposed to be unrolled) we have i = phi [i0, i.next].
Basically, that means that we don't know where we start now, however before unrolling we knew the start position is 0.
Assuming unrolled part of the loop is hotter than remainder (prologue currently), it is better to move this induction variable with unknown start to remainder code (which is done by epilogue technique).

Diff Detail

Repository
rL LLVM

Event Timeline

evstupac updated this revision to Diff 50633.Mar 14 2016, 2:17 PM
evstupac retitled this revision from to Adding ability to unroll loops using epilogue remainder..
evstupac updated this object.
evstupac set the repository for this revision to rL LLVM.
evstupac added a subscriber: llvm-commits.
mzolotukhin edited edge metadata.Mar 15 2016, 11:10 PM

Hi Evgeny,

I'm going to look at this patch soon, but at first glance I find one thing that distracts from the actual stuff: your patch contains a lot of unnecessary changes, such as typo fixes, rewording of not directly related comments, variables renaming etc. While such changes are fine by itself, there is no need to include them in the main patch. You can commit them before the patch, and no pre-commit review is required for obvious patches, such as typo fixes.

Michael

Hi Michael,

Thanks for taking the patch into pipeline.
Yes there are some rewording and comment changes in previously written code. However most of them implicitly related to newly written code and comments (basically because bigger code requires more detailed comments for understanding). Separating the patch into 2 is not a big deal. If it is more convenient for you to review the code after rewording I'll try to do it first.

Evgeny

Hi Evgeny,

A bunch of mostly stylistic comments from me are inline, I'd take a closer look later. I'm fine with keeping it in one patch too, usually it's just more convenient to commit small stuff before to move it out of the way.

Michael

lib/Transforms/Utils/LoopUnrollRuntime.cpp
131

I was talking mostly about stuff like this. This is very local change, that doesn't depend on the rest of the patch in any way.

175–176

You could use range-loop here.

180–184

Maybe use IR syntax in the comments?

193

Why is it UndefValue?

195–196

Why not

Instruction *I = dyn_cast<Instruction>(PN->getIncomingValueForBlock(Latch));

? And then use I instead of V.

219–220

You could do

for (BasicBlock *Succ : successors(Latch))
224–225

We can just check if(L->contains(*SBI)) before the loop, right?

236–237

Nitpick: this looks wrapped before 80 characters.

292

Unnecessary change.

373–374

Unnecessary change.

454–455

Nitpick: why remove the dot?

483–484

Looks like an unnecessary change.

test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
1–3

Where did -mtriple aarch64 -mcpu=cortex-a57 go?

test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
37

These CHECK lines were intended to check the code before the loop, so if we start doing remainder iterations in epilogue, then we should look for for.body rather than for.body.epil, as it will appear first. This doesn't matter much in the current testcase since there are no udiv in the loop, but still.

evstupac added inline comments.Mar 18 2016, 3:56 PM
lib/Transforms/Utils/LoopUnrollRuntime.cpp
175–176

Almost the same loops to iterate PHIs are widely used in Unroll code. If we change it here we should change everywhere.
As for the range itself, do you mean we can construct vector of PHIs and iterate them?

180–184

Will make it more closer to IR, however I'd like to keep variable names from code here.

193

Since we added a branch around the Loop to NewExit this UndefValue will be updated correctly (the same technique is used in ConnectProlog). Here we can find value from PreHeader based on V from Latch, however it will be more complicated and further update.

195–196

We'll need an additional cast in this case as VMap[I] returns not an Instruction.
And add additional check to add incoming value to EpilogPN, verifying when I is null (to add UndefValue).

219–220

Will update.

224–225

Correct. There is similar place in LoopUnroll.cpp which has this check in place already.

evstupac updated this revision to Diff 51084.Mar 18 2016, 4:02 PM
evstupac edited edge metadata.

Updated according to latest comments:

  1. Fix comments
  2. Exclude unnecessary changes
  3. Update iterators

Hi Evgeny,

Some comments inline. I hope to take a more detailed look into the actual logic of the patch soon.

Thanks,
Michael

lib/Transforms/Utils/LoopUnrollRuntime.cpp
175–176

No, I mean

for (Instruction &I : NewExit) {
  auto *PN = dyn_cast<PHINode>(&I);
  if (!PN)
    break;
  ...

Personally, I find this easier to read than the version with explicit iterators. No need to change the other code, at least definitely not in this patch.

180–184

Sure, no complains about the variable names:)

195–196

Right, I see now, thanks.

268–270

CreateLoop is a bit misleading name. Maybe CreateRemainderLoop or CreateLoopForRemainder? Also, some comments about InsertTop and InsertBot would be helpful here (I know we didn't have them before, but since you've already dug into the code, it's probably easy for you).

549–550

Maybe expand it just before the actual use?

evstupac added inline comments.Mar 23 2016, 7:53 PM
lib/Transforms/Utils/LoopUnrollRuntime.cpp
549–550

Since TripCount is BECount + 1 it would be easier for combiner to optimize them when they are close.
BECount expanded here before my changes. For epilog remainder I don't need it.
Also when we add runtime unroll with non-power-of-2 factors we'll need BECount for extra iteration (for non-power-of-2 cases).

evstupac updated this revision to Diff 51507.Mar 23 2016, 7:55 PM

Updated according to latest comments.

mzolotukhin added inline comments.Mar 28 2016, 3:07 PM
lib/Transforms/Utils/LoopUnroll.cpp
301

Trailing whitespace.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
549–550

I believe the patch for non-power-of-2 factor has already been committed. Could you please rebase this patch on ToT (and that would probably resolve my concern about BECount).

evstupac updated this revision to Diff 51871.Mar 28 2016, 8:24 PM

Patch rebased.

evstupac added inline comments.Mar 29 2016, 2:14 PM
lib/Transforms/Utils/LoopUnrollRuntime.cpp
549–550

The other point here is that to expand BECount before actual use we'll need to recompute PreHeaderBR, as it is deleted at line 602.

mzolotukhin added inline comments.Mar 29 2016, 6:52 PM
lib/Transforms/Utils/LoopUnrollRuntime.cpp
549–550

This makes sense now, thanks. I still feel uneasy about this function though, it's becoming kind of a spaghetti code. Would it be possible to refactor it a bit to get rid of multiple if (UseEpilogRemainder)? I think now the amount of code that is shared between these two cases isn't much bigger than the amount of code that is different. If we shuffle the code around (where possible), and factor out some common blocks to helper functions, we can keep only one if (UseEpilogRemainder), or rather have two specialized functions.

evstupac updated this revision to Diff 52183.Mar 31 2016, 2:18 AM

Tried to minimize UseEpilogRemainder condition blocks.
The change affect tests (which I have not updated yet). If you agree the change is better solution, I'll update tests.

evstupac added inline comments.Mar 31 2016, 2:19 AM
lib/Transforms/Utils/LoopUnrollRuntime.cpp
549–550

Shared code requires a lot of parameters. That way helper functions will add a lot of unnecessary parameters passing. Not sure this will simplify the code.
The difference between epilog and prolog now refers to another way of prolog implementation. I can make prolog implementation closer to epilog, however this will affect prolog source code.
Say prolog implementation do not create PrologPreHeader explicitly, while epilog do create EpilogPreHeader.

evstupac updated this revision to Diff 52300.Mar 31 2016, 3:30 PM
evstupac removed rL LLVM as the repository for this revision.

Tests updated.
Comments at UseEpilogRemainder if-then-else united to shorten modified code.

zzheng added a subscriber: zzheng.Mar 31 2016, 5:00 PM
mzolotukhin accepted this revision.Apr 1 2016, 3:42 PM
mzolotukhin edited edge metadata.

Thanks, I like it better now! It looks good to me (with one remark inline).

Michael

lib/Transforms/Utils/LoopUnrollRuntime.cpp
668

You could reuse existing IRBuilder (just use SetInsertPoint method).

This revision is now accepted and ready to land.Apr 1 2016, 3:42 PM
zzheng added a comment.Apr 1 2016, 3:45 PM

I've seen 'unrolling loop' in comments, personally I prefer 'unrolled loop' for clarity.

lib/Transforms/Utils/LoopUnroll.cpp
297

'prolog'?

I recommend changing prolog/remainder here to 'leftover'

lib/Transforms/Utils/LoopUnrollRuntime.cpp
146

You mean 'unrolled loop', right?

191

PN shoudl have 2 uses, right? One in epilogue loop or its preheader and one in Exit.

Please clarify it only has 1 use for now as we'll add the use in epilogue later.

evstupac added inline comments.Apr 1 2016, 5:17 PM
lib/Transforms/Utils/LoopUnroll.cpp
297

Thanks for catching this. For sure it is not only for prolog it is for all kind of remainder loops. "leftover" is also good, but since I used "remainder" in all other places it is better to keep the same name here.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
146

Actually the loop is in unrolling process. Here in "UnrollRuntimeLoopRemainder" we only prepare loop for unroll. Actual unroll occurs at UnrollLoop function. So I'd like to keep "unrolling" here.

191

It has one use after splitting. It will have 2 uses. Line 196 add incoming value from PreHeader. Lines 214-217 explain how it will look like. The loop structure are commented at lines 162-172.

evstupac updated this revision to Diff 52452.Apr 1 2016, 6:24 PM
evstupac edited edge metadata.
evstupac set the repository for this revision to rL LLVM.

1 comment fix + SetInsetPoint instead of new Builder according to comments.

This revision was automatically updated to reflect the committed changes.