This is an archive of the discontinued LLVM Phabricator instance.

Unroll pass restructure.
ClosedPublic

Authored by evstupac on Apr 26 2016, 12:54 PM.

Details

Summary

The patch restructure unroll pass (a part where decision on unroll Count is made).
It adds some comments and early exits when unroll Count decision is obvious.
The patch does not affect trunk, but can affect architectures where unroll remainder is not allowed. However it is easy to disable it in Unroll Preferences structure.

I've got build same on all spec benchmarks for x86.

Diff Detail

Repository
rL LLVM

Event Timeline

evstupac updated this revision to Diff 55065.Apr 26 2016, 12:54 PM
evstupac retitled this revision from to Unroll pass restructure..
evstupac updated this object.
evstupac set the repository for this revision to rL LLVM.
evstupac added subscribers: llvm-commits, zansari.
zzheng added a subscriber: zzheng.Apr 26 2016, 4:06 PM
mzolotukhin edited edge metadata.Apr 27 2016, 6:08 PM

Hi Evgeny,

Thanks for doing this, I agree that this place is like a big mess now. Please find some comments inline.

Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
89–93

Do we need unroll-runtime at all if we have this option? Also, I'd suggest to separate code restructuring from adding new stuff, like new command line options.

529

I would rename it to computeUnrollCount, findUnrollCount, or something like this. It looks to me that we're doing much more work in this function than just 'get'.

537–539

We set these parameters here, but we're not guaranteed to early exit. Is this intentional? Maybe set them only if we're going to return?
Also, do we need to set UP.Runtime = true as we do below?

654–655

Is this comment relevant now? And where did the convergence checks go?

662–666

Can we assert(UP.Count == TripCount) here? If yes, it would emphasize the logic, if not, the logic seems flawed.

762

s/isCountSetExplicitly/IsCountSetExplicitly/

lib/Transforms/Utils/LoopUnroll.cpp
202

This change might also be in a separate patch, right?

Hi Michael,

Thanks for looking into proposed new structure.
I'll submit a set of patches for restructure. Finally I want to move all checks preventing unroll from "lib/Transforms/Utils/LoopUnroll.cpp" to "lib/Transforms/Scalar/LoopUnrollPass.cpp".

See my answers inline.

Thanks,
Evgeny

lib/Transforms/Scalar/LoopUnrollPass.cpp
89–93

This is to structure case with "Convergent" operation in a loop and face needs of architectures that suffers from generated remainder.
However I agree that option itself could be removed from the patch if UP.AllowRemainder remain "true" by default.

537–539

We set these parameters here, but we're not guaranteed to early exit. Is this intentional?

Yes, since user requested unroll. Only exceeded pragma threshold or unsafe transformation can prevent unroll here. When pragma threshold is exceeded we still allow all kinds of unroll, but with less Count.

Also, do we need to set UP.Runtime = true as we do below?

Yes UP.Runtime whold be reasonable here. However that will cause some tests fail. I'll request this in next patch. Right now I tried to keep current logic to highlight places like this.

654–655

lines 781 - 782. I agree it is better to move the comment (with corresponding changes) there.

662–666

assert(TripCount) will be more accurate as UP.Count could be 0

lib/Transforms/Utils/LoopUnroll.cpp
202

Yes. But I'd like to keep it in this patch. That is related to current "-unroll-count" and "#pragma unroll" behavior.

Now:
"-unroll-count" forces "full unroll", "partial" and skips "runtime", so go directly to forced unroll when TripCount is runtime.
"#pragma unroll" forces "full unroll", "partial" and "runtime" and skip forced unroll even if runtime has failed

So to make current behavior clear I'd like to keep this. However you are right this could be done in separate patch.

evstupac updated this revision to Diff 55922.May 2 2016, 5:15 PM
evstupac edited edge metadata.

Updated according latest comments: variables rename, comments fix.

zzheng added a comment.EditedMay 4 2016, 3:27 PM

Can we have consolidated unrolled-size computation or threshold enforcement?

Also, please elaborate on forced unroll of runtime loops.

Thanks

lib/Transforms/Scalar/LoopUnrollPass.cpp
540

This formula appears several times... It'll be better to have a consolidated function that computes the unrolled size or enforces the threshold.

566

This feels a little awkward. I would use

bool UserUnrollCount = UnrollCount.getNumOccurrences() > 0;
...

if (HasPragma || UserUnrollCount)
lib/Transforms/Utils/LoopUnroll.cpp
305

I don't understand this part. If UnrollRuntimeLoopRemainder() returned false, remainder loop is not generated. How do we ensure correctness if we 'force' a loop that has runtime tripcount of 6 to by unrolled by 4?

Thanks for taking a look on the patch.
Please find my answers inline.

lib/Transforms/Scalar/LoopUnrollPass.cpp
540

Good point.
We can create something like IsCountMeetThreshold(UP, LoopSize)

566

Agree. This is better.

lib/Transforms/Utils/LoopUnroll.cpp
305

Actually that is the way how -unroll-count works now when runtime unroll is disabled.
For this type of unrolling we do not remove conditional branches from unrolled loop. For example unroll by 2:

for.body

...
cmp
br for.body1, exit

for.body1

...
cmp
br for.body, exit

exit

My changes do not change current behavior.
In my next patch I'll request runtime unroll to be enabled when -unroll-count passed.

There could be positive effects of this type of unrolling (like reduced number of executed backward branches). So if user wants a loop to be unrolled compiler should do this (when it is safe).

evstupac updated this revision to Diff 56357.May 5 2016, 3:37 PM

rebased after recent changes in unroll, fixed some var names.

evstupac added inline comments.May 5 2016, 3:41 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
540

Currently the formula have different parameters at different places. I'm planing to unify this, but this will change current behavior. So this is still a good point, but I'd like to fix this in separate patch.

mzolotukhin added inline comments.May 13 2016, 3:14 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
537–544

I find it a bit confusing that we change values in UP even if we don't exit the function. Can we rewrite it to something like:

If (condition1) {
  UP = ...;
  return true; // We only change UP when we are going to return
}
if (condition2) {
  UP = ...;
  return true;
}

as opposed to

if (condition1) {
  UP = ...
  if (condition)
    return true;
  // Here we changed UP, but didn't return
}
evstupac added inline comments.May 13 2016, 7:28 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
537–544

We can do so, but still we need to enable all the properties somehow.
For "condition1" it will look like:

If (condition1) {
  UP = ...;
  return true; // We only change UP when we are going to return
}
if (condition1)
  UP = ...;

That is because if user specified count that exceed threshold we are reducing it, but still allow some certain level of unrolling (Runtime, Full, Force....)

Personally I'd better disable unroll and warn user in that case (telling how to enable specified unroll by increasing pragma threshold or reducing count). That will resolve the issue, but change current behavior:

if (condition1) {
  if (condition) {
    UP = ...
  } else {
    UP.Count = 1;
    Warn();
  }
  return true;
}

Now it is very annoying when I tell compiler to unroll by 8, but it unrolls by 4 just because of some heuristics without any warnings.

mzolotukhin added inline comments.May 16 2016, 12:16 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
537–544

Personally, I like what you suggested as well. I think when pragmas are used, the user wants full control over what happens and currently we don't provide that. Also, I really like that the code will be much simpler in this case.

Others might have different views on that though.

Let's wait for other suggestions/comments. However the patch is here 20 days without significant changes.
So there could be more comments only after commit.

lib/Transforms/Scalar/LoopUnrollPass.cpp
537–544

Anyway to stress current logic and ask for changes the code need to be refactored first.

hfinkel accepted this revision.May 25 2016, 5:00 PM
hfinkel added a reviewer: hfinkel.

This certainly looks cleaner, thanks!

My one request, since we're refactoring anyway, could we make some named constant for the number of latch-associated instructions and use it instead of putting '2' everywhere?

This revision is now accepted and ready to land.May 25 2016, 5:00 PM

Thank you for the review and accept.
The patch is important for further improvements.

could we make some named constant for the number of latch-associated instructions and use it instead of putting '2' everywhere?

That's a good point. I'll add a variable for this. However, generally it is not a constant value. It depends on loop instructions and unroll factor. We could have more recurrences to optimize. Say, "s ^= 1" will become constant for every unroll factor which is multiple of 2. I'm going to address this in one of further patches.

evstupac updated this revision to Diff 58854.May 27 2016, 4:16 PM
evstupac edited edge metadata.
evstupac removed rL LLVM as the repository for this revision.

Here the version committed.
Added BEInsns var which represents number of optimized instructions when "back edge" becomes "fall through". It replaced constant "2" used for this in all calculations.

committed revision 271071.

Gerolf added a subscriber: Gerolf.May 27 2016, 4:44 PM
Gerolf added inline comments.
lib/Transforms/Scalar/LoopUnrollPass.cpp
547

Please move (LoopSize - BEInsns) * UP.Count + BEInsns to a function and comment properly. It will be invoked many times as the estimated size of the unrolled loop gets checked repeatedly.

639

Isn't that simply UP.Count = max(UP.PartialThreshold - BEInsns/ LoopSize -BEInsns -1, 0)? Probably it also needs a check that LoopSize > BEInsns, but that should probably be an assertion anyway. And for UP.PartialThreshold - BEInsns/LoopSize - BEInsns should probably be fit into a function since there is a similar instance above. It makes sense to do all this now since the intent of this patch is a code cleanup.

700

see above

evstupac added inline comments.May 27 2016, 5:51 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
639

This code was introduced previously, not in this patch. I'm only refactoring places I've noticed. And yes, there is a lot more places to refactor.
That is not just max you've mentioned.
Currently LoopSize is defined as max (3, "estimated loop size"), BEInsns is just "2". So LoopSize is always greater than BEInsns.
However I agree, that moving UnrolledSize calculation into function and adding the assert to the function is a good point.
What I want to do is to move the whole threshold check to a function. And that requires modifications that would change current behavior - so that will go to the next patch.
You can look into previous comment for the discussion.

mehdi_amini added inline comments.
lib/Transforms/Scalar/LoopUnrollPass.cpp
547

I believe this hasn't been done.

And there is a discrepancy since some places are doing the arithmetic in 64 bits while others are doing it in 32 bits:

UnrolledSize = (uint64_t)(LoopSize - BEInsns) * TripCount + BEInsns; vs UnrolledSize = (LoopSize - BEInsns) * UP.Count + BEInsns;

Coverity reports possible overflows:

CID 1356133:    (OVERFLOW_BEFORE_WIDEN)
Potentially overflowing expression "(LoopSize - BEInsns) * UP.Count" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).