This is an archive of the discontinued LLVM Phabricator instance.

Enable non-power-of-2 pragma unroll counts
ClosedPublic

Authored by evstupac on Mar 15 2016, 3:55 PM.

Details

Summary

The patch fixes bug 26957, enabling unroll by non-power-of-2 counts if they are explicitly set in "pragma unroll".

Diff Detail

Repository
rL LLVM

Event Timeline

evstupac updated this revision to Diff 50782.Mar 15 2016, 3:55 PM
evstupac retitled this revision from to Enable non-power-of-2 pragma unroll counts.
evstupac updated this object.
evstupac set the repository for this revision to rL LLVM.
evstupac added a subscriber: llvm-commits.
mzolotukhin added inline comments.Mar 15 2016, 11:20 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
736–739 ↗(On Diff #50782)

Why do we need to move this?

lib/Transforms/Utils/LoopUnrollRuntime.cpp
354–355 ↗(On Diff #50782)

This change is unnecessary.

365–367 ↗(On Diff #50782)

What is the final expression for ModVal? From the code it looks like

ModVal = ((BECount % Count) + 1) % Count

Do we really need double urem here? If so, could you please add a comment explaining why we need exactly this?

368 ↗(On Diff #50782)

Nitpick: dot missing at the end.

evstupac added inline comments.Mar 16 2016, 12:57 PM
lib/Transforms/Scalar/LoopUnrollPass.cpp
736–739 ↗(On Diff #50782)

This is better place for the check. Previously "nounroll" attribute was even if we failed to unroll loop. There are plenty of checks inside UnrollLoop that can stop unrolling.
Basically, now we set "nounroll" only when we have unrolled a loop.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
354–355 ↗(On Diff #50782)

As now the comment is inside "if" it is shifted on 2 spaces. That causes longest string in it to become 81 symbols. So I just make it shorter than 81.

365–367 ↗(On Diff #50782)

Potentially BECount + 1 can unsigned overflow, while (BECount % Count) +1 can not. That is why double URem is used. I can try to simplify this to something like this

ModVal = ((Count - ModValAdd) >> (ModVal->getSclarType()->getPrimitiveSizeInBits() - 1) & ModValAdd
``` or to select here or somewhere in further combiners. However non-power-of-2 is not frequent case as for runtime unrolling it comes only from user specified pragma.
sanjoy edited edge metadata.Mar 16 2016, 2:32 PM

Minor drive-by comment.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
365–367 ↗(On Diff #50782)

Alternatively, you could have ModVal as BECount == -1 ? <constant expr> : ((BECount +nuw 1) % Count.

evstupac updated this revision to Diff 50886.Mar 16 2016, 3:41 PM
evstupac edited edge metadata.
evstupac removed rL LLVM as the repository for this revision.
evstupac added inline comments.Mar 16 2016, 3:48 PM
lib/Transforms/Utils/LoopUnrollRuntime.cpp
365–367 ↗(On Diff #50886)

Yes. That is what I mean by select. I would prefer combiner to decide the correct replacement (maybe based on some architecture properties). The simplification of TripCount % Count to TripCount & (Count - 1) when Count is power-of-2 is pretty obvious and architecture independent. This one is more complicated: ((BECount % Count) + 1) % Count.
So I'd let combiner do this if someone find a performance opportunity.

mzolotukhin accepted this revision.Mar 17 2016, 12:46 PM
mzolotukhin edited edge metadata.

Hi,

This patch LGTM, thanks for working on this!

Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
736–739 ↗(On Diff #50886)

But maybe it's not a bad idea to have a 'nounroll' attribute on a loop that we cannot unroll:) Otherwise, next time we'll just probably spend some time trying again.
But I can see arguments for additional attempts to unroll when we have pragma, so if you feel strong about it, please keep it.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
354–355 ↗(On Diff #50886)

Oh, makes sense. Phabricator doesn't show whitespace changes, so I missed the change in indentation.

365–367 ↗(On Diff #50886)

Thanks for the explanation, it's clear now. I think both BECount == -1 ? <constant expr> : ((BECount +nuw 1) % Count and ((BECount % Count) + 1) % Count are good options here.

This revision is now accepted and ready to land.Mar 17 2016, 12:46 PM
This revision was automatically updated to reflect the committed changes.